
3. Access and authority control

An access point is a field in a record that can be searched. Access points are defined in a record to make it searchable, and they are selected so that they complement users’ searching behavior and cater to their needs. When a user enters a search term into a field, the access point represents the information that is returned. Access points generally have authority control applied to them.

Authority control is an important concept that helps standardize data and reduce inconsistencies. It is a way of controlling the data entered into a field so that standardization is enforced. Authority control ensures that only authorized terms are used when entering data into a field or searching it. Where it is used, it applies to cataloguers as well as users: cataloguers must adhere to the authorized terms when entering data into fields, and users in turn receive relevant results only when they select the right terms to search for an object. Authority control can be of the following types:

One type of authority control makes use of controlled vocabulary. A controlled vocabulary is a list of standardized or authorized terms that can be used to retrieve information about an object. This type of authority control is efficient and practical when the authorized terms are limited; for example, a drop-down menu for the Genre field, where the terms are few. It does not allow the number of authorized terms to increase as the collection grows. In this collection, the Tags fields (for Subject and for Genre) have predictable terms and should have controlled vocabulary applied. Of these, Tags (for Subject) has a very large number of possible input terms and must therefore be controlled using a thesaurus.

Another type of authority control applies to name-type fields. This is called name authority control. It uses a name authority file, and the set of authorized terms grows as the collection grows. The Author field in this collection should have name authority control applied to it. Similarly, the Publisher field should have name authority control applied so that different variations of user input still retrieve the relevant objects.
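The way a name authority file lets different variations of user input retrieve the same object can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the author name, its variants and the function names are hypothetical and are not taken from the collection's actual authority file.

```python
# Hypothetical name authority file: each authorized heading lists known variants.
NAME_AUTHORITY_FILE = {
    "Morris, Jane": ["Jane Morris", "J. Morris", "Morris, J."],
}


def normalize(name):
    """Lower-case and collapse whitespace so trivial differences do not matter."""
    return " ".join(name.lower().split())


def build_index(authority_file):
    """Map every normalized form (heading or variant) to its authorized heading."""
    index = {}
    for heading, variants in authority_file.items():
        for form in [heading, *variants]:
            index[normalize(form)] = heading
    return index


def authorized_form(name, index):
    """Return the authorized heading for a name, or None if it is not on file."""
    return index.get(normalize(name))


index = build_index(NAME_AUTHORITY_FILE)
print(authorized_form("  jane   MORRIS ", index))
print(authorized_form("Unknown Person", index))
```

Because new headings are added to the file as new authors enter the collection, the list of authorized terms grows with the collection, unlike a fixed drop-down list.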

4. Representation of information content

4.1. Subject access

An information object can have two types of description associated with it. Bibliographic description records the physical and identifying features of an object, such as its title, number of pages and audience level. Intellectual description, on the other hand, concerns the aboutness of the object, that is, its subject. The subject of an object is its central idea or main theme; fields such as subject, topic, theme and tag describe an object intellectually.

Subject access is a broad concept that deals with the intellectual content or topic searched by users. It is a collective term for all the procedures and measures taken in a system to provide access to the intellectual content of the objects within a collection. It covers all the fields that provide subject access; in this collection, these are Subject, Genre and Plot. The distinction between natural language indexing and authority control is also important here. Natural language is what comes freely to people when communicating, whether in writing or in speech; natural language indexing is therefore indexing in which the terms are not controlled and stay as close to natural language as possible. Authority control, on the other hand, allows only authorized terms. The main aim is for users to be able to access the right objects in the easiest and most convenient manner. For this collection, the Plot field is not searchable and so needs no indexing, while Subject and Genre, as discussed above, have controlled vocabulary.

An important process in subject representation is subject analysis. Subject analysis is carried out mainly by cataloguers and is the process of determining the terms that represent the information object. Depending on the field and its rules, subject analysis may be done for natural language indexing or for authority control. In both cases, the three steps of familiarization, extraction and assignment are common. Familiarization involves identifying the major theme or idea of the object, in this case a book; this is done in a cursory manner rather than in depth. Once familiar with the book, the cataloguer moves to extraction, thinking of terms to use based on what they now know of the book. This is also where the field and its input rules come into the picture, and where the decision to use natural language indexing or authority control is made. With natural language indexing, the cataloguer’s domain knowledge guides term selection, and the selected terms go directly to the final step, assignment. Assignment is simply entering the selected terms into the field while adhering to its input rules. With authority control, an additional step, translation, follows extraction: the cataloguer compares every extracted term with the controlled vocabulary and chooses the authorized term that is closest in meaning or relevance for assignment.

Subject representation therefore involves subject analysis carried out by the cataloguer, which in turn leads to subject access. Subject analysis also carries over partially into classification: the cataloguer determines the predominant subject term for classification, and that term determines the physical location of the object. In essence, subject analysis is a chain that begins with the author and runs through the publishers, cataloguers, indexers and classifiers to, finally, the users.

Classification is the process used to organize information objects in a systematic way. Using one or more subject-based fields to classify information objects allows technical users to organize them in a more user-friendly manner; users searching by subject find it easier to access an information object under such a classification. In this collection, two subject-based fields are used in the classification scheme: Tags (for Subject) and Tags (for Genre).

4.2. Thesaurus structure

Subject authority control is the process of applying controlled vocabulary to subject search terms as well as subject headings. A subject heading is the word or group of words that comes closest to the subject of a book. The field chosen for subject authority control in this collection is the Subject field. For users of this field to retrieve the right results, controlled vocabulary must be applied to it to minimize inconsistencies and eliminate disparity between what users search for and what cataloguers enter. Authority control is applied to the Genre field for the same reason, that is, to reduce inconsistencies and impose standardization.

Subject authority control makes use of subject authority files. These contain subject records, which in turn contain the controlled vocabulary that represents the subject. Subject authority files are of two types: thesauri and subject heading lists. A thesaurus is a document containing words together with the relationships between them; it allows for vocabulary control and thereby improves the retrieval of search results. In this collection, the thesaurus is developed for the Subject field.

Controlled vocabulary, defined earlier, is a solution to the indexing problems that result from natural language: it allows a single term, spelled in a single, specific way, to be used for representing content. When considering controlled vocabulary in terms of a thesaurus, it is important to understand what authorized and unauthorized terms are. An authorized term is one selected by the indexer as allowed or acceptable. Unauthorized terms are those that are not to be used in the field; in their place, the related authorized term must be used.

Semantic relationships are associations between words based on their meanings. Semantic relationships follow the syndetic structure, which is the cross-referencing between terms used in the controlled vocabulary, in this case the thesaurus. Three types of semantic relationship are taken into consideration in building the thesaurus for the Subject field: equivalent, hierarchical and associative. An equivalent relationship is one where the associated words have the same meaning, or very nearly so. For example, the terms bravery and valor are nearly identical in meaning and therefore share an equivalent relationship. A hierarchical relationship is one where one term is a broader representation of the other and, conversely, the other is a narrower representation of the first. For example, armed forces and air force are the broader and narrower terms respectively, because an air force is a type of armed force. An associative relationship is one where the two terms are related in meaning without being equivalent or hierarchical. For example, Holocaust and concentration camps are related terms and therefore have an associative relationship.

Semantic relationships are defined so that every relationship is paired with a complementary one; these paired cross-references are called mandatory reciprocals. The equivalent relationship has the USE FOR – USE cross-reference, the hierarchical relationship has the BROADER TERM – NARROWER TERM cross-reference, and the associative relationship has the RELATED TERM – RELATED TERM cross-reference.
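The idea of mandatory reciprocals can be illustrated with a small Python sketch in which recording one cross-reference automatically records its reciprocal. The class and method names are hypothetical, the codes UF, BT, NT and RT are the customary abbreviations for the cross-references named above, and the choice of Bravery as the authorized term over Valor is an assumption made only for the example.

```python
from collections import defaultdict

# Each cross-reference code paired with its mandatory reciprocal.
# USE/UF = equivalent, BT/NT = hierarchical, RT/RT = associative.
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}


class Thesaurus:
    def __init__(self):
        # term -> cross-reference code -> set of target terms
        self.entries = defaultdict(lambda: defaultdict(set))

    def relate(self, term, relation, target):
        """Record term --relation--> target together with the reciprocal."""
        self.entries[term][relation].add(target)
        self.entries[target][RECIPROCAL[relation]].add(term)

    def authorized(self, term):
        """Follow a USE reference if there is one; otherwise return the term."""
        use = self.entries.get(term, {}).get("USE")
        return next(iter(use)) if use else term


thesaurus = Thesaurus()
thesaurus.relate("Valor", "USE", "Bravery")            # assumed: Bravery is authorized
thesaurus.relate("Air force", "BT", "Armed forces")
thesaurus.relate("Holocaust", "RT", "Concentration camps")

print(thesaurus.entries["Bravery"]["UF"])
print(thesaurus.entries["Armed forces"]["NT"])
print(thesaurus.authorized("Valor"))
```

Storing the reciprocal at the moment a relationship is entered means the two directions can never drift apart, which is the purpose the mandatory reciprocals serve in the thesaurus.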

The domain of a thesaurus is the complete range, concept-wise, of terms that can be used in the field for which the thesaurus is designed. The scope, on the other hand, defines the limit or boundary applied to the domain. In this collection, the domain of the thesaurus is topics and themes related to World War II, whereas the scope is that only topics and themes pertaining to World War II are allowed.

Specificity is the degree of precision of the terms used to represent the subject of the book in the chosen field. The higher the level of specificity, the more precise the subject representation; conversely, the lower the level of specificity, the less accurate the subject representation. Specificity depends partly on the concreteness, or lack thereof, of the chosen terms: for more abstract terms, specificity is generally low. For this collection, a high level of specificity is appropriate because users have high domain knowledge, and the terms selected represent the theme of the book accurately. A high level of specificity results in high precision and low recall.

Exhaustivity is the extent of subject representation for each object, and it determines the number of terms assigned to represent the subjects of each object in the collection. Depending on subject coverage, indexing is classified as depth indexing or summarization. Depth indexing covers more ground, taking in main topics as well as subtopics, whereas summarization covers only the main subject of the object. Depth indexing yields high exhaustivity, whereas summarization yields low exhaustivity. Depth indexing is more applicable where the selected terms are more abstract and more terms are assigned to each record. For this thesaurus, depth indexing is used to yield better results for each search, because user domain knowledge is high but the subject terms are more abstract. For example, even though the term bravery precisely specifies the subject of a book, bravery can also be described as courage or chivalry, and all these terms need to be taken into consideration. The exhaustivity level is therefore high for this thesaurus. With depth indexing, recall is high and precision is low.
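The effect of exhaustivity on recall and precision can be made concrete with a toy calculation in Python. Precision is taken here as the share of retrieved records that are relevant, and recall as the share of relevant records that are retrieved. The four records, their terms and the set of relevant books are invented purely for illustration and do not come from the collection.

```python
def search(index, term):
    """Return the ids of all records whose assigned terms include the term."""
    return {book_id for book_id, terms in index.items() if term in terms}


def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: books 1 to 3 are what a user searching "bravery" wants.
RELEVANT = {1, 2, 3}

# Summarization: one term per record, the main subject only.
SUMMARIZATION_INDEX = {
    1: {"bravery"},
    2: {"resistance movements"},
    3: {"air force"},
    4: {"concentration camps"},
}

# Depth indexing: main subject plus subtopics.
DEPTH_INDEX = {
    1: {"bravery", "air force"},
    2: {"resistance movements", "bravery"},
    3: {"air force", "bravery"},
    4: {"concentration camps", "bravery"},  # bravery is only a passing subtopic
}

for name, index in [("summarization", SUMMARIZATION_INDEX), ("depth", DEPTH_INDEX)]:
    precision, recall = precision_recall(search(index, "bravery"), RELEVANT)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this invented data, summarization retrieves only book 1 (precision 1.00, recall about 0.33), while depth indexing retrieves all four books (precision 0.75, recall 1.00), which matches the trade-off described above.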

Refer to Appendix D for the thesaurus for this collection.

4.3. Classification scheme

Classification is a system of information organization that enables the proper arrangement of information objects. It is implemented via classification schemes, which place information objects in order and make them logically easier to locate. Classification can follow two approaches: the hierarchical approach and the faceted approach.

The hierarchical approach uses prearrangement into classes and subclasses, where a class is a category of similar objects and subclasses are further divisions of a class. This approach is exhaustive, in that it includes all possible concepts, and rigid, in that modifications are not allowed. The faceted approach requires prior selection of the subject fields that are possible candidates for facets. Here, there is no prearrangement of classes and subclasses; instead, the information object is analyzed first and the notation is coined on the basis of that analysis. The faceted approach allows certain modifications if required, such as the addition of classes in the future.

In the case of this collection, the faceted approach is used. Facets are different types of classes or categories, and they enable better organization of objects. Analysis of the user questions shows that user searching behavior involves knowledge of the Genre and Subject fields, which are subject class candidates, as well as the author name. The Genre field is selected as the primary facet because it has limited terms and so allows for better organization and, as a result, retrieval. Along with these, the Publication Date field is used in the classification notation, followed by a unique identifier; the unique identifier is not a facet. The unique identifier is a number assigned to a single information object, in this case a book; it distinguishes the book from all other information objects and decides its physical shelf location. The unique identifier starts from the first record created and increases by one for every new record. Together, the facets and the unique identifier allow precise identification of the book, and the resulting notation allows for its logical ordering and placement.

The classification scheme for this collection is designed to produce the following kind of code. Consider a book from the collection whose genre is classified as Holocaust (Hol), whose first Tags (for Subject) term is Auschwitz, whose author’s last name is Morris and whose year of publication, from Publication Date, is 2018. Following the notation defined in Appendix E, the classification code is Hol.Aus.Mor.2018/10. The notation consists of the abbreviation of the Genre term as given in the Appendix E table, followed by a period; the first three letters of the first Subject term, with the first letter capitalized, followed by a period; the first three letters of the author’s last name, with the first letter capitalized, followed by a period; the four digits of the year from Publication Date; and finally the unique identifier assigned to the book, separated from the year by a slash as in the example.
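The construction of the classification code can be sketched as a short Python function. This is a minimal illustration, not the notation of Appendix E itself: the function names are hypothetical, the abbreviation table contains only the one Genre abbreviation mentioned in the text (Hol), and the slash before the unique identifier is taken from the worked example Hol.Aus.Mor.2018/10.

```python
# Hypothetical excerpt of the Genre abbreviation table; only "Hol" is known
# from the text, the full table is in Appendix E.
GENRE_ABBREVIATIONS = {"Holocaust": "Hol"}


def first_three(text):
    """First three letters of a term, with only the first letter capitalized."""
    return text.strip()[:3].capitalize()


def classification_code(genre, first_subject, author_last_name, year, unique_id):
    """Join the four facets with periods, then append the unique identifier."""
    facets = [
        GENRE_ABBREVIATIONS[genre],       # primary facet: Genre abbreviation
        first_three(first_subject),       # first Tags (for Subject) term
        first_three(author_last_name),    # author's last name
        f"{year:04d}",                    # four-digit year of publication
    ]
    # Assumption: the slash separator follows the worked example in the text.
    return ".".join(facets) + f"/{unique_id}"


print(classification_code("Holocaust", "Auschwitz", "Morris", 2018, 10))
# prints: Hol.Aus.Mor.2018/10
```

Since the unique identifier increases by one for every new record, two books that share all four facets still receive distinct codes.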