Reading Summary and Analysis Discussion Post: Exploring Information Systems
Introduction
Inherent in the concept of interactive information retrieval is the notion that we interact with some search user interface (SUI) beyond the submission of an initial query. Perhaps the most familiar SUI to many is the streamlined experience provided by Google, but many more exist in online retail, digital archives, within-website (vertical) search, legal records and elsewhere. Amazon, for example, provides a multitude of different features that together make a flexible, interactive and highly suitable gateway between users and products.
The aim of this chapter is to provide a framework for thinking about the elements that make up different SUI designs, taking into account when and where they are typically used.
Search: the way we usually see it
The SUI that many people now see daily is Google, and Figure 8.1 overleaf shows the 14 notable SUI features it provides for users on its search engine results page (SERP). The most common feature searchers expect to see is the query box (#1 in Figure 8.1), which in Google provides a maintained context so that the query can easily be edited or changed without going to the previous page. Searchers are free to enter whatever they like, including special operators that imply specific phrases or make sure certain words are not included. The second most obvious feature is the display of results (#2), which is usually1 ordered by how relevant they are to the search terms. Results typically highlight how they relate to the search terms by showing parts in bold font. Users are typically able to view additional results using the pagination control (#3).
We also see many control and modifier SUI features. Google provides fixed options across the top (#4) and relevant options down the left (#5) for
8 ●●●
Interfaces for information retrieval
Max Wilson
Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.29085/9781856049740.010 Downloaded from https://www.cambridge.org/core. The University of British Columbia Library, on 04 Feb 2022 at 23:34:23, subject to the Cambridge
INTERACTIVE INFORMATION SEEKING, BEHAVIOUR AND RETRIEVAL
specializing the search towards certain types of results. Further, Google allows users to restrict their results (#6), or change how they are shown (#7). It is typical for search engines to provide an advanced search to help define searches more specifically (#8). Finally, most search engines provide recommendations for related queries (#9).
Google also provides extra information, such as an indicator on the number of results found (#10), and information about when you may have made an error (#11). Finally, Google also provides personalizable features that are accessible when signed in (#14), such as settings (#13) and information about your prior searches (#12).
A starting framework for thinking about SUI designs
Broadly, we can break the elements of a SUI, like those discussed in the Google example above, into four main groups:
• input features – which allow the user to express what they are looking for
• control features – which help users to modify or restrict their input
• informational features – which provide results or information about results
• personalizable features – which relate specifically to searchers and their previous interactions.
Figure 8.1 Fourteen notable features in the Google search user interface
These groups are highlighted in zones in Figure 8.2 (input as 1 and 8, control as 4, 5, 6, 7 and 9, informational as 2, 3, 10 and 11, personalizable as 12, 13 and 14), and will be revisited throughout the chapter as other search interfaces provide different features in these groups. Often new SUIs or SUI features innovate in one of these groups. Finally, it is important to note that these groups can overlap. Informational features are often modified by personalizable features, for example, and some features can act as input, control and informational features.
Early search user interfaces
A brief early history
The roots of information retrieval systems are in library and information science. In libraries, books are indexed by a subject-oriented classification scheme and to find books we interact with the physical spaces, signposting, and librarians within them. Yet the study of information retrieval was motivated by the development of computers in the 1960s, which could automatically perform one of the tasks that librarians do: retrieve a document (or book). The interface with computers, however, was with punch cards at first, and then command lines sometime after. Immediately, we can see the kind of support that we wanted to model for users (that of a librarian), but we were so far limited by technology.
WILSON • INTERFACES FOR INFORMATION RETRIEVAL
Figure 8.2 The Google SUI zoned by the different types of feature categories
Conversation and dialogue
Given the user interface limitations, and the influence of librarianship, some of the initial SUIs were modelled around conversations or ‘dialogues’. In analysing, for example, the roles, questions and answers that took place in conversations between visitors and librarians (Winograd and Flores, 1986), early researchers developed question and answer style SUIs. Figure 8.3 shows an early command-line dialogue-style system introduced in the 1970s (Slonim, Maryanski and Fisher, 1978), which tried to help users describe what they were searching for. These SUIs typically asked the searchers for any information they already had about what they wanted, so that when it came to performing the search (which could last a number of minutes or hours even) it was more likely to return the correct result.
This conversational style was analysed for some time, and was also influenced by those interested in artificial intelligence and natural language processing. As technology improved and results were returned faster, the emphasis of the conversational perspective moved towards modelling a continued dialogue over multiple searches within interactive information retrieval. The MERIT system (Belkin et al., 1995), for example, was designed based on a much more flexible, continuing, conversation model.
Browsing
Another early type of system, still using command-line interaction, supported ‘browsing’. Similar to the initial dialogue-based systems, browsing systems like the 1979 BROWSE-NET (Palay and Fox, 1980) in Figure 8.4 (on page 144) presented different modes to scan through databases and provided options for different ways of accessing the documents. Again, we see these browsing style systems appear over the course of interactive information retrieval design, although in 1983 research identified that people ‘browsed’ less on the early online newsgroups. Geller and Lesk (1983) hypothesized that this may have been because people often knew more about what was in a fixed dataset than in the oft-changing web collection we have now. Despite this hypothesis, we later saw the rise of website directories, like the Yahoo! Directory. Directories, while still available, were never as successful as web search engines, perhaps providing evidence for Geller and Lesk’s hypothesis. More recently, we see browsing interfaces appear within individual websites, as discussed further in the discussion of faceted browsing in the section ‘Faceted metadata’.
Form filling
As SUIs became more directly interactive, with the onset of commercially available graphical user interfaces2 in the early 1980s, the common paradigm we see today of ‘form filling’ became more popular. This advanced the conversational response SUIs, which took input over time from a series of questions, by providing
Figure 8.3 An early command-line dialogue-style system (Slonim, Maryanski and Fisher, 1978). Copyright © 1978 ACM, Inc. doi>10.1145/800096.803134. Reprinted by permission.
all the data entry fields spatially. Although ‘form filling’ includes normal keyword searching, this technique allowed systems to present all the fields that could be individually searched in a way that we now commonly call an advanced search. The EUROMATH system, designed by McAlpine and Ingwersen (1989) and shown in Figure 8.5, has a custom form highlighting all the fields that can be searched individually or in combination.
Boolean searching
One advance in the algorithmic technologies was to process Boolean queries, so that we could ask for information about ‘Kings OR Queens’, and get a more comprehensive set about, in this case, Monarchs. This technological advance was made before the majority of SUI developments, as can be seen in Figure 8.5. The advent of GUIs, however, provided an opportunity to help people construct Boolean queries more easily and visually. The STARS system (Anick et al., 1990), shown in Figure 8.6, allowed users to organize their query in a 2D space, where horizontal space represented ‘AND’ joins, and anything aligned vertically represented ‘OR’ joins. Like all these early ideas, Boolean searching is still prevalent in our modern interactive information retrieval SUIs, including Google (see Figure 8.1), where a ‘-’ before a word is equivalent to a Boolean NOT.
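The Boolean joins described above can be sketched in a few lines of Python. This is an illustrative sketch only, not how STARS or any search engine is implemented: the toy documents, the `build_index` and `boolean_search` names, and the idea of passing the query as a list of columns are all assumptions made for the example. Terms inside one column are treated as vertically aligned (OR), the columns are combined left to right (AND), and excluded terms play the role of the ‘-’ operator (NOT).

```python
from collections import defaultdict

# Hypothetical toy collection, invented purely for illustration.
DOCS = {
    1: "kings of england",
    2: "queens of england",
    3: "presidents of france",
    4: "kings and queens of spain",
}

def build_index(docs):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def boolean_search(index, all_ids, columns, excluded=()):
    """Evaluate a query laid out as columns of alternative terms.

    Terms inside one column are OR-ed (set union), the columns are then
    AND-ed (set intersection), and excluded terms act as NOT (set difference).
    """
    result = set(all_ids)
    for column in columns:
        matches = set()
        for term in column:
            matches |= index.get(term, set())
        result &= matches
    for term in excluded:
        result -= index.get(term, set())
    return result

INDEX = build_index(DOCS)
# 'Kings OR Queens': one column holding two alternatives.
monarchs = boolean_search(INDEX, DOCS, [["kings", "queens"]])
# '(Kings OR Queens) AND england': two columns side by side.
english = boolean_search(INDEX, DOCS, [["kings", "queens"], ["england"]])
# '(Kings OR Queens) -spain': the minus sign as a Boolean NOT.
not_spain = boolean_search(INDEX, DOCS, [["kings", "queens"]], excluded=["spain"])
```

Representing the query as nested lists mirrors the spatial layout: moving a term into an existing column broadens the search, while adding a new column narrows it.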
Summary
The initial advances in information retrieval were typically made in technological improvements. Consequently, these SUI advances in the early days related mainly to the input SUI features, with the exception of some advances (like the browsing and form filling) which provided information about the structure of the data, making them also contribute to the informational SUI
Figure 8.4 An early browsing interface for databases (Palay and Fox, 1981). Copyright © 1980 ACM, Inc. Reprinted by permission.
Figure 8.5 The EUROMATH interface (McAlpine and Ingwersen, 1989). Copyright © 1989 ACM, Inc. doi>10.1145/75334.75341. Reprinted by permission.
features. Other informational advances included simple highlighting in a result where it matched the query, as shown in Figure 8.7 where the horizontal bar at the bottom indicated where in a book any search terms appear. The onset of GUIs meant that SUIs became more interactive, with Pejtersen’s fiction browser (Pejtersen, 1989) presenting an explorable-world view of a bookshop, as shown in Figure 8.8 overleaf. Pejtersen’s fiction bookshop allowed users to browse the bookshop using different strategies, where the figures shown are engaging in each strategy. We were not yet, however, engaging in what we now call interactive
Figure 8.6 The STARS system (Anick et al., 1990). Copyright © 1990 ACM, Inc. doi>10.1145/96749.98015. Reprinted by permission.
Figure 8.7 Use of highlighting for terms that match a query (Teskey, 1988). Copyright © 1988 ACM, Inc. doi>10.1145/62437.62481. Reprinted by permission.
information retrieval, where we consider interactive information retrieval to be the ongoing interaction over multiple searches to reach a goal, rather than the single search that is still often considered in information retrieval.
The onset of modern interactive information retrieval SUIs
Modern interactive information retrieval SUIs emerged around the time that we first saw web search engines like AltaVista,3 but before Google was launched. One of the first studies to demonstrate that there were significant and specific benefits to interactive information retrieval, where users actively engage in refining and submitting subsequent queries, was provided by Koenemann and Belkin (1996). Using a query engine that was popular at the time called INQUERY, Koenemann and Belkin built the RU-INQUERY SUI, shown in Figure 8.9(b). Searchers could submit a query in the search box at the top left, and see a scrollable list of results on the right-hand side. The current query was then displayed in the box underneath the search box. The full text of any selected result was displayed beneath the results on the right. The RU-INQUERY interface had hidden, visible, and interactive relevance feedback terms; the interactive terms provided the most effective support for users.
Figure 8.8 Pejtersen’s fiction bookshop (Pejtersen, 1989). Copyright © 1989 ACM, Inc. doi>10.1145/75334.75340. Reprinted by permission.
The experiment was built to leverage relevance feedback (discussed in Chapter 6, ‘Access Models’), which took key terms from the results marked as ‘relevant’ using the check boxes and added them to the search to get more precise results. To demonstrate the benefits of interaction in information retrieval, three alternative versions were developed:
• opaque – provided the typical relevance feedback experience that was common at the time, where terms from the selected relevant documents were added, but there was nothing in the SUI to display what those additional terms were (Figure 8.9(b))
• transparent – provided a similar experience to the opaque version, except that the added terms were made visible in the ‘current query’ box
• penetrable – allowed the users to choose additional terms from the relevant documents; the keywords associated with the relevant documents were listed in a separate box below the ‘current query’ box (Figure 8.9 (a)), and could be added to the current query box manually.
While all three experimental versions provided improved support within a task-based user study, the most interactive penetrable version provided statistically significant improvements and did not significantly increase the time involved in searching. When analysed according to the framework described in the section ‘A starting framework for thinking about SUI designs’ above, this study showed the initial value of having control SUI features that help people modify and manipulate a search.
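The difference between the three conditions can be made concrete with a small sketch. It is not the RU-INQUERY implementation: the term selection here is a plain frequency count, and the `candidate_terms` and `expand` functions, the stopword list and the sample documents are all hypothetical. The point is only to show what each condition sends to the engine compared with what it shows in the ‘current query’ box.

```python
from collections import Counter

# Assumed, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for"}

def candidate_terms(query_terms, relevant_docs, limit=3):
    """Return the most frequent non-query words in documents marked relevant."""
    counts = Counter()
    for text in relevant_docs:
        for word in text.lower().split():
            if word not in STOPWORDS and word not in query_terms:
                counts[word] += 1
    return [word for word, _ in counts.most_common(limit)]

def expand(query_terms, relevant_docs, mode, chosen=None):
    """Return (terms sent to the engine, terms shown in the 'current query' box)."""
    suggestions = candidate_terms(query_terms, relevant_docs)
    if mode == "opaque":
        # Feedback terms are added silently; the searcher sees only their own words.
        return query_terms + suggestions, list(query_terms)
    if mode == "transparent":
        # The same terms are added, and the searcher can see them.
        full = query_terms + suggestions
        return full, full
    if mode == "penetrable":
        # Suggestions are offered; only those the searcher picks are added.
        picked = [term for term in suggestions if term in (chosen or [])]
        full = query_terms + picked
        return full, full
    raise ValueError(f"unknown mode: {mode}")

# Hypothetical documents that a searcher has ticked as relevant.
relevant = ["solar panel efficiency report", "solar cell efficiency trends"]
sent_opaque, shown_opaque = expand(["energy"], relevant, "opaque")
sent_pen, shown_pen = expand(["energy"], relevant, "penetrable", chosen=["solar"])
```

In the penetrable case the searcher decides which suggested terms join the query, which is the kind of control feature the study found most valuable.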
Figure 8.9 The RU-INQUERY interface (Koenemann and Belkin, 1996). Copyright © 1996 ACM, Inc. doi>10.1145/238386.238487. Reprinted by permission.
(a) Penetrable condition
(b) Opaque condition
Modern search user interfaces and features
This section covers many of the more modern advances in SUI designs, and is structured according to the framework described in the section ‘A starting framework for thinking about SUI designs’. It begins by discussing input features, before moving on to control, informational and personalizable features.
Input features
While there have been many technical advances in the processing of user queries and matching them against documents, the plain white search box has remained pleasingly simple. This section begins by examining the design of the search box, before moving on to other input methods.
The search box
The search box pervades SUIs and searchers can feel at a loss when they do not have a small white text field to spill their search terms into. The search box has many advantages:
• Flexibility – It is extremely flexible (assuming the technology behind it is well made), uses the searcher’s language and the searcher can be as generic or specific as they like.
• An informational feature – As well as being primarily used as an input feature, the search box can – and should – be used as an informational feature. When not being used to enter keywords, the search box should be informing the user of what is currently being searched for.
• The auto-complete function – This can help people avoid entering unproductive search queries. By providing information to the user as they query, auto-complete helps make the search box a better informational feature as well as an input feature. Auto-complete can be rich with context, with the Apple website providing images, short descriptions and even prices, as can be seen in Figure 8.10(a). Furthermore, auto-complete can be personalizable, as with Google in Figure 8.10(b), which shows queries the searcher has used before.
• Operators and advanced search – The keyword search box itself has only really had minor visual changes, with some suggesting this may affect the number of words people put in their query. Regardless, studies indicate that searchers submit between two- and three-word queries (Jansen, Spink and Saracevic, 2000; Kamvar et al., 2009), and around 10% of searchers use special operators to block certain words or match explicit phrases. Advanced search boxes, when implemented well, can help guide people towards providing more explicit queries in the search box.
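As a rough illustration of the auto-complete behaviour described in the list above, the following sketch suggests completions for a typed prefix and lists the searcher's own earlier queries first. It is an assumption-laden toy rather than a description of how Google or Apple build their suggestions: the `autocomplete` function, its parameters and the sample queries are all invented for the example, and real systems rank suggestions with far richer signals.

```python
def autocomplete(prefix, popular_queries, history=(), limit=5):
    """Suggest completions for a prefix, listing the searcher's past queries first."""
    prefix = prefix.lower().strip()
    if not prefix:
        return []
    # Personalizable part: queries this searcher has issued before.
    personal = [q for q in history if q.lower().startswith(prefix)]
    # General part: popular queries, skipping any already listed.
    general = [
        q for q in popular_queries
        if q.lower().startswith(prefix) and q not in personal
    ]
    return (personal + general)[:limit]

# Hypothetical data for illustration.
POPULAR = ["queens of the stone age", "queen elizabeth", "quiche recipe"]
HISTORY = ["queen bee facts"]
suggestions = autocomplete("que", POPULAR, HISTORY)
```

Because the suggestions are returned while the searcher is still typing, the same box acts as an input feature and an informational feature at once.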
(a) Apple – shows lots of contextual data (b) Google – prioritizing previous searches
Query by example
There is a range of searching systems that take example results as the input. One example commonly seen in SERPs is a ‘More Like This’ button, which returns pages that are related to a specific page. While these could be seen as control examples, an example demonstrator called Retrievr 4 lets searchers sketch a picture and returns similar pictures. Similarly, services like Shazam5 use recorded audio as a query to find music. Shazam and Retrievr are examples that are explicitly query by example input features, while others can be seen as input and/or control.
Adding metadata
While there have been some variations in how we enter information into a search box, the alternative is typically to present useful and usable metadata to the users as an input feature. The presentation and use of metadata in SUIs, however, can be very hard to delineate in its contribution between input, control, informational and personalizable features. Indeed, well designed use of metadata can serve as a feature in each of these feature types. Presented on the front page of a SUI, categories can, for example, allow the searcher to input their query by browsing. If a searcher can filter their keyword search, or make sub-category choices, then metadata can quickly become a control feature. Further, if results are accompanied by how they are categorized, then metadata
Figure 8.10 Examples of auto-complete
can become an informational feature too; research has shown this to be popular with searchers (Drori and Alon, 2003). Finally, it’s not beyond the realm of possibility to highlight popular or previously used category options to make them personalizable too.
Categories
Websites, including the Yahoo! Directory, often present high-level categories to help users externalize what they are looking for. Several studies (Egan et al., 1989; Dumais, Cutrell and Chen, 2001) have shown that categorizing results in SUIs can help users to find results more quickly and more accurately. One key early system called SuperBook, which automatically created a categorized index over full-text documents, was shown to help people learn, as measured by quality of short open-book essays (Egan et al., 1989). More recently, eBay and Amazon provide searchers with higher level categories so that they can first define what type of object they are looking for before browsing with richer metadata.
Clusters
One challenge for categories, especially for the whole web, is to categorize all the data. Another approach, using clustering algorithms in the backend, is to cluster results by key topics in their content. One early cluster system, called Scatter/Gather, divided results into clusters of similar topics to highlight the range of topics covered in a SERP. Evaluation of the Scatter/Gather approach showed that searchers were easily and quickly able to identify groups of more relevant documents compared with a standard SERP (Hearst and Pedersen, 1996).
A more recent system, Clusty 6 (Figure 8.11), embodies a clustering method that creates automatic hierarchical clusters based on the results that are returned, but is primarily used as a control feature. Despite some studies showing evidence that clusters help searchers to search (e.g. Turetken and Sharda, 2005), research has suggested that well-designed, carefully planned metadata is better for SUIs than automatically generated annotations (Hearst, 2006a).
Faceted metadata
It has been popular to categorize results in multiple different ways, so that searchers can express several constraints. Research has shown that, compared with keyword search, faceted systems can improve search experiences in more open-ended or subjective tasks (where no single right answer is available)
(Stoica and Hearst, 2004). The popular Epicurious website, for example, allows users to describe recipes that they would like by several types of categories (called facets), including cuisine, course, ingredient and preparation method. While the first selection in a facet acts as an input, subsequent selections in facets act as refinements, and can thus be considered as control.
The Flamenco interface 7 (Yee et al., 2003), shown in Figure 8.12, provides
Figure 8.11 The Clusty system
Figure 8.12 The Flamenco interface
several different categories (called facets), which can be used in combination to define a query. It was used to demonstrate the value of faceted browsing and represents the standard faceted SUI design. Many variations have been designed since. Typically, a range of hierarchical or linear facets are provided, and users can make selections in one or more of them. In Flamenco, used facets are removed from view, so that remaining facets can receive more attention, and selections are placed in a breadcrumb list of choices. Removing used facets provides an effective approach for quickly narrowing results. Other systems like mSpace8 (schraefel et al., 2006) leave facets in place to encourage exploration by quickly changing and comparing decisions. mSpace provides an advanced faceted SUI where the order of facets implies importance and gaps from left to right are highlighted. Figure 8.13 shows that the two clips in the far right column are from 1975 and 1974, which would not normally be conveyed in faceted SUIs. mSpace (and iTunes) facets are only filtered in a left to right direction, and highlights have been shown to help searchers learn and discover related items in the remaining unused facets (Wilson, André and schraefel, 2010). Other systems, including mSpace and eBay, permit multiple selections within single facets, so, for example, searchers can see results that relate to two price brackets.
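A minimal sketch of faceted filtering may help to show how selections combine. It assumes a common convention, which the systems named above may or may not follow exactly: values chosen within one facet are alternatives (OR), while different facets constrain one another (AND). The recipe records and the `apply_facets` and `facet_counts` functions are hypothetical.

```python
# Hypothetical recipe records with two facets, invented for illustration.
RECIPES = [
    {"title": "Pad Thai", "cuisine": "Thai", "course": "Main"},
    {"title": "Green Curry", "cuisine": "Thai", "course": "Main"},
    {"title": "Mango Sticky Rice", "cuisine": "Thai", "course": "Dessert"},
    {"title": "Tiramisu", "cuisine": "Italian", "course": "Dessert"},
]

def apply_facets(items, selections):
    """Keep items that match every facet with at least one selected value.

    Values within one facet are alternatives (OR); facets combine with AND.
    """
    return [
        item for item in items
        if all(
            item.get(facet) in values
            for facet, values in selections.items()
            if values
        )
    ]

def facet_counts(items, facet):
    """Count the remaining items per value, as often shown beside facet options."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts

# The first selection acts as input; later selections refine (control).
thai = apply_facets(RECIPES, {"cuisine": {"Thai"}})
remaining_courses = facet_counts(thai, "course")
desserts = apply_facets(
    RECIPES, {"cuisine": {"Thai", "Italian"}, "course": {"Dessert"}}
)
```

Recomputing the counts after each selection is one way an interface can show which options remain in the unused facets.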
Faceted categories are typically used within fixed collections of results, such as within one website (typically called vertical search), as there must be common attributes across all the data to categorize them effectively. Although researchers have tried to apply facets to general web search (Kules, Kustanowitz and Shneiderman, 2006), Google does not typically provide faceted search, except in Google Shopping.9 In the narrower space of searching for products, there are common factors like price and shop that can be easily applied to all of the results.
Figure 8.13 The mSpace interface
More detailed literature can be read about the design of faceted metadata interactive information retrieval systems (Hearst, 2006b; Tunkelang, 2009).
Social metadata
The rise of social media websites has led to the popular use of socially generated metadata, such as tags. Tag clouds are now familiar in many SUIs, with services like Flickr allowing you to explore by popular tags.10 Further, the research prototype called MrTaggy11 (Figure 8.14) allows users to search the web using different types of tags, separating out adjectives and nouns, collected from Delicious. Studies of MrTaggy suggest that searchers can explore and learn more when tag clouds are available (Kammerer et al., 2009). Research has also shown that building tags into the display of SERPs can help users to (control or) change their search at later stages (Gwizdka, 2010). Other social metadata can be used, but is often informational, such as what other searchers have found or how results have been rated.
Control features
Control features can be considered particularly important from an interactive information retrieval perspective, as they facilitate and control the ways in which the search continues and thus make it interactive. Very recent research has emphasized the importance of good control features, showing that the right support can significantly enhance an interactive information retrieval task, but the wrong support can distract or slow down searchers (Diriye, Blandford and Tombros, 2009).
Figure 8.14 The MrTaggy prototype
Interactive query changes
One of the key ways that we continue to interact with a query is to alter and refine a search, known as interactive query expansion (IQE). IQE is discussed more extensively in Chapter 9, ‘Interactive Techniques’, but the aim is to suggest additional, or replacement, words to the searcher that might help the system return more precise results. If, for example, a user searches for ‘Queens of the stone age’, Google (in Figure 8.15(b)) returns a series of extensions to that query that might further define what the searcher is looking for. IQEs may vary in location and type. Continuing their philosophy on facets, Google presents IQEs at the bottom of the page, assuming that the user will only wish to refine their query if they don’t find what they want in the
Figure 8.15 Interactive query suggestions (refinements and alternatives)
(a) Bing provides IQEs for every search on the left
(b) Google provides IQEs for some searches at the bottom of the page
(c) Amazon provides fewer IQEs above the search results
first ten results. Amazon provides fewer IQEs, positioned above the search results (Figure 8.15(c)). Bing12 defines itself as a ‘discovery engine’ and provides a collection of refinements (to narrow) and alternative searches (to change focus) permanently on the left side of the screen, as shown in Figure 8.15 (a).
Corrections
One good rule for interactive information retrieval is to never let the user reach dead-ends where the interaction has to stop, and so another common form of control feature helps correct and notify users when they are heading down a bad path. In making two errors and searching for ‘Quens of the Stonage’, for example, services often suggest corrections or auto-correct their search.
When they are confident that the corrected version is more likely, Bing (shown in Figure 8.16(a)) and Google often auto-correct errors, but provide a link to results for the exact query. Amazon (shown in Figure 8.16(b)) states clearly that no results were found for the query as typed, provides the correction, and provides a link and example results for the corrected spelling.
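A simple way to see how a correction might be offered is to compare each query word with a list of known words. The sketch below uses `difflib.get_close_matches` from the Python standard library; the vocabulary, the `suggest_correction` function and the similarity cutoff are assumptions for illustration, and commercial search engines rely on much richer evidence such as query logs. Note that this word-by-word approach cannot split a run-together word such as ‘Stonage’ into two words.

```python
import difflib

# Hypothetical vocabulary of words known to the collection.
VOCABULARY = ["queens", "of", "the", "stone", "age", "kings"]

def suggest_correction(query, vocabulary, cutoff=0.6):
    """Return a corrected query, or None when no word needed changing."""
    corrected = []
    changed = False
    for word in query.lower().split():
        if word in vocabulary:
            corrected.append(word)
            continue
        # Pick the single closest known word above the similarity cutoff.
        matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            # Nothing close enough: keep the searcher's word untouched.
            corrected.append(word)
    return " ".join(corrected) if changed else None

suggestion = suggest_correction("Quens of the stone age", VOCABULARY)
```

Returning `None` when nothing changed lets the interface decide whether to show a ‘did you mean’ prompt at all, while the original query remains available so that the searcher is never left at a dead end.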
Sorting
One method of helping searchers stay in control of what they are looking for is to allow them to decide how they want results to be ordered. Web search engines typically order results by how relevant they are to a query. In online retail, for example, it is common to be able to reorder results by price, either from most to least expensive or vice versa. Figure 8.17 (a)–(c) shows a
Figure 8.16 Correcting errors in user queries to avoid dead ends
(a) Correcting in Bing
(b) Correcting in Amazon
Figure 8.17 Features for sorting search results
(a) Sorting options in Amazon (b) Sorting options in Walmart
(d) Tabular sorting in Scan.co.uk
(c) Sorting options in Yahoo
Core terms of use, available at https://www.cambridge.org/core/terms. https://doi.org/10.29085/9781856049740.010 Downloaded from https://www.cambridge.org/core. The University of British Columbia Library, on 04 Feb 2022 at 23:34:23, subject to the Cambridge
range of ways in which products can be reordered. Scan.co.uk (Figure 8.17 (d)) takes an approach that is more commonly seen in tabular views, where any column in the result set can be used to order results. Choosing a column orders the results by the metadata in that column. Rechoosing a column typically reverse- orders them.
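The tabular sorting behaviour, where choosing a column orders the results and rechoosing it reverses the order, might be modelled as follows. The class name `ResultTable` and the method `click_header` are hypothetical, and the rows are invented sample data.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResultTable:
    rows: list                          # each row maps a column name to a value
    sort_column: Optional[str] = None   # column currently used for ordering
    descending: bool = False

    def click_header(self, column):
        """Sort by `column`; clicking the same column again reverses the order."""
        if column == self.sort_column:
            self.descending = not self.descending
        else:
            self.sort_column, self.descending = column, False
        self.rows.sort(key=lambda row: row[column], reverse=self.descending)
        return self.rows

table = ResultTable([
    {"name": "Monitor", "price": 30},
    {"name": "Cable", "price": 10},
    {"name": "Keyboard", "price": 20},
])
table.click_header("price")   # cheapest first
table.click_header("price")   # most expensive first
```

Keeping the current column and direction as state is what allows a second click on the same header to be interpreted as a reversal.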
Filters
Although not too dissimilar from using metadata-based input methods as a control feature, as discussed above, SUIs often provide ways of filtering results. Aside from entering special sub-domains (images, shopping and so on), Google allows searchers to filter results to recent results, results with pictures in them, previously visited results, new results, and many more. These types of filters are single web links that can be selected one at a time. It is becoming increasingly popular to have dynamic filters like sliders and checkboxes (for multiple selections). Originally studied in the early 1990s (Ahlberg and Shneiderman, 1994), dynamic query filters that immediately affect the display of results, as shown in Figure 8.18, have been shown to help searchers express their needs more quickly and effectively. We now see examples of responsive dynamic filters, and other interactive features, on many modern systems, including Globrix13 and Volkswagen.14
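As a rough sketch of the logic behind dynamic filters, the function below combines a slider (a maximum price) with checkboxes (a set of brands). An interface would call it again each time a control changes, so that the display updates immediately. The function name, field names and sample data are assumptions made for illustration.

```python
def apply_filters(results, max_price=None, brands=None):
    """Return the results that satisfy every active filter control."""
    selected = set(brands or [])  # an empty selection means 'no brand filter'
    return [
        r for r in results
        if (max_price is None or r["price"] <= max_price)
        and (not selected or r["brand"] in selected)
    ]

# Hypothetical result set; the values are invented.
RESULTS = [
    {"name": "Golf", "brand": "Volkswagen", "price": 18000},
    {"name": "Polo", "brand": "Volkswagen", "price": 12000},
    {"name": "Fiesta", "brand": "Ford", "price": 11000},
]

# Slider moved to 15,000 and the 'Volkswagen' checkbox ticked.
visible = apply_filters(RESULTS, max_price=15000, brands=["Volkswagen"])
```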
Figure 8.18 Dynamic query filters (Ahlberg and Shneiderman, 1994). Copyright © 1994 ACM, Inc. doi>10.1145/191666.191775. Reprinted by permission.
Grouping
Another approach to reordering results is to group the results by their type. Flamenco (Figure 8.12) can group results by any facet of metadata. Similarly, the iTunes store organizes results by whether they are matched by artist, album, or song title and so on. Searchers are then able to focus in on the results that most closely match their intentions. The notions of grouping results and tabular views, as a form of control feature discussed above, begin to branch into how results are presented and so can also be considered as informational features.
Informational features
Although it can appear fairly simple, the way we present individual results in a SERP has been very well researched, but not all of the ideas have stuck. Web search results, as shown in Figure 8.19, typically include the title of the result, a snippet of text from the result, and the URL for the result.
Figure 8.19 A typical example of a single result in a Google results page
• Text snippets – These have been well studied. Informally, two lines are typically chosen as the optimal balance of communicating useful information and including as many results as possible above the first-scroll point. More formal research (White, Jose and Ruthven, 2003) has shown that snippets are most effective when they include the search terms, which are usually also highlighted. The size of snippets rarely varies, with early research showing that lengthening them had a significant impact on their benefit (Paek, Dumais and Logan, 2004). Some systems provide a ‘more’ button to extend a snippet, while Google provides an option to change the default length for all snippets.
• Usable information and deep links – These provide users with shortcuts to make their search more efficient. It is common, for example, for online retailers to let searchers buy products directly from the SERP, or to jump directly to parts of individual pages, such as reviews. Such shortcuts are typically called deep links, and can be seen clearly on Google (Figure 8.20). They help searchers jump directly to key pages within more popular websites.
• Images and thumbnails15 – These have been used at various times by most interactive information retrieval systems, but their benefit remains inconclusive. While critical for image search, and beneficial for visual content like products, they are rarely used in web search. One conclusion is that they aid revisitation and refinding (Teevan et al., 2009), but other research suggests that users can make accurate snap judgements about the quality and professionalism of websites (Zheng et al., 2009). When thumbnails are used, research recommends that they be approximately 200 pixels wide (Kaasten, Greenberg and Edwards, 2002). Further, the principle of thumbnail previews can stretch to other multimedia (schraefel et al., 2006).
• Immediate feedback – Another strong principle of SUI design is to provide results as soon as possible, as the user may see their result and never need to refine it. While this typically means returning results and control features together after the first search, Google Instant16 takes this principle to the extreme by providing results after the very first character is entered into the search box.
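To illustrate the point made above about text snippets, the sketch below picks the window of words that contains the most search terms and marks those terms for highlighting. It is a simplified assumption of how a snippet might be generated, not the method used by any particular search engine; the function name `make_snippet`, the `width` parameter and the `**` markers standing in for bold font are all illustrative.

```python
import re

def make_snippet(text, query, width=12):
    """Return the `width`-word window with the most query terms, marking each hit."""
    terms = {term.lower() for term in query.split()}
    words = text.split()

    def is_hit(word):
        # Strip punctuation before comparing against the query terms.
        return re.sub(r"\W+", "", word).lower() in terms

    best_start, best_hits = 0, -1
    for start in range(max(1, len(words) - width + 1)):
        hits = sum(is_hit(w) for w in words[start:start + width])
        if hits > best_hits:  # ties keep the earliest window
            best_start, best_hits = start, hits
    window = words[best_start:best_start + width]
    return " ".join(f"**{w}**" if is_hit(w) else w for w in window)

print(make_snippet("the cat sat on the mat", "cat mat", width=3))
```

A fixed `width` reflects the observation that snippet size rarely varies; a ‘more’ button would simply call the function again with a larger value.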
Visualizing relevance
Search systems rarely display explicit relevance values, as everyday searchers will not know what they mean or how they are calculated, but interactive information retrieval systems can communicate relevance in other ways. When sorting by specific criteria, such as average rating or number of downloads, we can clearly demonstrate these meaningful values in the way we display results.
We frequently see examples of bar-chart style indicators to communicate the relevance of a result (Veerasamy and Belkin, 1996). Figure 8.21 shows examples of bar-chart style relevance indicators found in Apple’s email software and PDF reader. Such relevance judgements can be particularly helpful when results are sorted by a different dimension, as in Figure 8.21(a).
Figure 8.20 Deep links in Google
One early example of abstracted relevance, called TileBars (Hearst, 1995) (Figure 8.22), tried to provide detailed information about the relevance of documents, and the parts within the document. Each row of a TileBar represents one element of the query (three terms in this case). The length of a TileBar, instead of representing an overall relevance like a bar chart, represents the length of a document. Each horizontally aligned square represents a section of the document. The colour intensity of each square indicates the relevance of that section of the document to that term in the query. Consequently, users get a sense of the size of the document, and the amount of it that is relevant to the different elements of the search criteria. Although we do not frequently see TileBars, as originally designed, in search systems, we see examples of the TileBar principles in more modern systems. The experimental HotMap (Hoeber and Yang, 2009) interface, shown in Figure 8.23, provides a rotated TileBar-style view over the results of a search. Each darker square indicates the relevance of a result to one individual term in the keyword search.
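The TileBar layout can be approximated with a small grid computation: one row per query term, one column per document section, and a value per cell standing in for colour intensity. The sketch below uses raw term counts, scaled to the range 0 to 1, as a stand-in for relevance. That scoring is an assumption made for illustration and is not the measure used in the original TileBars work; the function names and the text rendering are likewise hypothetical.

```python
import re

def tilebar(sections, query_terms):
    """Rows are query terms, columns are document sections; values lie in [0, 1]."""
    counts = [
        [re.findall(r"\w+", section.lower()).count(term.lower()) for section in sections]
        for term in query_terms
    ]
    peak = max((c for row in counts for c in row), default=0)
    if peak == 0:
        return [[0.0 for _ in row] for row in counts]
    return [[c / peak for c in row] for row in counts]

def render(grid, shades=" .:#"):
    """Draw each row as text, with darker characters for higher intensity."""
    top = len(shades) - 1
    return ["".join(shades[round(value * top)] for value in row) for row in grid]

sections = ["Cats and dogs.", "Dogs, dogs, dogs!", "Birds only."]
for term, row in zip(["dogs", "cats"], render(tilebar(sections, ["dogs", "cats"]))):
    print(f"{term:>5} |{row}|")
```

The number of columns grows with the number of sections, which is how the length of the bar conveys the length of the document.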
The InfoCrystal system (Spoerri, 1993) (Figure 8.24) provided a unique visualization for conveying relevance to different search terms. InfoCrystal creates a shape with as many corners as search terms. Additional smaller shapes, placed within that shape, indicate how many results are relevant to combinations of these terms, where the number of sides and their colours indicate which search terms the documents are relevant to. Using this system, which can be hierarchically nested, searchers can determine the set of results that are relevant to different combinations of search terms.
Figure 8.21 Examples of indicating relevance with a bar chart in Apple OS X
(a) Searching for email in Apple Mail
(b) Searching for terms in a PDF using Apple Preview
Figure 8.22 TileBars (Hearst, 1995). Copyright © 1995 ACM, Inc. doi>10.1145/223904.223912. Reprinted by permission.
Figure 8.23 The HotMap interface (Hoeber and Yang, 2009). Copyright © 2009 Wiley. Reprinted by permission.
2D displays of results
Branching into the realms of information visualization, many systems have explored the use of horizontal and vertical dimensions to present results. Most commonly we see search results displayed as a grid (e.g. image search), where we expect (at least in the western world) relevance to go left-to-right across rows, and to repeat for all subsequent rows. This does not so much use two dimensions to present results as present one dimension of relevance in a different layout.
Loosely organized 2D spaces
Embodying the idea of clustering, many 2D displays have been used to present results spatially according to how relevant they are to each other. ‘Self-organizing maps’ (SOMs) (Kohonen, 1990), for example, cluster results around key terms, as shown in Figure 8.25. The layout is determined by the content of the documents, and the positioning tries to find distances that best reflect the relevance scores between documents. SOMs are based on neural network algorithms, and they can create a layout that looks like a surface with mounds where there are many similar results. Colour intensity, as with TileBars, is often used to create this effect. SOMs can visualize large numbers of results; the webSOM in Figure 8.25 shows 1 million documents. The webSOM organizes newsgroups by how they relate to each other. Searchers can zoom in on areas of the map and select individual documents that are represented as nodes.
Figure 8.24 Spoerri’s (1993) InfoCrystal interface. Copyright © 1993 IEEE. Reprinted by permission.
Similarly, there are several variations of graph-based visualizations, where the emphasis in the visualization is on the connections between clusters, rather than the clusters themselves. The ClusterMap visualization shows the cluster connections by highlighting the topics in the documents, and documents that share multiple topics. Figure 8.26 shows a ClusterMap generated by Aduna software.17 The graph shows results that are related to four categories, where 24 documents, for example, are about both Microsoft Word Documents and RDF.18
There are many alternatives to these examples but one limitation they all share is that it can be hard to know where the results are. They are ordered by how they relate to certain topics and each other, and this order may not be clear to the person searching. Similarly, research into the layout of tag clouds showed that predictable orders, such as alphabetical order, were important for searchers. One alternative is to let users control and manipulate these 2D spaces.
Figure 8.25 The webSOM interface
(a) Overview level
(b) First zoom level
(c) Second zoom level
Searcher organized 2D spaces
In order to make 2D spaces more meaningful for users, research has investigated search methods that are similar to the way we can view folders in the Windows and Mac OS X operating systems. Research into the TopicShop SUI (Amento et al., 2000), shown in Figure 8.27, found that users were better able to find useful websites when they were able to manipulate documents in a 2D space. Further, users were able to use spatial memory to remember where they had left documents. Giving users control over such spaces allows them to make the spaces personally meaningful.
Structured 2D spaces
When results are classified by categories and facets, these metadata can be used to give a 2D space a structure. The TreeMap visualization (Shneiderman, 1992), shown in Figure 8.28, shows results based on the hierarchy of categorization. This TreeMap shows 850 files in a four-level hierarchy, where colours represent different file types. Each level of the hierarchy is broken down by alternating the divisions between columns and rows. The 2D space is divided among the top-level categories, in proportion to the volume of results they include. Each of these top-level groups is, in turn, divided among its sub-categories, where the size of those is determined by the number of results they include. Where the space is first divided into columns, each column is then divided into rows. The process continues recursively. This process provides a top-down view of a hierarchy, and the results within it. Colour can often be used to highlight results within this layout, according to a different dimension, such as price. Although circular versions of TreeMaps have been explored, they are generally considered to be less efficient and less clear about volume than the original design.19
Figure 8.26 The ClusterMap interface
Figure 8.27 The TopicShop interface (Amento et al., 2000). Copyright © 2000 ACM, Inc. doi>10.1145/354401.354771. Reprinted by permission.
Figure 8.28 The TreeMap interface (Shneiderman, 1992). Copyright © 1992 ACM Inc. doi>10.1145/102377.115768. Reprinted by permission.
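The recursive division used by TreeMaps, columns at one level and rows at the next, with each area proportional to the number of results it contains, can be sketched as below. The tree structure (dictionaries with `name`, `children` and `size` keys) and the function names are assumptions made for this illustration.

```python
def count_results(node):
    """Number of results under a node; a leaf counts its own `size` (default 1)."""
    children = node.get("children")
    if not children:
        return node.get("size", 1)
    return sum(count_results(child) for child in children)

def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Return (name, x, y, width, height) rectangles for every leaf."""
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = count_results(node)
    offset = 0.0  # fraction of the parent already allocated
    for child in children:
        share = count_results(child) / total
        if depth % 2 == 0:   # even levels: divide into columns
            slice_and_dice(child, x + offset * w, y, share * w, h, depth + 1, out)
        else:                # odd levels: divide into rows
            slice_and_dice(child, x, y + offset * h, w, share * h, depth + 1, out)
        offset += share
    return out

tree = {"name": "root", "children": [
    {"name": "A", "children": [{"name": "a1", "size": 1}, {"name": "a2", "size": 1}]},
    {"name": "B", "size": 2},
]}
for rect in slice_and_dice(tree, 0, 0, 100, 100):
    print(rect)
```

Here category A and category B each hold half of the results, so each receives one column of equal width, and A's column is then split into two rows.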
While TreeMaps provide a more structured layout, it can still be hard to visually search through the results. To provide an even clearer view of a 2D space, some visualizations allocate specific facets of results to each of the two dimensions. If one dimension shows price, and another shows quality, then users can easily identify the cheapest results that match a certain quality. The GRIDL system (Shneiderman et al., 2000) (Figure 8.29) is an example that breaks each dimension into groups, and lists results in the marked-out space that match. Each dimension is browsable and/or filterable, so users can explore the data at different granularities. The visualization within each square can be changed. Similarly, the Envision system (Nowell et al., 1996) allowed users to set the horizontal axis, vertical axis, icon, shape, colour coding and labels to represent one of many different dimensions in the data. Timelines and maps also use dimensions that are common and meaningful to searchers.
Figure 8.29 The GRIDL interface (Shneiderman et al., 2000). Copyright © 1992 ACM Inc. doi>10.1145/336597.336637. Reprinted by permission.
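A grid of this kind, with one facet per axis and each axis broken into groups, could be computed roughly as follows. This is an illustrative sketch in the spirit of the price and quality example above rather than a description of how GRIDL itself works; the function name `grid_layout`, the bin boundaries and the sample products are all assumptions.

```python
from collections import defaultdict

def grid_layout(results, x_facet, y_facet, x_bins, y_bins):
    """Place each result in the (row, column) cell whose ranges contain its values."""
    def bin_of(value, bins):
        for label, low, high in bins:   # each bin covers low <= value < high
            if low <= value < high:
                return label
        return None                     # value falls outside every bin

    cells = defaultdict(list)
    for result in results:
        column = bin_of(result[x_facet], x_bins)
        row = bin_of(result[y_facet], y_bins)
        if column is not None and row is not None:
            cells[(row, column)].append(result["name"])
    return dict(cells)

# Hypothetical products and bins.
PRODUCTS = [
    {"name": "Basic", "price": 40, "quality": 2},
    {"name": "Bargain", "price": 45, "quality": 4},
    {"name": "Premium", "price": 120, "quality": 5},
]
PRICE_BINS = [("cheap", 0, 50), ("expensive", 50, 1000)]
QUALITY_BINS = [("low", 0, 3), ("high", 3, 6)]

cells = grid_layout(PRODUCTS, "price", "quality", PRICE_BINS, QUALITY_BINS)
print(cells.get(("high", "cheap")))  # the cheapest results of high quality
```

Changing the bin boundaries corresponds to exploring the data at a different granularity.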
3D displays of results
Naturally, in line with the progression in technology, many visualizations have been extended into 3D versions in order to increase the number of results that can be displayed. 3D perspectives have been added to SOMs, and the DataMountain (G. Robertson et al., 1998), shown in Figure 8.30, adds a third dimension to the user-controlled spatial layouts. The BumpTop software,20 now acquired by Google, provides a 3D layout and mock-physics to allow users to organize files on the desktop. Further, the exploration of a hierarchy was extended in a 3D space in a design called ConeTrees (G. Robertson, Mackinlay and Card, 1991), as shown in Figure 8.31. ConeTrees were later used in the Cat-a-Cone prototype to enrich the 3D display with multiple simultaneous categorizations, where any found results would be highlighted in multiple places across the full context of the hierarchy (Hearst and Karadi, 1997). There have also been examples of stars-in-space style visualizations, which users can explore in 3D.
While many of these ideas are exciting, they have rarely been shown to provide significant improvements to searchers (e.g. Cockburn and McKenzie, 2000). Essentially, the overhead of manipulating and navigating through a 3D space currently outweighs the benefits of adding the third dimension. 3D visualizations also typically involve overlap and occlusion. Further, some research has shown that around 25% of the population finds it hard to perceive 3D graphics on a 2D display (Modjeska, 2000). Finally, the technology to deliver 3D environments on the web is still somewhat limited. Research, however, continues into 3D environments and 3D controls, and so it is not beyond the realm of possibility that we will see more common use of 3D result spaces in future systems.
Figure 8.30 The DataMountain interface (G. Robertson et al., 1998). Copyright © 1992 ACM Inc. doi>10.1145/288392.288596. Reprinted by permission.
Figure 8.31 The ConeTrees interface (G. Robertson, Mackinlay and Card, 1991). Copyright © 1991 ACM Inc. doi>10.1145/108844.108883. Reprinted by permission.
Additional informational controls
So far this section on informational features has mainly discussed the way in which we can display results. There are many other secondary informational features that can convey additional information, guide and support searchers in a SUI:
• Guiding numbers – These can help searchers maintain awareness of how they are narrowing down results; the pagination and number of results in Google convey whether the user is going in the right or wrong direction. Similarly, numbers can inform searchers as to how many results are in a category, and thus help them make better browsing decisions.
• Zero-click information – Web search engines provide zero-click information, from converting currency to displaying local movie times, which guides searchers and can help them be more efficient.
• Signposting – Signposting such as breadcrumbs (e.g. Flamenco in Figure 8.12) makes the status of a current search clear to users. Further, interactive breadcrumbs allow users to control their search by jumping back to previous searches.
• Simple animations21 – These can help guide users towards change (Baudisch et al., 2006). When hovering over a new notification in Facebook, for example, its colour highlights and fades to show that it has been addressed. Animation is especially important in conveying change in 3D SUIs (G. Robertson et al., 2002).
Personalizable features
Personalizable features of SUIs tailor the search experience to the searcher, either by their actions or by those of other searchers who are personally related by some social network. Although these features have so far typically affected the content and the display of informational and control features, there are many specific features designed for personalization. For example, shopping carts provide searchers with a space to collect search results proactively, while many systems automatically track and display recent searches.
Much more could be said about individual and social personalization, but these two topics are covered in much more detail in Chapter 10, ‘Web Retrieval, Ranking and Personalization’, and Chapter 11, ‘Recommendation, Collaboration and Social Search’. It is important to remember that such influences can pervasively manifest themselves in SUIs as dedicated functionalities and within existing features.
Summary
This chapter has focused on describing SUIs and SUI features. It began by breaking down the search features involved in the current Google SUI, and categorizing the features into four groups: input, control, informational and personalizable features. It then described examples of the four categories of SUI features. Input features help searchers describe what they are searching for. Beyond searching with keywords on the web, most approaches strive to provide metadata to searchers that can be recognized and selected. Control features help us to manipulate our searches, either by refining what we are looking for, or by filtering the results to be more specific. Informational features provide results to searchers in many different forms, but can also help the searcher be aware of what they have done and can do next. Finally, personalizable features are designed to impact the preceding types of features so they mean more to the individual searcher.
The aim of this chapter has been to introduce SUIs and SUI features, and discuss how they are designed. The goal was to provide a framework for thinking about designs. Readers should think about the users first, and then about the technology that will help deliver the right interface. When assessing or evaluating existing SUIs, readers should think about the different features and the one or more categories they fall into. When designing new SUIs, readers should think about the range of support they are providing with the combination of features being included. Further, readers should consider whether features can be extended or improved so that they fall into more than one category of feature. More detailed surveys of SUIs are available in dedicated books (e.g. Hearst, 2009; Morville and Callender, 2010; M. L. Wilson et al., 2010). The next chapter continues to help us think about interactive information retrieval systems by focusing more on the theory and models we have about searchers and searching behaviours.
Notes
1 Alternatives to ranking results by relevance are discussed later in this chapter.
2 Graphical user interfaces are typically defined as involving icons and windows and as allowing mouse-based input, which differentiated them from earlier systems that involved command lines or keyboard-based navigation of text user interfaces.
3 See www.altavista.com.
4 See http://labs.systemone.at/retrievr/.
5 See www.shazam.com/.
6 See www.clusty.com.
7 See http://flamenco.berkeley.edu/.
8 See http://mspace.fm/.
9 See www.google.com/products.
10 See www.flickr.com/photos/tags/.
11 See http://mrtaggy.com/.
12 See www.bing.com.
13 See www.globrix.com/.
14 See www.volkswagen.co.uk/new/.
15 Thumbnails are small screenshots of the website, as in a picture of the website, rather than a picture from it.
16 See www.google.com/instant/.
17 See www.aduna-software.com/.
18 RDF is a data markup language for the Semantic Web that adds relationships to XML.
19 See http://lip.sourceforge.net/ctreemap.html.
20 See www.bumptop.com.
21 Animation should be used carefully and purposefully in SUI design.