Increasingly, ‘on web’ means available via Google, says Lorcan Dempsey. Can we combine the best of all search possibilities to cater for the variety of user needs?
Hardly a day goes by without another arrangement between an information provider and Google or Yahoo to expose its collections for search on the web. Everybody wants to be ‘on web’. Google and Yahoo, in turn, are eager to find as many ways as possible of connecting their users to valuable material currently hidden in ‘off-web’ database silos. Most of the digital resources that libraries manage are currently off web: they do not offer themselves up to the Google user. Increasingly, ‘on web’ means available in Google.
Since the coming of the web, we can talk about three stages of library search. To explain this, think of the three components of a search system – user interface, search engine and data. Then let us think of how we can combine these in different ways.
Stage 1: Monolithic search systems: the search operator provides data and search engine.
This is the norm. We provide a particular collection of indexed data for searching within a particular search engine – library catalogues, A&I databases, e-print repositories, and so on. The pattern is that we go to a user interface and do a search. Independent development of many such systems has some consequences which are not helpful in a densely connected network environment. For example, there is a tendency for any one database provider to imagine that they are the sole focus of a user’s attention, and to design accordingly. They imagine that any prospective user will match, with their own time and attention, the time and attention that went into designing those special features and that online help system: in fact, everything that makes this user interface different from other user interfaces.
However, this focus on individual features and functionality means that users have to switch mental models between systems, have to refamiliarise themselves with features on each visit, or learn a new system if they wish to interact with a new resource. How well does this match the search patterns of potential web users?
Such differences do not fit very well into the overall fabric of the web. A focus on features rather than overall experience increases search costs: time and attention are scarce, and a resource that is difficult to use will not release its full value. Of course, the benefits of using a resource will often outweigh the search costs. We may all be expert users of those few databases that are important to our work or learning, or we may be prepared to take time to interact with a particularly valuable resource.
Stage 2: Metasearch: the search operator provides data and search, the metasearch operator provides a user interface which interacts with the search.
The fragmentation caused by monolithic search systems has led to ‘metasearch’, where an additional layer is added which aims to hide database boundaries and reduce search costs.
This approach may also reduce functionality, levelling out what can be done across databases. There is also an increase in operator costs. The cost to the metasearch operator is currently high in terms of configuration, maintenance and systems. There is a cost to the search operator in providing an appropriate machine level interface, whether it is Z39.50 or something else. I discussed these issues in the last issue of Update.1
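The metasearch pattern can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions rather than a description of any real product: the two in-memory 'databases', the function names and the `Hit` type are all hypothetical, and plain function calls stand in for a machine-level interface such as Z39.50. Note that the metasearch layer can only offer what every target supports, here a simple keyword match, which is the levelling-out of functionality described above.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Hit:
    source: str  # name of the database the record came from
    title: str


# Two hypothetical monolithic systems, each with its own data and search function.
CATALOGUE = ["Library portals in practice", "Cataloguing rules"]
EPRINTS = ["Portals and metasearch", "Open access repositories"]


def search_catalogue(term: str) -> List[str]:
    """Keyword search over the (hypothetical) catalogue."""
    return [t for t in CATALOGUE if term.lower() in t.lower()]


def search_eprints(term: str) -> List[str]:
    """Keyword search over the (hypothetical) e-print repository."""
    return [t for t in EPRINTS if term.lower() in t.lower()]


def metasearch(term: str, targets: Dict[str, Callable[[str], List[str]]]) -> List[Hit]:
    """Send one query to every target and merge the results into one list."""
    hits: List[Hit] = []
    for name, search in targets.items():
        hits.extend(Hit(name, title) for title in search(term))
    # Present a single merged list, sorted by title, hiding database boundaries.
    return sorted(hits, key=lambda h: h.title.lower())


TARGETS = {"catalogue": search_catalogue, "eprints": search_eprints}

if __name__ == "__main__":
    for hit in metasearch("portals", TARGETS):
        print(f"{hit.source}: {hit.title}")
```

Every new target means another entry in `TARGETS` and another adapter to write and maintain, which is where the configuration and maintenance costs to the metasearch operator come from.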
Stage 3: ‘It’s the data, stupid’: data is transferred to another search and user interface operator.
What we are now seeing is a growth in data export: exporting the data to somebody else’s search engine and user interface. There are two trends here. One is the emergence of data-sharing approaches facilitated by the Open Archives Initiative Protocol for Metadata Harvesting, where database operators can share their data with other people’s search and user interface services. In the UK, Jisc is experimenting with this approach in several projects.2 The other is exposing data for gathering by the search engines. As noted above, the motivation here is clear: to be where the majority of searching now occurs, to be ‘on web’. For many users, it is no longer good enough for the user interface of a monolithic search system to be on the web. The search costs may be too high for the user for whom Google or Yahoo is the first and last resort of research. If the data is not in the search engines then they will not find it.
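To make the harvesting approach concrete, here is a minimal Python sketch of the two steps a harvester takes with the Open Archives Initiative Protocol for Metadata Harvesting: build a `ListRecords` request, then pull identifiers and titles out of the XML that comes back. The `verb` and `metadataPrefix` parameters, the `oai_dc` format and the XML namespaces come from the protocol itself. The repository address, the function names and the sample response are assumptions made up for illustration, and a real harvester would also need to handle resumption tokens and error responses.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple
from urllib.parse import urlencode

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url: str, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListRecords request URL for a repository's OAI-PMH base URL."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def extract_titles(response_xml: str) -> List[Tuple[str, str]]:
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        pairs.append((identifier, title))
    return pairs


# A made-up response from a hypothetical repository, trimmed to the essentials.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example e-print</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()

if __name__ == "__main__":
    print(list_records_url("https://repository.example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

The harvested records can then be loaded into the harvester's own index, so that search and user interface are provided by someone other than the operator who holds the data.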
This approach means that data is flattened to web pages and structure is thrown away. However, there is an opportunity to reintroduce structure as, in effect, we turn our systems inside out, and try to make the functionality that was available only within monolithic search systems now available on the open web. But we are only really beginning to think about what this means, and web search itself and the companies providing it will evolve over coming years.3
These three stages emerged successively, but will continue to live side by side. We will need to think hard about providing resources in several ways – to satisfy the variety of users and the variety of user needs.
References
1 ‘Pick up a portal.’ October 2004 Update (www.cilip.org.uk/publications/updatemagazine/archive/archive2004/october/lorcan.htm).
2 See for example the projects in the Jisc Fair programme (www.jisc.ac.uk/index.cfm?name=programme_fair).
3 OCLC is exposing millions of bibliographic records to Google and Yahoo. It sees this as a way of connecting users to library services. The user who finds this data in Google is brought back to a rendezvous page. The functionality can be reintroduced at that page: find records by the same author, for example. See what a rendezvous page looks like at www.worldcatlibraries.org/wcpa/ow/8ebbe6ee8d051310a19afeb4da09e526.html
Lorcan Dempsey is Vice-President Research, and Chief Strategist, OCLC Inc. (dempseyl@oclc.org).
Updated: 20 January 2005