Penny Bailey reports from Internet Librarian International and comments on new developments in search in the UK.

Frank Cervone of Northwestern University (US) defined federated search as a tool which ‘searches multiple library resources in the background and provides a merged results list’. Indeed federated search has huge value for:

  • searching several resources within the same institution – library catalogue, articles, serials, research database, etc
  • searching the library catalogues in close geographical proximity – university, public and other local libraries
  • searching distributed libraries offering the same type of content
  • searching several online resources at the same time
  • searching any combination of the above.

The benefits to the library user are obvious: one simultaneous search returns blended results, so there is no need to repeat the same search in several different resources. [To see how federated searching works at Cervone’s academic library, see his slide presentation.1]

In summary, federated search is essentially an umbrella search often utilising, as far as possible, the search facilities already offered by each library resource. It is sometimes referred to as a ‘library portal’.
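
As a rough illustration of the ‘umbrella’ idea, the sketch below sends one query to several resources in parallel and blends the answers into a single list. The resource names, the tiny in-memory ‘collections’ and the function names are all invented for the example; a real portal would call each resource's own search facility rather than a Python list.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for the content held by each library resource.
RESOURCES = {
    "library catalogue": ["Federated search basics", "Cataloguing rules"],
    "article database": ["Federated search in practice", "Open source portals"],
    "serials list": ["Journal of Search Studies"],
}


def native_search(resource, query):
    """Pretend to run the query with the resource's own search tool."""
    titles = RESOURCES[resource]
    return [(resource, title) for title in titles if query.lower() in title.lower()]


def federated_search(query, resources=RESOURCES):
    """Search every resource at the same time and return one merged list."""
    with ThreadPoolExecutor() as pool:
        per_resource = pool.map(lambda name: native_search(name, query), resources)
        merged = [hit for hits in per_resource for hit in hits]
    # Sort by title so the blended list does not depend on response order.
    return sorted(merged, key=lambda hit: hit[1])


if __name__ == "__main__":
    for source, title in federated_search("federated"):
        print(f"{title}  [{source}]")
```

The user types the query once; the fan-out to each resource and the merging happen in the background, which is the whole appeal of the approach.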

Problems with federated search
Cervone outlined some problems with federated search:
  • not all databases interpret the search in the same way
  • it can take time to configure databases and resources
  • inconsistent de-duplication of results
  • relevance ranking can be difficult.

These problems have meant that federated search has sometimes been referred to as taking search functionality to the ‘lowest common denominator’. It remains to be seen whether consistency for relevance ranking and de-duplication can be achieved if it is also desirable to respect the underlying search methodologies of each resource.
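
To make the de-duplication and ranking problems concrete, here is a minimal sketch of one possible merge step. It is not how any of the products mentioned here work: the match key (title plus year) and the rescaling of each source's scores against its own top score are assumptions chosen for simplicity, and the field names are hypothetical.

```python
import re


def record_key(record):
    """Build a crude match key: lower-cased title without punctuation, plus year."""
    title = re.sub(r"[^a-z0-9 ]", "", record["title"].lower())
    return (" ".join(title.split()), record.get("year"))


def normalise_scores(records):
    """Rescale one source's scores to 0..1 so that sources can be compared."""
    if not records:
        return []
    top = max(r["score"] for r in records)
    return [dict(r, score=(r["score"] / top if top else 0.0)) for r in records]


def merge_results(results_by_source):
    """De-duplicate across sources, keeping the best-scoring copy of each record."""
    best = {}
    for source, records in results_by_source.items():
        for rec in normalise_scores(records):
            rec = dict(rec, source=source)
            key = record_key(rec)
            if key not in best or rec["score"] > best[key]["score"]:
                best[key] = rec
    # Highest rescaled score first.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


if __name__ == "__main__":
    sample = {
        "catalogue": [{"title": "Federated Search: A Primer", "year": 2006, "score": 40}],
        "database": [{"title": "federated search - a primer", "year": 2006, "score": 8}],
    }
    for rec in merge_results(sample):
        print(rec["title"], rec["source"], rec["score"])
```

Even this toy version shows why results are inconsistent: two records only match if their titles happen to normalise to the same string, and a top score in one resource is treated as equal to a top score in another, whatever each engine meant by it.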

WebFeat claims to have solved the problem of configuring databases, announcing in February 2007 that it had:
‘launched the industry’s first automated federated search engine. WebFeat Express version 2.1 enables libraries to configure their WebFeat federated search engines in minutes using the latest release of the WebFeat Administrative Console (WAC)… librarians can easily select databases for inclusion in their WebFeat system, as well as their own library catalog and remote user authentication. WAC now automatically configures the library’s WebFeat system − performing any required handshake with the library’s content providers.’3

Current developments
Cervone portrayed federated search as a ‘shrinking world’ − in which library system vendors, firstly, are merging but, secondly, are integrating with the same few federated search engine tools (Encore, WebFeat, Primo and Vivisimo). Cervone also argued that increasingly open source strategies were being deployed including:
  • LibraryFind at Oregon State University4
  • dbWiz at Simon Fraser University Library5
  • PazPar2 from Index Data.

Other trends he identified were:
  • more resources migrating to XML information feeds, but the Z39.50 standard still prevalent
  • better control of search parameters
  • data pre-processing options including visualisation and clustering
  • off-site hosting.

Open source search
There were two papers on the development of open source search tools. The first was a case study on a search system called Summa, developed by the State and University Library of Denmark.6 Key features are:

  • developed collaboratively by librarians, IT-developers, and usability experts
  • search results sorted by relevance
  • results in any character script
  • uses Java and Apache Lucene, a free/open source information retrieval library.7

Interestingly, the authors do not have confidence in commercial systems being successful or developing fast enough.

Ron Davies and Ian Hamilton from the European Commission delivered a paper on ‘open source library portals’. They defined a library portal as ‘…a web-based service that allows users to discover relevant information resources organised into logical, user-related categories; to use a common interface to search one or more simultaneously…’8

It was refreshing to see that the authors recognised the true cost of open source software − implementation takes development time and expertise, there can sometimes be little or no documentation or demonstration samples and it requires contribution from a larger community to thrive.

They reviewed OpenSiteSearch, dbWiz and LibraryFind and identified common features as:

  • ability to create categories of different resources
  • Z39.50 protocol
  • search by field (author, title, subject)
  • merge and sort search results 
  • link to native search
  • no saved searches or SDI functionality

OpenSiteSearch has been active since 2002, but has declined in activity in the last two years and only supports the Z39.50 search protocol. dbWiz dates from 2005 and can support SQL, XML and web interfaces as well as Z39.50. Launched in 2007 with a grant from Oregon State University Library, LibraryFind is the newest, but relies on Perl and object-oriented languages. It seems, however, that not many libraries want to control the tools they use, and fewer than 10 per cent of implementations use open source tools. Free software does after all have hidden costs.

Federated search vs. native search
The final paper at the conference was a case study from Applied Materials Inc., presented by Sharon Mehl (see the presentation on the Information Today website).9 Mehl’s conclusion is worth noting:

‘The present situation is that the searching is done both on the Knowledge Center portal, using individual databases, and the ksearcher federator. We have concluded that the federator is unable to offer the sophisticated and flexible searching of the individual databases.’

For me Mehl’s conclusion is an important one and needs expanding. Federated search is a very good starting point when users don’t know which resources they should be using, because they don’t know what each collection contains or how to search it. It is also a good way to avoid repeating searches using different search syntax in different resources. But when sophisticated searchers know what they need and are proficient in the advanced or flexible search options, they need to revert to the purpose-built search engine for that resource.

It is encouraging to see the open-source tools mentioned above allowing reversion to the native search engine. There has been resistance by some content vendors to co-operating with federated searching, so this should also reassure content vendors, who may fear that their carefully developed retrieval tools will be ignored and that users will not get the experience they have so carefully engineered. If anything, a federated search makes content more accessible and increases users’ awareness of the existence of the resource, what it contains and how to use it, so that next time they might reach for it straight away.

Enterprise search
The conference papers were very much aimed at public and large academic libraries. Similar search dilemmas face corporate libraries, with the added dimension of the need to search internal or unpublished information. It might be useful to introduce two related search technologies and explain how they complement federated searching. An ‘enterprise search’ performs a full-text concept search across different applications and database repositories.

Wikipedia defines ‘enterprise search’ as ‘the practice of identifying and enabling specific content across the enterprise to be indexed, searched, and displayed to authorized users.’10
The content of each database needs to be pre-indexed and the key difference here is that there may not be a search tool for each contributing resource. Each resource has to be categorised and indexed first and then searched.
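
The ‘index first, search afterwards’ idea can be sketched with a very small inverted index. The document identifiers and texts below are invented, and the matching is plain keyword matching rather than the concept search that enterprise products offer; the point is only that the search runs against a central index and not against each repository's own search tool.

```python
from collections import defaultdict


def build_index(documents):
    """Map each word to the set of ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical content drawn from different internal repositories.
    documents = {
        "intranet/page1": "Merger agreement template",
        "email/42": "Draft merger timetable",
        "dms/7": "Lease agreement precedent",
    }
    index = build_index(documents)
    print(sorted(search(index, "merger")))
```

Because everything is indexed in advance, a repository with no search facility of its own (a file share, say) can still be searched, which is the key contrast with federated search.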

There is definitely overlap between federated search and enterprise search, as the list of providers on Wikipedia includes Vivisimo, while Apache Lucene is listed as one of the open-source options. Probably the key difference here is that, as the name suggests, ‘enterprise search’ is the term used for search tools applied in corporate or ‘enterprise’ settings to search internal repositories of information, and federated search and library portals in more traditional library situations. Both types of search will simultaneously search and present blended or grouped results, with grouped results perhaps more prevalent in enterprise search results.

Integrated search
It seems a pity, with the conference taking place in London, that no UK case studies on federated search were included.

Taking it a step further, UK firm Solcara presents its SolSearch as an ‘integrated search’ or ‘hybrid’ approach, combining both enterprise and federated search. In a law firm context, an integrated search would search a combination of internal and external information: 
  • Internal or enterprise search: library catalogue, document management system, knowledge, intranet, practice management system, contact relationship manager, file systems, email etc 
  • Federated search of online resources: subscription services such as LexisNexis, WestLaw, Justis, PLC, Informa etc, and free legal, government and regulatory services such as the FSA, the UKTI, Companies House, HMRC etc.

I think the exciting aspect of Solcara’s integrated approach is that searching is woven into the organisation’s intranet. Information retrieval is no longer the poor relation of the content management system hidden away as a link to ‘search the library’, but instead provides the intranet with its backbone and structure. Finding information is the raison d’être of the intranet and, for once, information retrieval assumes centre stage. [See the case study below on how law firm Ashurst, with a team of over 190 partners in 12 countries, has put the fee earners in control of what they search and how searching is integral to their intranet.]

In conclusion, federated search, also known as meta-search, cross-search or library portal, is a vital tool for delivering all information resources to library users and making research easier with one search even when those resources use different file structures, require different software platforms and offer different search interfaces. At the same time, as there is unlikely to be a standard for storing information (which of course comes in many shapes and formats), the need for native search tools for the sophisticated researcher will probably not disappear for a long time to come. The difference, however, is that now there are greater possibilities for embedding and ‘meshing’ search results in other interfaces.

* * *

TAILOR-MADE SEARCH FROM LAW FIRM ASHURST

Ashurst is a major international firm with approximately 900 lawyers in 12 jurisdictions. It advises corporations and financial institutions; its core areas of expertise are M&A (mergers and acquisitions), corporate and structured finance.

Ashurst’s know-how intranet system, Arachne, is the primary way in which Ashurst lawyers access the firm’s model form contracts, briefing notes and other know-how. Much of the information on the award-winning* system has been developed internally and represents ‘high value’ to the firm − it is often specifically client- or sector-focused and is in essence a distillation of the firm’s experience in its core areas.

The system has been developed over the last 10 years and its categorisation and presentation have won praise internally and externally. Bringing the system up to the next level presented a challenge to the firm − how to integrate external sources without allowing the high-value know-how to become lost in a mass of external material.

Ashurst’s solution was to resist the temptation to deliver to lawyers a set of results which contained a mixture of internal and external material. Instead the firm developed its own solution − an optional extra search for lawyers on the search results page. This extra search enables lawyers to carry out one search across a number of external databases. Lawyers simply choose a search phrase and then select one of the appropriate options, ‘legislation’, ‘cases’ or ‘commentary’. The search has been configured to do everything else. A lawyer-led team at Ashurst worked closely with software company Solcara, using its SolSearch product to achieve the kind of results that lawyers wanted.
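
The shape of this design can be sketched in a few lines. This is not Ashurst's or Solcara's code: the database names, the routing table and the function names are placeholders. Only the three options and the decision to keep internal and external results apart come from the description above.

```python
# Hypothetical routing table: which external databases each option searches.
CATEGORY_SOURCES = {
    "legislation": ["external_db_a", "external_db_b"],
    "cases": ["external_db_b", "external_db_c"],
    "commentary": ["external_db_c"],
}


def search_internal(phrase):
    """Placeholder for the search of the firm's own know-how."""
    return [f"know-how: result for '{phrase}'"]


def search_external(database, phrase):
    """Placeholder for a call to one external service's own search."""
    return [f"{database}: result for '{phrase}'"]


def results_page(phrase, category=None):
    """Always show internal results; add external ones only when asked."""
    page = {"internal": search_internal(phrase), "external": []}
    if category is not None:
        if category not in CATEGORY_SOURCES:
            raise ValueError(f"unknown option: {category}")
        for database in CATEGORY_SOURCES[category]:
            page["external"].extend(search_external(database, phrase))
    return page


if __name__ == "__main__":
    print(results_page("warranty", "cases"))
```

The two lists are never blended, so the high-value internal material cannot be pushed down the page by a mass of external hits, and the lawyer only has to supply a phrase and pick an option.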

This approach has proved even more popular than anticipated. Users find the ability to search simultaneously across multiple databases an enormous timesaver. Also, having one search and one easy-to-use interface means that lawyers use this search as a starting point for research even on highly complex legal topics. In a time-pressured environment, lawyers will choose this search rather than navigate through several different external sites, each with its own sometimes complex interface and search methodology.

For Ashurst the investment in a tailor-made solution, resting on its tried-and-tested Arachne system, has proved a great success.

*Winner of Legal Business Award, Contribution to Law Firm Success 2006.

* * *

References
1 www.internet-librarian.com/Presentations/A201_cervone.pps  
2 http://catalog.lib.msu.edu/screens/encore.html  
3 www.webfeat.org/releases/1Feb07_WebFeatExpress2.htm  
4 http://osulibrary.oregonstate.edu
5 http://libraryoftexas.org  
6 http://new.statsbiblioteket.dk/summa/features-text-in-english  
7 http://en.wikipedia.org/wiki/Lucene  
8 www.internet-librarian.com/Presentations/A202_davies_hamilton.pps  
9 www.internet-librarian.com/Presentations/A203_Mehl.pdf  
10 http://en.wikipedia.org/wiki/Enterprise_search  

Further reading
Barclay Hill. ‘Federated search at the Intel library.’ Information Outlook, Vol. 11, No. 9, September 2007, pp. 11-23 (case study on implementing federated search in Intel’s corporate library).

Penny Bailey formerly worked as a library consultant and is now Managing Director of Bailey Solutions (www.baileysolutions.co.uk). She will be exhibiting at the Library Management Showcase on 14 March at CILIP Ridgmount Street.



Updated: 20 February 2008
© Copyright CILIP 2008