Create a revolution by seeking order
Access to The National Archives has been revolutionised by digital technology, and so have the roles and responsibilities of the information professionals who work there. John Sheridan explains how this happened and the role that non-relational databases played in the process.
Debates about databases are not click-bait. They are unlikely to trend on Twitter, get posted on Instagram or be liked on Facebook. But the flexibility of non-relational, or NoSQL, databases has played a key role in enabling these services and giving billions of people access to them. As libraries and archives attempt to capture and reveal this fast-changing world, it is often these same databases that they use.
Implementing NoSQL can be a strange process, though. The huge changes in practice that it enables are often accompanied by similarly huge changes to the organisations that adopt it.
It is a process that John Sheridan, Digital Director at The National Archives (TNA), has witnessed, particularly in the development and implementation of TNA’s Discovery platform, which has opened up the archives to millions of users.
“If you’re an archive and you over-standardise”, John says, “the real world will very quickly confound you with the flexibility it needs. That sense of the real world being more complicated than any model you might make of it is the big appeal of NoSQL technology. Libraries and archives keep running into it, the real world being more complicated than our models of it. So using technologies that have more open assumptions – that really helps us.”
The effect at TNA has been revolutionary. Some of this was expected. Some not.
“Our strategy talks about The National Archives becoming a disruptive digital archive and what we mean by that is disrupting archival practice, developing new archival practice that is capable of dealing with the challenges that we see.”
He adds: “We want to change how people think about archives. Fundamentally. So archives are useful in ways that people don’t necessarily anticipate, and that’s the opportunity.”
John says: “I was on holiday in Newlyn and I saw a little bit of railway. I Googled the search term ‘Newlyn light railway’ and, because it’s old and no longer exists, there isn’t a page on Wikipedia. But because there are some records in the Cornish Archive, Discovery, much to my surprise, was the first result on Google.
“It never occurred to me, as Digital Director at The National Archives, that the first answer would be a catalogue description through Discovery. I just put in my three-word search term and then I found myself benefiting from an archival catalogue description.”
The implications are revolutionary, because people who aren’t intending to look in archives are finding themselves in them. “That’s a big new audience and how do we go about meeting their needs? That’s been the really big change for us over the five years that we’ve had Discovery.” It has also meant big changes in the way work is done.
On catering not just for this new audience but for any digital audience used to fast, intuitive and free online resources, John says: “What you find yourself creating are multidisciplinary, product-oriented teams who have all of the skills needed to deliver the digital service. You have people with traditional archival knowledge and skills. You have people with user research and user experience skills, and you have people with software development and database skills, and you bring them together into the same team so that they can maximise the benefits of coordination.”
Crown Copyright, Courtesy of The National Archives
But where does this leave the professional archivist? “The archivist continues to have a really important role as a mediator and some of it is how you take traditional archival knowledge and the traditional archival role of the mediator and start to embed some of that in your digital product and service. How do we take some of the tacit knowledge that the archivists have and, through their work in a product-oriented team, then frame that in the context of the product that people can use?”
End of profession?
To some, the idea of imparting archive skills into digital services might sound ominous. But rather than digital heralding the demise of archival skills, John sees demand for them ballooning. “Where there is the huge disruption is around digital archiving. This is where the collection itself is digital and where the archivist role, the digital archivist, is at the forefront not just of being a mediator but actually developing an entirely new body of archival practice to cope with the challenge of digital records: preserving them, contextualising them, presenting them.”
TNA’s apprenticeship scheme could be seen as a manifestation of this. “For us, developing our digital skills is a big strategic priority”, John says. The scheme is run through Ada, the National College for Digital Skills, and its first five apprentices started in May. “They are here learning a range of digital skills: hard technical skills such as software development and databases, how to develop digital services in relation to user need, and some of the more creative skills.”
The intake includes school leavers bypassing university as well as graduates in other disciplines looking to develop digital skills.
He is agnostic about how you become a digital archivist: “Whether you arrive by adding digital to archiving or archiving to digital, I don’t know that we have too much of a strong point of view about. But what I do know is that we need digital archivists.” However, the apprentices are not being trained in archiving skills.
The question is whether archives can compete with other sectors for digital talent. John doubts this will be a problem: “Show me a domain that has more interesting computer science problems than digital archives. Seriously, that’s the case. Actually, digital archiving draws hugely on, and is about, computer science.
“It’s a relatively easy sell to attract people with maths and computing backgrounds. You just point to a new technology and say, ‘How do you think we might preserve that? How might you preserve a neural network? How might you preserve a blockchain or distributed ledger?’ That’s a computer science problem. How might you take video content where the encoding of the information is going to change over time, and ensure you preserve the informational content of that video? What techniques are there for doing that? It’s a really interesting challenge in signal processing. A really interesting computer science problem, not solved yet. So it’s quite easy to get computer scientists excited about those things.”
Traditional relational databases have been the mainstay of most organisations for decades; the relational model itself dates from the 1970s. In a relational database, a record can be added only if it fits a structure defined in advance, sharing the same attributes as the existing content. A non-relational database requires no advance knowledge of attributes. In principle, any record that can be encoded can be included: a document, a video, an email.
Digital technology enables this encoding, and it has also separated out the other database processes, which can now be carried out at different times using an array of tools and techniques. Techniques for encoding, storing, modelling and querying can be combined in many ways to meet specific objectives.
“It comes down to the sequence in which you do things. Do you arrive at your schema from the data that you have, or do you impose a model that you make the data comply with? Relational database technology has tended to be the latter: you build your model and then you go through the journey of discovering if that model is a good one or a bad one. Often what happens is that you end up shoehorning stuff.
“The NoSQL model is the other way around. It is much more permissive in terms of the data you can have. You still need to do work somewhere along the line if you want to query across that data, to work out that this concept and this concept are the same thing.”
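To make the difference in sequencing concrete, here is a minimal Python sketch of the two approaches John describes. It is purely illustrative: the field names, the example records and the mapping of “creator”, “author” and “originator” to a single concept are hypothetical, and this is not how Discovery or MongoDB work internally. The first function imposes a fixed model and rejects a record that does not fit it; the second stores records as they arrive and leaves the work of deciding that two differently named fields mean the same thing until query time.

```python
# Illustrative sketch only: field names, records and the concept mapping
# below are hypothetical, not TNA's actual data model or code.

# Schema-on-write: the model is defined first and every record must fit it.
FIXED_COLUMNS = ("ref", "title", "date")


def insert_relational(table, record):
    """Append a row to `table` only if `record` fits the fixed columns."""
    extra = set(record) - set(FIXED_COLUMNS)
    if extra:
        # The record does not fit the model: reject it rather than shoehorn it.
        raise ValueError(f"unknown fields: {sorted(extra)}")
    table.append(tuple(record.get(col) for col in FIXED_COLUMNS))


# Schema-on-read: accept whatever arrives and store it as-is.
def insert_document(collection, record):
    collection.append(dict(record))


# The deferred work: declare that differently named fields from different
# contributors describe the same concept (a hypothetical mapping).
SAME_CONCEPT = {"creator": ("creator", "author", "originator")}


def value_for(record, concept):
    """Return the value of the first field in `record` that expresses `concept`."""
    for field in SAME_CONCEPT.get(concept, (concept,)):
        if field in record:
            return record[field]
    return None


def find_by(collection, concept, value):
    return [r for r in collection if value_for(r, concept) == value]


if __name__ == "__main__":
    docs = []
    insert_document(docs, {"ref": "A/1", "title": "Railway plans", "author": "J. Smith"})
    insert_document(docs, {"ref": "B/7", "title": "Survey", "originator": "J. Smith",
                           "format": "map"})
    print([d["ref"] for d in find_by(docs, "creator", "J. Smith")])  # ['A/1', 'B/7']

    table = []
    try:
        insert_relational(table, docs[1])
    except ValueError as err:
        print(err)  # unknown fields: ['format', 'originator']
```

Neither approach removes the modelling work. The permissive store simply moves it from the point of loading to the point of querying, which is what lets a service accept descriptions from many contributors in whatever shape they arrive.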
The relative newness of the technology means lots of variation and competing products. It also means that information professionals looking at NoSQL have a lot of decisions to make.
“NoSQL technology has been terribly fashionable and it’s looked like a fad”, John says. “But we’re not using it for the sake of saying ‘we’re using a NoSQL technology’ – we’re using it because we have a concrete business reason. We’re making different technology choices in different domains against the needs that we have. Those choices are informed by the nature of the content that we’re dealing with.”
TNA currently uses three types of NoSQL database: a native XML database for legislation.gov.uk, an RDF triple store for metadata in its Digital Records Infrastructure and, for Discovery, the document database MongoDB.
The three main considerations behind these choices were scalability, flexibility and value for money. “The value for money argument was particularly important with Discovery, given the licensing fees you pay for some commercial products. There was a lot of appeal in something that was open source, in terms of minimising our lock-in and allowing us to deliver really good value for money.”
Discovery deals with tens of millions of descriptions, so scalability is an important concern, but so is flexibility.
John says: “Discovery also has descriptions not just of records that we hold at The National Archives but of records held elsewhere by other archives. We provide search of both our own collections and archival collections throughout the UK, and this is part and parcel of our role as the archives sector lead. Now, the minute you start to provide search across multiple collections you need flexibility, because you are not the originator of the data. You need a flexible infrastructure that’s going to maximise your chances of being able to cope with the data however it comes at you.”
Who needs it?
Some organisations will feel that these changes are a long way off. And while John accepts that the information some organisations handle may not in itself call for the technology, he believes the culture that comes with NoSQL will become harder to avoid.
“A whole bunch of those things make sense for you to do regardless: shifting to the cloud, bringing developers and operational people more closely together, working in an agile way, shortening your development cycles, designing for users’ needs. These make sense whether your data is shaped for a relational database or shaped to go in a NoSQL database. It just makes sense.
"The expectations for libraries and archives are high and keep rising in terms of the online services that we make available. People that use our services experience a whole bunch of other services on the web and they have the same expectations for us as when they use the best online products. We have to respond to those high and ever rising expectations and it’s because of that that we need to pick up the pace and be really user focused. To do that we need technologies that are going to get us there.”