5 tips to run a sustainable digital preservation project

4 graphics representing community, existing work, longevity and custodians

Preserving digital data for the long term is a complex business that requires sustained activity. It needs to be embedded in an organisation with a dedicated budget and run as a business-as-usual activity.

Unfortunately we're a long way from this ideal, and a lot of digital preservation work is dependent on external grant funding. Transient projects can all too easily produce transient outputs, so how can we make sure the results are more effective and more sustainable? 

This blog post expands on some of the lessons I've learned about impact and sustainability that I presented in a recent conference poster. They are based on digital preservation work but certainly have applicability way beyond the preservation field.

Why is sustainability difficult to get right?

Project funders understandably want new and innovative work, and they want to see results branded with the project name, stamped with the funder's logo and distributed via the project's website. These are not unreasonable requirements, but they can encourage a blinkered mind-set that isolates projects from what is already going on in the field, and makes it difficult for results to have a life beyond the project.

For project implementers, the essential elements for real sustainability are often tacked onto the end of a project rather than designed as an integral part of it. Tasks such as engaging with user communities, establishing connections with existing work or disseminating results will clearly struggle in isolation from the core functions of a project.

The combination of short term funding and poor sustainability can often impact on the value that the end users get from project outputs.

How to run a sustainable digital preservation project

To have a reasonable chance at sustainability, projects need to engage more effectively with the world around them and find viable homes (and users!) for their valuable outputs long before that little bar labelled "Dissemination" is reached at the end of the project plan Gantt chart.

We've got to the point where most project websites are now web archived before disappearing, but the outputs published on them then become cast in stone. Effectively they are dead. They may as well be PDF files, and indeed they often are. 

The classic example of this is where a project provides interactive functionality on its website, such as a forum or wiki. Building sufficient critical mass to get the discussions going on a new forum is not easy, but the effort that goes into that work will ultimately be wasted when the forum dies at project end. Could the project in fact have used an existing forum elsewhere?

In doing so, could the discussions live on in a location that other people will find, build on and refer to? Of course, an existing forum will already have an engaged community making the initial start-up phase much simpler. I'll come back to project websites at the end of this post.

1. Engage with the community

I've seen many project based developments, be it software tools, services or other support materials, which have seen very little use by the community. And herein lies perhaps the most obvious lesson to learn from work that has not panned out quite as expected: make sure you solve problems that real users actually have! 

Engaging with those users, identifying practitioners that can own and describe the problem, and working with real data, are all essential components in putting the real users in the driving seat. If you publish the end result of your work, and this is the first thing the community has heard about it, something has gone dramatically wrong. 

Share your approach as early as possible. Discuss it with the community. Give the community the opportunity to point you towards existing solutions that are already out there, or tell you about a similar project that already tried and failed with your strategy. Some good early community engagement can steer you away from many of the common pitfalls.

2. Build on existing work

A further note on requirements. We all have our own unique requirements, don't we? Libraries, museums, archives, galleries, whoever. Even from library to library or archive to archive, we always have to ensure our own needs are met.

Well there are of course some differences, and our constant lack of shared terminology only amplifies these, but I don't think our needs are as different as we often make out. Countless times I've seen new or competing developments appearing because requirements exercises proclaimed specialist solutions were required. 

With the benefit of hindsight those competing solutions look pretty similar several years down the line. I think we're getting better at collaborating, and there are some excellent examples in shared archive/storage initiatives that have emerged over the last few years, particularly in the US. But there's still too much of a temptation to do our own thing.

The default solution to a problem should not be a new software development project. Building on existing work and adding support for the small number of genuinely unique requirements that we have is often the way to go. Perhaps we can get better at publicly sharing our requirements documentation, never mind the end result?

3. Design for longevity

Technology choices are of course important, although it’s not always easy to predict whether a new technology will still be in popular use within even the lifetime of a 3 or 4 year project. Keeping things as simple as possible, and exploiting existing and proven technology can pay dividends. 

Over the last decade, a number of initiatives have appeared to develop registries of information that would support digital preservation activity. Typically they focus on information about dependent technologies: file formats, software tools and so on. Many of these initiatives lie unused and with little data in them to make them of value to today’s digital conservators. Most involved spending significant funds on software development to create systems to manage this digital preservation information. 

Two more recent such initiatives took a completely different approach and focused their efforts on community and on the data. Not the technology. Both the COPTR registry (which helps practitioners find software tools to solve digital preservation challenges) and the Just Solve registry (which details file formats to help practitioners understand how to preserve their data) are built in MediaWiki.

The result is easy to use, dependable, easy to interface with and extract data from, and of course it was very cheap to establish. Once you have a decent amount of real data, it’s a lot easier to understand how it can be used, and so modelling and manipulating the data, and then developing fancy new services that leverage that data, become much easier.
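To illustrate what "easy to extract data from" can mean for a MediaWiki-based registry, here is a minimal Python sketch built around the standard MediaWiki Action API (`action=query` with `list=allpages`). It only builds the request URL and parses a response, so it runs without network access. The endpoint `https://wiki.example.org/api.php` is a placeholder rather than the real address of COPTR or Just Solve, and the function names are my own invention for this example.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint (an assumption): substitute the api.php URL of the wiki you use.
EXAMPLE_API = "https://wiki.example.org/api.php"


def build_allpages_url(api_url, limit=50, continue_from=None):
    """Build a MediaWiki Action API URL that lists page titles."""
    params = {
        "action": "query",
        "list": "allpages",
        "aplimit": str(limit),
        "format": "json",
    }
    # The API returns an 'apcontinue' token when more pages are available.
    if continue_from is not None:
        params["apcontinue"] = continue_from
    return api_url + "?" + urlencode(params)


def extract_titles(response_text):
    """Pull the page titles out of a list=allpages JSON response."""
    data = json.loads(response_text)
    return [page["title"] for page in data.get("query", {}).get("allpages", [])]


def next_continue_token(response_text):
    """Return the token for the next batch, or None if this was the last one."""
    data = json.loads(response_text)
    return data.get("continue", {}).get("apcontinue")
```

Fetching the URL (for example with `urllib.request.urlopen`) and passing the response body to `extract_titles` would give a plain list of registry entries, which is often all that is needed to start building a new service on top of the data.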

4. Ally with a custodian

Collaborating with a custodian from the very beginning of the project will simplify many aspects of sustainability in one go. The custodian will buy into the results and care about questions of sustainability as they are created. Have they been documented properly? Have they been based on manageable technologies? Have they been designed with extensibility in mind? The custodian may also be able to host results in a long lived location, surrounded by other active content.

Contributing project results to existing and successful locations elsewhere on the web does not need to be in competition with the requirements of project funders. Branding and funding acknowledgements can be carefully placed around project products, so those products still make sense when the project is no longer active. The project website can of course link to the new product, wherever it is on the web, and still sell it strongly. 

The last significant grant-funded project I worked on, the Jisc-funded SPRUCE Project, developed a number of new resources for supporting digital preservation practitioners. Homes were found for a number of these resources before they were even created. One example is the Digital Preservation Business Case Toolkit, created mostly in a 3 day "book sprint". It's maintained on the Digital Preservation Coalition's wiki, not the project website, and has seen a number of updates and additions since its creation, including from other grant-funded projects.

5. Maximise your impact at project wrap up

Good project sustainability should begin before the project itself even starts. But that formal wrap up and dissemination phase at the end of the project is still important. How do you get it right? 

Again, it's not rocket surgery, and this is where a good project website can play an immense role in helping others find, use and build on your project results. A bad website will do the opposite. Most are bad. Most swamp the best outputs in a sea of information that is unimportant once the project has ended. They use terminology tied to the inner workings of a project and can be unintelligible to those on the outside. And they talk about the project in the future tense, despite the fact it was completed a decade ago. Or was it? So many project websites fail to even tell you when the project started and when it’s due to finish.

The most important part of a project website is the home page. By the time you get to the project end, the aims of that home page will have significantly changed. So make a new one! Move the old home page to a different URL and create a new front page with this information:

  • Summarise what the project was about in no more than 4 sentences. Make it a practical description of what you did, who it's for and what they can get out of it. Do not copy and paste what the project proposal said the project would do as by now it will most likely be wrong!
  • Clearly state when the project began and ended.
  • Clearly state licensing conditions for all project outputs to make it easy for others to reuse and build on your work in the future. Creative Commons licences do a great job here of course.
  • Most projects have a final report, evaluation or write up of some kind. Link to it!
  • List the most useful project outputs. Include a short description of what each output is, who it will be useful for and what it can do for them. Include a URL for each output.
  • So the project led to a sequel, or perhaps spawned a Centre of Excellence? Great! This is where some of your project work will live on. Make sure you link to this follow on activity, which will no doubt have its own website!
  • Provide at least two ways of contacting project representatives and try to avoid email addresses that will close when the project ends.
  • Link to the old home page, and make sure you have redirects in place if the changes result in new URLs for anything on the site.
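The redirect advice in the last bullet can be checked with a few lines of code before the site is frozen. The Python sketch below is only an illustration: the paths are hypothetical, and `resolve` and `flatten` are names made up for this example rather than part of any web server or wiki. It follows a map of old to new URLs, flags loops, and collapses chains so that every old address points directly at its final destination.

```python
def resolve(path, redirects, max_hops=10):
    """Follow the redirect map until reaching a path with no further redirect."""
    seen = []
    while path in redirects:
        if path in seen or len(seen) >= max_hops:
            raise ValueError(f"redirect loop or chain too long at {path!r}")
        seen.append(path)
        path = redirects[path]
    return path


def flatten(redirects):
    """Return a map where every old path points straight at its final target."""
    return {old: resolve(old, redirects) for old in redirects}


# Hypothetical paths, for illustration only.
example_redirects = {
    "/index-2012": "/old-home",   # the old home page moved twice
    "/deliverables": "/outputs",  # a section renamed at wrap up
    "/old-front": "/index-2012",
}
flat = flatten(example_redirects)
```

A flattened map like this can then be translated into whatever redirect mechanism the hosting platform provides, so that visitors following old links arrive in one hop.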

We followed this approach with the SPRUCE Project website. It’s not visually stunning, as it’s hosted on a wiki, but you can find all the best stuff the project did very easily. The difference between the Old Home Page and the New Home Page is quite clear. As I said, this is really simple stuff but there are so many projects that fail to get this right. Perhaps the funders should be a little stricter on their project wrap up requirements?

The most important lesson for me in delivering project work has been to put thoughts of sustainability at the very heart of the project, and if you set out with that ambition it’s not a great challenge to make it happen in practice.

Have you been involved in digital preservation projects? Share your tips in the comments below


Image credits: "Design" by Scott Lewis, "Building Restoration" by Sean Connolly, "Institution" designed by James Fenton, and "Community" by Edward Boatman. All are from TheNounProject.com

