Information Services

Digital Library Programme Automates Digital Preservation

The Information Services Group recently completed a key project to scale and automate digital preservation of valuable digitised collections.

The Information Services Group’s Library and Project Services teams recently worked together within the Digital Library programme on a body of work focused on digital transformation. One project within it addressed the need to scale up the digital preservation of digitised collections.

Digital preservation is the series of managed activities necessary to ensure continued access to digital materials for as long as necessary. The University of Edinburgh is home to a wide range of collections, including archives, rare and unique books and artefacts, museum items, artwork and more. Increasingly, these collections include material created in digital formats, from University records to oral history recordings to 3D models. The formats used to store audio, images, documents and more complex digital resources change over time, as do the software and hardware able to read them.

Digital preservation takes steps to mitigate the risks posed by these changes. The volume of digital content produced and acquired by the Library is enormous, and preserving all of this valuable material by hand is simply not feasible.

“This project achieved a significant step towards automating parts of a complex workflow,” said Sara Day Thomson, Digital Archivist for the Centre for Research Collections.

One of the greatest risks to digital content is the loss of institutional memory, including knowledge of how content was created, where it came from and why it is important. This context must be documented if the Library is to avoid losing access to its valuable digital collections.

The automation project aimed to link together two key Library systems in order to make the digital preservation process more streamlined and robust.

The first system is Goobi Workflow, an open-source software application used by the Library’s Digital Imaging Unit to manage workflows when digitising items. Goobi automates repetitive manual tasks to ensure reliability and consistency across digital assets.

The other system, Archivematica, is open-source software that runs a series of microservices, such as file format validation and fixity checking, to create information packages for long-term preservation. The system analyses content and extracts rich technical metadata that makes it easier for future users to understand and re-use the content.


This project focused on automating the link between these two applications. More specifically, files and metadata exported from Goobi during the digitisation process are automatically pulled into Archivematica to begin digital preservation. Archivematica generates archival packages based on preset requirements, and these packages then flow automatically into deep archival storage for safekeeping. This automation from Archivematica to deep storage was developed by University of Edinburgh developer Hrafn Malmquist in 2018 and has now been integrated into the open-source code repository for other institutions to use.

“The staff required to make this project happen come from all different backgrounds. As a result, it occasionally required a few clarifications (of terminology, of process) to get on the same page, but the diversity of skills and perspectives was a real strength,” said Sara.

This work builds on the previous IS project DLIB004, which launched Goobi Workflow, and is being conducted concurrently with project DLIB012, which is upgrading Archivematica to the latest version of the software and implementing some additional features.

“The technical functionality is in place and a template for creating an automated link from one Library system to digital preservation has been established. Both of these really impressive outcomes will support the ongoing planning and processing required to implement an end-to-end workflow from the creation of digital images to long-term digital preservation,” said Sara. “As always, there are important steps that can’t be automated, such as developing policies and providing context and metadata, that remain a challenge due to staff capacity and the volume of material.”

Together, these projects will help the Library deliver a robust and comprehensive digital preservation strategy. Automating these tasks will improve the consistency, replicability and accuracy of the processes while drastically reducing the amount of manual work required.