Sunday, August 28, 2011

Practical Approaches to Born-Digital Records: Archivematica


Archivists mingle around a full-sized skeleton cast of Sue, the largest, most complete, and best preserved Tyrannosaurus rex fossil ever discovered, during a Society of American Archivists reception at the Field Museum, Chicago, Illinois, 26 August 2011. Sue is 42 feet (12.8 m) long and 12 feet (3.66 m) high at the hip.

I’ve always feared getting sick at the annual meeting of the Society of American Archivists, and yesterday it happened. I stayed in bed and missed all of yesterday’s sessions and a section meeting. I somehow dragged myself to the last few minutes of the evening reception at the Field Museum, but I felt quite like Sue, the magnificent T. rex who presided over the festivities: an empty-headed and mildly scary-looking dead thing.

I was still a bit shaky today, and I managed to miss all of this morning’s first session and part of Session 610, Practical Approaches to Born-Digital Records: What’s Coming Next, which focused on Archivematica. Archivematica is a digital preservation platform that brings together a wide array of open source anti-virus, metadata extraction, file conversion, and other tools and supports automated processing of archival electronic records. We’ve just started experimenting with Archivematica, and I really wanted to hear about other archivists’ experiences with it.

I didn’t get to hear Peter Van Garderen of Artefactual Systems discuss Archivematica’s development or plans for future enhancements and came in as Glenn Dingwall (City of Vancouver Archives) was wrapping up his presentation.

In lieu of recapping the presentations of Paul Jordan (International Monetary Fund) and Angela Jordan (University of Illinois Urbana-Champaign) or summarizing the question-and-answer component of this session, I’m simply going to highlight the most interesting points that arose during its second half. I think that Archivematica holds great promise, and many of the presenters and audience members were of the same opinion, so don’t let this post deter you from investigating it yourself. However, you should keep in mind that Archivematica:
  • Is not a complete digital preservation system. It creates Archival Information Packages (AIPs) that can be preserved over the long term, but it doesn’t provide for storage of these AIPs.
  • Is designed with scalability in mind. It can be run on a desktop in a small repository or on a very large server array. The chief bottlenecks limiting large-scale implementations are processing speed and capacity and the staff time needed to obtain intellectual control over the materials.
  • Will be of particular interest to small repositories; however, not all of them will be able to meet the platform’s hardware requirements or acquire the requisite technical knowledge.
  • Requires some degree of technical know-how and quite a bit of willingness to get one’s hands dirty. Archivematica requires a real or virtual Linux environment. Most archivists aren’t familiar with Linux and must be willing to learn. Moreover, the installation process isn’t as straightforward as it could be. Fortunately, Michael Bennett has written really useful installation instructions and Angela Jordan has posted about her experience; FWIW, I’ve also posted about our own installation experience.
  • May require customization. For example, the International Monetary Fund will have to figure out how to keep classified documents that should be included in AIPs out of the Dissemination Information Packages that Archivematica creates.
  • Requires some additional development. (Given that it has yet to reach the beta stage of development, this need isn't surprising.) Session participants articulated several desired improvements that would give archivists the ability to specify which preservation/normalization formats will be employed, enable them to reinsert or otherwise deal with files or folders that Archivematica rejects, and shed light upon why the ingest process sometimes stalls. Participants also wanted to see Archivematica support creation of Submission Information Packages, improve processing of e-mail, and integrate records management.

1 comment:

Chris Prom said...

Improved processing of email is a worthy goal, but unrealistic given the fact that there is no open source software that can be integrated to do it. The only two decent tools I have found to do this are Aid4Mail and Emailchemy. Both are fine tools, but they are paid.

Migrating email is incredibly complex, and I think the only realistic option for something like Archivematica is to expect people to submit one of the mbox flavors.

Another thing to note about Archivematica is that it is under the GPL license. There are very good reasons why Artefactual chose that license, but it is somewhat constraining in terms of the other open source libraries and tools that can be included in the distribution package.

To quote from the GPL FAQ (http://www.gnu.org/licenses/gpl-faq.html#GPLAndPlugins): "If the program dynamically links plug-ins, and they make function calls to each other and share data structures, we believe they form a single program, which must be treated as an extension of both the main program and the plug-ins. This means the plug-ins must be released under the GPL or a GPL-compatible free software license, and that the terms of the GPL must be followed when those plug-ins are distributed."