Wednesday, June 24, 2009

NDIIPP project partners meeting, day one

I’m in Washington, DC for the National Digital Information Infrastructure and Preservation Program (NDIIPP) grant partners meeting. NDIIPP is a program of the Library of Congress (LC), and we learned today that it has just been awarded permanent status in the federal budget. As a result, the program should receive an annual appropriation and become a permanent part of the digital preservation landscape.

In response to this change, LC is thinking of creating a National Digital Stewardship Alliance (the name may change), which would allow current NDIIPP partners to continue working with LC and attract new partners. Organizations that are willing and able to direct resources to NDIIPP initiatives will have a voice in the operations of the alliance, and other interested institutions and individuals can become observers. I’ll be sure to post more information about this alliance as it becomes available.

Martha Anderson (LC) opened the meeting with a quick overview of NDIIPP’s progress to date and, in doing so, highlighted a simple fact that reinforces the conclusions the PeDALS project partners and many other people are reaching: “metadata is your worldview.” In other words, no two organizations use metadata in precisely the same way, and while there may be broad agreement as to how standards should be used, there must always be room for local practice.

The keynote speaker, Clay Shirky, noted that in many respects this persistence of local practice is a very good thing: the existence of different preservation regimes, practices, and approaches reduces the risk of catastrophic error.

Shirky’s work focuses on the social and economic impact of Internet technologies, and his address and follow-up comments highlighted the social nature of digital preservation problems.

The Internet has enabled just about anyone who wishes to create media to do so. Instead of the top-down, one-to-one or one-to-many models that have characterized the production of information since the invention of the printing press, we are seeing the emergence of a many-to-many model. Traditional media have high upfront costs, but the cost of disseminating information via the Internet is negligible. Instead of asking “why publish?” we now ask “why not publish?”

The profusion of Internet media has helped to popularize the notion of “information overload,” but our problem is in fact “filter failure.” Information overload has existed since the invention of the printing press, but we generally didn’t notice it because bookstores, libraries and other institutions created systems for facilitating access to printed information. However, on the Internet, information is now like knowledge itself: loosely arranged and variably available.

Shirky asserted that in the “why not publish?” era, librarians, archivists, and others seeking to preserve digital resources no longer need to decide which social uses for a given piece of information should be privileged by cataloging. Instead, they should actively seek to incorporate user-supplied information into the descriptive metadata (i.e., the filters) that they maintain. For example, user-created Flickr tags indicate that a given Smithsonian image of a fish is of demonstrated interest both to an ichthyologist and to a crafter who placed an image of the fish on a purse. Prior to the rise of the Internet, catalogers would have given the scientist’s use of this image more weight than that of the craftsperson. However, as long as metadata is creating value for some group of people, why not allow it to be applied as broadly as possible? In other words, the question we must answer is no longer “why label it this way?” but “why not label it this way?”

The incorporation of user-supplied metadata challenges librarians and archivists who fear losing control over the ways in which information is presented to researchers. However, as Shirky pointed out, this loss of control has already happened: it’s a mistake to believe that we can control how our institutions and holdings will be discussed. All we can really do is decide how to participate in these discussions. President Obama’s 2008 campaign provides a good example of active participation: the campaign understood right away that providing a really clear vision for Obama would empower supporters to talk about Obama without the campaign’s assistance. It then made use of the best user-generated content. In order to do so, it had to accept that some people would make critical and even bigoted use of the material it made available.

Shirky also noted that digital preservation itself has to be social. The “Invisible College,” a seventeenth-century group of intellectuals who established a set of principles for investigating the natural world and sharing their research, is a good model: its expectation that the results of research would be available for review and development of further inquiry gave rise to modern science, and we are now starting to think of digital preservation as an endeavor requiring collaboration, sharing, and community-building.

The social dimension of preservation also extends to end users. One of the big mental shifts in the NDIIPP project has been from “light” (i.e., totally open) and “dark” (i.e., completely inaccessible) archives to “dim” archives. The more secret something is, the harder it is to preserve. In some cases, we have no choice but to bear the costs of preserving inaccessible materials. However, to the degree that we can turn up the dimmer switch, we should do so. If we allow someone to view a movie -- even in five-minute snippets -- s/he will at least be able to tell us if a snippet has gone bad. Even a little exposure lowers the costs of preservation, and lowering costs increases the possibility that something will be preserved. Moreover, if we develop simple, low-cost tools that enable end users to take an active role in preserving information that is important to them, we’ll get a clearer picture of what they find important and increase the chance that information of enduring value is preserved.

All in all, a really vivid, thought-provoking presentation; this summary doesn’t do it justice.

After a lengthy break, Katherine Skinner and Gail MacMillan of the MetaArchive Cooperative furnished a fascinating overview of the results of two digital preservation surveys, one of which focused on electronic theses and dissertations and the other on cultural heritage institutions of all kinds.

The surveys were meant to identify which institutions are collecting digital materials, the types of materials being collected, how these materials are stored, the barriers to preservation, and the most desired preservation offerings. Respondents were self-selected.

As Skinner and MacMillan noted, the findings reveal some unsettling problems:
  • Most institutions are actively collecting digital materials, and survey respondents hold an average of 2 TB of data.
  • Most respondents hold many different types of file formats and genres of information.
  • Storage protocols vary widely. Some respondents are using purpose-built preservation environments (e.g., iRODS), others are relying upon access systems to preserve materials, while others have home-grown systems. Some respondents simply store materials on creator-supplied portable media.
  • The manner in which materials are organized also varies widely, and in many instances organizational schemes (or the lack thereof) pose preservation challenges.
  • Respondents are actively engaging with digital preservation ideas, have a high level of knowledge about community-based approaches to digital preservation, and still feel responsible for preservation.
  • Preservation readiness is low -- most institutions aren’t even backing up files, and most also lack preservation plans and policies -- but desire is high. People want training, independent assessments of their capacity, and the ability to manage their own digital preservation solutions. People don’t want to outsource digital preservation; however, some outsourcing will be needed, particularly for smaller institutions.
  • Respondents themselves identified insufficient preservation resources as the biggest threat; inadequate policies and plans, deteriorating storage media, and technological obsolescence were also mentioned.
  • Interestingly, the preservation offerings that respondents most desired did not address the threats that they identified. Cultural heritage institutions wanted training provided by professional organizations, independent study/assessment, local courses in computer or digital technology, new staff with digital knowledge and experience, consultants, and training from vendors. Colleges and universities responsible for electronic theses and dissertations wanted a cooperative preservation framework, standards, training on best practices, model policies, conversion or migration services, preservation services provided by third-party vendors, and access services.

Skinner and MacMillan concluded that the most effective preservation strategies incorporate replication of content, geographic distribution, secure locations for storage, and private networks of trusted partners. However, most respondents seem to have fallen prey to “cowpath syndrome”: they have idiosyncratic, ad-hoc data storage structures that grew out of pressing needs, but these structures are increasingly difficult to expand and maintain over time, and some sort of triage will eventually become necessary. Moreover, there is a disconnect between administrators and the people who are actually responsible for hands-on preservation work: administrators want to keep things in-house and under control, but hands-on people see the value of collaboration and distributed storage.

I suspect that everyone at this meeting faces at least some of these challenges and shortcomings and that many of us are going to go home and discuss at least some of these findings with our colleagues and managers . . . .
