Last week, I attended the Digital Preservation 2014 meeting in Washington, DC. It is an amazing event that gathers researchers, practitioners, technologists, designers, artists, and many other professionals and curious individuals. The meeting opened with a keynote by Micah Altman, chair of the NDSA Coordinating Committee, who emphasized that there is a lot of "digital stuff" out there and that this stuff is different from non-digital material. It can be accessed, replicated, and changed much faster and more easily. There are more forms and media, more collaborators, and more connections and embeddedness. The National Agenda for Digital Stewardship 2015, described in the keynote, provides a roadmap of opportunities in preserving digital objects: it prioritizes areas of digital content and defines roles, policies, practices, and the basis for technical infrastructure and research. Many important aspects of digital preservation mentioned by Altman were subsequently presented and discussed during the meeting, including building communities and sharing knowledge, creating repositories and metadata, preserving video games, software, data, and obsolete digital media, developing emulation environments, and many others.
Another opening talk, by Matthew Kirschenbaum, explored what it means to go beyond the functional side of software and think about it as a human artifact, something that is produced within and interwoven with the many tropes and contexts of human existence, such as industrial production and capitalism, craftsmanship and artisanry, automation, big data, and so on. Software has a past, a present, and a future. It can have long-term impact or become cruft, useless and dysfunctional.
As a former data curation postdoc, I was particularly interested in conversations around data preservation. The panel "Stewarding space data" talked about the astronomical challenges of data preservation. These include managing petabytes of data and large collections of audio tapes, addressing physical damage, and mitigating the risks of losing something crucial for past and future scientific discoveries. The talk had many connections to my own presentation about data as research objects: the complexity and heterogeneity of data, the need to preserve context, the importance of provenance, and the forward-thinking approaches that incorporate possible scenarios of reuse.
Among the breakout sessions that I attended, the session on digital workflows and web archiving was of particular interest to me. In establishing digital workflows, there are many small but very important decisions that need to be made. For example, content can be blocked to prevent modification, but what if some content is infected with a virus? What if there are many duplicates of the same files? What if the names of the files are non-standard and use forbidden characters? As there is no single answer to such questions, it's always helpful to share practical solutions and discuss optimal approaches. Web archiving is another challenging topic, and even though several tools exist, such as Web Archiving Services or HTTrack, it is necessary to evaluate how well they do what they are supposed to do. Kim Schroeder reported on research that evaluated several archiving tools and found that they sometimes fail to capture format, certain web elements, or even whole pages.
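To make two of the workflow questions above a bit more concrete, here is a minimal Python sketch of how duplicate files and problematic file names might be flagged during ingest. It was not presented at the session. The function names (`find_duplicates`, `has_problem_name`) and the set of "forbidden" characters are my own assumptions for illustration; a real workflow would adjust them to its target storage system and policies.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Assumption: characters that many file systems (notably Windows) reject,
# plus ASCII control characters. Adjust to local policy.
FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content; return groups with more than one file."""
    by_digest = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_digest[sha256_of(path)].append(path)
    return [paths for paths in by_digest.values() if len(paths) > 1]


def has_problem_name(name):
    """Flag names with forbidden characters, surrounding spaces, or a trailing dot."""
    return bool(FORBIDDEN.search(name)) or name != name.strip() or name.endswith(".")
```

Comparing checksums rather than file names catches copies that were renamed along the way. What to do with the flagged files (keep one copy, rename, or only log the finding) remains a policy decision, which is exactly the kind of question the session left open.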
In her thoughtful and provocative presentation "Contending with the Network," George Oates, former designer/developer at Flickr and the Internet Archive and now director of her own design company, Good, Form & Spectacle, talked about personal archiving (see also "Ms. Oates goes to Washington"). She encouraged the audience to be proactive and think about individuals and their digital traces rather than objects that can be neatly categorized and placed in boxes. The personal is intertwined with business and institutions in many ways. Each digital object has a story, and without such stories, preserved objects may be useless in the future (unless we later come up with new stories, which sometimes happens in archiving).
Most of the presentations are already posted, and it's worth taking time to browse through them. See also:
Twitter hashtag #digpres14