By: Jason Evans Groth
As the NCSU Libraries Born Digital Strategic Initiative has grown over the last year and a half, we have been fortunate to interact with many talented librarians and archivists who are also building programs at their own institutions. While conferences like SAA 2014 in Washington, D.C. and, more recently, NEA/MARAC 2015 in Boston, have provided a context for us to share our work in person with others, we have also made the effort to reach out to individuals both in the Triangle Research Libraries Network and, more widely, through email and phone calls to those whose projects and work have surfaced beyond their respective institutions. It is safe to say that all of these interactions have, at some point or another, approached the topic that is on the minds of all of us working to make born digital collections discoverable, accessible, and responsibly preserved: “Am I doing this right?”
We have decided on (at least) two answers at NCSU Libraries. The first is “If you’re doing anything then, yes, you are doing it right.” And the second is the all-powerful “it depends,” which is quickly followed by “but if you are doing anything then, yes, you are doing it right.” Of course, “right” is a loaded word. As discussed in a previous post, flexibility is an important consideration when building a born digital program, since so many things can change in the processing of different digital objects. For NCSU “right” means the following: We established our core requirements for general processing based on our needs for access, which we mapped out before we knew how we could process anything, and we built enough flexibility into the workflow that, when changes (inevitably) rear their expected heads, we have room to accommodate them.
NC State Students in the 1980s, potentially creating data that we need to store and make accessible now.
Why all of this doubt, though? Archivists are already well-equipped to handle the daunting task of establishing physical order, appropriate room conditions, and an organizational system that fulfills the promise to keep things safe and, hopefully, accessible for as long as possible (forever, for lack of a better word). What makes digital so different? It could be that digital computing devices and data, now almost one hundred years old, are still relatively young in the context of archives. It could be that we have faced challenges with storage and retrieval of digital objects in other professional domains, and we know the challenges associated with digital preservation and with maintenance of disks in general. It could be that we are a humble profession, and despite being information experts – largely through computing interfaces – we have decided that we are not “techies” and may not be able to approach this challenge properly. It could be that we’re afraid of the speed with which digital assets can be shared, which is far different from our traditional patron-in-the-reading-room model. It’s possible that all of these things contribute to the doubt, but, just as there is no single tool that will solve any institution’s born digital challenges, none of these is the only reason we’re doubtful about born digital. These concerns do feed one of the most prevalent problems, which is the penchant we have for worrying about worst-case and, often, edge-case scenarios when it comes to digital collections.
There is no such thing as a perfect born digital curation and preservation program, and setting out to eliminate all problems, especially those we hear about in worst-case or edge-case scenarios, is a losing game. The majority of these cases likely do not now – and never will – apply to our institutions. For example, we currently have no workflow in the SCRC to handle 8″ floppy disks or data cassettes, but we know that other institutions do. Rather than worry that we’re not doing born digital right because our program can’t account for these legacy data carriers, we have, instead, surveyed our collections and found very few of these items. We have decided that other formats, which we do have the capability to process, are higher priority. But rather than give up hope, we have built in some flexibility to discuss these formats in the future should we get to a point where they are in demand. In other words, we have devoted our resources to media we know we’ll see more of, while constantly scanning for solutions to cases that are decidedly more on the edge. This decision is practical and also empowering because it has set us in motion to focus on what we know we can do well rather than worrying about what we can’t do at all. But it also leaves room for us to consult our colleagues who either have these capabilities or have experience with appropriate vendors and make informed choices when and if the time comes to take care of that data.
A photo of now obsolete media from the NCSU student newspaper, The Technician, November 2, 1983
We also know that we will face lots of data that can’t properly be processed by the applications we have at hand right now. While we’re not placing bets about robust virtualization environments being available to us anytime soon, we can’t let this keep us from at least migrating the data from legacy media that we can handle to monitored hard disk environments that afford preservation practices. In other words, freeing the currently unreadable data from its media jails gives us a chance to see it later; not doing anything guarantees that our chances of ever approaching it again will grow slimmer by the month. On top of all of this, since most digital curation and preservation programs like ours are so young, we can’t decisively say what it is our patrons or researchers actually want, so keeping it all in a responsible way and paying attention to patron activity will help us keep our program one that works for patrons rather than one that works for the ideas we have about them.
This may sound a little reductive, but the essential component of a born digital program is the safe transfer of data from one place to another that does not harm the data and that allows us to monitor it safely for the duration while providing access to it, too. Sound familiar? It’s just like what we do with papers, books, and physical objects in our archives, with one key difference: It can happen very quickly and can be both deceptively simple and complicated. That is, a hard drive will fail, so just because it feels easy to see and access data right away when the hard drive is fresh, it doesn’t mean we can take our eyes off of it and assume it, like a book stored properly, will last several lifetimes. And, on the other hand, a lot of people create a lot of complication around the basic component of born digital, sometimes just because they can. Making sure that what we do with the data when we free it from its original carrier and add it to our repository matches the goals of access and care we have established from the beginning keeps us from experiencing “digital creep” (making something simple in the digital world very complicated because of the affordance of tools at our disposal) and helps us to move our processing forward so we actually can get to our backlog and keep up with what’s coming in.
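To make that essential component concrete, here is a minimal sketch in Python of a transfer that checks the data was not harmed and that can be re-checked later. It is an illustration under stated assumptions, not a description of the workflow at NCSU Libraries: the function names (`sha256_of`, `transfer_with_fixity`, `still_intact`) are hypothetical, and SHA-256 is simply one common choice of checksum.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_fixity(source: Path, destination: Path) -> str:
    """Copy a file and confirm the copy matches the original.

    Hypothetical helper: returns the checksum so it can be recorded
    and compared against later checks.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also tries to keep timestamps
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch after copying {source}")
    return after


def still_intact(path: Path, recorded_checksum: str) -> bool:
    """Re-check a stored file against the checksum recorded at transfer."""
    return sha256_of(path) == recorded_checksum
```

Running `still_intact` on a schedule is one simple way to keep an eye on the data after the hard drive is no longer fresh, and the whole thing stays deliberately small, in keeping with the warning about "digital creep."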
In general, what rises to the top regarding news of born digital are crises that result in data breaches, huge technical failures, unreadable media, forensics tools that do every possible thing to a bitstream that one can currently think of, and on, and on, and on. What isn’t generally discussed are smart archivists making plans to accomplish the goals their archives have established for proper access and preservation of their digital holdings. These archivists do not let the idea of technology or tragedy get in their way. They realize the skills they need to deal with this technology are truly basic, since so many other smart people who develop software and hardware have made it easy for them. They realize, too, that they already know how to accomplish the majority of this work by using the skills they have honed with traditional collections. Their organizational and planning skills, along with some updated vocabulary and either a write-blocker or a write-blocking script for their USB ports, are the firm foundation for a solid born digital program.