By: Jason Evans Groth
One of the most significant benefits of working in the digital domain is the power to search quickly and accurately. Open a physical copy of To Kill a Mockingbird and then open a digital copy on a machine with a search function. Now, imagine how long it would take to count the number of times the word “Scout” appears in the text using your physical copy, and compare that to pressing ctrl-F or cmd-F, typing “Scout” in the search box, and watching the search function return the results. Even if a total is not displayed, pressing return and counting the matches would take hours less than going page by page, marking up your book, and counting by hand. This is not a value judgment regarding physical versus digital, but a point of fact – quantitative and focused research can be done significantly faster in the digital domain. Now, imagine applying that power to research in an archive, searching for “rhino” across, say, the Mitchell Bush Papers and immediately retrieving accurate and usable results. In addition to saving an enormous amount of time for the researcher who may already be in the reading room, remote users could analyze results before ever setting foot in the library, and would have a better idea of exactly what to look for when it came time for the meat of their work.
NCSU Libraries’ born digital strategic initiative was established in 2013 to attempt to make this promise a reality. At this point, a year and a half after starting the initiative in earnest, we feel confident that our exploration of tools and our ideas about arranging and describing materials will lead us, sooner rather than later, to making digital collections as easy to use as the opening paragraph of this post imagines. But as we step to the brink of making millions of files easy to find and potentially as easy to access, the specific challenges of an ambitious born digital program really come to light. One of those challenges is making those files easily and widely discoverable.
Murray Downs, Burton Beers, Jim Rasor, and Jimmy Williams review photographs in the NCSU University Archives.
With the advent of inexpensive digital storage has come an explosion of stored (and often unmanaged) data. An 80 GB hard drive used for testing born digital workflows in the SCRC – which held only 20 GB of actual data – contains 176,000 files. Internal hard drives in new computers are often 250 GB or more, and 5 TB external hard drives cost less than $150. When the inevitable happens and we receive a hard drive with millions of files, it will be impossible for us to examine each file individually. As reported in our “Let the Bits Describe Themselves” post, we use automated tools to generate data that our own tool concept, “Archivision,” can read and then display to the interested party as a virtual file explorer in a web browser. What we are providing is context. The actual workflow will look something like this: we process the disk or disk drive in question; we run tools on it to create a preservation package (an “image” of the drive), which goes to storage; at the same time, we create the files that Archivision can read; and we notify the systems already in place that we are doing this, so that the material becomes discoverable immediately. Thus, in the case of the Mitchell Bush Papers referenced above, as soon as we have safely made an accurate copy of the data, our proposed workflow will take over and automatically make the existence of those files visible to researchers by adding an easy-to-follow link directly to our finding aid.
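To make the idea of letting a drive describe itself a little more concrete, here is a minimal sketch in Python of how a file listing might be generated from a mounted copy of a disk. It is an illustration only, not the tooling we actually use and not Archivision's real input format: the function names (`build_manifest`, `summarize`, `manifest_to_json`) and the record fields (`path`, `size_bytes`, `modified`) are assumptions made up for this example.

```python
import json
import os
from datetime import datetime, timezone


def build_manifest(root):
    """Walk `root` and return one record per file.

    The record layout is a hypothetical schema chosen for this
    sketch, not the format any real tool expects.
    """
    records = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            info = os.lstat(full_path)  # do not follow symbolic links
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            records.append({
                # Path relative to the top of the disk, with "/" separators
                "path": os.path.relpath(full_path, root).replace(os.sep, "/"),
                "size_bytes": info.st_size,
                "modified": modified.isoformat(),
            })
    return records


def summarize(records):
    """Return the file count and total size for a list of records."""
    return {
        "file_count": len(records),
        "total_bytes": sum(r["size_bytes"] for r in records),
    }


def manifest_to_json(records):
    """Serialize the summary and the records as one JSON document."""
    return json.dumps({"summary": summarize(records), "files": records}, indent=2)
```

Calling `manifest_to_json(build_manifest("/path/to/mounted/copy"))` (the path is a placeholder) would produce a single JSON document that a browser-based file explorer could, in principle, read and display. Working from a read-only copy rather than the original media is assumed here, in keeping with the preservation step described above.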
An ad for Macintosh Computers in the NCSU Technician, Vol. 71 No. 41, December 4, 1989
The goal of an archive is to make as much of its material discoverable and usable as possible, while maintaining its status as a trusted repository for its donors and managing the materials responsibly for the long term. The digital domain, in one respect, brings this goal closer to reality through the affordances granted by technology. When the material comes in already in digital form, we have a better chance of making that material discoverable even more quickly.
Boiled down even further, the goal of the archive is to provide access. Knowing that these digital files exist and actually being able to use them for research are two different things. But we believe that using a tool like Archivision to increase the visibility of digital holdings is the first step, and we have plans – referenced explicitly in our “Access and Born Digital Collections” post – to allow researchers to use an in-house laptop filled with indexed versions of our responsibly stored disk images, so they can put themselves into the shoes of the person or institution who previously used that content. Unlike many collections in the physical realm, born digital materials give us the opportunity to experience objects the way the donor left them (exactly, in some cases). And, unlike with physical collections, we can easily make available the list of files in the context of the disk as they came in, getting us one step closer to automatically, and as quickly as possible, making needed material discoverable to scholars everywhere.