Sep 06 2016

Adding the Technician to our Web Archives

The Special Collections Research Center at NCSU Libraries has archived and digitized the first 70 years of the print edition of the university’s student newspaper, the Technician. These scans, found on our Rare and Unique Digital Collections, are an important source of information for documenting the history of the university. We are now expanding on this collection by adding the Technician website to our NC State University Websites web archive. We have started harvesting the website’s content daily, attempting to capture and preserve all of the stories that appear in the online version of the student newspaper.

The Technician has recently reduced the number of days a week it produces a printed version. With more content appearing only in the online version, it becomes especially important to archive the website, to ensure that all of this content is preserved in its original and complete online form.

While we only recently added the Technician website to our collection, the Internet Archive crawled and archived site content as early as February 04, 1998. As you can see from the image below, the capture was not complete; all of the images are missing! It does show that the Technician has had an online version dating back to at least 1998, which is impressive considering that was only two years after the New York Times first established their web presence.

Comparing the web archive capture to the scan of the printed version from that date allows us to see what stories appeared in both places. With the limited amount of content available in the online version from 1998 it is pretty easy to see that most of the stories appear in both places. Following each story link from the Internet Archive capture of the website allows you to explore each of the stories that were captured.

For comparison, here is the archived version of the Technician website from August 29th of this year. As you can see from the screenshot below, now that we can direct the crawl and preserve the website ourselves we have a much more complete capture. This allows us to see every article and image, as it appeared on the site on that day, to more accurately record the history of NC State as it appears in our student media.

Mar 21 2016

Web Archiving Update

It has been nearly five months since we started our web archiving initiative at the Special Collections Research Center, and that time has gone by very quickly. We are happy to announce that we have made great progress in developing the collection and have already reached some important milestones. Beginning in January we were able to identify, test, and actively begin monthly crawls of 25 major university websites that form the base of our primary web archive collection, the NC State University Websites Collection. These websites represent both administrative units and individual colleges within the university. The initial 25 sites represent only a small portion of what we hope this collection will one day contain. The collection will grow to include additional administrative and departmental sites over time.

We were able to work with the Internet Archive to merge all crawls that they had completed on 15 main university websites (prior to our partnership with them) into our collection. This has allowed us to add web content going back nearly two decades. While the earlier crawls completed by the Internet Archive are not as complete as those we are managing ourselves, they still provide a wealth of information on how the websites at NC State University have changed over time. For example, without merging this content the main NCSU website would have had only 3 captures in our collection; after merging the old captures, it has grown to contain over 3,400 captures.

The NC State University Web Archive Collection now contains over 3,400 captures of the website.

A screenshot showing captures of the website going back 16 years.

In the next few months we will continue to focus on adding additional sites to our NC State University Websites collection, and are starting work on identifying websites to add to our other collections. We will also be launching a Web Archiving website that will serve as a centralized hub to provide access to our collections. This website will contain a full list of all of our web archive collections as well as general information about the project. It will provide a convenient place for people interested in the collections to learn more about their contents.

May 04 2015

Born Digital Buddies: Looking Outside of the Library for Answers

Since we began our born digital strategic initiative at NCSU Libraries we have been confronted with everything from puzzlement about the project (why would anyone want anything on a floppy disk?) to fetishism (if it’s on a floppy disk it HAS to be worth looking at!) but, mostly, with “I haven’t used a disk like that in xx years,” which implies “how could anyone even do anything with that disk?” When personal computing became affordable in the 1980s, a multitude of differently sized storage formats were available. Floppy disks of 5.25″ and especially 3.5″ (drives for the latter were still common in computers until the early 2000s) were particularly ubiquitous. One could buy them at local computer stores or KMart. They were not only a handy portable data format but they were, for many, the only way to store data on a personal computer until hard drives became standard.

Unfortunately for the academy, libraries, and other repositories of knowledge, the demands of research and the responsibilities of keeping technology up to date have done to disk drives and other storage media that contain the work of our past what thousands of consumers did with their turntables at the dawn of the digital music era – the machines have been surplussed, donated, or simply left to rot, while the needs of current production are met. The disks themselves are put in a box on a shelf in an office, and the idea that they were once our only means of storage becomes a faint memory. IT departments, focusing on the demands of their clients, move on to what’s new, and what was in the past becomes unsupported. No more 3.5″ floppy drives are in computer labs. If you see a 5.25″ disk drive in the hallway of a library you might assume it’s being donated to a museum.

Ad from the Technician, Freshman Orientation Special Summer 1989

But outside of the halls of the academy is a flourishing trade of people who never let those particular bits die or who actively want them to be seen again. The Software Preservation Society, for example, is responsible for the Kryoflux, a common and robust tool that allows modern computers to control older disk drives and capture the information as a disk image that can then more easily be read in a modern environment. According to their website, the group “dedicates itself to the preservation of software for the future, namely classic games.” The Kryoflux is well-known in digital forensics, despite apparently having been developed to play old Amiga games. Its ability to read low-level data helps in deciphering even the most difficult disks. Other devices, like Device Side Data’s FC5025, were created for the same reason. One of the earliest announcements on their website, from January 27th, 2007, says “attendees to this electronics and ham radio swap meet were invited to bring disks and have image copies made.”

The device the NCSU Special Collections Research Center has employed for its processing of born digital items that come as 5.25″ and 3.5″ disks is called the Supercard Pro. Like the Kryoflux, the Supercard Pro was designed by a video game enthusiast – Jim Drew – to move the bits from his Commodore systems into the future. Contrary to the assumption that this kind of technology is lost and gone forever, the Supercard Pro and Jim Drew show those working on born digital programs that there are alternatives to online retailers and typical university vendors. [Update: We were using the HxC Floppy Emulator to transcode the SCP file to a raw disk image. However, in our experience, it did not have adequate support for Apple-formatted disks. The HxC converter software forum administrator says "The HxC software already supports Apple DOS sector exporting & importing. However the library DPLL parameters must be tweaked to analyze the Apple stream correctly." The Libraries is attempting to provide him with the appropriate streams to support this development. -Brian Dietz, 02-17-2016]

Dorothy Waugh of Emory University’s Manuscript, Archives, and Rare Book Library (MARBL) has spoken of reaching out to the retrocomputing community in Atlanta for answers to her questions about legacy equipment. This model, in concert with the knowledge that these devices and this expertise are out there, should be a ray of hope that, so long as we’re paying attention, the work can be done to find sustainable and efficient methods to deal with what many consider to be forgotten technology. While it may be more difficult to use Craigslist or eBay as a vendor in a university environment, a quick scan of both will net a multitude of hits for equipment – and potentially even human expertise behind the email addresses of sellers – that can bring this so-called dead material back to life. So the question is, then, how do we start tapping into these non-traditional marketplaces for the equipment we need?

Like the recent resurgence of vinyl records (which, contrary to popular belief, never stopped being created even when compact discs took over the market), legacy storage formats and devices have never truly left the market, either. Manufacturers still produce inexpensive 3.5″ USB floppy drives (which aren’t perfect), but given the massive number of drives and other pieces of computer equipment that were used heavily for thirty years, it’s not difficult to find better versions of what you need; it may just mean looking in alternative places. Floppy disk drives are not rare; they are just not on a Best Buy shelf anymore. As evidenced by the gamers who have propagated the use of legacy drives for the betterment of digital forensics, there are plenty of people who want that equipment to tap into the data of years past. We aren’t going to revert to floppy drives as a practical storage solution any time soon, but knowing there are ubiquitous ways to take this legacy data – all 1s and 0s, just like today’s data – and bring it forward into a hard drive environment means that sustainable born digital programs, with some practice, persistence, and a lot of flexibility, can be attained.

Mar 31 2015

Born Digital Doubt: Don’t Let the Bits Get You Down

As the NCSU Libraries Born Digital Strategic Initiative has grown over the last year and a half, we have been fortunate to interact with many talented librarians and archivists who are also building programs at their own institutions. While conferences like SAA 2014 in Washington, D.C. and, more recently, NEA/MARAC 2015 in Boston, have provided a context for us to share our work in person with others, we have also made the effort to reach out to individuals both in the Triangle Research Library Network and, more widely, through email and phone calls to those whose projects and work have surfaced beyond their respective institutions. It is safe to say that all of these interactions have, at some point or another, approached the topic that is on the minds of all of us working to make born digital collections discoverable, accessible, and responsibly preserved: “Am I doing this right?”

We have decided on (at least) two answers at NCSU Libraries. The first is “If you’re doing anything then, yes, you are doing it right.” And the second is the all-powerful “it depends,” which is quickly followed by “but if you are doing anything then, yes, you are doing it right.” Of course, “right” is a loaded word. As discussed in a previous post, flexibility is an important consideration when building a born digital program, since so many things can change in the processing of different digital objects. For NCSU “right” means the following: We established our core requirements for general processing based on our needs for access, which we mapped out before we knew how we could process anything, and we built in enough flexibility to the workflow that, when changes (inevitably) rear their expected heads, we have room inside of our workflow to accommodate them.

NC State Students in the 1980s, potentially creating data that we need to store and make accessible now.

Why all of this doubt, though? Archivists are already well-equipped to handle the daunting task of establishing physical order, appropriate room conditions, and an organizational system to provide the fulfillment of the promise to keep things safe and, hopefully, accessible for as long as possible (forever, for lack of a better word). What makes digital so different? It could be that digital computing devices and data, now almost one hundred years old, are still relatively young in the context of archives. It could be that we have faced challenges with storage and retrieval of digital objects in other professional domains, and we know the challenges associated with digital preservation and with maintenance of disks in general. It could be that we are a humble profession, and despite being information experts – largely through computing interfaces – we have decided that we are not “techies” and may not be able to approach this challenge properly. It could be that we’re afraid of the speed with which digital assets can be shared, which is far different than our traditional patron-in-the-reading-room model. It’s possible that all of these things contribute to the doubt, but, just like there is not one single tool that will solve any institution’s born digital challenges, none of these are the only reason we’re doubtful about born digital. These concerns do feed one of the most prevalent problems, which is the penchant we have for worrying about worst-case and, often, edge-case scenarios when it comes to digital collections.

There is no such thing as a perfect born digital curation and preservation program, and setting out to eliminate all problems, especially those we hear about in worst-case or edge-case scenarios, is a losing game. The majority of these cases likely do not now – and never will – apply to our institutions. For example, we currently have no workflow in the SCRC to handle 8″ floppy disks or data cassettes, but we know that other institutions do. Rather than worry we’re not doing born digital right because our program can’t account for these legacy data carriers, we have, instead, surveyed our collections and found very few of these items. We have decided that other formats, which we have the capability to process, are higher priority. But rather than give up hope, we have built in some flexibility to discuss these formats in the future should we get to a point where they are in demand. In other words, we have devoted our resources to media we know we’ll see more of, while constantly scanning for solutions to cases that are decidedly more on the edge. This decision is practical and also empowering because it has set us in motion to focus on what we know we can do well rather than worrying about what we can’t do at all. But it also leaves room for us to consult our colleagues who either have these capabilities or have experience with appropriate vendors and make informed choices when and if the time comes to take care of that data.

A photo of now obsolete media from the NCSU student newspaper, The Technician, November 2, 1983

We also know that we will face lots of data that can’t properly be processed by applications we have at our hands right now. While we’re not placing bets about robust virtualization environments being available to us anytime soon, we can’t let this keep us from at least migrating the data from legacy media that we can handle to monitored hard disk environments that afford preservation practices. In other words, freeing the currently unreadable data from their media jails gives us a chance to see it later; not doing anything guarantees that our chances will grow slimmer by the month to ever even approach it again. On top of all of this, since most digital curation and preservation programs like ours are so young, we can’t decisively say what it is our patrons or researchers actually want, so keeping it all in a responsible way and paying attention to patron activity will help us keep our program one that works for patrons rather than one that works for the ideas we have about them.

This may sound a little reductive, but the essential component of a born digital program is the safe transfer of data from one place to another that does not harm the data and that allows us to monitor it safely for the duration while providing access to it, too. Sound familiar? It’s just like what we do with papers, books, and physical objects in our archives, with one key difference: It can happen very quickly and can be both deceptively simple and complicated. That is, a hard drive will fail, so just because it feels easy to see and access data right away when the hard drive is fresh, it doesn’t mean we can take our eyes off of it and assume it, like a book stored properly, will last several lifetimes. And, on the other hand, a lot of people create a lot of complication around the basic component of born digital, sometimes just because they can. Making sure that what we do with the data when we free it from its original carrier and add it to our repository matches the goals of access and care we have established from the beginning keeps us from experiencing “digital creep” (making something simple in the digital world very complicated because of the affordance of tools at our disposal) and helps us to move our processing forward so we actually can get to our backlog and keep up with what’s coming in.
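The "safe transfer" described above is often checked with checksums: hash the file before and after the copy and confirm the two values match. The following is a minimal Python sketch of that idea, not a description of the SCRC's actual tooling; the function names and the example paths in the comments are hypothetical.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_with_fixity(source, destination):
    """Copy a file, then confirm the copy is bit-for-bit identical.

    Returns the checksum so it can be recorded and re-checked later.
    """
    source, destination = Path(source), Path(destination)
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(source, destination)  # copy2 also keeps timestamps where it can
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Checksum mismatch copying {source} to {destination}")
    return after


# Hypothetical usage; these paths are placeholders, not real locations:
# checksum = transfer_with_fixity("floppy_001.img", "storage/floppy_001.img")
```

Recomputing the checksum on a schedule and comparing it with the recorded value is one simple way to "monitor" stored data over time.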

In general, what rises to the top regarding news of born digital are crises that result in data breaches, huge technical failures, unreadable media, forensics tools that do every possible thing to a bitstream that one can currently think of, and on, and on, and on. What isn’t generally discussed are smart archivists making plans to accomplish the goals their archives have established for proper access and preservation of their digital holdings. These archivists do not let the idea of technology or tragedy get in their way. They realize the skills they need to deal with this technology are truly basic, since so many other smart people who develop software and hardware have made it easy for them. They realize, too, that they already know how to accomplish the majority of this work by using the skills they have honed with traditional collections. Their organizational and planning skills, along with some updated vocabulary and either a write-blocker or a write-blocking script for their USB ports, are the firm foundation for a solid born digital program.

Feb 09 2015

Discovering Born Digital Collections

One of the most significant benefits of working in the digital domain is the power to search quickly and accurately. Open a physical copy of To Kill a Mockingbird and then open a digital copy on a machine with a search engine. Now, imagine how long it would take to count the number of times the word Scout appears in the text using your physical copy, and compare that to a quick ctrl- or cmd-f, typing the word “Scout” in the search box, and watching the search engine parse the results. Even if a number is not presented, pressing return and counting would take several hours less than going page by page and marking up your book, counting by hand. This is not a value judgment regarding physical versus digital, but a point of fact – quantitative and focused research can be done significantly faster in the digital domain. Now, imagine applying that power to research in an archive, searching for “rhino” across, say, the Mitchell Bush Papers and immediately retrieving accurate and usable results. In addition to saving an enormous amount of time for the researcher who may already be in the reading room, remote users could analyze results before ever setting foot in the library, and would have a better idea of exactly what to look for when it came time for the meat of their work.
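To make the counting example concrete, here is a small Python sketch that tallies whole-word matches in a piece of text. It is an illustration only: the function name and the sample sentence are made up, and a real search across a collection would more likely query an index than scan the text each time.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


# Hypothetical sample text, not a quotation from the novel.
sample = "Scout ran home. Jem called after Scout, but the scouts had gone."
print(count_word(sample, "Scout"))  # prints 2 ("scouts" is not a whole-word match)
```

Matching on word boundaries is a deliberate choice here; a plain substring search would also count "scouts".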

NCSU Libraries’ born digital strategic initiative was established in 2013 to attempt to make this promise a reality. At this point, a year and a half after starting the initiative in earnest, we feel confident that our exploration of tools and our ideas about arranging and describing materials will lead us, sooner rather than later, to making digital collections as easy to use as the opening paragraph of this post dreams. But as we step to the brink of making literally millions of files easy to find and potentially as easy to access, the specific challenges of an ambitious born digital program really come to light. One of those challenges is making those files easily and widely discoverable.

Murray Downs, Burton Beers, Jim Rasor, and Jimmy Williams review photographs in the NCSU University Archives.

With the advent of inexpensive digital storage has come an explosion of stored (and often unmanaged) data. An 80 GB hard drive used for testing born digital workflows in the SCRC – which only had 20 GB of actual information on it – contains 176,000 files. Internal hard drives in new computers are often 250 GB or more, and 5 TB external hard drives cost less than $150. When the inevitable happens and we receive a hard drive with millions of files, it will be impossible for us to examine each file individually. As reported in our “Let the Bits Describe Themselves” post, we use automated tools to generate data that our own tool idea, “Archivision,” can read and then display easily to the interested party as a virtual file explorer in a web browser. What we are providing is context, as the actual workflow will look something like this: We process the disk or disk drive in question; we run tools on the drive to create a preservation package (an “image” of the drive), which goes to storage; at the same time, we create the files that can be read by Archivision; and we tell many already-in-place systems that we’re doing this so we can immediately make these things discoverable. Thus, in the case of the Mitchell Bush Papers referenced above, as soon as we have gone through the process of safely making an accurate copy of the data, our proposed workflow will take over and automatically make the existence of those files viewable by researchers by adding an easy-to-follow link directly to our finding aid.
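Figures like "176,000 files on 20 GB" come from automated inventory rather than manual inspection. The sketch below shows one way such a summary could be produced in Python for a mounted, read-only copy of a drive. It is an assumption-laden illustration, not the tooling named in this post; the function name and the "(none)" label for files without an extension are invented for the example.

```python
import os
from collections import Counter


def summarize_tree(root):
    """Walk `root` and return (file_count, total_bytes, counts_by_extension)."""
    file_count = 0
    total_bytes = 0
    by_extension = Counter()
    for folder, _subfolders, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(folder, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # unreadable file or broken link: leave it out of the totals
            file_count += 1
            total_bytes += size
            # Files with no extension are grouped under a placeholder label.
            extension = os.path.splitext(name)[1].lower() or "(none)"
            by_extension[extension] += 1
    return file_count, total_bytes, by_extension


# Hypothetical usage; the mount point is a placeholder:
# count, size, kinds = summarize_tree("/mnt/donor_drive")
# print(count, size, kinds.most_common(5))
```

A summary like this gives a quick sense of scale and content types before any decision is made about how to describe the material.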

An ad for Macintosh Computers in the NCSU Technician, Vol. 71 No. 41, December 4, 1989

The goal of an archive is to make as much of its material discoverable and usable as possible, while maintaining status as a trusted repository for its donors and managing the materials responsibly for the long-term. The digital domain, in one respect, brings this goal closer to reality through the affordances granted by technology. When the material comes in already digitized we have a better chance to make that material discoverable even more quickly.

To boil down the goal of the archive even further is to say that we are here to provide access. Knowing that these digital files exist and actually being able to use them for research are two different things. But we believe that using a tool like Archivision to increase visibility of digital holdings is the first step, and we have plans – referenced explicitly in our “Access and Born Digital Collections” post – to allow researchers to use an in-house laptop filled with indexed versions of our responsibly stored disk images, so they can put themselves into the shoes of the person or institution who previously used that content. Unlike many collections in the physical realm, we are given the opportunity, through born digital, to experience objects the way the donor left them (exactly, in some cases). And unlike physical collections we can easily make available the list of files in the context of the disk as they came in, getting us one step closer to automatically, and as quickly as possible, making needed material discoverable to scholars everywhere.

Nov 17 2014

Two (Disk Reading) Heads are Better Than One: Sharing Born Digital Resources

There are numerous obstacles to overcome when instituting a born digital process: gathering equipment, establishing basic institutional requirements for how processing is done, and deciding on tools are just some of the steps that need to be completed before a workflow is put into place. Thankfully, as the field grows, so does the number of resources available to those just starting out. The Demystifying Born Digital Reports from OCLC, the Digital POWRR Tool Grid, and the Digital Curation Google Group are just three helpful, and ever-growing, examples of this. But other resources are always out there, and sometimes they don’t need a URL – they may be your neighbors.

NC State University Libraries is part of the Triangle Research Library Network (TRLN). TRLN is a collaborative organization comprised of NCSU and other Triangle academic libraries – Duke University, the University of North Carolina at Chapel Hill, and North Carolina Central University – the “purpose of which is to marshal the financial, human, and information resources of their research libraries through cooperative efforts in order to create a rich and unparalleled knowledge environment that furthers the universities’ teaching, research, and service missions.” In other words, we have agreed to share our stuff and staff with one another.

Just a small sample of some of the legacy media in the NCSU SCRC born digital collection.

In early 2013, the TRLN Born Digital Task Group was formed. Archivists from Duke, UNC, and NCSU worked together on a report to explore the state of born digital programs at each institution. As expected, we all discovered that we shared similar questions about requirements, hardware, and software. Because our three institutions have different identified requirements for born digital materials, not all of our answers will be the same, but the opportunity to share experiences regarding hardware and software is an immediate benefit of working together. Some of the many action items identified by the report included sharing transparent documentation about our processes as well as sharing equipment when needed.

Having such cooperative neighbors has already paid off. Recently, Duke and NCSU Libraries got together to explore various floppy disk controllers, and to compare notes about how to evaluate hardware problems versus disk obsolescence issues. This kind of sharing brought our report to life – real outcomes, both in the form of digital files and a new understanding of tools, were achieved by sharing knowledge and tools.

In addition to sharing, the report focused on several other areas of collaboration that were of interest to the three schools, including:

  • More collaboration with the BitCurator team, some of whom are on the UNC-Chapel Hill campus, and who are an amazing resource for all three institutions since all three of us use or are planning to use BitCurator for at least some of our processing workflow.
  • Working on enhanced communications strategies with IT administrators. Born digital is not just an IT problem – it is a universal problem with IT solutions. Maintaining strong relationships and transparency with those who provide us IT solutions is of the utmost concern when looking toward a long-term solution to born digital.
  • Creating a stronger relationship with the robust and experienced UNC Digital Forensics Lab, to better understand the tools of our trade and to have a place to do comparisons (the example of the floppy controllers above is a version of the kinds of tool comparisons we expect to see more of).
  • More discussion of potential emulation environments.
  • An exploration of BitTorrent as a potential avenue for born digital file transfer within and between institutions.

The TRLN report has led to an extension of the Born Digital Task Group, and judging by the results of our first equipment share, our perceived shared needs, and the ease with which we have already worked together, it’s bound to create a template for other neighborly schools to follow.

Oct 06 2014

Let the Bits Describe Themselves: Arrangement and Description of Born Digital Objects

Throughout our born digital strategic initiative here at NC State Libraries we have debated over the last year just how we will make digital items discoverable to our patrons. Archival discovery begins with the finding aid or collection guide. These guides provide the context of the collection for researchers, and also present the description of the content of the collection. So how does one represent, say, a 16 GB USB flash drive as a usable list or, even more challenging, a 2 TB external drive, inside one of those lists? And, inside that list, how do you arrange those files/folders/hidden files/trashed files/all of the other stuff that each of us manages on our own digital landscape, which we handle differently from the way we manage our physical landscapes (real desktops, book shelves, and on and on)? The thing is, digital objects are, in some way, already arranged when they are donated, and they are arranged in a way that made sense to the person who donated them.

It is no secret that researchers are interested in the process that goes into creating the subject of their research, so the arrangement of files on a laptop, for example, also gives clues as to what the person who arranged them might have been thinking. We decided, therefore, to give our patrons the chance to experience the arrangement of files and folders in the way they were given to us. In other words, we would not rearrange them in any way since it is assumed that the person who did the arrangement had a reason and that this kind of archival practice – digital, that is – gives us the chance to actually retain original order. But, again, how do we show this to a patron? If it’s just a list then there is no context for the files, outside of knowing who created them or what collection they come from.

An NCSU reference librarian circa 1985 possibly demonstrating an older style of arrangement and description of digital objects.

Working with our Digital Library Initiatives department (DLI) we have developed a plan to not only give patrons easy access to this list, but to also allow them to ascertain context and description easily without us spending hours at the item level trying to decipher a donor’s file scheme. We call this idea “Archivision,” and it is really just a way to allow the bits of a digital object to describe themselves by generating a visual browsing environment of the object.

But how? Well, in the course of our workflow we run several tools over digital objects. These tools extract metadata, and included in this metadata are paths to where the files exist inside the disk structure, as well as metadata about these files that tells us what they are, what they contain (at least technically), when they were created, by whom, and so on. If they happen to be text files or contain text we can run tools that tell us what words are in those files. If they are media files we can decipher video codecs, sample rates, and more. By drawing information from these reports we can create a virtual disk browser that looks similar to a Mac Finder window or a Windows Explorer window, and by simply providing a link to this virtual disk browser inside of our finding aid (next to the description of the object itself, for example – like “USB Flash Drive”) the researcher can move through the digital object as they would if they had it loaded on their own computer. An even simpler addition – a sortable spreadsheet that contains all of the file information from the disk – will be provided as a download, too, so the researcher will not have to rely on an internet connection to look through the digital objects we have in our collection. In this way the researcher is not relying on a description that we force upon them that may not lead them to what they need for their work, but rather can contextualize the information in the way that best suits their needs.
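The two outputs described above, a browsable folder view and a sortable spreadsheet, can both be derived from the same list of file paths and metadata. The Python sketch below illustrates the idea under stated assumptions: it is not the Archivision implementation, the record fields (path, size, modified) are a simplification of what metadata tools report, and the sample records are invented.

```python
import csv
import io


def new_folder():
    """An empty folder node: named subfolders plus the files directly inside."""
    return {"folders": {}, "files": []}


def build_tree(records):
    """Nest file records into folders according to each record's path."""
    root = new_folder()
    for record in records:
        *folders, _filename = record["path"].strip("/").split("/")
        node = root
        for name in folders:
            node = node["folders"].setdefault(name, new_folder())
        node["files"].append(record)
    return root


def to_csv(records, sort_key="path"):
    """Return the records as CSV text, sorted by the chosen column."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["path", "size", "modified"], lineterminator="\n"
    )
    writer.writeheader()
    for record in sorted(records, key=lambda r: r[sort_key]):
        writer.writerow(record)
    return buffer.getvalue()


# Invented sample records standing in for a metadata tool's report.
records = [
    {"path": "Documents/notes/rhino.txt", "size": 2048, "modified": "1998-02-04"},
    {"path": "Documents/budget.xls", "size": 10240, "modified": "1997-11-20"},
    {"path": "README", "size": 512, "modified": "1996-05-01"},
]

tree = build_tree(records)
print(sorted(tree["folders"]))        # top-level folders
print(to_csv(records, sort_key="size"))
```

Because the tree is built only from recorded paths, the original order of the disk is preserved in the view without anyone rearranging the files themselves.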

This saves time for us and for the researcher, and is an affordance that is specific to digital information. We could not allow a box of papers to “describe itself,” but by using the archival practice of original order, we can leave the disks the way we find them and, rather than looking over each file at the item level, use tools that allow the bits to tell their own story. In this way we hope to increase the amount of digital information we can get to our patrons, make it easier for them to sort through, and save time on our end by taking advantage of the benefits of digital environments while retaining original order and getting closer to a more genuine representation of archival objects.

Aug 25 2014

Getting Things Done with Born Digital at SAA 2014

The Society of American Archivists’ 2014 Annual Meeting just wrapped up in Washington, DC, and the NCSU Libraries Born Digital Strategic Initiative was represented through a panel, proposed by NCSU’s born digital team Brian Dietz and Jason Evans Groth, called “Getting Things Done with Born-Digital.” Brian and Jason were joined by colleagues Gloria Gonzalez (Digital Archivist, UCLA Special Collections), Ashley Howdeshell (Associate Archivist, University Archives and Special Collections, Loyola University, Chicago), Daniel Noonan (e-Records/Digital Resources Archivist, University Archives, The Ohio State University), and Lauren Sorensen (Digital Conversion Specialist, American Archive of Public Broadcasting, Library of Congress). Despite the wide diversity of institutions and background of the six participants, one thing was clear from each of their presentations: Now is the time to begin a comprehensive digital archives program that works in the context of one’s institution, and it can be done using widely available tools and an even more valuable asset – other librarians and archivists who have, themselves, started programs, encountered and overcome obstacles, and are ready to share their knowledge and experience with everyone else.

The premise of the panel, overall, was that reports like the OCLC’s Demystifying Born Digital and others are excellent foundations on which to begin a born digital program. The problem, however, is that every institution is, by nature, unique, with its own unique context and needs. The panel explored the details and case studies of the various institutions, hoping to connect more easily through these contextual clues rather than making a big problem seem bigger by speaking vaguely about tools and equipment that already pose barriers – both in terms of vocabulary and perceived difficulty – to those who are in the beginning stages of planning a born digital program.

Prior to the session, the online scheduling tool for SAA 2014 said that over 360 people would attend. While all of the panelists understand that this is important work, the number was still a surprise. At 9:59am, a minute before the session began, the panelists were told to ignore the sounds of the hotel facilities staff opening the airwall at the back of the room – it was Standing Room Only, and, at the session’s peak, an estimated 500 attendees listened to six very different practitioners discuss their successes, failures, and excitement regarding digital archives. The session itself generated much in-person discussion as well as hundreds of tweets.

The panelists touched on such topics as utilizing a committee that includes stakeholders and IT to maintain transparency with others in one’s institution while such a program is getting put into place; being unafraid to tackle technical needs by relying on the transparency of others and on one’s own ability to search for help with processes that librarians and archivists already know about but may never have used themselves (like the command line); accepting that flexibility in both tools and workflow is not only OK but also desirable, since there is no single “silver bullet” tool or service that can answer all of your questions or needs; recognizing that problems and challenges, which will arise without a doubt, are actually quite educational and necessary; and even the “Top 10 Things I Don’t Let Stop Me From Getting Things Done (With Digital Archives),” which included lack of practical experience and assuming equipment is, by nature, inadequate, in addition to the Litany Against Fear from Dune.

The audience asked questions like “what can we not do in order to process digital objects more quickly,” “how do we establish good relationships with IT,” and “what about metadata.” In all cases, the panelists assured them that these answers existed – perhaps not in one, single location, and definitely in the minds of those who had moved through them already – and could be discovered by understanding both the context of the institution and the real, required needs established by the institution. In other words, the answers amount to careful planning for the future based on the understanding of an institution’s priorities and requirements for both collecting and access. Librarians and archivists are familiar with such planning already: Collection policies, donor agreements, and gathering data to predict access usage are things we are taught from the beginning of our careers, and they are exactly the kinds of skills needed to figure out requirements for born digital collections. What do we collect? What can we make accessible? How will this be used? A call for shared documentation and more open questions and answers was made, and the audience was reminded that the National Digital Stewardship Alliance (NDSA) has recently implemented Digital Preservation Q&A, a site that allows members of the digital preservation community to share their challenges and successes in order to facilitate both progress and community building.

In addition to the incredible attendance at this session, many – if not all – of the other digital-focused sessions were at capacity or very close to it, a heartening sign that professionals are taking this seemingly overwhelming challenge very seriously. SAA 2014 made it clear that those of us who fight the good digital preservation fight are not only not alone but are in very good company.

Jul 28 2014

Bit by Bit: Flexibly and Collaboratively Making Sense of Born Digital Materials

Students and staff in the Department of Computer Science, College of Engineering, in the 1970s, potentially creating media for modern digital archivists to curate.

When the NCSU Libraries’ Born Digital initiative began back in August of 2013, helpful colleagues from institutions seasoned in such work mentioned over and over that, no matter how solidly planned out the workflow for digital collections might be, it is inevitable that an object or group of objects will present themselves as the kinds of roadblocks that keep institutions from starting born digital programs in the first place. These roadblocks come in many forms: disks that are unreadable by local equipment, giant hard drives that take forever to image, file systems that are not understood by the computer, and so on. This is not a surprise – the Demystifying Born Digital Reports, created by OCLC, list multiple tools and pointers for the digital archivist to carefully consider while they are crafting their projects. However, the multitude of ideas presented in these reports may lead the digital archivist to believe that they need to pick one tool or suggestion over another and limit themselves to those decisions, especially since the word “flexible” never appears in the reports. At NCSU Libraries we have discovered that familiarizing ourselves with a range of software tools, documenting their strengths and weaknesses, and creating a flexible workflow that relies on many free tools rather than limiting ourselves to one set and one set only has helped us make sense of how to deal with our born digital materials proactively to get as close as we can to robust access to the materials.

Just down the road from NCSU Libraries, at the University of North Carolina-Chapel Hill, a group of people who believe the same thing are working hard to prepare a suite of tools that answers the needs of the digital archivist. The BitCurator project, “a joint effort led by the School of Information and Library Science at the University of North Carolina, Chapel Hill (SILS) and the Maryland Institute for Technology in the Humanities (MITH) to develop a system for collecting professionals that incorporates the functionality of many digital forensics tools,” recognizes that many of the existing options to begin born digital programs are “not very approachable to library/archives professionals in terms of interface and documentation.” At NCSU Libraries, documentation is imperative for both understandability and repeatability of the born digital curation process. The folks at BitCurator feel the same way, and are striving to provide a suite of tools, packaged easily as a virtual machine or a standalone system (whatever works better for a given institution), that not only comes as a singular piece with multiple tools but also comes with easy to follow documentation.

Recently, the team hosted a “BitCurator clinic” in Chapel Hill, an event which brought together digital archivists from NC State, UNC, and Duke University to explore BitCurator together, to talk amongst ourselves about our challenges with born digital materials, and most importantly, to share how we felt BitCurator was working for us and how it could improve. This kind of collaboration is a necessity to keep tools in scope for librarians and archivists to ensure their proper and effective usage. Flexibility was on everyone’s mind at the clinic, considering that groups brought everything from floppy disks to external hard drives from real collections to work on in front of the developers. And the developers were quick to remind us that BitCurator is built to be flexible, encompassing many disparate tools in both GUI and command line forms (you can read all about it on their wiki). Even with all of this built in flexibility, one may need to dip outside of the BitCurator environment depending on the roadblock they encounter with a particular collection – and that’s OK! Flexibility (which is absolutely necessary and even encouraged if it is all documented and leads to pre-determined requirements) and collaboration (particularly the willingness to ask questions of colleagues and to report problems with tools, for example) are two of the most important tenets of getting a digital curation program off the ground.

May 27 2014

Access and Born Digital Collections

No matter how detailed the setup is for processing born digital collections, no matter what suite of tools one might use, and no matter how much one might discuss with colleagues about the best way to package electronic files to get into local storage, after all is said and done, the purpose of a born digital curation program in an archive is to provide the best possible access to this carefully processed and stored material. Each floppy disk, CD-ROM, flash drive, or external hard drive may present its own unique challenge when it comes to moving the data to a more stable digital environment, but since digital data is all 1s and 0s, packaging it and migrating it can be a lot less complicated than making those files searchable and usable.

When we process born digital collections, we create a series of reports that give us a lot of clues about what might be contained on the object itself. These clues include file names and paths; file types and the applications used to create them; creation, access, and modification dates; file sizes; and also wordlists, personal or private information, phone numbers, and more. While these reports will be kept with the package of information we submit to local storage, they can also be used to help provide context and inform “best guesses” about what these files might mean or contain without an archivist having to look at each one of them individually. By summarizing this data and linking to it from the collection’s finding aid, we give those otherwise unknown and difficult-to-find files a better chance to be used by patrons and researchers.
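As a rough illustration of what such a summary might involve, the Python sketch below condenses a list of file records into a few figures that could sit beside a finding aid entry: how many files there are, how much space they take up, which file extensions appear, and the range of modification dates. The sample rows and field names are invented for the example and do not reflect the output of any specific tool we use.

```python
from collections import Counter
from pathlib import PurePosixPath

# Made-up rows standing in for one disk's file report; the keys are assumptions.
SAMPLE_ROWS = [
    {"path": "a/report.DOC", "size": 1000, "modified": "1998-02-04"},
    {"path": "a/memo.doc", "size": 500, "modified": "1997-11-20"},
    {"path": "b/photo.jpg", "size": 2000, "modified": "1999-06-01"},
    {"path": "README", "size": 10, "modified": "1998-01-01"},
]


def summarize(rows):
    """Condense file-level rows into collection-level figures."""
    # Count files per extension, ignoring case; files without one are grouped.
    extensions = Counter(
        PurePosixPath(row["path"]).suffix.lower() or "(none)" for row in rows
    )
    # ISO-formatted dates (YYYY-MM-DD) sort correctly as plain strings.
    dates = sorted(row["modified"] for row in rows)
    return {
        "file_count": len(rows),
        "total_bytes": sum(row["size"] for row in rows),
        "by_extension": dict(extensions),
        "earliest_modified": dates[0] if dates else None,
        "latest_modified": dates[-1] if dates else None,
    }


if __name__ == "__main__":
    for key, value in summarize(SAMPLE_ROWS).items():
        print(f"{key}: {value}")
```

A summary like this does not replace the full reports; it only gives a researcher enough of an overview to decide which files or disk images are worth requesting.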

The Reading Room at D.H. Hill Library

This data summary will be especially helpful when it comes to our plans to provide a dedicated MacBook specifically for perusing and using our born digital collections in our Special Collections Reading Room at D. H. Hill Library. A patron will be able to look through the summary before requesting the files or disk image they wish to see when they arrive at the Reading Room. The librarian responsible for providing this access will either copy the files to the laptop or mount the disk image directly on it, and then, using Spotlight (built into every Mac), index those files for easier searching. In other words, while we will ask the patron to follow a traditional archival workflow in terms of requesting materials and coming to the Reading Room to view material, we will use the affordances of a digital index to make searching easier and quicker. With mounted disk images, for example, we can even allow the patron to explore and use the files in an exact replica of the hard drive as it was set up by the person whose materials they are researching.

The landscape of access differs from collection to collection. Donor agreements, sensitive information, and file types dictate how easily a born digital collection can be both presented to and used by a researcher. Using the best possible tools during processing puts the archive in a position to offer the best context for accessing these items, which were harder to get to before processing simply because they were stored on disconnected media. Adding them to an environment where data integrity can be verified, and where it is becoming more and more possible to use even the most obsolete of file types, gives the archive a better chance to offer them to patrons and researchers.