Posts tagged: digital forensics

May 04 2015

Born Digital Buddies: Looking Outside of the Library for Answers

Since we began our born digital strategic initiative at NCSU Libraries we have been confronted with reactions ranging from puzzlement about the project (why would anyone want anything on a floppy disk?) to fetishism (if it’s on a floppy disk it HAS to be worth looking at!) but, mostly, “I haven’t used a disk like that in xx years,” which implies “how could anyone even do anything with that disk?” When personal computing became affordable in the 1980s, a multitude of differently sized storage formats were available. Floppy disks of 5.25″, and especially 3.5″ (the drives for which were not uncommon to see in computers until the early 2000s), were ubiquitous. One could buy them at local computer stores or Kmart. They were not only a handy portable data format but also, for many, the only way to store data on a personal computer until hard drives became standard.

Unfortunately for the academy, libraries, and other repositories of knowledge, the demands of research and the responsibilities of keeping technology up to date have done to disk drives and other storage media that contain the work of our past what thousands of consumers did with their turntables at the dawn of the digital music era – the machines have been surplussed, donated, or simply left to rot, while the needs of current production are met. The disks themselves are put in a box on a shelf in an office, and the idea that they were once our only means of storage becomes a faint memory. IT departments, focusing on the demands of their clients, move on to what’s new, and what was in the past becomes unsupported. No more 3.5″ floppy drives are in computer labs. If you see a 5.25″ disk drive in the hallway of a library you might assume it’s being donated to a museum.

Ad from the Technician, Freshman Orientation Special Summer 1989

But outside of the halls of the academy is a flourishing trade of people who never let those particular bits die or who actively want them to be seen again. The Software Preservation Society, for example, is responsible for the Kryoflux, a common and robust tool that allows modern computers to control older disk drives and capture the information as a disk image that can then more easily be read in a modern environment. According to their website, the group “dedicates itself to the preservation of software for the future, namely classic games.” The Kryoflux is well-known in digital forensics, despite apparently having been developed to play old Amiga games. Its ability to read low-level data helps in deciphering even the most difficult disks. Other devices, like Device Side Data’s FC5025, were created for the same reason. One of the earliest announcements on their website, from January 27th, 2007, says “attendees to this electronics and ham radio swap meet were invited to bring disks and have image copies made.”

The device the NCSU Special Collections Research Center has employed for its processing of born digital items that come as 5.25″ and 3.5″ disks is called the Supercard Pro. Like the Kryoflux, the Supercard Pro was designed by a video game enthusiast – Jim Drew – to move the bits from his Commodore systems into the future.  Unlike those who assume this kind of technology is lost and gone forever, the Supercard Pro and Jim Drew are positive examples for those working on born digital programs that there are alternatives to online retailers and typical university vendors. [Update: We were using the HxC Floppy Emulator to transcode the SCP file to a raw disk image. However, in our experience, it did not have adequate support for Apple-formatted disks. The HxC converter software forum administrator says "The HxC software already supports Apple DOS sector exporting & importing. However the library DPLL parameters must be tweaked to analyze the Apple stream correctly." The Libraries is attempting to provide him with the appropriate streams to support this development. -Brian Dietz, 02-17-2016]

Dorothy Waugh of Emory University’s Manuscript, Archives, and Rare Book Library (MARBL) has spoken of reaching out to the retrocomputing community in Atlanta for answers to her questions about legacy equipment. This model, in concert with the knowledge that these devices and this expertise are out there, should be a ray of hope that, so long as we’re paying attention, the work can be done to find sustainable and efficient methods to deal with what many consider to be forgotten technology. While it may be more difficult to use Craigslist or eBay as a vendor in a university environment, a quick scan of both will net a multitude of hits for equipment – and potentially even human expertise behind the email addresses of sellers – that can bring this so-called dead material back to life. So the question is, then, how do we start tapping into these non-traditional marketplaces for the equipment we need?

Like the recent resurgence of vinyl records (which, contrary to popular belief, never stopped being created even when compact discs took over the market), legacy storage formats and devices have never truly left the market, either. Manufacturers still produce inexpensive 3.5″ USB floppy drives (which aren’t perfect), but based on the massive number of drives and other computer equipment used heavily for thirty years, it’s not difficult to find better versions of what you need; it may just mean looking in alternative places. Floppy disk drives are not rare; they are just not on a Best Buy shelf anymore. As evidenced by the gamers that have propagated the use of legacy drives for the betterment of digital forensics computing, there are plenty of people who want that equipment to tap into the data of years past. We aren’t going to, any time soon, revert to floppy drives as a practical storage solution, but knowing there are ubiquitous ways to take this legacy data – all 1s and 0s, just like today’s data – and bring it forward into a hard drive environment means that sustainable born digital programs, with some practice, persistence, and a lot of flexibility, can be attained.

Mar 31 2015

Born Digital Doubt: Don’t Let the Bits Get You Down

As the NCSU Libraries Born Digital Strategic Initiative has grown over the last year and a half, we have been fortunate to interact with many talented librarians and archivists who are also building programs at their own institutions. While conferences like SAA 2014 in Washington, D.C. and, more recently, NEA/MARAC 2015 in Boston, have provided a context for us to share our work in person with others, we have also made the effort to reach out to individuals both in the Triangle Research Library Network and, more widely, through email and phone calls to those whose projects and work have surfaced beyond their respective institutions. It is safe to say that all of these interactions have, at some point or another, approached the topic that is on the minds of all of us working to make born digital collections discoverable, accessible, and responsibly preserved: “Am I doing this right?”

We have decided on (at least) two answers at NCSU Libraries. The first is “If you’re doing anything then, yes, you are doing it right.” And the second is the all-powerful “it depends,” which is quickly followed by “but if you are doing anything then, yes, you are doing it right.” Of course, “right” is a loaded word. As discussed in a previous post, flexibility is an important consideration when building a born digital program, since so many things can change in the processing of different digital objects. For NCSU “right” means the following: We established our core requirements for general processing based on our needs for access, which we mapped out before we knew how we could process anything, and we built in enough flexibility to the workflow that, when changes (inevitably) rear their expected heads, we have room inside of our workflow to accommodate them.

NC State Students in the 1980s, potentially creating data that we need to store and make accessible now.

Why all of this doubt, though? Archivists are already well-equipped to handle the daunting task of establishing physical order, appropriate room conditions, and an organizational system to provide the fulfillment of the promise to keep things safe and, hopefully, accessible for as long as possible (forever, for lack of a better word). What makes digital so different? It could be that digital computing devices and data, now almost one hundred years old, are still relatively young in the context of archives. It could be that we have faced challenges with storage and retrieval of digital objects in other professional domains, and we know the challenges associated with digital preservation and with maintenance of disks in general. It could be that we are a humble profession, and despite being information experts – largely through computing interfaces – we have decided that we are not “techies” and may not be able to approach this challenge properly. It could be that we’re afraid of the speed with which digital assets can be shared, which is far different from our traditional patron-in-the-reading-room model. It’s possible that all of these things contribute to the doubt, but, just as there is not one single tool that will solve any institution’s born digital challenges, none of these is the only reason we’re doubtful about born digital. These concerns do feed one of the most prevalent problems, which is the penchant we have for worrying about worst-case and, often, edge-case scenarios when it comes to digital collections.

There is no such thing as a perfect born digital curation and preservation program, and setting out to eliminate all problems, especially those we hear about in worst-case or edge-case scenarios, is a losing game. The majority of these cases likely do not now – and never will – apply to our institutions. For example, we currently have no workflow in the SCRC to handle 8″ floppy disks or data cassettes, but we know that other institutions do. Rather than worry we’re not doing born digital right because we can’t account for this legacy data carrier absence in our program, we have, instead, surveyed our collections and found very few of these items. We have decided that other formats, for which we have the capability to process, are higher priority. But rather than give up hope, we have built in some flexibility to discuss these formats in the future should we get to a point where they are in demand. In other words, we have devoted our resources to media we know we’ll see more of, while constantly scanning for solutions to cases that are decidedly more on the edge. This decision is practical and also empowering because it has set us in motion to focus on what we know we can do well rather than worrying about what we can’t do at all. But it also leaves room for us to consult our colleagues who either have these capabilities or have experience with appropriate vendors and make informed choices when and if the time comes to take care of that data.

A photo of now obsolete media from the NCSU student newspaper, The Technician, November 2, 1983

We also know that we will face lots of data that can’t properly be processed by applications we have at our hands right now. While we’re not placing bets about robust virtualization environments being available to us anytime soon, we can’t let this keep us from at least migrating the data from legacy media that we can handle to monitored hard disk environments that afford preservation practices. In other words, freeing the currently unreadable data from their media jails gives us a chance to see it later; not doing anything guarantees that our chances will grow slimmer by the month to ever even approach it again. On top of all of this, since most digital curation and preservation programs like ours are so young, we can’t decisively say what it is our patrons or researchers actually want, so keeping it all in a responsible way and paying attention to patron activity will help us keep our program one that works for patrons rather than one that works for the ideas we have about them.

This may sound a little reductive, but the essential component of a born digital program is the safe transfer of data from one place to another that does not harm the data and that allows us to monitor it safely for the duration while providing access to it, too. Sound familiar? It’s just like what we do with papers, books, and physical objects in our archives, with one key difference: It can happen very quickly and can be both deceptively simple and complicated. That is, a hard drive will fail, so just because it feels easy to see and access data right away when the hard drive is fresh, it doesn’t mean we can take our eyes off of it and assume it, like a book stored properly, will last several lifetimes. And, on the other hand, a lot of people create a lot of complication around the basic component of born digital, sometimes just because they can. Making sure that what we do with the data when we free it from its original carrier and add it to our repository matches the goals of access and care we have established from the beginning keeps us from experiencing “digital creep” (making something simple in the digital world very complicated because of the affordance of tools at our disposal) and helps us to move our processing forward so we actually can get to our backlog and keep up with what’s coming in.
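
As a rough illustration of what "safe transfer" and "monitoring" can mean in practice, the sketch below copies a file, confirms that the copy matches the original, and later re-checks the copy against the recorded value. It is a minimal example under stated assumptions, not a description of the SCRC workflow: the post does not name a fixity method, so the use of SHA-256 checksums is an assumption, and the function names and the file name `disk.img` are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> str:
    """Copy a file and confirm the copy is bit-for-bit identical.

    Returns the checksum so it can be recorded for later monitoring.
    """
    before = sha256_of(source)
    shutil.copy2(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"Copy of {source} does not match the original")
    return after


def still_intact(path: Path, recorded_checksum: str) -> bool:
    """Re-check a stored file against the checksum recorded at transfer."""
    return sha256_of(path) == recorded_checksum


# Hypothetical demonstration with a stand-in for a disk image.
with tempfile.TemporaryDirectory() as workdir:
    original = Path(workdir) / "disk.img"
    original.write_bytes(b"example bitstream")
    stored_copy = Path(workdir) / "disk_copy.img"

    recorded = copy_and_verify(original, stored_copy)
    intact_before = still_intact(stored_copy, recorded)

    # Simulate silent damage to the stored copy.
    stored_copy.write_bytes(b"example bitstreaM")
    intact_after = still_intact(stored_copy, recorded)
```

Recording the checksum at the moment of transfer is what makes the later check meaningful: the stored value becomes the baseline the copy is compared against on every subsequent audit.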

In general, what rises to the top regarding news of born digital are crises that result in data breaches, huge technical failures, unreadable media, forensics tools that do every possible thing to a bitstream that one can currently think of, and on, and on, and on. What isn’t generally discussed are smart archivists making plans to accomplish the goals their archives have established for proper access and preservation of their digital holdings. These archivists do not let the idea of technology or tragedy get in their way. They realize the skills they need to deal with this technology are truly basic, since so many other smart people who develop software and hardware have made it easy for them. They realize, too, that they already know how to accomplish the majority of this work by using the skills they have honed with traditional collections. Their organizational and planning skills, along with some updated vocabulary and either a write-blocker or a write-blocking script for their USB ports, are the firm foundation for a solid born digital program.

Feb 09 2015

Discovering Born Digital Collections

One of the most significant benefits of working in the digital domain is the power to search quickly and accurately. Open a physical copy of To Kill a Mockingbird and then open a digital copy on a machine with a search engine. Now, imagine how long it would take to count the number of times the word Scout appears in the text using your physical copy, and compare that to a quick ctrl- or cmd-f, typing the word “Scout” in the search box, and watching the search engine parse the results. Even if a number is not presented, pressing return and counting would take several hours less time than going page by page and marking up your book, counting by hand. This is not a value judgment regarding physical versus digital, but a point of fact – quantitative and focused research can be done significantly faster in the digital domain. Now, imagine applying that power to research in an archive, searching for “rhino” across, say, the Mitchell Bush Papers and immediately retrieving accurate and usable results. In addition to saving an enormous amount of time for the researcher who may already be in the reading room, remote users could analyze results before ever setting foot in the library, and would have a better idea of exactly what to look for when it came time for the meat of their work.
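
To make the comparison concrete, here is a minimal sketch of the kind of count a search box performs. The function name and the sample sentence are made up for illustration (the sample is not a quotation from the novel); the match is case-insensitive and limited to whole words.

```python
import re


def count_word(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return len(pattern.findall(text))


# Invented sample text; "scouts'" is a different word and is not counted.
sample = "Scout ran home. Jem followed Scout, and the scouts' camp was empty."
scout_count = count_word(sample, "Scout")
```

The same function would work unchanged on the full text of a book or on text extracted from a donor's files, which is where the time savings over counting by hand come from.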

NCSU Libraries’ born digital strategic initiative was established in 2013 to attempt to make this promise a reality. At this point, a year and a half after starting the initiative in earnest, we feel confident that our exploration of tools and our ideas about arranging and describing materials will lead us, sooner rather than later, to making digital collections as easy to use as the opening paragraph of this post dreams. But as we step to the brink of making literally millions of files easy to find and potentially as easy to access, the specific challenges of an ambitious born digital program really come to light. One of those challenges is making those files easily and widely discoverable.

Murray Downs, Burton Beers, Jim Rasor, and Jimmy Williams review photographs in the NCSU University Archives.

With the advent of inexpensive digital storage has come an explosion of stored (and often unmanaged) data. An 80 GB hard drive used for testing born digital workflows in the SCRC – which only had 20 GB of actual information on it – contains 176,000 files. Internal hard drives in new computers are often 250 GB or more, and 5 TB external hard drives cost less than $150. When the inevitable happens and we receive a hard drive with millions of files, it will be impossible for us to examine each file individually. As reported in our “Let the Bits Describe Themselves” post, we use automated tools to generate data that our own tool idea, “Archivision,” can read and then display easily to the interested party as a virtual file explorer in a web browser. What we are providing is context, as the actual workflow will look something like this: We process the disk or disk drive in question; we run tools on the drive to create a preservation package (an “image” of the drive), which goes to storage, and, at the same time, create the files that can be read by Archivision; and we tell many already-in-place systems that we’re doing this so we can immediately make these things discoverable. Thus, in the case of the Mitchell Bush Papers referenced above, as soon as we have gone through the process of safely making an accurate copy of the data, our proposed workflow will take over and automatically make the existence of those files viewable by researchers by adding an easy to follow link directly to our finding aid.
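
The scale problem described above is the reason automated summaries matter. The sketch below walks a directory tree and reports how many files it holds, how many bytes they occupy, and how they break down by extension. It is only an illustration: it assumes the drive's contents are already available as an ordinary folder (for example, a mounted copy), and the function name and sample file names are hypothetical rather than part of the tools mentioned in the post.

```python
import os
import tempfile
from collections import Counter
from pathlib import Path


def summarize_tree(root: Path) -> dict:
    """Count files, total bytes, and files per extension under `root`."""
    file_count = 0
    total_bytes = 0
    by_extension = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = Path(dirpath) / name
            try:
                size = full_path.stat().st_size
            except OSError:
                # Skip entries that cannot be read (e.g. broken links).
                continue
            file_count += 1
            total_bytes += size
            by_extension[full_path.suffix.lower() or "(none)"] += 1
    return {
        "file_count": file_count,
        "total_bytes": total_bytes,
        "by_extension": dict(by_extension),
    }


# Hypothetical demonstration on a tiny made-up folder.
with tempfile.TemporaryDirectory() as workdir:
    root = Path(workdir)
    (root / "letters").mkdir()
    (root / "letters" / "draft.txt").write_bytes(b"hello")
    (root / "NOTES.TXT").write_bytes(b"abc")
    (root / "README").write_bytes(b"")
    summary = summarize_tree(root)
```

A summary like this takes the same effort to produce for 176,000 files as for three, which is the point: the archivist reviews the overview rather than each file.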

An ad for Macintosh Computers in the NCSU Technician, Vol. 71 No. 41, December 4, 1989

The goal of an archive is to make as much of its material discoverable and usable as possible, while maintaining status as a trusted repository for its donors and managing the materials responsibly for the long-term. The digital domain, in one respect, brings this goal closer to reality through the affordances granted by technology. When the material comes in already digitized we have a better chance to make that material discoverable even more quickly.

To boil down the goal of the archive even further is to say that we are here to provide access. Knowing that these digital files exist and actually being able to use them for research are two different things. But we believe that using a tool like Archivision to increase visibility of digital holdings is the first step, and we have plans – referenced explicitly in our “Access and Born Digital Collections” post – to allow researchers to use an in-house laptop filled with indexed versions of our responsibly stored disk images, so they can put themselves into the shoes of the person or institution who previously used that content. Unlike many collections in the physical realm, we are given the opportunity, through born digital, to experience objects the way the donor left them (exactly, in some cases). And unlike physical collections we can easily make available the list of files in the context of the disk as they came in, getting us one step closer to automatically, and as quickly as possible, making needed material discoverable to scholars everywhere.

Nov 17 2014

Two (Disk Reading) Heads are Better Than One: Sharing Born Digital Resources

There are numerous obstacles to overcome when instituting a born digital process: gathering equipment, establishing basic institutional requirements for how processing is done, and deciding on tools are just some of the steps that need to be completed before a workflow is put into place. Thankfully, as the field grows, so does the number of resources available to those just starting out. The Demystifying Born Digital Reports from the OCLC, the Digital POWRR Tool Grid, and the Digital Curation Google Group are just three helpful, and ever-growing, examples of this. But other resources are always out there, and sometimes they don’t need a URL – they may be your neighbors.

NC State University Libraries is part of the Triangle Research Library Network (TRLN). TRLN is a collaborative organization comprised of NCSU and other Triangle academic libraries – Duke University, the University of North Carolina at Chapel Hill, and North Carolina Central University – the “purpose of which is to marshal the financial, human, and information resources of their research libraries through cooperative efforts in order to create a rich and unparalleled knowledge environment that furthers the universities’ teaching, research, and service missions.” In other words, we have agreed to share our stuff and staff with one another.

Just a small sample of some of the legacy media in the NCSU SCRC born digital collection.

In early 2013, the TRLN Born Digital Task Group was formed. Archivists from Duke, UNC, and NCSU worked together on a report to explore the state of born digital programs at each institution. As expected, we all discovered that we shared similar questions about requirements, hardware, and software. Because our three institutions have different identified requirements for born digital materials, not all of our answers will be the same, but the opportunity to share experiences regarding hardware and software is an immediate benefit of working together. Some of the many action items identified by the report included sharing transparent documentation about our processes as well as sharing equipment when needed.

Having such cooperative neighbors has already paid off. Recently, Duke and NCSU Libraries got together to explore various floppy disk controllers, and to compare notes about how to evaluate hardware problems versus disk obsolescence issues. This kind of sharing brought our report to life – real outcomes, both in the form of digital files and a new understanding of tools, were achieved by sharing knowledge and tools.

In addition to sharing, the report focused on several other areas of collaboration that were of interest to the three schools, including:

  • More collaboration with the BitCurator team, some of whom are on the UNC-Chapel Hill campus, and who are an amazing resource for all three institutions since all three of us use or are planning to use BitCurator for at least some of our processing workflow.
  • Working on enhanced communications strategies with IT administrators. Born digital is not just an IT problem – it is a universal problem with IT solutions. Maintaining strong relationships and transparency with those who provide us IT solutions is of the utmost concern when looking toward a long-term solution to born digital.
  • Creating a stronger relationship with the robust and experienced UNC Digital Forensics Lab, to better understand the tools of our trade and to have a place to do comparisons (the example of the floppy controllers above is a version of the kinds of tool comparisons we expect to see more of).
  • More discussion of potential emulation environments.
  • An exploration of BitTorrent as a potential avenue for born digital file transfer within and between institutions.

The TRLN report has led to an extension of the Born Digital Task Group, and judging by the results of our first equipment share, our perceived shared needs, and the ease with which we have already worked together, it’s bound to create a template for other neighborly schools to follow.

Oct 06 2014

Let the Bits Describe Themselves: Arrangement and Description of Born Digital Objects

Throughout our born digital strategic initiative here at NC State Libraries we have debated over the last year just how we will make digital items discoverable to our patrons. Archival discovery begins with the finding aid or collection guide. These guides provide the context of the collection for researchers, and also present the description of the content of the collection. So how does one represent, say, a 16 GB USB flash drive as a usable list or, even more challenging, a 2 TB external drive, inside one of those lists? And, inside that list, how do you arrange those files/folders/hidden files/trashed files/all of the other stuff that each of us manages on our own digital landscape, which we handle differently from our physical landscapes (real desktops, book shelves, and on and on)? The thing is, digital objects are, in some way, already arranged when they are donated, and they are arranged in a way that made sense to the person who donated them.

It is no secret that researchers are interested in the process that goes into creating the subject of their research, so the arrangement of files on a laptop, for example, also gives clues as to what the person who arranged them might have been thinking. We decided, therefore, to give our patrons the chance to experience the arrangement of files and folders in the way they were given to us. In other words, we would not rearrange them in any way since it is assumed that the person who did the arrangement had a reason and that this kind of archival practice – digital, that is – gives us the chance to actually retain original order. But, again, how do we show this to a patron? If it’s just a list then there is no context for the files, outside of knowing who created them or what collection they come from.

An NCSU reference librarian circa 1985 possibly demonstrating an older style of arrangement and description of digital objects.

Working with our Digital Library Initiatives department (DLI) we have developed a plan to not only give patrons easy access to this list, but to also allow them to ascertain context and description easily without us spending hours at the item level trying to decipher a donor’s file scheme. We call this idea “Archivision,” and it is really just a way to allow the bits of a digital object to describe themselves by generating a visual browsing environment of the object.

But how? Well, in the course of our workflow we run several tools over digital objects. These tools extract metadata, and included in this metadata are paths to where the files exist inside the disk structure, as well as metadata about these files that tell us what they are, what they contain (at least technically), when they were created, by whom, etc., etc. If they happen to be text files or contain text we can run tools that tell us what words are in those files. If they are media files we can decipher video codecs, sample rates, and more. By drawing information from these reports we can create a virtual disk browser that looks similar to a Mac Finder window or a Windows Explorer window, and by simply providing a link to this virtual disk browser inside of our finding aid (next to the description of the object itself, for example – like “USB Flash Drive”) the researcher can move through the digital object as they would if they had it loaded on their own computer. An even simpler addition – a sortable spreadsheet that contains all of the file information from the disk – will be provided as a download, too, so the researcher will not have to rely on an internet connection to look through the digital objects we have in our collection. In this way the researcher is not relying on a description that we force upon them that may not lead them to what they need for their work, but rather can contextualize the information in the way that best suits their needs.
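
The two outputs described above, a browsable folder view and a sortable spreadsheet, can both be derived from the same flat list of file paths. The sketch below shows one possible way to do that. It is not Archivision's actual design, which the post does not detail: the record fields (`path`, `size_bytes`, `modified`), the function names, and the sample records are all assumptions made for illustration.

```python
import csv
import io
from pathlib import PurePosixPath

FIELDS = ["path", "size_bytes", "modified"]


def new_folder() -> dict:
    """An empty folder node: named subfolders plus a list of file records."""
    return {"folders": {}, "files": []}


def build_tree(records: list) -> dict:
    """Nest flat file records into a folder tree, preserving original order."""
    root = new_folder()
    for record in records:
        parts = PurePosixPath(record["path"]).parts
        node = root
        for folder_name in parts[:-1]:
            node = node["folders"].setdefault(folder_name, new_folder())
        node["files"].append({**record, "name": parts[-1]})
    return root


def to_csv(records: list, sort_key: str = "path") -> str:
    """Render the records as CSV text, sorted by one of the fields."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for record in sorted(records, key=lambda r: r[sort_key]):
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical records, standing in for a metadata tool's report.
records = [
    {"path": "readme.txt", "size_bytes": 128, "modified": "1996-01-05"},
    {"path": "Documents/notes.txt", "size_bytes": 512, "modified": "1997-11-20"},
    {"path": "Documents/letters/draft.txt", "size_bytes": 2048, "modified": "1998-04-02"},
]

tree = build_tree(records)
csv_lines = to_csv(records).splitlines()
largest_first = to_csv(records, sort_key="size_bytes").splitlines()
```

Because the tree is built only from the paths the tools report, the donor's original arrangement is carried over as-is; nothing in the sketch reorders or renames folders.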

This saves time for us and for the researcher, and is an affordance that is specific to digital information. We could not allow a box of papers to “describe itself,” but by using the archival practice of original order, we can leave the disks the way we find them and, rather than looking over each file at the item level, use tools that allow the bits to tell their own story. In this way we hope to increase the amount of digital information we can get to our patrons, make it easier for them to sort through, and save time on our end by taking advantage of the benefits of digital environments while retaining original order and getting closer to a more genuine representation of archival objects.

Aug 25 2014

Getting Things Done with Born Digital at SAA 2014

The Society of American Archivists’ 2014 Annual Meeting just wrapped up in Washington, DC, and the NCSU Libraries Born Digital Strategic Initiative was represented through a panel, proposed by NCSU’s born digital team, Brian Dietz and Jason Evans Groth, called “Getting Things Done with Born-Digital.” Brian and Jason were joined by colleagues Gloria Gonzalez (Digital Archivist, UCLA Special Collections), Ashley Howdeshell (Associate Archivist, University Archives and Special Collections, Loyola University, Chicago), Daniel Noonan (e-Records/Digital Resources Archivist, University Archives, The Ohio State University), and Lauren Sorensen (Digital Conversion Specialist, American Archive of Public Broadcasting, Library of Congress). Despite the wide diversity of institutions and backgrounds of the six participants, one thing was clear from each of their presentations: Now is the time to begin a comprehensive digital archives program that works in the context of one’s institution, and it can be done using widely available tools and an even more valuable asset – other librarians and archivists who have, themselves, started programs, encountered and overcome obstacles, and are ready to share their knowledge and experience with everyone else.

The premise of the panel, overall, was that reports like the OCLC’s Demystifying Born Digital and others are excellent foundations on which to begin a born digital program. The problem, however, is that every institution is, by nature, unique, with its own unique context and needs. The panel explored the details and case studies of the various institutions, hoping to connect more easily through these contextual clues rather than making a big problem seem bigger by speaking vaguely about tools and equipment that already pose barriers – both in terms of vocabulary and perceived difficulty – to those who are in the beginning stages of planning a born digital program.

Prior to the session, the online scheduling tool for SAA 2014 said that over 360 people would attend. While all of the panelists understand that this is important work, the number was still a surprise. At 9:59am, a minute before the session began, the panelists were told to ignore the sounds of the hotel facilities staff opening the airwall at the back of the room – it was Standing Room Only, and, at the session’s peak, an estimated 500 attendees listened to six very different practitioners discuss their successes, failures, and excitement regarding digital archives. The session itself generated much in-person discussion as well as hundreds of tweets.

The panelists touched on such topics as utilizing a committee that includes stakeholders and IT to maintain transparency with others in one’s institution while such a program is getting put into place; being unafraid to tackle technical needs by relying on the transparency of others and one’s own ability to search for help with processes with which librarians and archivists are already familiar but maybe have never used themselves (like the command line); accepting that flexibility in both tools and workflow is not only OK but also desirable, understanding that there is not one, single, “silver bullet” tool or service that can answer all of your questions or needs; that problems and challenges, which will arise without a doubt, are actually quite educational and necessary; and even the “Top 10 Things I Don’t Let Stop Me From Getting Things Done (With Digital Archives),” which included lack of practical experience and assuming equipment is, by nature, inadequate, in addition to the Litany Against Fear from Dune.

The audience asked questions like “what can we not do in order to process digital objects more quickly,” “how do we establish good relationships with IT,” and “what about metadata.” In all cases, the panelists assured them that these answers existed – perhaps not in one, single location, and definitely in the minds of those who had moved through them already – and could be discovered by understanding both the context of the institution and the real, required needs the institution has established. In other words, the answers amount to careful planning for the future based on an understanding of an institution’s priorities and requirements for both collecting and access. Librarians and archivists are familiar with such planning already: collection policies, donor agreements, and gathering data to predict access usage are things we are taught from the beginning of our careers, and they are exactly the kinds of skills needed to figure out requirements for born digital collections. What do we collect? What can we make accessible? How will this be used? A call for shared documentation and more open questions and answers was made, and the audience was reminded that the National Digital Stewardship Alliance (NDSA) has recently implemented Digital Preservation Q&A, a site that allows members of the digital preservation community to share their challenges and successes in order to facilitate both progress and community building.

In addition to the incredible attendance at this session, many (if not all) of the other digital-focused sessions were at capacity or very close to it – a heartening sign that professionals are taking this seemingly overwhelming challenge very seriously. SAA 2014 made it clear that those of us who fight the good digital preservation fight are not only not alone but are in very good company.

Jul 28 2014

Bit by Bit: Flexibly and Collaboratively Making Sense of Born Digital Materials

Students and staff in the Department of Computer Science, College of Engineering, in the 1970s, potentially creating media for modern digital archivists to curate.

When the NCSU Libraries’ Born Digital initiative began back in August of 2013, helpful colleagues from institutions seasoned in such work mentioned over and over that, no matter how solidly planned out the workflow for digital collections might be, it is inevitable that an object or group of objects will present the kind of roadblock that keeps institutions from instituting born digital programs in the first place. These roadblocks come in many forms: disks that are unreadable by local equipment, giant hard drives that take forever to image, file systems that are not understood by the workstation’s operating system, and so on. This is not a surprise – the Demystifying Born Digital Reports, created by OCLC, list multiple tools and pointers for the digital archivist to carefully consider while they are crafting their projects. However, the multitude of ideas presented in these reports may lead the digital archivist to believe that they need to pick one tool or suggestion over another and limit themselves to those decisions, especially since the word “flexible” never appears in the reports. At NCSU Libraries we have discovered that familiarizing ourselves with a range of software, documenting the strengths and weaknesses of each tool, and creating a flexible workflow that relies on many free tools, rather than limiting ourselves to one set and one set only, has helped us make sense of how to deal with our born digital materials proactively and get as close as we can to robust access to the materials.

Just down the road from NCSU Libraries, at the University of North Carolina-Chapel Hill, a group of people who believe the same thing are working hard to prepare a suite of tools that answers the needs of the digital archivist. The BitCurator project, “a joint effort led by the School of Information and Library Science at the University of North Carolina, Chapel Hill (SILS) and the Maryland Institute for Technology in the Humanities (MITH) to develop a system for collecting professionals that incorporates the functionality of many digital forensics tools,” recognizes that many of the existing options to begin born digital programs are “not very approachable to library/archives professionals in terms of interface and documentation.” At NCSU Libraries, documentation is imperative for both understandability and repeatability of the born digital curation process. The folks at BitCurator feel the same way, and are striving to provide a suite of tools, packaged as a virtual machine or a standalone system (whatever works better for a given institution), that not only bundles multiple tools into a single package but also comes with easy-to-follow documentation.

Recently, the team hosted a “BitCurator clinic” in Chapel Hill, an event which brought together digital archivists from NC State, UNC, and Duke University to explore BitCurator together, to talk amongst ourselves about our challenges with born digital materials, and most importantly, to share how we felt BitCurator was working for us and how it could improve. This kind of collaboration is a necessity to keep tools in scope for librarians and archivists and to ensure their proper and effective usage. Flexibility was on everyone’s mind at the clinic, considering that groups brought everything from floppy disks to external hard drives from real collections to work on in front of the developers. And the developers were quick to remind us that BitCurator is built to be flexible, encompassing many disparate tools in both GUI and command line forms (you can read all about it on their wiki). Even with all of this built-in flexibility, you may need to dip outside of the BitCurator environment depending on the roadblock you encounter with a particular collection – and that’s OK! Flexibility (which is absolutely necessary and even encouraged if it is all documented and leads to pre-determined requirements) and collaboration (particularly the willingness to ask questions of colleagues and to report problems with tools, for example) are two of the most important tenets of getting a digital curation program off the ground.

Apr 14 2014

Forensics in the Library: Born Digital Tools

The Born Digital Curation strategic initiative at NC State University Libraries has been developed under the principle that, in order both to treat born digital assets as an important resource in our collection and to exploit the promise of said digital assets, we must treat born digital objects forensically. That is, because “evidence” about digital objects is inherently part of the object, we should use tools that extract that evidence expertly, carefully, and thoroughly, so that we can offer future researchers the most complete investigative environment possible. Digital forensics techniques have long been used to investigate crimes, and the tools that have been developed for those investigations offer libraries and archives a very powerful resource to enhance the discovery and use of born digital collections.

For example, a digital image often has metadata – data about the image – embedded within it. Depending on the camera that took the image, time, date, geographical location, and even digital evidence of the potential photographer (which can be discovered with embedded usernames or can be linked to other files’ date and time stamps on a hard drive, for example) can be “hidden” inside the ones and zeros that create what we see as a digital representation of a photograph. These bits of “evidence” are most often created automatically by the camera’s operating system, without the photographer ever having to think about them. On the other hand, while it is not rare to see traditional film prints which have been processed with a time and date stamp on them, that certainly was not a standard. Recording geographical location, photographer, and other context was up to the person who handled the prints; if those details do not arrive as part of a collection – or even if they do – the description of such images is, at best, contextualized based on human-written clues. At worst, the images are interesting but random, decontextualized parts of collections.

This photograph of Alan Alda is dated as 1970-1979 in the Rare and Unique Digital Collections of NCSU, but the digital screenshot of the photograph has the exact time and date of creation.

This comparison/contrast is not intended to pit digital and analog objects against one another, but rather to point out that work within the digital realm, while riddled with challenges that are not necessarily part of the analog world, does allow for a degree of automation of information that, if harnessed and used correctly, can make the discovery, retrieval, and use of objects potentially easier. After all, what good is an archive if the stuff can’t be used?
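
As a small illustration of that automation, the sketch below reads the information a file system records about a file without anyone typing it in: its size, its last-modified time, and a file type guessed from its extension. It is a minimal example using only the Python standard library, not a description of the tools on our workstation; the function name `describe_file` and the field names are hypothetical, and embedded metadata such as a camera's time and location tags would need a dedicated reader that this sketch does not include.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def describe_file(path):
    """Return a small dictionary of metadata the file system already holds.

    describe_file and its field names are illustrative, not part of any
    forensics tool.
    """
    file_path = Path(path)
    info = file_path.stat()  # reads size and timestamps; does not open the file
    guessed_type, _encoding = mimetypes.guess_type(file_path.name)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": file_path.name,
        "size_bytes": info.st_size,
        "modified_utc": modified.isoformat(),
        # The type is only a guess based on the file extension.
        "guessed_type": guessed_type or "unknown",
    }
```

Calling `describe_file` on a photograph would return its name, size in bytes, and modification time in UTC, which is the kind of context a paper print only carries if someone wrote it on the back.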

Our born digital workstation has been outfitted with the tools to make this desire a reality. In order to manage born digital objects effectively, the Special Collections Research Center relies on an array of software tools that, depending on the needs presented by the curated object, are used in various combinations to produce usable packages of information that we can make more easily available to researchers and other patrons interested in the collections. The primary tool in our current workflow is BitCurator, which was developed mainly by our colleagues at the School of Information and Library Science at the University of North Carolina, Chapel Hill (SILS) in partnership with the Maryland Institute for Technology in the Humanities (MITH) and is funded by grants from the Andrew W. Mellon Foundation. In addition to the powerful tools contained within the software, there is a community of practitioners growing up around the use of the software and born digital curation in general, which adds a whole new level of usefulness and empowerment to those of us tasked with tackling the challenge of born digital curation. Look for more on our workflow and specific born digital tools as we continue to update you on our born digital curation progress!

Feb 24 2014

Managing Born Digital Collections

Along with the decision to start a born digital curation initiative like the one we have just begun here at the NCSU Libraries Special Collections Research Center comes an important and complicated question: How? Unfortunately, as is the case with many such questions in libraries and archives, the answer is not only difficult but also fraught with compromise. That is, there are a number of solutions to handling the management and storage of digital materials, but because every collection and every organization that takes care of those collections is different there is not one easy, out-of-the-box, instructional-video-on-YouTube path to take, and what looks great and is attainable for one organization may be out of scope (and budget) for another. That does not mean there are no answers, however. In fact, quite the opposite – there are so many possible answers that they can be overwhelming enough to keep such a project from ever getting off of the ground. The OCLC has provided some enormously helpful starting points with a series of “Demystifying Born Digital Reports.” By their nature, these reports have been created for an entire community of practice, and thus cannot speak directly to the very specific issues confronting individual institutions like NCSU Libraries. However, they, used in concert with the knowledge we have gathered here about both our needs and the makeup of our collections, in addition to the expertise of a communicative and curious international community of librarians and archivists who recognize born digital curation to be an extremely important part of the future of libraries, have helped us craft what we like to call the “first draft” of our born digital program.

The SCRC Born Digital workstation, aka "The Kraken"

In addition to this 5.25" floppy drive, the Kraken can currently process 3.5" floppies, ZIP disks, CD/DVD/Blu-Ray disks, and many internal hard drives or external USB devices.

The cornerstone of our program is the desire to make our previously unavailable digital collections accessible to our patrons. In order to do so responsibly, we use our digital forensics workstation – what we affectionately call The Kraken, due to the number of tentacle-like wires emerging from its core – to migrate content from its original medium to a package that contains not only the data but also metadata (the last time the file was edited, the creator of the file, file size, file type, software used to create the file, etc.) relating to the data. We do this by “imaging” objects – that is, turning the contents of a disk, hard drive, CD-ROM, etc., into one single file that is a snapshot of the original object – and then using that image as an archival object that is both something to preserve and something to make copies of to provide access. By following forensics guidelines – that is, “touching” the object as little as possible and leaving as few “digital fingerprints” as we can – we treat the items archivally. We use a write blocker to keep the Kraken from leaving any traces on the original object when we are creating images of disks. We then make an additional copy of the disk image and run a series of tools over that copy to extract information from it, providing a broader scope of data to the potential researcher before they even attempt to look at the data inside of the image. We then package all of that together and responsibly store the image, giving us the chance to pull digital copies when we need to without having to touch the original disk. The decisions about all of these components – the workstation, the methods used to extract data, the tools to run over the image, the digital package, and how to access that package – were not prescribed to us, and will likely change as the program continues.
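
For readers who want to see the shape of the imaging step, here is a minimal sketch in Python using only the standard library. It is an illustration under stated assumptions, not the software we run on the Kraken: the function names (`create_image`, `make_working_copy`, `sha256_of`) are hypothetical, the SHA-256 checksum is our choice for the example, and opening a source read-only in software is not a substitute for a hardware write blocker.

```python
import hashlib
import shutil

CHUNK_SIZE = 1024 * 1024  # read the source 1 MiB at a time


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def create_image(source_path, image_path):
    """Copy every byte of the source into one single image file.

    The source is opened read-only, and mode "xb" refuses to overwrite an
    existing image. Returns the SHA-256 digest of the bytes that were read.
    """
    digest = hashlib.sha256()
    with open(source_path, "rb") as source, open(image_path, "xb") as image:
        for chunk in iter(lambda: source.read(CHUNK_SIZE), b""):
            digest.update(chunk)
            image.write(chunk)
    return digest.hexdigest()


def make_working_copy(image_path, copy_path, expected_digest):
    """Duplicate the image and confirm the copy matches the recorded digest."""
    shutil.copyfile(image_path, copy_path)
    if sha256_of(copy_path) != expected_digest:
        raise ValueError("working copy does not match the original image")
    return copy_path
```

In this sketch, the digest returned by `create_image` would be recorded alongside the image, and any later analysis would run against the copy made by `make_working_copy`, so that neither the original medium nor the preserved image has to be touched again.
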
The most important decision was the one to start the program in the first place to protect and make available the valuable digital objects in our collections here at the SCRC. Stay tuned to the blog for updates on the progress of our exciting born digital initiative!