By: Todd Stoffer
NC State University Websites Collection
We are pleased to announce that the first phase of our web archiving project is now in full swing. Earlier this month we completed the quality assurance checks for 140 new seed URLs in our NC State University Websites collection, which will now be crawled on a recurring basis. We are now crawling 190 university websites. This includes the websites of campus-wide administrative units, each of the 12 colleges, and the vast majority of departmental websites. We have captured 484 gigabytes of data in just the past 12 months. The collection is crawled at regular intervals throughout the year, so we will continue to capture campus websites as they change. You can explore the entire collection by visiting https://archive-it.org/collections/5838. The chart below shows the growth we have experienced over the past year of collecting. We project that by the end of this collecting cycle we will have preserved over 700 GB of website content, the majority of which is contained in our NC State University Websites collection.
Documenting the Process
Web archiving is still a relatively new area of practice for libraries and archives, and there are not nearly enough resources that document the process of building a new web archive. It is a complex task from both a technical standpoint and an organizational policy standpoint. Developing internal standards and best practices ensures that the web archive can be maintained long-term. We have been working on these standards and practices for over a year now, and decided it was time to formally document them. For us, the best option was to create a website that outlines the processes we have in place, from seed selection and scoping to quality assurance and collecting guidelines. We have also decided to make that documentation openly available online. You can view it at https://ncsu-libraries.github.io/web-archiving-docs. We hope that this documentation helps other organizations that might be starting new web archives by adding transparency to a process that is often only internally documented.