Universities are well acquainted with the need to archive. Their work plays a vital role in society as their research contributes to every aspect of human civilisation from our health to our economy. For centuries, these institutions have been preserving items and information of importance, keeping a record that is incredibly valuable to students, faculty, researchers, wider society and the general public.
The way universities interact digitally has also changed dramatically over recent decades. Information for students is uploaded to web pages, and documents from prospectuses to academic publications are hosted online. Conversations about everything from applications and clearing to freshers’ week, right through to graduation and alumni relations, take place via social media. This shift towards digitalisation is accelerating, and many universities will have key information that can only be found online.
The requirement for archiving is not limited to web pages; it can extend to entire websites, blogs and online surveys, even if they are stored behind a firewall. Social media content, from Tweets to posts designed to disappear after a short time, can now be kept securely as a record of communications, right down to the metadata on each platform and interaction.
These will be the foundations of future archived content that will make up the rich tapestry of history for future generations, providing reference and insights into the way we lived at the dawn of the ‘digital age’.
Everything a modern university does needs to be preserved, for many reasons. Academic work is of cultural and historical value to future generations, while communications and day-to-day operations need to be trackable for legal and compliance reasons. For example, archived online prospectuses clearly show how a particular course was advertised at any given time. Keeping such records demonstrates best practice in corporate or administrative record-keeping. Competition and Markets Authority guidance on compliance is likely to become more demanding, and we may see a legislative requirement as organisations become increasingly digitalised.
The higher education community already appreciates that archived digital material provides a permanent, date- and time-stamped, unalterable record: a comprehensive snapshot in time. Digital material is now produced more prolifically, and will become more and more openly harvestable over time.
From a heritage point of view, the need to capture, catalogue and make digital material usable and accessible is an integral requirement of content and artefact recording. Put simply, universities that do not capture it risk losing it forever amid the sheer scale and pace of content being produced every day.
Digital archiving also goes beyond chronicling what messages and information a university sends out – it can also be used to track what students and the public are saying and thinking. Collecting records from social media conversations about a particular course, facility, or initiative can help institutions to understand the sentiment at the time, through the use of data analytics.
The research work of a university is another aspect that will particularly benefit from the capture and preservation of digital content, because it can protect the outputs of that research. Many university researchers now produce websites as part of their studies, which then need to be preserved to comply with funding requirements or simply to chronicle and present their work. Additionally, as both internal and external researchers increasingly use online-based data for research purposes, there is a growing demand for open data, for which web archiving is now an essential component.
The only way to safeguard this new online information and content is through digital archiving. This process captures and preserves the online content, in its native format, future-proofing it for continued use and avoiding the danger of relying on formats that may later become obsolete, potential issues with third-party platforms used for publishing, or dependence on content management and backups that only provide security in the short term.
However, there is currently a lack of understanding surrounding this issue, despite the growing recognition of digital content as a valuable asset. Combined with tight budgets and the time-sensitive nature of the problem, this becomes a real challenge for university archivists.
But unlike the physical archives that have been preserved and carefully curated over the years, digital data always has a deadline and is therefore at high risk of being lost forever. Each day, billions of gigabytes of data are created online, all of it fragile and scattered across unstructured and often unfathomable corners of big data black holes. It is easy to assume that anything uploaded to the web is safe there, but this does not reflect reality. The average lifespan of a webpage is just 90 days. An error message that reads “404 page not found” is the modern-day equivalent of a missing book in a library archive collection.
Once the data is captured, the archive also needs to be made usable, otherwise it is worthless. For example, MirrorWeb recently archived the UK central government’s online presence to the cloud for The National Archives, comprising a gigantic 120TB web archive, in a process that took only two weeks. The 1.4 billion documents were then indexed, a process which lists all the assets in an archive to make them searchable, in a mere ten hours. This pioneering work with The National Archives has allowed MirrorWeb to collaborate with the Digital Preservation Coalition and the Archives and Records Association, bringing together a select working group of partner universities. The collaboration has redefined an HE digital archiving solution that is embedded in archival best practice, while meeting these challenges head on. It delivers simple, cost-effective digital archiving capability, with 24×7 open access to crawls and collections and the ability to interrogate metadata and crawl logs seamlessly.
One important consideration for all organisations is that archiving retrospectively will not provide the most satisfactory or cost-effective results. In as little as a few months, a huge amount of important content and assets can become nigh on impossible to recapture. MirrorWeb is collaborating with the HE community to research and ‘test and learn’ solutions to ensure digital data isn’t lost forever.
Some forward-thinking institutions and businesses are already taking advantage of the possibilities of capturing content for their future digital archive. The entire community, however, needs to begin treating digital content as the precious asset it is and, more importantly, will be in future, by capturing it now so that it can bring value to generations to come.
Phil Ogden is Chief Marketing Officer at MirrorWeb