Is the Internet Forever? (Not Always!)

There’s a popular argument for thinking twice before you share on social media: what is shared cannot be taken back. Once online, it is online forever. That is more of a precaution than a hard-and-fast rule, but it is something many of us believe.

But if you’ve ever read something and then tried to find it again a decade later, you know first-hand how content that is born digital can become unavailable.

The nature of born-digital content is as the name suggests—content built and published (or “born”) for a digital medium and meant to stay digital. Without conscious preservation efforts, born-digital content isn’t archived anywhere physically and can easily disappear.

Efforts by preservation organizations help mitigate the trickle or flood of material disappearing from the web, but they cannot capture everything.

What does it mean when born-digital content can simply disappear without much of a trace?

  1. Documenting Disappearances

In August 2020, a disturbing trend came to light.

Over the course of the last ten years, a team of researchers had noticed that previously available open access journals were disappearing from the web. To understand and document the full extent of this vanishing knowledge, they built a methodology, compared databases, and tallied up the losses.

Information scientists Mikael Laakso, Lisa Matthias, and Najko Jahn identified a total of 176 journals that had disappeared between 2000 and 2019. These journals had been published all over the world and covered a variety of disciplines, meaning the problem was widespread and indiscriminate.

There are safeguards that act as a network of duplicate copies and preservation efforts for journals, but Thib Guicherd-Callin, acting manager of one such program, told Nature that digital preservation initiatives are “woefully underfunded.”

It is important to remember that this study wasn’t focused on how many journals had been successfully preserved and remained accessible thanks to programs such as LOCKSS (Lots of Copies Keep Stuff Safe), Portico, and the PKP PN (Public Knowledge Project Preservation Network). The Global LOCKSS Network preserves 11,000 journal titles! LOCKSS makes “use of the copies it manages, by enlisting them to validate integrity against each other, rather than relying uncritically on comparisons against a centralized fixity store,” keeping digital objects safe, accessible, and verifiable.
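To make the idea of copies validating each other a little more concrete, here is a minimal Python sketch. It is a simplified majority vote over checksums, written for illustration only; it is not the actual LOCKSS protocol, and the function names (`audit_copies`, `digest`) and the library labels are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Dict, List, Optional, Tuple


def digest(content: bytes) -> str:
    """Return the SHA-256 checksum of one stored copy."""
    return hashlib.sha256(content).hexdigest()


def audit_copies(copies: Dict[str, bytes]) -> Tuple[Optional[str], List[str]]:
    """Compare copies against each other instead of a central checksum list.

    Returns the checksum held by a strict majority of holders (or None if
    there is no majority) and the holders whose copy disagrees with it.
    """
    if not copies:
        raise ValueError("at least one copy is required")

    digests = {holder: digest(data) for holder, data in copies.items()}
    winner, votes = Counter(digests.values()).most_common(1)[0]

    # Without a strict majority, no copy can be trusted over the others.
    if votes * 2 <= len(copies):
        return None, sorted(digests)

    damaged = sorted(h for h, d in digests.items() if d != winner)
    return winner, damaged


# Hypothetical example: three libraries hold the same journal issue,
# and one copy has been silently corrupted.
copies = {
    "library_a": b"Volume 1, Issue 1",
    "library_b": b"Volume 1, Issue 1",
    "library_c": b"Volume 1, Issue 1 (corrupted)",
}
consensus, damaged = audit_copies(copies)
print(damaged)  # ['library_c']
```

In this sketch, a damaged copy is identified because it disagrees with the majority, and it could then be repaired from one of the agreeing holders. No single master record has to be trusted, which is the point of keeping lots of copies.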

Matthias believes a true solution would need to be shared between publishers, authors, librarians, and preservation services, but that solution just doesn’t exist yet.

  2. Digital Excavations of the Future

Modern-day historians have a wealth of physical objects and documents (often preserved digitally) that they mine to learn details about events, track daily life, understand a variety of perspectives, and ultimately draw conclusions about the past. The tumultuous year of 2020 has provided an interesting thought experiment around a practical question: In one thousand years, what material from this year, created almost entirely online, will be available for study? And more to our point, what material will vanish completely?

The first question of what will be worth preserving at all is a difficult one. The answer typically only becomes clear in hindsight, but in the fast-moving world of the internet, decisions are needed now. University Affairs tracks collection projects by the University of Saskatchewan and Brock University Library, noting success in user submission and a hyper-local, person-focused scope.

Many countries, such as the United Kingdom, France, and Denmark, mandate their national libraries to capture a comprehensive digital record of life in that country at any given time. This may prove a future advantage to historians studying these countries’ culture in the time of COVID-19.

The United States takes a less systematic approach. The Library of Congress has digital archives covering the first 12 years of Twitter, but now it acquires tweets on a selective basis “similar to [their] collections of websites.” The LOC’s Web Archive team of subject experts selects sites in accordance with the Library’s (incredibly broad) policy guidance. There are also the efforts of the (non-profit) Internet Archive’s Wayback Machine project to snapshot the web.

But without a legal mandate to preserve a comprehensive digital picture of the world, gaps are bound to appear and perspectives are bound to be missed. When only one voice is remembered, history isn’t complete.

Within the field of digitization, there is often a feeling that what’s done is done forever. But in reality, website collections need to be managed and interfaces kept functional as internet standards and usage practices change. A collection easily accessed on a desktop, for example, may be impossible to navigate on a smartphone. Scans and images saved on floppy disks are now difficult to access from modern computers, and other file formats are nearly obsolete.

Anderson Archival tracks the standards and trends of digitization, ensuring that files are saved in a variety of formats that have been identified by experts as having the most digital longevity. These formats include archival PDFs (PDF/A), and the files are often saved on cloud servers along with physical backups. We, too, believe in the notion that lots of copies keep stuff safe.

Whether you want your collection to be accessible forever online, or want to pick and choose who is granted copies and access, Anderson Archival is the solution to your digitization needs. Don’t let your digitization decisions come back to haunt you! To start the discussion and keep your collection available for the future, contact us today.
