
Search Results: 0 – The Unseen Cost of Inaccurate Data and Sub-Par Solutions


Anderson Archival is pleased to have presented at Digital Preservation 2018 (#digipres18) in Las Vegas in October! The conference, with a theme on the future of digital preservation, was hosted by the National Digital Stewardship Alliance (NDSA) and the Digital Library Federation (DLF).

At the conference we highlighted what archivists should consider when creating or updating a digital collection, when not to choose economy over quality, and the various ways in which a digital collection can fail to be a useful research tool as a result of substandard work.

We embraced attendance at Digital Preservation 2018 as an opportunity to take part in the national discussion of preservation quality and access, and we would like to share with you what we presented at the conference.

Minute Madness

Anderson Archival shared a one-minute presentation on the hidden cost of incorrect data.

Our Minute Madness presentation, “Search Results: 0 – The Unseen Cost of Inaccurate Data and Sub-Par Solutions,” illustrated our experience in providing preservation solutions for a client who had previously invested in what they ultimately realized were poor solutions that offered only inaccurate, incomplete data.

For a collection that is used for scholarly research within their organization, this was a problem.

This group considered their collection preserved, but after a careful audit of their digital materials, we discovered that not only were chunks of original information missing entirely, but the scans that were complete also produced such messy OCR that search results woefully underrepresented the actual contents of the collection.

Search Results

What was the true cost of using this cheaper digitization solution for ten years? It’s impossible to calculate! Imagine the hours lost to inefficient search, and the research and publications that are now known to have drawn from fragmented data.

For instance, see what happens if OCR software reads this famous quote from Winston Churchill:

If the OCR mistakes the g and h and it goes unchecked, we end up with this in the collection:

If you searched for the famous portion of this Churchill quote, “go to hell,” this document would never show up in your search results. Now imagine this error repeated hundreds of times across your collection; many collections run to tens of thousands of pages or more.
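To make the effect concrete, here is a minimal Python sketch of how a single unchecked character confusion hides a page from an exact-phrase search. Everything in it is an assumption for illustration: the helper names `simulate_ocr_swap` and `exact_search`, the page ID, and the placeholder sentence (which is not the Churchill quote itself). Real OCR errors are rarely this tidy.

```python
def simulate_ocr_swap(text, a="g", b="h"):
    """Swap two lowercase letters everywhere, mimicking a consistent OCR confusion."""
    return text.translate(str.maketrans(a + b, b + a))


def exact_search(documents, phrase):
    """Return the IDs of documents whose text contains the phrase (case-insensitive)."""
    phrase = phrase.lower()
    return [doc_id for doc_id, text in documents.items() if phrase in text.lower()]


# Placeholder sentence for illustration only.
clean_page = "A placeholder sentence that tells someone to go to hell politely."

# What lands in the collection if the g/h confusion goes unchecked.
documents = {"page-001": simulate_ocr_swap(clean_page)}

hits = exact_search(documents, "go to hell")
print(documents["page-001"])
print("Search results:", len(hits))
```

The page is still in the collection, but the search cannot see it, because the stored text no longer contains the phrase the researcher typed.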

Inaccurate OCR data limits your search results, while poor search technology buries you in irrelevant ones. Both problems are compounded by poor metadata tagging.

So what happens when a digital collection is preserved with inaccurate data and sub-par solutions? The voices of history don’t resonate when users access a poor software solution with inaccurate search results, and your collection won’t be used to its greatest potential.

The methodology you employ can mitigate these problems.

For the most accurate data, establishing a multi-step system for scanning, image cleanup, OCR, and quality assurance is critical.
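As one illustration of what an automated part of the quality assurance step might look like, the sketch below flags OCR output words that do not appear in a known vocabulary so a person can review them. The function name `suspect_words` and the tiny vocabulary are hypothetical; this is not a description of any particular provider's process, and a real check would use a full dictionary plus collection-specific names and terms.

```python
import re


def suspect_words(ocr_text, vocabulary):
    """Return OCR tokens that are absent from the vocabulary, in reading order."""
    tokens = re.findall(r"[a-z']+", ocr_text.lower())
    return [token for token in tokens if token not in vocabulary]


# Hypothetical vocabulary, far smaller than a real word list.
vocabulary = {"tell", "them", "to", "go", "hell"}

flagged = suspect_words("Tell them to ho to gell", vocabulary)
print("Words to review:", flagged)
```

A check like this only narrows down where to look. The flagged words still need a human reviewer comparing them against the original scan.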

You also need detailed tagging to support your data architecture and the right search technology tuned to your data set.
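As a rough sketch of what "tuned" search can mean, the example below uses Python's standard `difflib.SequenceMatcher` to find passages that are close to a query phrase without matching it exactly. The function name `fuzzy_find`, the sample text, and the 0.75 threshold are assumptions for illustration. Production search engines use more sophisticated techniques, and the right tolerance depends on the data set.

```python
from difflib import SequenceMatcher


def fuzzy_find(text, phrase, threshold=0.75):
    """Return (passage, score) pairs for word windows similar to the phrase."""
    words = text.lower().split()
    size = len(phrase.split())
    matches = []
    # Slide a window the same length (in words) as the query across the text.
    for i in range(len(words) - size + 1):
        window = " ".join(words[i:i + size])
        score = SequenceMatcher(None, phrase.lower(), window).ratio()
        if score >= threshold:
            matches.append((window, score))
    return matches


# Hypothetical OCR output in which "g" and "h" were confused.
garbled_page = "tell them to ho to gell"

for passage, score in fuzzy_find(garbled_page, "go to hell"):
    print(f"{passage!r} scored {score:.2f}")
```

The threshold is the tuning knob. Set it too high and garbled text stays hidden; set it too low and the results fill with passages that have nothing to do with the query.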

The Executive Director for the project mentioned above was horrified to learn that nearly a decade of their research was not complete. 

How do you feel about your collection? Is quality important to you?

With a digitization provider like Anderson Archival, every step of the archival process is performed and checked by members of our expert team.

It’s time to gain confidence in your data and your search results! Check out our poster from Digital Preservation 2018 and call us today at 314.259.1900.
