What Is Historical Document Digitization?
Have you ever seen a copy of the Declaration of Independence in an online collection like the one provided by the National Archives? You can click on it, zoom in, and study the document in detail. If the document has been through optical character recognition (OCR), you can search it for keywords or whole phrases. That is the result of digitizing historical documents. The digitization process has many components, though, and which ones you need depends on how thoroughly you want your historical documents preserved and digitized.
What Types of Documents Can Be Digitized?
Almost any! Even severely damaged documents are candidates. In fact, a major reason to digitize is to preserve historical materials that might not last much longer in their physical form. Extreme care is used when working with these precious documents because Anderson Archival’s team cares as much about your collection as you do.
A great candidate for digitization is a collection of historical books or periodicals that needs to be searchable. Once digitized, these books become completely searchable, and you’ll be able to find and reference documents much more easily.
How Does Digitizing Historical Documents Work?
The historical document digitization process starts by carefully scanning each page of the document. Resolution, lighting, and accuracy are all important factors for this step. Large format items such as maps and legal documents are placed on specialty flatbed scanners. After scanning, the images are carefully inspected and quality assured.
Quick scanning companies shove documents through machines at a high rate and are liable to overlook pages that are stuck together, torn, or missing from the collection. Careful review is our differentiator. Every single page scanned by Anderson Archival is quality assured by multiple trained archivists. Our team of professionals makes sure that not one page was skipped and that each scan meets both our expectations and yours as a living digital copy of the original.
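To make the idea of checking for skipped pages concrete, here is a minimal Python sketch of an automated completeness check. It is an illustration only, not Anderson Archival's actual workflow: the `page_NNNN.tif` naming scheme and the `find_missing_pages` helper are assumptions made for this example, and a check like this would complement human review rather than replace it.

```python
import re

# Hypothetical naming scheme for this example: page_0001.tif, page_0002.tif, ...
PAGE_PATTERN = re.compile(r"page_(\d+)\.tif$")

def find_missing_pages(filenames, expected_count):
    """Return the page numbers in 1..expected_count that have no scan file."""
    seen = set()
    for name in filenames:
        match = PAGE_PATTERN.search(name)
        if match:
            seen.add(int(match.group(1)))
    return [n for n in range(1, expected_count + 1) if n not in seen]

# Example: a five-page item where two pages never made it into the scan folder.
scans = ["page_0001.tif", "page_0002.tif", "page_0004.tif"]
missing = find_missing_pages(scans, expected_count=5)
print(missing)  # [3, 5]
```

A gap in the numbering only shows that a file is absent. Pages that were stuck together or scanned poorly still need a person to catch them.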
Collection owners then have a decision to make. Do you want the scans to go through our cleanup process, or to present the images as-is? If cleanup is desired, each page is carefully studied by human eyes from top to bottom—from one character to the next, if that’s what your specifications require. Once flaws are documented, our team has several restoration options.
The first is to clean up the digital document. Are there specks? Stains? Rips or wrinkles? We can clean most of those away for a spotless final image. Is there partial, missing, faded, or damaged text that you’d like replaced? Usually we can rectify those problems, too. Again, everything is quality assured by multiple archivists.
OCR and Verification
Next is Optical Character Recognition (OCR). This is the meat of how information is digitized. After software has optically recognized the text, a human verifies the low-confidence characters and then double-checks each correction. Sometimes a project requires additional checking, or even word-by-word proofing, which can be done on multiple levels. Do you want the text proofread? Or do you want to ensure every single word is exactly as it appears in the original document? For most projects, a simple OCR verification pass is enough.
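As a rough illustration of how low-confidence OCR output might be queued for human verification, consider the Python sketch below. It assumes a hypothetical OCR engine that reports a confidence score between 0 and 1 for each recognized token. The `OcrToken` class, the `tokens_needing_review` helper, and the 0.90 threshold are all invented for this example and do not describe any particular product.

```python
from dataclasses import dataclass

@dataclass
class OcrToken:
    text: str          # what the OCR software recognized
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)
    page: int          # page the token was found on

def tokens_needing_review(tokens, threshold=0.90):
    """Return tokens whose confidence falls below the review threshold."""
    return [t for t in tokens if t.confidence < threshold]

# Hypothetical OCR output for a few words on one page.
sample = [
    OcrToken("Declaration", 0.99, page=1),
    OcrToken("of", 0.98, page=1),
    OcrToken("lndependence", 0.62, page=1),  # likely misread: "l" for "I"
]

for token in tokens_needing_review(sample):
    print(f"page {token.page}: check '{token.text}' ({token.confidence:.2f})")
```

Raising the threshold sends more text to human reviewers, which is one way to think about the difference between a simple verification pass and word-by-word proofing.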
Do you want to preserve your historical documents forever?
Do you want to digitize your documents to be able to showcase them and release them to the world? When your historical documents have been properly digitized, access is no longer limited to those who can visit your physical collection for research. Ready to expand your audience while protecting and preserving the original documents?
Preserving historical materials isn’t complete until you have a way to view your documents. Metadata is the information the computer uses to organize your files, and it helps you search quickly and easily across documents. For instance, once metadata is embedded in the digital documents, you will be able to search by the text on the page or by broader attributes shared across multiple documents, such as author, date, or subject.
Metadata can include dates, page numbers, chapter titles, customized watermarks, and much more.
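To show how metadata makes a collection searchable, here is a small Python sketch that filters document records by metadata fields. The records, field names, and `search` function are hypothetical and exist only for illustration. A real collection would more likely store metadata inside the PDF files or in a catalog database.

```python
def search(documents, **criteria):
    """Return documents whose metadata matches every given field.

    Matching is case-insensitive and compares values as strings.
    """
    results = []
    for doc in documents:
        if all(
            str(doc.get(field, "")).lower() == str(value).lower()
            for field, value in criteria.items()
        ):
            results.append(doc)
    return results

# Hypothetical metadata records for three digitized items.
library = [
    {"title": "Letter A", "author": "J. Smith", "year": 1862, "subject": "correspondence"},
    {"title": "Ledger B", "author": "M. Jones", "year": 1870, "subject": "accounts"},
    {"title": "Letter C", "author": "J. Smith", "year": 1870, "subject": "correspondence"},
]

for doc in search(library, author="j. smith", year=1870):
    print(doc["title"])  # Letter C
```

Because every record shares the same fields, a single query can pull together all items by one author, from one year, or on one subject.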
There are a few ways to organize your preserved historical materials. One way is to store them on a server, in cloud storage, or on a removable storage device, to use however you’d like. These documents can be viewed with any PDF viewer, and we can provide copies on multiple hard drives or flash drives to create backups.
Anderson Archival also offers assistance creating a website or digital catalog, so you can showcase your library and broadly share it. We’ll work with you to build a site that you love and one that will display your documents beautifully.
What Can I Do with My Digitized Historical Documents?
Once your project is complete, you’ll have a beautiful digital library at your fingertips. Do you want to digitize your documents so you can use them for research purposes? You’ll be able to query all of the OCR’d text, search for phrases and metadata with accuracy, zoom in on the images, and print the documents.
There are a lot of steps in the historical document digitization process. Ready to learn more?