How Does Digitizing Historical Documents Work?
The historical document digitization process starts by carefully scanning each page of the document. Resolution, lighting, and accuracy are all important factors for this step. Large format items such as maps and legal documents are placed on specialty flatbed scanners. After scanning, the images are carefully inspected and quality assured.
Quick-scan companies feed documents through machines at a high rate and are liable to overlook pages that are stuck together, torn, or missing from the collection. That is where we differ. Every single page scanned by Anderson Archival is quality assured by multiple trained archivists. Our team of professionals makes sure that not one page is skipped and that each scan meets our expectations and requirements, and yours, as a living digital copy of the original.
Collection owners then have a decision to make. Do you want the scans to go through our cleanup process, or would you rather present the images as-is? If cleanup is desired, each page is carefully studied by human eyes from top to bottom, and from one character to the next if that’s what your specifications require. Once flaws are documented, our team has several restoration options.
The first is to clean up the digital document. Are there specks? Stains? Rips or wrinkles? We can clean most of those away for a spotless final image. Is there partial, missing, faded, or damaged text that you’d like replaced? Usually we can rectify those problems, too. Again, everything is quality assured by multiple archivists.
OCR and Verification
Next is optical character recognition (OCR). This is the meat of how information is digitized. After software has optically recognized the text, a human verifies the low-confidence characters and then double-checks each correction. Sometimes a project requires additional checking, or even word-by-word proofing, which can be done on multiple levels. Do you want the text proofread? Or do you want to ensure every single word is exactly as it appears in the original document? For most projects, a simple OCR verification pass is enough.
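To make the verification step concrete, here is a minimal sketch of how low-confidence OCR output might be queued for human review. It is an illustration only, not Anderson Archival’s actual tooling: the OcrToken structure, the 0.90 threshold, and the sample words are all assumptions, and real OCR engines report confidence in their own formats.

```python
from dataclasses import dataclass


@dataclass
class OcrToken:
    text: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def flag_for_review(tokens, threshold=0.90):
    """Return (index, token) pairs whose confidence is below the threshold."""
    return [(i, t) for i, t in enumerate(tokens) if t.confidence < threshold]


def apply_corrections(tokens, corrections):
    """Return a new token list with human corrections applied.

    `corrections` maps a token index to the verified text. Corrected
    tokens are marked with confidence 1.0 because a person checked them.
    """
    return [
        OcrToken(corrections[i], 1.0) if i in corrections else t
        for i, t in enumerate(tokens)
    ]


# Hypothetical OCR output for one line of a scanned page.
tokens = [
    OcrToken("Anno", 0.98),
    OcrToken("Dornini", 0.62),  # "m" misread as "rn", reported with low confidence
    OcrToken("1776", 0.95),
]

flagged = flag_for_review(tokens)
corrected = apply_corrections(tokens, {1: "Domini"})
print(" ".join(t.text for t in corrected))
```

Raising the threshold sends more words to a human reviewer, which is roughly the difference between a simple verification pass and word-by-word proofing.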
Preserving historical materials isn’t complete until you have a way to find and view your documents. Metadata is the information the computer uses to organize your files, and it helps you search quickly and easily across documents. For instance, once metadata is embedded in the digital documents, you will be able to search for any part of a document you want, whether it’s text on the page or a more general attribute such as an author, date, or subject shared by multiple documents.
Metadata can include dates, page numbers, chapter titles, customized watermarks, and much more.
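The following sketch shows, under simplified assumptions, how metadata and page text can be searched together. The DocumentRecord fields, the search function, and the sample records are hypothetical and exist only for illustration; a production library would normally rely on a search index rather than a loop over records.

```python
from dataclasses import dataclass


@dataclass
class DocumentRecord:
    filename: str
    author: str
    year: int
    subjects: list
    text: str  # OCR text of the document


def search(records, phrase=None, author=None, year=None, subject=None):
    """Return records that match every criterion that was supplied."""
    results = []
    for r in records:
        if phrase is not None and phrase.lower() not in r.text.lower():
            continue
        if author is not None and author.lower() != r.author.lower():
            continue
        if year is not None and year != r.year:
            continue
        if subject is not None and subject.lower() not in [s.lower() for s in r.subjects]:
            continue
        results.append(r)
    return results


# Hypothetical sample library.
library = [
    DocumentRecord("letter_001.pdf", "J. Smith", 1862, ["correspondence"],
                   "We crossed the river at dawn."),
    DocumentRecord("ledger_014.pdf", "A. Jones", 1862, ["accounts"],
                   "Received payment for river freight."),
    DocumentRecord("letter_002.pdf", "J. Smith", 1871, ["correspondence"],
                   "The harvest was poor this year."),
]

by_text = search(library, phrase="river")
by_author_and_year = search(library, author="J. Smith", year=1862)
print([r.filename for r in by_text])
print([r.filename for r in by_author_and_year])
```

Searching on text alone finds every document that mentions a phrase, while adding an author or year narrows the results to the documents that share that metadata.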
There are a few ways to organize your preserved historical materials. One way is to store them on a server, in cloud storage, or on a removable storage device, to use however you’d like. These documents can be viewed with any PDF viewer, and we can provide copies on multiple hard drives or flash drives to create backups.
Anderson Archival also offers custom website building, so you can showcase your library and broadly share it. We’ll work with you to build a site that you love, and one that will display your documents beautifully.
Cloud storage is essential for backup, especially for large libraries. Anderson Archival also offers customized cloud services so you’ll know your digitized historical documents are safe forever, no matter what happens to your physical backups.
What Can I Do With My Digitized Historical Documents?
Once your project is complete, you’ll have a beautiful digital library at your fingertips. Do you want to digitize your documents so you can use them for research purposes? You’ll be able to query all of the OCR’d text, search for phrases and metadata with accuracy, zoom in on the images, and print the documents.