Historical document digitization has been part of preservation plans for decades, but the available technology has improved dramatically. Optical character recognition (OCR) makes the text of a scanned image searchable, so your collection can become a functional research tool for anyone using your digital library.
Converting historical scans into usable research tools or online databases can be a complicated process. Specialized training and high-end scanners may not seem necessary for every collection, but entrusting your historical materials to an experienced firm is almost always the best option for their long-term welfare. The Federal Agencies Digital Guidelines Initiative (FADGI) 2022 guidelines note that “a project with inadequate resources may produce results with little or no long-term value, and could even result in damage to or loss of the collections.” Depending on the condition of your digital collection, outsourcing OCR services may be more cost-effective than handling the work in-house.
Bringing Out the Text from Scanned Images
Before starting any digitization project, it’s vital to know the quality of your images. Older image files, or those created with lower-quality equipment, may no longer be suitable for preserving a collection for the future. The FADGI guidelines set specific quality standards that digital scans must meet to be considered suitable for OCR or other information processing techniques.
Many early scans lack the resolution or clear detail that OCR software needs to read text. In that case, any attempt to use image-to-text conversion requires new scans made with updated equipment. That can get expensive if your organization doesn’t already have scanners or digital cameras capable of producing images of sufficient quality, or the staff to perform the scans. Hiring a company that offers both scanning and image-to-text conversion services may be more economical than buying expensive equipment.
Why Should You Outsource OCR Services?
If your scans are of suitable quality or need only minimal adjustments, you can begin converting them to readable text right away. OCR software is available for purchase, but before you assign a single employee to days of conversion work, consider this warning from the FADGI guidelines: “Resist the temptation to dive into a digitization project without proper resources. . . . avoid the trap of assuming doing the work in-house will cost less. Insourcing may cost more than outsourcing.”
Even if your digitization project doesn’t need new scanners or digital cameras, outsourcing OCR services can still be beneficial. For all that OCR software can do, it still reads text like a computer, and that can mean many errors in the conversion process. If your project requires a reasonably accurate rendering of the text, a trained employee must verify the potential errors the software flags. If your project requires a high level of accuracy, another human pass may be needed to compare the text against the scanned image manually, word for word. Each of these steps increases the employee time you must devote to the project.
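To illustrate how software might flag potential errors for a reviewer, here is a minimal Python sketch. It assumes the OCR engine reports a per-word confidence score on a 0 to 100 scale (many engines report some kind of confidence, but scales vary); the `OcrWord` class, the `flag_for_review` function, the sample words, and the threshold of 80 are all hypothetical choices for illustration, not features of any particular OCR product.

```python
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    # Hypothetical: a 0-100 confidence score assumed to come from the OCR engine.
    confidence: float


def flag_for_review(words, threshold=80.0):
    """Return the words whose confidence falls below the threshold (hypothetical cutoff)."""
    return [w for w in words if w.confidence < threshold]


# Made-up sample output for one line of a scanned page.
page = [
    OcrWord("Resolved", 96.5),
    OcrWord("tbat", 41.0),
    OcrWord("the", 98.2),
    OcrWord("Counci1", 62.3),
]

flagged = flag_for_review(page)
for w in flagged:
    # Each flagged word would go to a trained employee for verification.
    print(f"Check '{w.text}' (confidence {w.confidence})")
```

A threshold only catches words the engine itself is unsure about. A word misread with high confidence passes through unflagged, which is one reason high-accuracy projects add a word-for-word comparison against the scanned image.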
Anderson Archival’s historical digitization services provide you with staff already proficient in this process. Our employees perform the same tasks with better resources and without the downtime of learning new software or which errors to watch for. In the long run, this can save your organization both money and resources.