Though the OCR software market is far from kaput, OCR on its own rarely gets business users a home run.
Data Extraction: First and Second Base
More paper documents than ever are generated daily to ensure physical signatures for legal admissibility, to accommodate existing field-to-office workflows, and to satisfy an ongoing partiality to handwritten annotation. Contrary to the crystal-ball readers who predicted the demise of paper, and therefore an end for OCR, the demand for OCR is actually growing in many industry sectors.
No matter how advanced systems become, people like paper. For now, traditional paper flows and documents born digitally must work together. To be leveraged by the business, many document types benefit from being transformed by OCR software into digital formats that are shared across the enterprise.
Some areas within businesses, such as accounting, have as little as 19 percent OCR adoption. Other areas, such as check and claims processing, have adoption rates in the high 80 percent range. Still, the average for OCR software adoption enterprise-wide, by many accounts, is a mere 30 percent. Historically, this selective adoption has had two causes: OCR software solutions have been prohibitively expensive to adopt, integrate, and maintain, or the value of the OCR data has been difficult to realize because the data could not be used accurately within operations.
As with every product solution that’s been around a while, OCR software has become affordable, especially when it’s replacing manual data entry. OCR offers higher throughput and accuracy at a lower cost than manual data entry.
Data Extraction: Home Run
Unfortunately, OCR software alone is inadequate for businesses that want to use the data to efficiently process transactions, organize their documents for better control and governance, search important documents quickly and easily, access the right data for decision making, and find the content necessary to support the business. OCR software supplies text and numbers devoid of context. This data might prove useful for a full-text search. However, as so many businesses have already realized, full-text search is insufficient and fails to provide a basis for knowledge management and information governance.
Using advanced technologies on top of OCR is the only way to properly identify the document type and then accurately locate, extract, and validate the data. New, advanced data extraction solutions move beyond OCR. The best solutions build on a strong OCR base and then leverage machine learning, content and image pattern recognition, and automated classification to provide a solution that more than satisfies the business users’ needs.
Hitting Home Runs
While OCR is a fundamental prerequisite for many automation needs that involve documents, the output only provides raw text. Analysis of any industry demonstrates the need to go beyond simple OCR to satisfy business users’ requirements for efficient and relevant data location and extraction. Highly accurate OCR software provides the initial technology base upon which more advanced and value-added solutions are built.
Advanced document classification, data location, and extraction make the data immediately available to other tasks (such as audits, managing process deviations, and alerts) by taking key data from forms, even handwritten information, and applying business rules to the extracted data. Advanced capabilities include:
- Locating initial and signatory areas (using keywords such as “initials” or “signature”).
- Determining the presence of entries for specific fields.
- Ensuring that pages that meet the rules are classified complete and that pages that do not meet the rules are classified incomplete.
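The three capabilities above can be illustrated with a minimal sketch. It assumes the OCR step has already turned a page, including any handwriting, into lines of text. The names `ANCHOR_PATTERN`, `check_page`, and `classify_page` are hypothetical, as is the rule that a page is complete only when every located area has an entry. A production solution would rely on layout and image analysis rather than a single regular expression.

```python
import re
from dataclasses import dataclass

# Hypothetical anchor keywords; a real form would define its own.
ANCHOR_PATTERN = re.compile(r"\b(initials|signature)\b\s*:?\s*(.*)", re.IGNORECASE)


@dataclass
class FieldCheck:
    keyword: str     # which anchor keyword was found
    entry: str       # text recognized after the keyword
    has_entry: bool  # True when something other than filler was recognized


def check_page(ocr_lines):
    """Locate initial/signature areas and test whether each has an entry."""
    checks = []
    for line in ocr_lines:
        match = ANCHOR_PATTERN.search(line)
        if match:
            # Strip blank-line filler such as underscores and dots.
            entry = match.group(2).strip(" _.")
            checks.append(FieldCheck(match.group(1).lower(), entry, bool(entry)))
    return checks


def classify_page(ocr_lines):
    """Assumed rule: complete only if at least one area exists and all are filled."""
    checks = check_page(ocr_lines)
    if checks and all(c.has_entry for c in checks):
        return "complete"
    return "incomplete"


signed_page = ["Signature: J. Smith", "Initials: JS"]
unsigned_page = ["Signature: ______", "Initials: JS"]
print(classify_page(signed_page))
print(classify_page(unsigned_page))
```

Keeping the location step (`check_page`) separate from the business rule (`classify_page`) means the rule can change, for example to require only a signature, without touching how the areas are found.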