A lot of attention in the document capture industry is focused on document classification, field recognition, and full-text recognition. But little attention is paid to the real mission of capture solutions: providing high-quality data to businesses. Key to this mission is the data quality assurance process. There is no such thing as a fully straight-through data capture process, even for highly structured electronic information. Solid, business-grade systems always provide processes to ensure the data coming out of one system is valid before shuttling it to another.
But what about capture software? In a world where machine-print recognition approaches 99% accuracy on structured forms, it would appear that data quality isn't an issue at all. True? Not on your life. Even where businesses can achieve 99% accuracy, they would be foolish to run the system as a black box without verifying those accuracy rates. And businesses with high volumes of structured data would still see errors. Take that high-water mark of 99% field-level accuracy. If a company processes 10 million claims forms per year and each form has 35 fields of data, that translates to a total of 350 million fields and the potential for 3.5 million field errors. All of a sudden, a 99% accuracy rate doesn't seem so good.
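The arithmetic is easy to check. Here is a minimal back-of-the-envelope sketch in Python; the volume, field count, and accuracy figures are the illustrative numbers from the example above, not measurements from any particular system:

```python
# Back-of-the-envelope estimate of expected field errors at a given
# field-level accuracy, using the illustrative figures from the text.
forms_per_year = 10_000_000   # claims forms processed annually
fields_per_form = 35          # data fields captured per form
field_accuracy = 0.99         # 99% field-level recognition accuracy

total_fields = forms_per_year * fields_per_form
expected_errors = total_fields * (1 - field_accuracy)

print(f"Total fields:    {total_fields:,}")        # 350,000,000
print(f"Expected errors: {expected_errors:,.0f}")  # 3,500,000
```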
What to do? Validate the data that comes out of the system. Most capture software does this through data validation workflows, where documents that fall below a certain accuracy confidence threshold are sent for review. In this case, a person charged with reviewing data opens an application and is presented with documents that have potential errors. If the data is valid, they approve it and move on. If the data is not valid, they correct it using a template form.
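The routing logic behind such a workflow is conceptually simple. Below is a minimal sketch, assuming each extracted field carries a recognition-confidence score from the capture engine; the field names, sample values, and the 0.95 cutoff are hypothetical and would be tuned per deployment:

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.95  # hypothetical cutoff; tuned per deployment


@dataclass
class Field:
    name: str
    value: str
    confidence: float  # recognition confidence from the engine, 0.0 to 1.0


def route_document(fields: list[Field]) -> str:
    """Route a document to human review if any field falls below threshold."""
    low_confidence = [f for f in fields if f.confidence < CONFIDENCE_THRESHOLD]
    return "human_review" if low_confidence else "straight_through"


# Example: one high-confidence field and one likely misread.
doc = [
    Field("claim_number", "A-10293", confidence=0.99),
    Field("date_of_service", "2O14-03-12", confidence=0.71),
]
print(route_document(doc))  # -> human_review
```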
The majority of solutions take a page-centric approach that displays the entire document image and highlights the fields that require validation. The user scans the document and tabs through the field templates. Once complete, they move to the next document.
This page-centric approach has two major shortfalls.
The first is speed: visually scanning a document takes time, and users must move back and forth between the image and the data fields. Even in solutions that aid the user with zoom-in capability, this is time-consuming and error-prone.
The second is much more serious: data security. If a user is presented with the entire image and that image contains personally identifiable information, then only staff with the appropriate level of clearance can validate the data. This raises costs and increases the risk of a security breach. For financial and health-related documents, breaches of data security can carry criminal liability.
So is there a better way to manage data quality while ensuring both efficiency and data security?
Click here to read Data Quality: Transactional Data Validation – Part 2.
To learn the latest advancements in capture technology, download the white paper “Intelligent Document Recognition (IDR) Advanced Technology for Increased Productivity.”