Image quality remains an ongoing challenge for our financial clients. In benchmarking our data verification and check fraud prevention projects, it’s easy to demonstrate that image-based processing is highly successful. However, the very small percentage of poor-quality images can cost financial institutions millions of dollars to research and correct. A recent white paper from Digital Check (available for download on the website) offers some good insights, with an in-depth review of the costs of poor image quality and the types of errors that are present.
The problem goes beyond check recognition, especially when it is necessary to classify documents and extract data from them. Regardless of the application, it is imperative that businesses undertaking a data extraction project pay close attention to the quality of the input images. Aspects like size, contrast, and the presence of streaks or other artifacts all increase the operating costs of data extraction, because a poor image usually forces the organization to treat the document as an exception and have a person review it manually. Scanned documents work best at a resolution in the 200 to 300 DPI range. Color scans can offer more information than grayscale or black and white.
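To make the idea of an input quality gate concrete, here is a minimal sketch in Python. It is not taken from any particular product. It assumes the scan is already loaded as rows of grayscale values and that the scanner reports its DPI. The 200 to 300 DPI range comes from the text above; the function name, the contrast threshold, the ink cutoff, and the streak rule are hypothetical values chosen only for illustration.

```python
from typing import List

GrayImage = List[List[int]]  # rows of grayscale values, 0 = black, 255 = white

MIN_DPI, MAX_DPI = 200, 300  # range mentioned in the article
MIN_CONTRAST = 80            # hypothetical: minimum spread between dark and light pixels
INK_CUTOFF = 128             # hypothetical: values below this count as "dark"
STREAK_FRACTION = 0.9        # hypothetical: a column this dark is treated as a streak


def quality_issues(image: GrayImage, dpi: int) -> List[str]:
    """Return the reasons a non-empty image should be routed to manual review."""
    issues = []

    # Resolution outside the recommended range.
    if not MIN_DPI <= dpi <= MAX_DPI:
        issues.append("resolution")

    # Contrast: spread between the 5th and 95th percentile of pixel values.
    values = sorted(v for row in image for v in row)
    low = values[int(0.05 * (len(values) - 1))]
    high = values[int(0.95 * (len(values) - 1))]
    if high - low < MIN_CONTRAST:
        issues.append("contrast")

    # Vertical streak: a column that is dark over almost its full height.
    height = len(image)
    for x in range(len(image[0])):
        dark = sum(1 for row in image if row[x] < INK_CUTOFF)
        if dark >= STREAK_FRACTION * height:
            issues.append("streak")
            break

    return issues
```

An empty list means the image passes this gate; anything else is a candidate for rescanning before it ever reaches data extraction. Real thresholds would need to be tuned against your own documents and scanners.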
There is another side of image quality that often goes unnoticed. In some circles, this is called “usability,” and it goes well beyond image quality analysis to include both the legibility of the document and the presence of actual data. Let’s take data presence first.
Missing Data
For documents such as contracts, the absence of data may affect a document’s legal status, depending on the scenario. It is critical that these problems are detected before any business process actually starts. For applications such as remote deposit capture, the last thing any bank wants is to accept a check that is missing important data such as the amount, date, or endorsement. Data presence detection can be implemented as a completely automated, software-based process that acts as a gatekeeper and enforces these types of requirements. Essentially, the software takes a candidate image and locates the regions where specific data is expected. It then analyzes each region. Some software only detects the presence of something (which could be a smudge) using pixel analysis. Other, more sophisticated software distinguishes artifacts that happen to be in the region from the actual data that is supposed to be there. This capability reduces the number of false positives that plague transactional data extraction processes.
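The difference between plain pixel analysis and a slightly smarter check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any specific product works: the field location is assumed to be known already, the function names and thresholds are hypothetical, and counting connected blobs of ink is only a crude stand-in for the more sophisticated analysis described above. The idea is that a single solid blob is more likely a smudge, while real data usually consists of several separate strokes.

```python
from typing import List, Tuple

GrayImage = List[List[int]]       # rows of grayscale values, 0 = black
Box = Tuple[int, int, int, int]   # left, top, right, bottom (right/bottom exclusive)

INK_CUTOFF = 128         # hypothetical: values below this count as ink
MIN_INK_FRACTION = 0.02  # hypothetical: less ink than this means "nothing there"
MIN_COMPONENTS = 2       # hypothetical: real data should have several strokes


def ink_mask(image: GrayImage, box: Box) -> List[List[bool]]:
    """Crop the expected field region and mark which pixels are ink."""
    left, top, right, bottom = box
    return [[image[y][x] < INK_CUTOFF for x in range(left, right)]
            for y in range(top, bottom)]


def count_components(mask: List[List[bool]]) -> int:
    """Count 4-connected blobs of ink with an iterative flood fill."""
    height = len(mask)
    width = len(mask[0]) if mask else 0
    seen = [[False] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if not mask[y][x] or seen[y][x]:
                continue
            count += 1
            seen[y][x] = True
            stack = [(y, x)]
            while stack:
                cy, cx = stack.pop()
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
    return count


def field_status(image: GrayImage, box: Box) -> str:
    """Classify a field region as 'missing', 'suspect', or 'present'."""
    mask = ink_mask(image, box)
    total = sum(len(row) for row in mask)
    ink = sum(sum(row) for row in mask)
    if total == 0 or ink / total < MIN_INK_FRACTION:
        return "missing"    # pixel analysis alone: almost no ink in the region
    if count_components(mask) < MIN_COMPONENTS:
        return "suspect"    # one solid blob: possibly a smudge, not data
    return "present"
```

A gatekeeper built this way would reject "missing" fields outright and could send "suspect" ones to review, instead of accepting any region that merely contains dark pixels.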
Legible, Quality Data
Now for legibility. Determining the presence of needed data is one challenge, but it is considerably more complicated to add an extra level of analysis that estimates the probability of data extraction success. The objective is to forward only those images whose data has a high probability of being extracted successfully. Any data that is poorly printed or poorly handwritten will fail and either be returned or treated as an exception. The key point is that the recipient will know before the process starts that something needs to be done, rather than having a document (e.g., a check or contract) go through a process and then get rejected. This requirement is especially important for hand-printed data, such as the entries on personal checks or the fields completed in contracts or applications. Legibility analysis can review hand-printed and handwritten (cursive) writing and evaluate things such as the spacing between words and the shapes of individual letters to determine whether the data will be recognized successfully, and you don’t have to perform costly recognition to find this out.
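As a rough illustration of the spacing side of legibility analysis, the Python sketch below checks whether the ink in a field breaks into reasonably sized runs with gaps between them, without doing any recognition. It is a simplified heuristic, not a description of a real legibility engine: it assumes the image has been cropped to a single line of writing, it ignores letter shapes entirely, and the function names and the width-to-height ratio are hypothetical.

```python
from typing import List, Tuple

GrayImage = List[List[int]]  # rows of grayscale values, 0 = black

INK_CUTOFF = 128         # hypothetical: values below this count as ink
MAX_RUN_TO_HEIGHT = 1.5  # hypothetical: widest acceptable unbroken run of ink,
                         # as a multiple of the line height


def ink_runs(image: GrayImage) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges that contain ink, end exclusive."""
    width = len(image[0])
    has_ink = [any(row[x] < INK_CUTOFF for row in image) for x in range(width)]
    runs = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                 # a new run of inked columns begins
        elif not ink and start is not None:
            runs.append((start, x))   # a blank column closes the run
            start = None
    if start is not None:
        runs.append((start, width))   # run reaches the right edge
    return runs


def spacing_looks_legible(image: GrayImage) -> bool:
    """True if the line has ink and no run is too wide to separate."""
    runs = ink_runs(image)
    if not runs:
        return False
    height = len(image)
    return all(end - start <= MAX_RUN_TO_HEIGHT * height for start, end in runs)
```

Writing that fails a check like this, for example one long unbroken scrawl, could be flagged before recognition is attempted. A production system would combine many more signals, including the letter-shape analysis mentioned above.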
Successful Data Extraction
While data extraction projects spend a lot of time on the documents themselves and actual recognition of data, it is important to go one step earlier in the process and address your image quality and usability needs to ensure success. For high volume or sensitive transactional data extraction needs, pay a lot of attention to the quality of what goes into the system. Understand the costs of poor images and bad data to your business process. Then, find the solution that is right for you.