A great way to improve recognition results in general is to use a dropout form whenever possible. These forms are printed in an ink color (usually red) that the scanner can “drop out” during image capture. What you’re left with is a clean white page containing only the entered data, neatly positioned.
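As a rough illustration of why red dropout works: if the page is effectively imaged under red light, red ink reflects about as much as white paper and disappears, while dark ink stays black. The minimal sketch below simulates this by binarizing on the red channel alone. The 8-bit RGB input, the `red_dropout` function name, and the threshold of 128 are all assumptions for illustration; real scanners perform dropout with their own lamp, filter, or driver settings.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # assumed 8-bit (R, G, B)

def red_dropout(image: List[List[Pixel]], threshold: int = 128) -> List[List[int]]:
    """Binarize using only the red channel: 1 = black (ink kept), 0 = white.

    Red ink has a high red value, so it lands on the white side of the
    threshold and drops out; dark ink stays black.
    """
    return [[1 if r < threshold else 0 for (r, _g, _b) in row] for row in image]

WHITE = (255, 255, 255)
RED = (230, 40, 40)    # dropout-ink form line
BLACK = (20, 20, 20)   # handwriting in dark ink

scan = [
    [WHITE, BLACK, WHITE],
    [RED,   BLACK, RED],   # red form line crossing a handwritten stroke
]
print(red_dropout(scan))  # [[0, 1, 0], [0, 1, 0]]
```

Only the handwritten stroke survives. Note that an ink with a low red value, such as blue, would not drop out under this scheme, which is why the dropout color has to match the scanner.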
Dropout forms are not always an option, but advanced Intelligent Document Recognition (IDR) software can mimic their behavior with a feature known as “template removal.” Using a blank form as the template, the IDR software compares the blank form with the registered image and removes the pixels that appear in both. We will demonstrate this using Parascript’s IDR software, FormXtra.
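The comparison described above can be sketched in a few lines. This is not FormXtra’s implementation: it is a minimal Python sketch, assuming both images are already registered (aligned) and binarized into same-sized bitmaps where 1 is a black pixel. The `remove_template` function and its `tolerance` parameter are hypothetical; the tolerance treats a pixel as belonging to the template if any template pixel lies within that many pixels, a crude allowance for slight misalignment.

```python
from typing import List, Tuple

Bitmap = List[List[int]]  # 1 = black pixel, 0 = white; assumed already registered

def remove_template(filled: Bitmap, template: Bitmap,
                    tolerance: int = 0) -> Tuple[Bitmap, Bitmap]:
    """Return (cleaned, removed) bitmaps.

    A black pixel in `filled` is removed when the blank `template` has a
    black pixel within `tolerance` pixels of it; otherwise it is kept.
    """
    h, w = len(filled), len(filled[0])
    cleaned = [[0] * w for _ in range(h)]
    removed = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not filled[y][x]:
                continue  # white pixel: nothing to keep or remove
            near_template = any(
                template[yy][xx]
                for yy in range(max(0, y - tolerance), min(h, y + tolerance + 1))
                for xx in range(max(0, x - tolerance), min(w, x + tolerance + 1))
            )
            if near_template:
                removed[y][x] = 1   # pixel is part of the printed form
            else:
                cleaned[y][x] = 1   # pixel is entered data
    return cleaned, removed

template = [[0, 0, 0, 0],
            [0, 0, 0, 0],
            [1, 1, 1, 1]]   # blank form: a single printed line
filled = [[0, 1, 0, 0],
          [0, 1, 0, 0],
          [1, 1, 1, 1]]     # same form with a stroke written onto the line
cleaned, removed = remove_template(filled, template)
print(cleaned)  # [[0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
print(removed)  # [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
```

The `removed` bitmap is analogous to the green feedback FormXtra shows for what was taken out before recognition. Note that the stroke pixel where the handwriting crosses the line is removed too, because it appears in both images.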
First, we have template removal enabled for the “First Name” field:
In the recognition box at the bottom of the image, you can see that the line below the name “Christopher” is shown in green. This is feedback from FormXtra telling you what was removed from the field snippet before recognition. Compare this to the results without template removal:
Without template removal, we would have to capture the data differently, drawing the field so that it does not include the line printed on the form:
This approach is less flexible because we cannot capture any text written below the line. It also affects recognition confidence, as you can see in the result above.
Template removal is enabled field by field, and the setting is located in the property pane of each field:
One important thing to note before using template removal is that the template you provide to FormXtra during the Registration portion of Form Definition creation must be a blank form. If you use a form that contains data, you will drop out parts of the data you actually want to capture.
Don’t have a blank form? Not a problem: Form Definition Studio provides tools for manually cleaning up forms. Just add any captured image as a template to your Form Definition:
Now switch to the “Image” tab at the top of the ribbon menu and use the “White zone” tool to cover up the handwritten data:
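Conceptually, the “White zone” tool paints selected rectangles of the template image white. The sketch below mimics that on a binarized bitmap (1 = black, 0 = white). The `apply_white_zones` function and the (left, top, width, height) zone format are hypothetical illustrations, not Form Definition Studio’s API.

```python
from typing import List, Tuple

Bitmap = List[List[int]]          # 1 = black, 0 = white
Zone = Tuple[int, int, int, int]  # hypothetical (left, top, width, height) in pixels

def apply_white_zones(image: Bitmap, zones: List[Zone]) -> Bitmap:
    """Return a copy of `image` with every zone painted white (set to 0).

    Zones that extend past the image edges are clipped to the image.
    """
    out = [row[:] for row in image]  # copy so the captured image is untouched
    h = len(out)
    w = len(out[0]) if out else 0
    for left, top, width, height in zones:
        for y in range(max(0, top), min(h, top + height)):
            for x in range(max(0, left), min(w, left + width)):
                out[y][x] = 0
    return out

captured = [[0, 1, 1, 0],
            [0, 1, 0, 0],
            [1, 1, 1, 1]]   # handwriting in rows 0-1 above a printed line in row 2
blank = apply_white_zones(captured, [(0, 0, 4, 2)])
print(blank)  # [[0, 0, 0, 0], [0, 0, 0, 0], [1, 1, 1, 1]]
```

Here the zone covers only the two rows of handwriting, so the printed line in the bottom row survives as part of the template. A zone that also covered that row would erase part of the form itself, and template removal would then no longer strip that part of the line from filled forms.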
Be careful to remove only the data that was handwritten or typed into a field. Once you’ve removed the data from every field you want to use template removal on, you’re ready to start improving your recognition results while also handling a wider range of scenarios.