Form design is essential to the success of recognition technology. A well-designed form reduces illegible, inaccurate, misinterpreted or missing data, and improves data recognition. You won’t always control the forms you need to handle, but even when you can’t, you can often educate or influence those in charge of the next revision of a given form.
Let’s start with laying out a form. If you want to capture a name, you might be inclined to create a field called “Name:” with a line like so:
Name: ______________________________
Although form processing or document capture software can handle this situation with a “Full Name” field, the design is not optimal for two reasons: recognition performance and result confidence.
Because we have to try to determine the entire name in one pass, the ICR engine must concatenate possible values from the built-in first and last name vocabularies. This is going to be slower and less accurate than comparing just the first name to possible values in the first name vocabulary, and the last name to the last name vocabulary.
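To make the cost difference concrete, here is a minimal sketch in Python. It is not how FormXtra or any particular ICR engine works internally: the vocabularies are tiny hypothetical lists, and the standard library's `difflib` fuzzy matcher stands in for real handwriting recognition. It only illustrates why a single “Full Name” field means matching against every first/last combination, while split fields are each matched against one short list.

```python
from difflib import get_close_matches
from itertools import product

# Hypothetical toy vocabularies; a real engine's built-in lists are far larger.
FIRST_NAMES = ["John", "Joan", "Jane", "Maria"]
LAST_NAMES = ["Smith", "Garcia", "Johnson", "Brown"]


def match_full_name(raw):
    """Match one combined field against every first/last concatenation."""
    candidates = [f"{first} {last}" for first, last in product(FIRST_NAMES, LAST_NAMES)]
    best = get_close_matches(raw, candidates, n=1, cutoff=0.0)[0]
    return best, len(candidates)


def match_split_name(raw_first, raw_last):
    """Match each field against its own, much smaller vocabulary."""
    first = get_close_matches(raw_first, FIRST_NAMES, n=1, cutoff=0.0)[0]
    last = get_close_matches(raw_last, LAST_NAMES, n=1, cutoff=0.0)[0]
    return (first, last), len(FIRST_NAMES) + len(LAST_NAMES)


# "Jchn Smlth" stands in for a noisy reading of handwritten "John Smith".
full, full_comparisons = match_full_name("Jchn Smlth")
split, split_comparisons = match_split_name("Jchn", "Smlth")
print(full, full_comparisons)
print(split, split_comparisons)
```

With four names in each list, the full-name field is compared against 16 candidates and the split fields against 8 in total. The first number grows with the product of the vocabulary sizes and the second with their sum, so the gap widens quickly as the vocabularies get realistic.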
OK, so if not a full name field, then what? Well, your next thought might be to do something like:
First Name: ______________ Last Name: ______________
While this is better than the first example and allows the system to capture the first and last name separately, there is still one problem that we run into with this approach: overrun. We haven’t left much space for the user to write their name, so we’ll often have an issue where the user writes over the following machine print, or in other creative directions:
In the above case, even though Parascript’s FormXtra is able to capture the correct name, the confidence is only 76. If we instead built the form with “First Name:” on one line and “Last Name:” on the next, users with long names would have plenty of space to write, and we could capture the data with more confidence (in this case 24 points higher):
Although I’ve used “Name” as an example above, you should try to apply this principle to address fields as well, by splitting up each part of the address in your form. Again, FormXtra can capture address blocks, but in general, you can expect to have better results when the data is split out.