There are three possible outcomes when a recognition engine attempts to read data: the correct answer, an error, or a reject. This post focuses on understanding errors and rejects and how to find the right balance between them.
Errors are the instances when a recognition engine returns an incorrect result. The problem with errors is that there is no way to tell that a result is in fact an error unless you have a second opinion or external, verified information from a source other than the recognized image.
If you have a big enough set of test images with known actual values, called truth data, you can obtain statistical measurements of how often these errors occur. The statistical estimate for the frequency of errors is called the error rate. You should observe at least several dozen errors for the measured error rate to be statistically significant. That means that if you need to measure an error rate of around 1%, you need at least 5,000 test images (about 50 expected errors at that rate).
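The arithmetic above can be sketched in a small helper. This is a minimal illustration, assuming the simple rule of thumb from the text (a fixed minimum number of observed errors); the function name and the default of 50 errors are my own choices, not from any standard library.

```python
import math

def required_test_images(expected_error_rate, min_errors=50):
    """Minimum number of truth-data images needed so that, at the
    expected error rate, we can expect at least `min_errors` errors."""
    return math.ceil(min_errors / expected_error_rate)

# At a ~1% error rate, collecting 50 errors takes about 5,000 images.
print(required_test_images(0.01))  # → 5000
```

At lower error rates the required test set grows quickly: measuring a 0.1% error rate with the same rule of thumb would already call for 50,000 images.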
Errors are not exclusive to automatic recognition. Humans make errors as well, but human errors are different from automatic recognition errors: many of them are related not to recognition but to typing. Combining automatic recognition with data validation, or keying, reduces error rates significantly. Advanced recognition technology uses voting algorithms, which significantly reduce error rates because the final result is a combination of several engines. Voting has proven to achieve much lower error rates than double keying (data verification done by two separate individuals). But even these algorithms do not eliminate errors completely, and there will be some errors that the solution must handle.
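To make the voting idea concrete, here is a hedged sketch of simple majority voting across engines. Real voting algorithms are more sophisticated (they typically weigh per-engine confidence scores, for example); the function and its `min_agreement` parameter are illustrative assumptions, not the actual algorithm described above.

```python
from collections import Counter

def vote(engine_results, min_agreement=2):
    """Combine outputs from several recognition engines.

    Returns the value that at least `min_agreement` engines agree on,
    or None (a reject) when no value reaches that level of agreement.
    """
    counts = Counter(r for r in engine_results if r is not None)
    if not counts:
        return None
    value, n = counts.most_common(1)[0]
    return value if n >= min_agreement else None

print(vote(["A", "A", "B"]))  # two engines agree → "A"
print(vote(["A", "B", "C"]))  # no agreement → reject (None)
```

Note how disagreement turns into a reject rather than a guess: that is exactly how voting trades potential errors for rejects.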
Rejects are the situations when the engine doesn't provide an answer. A reject may be caused by the inability to process some specific input, or by the need to reduce the error rate to a level the application can tolerate. Rejected items are typically processed manually or require recapturing the input data.
Here's where it gets interesting. There's a tradeoff between errors and rejects. Depending on the application's needs, we can either reduce rejects by accepting a higher error rate, or reduce errors by accepting a higher reject rate.
Reducing rejects by increasing error rates allows companies to process more information automatically without the need for human intervention, which results in lower processing costs. This is appropriate for applications where having data with some errors is not detrimental, such as processing magazine subscription forms.
On the other hand, reducing error rates by increasing reject rates produces more data that must be validated manually, which is more suitable for applications where every error counts or there is very little room for error, such as check processing.
The operating point determines the balance between errors and rejects. Read more about it here.