There's a lot of confusion around how to judge the accuracy of intelligent character recognition (ICR). Creating a metric is critical, as it helps us define the business case. To that end, Parascript engines produce an internal metric called the confidence value. And while this metric is critical to defining the business case, it is really a developer's metric, and it is often misleading, particularly to the end user. The number you need to know is the operating point. [edit: formerly referenced in this post as "working point"]
In this post we'll help you understand what the operating point is and why it is so critical.
Confidence value: a critical yet irrelevant number for your business case
In order to perform recognition or verification, multiple software engines evaluate an image and provide a response based on their individual areas of expertise. These responses all have different levels of scoring, and these scores interweave through a very complex algorithm into an output that is called, somewhat confusingly, confidence. As humans we immediately associate a confidence score of 90 with 90% correctness. The software doesn't see it that way. The score is simply a grade impacted by many different variables, which can include lighting, scanning equipment, pen stroke, paper type, etc. These are all specific to the application at hand.
To the software, then, a confidence score of 90 is of little real value until you compare it against a large dataset (thousands of samples). This comparison is what creates the operating point.
Operating point: the only number that really matters
In recognition the operating point is the critical number. It builds the business case, establishes the ROI, and sets the benchmark to measure against. It is composed of two numbers: read rate and error rate. In the example below, we show an operating point of 85% read rate @ 1% error rate. This means that out of 100 documents, software will successfully read 85 and is likely to produce an error on 1. The other 15 documents would be considered unreadable and passed to a human for review. For comparison, a human typically produces errors at a 3% rate.
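The arithmetic behind the example above can be sketched in a few lines of Python. This is an illustration only; the function name and data layout are ours, not part of any Parascript API.

```python
def operating_point(total_docs, auto_read, errors):
    """Return (read_rate, error_rate) as percentages over all documents.

    auto_read: documents the software read without human review
    errors: documents the software read but got wrong
    """
    read_rate = 100.0 * auto_read / total_docs
    error_rate = 100.0 * errors / total_docs
    return read_rate, error_rate

# The example from the text: 85% read rate @ 1% error rate.
read_rate, error_rate = operating_point(total_docs=100, auto_read=85, errors=1)
print(f"{read_rate:.0f}% read rate @ {error_rate:.0f}% error rate")
manual_review = 100 - read_rate  # the remaining 15% go to a human
```

Note that both rates are expressed over the whole document set, which is why "85% read rate @ 1% error rate" leaves 15 of 100 documents for human review.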
Additionally, we can't assume that humans will make errors on the same documents that software does, so you can see through this example that the way to produce the most accurate and most efficient results is to blend your usage of software and trained personnel.
The operating point is the bottom-line number you must stay focused on.
From confidence values to operating point: closing the loop
If you really want to know how to find an operating point, you need to make use of confidence values. And that means samples: the more the merrier. The samples need to include truth data, a known set of right answers input by humans, which should be double-verified for accuracy. Then, to select the operating point, a data specialist will (among other things) run the images through recognition, sort the confidence values from largest to smallest, and compare the results against the truth data. At this point, a clear line emerges in the data where confidence values take on meaning: above it, higher confidence values correspond to a better ratio of correct answers to incorrect answers than lower ones. The sweet spot, a balance between read rate and error rate, becomes your operating point, and it can be adjusted to your organization's tolerances.
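The selection procedure described above can be sketched as a threshold search. This is a minimal illustration under assumed inputs: each sample is a (confidence, is_correct) pair, where is_correct comes from comparing the recognition result against the truth data; the function name and toy numbers are hypothetical.

```python
def choose_operating_point(samples, max_error_rate):
    """Sort samples by confidence, descending. Each confidence value is a
    candidate threshold: accept everything at or above it, then compute
    read rate and error rate over the whole set. Return the best read
    rate whose error rate stays within tolerance."""
    samples = sorted(samples, key=lambda s: s[0], reverse=True)
    total = len(samples)
    best = None
    errors = 0
    for accepted, (conf, is_correct) in enumerate(samples, start=1):
        if not is_correct:
            errors += 1
        read_rate = accepted / total
        error_rate = errors / total
        if error_rate <= max_error_rate:
            best = (read_rate, error_rate, conf)
    return best  # (read_rate, error_rate, threshold), or None

# Toy dataset: errors cluster at low confidence, as they typically do.
data = [(95, True), (92, True), (90, True), (88, True), (85, True),
        (80, True), (72, False), (66, True), (60, False), (50, False)]
print(choose_operating_point(data, max_error_rate=0.10))
# -> (0.8, 0.1, 66): accept confidence >= 66 for 80% read rate @ 10% error rate
```

Note how accepting one lower-confidence sample (66) after an error (72) still improves the result: the read rate rises while the error rate holds, which is exactly the trade-off the specialist is balancing.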
Conclusion
Achieving good recognition results is a combination of many different factors that need to be assessed over a large statistical dataset. This dataset is used to produce the operating point, which is the bottom line for developing a successful business case and implementing recognition or signature verification.
To learn more about how recognition technology works, download our white paper: Automated Handwriting Recognition: Not All ICR Software is Created Equal.