Recognizing handwriting is not an easy task, whether for a person or for a computer. Think about it: how often have you looked at your own handwriting and been unable to read it?
The development effort required for handwriting recognition software is monumental. Parascript has been working on this technology for over 20 years. We have reached read rates of over 90% for specific applications, but there’s still a lot of work to be done.
Tuning vs Self-Learning
Let’s first explore how recognition software is tuned. Handwriting recognition software does a lot of “thinking” before providing an answer. It uses several engines, each with a different approach, to arrive at this answer. This level of sophistication is achieved by analyzing specific features of handwriting.
Fig. Main handwriting features analyzed by Parascript
Developers first determine which features or characteristics need to be recognized, and then provide them to the tuning mechanism. To do so, they analyze the features, understand what’s important, and determine how to extract them and in what form. Only after this work is done do they provide this information to the training tool. Tuning is part of the development process.
There are many different classifiers that need to be trained, depending on the nature of the features. One classifier, for example, can be used for a particular scenario while another is used for a different one (or both can be used together), depending on the task. Each classifier has its own training tool, which the developer uses.
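To make the idea of using classifiers together more concrete, here is a minimal sketch of one common way to merge their outputs: a weighted average of per-candidate confidence scores. This is an illustration only, not Parascript's actual method. The function name `combine_scores`, the two example classifiers, and all scores are hypothetical.

```python
from collections import defaultdict


def combine_scores(classifier_outputs, weights=None):
    """Merge per-classifier candidate scores into one ranked list.

    classifier_outputs: list of dicts mapping candidate label -> confidence (0..1).
    weights: optional per-classifier weights; equal weighting is assumed by default.
    """
    if weights is None:
        weights = [1.0] * len(classifier_outputs)
    total_weight = sum(weights)

    combined = defaultdict(float)
    for scores, weight in zip(classifier_outputs, weights):
        for label, confidence in scores.items():
            # A candidate missing from one classifier simply contributes 0 there.
            combined[label] += weight * confidence / total_weight

    # Highest combined confidence first.
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical outputs from two classifiers looking at the same handwritten digit.
stroke_scores = {"7": 0.7, "1": 0.3}
shape_scores = {"7": 0.4, "1": 0.5, "2": 0.1}

ranked = combine_scores([stroke_scores, shape_scores])
best_label, best_score = ranked[0]
```

In this example the two classifiers disagree on their top choice, and the combined ranking settles on the candidate with the stronger overall support. Raising the weight of one classifier is a simple way to favor it for a particular scenario.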
As you can see, tuning handwriting recognition software requires a lot of upfront effort. While some capture software provides so-called “self-learning” capabilities, in most cases this capability is closer to guided learning. Furthermore, it is not about improving the recognition itself but about improving the location of the fields to be recognized. Operators add new document samples during production so that the software will recognize the specific template in the future and be able to extract the fields more precisely. The software is not learning per se; it is adding templates to a database.
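The distinction between learning and adding templates can be sketched in a few lines. The example below is a simplified assumption of what such a template database might look like, not a description of any particular capture product. `TemplateStore`, the template ID, and the field coordinates are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateStore:
    """Hypothetical template database: template ID -> field locations."""
    templates: dict = field(default_factory=dict)

    def add_template(self, template_id, field_boxes):
        # An operator registers where each field sits on a new document layout.
        # Boxes are (x, y, width, height) in pixels; the values are illustrative.
        self.templates[template_id] = dict(field_boxes)

    def locate_fields(self, template_id):
        # Known layouts return stored boxes; unknown layouts return None.
        return self.templates.get(template_id)


store = TemplateStore()
before = store.locate_fields("invoice_vendor_a")  # layout not yet known

store.add_template(
    "invoice_vendor_a",
    {"amount": (420, 610, 160, 40), "date": (420, 120, 160, 40)},
)
after = store.locate_fields("invoice_vendor_a")  # layout now known
```

Note that nothing in this sketch changes how characters are recognized. Only the lookup of where to read improves, which is the point made above.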
Software Read Rate Improvements
Read rate improvements are achieved not only by improving the software’s ability to recognize a particular handwriting characteristic, but also through better location of the field (more specific templates or dynamic field location) and by narrowing the recognition context, for example by providing vocabularies and rules.
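Narrowing the recognition context with a vocabulary can be illustrated with a small sketch. It assumes the recognizer has already produced a raw text read, and it uses string similarity from Python's standard `difflib` module as a stand-in for whatever matching a real engine performs. The function `constrain_to_vocabulary`, the city list, and the cutoff value are assumptions made for illustration.

```python
import difflib


def constrain_to_vocabulary(raw_read, vocabulary, cutoff=0.6):
    """Snap a raw recognition result to the closest allowed vocabulary entry.

    Returns the matching entry, or None if nothing is similar enough.
    """
    # Compare case-insensitively but return the entry in its original form.
    lookup = {entry.lower(): entry for entry in vocabulary}
    matches = difflib.get_close_matches(
        raw_read.lower(), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[matches[0]] if matches else None


# Hypothetical vocabulary for a "city" field.
CITIES = ["Chicago", "Boston", "Denver"]

corrected = constrain_to_vocabulary("Chicaga", CITIES)  # one misread character
rejected = constrain_to_vocabulary("Qxzzy", CITIES)     # nothing close enough
```

A read that falls outside the vocabulary is rejected rather than passed through, so it can be routed to an operator for review instead of being accepted as a wrong answer.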
The technology is constantly evolving and Parascript is always looking for new algorithms to improve our recognition. Stay tuned.