“Can you look at this check and tell me why the software fails to recognize the amount field? I can’t see anything wrong with the image. Here is another check image that is much more difficult to read, and the software recognizes the amount field perfectly!”
We get questions like this all the time. The logic seems sound: if a document such as a check looks fine, and a person can easily read the handwritten amount or other data, why can’t the technology do the same? It is even more confusing when the recognition engine does a great job on harder-to-read data. So what gives?
Built-in Complexity of Neural Networks
The answer we often provide is, “it’s complex.” This is not because we are trying to be difficult. It’s because, unlike rules-based systems that simply apply OCR to a field, Parascript technology is built on more complex machine learning algorithms, most importantly neural networks.
Let’s look at two examples, one for a rules-based approach and one that uses a neural network to get the job done.
Invoice Capture and Recognition Using Rules
Let’s say you have a document processing system for invoices. Invoices are a more complex document type since the data can be located in different places depending upon the layout chosen by the vendor. But because you always want the same information, you can use a rules-based approach such as:
- Locate the words “invoice number”
- Look to the right approximately 200 pixels and find an alphanumeric value
- Extract the value up to 600 pixels
Most of the time this will work, but you will encounter problems along the way. What if the vendor uses “invoice #” or places the invoice number below the label instead of to the right? When you get an error, you can easily see what went wrong and address it. The same applies if you are only capturing part of the invoice number: maybe the value extends further to the right than anticipated, so you extend the range from which the value is extracted. Easy. With a rules-based approach, problems are easy to understand because you can interpret the rules against the errors and adjust them as necessary.
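The three rules above can be sketched in code. This is a hypothetical illustration: the OCR word list, the pixel coordinates, and the `find_invoice_number` helper are invented for the example, not part of any real product API.

```python
import re

# Hypothetical OCR output: each recognized word with pixel coordinates.
ocr_words = [
    {"text": "Invoice", "x": 100, "y": 50},
    {"text": "#", "x": 180, "y": 50},
    {"text": "INV-20231", "x": 320, "y": 50},
    {"text": "Total", "x": 100, "y": 400},
]

def find_invoice_number(words, keywords=("invoice number", "invoice #"),
                        max_right=600, line_tol=10):
    """Rule: locate a keyword, then extract the first alphanumeric
    value to its right, up to max_right pixels away on the same line."""
    for i, word in enumerate(words):
        # Join adjacent words so "Invoice" + "#" matches a two-word keyword.
        phrase = word["text"].lower()
        if i + 1 < len(words):
            phrase = phrase + " " + words[i + 1]["text"].lower()
        if phrase.strip() not in keywords:
            continue
        for cand in words:
            on_same_line = abs(cand["y"] - word["y"]) <= line_tol
            within_range = 0 < cand["x"] - word["x"] <= max_right
            if on_same_line and within_range and re.fullmatch(r"[A-Za-z0-9-]+", cand["text"]):
                return cand["text"]
    return None
```

Handling a vendor that writes “Invoice No.” or puts the value below the label is just a matter of adding a keyword or a second scan direction, which is exactly why debugging a rules-based system is straightforward.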
Invoice Data Extraction with Neural Networks
With neural networks, the interpretability is different. Unlike expert systems-based rules, neural networks have rule sets that are developed through a supervised learning process. You provide the neural network with descriptions of the “features” such as the invoice number that it should extract, and it examines a variety of examples to develop its own inferences on how to accomplish the objective.
Through a process of refining the input and observing the output, the neural network is trained toward optimal performance. Unlike traditional rules, neural networks develop this “knowledge” from carefully curated, statistically representative data sets. It is impossible to actually inspect the resulting rule sets, since they are represented as nothing more than a large collection of weights within the network.
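To make that concrete, here is a minimal, hypothetical sketch of supervised learning: a single artificial neuron trained on labeled examples. The task (learning a logical AND) is deliberately trivial; the point is that what the network learns ends up as numeric weights, not rules you can read the way you read the invoice rules earlier.

```python
def predict(w, b, x):
    """The neuron's entire decision procedure: a thresholded weighted sum."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def train(examples, epochs=20, lr=0.1):
    """Classic perceptron update: nudge the weights after each mistake."""
    w, b = [0.0] * len(examples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in examples:
            err = y - predict(w, b, x)
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# Labeled examples (feature vectors with ground-truth labels): logical AND.
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train(data)
# The learned "rule set" is just a few floating-point numbers in w and b;
# the only way to judge it is to observe its output on test inputs.
```

A real recognition network has millions of such weights rather than three, which is why there is no meaningful way to interpret them directly.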
Neural Networks and Their Output
What you can do is observe the output. With deep learning neural networks, things get even more complex because training doesn’t involve providing characteristics of the features at all. You simply input the data set along with ground truth data: a sample set of documents or images paired with verified extraction results (truth data), which allows you to objectively measure how well your capture engine performs. The neural net does the rest.
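As a sketch, measuring an engine against ground truth can be as simple as comparing each extracted value to the verified value. The document IDs and amounts below are invented for illustration.

```python
# Hypothetical extraction results versus verified truth data.
extracted    = {"doc1": "125.00", "doc2": "98.50", "doc3": "1075.00"}
ground_truth = {"doc1": "125.00", "doc2": "98.50", "doc3": "1875.00"}

def error_rate(results, truth):
    """Fraction of fields where the engine's output disagrees with truth."""
    errors = sum(results.get(doc) != value for doc, value in truth.items())
    return errors / len(truth)

rate = error_rate(extracted, ground_truth)  # one miss out of three fields
```

Note that this measures the engine from the outside only: it tells you how often it is wrong, not why.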
Choose Your Representation Wisely
Sometimes we can identify issues. For instance, if a sample contains data that is not represented in the set used to train the neural network, we can understand why the error occurred and address it by adding a representative set of these new examples and retraining the software. If the error samples show no distinct differences from the training set, then we can only assume that the errors are part of the measured error of the overall system.
Evaluating the Results
So when you are evaluating results from a neural network-based system, it is better to examine the overall error rate across a large volume of production documents than to focus on individual errors. For instance, if the system is measured to provide a 1 percent error rate and you find that a large, representative volume of documents is experiencing a 4 percent error rate, that is something to report. But if you want to understand why one specific check or document cannot be correctly processed, unlike a person, the machine will not give you a reason.
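That evaluation mindset can be sketched as a simple monitoring check: compare the error rate observed on a large production volume against the benchmarked rate, and escalate only when the gap is meaningful. The helper name and the 2x tolerance are illustrative assumptions, not a vendor recommendation.

```python
def should_report(errors, total, benchmark=0.01, tolerance=2.0):
    """Flag the system when the observed error rate on a large volume
    substantially exceeds the benchmarked rate (here, more than 2x)."""
    observed = errors / total
    return observed > benchmark * tolerance

# 4% observed vs. a 1% benchmark across 10,000 documents: worth reporting.
flag_high = should_report(400, 10000)   # True
# 1.2% observed: within normal measured error, not a system-level issue.
flag_ok = should_report(120, 10000)     # False
```

The check works on aggregate volumes precisely because, as noted above, the machine cannot explain any single failure.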
If you found this interesting, check out our latest eBook on data quality: