This is part two in a multi-part series on the advanced capture stack. In this article, we examine the role of OCR in cognitive capture. In part one, we discussed what cognitive capture is and how it compares with traditional document capture solutions.
What role does OCR play within cognitive capture? Is OCR the hard part? Is text parsing the same as cognitive capture? To answer these questions, let’s start with a popular meme.
Aoccdrnig to a rscheearch at Cmabrigde Uinervtisy, it deosn’t mttaer in waht oredr the ltteers in a wrod are, the olny iprmoetnt tihng is taht the frist and lsat ltteer be at the rghit pclae. The rset can be a toatl mses and you can sitll raed it wouthit porbelm. Tihs is bcuseae the huamn mnid deos not raed ervey lteter by istlef, but the wrod as a wlohe.
The meme is not entirely accurate: there is no record of Cambridge University staff doing this research, and there are ample examples of words that become unreadable even when the first and last letters are preserved. Still, it provides a good illustration of the difference between OCR and cognitive capabilities. Simply stated, while OCR certainly has machine learning at its core, its job is simply to transcribe the text in an image into a machine-readable format. If you were to run OCR on an image of the above, you would get the same scrambled text back, misspellings and all.
By design, OCR doesn’t attempt to make corrections because making corrections implies knowledge of the context of the information. If the information above isn’t embedded within an image, you don’t even need OCR to provide a transcription.
Extracting Good Data from Bad
So how do we extract good data from the above meme? First, there isn’t a cognitive solution that can take text in which nearly every word is misspelled and make instantaneous corrections like the human brain can. However, in the domain of advanced capture, most of the effort goes into what to do with the information contained within the document. You might say, “hey, we can run a spell-checker to correct the words.” Nice idea, and this is typically Step 1. And yet, this won’t solve every problem, because many words simply aren’t contained within standard vocabularies.
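To make Step 1 concrete, here is a minimal sketch of a dictionary-based corrector aimed at the kind of scrambling in the meme. It is an illustration only, not how any particular product works. The `signature` key, the helper names, and the tiny `VOCABULARY` list are all assumptions made up for this example.

```python
from collections import defaultdict

def signature(word):
    """Key that ignores interior letter order: first letter + sorted middle + last letter."""
    w = word.lower()
    if len(w) <= 3:
        return w
    return w[0] + "".join(sorted(w[1:-1])) + w[-1]

def build_index(vocabulary):
    """Group vocabulary words by signature so a lookup is a single dict access."""
    index = defaultdict(list)
    for word in vocabulary:
        index[signature(word)].append(word)
    return index

def correct(token, index):
    """Return the vocabulary word for a token only when exactly one word matches.

    Unknown or ambiguous tokens are returned unchanged. The corrected word is
    lowercase because the original casing is not preserved in this sketch.
    """
    candidates = index.get(signature(token), [])
    return candidates[0] if len(candidates) == 1 else token

# Hypothetical toy vocabulary; a real system would load a much larger word list.
VOCABULARY = ["according", "cambridge", "university", "letters", "word", "form", "from"]
INDEX = build_index(VOCABULARY)

def correct_text(text, index=INDEX):
    """Correct a whitespace-separated string token by token."""
    return " ".join(correct(token, index) for token in text.split())
```

With this toy vocabulary, “Cmabrigde Uinervtisy” comes back as “cambridge university”. A token such as “rscheearch” comes back unchanged because nothing in the list matches it. “form” and “from” share the same signature, so spelling alone cannot decide between them.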
Cognitive Capture: Interpretation Methods
Interpretation methods with fancy names such as n-gram models can estimate the probability of the next word or words in a sequence. These techniques are especially useful with complex multi-word data. Using this and other techniques, cognitive capture handles the specialized words and phrases often found in a structured form field or an unstructured document, where specialized vocabularies or general dictionaries fall short.
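As a rough illustration of the n-gram idea, the sketch below trains a bigram (two-word) model on a handful of phrases and uses it to choose between two candidate words based on the word that came before. The function names, the add-alpha smoothing, and the sample phrases are assumptions for this example; production systems use far larger corpora and often longer n-grams.

```python
from collections import Counter, defaultdict

def train_bigrams(sentences):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split()  # <s> marks the sentence start
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def bigram_probability(counts, prev, word, vocab_size, alpha=1.0):
    """Add-alpha smoothed estimate of P(word | prev)."""
    total = sum(counts[prev].values())
    return (counts[prev][word] + alpha) / (total + alpha * vocab_size)

def pick_candidate(counts, prev, candidates, vocab_size):
    """Choose the candidate most likely to follow the previous word."""
    return max(candidates, key=lambda w: bigram_probability(counts, prev, w, vocab_size))

# Hypothetical training phrases, standing in for text seen in a given form field.
CORPUS = [
    "claim form received",
    "claim form attached",
    "payment received from insurer",
    "letter from claimant",
]
COUNTS = train_bigrams(CORPUS)
VOCAB_SIZE = len({token for sentence in CORPUS for token in sentence.split()})
```

Here `pick_candidate(COUNTS, "claim", ["form", "from"], VOCAB_SIZE)` selects “form”, while the same call with “received” as the previous word selects “from”. The surrounding words supply the context that a dictionary lookup lacks.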
Cognitive capture also differs from OCR software in that it attempts to reduce or obviate the use of OCR, using it only when necessary. Just like a human would, these systems learn where the needed information is located and what clues help to find it. Instead of reading an entire document, cognitive capture focuses directly on that information.
Focusing on What Matters Most
So, instead of performing OCR or parsing the entire document text, the system skips over irrelevant information and focuses on specific sections. This avoids unnecessary OCR or full-text parsing that can slow down the entire process. If documents are born-digital, OCR can be skipped altogether and the system can immediately interpret the document and extract the needed data.
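The routing described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not a description of a real product’s internals: the `Page` class, the `text_by_region` field, and the `ocr_region` callable are hypothetical names, and the OCR engine is passed in as a function so that no specific library is implied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels

@dataclass
class Page:
    image: object  # raster data for the page; the type depends on the imaging library
    # Text already available per field; set only for born-digital pages (assumption).
    text_by_region: Optional[Dict[str, str]] = None

def extract_fields(
    page: Page,
    regions: Dict[str, Box],
    ocr_region: Callable[[object, Box], str],
) -> Dict[str, str]:
    """Read only the regions of interest, calling OCR only where no text exists."""
    fields = {}
    for name, box in regions.items():
        if page.text_by_region is not None and name in page.text_by_region:
            # Born-digital: the text is already there, so OCR is skipped.
            fields[name] = page.text_by_region[name]
        else:
            # Scanned: run OCR on this one region rather than the whole page.
            fields[name] = ocr_region(page.image, box)
    return fields
```

The point of the design is that the OCR function is never called for a born-digital page, and for a scanned page it is called once per needed field rather than once for the full page.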
By now, it should be clear that while OCR is an important step within cognitive capture for scanned documents, it delivers only text, not interpretation. If the documents are digital, OCR is not required at all. To move to a capability where document-based information can be used within an automated transaction, several levels of capabilities are required in order to go from text to real structured data.
If you found this article interesting, you may find this eBook useful: Your Document Processing: Is There a Better Way? You may also want to read the recent Everest Group study that recognizes Parascript as a major contender in intelligent document processing.