Artificial Intelligence (AI) and Machine Learning
Parascript specializes in researching and developing AI and machine learning applications that take complex, hard-to-understand information sources and locate and extract the relevant information within them. For instance, Parascript provides high-performing computer-assisted diagnostics (CAD) software for detecting breast cancer. While I can think of many complex image-related problems, few match the complexity of identifying a malignancy within a radiological image. Part of the challenge is to reliably identify malignancies without also increasing the chance of reporting a malignancy where there is none, which is called a “false positive.”
Regions of Interest
From this perspective, locating data on an invoice, a receipt or another document with a highly variable format might seem easy. It’s not; it’s just different. A radiological image is devoid of any textual markers, while an invoice is a text-based document. The similarity lies in what are called “regions of interest.” In any complex data location and extraction problem, there is the signal, and there is the noise. The first task is to quickly and reliably home in on the places where the signal is most likely to be. In a radiological image, that is an area whose shape and shading could represent a tumor. On an invoice, the region of interest might be a text block whose shape suggests the vendor’s address. Both tasks can be accomplished without traditional recognition technologies and without relying on markers such as the keywords traditionally used in invoice recognition.
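To make the idea of a text block’s “shape” more concrete, here is a minimal sketch that describes a block using geometry alone, computed from word bounding boxes without reading any text. The `Word` type, the feature set and the line-counting rule are illustrative assumptions, not Parascript’s actual representation.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Word bounding box in page pixels (hypothetical layout input; no text is stored)
    x0: float
    y0: float
    x1: float
    y1: float


def block_shape_features(words: list[Word], page_w: float, page_h: float) -> dict[str, float]:
    """Describe a non-empty text block by geometry alone, normalized to page size."""
    x0 = min(w.x0 for w in words)
    y0 = min(w.y0 for w in words)
    x1 = max(w.x1 for w in words)
    y1 = max(w.y1 for w in words)

    # Estimate the number of text lines: start a new line whenever a word's top edge
    # sits more than half a typical word height below the current line's top edge.
    heights = sorted(w.y1 - w.y0 for w in words)
    typical_h = heights[len(heights) // 2]
    tops = sorted(w.y0 for w in words)
    lines, line_top = 1, tops[0]
    for top in tops[1:]:
        if top - line_top > typical_h / 2:
            lines += 1
            line_top = top

    return {
        "center_x": (x0 + x1) / 2 / page_w,
        "center_y": (y0 + y1) / 2 / page_h,
        "width": (x1 - x0) / page_w,
        "height": (y1 - y0) / page_h,
        "lines": lines,
    }


# A three-line block near the top-left corner: a shape that could plausibly be a vendor address
block = [Word(50, 40, 150, 60), Word(160, 40, 260, 60),
         Word(50, 70, 200, 90), Word(50, 100, 180, 120)]
print(block_shape_features(block, page_w=1000, page_h=1400))
```

Features like these are what a region-of-interest model could learn from, which is why no keyword or recognized character is needed at this stage.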
Invoice Recognition Benefits from Medical CAD Software
The same CAD technology that provides region-of-interest capability can also be applied to the problem of reliably locating and extracting key invoice data. More importantly, with this capability, keywords are no longer necessary.
Imagine a document that contains critical data but no “convenient” labels or other keywords that help locate it. For instance, it is very common to use keywords such as “Vendor Address” and its variants to find the actual address data. But what if the invoice or receipt lacks this label? With region-of-interest capabilities, the search can be narrowed down to the right information using machine learning models that have learned the most probable locations for it.
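One simple way to picture a model that has learned the most probable location of a field is a location prior fitted to labeled training invoices, then used to rank candidate blocks on a new document that has no label at all. The sketch below assumes an axis-aligned 2-D Gaussian over normalized block centers; this is a deliberately basic stand-in chosen for illustration, not the model Parascript uses, and the coordinates are made up.

```python
import math
from statistics import mean, pstdev


def fit_location_prior(centers: list[tuple[float, float]]) -> tuple[float, float, float, float]:
    """Fit an axis-aligned 2-D Gaussian to normalized (x, y) centers of labeled vendor-address blocks."""
    xs = [x for x, _ in centers]
    ys = [y for _, y in centers]
    # Floor the spread so identical training positions cannot yield a zero variance
    return mean(xs), mean(ys), max(pstdev(xs), 1e-3), max(pstdev(ys), 1e-3)


def log_likelihood(center: tuple[float, float], prior: tuple[float, float, float, float]) -> float:
    """Gaussian log-density of a candidate block center under the learned prior."""
    mx, my, sx, sy = prior
    zx = (center[0] - mx) / sx
    zy = (center[1] - my) / sy
    return -0.5 * (zx * zx + zy * zy) - math.log(2 * math.pi * sx * sy)


def rank_candidates(candidates: dict[str, tuple[float, float]], prior) -> list[str]:
    """Order candidate block ids from most to least likely vendor-address location."""
    return sorted(candidates, key=lambda cid: log_likelihood(candidates[cid], prior), reverse=True)


# Made-up normalized centers of vendor-address blocks from four labeled training invoices
prior = fit_location_prior([(0.15, 0.06), (0.18, 0.08), (0.12, 0.05), (0.20, 0.07)])
# Candidate blocks on a new invoice that carries no "Vendor Address" label
candidates = {"top_left": (0.16, 0.07), "top_right": (0.80, 0.06), "footer": (0.50, 0.95)}
print(rank_candidates(candidates, prior))  # ['top_left', 'top_right', 'footer']
```

A real system would combine many features (shape, position, surrounding layout) in a richer model, but the principle is the same: the ranking comes from learned layout statistics rather than from a label on the page.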
Once the likely region of the vendor address is located, techniques such as OCR can gather all the text in that region, and further AI capabilities can then extract the most likely address data from that text.
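The two-stage flow described above (gather the OCR text inside the located region, then pick out the address) can be sketched as follows. It assumes OCR output is already available as words with normalized center coordinates (the `OcrWord` type is hypothetical), and it substitutes a simple US-format pattern heuristic for the AI extraction step; a production system would use a trained model instead.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrWord:
    text: str
    x: float  # normalized center x of the word (0..1), hypothetical OCR output
    y: float  # normalized center y of the word (0..1)


def words_in_region(words: list[OcrWord], region: tuple[float, float, float, float]) -> list[OcrWord]:
    """Keep only words whose centers fall inside region = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]


def group_lines(words: list[OcrWord], tol: float = 0.01) -> list[str]:
    """Group words into text lines by vertical position, reading left to right within a line."""
    lines: list[list[OcrWord]] = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(lines[-1][0].y - w.y) <= tol:
            lines[-1].append(w)
        else:
            lines.append([w])
    return [" ".join(w.text for w in sorted(line, key=lambda w: w.x)) for line in lines]


# Illustrative heuristic (US formats assumed): a street line starts with a house number,
# and a city line contains a two-letter state code followed by a ZIP code.
STREET = re.compile(r"^\d+\s+\w+")
CITY_ZIP = re.compile(r"\b[A-Z]{2}\s+\d{5}(?:-\d{4})?\b")


def extract_address(lines: list[str]) -> list[str]:
    return [ln for ln in lines if STREET.match(ln) or CITY_ZIP.search(ln)]


ocr = [OcrWord("Acme", 0.10, 0.05), OcrWord("Supply", 0.16, 0.05),
       OcrWord("123", 0.10, 0.07), OcrWord("Main", 0.14, 0.07), OcrWord("St", 0.18, 0.07),
       OcrWord("Denver,", 0.10, 0.09), OcrWord("CO", 0.15, 0.09), OcrWord("80202", 0.19, 0.09),
       OcrWord("Total", 0.70, 0.80)]
region = (0.05, 0.03, 0.30, 0.11)  # region returned by the (hypothetical) locator
print(extract_address(group_lines(words_in_region(ocr, region))))  # ['123 Main St', 'Denver, CO 80202']
```

Restricting OCR text to the located region first keeps unrelated text, such as the “Total” line here, out of the extraction step.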
Moving Beyond Simplistic Solutions
Solutions that rely solely on keywords or other simple patterns depend on many conditions to achieve quality results: 100 percent accurate OCR, the presence of matching terms, and other factors that are difficult (if not impossible) to guarantee.
Combining region-of-interest detection with other techniques yields higher precision with fewer false positives. For businesses, this means faster processes, higher-quality output, and more actionable data.
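The link between false positives and precision can be shown with a small calculation. Precision is the share of reported results that are correct, TP / (TP + FP), so with the number of true positives held fixed, every false positive removed raises precision. The counts below are illustrative only and are not measured results.

```python
def precision(tp: int, fp: int) -> float:
    """Share of reported results that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


# Illustrative counts only: the same 90 correct extractions, with fewer false positives
print(precision(tp=90, fp=10))  # 0.9
print(precision(tp=90, fp=3))
```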