This is part one in a multi-part series on the advanced capture stack. In this article, we discuss what cognitive capture is and how it compares with traditional document capture solutions.
What is cognitive capture? These days, it’s hard to find a technology solution, whether hardware or software, that doesn’t work the words artificial intelligence, machine learning or cognitive into its description. It’s easy to understand why: progress in digital assistants and other automated capabilities has given organizations of all sizes an intense interest in solutions that can learn.
Unfortunately, as with any technology trend, while organizations rush to take advantage of these new capabilities, solution providers tend to confuse the market with buzzwords and inflated claims and to oversell what their products can do. All of this leads to the dreaded trough of disillusionment in the Gartner Hype Cycle.
Advanced document capture is no different. Visit any vendor website (including Parascript’s site), and you will come across terms related to artificial intelligence. So how is one to understand what applying AI to advanced capture actually means, and what benefits it really delivers? To answer that, it helps to understand the history of advanced capture.
Advanced Capture: It All Started with Taking Pictures
Several decades ago, the document scanner came into being, and its benefits centered on making documents portable and easier to store. Those benefits grew considerably with the spread of email and then the public Internet and the Web. Organizations could easily scan, store and share document-based information. However, this was hardly advanced capture as we know it today.
In the mid-1990s, with all of these documents digitized, businesses were eager to automate the process of describing them to improve access. Until then, most organizations manually created the equivalent of a library card catalog, assigning index data (known as metadata) to each document so that documents could be better organized and retrieved.
Advent of Forms Processing
Enter forms processing, in which software let users designate the location of data on a document by supplying X/Y coordinates and then applied OCR to those locations. The result was the ability to add metadata automatically and efficiently to large volumes of documents, leaving staff to review the metadata and make occasional corrections. However, only so many documents are standardized forms, and organizations increasingly needed to manage other, non-standardized and more complex documents efficiently.
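The template approach can be sketched in a few lines of Python. This is a minimal illustration, not any product’s implementation: it assumes OCR has already run and returned each word with the X/Y coordinates of its top-left corner, and the field names, zone coordinates, `Word` class and `extract_fields` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word with the X/Y coordinates of its top-left corner (hypothetical OCR output)."""
    text: str
    x: int
    y: int


# A fixed template for one standardized form: each field is tied to a
# rectangular zone (x0, y0, x1, y1). Names and coordinates are illustrative.
TEMPLATE = {
    "policy_number": (400, 50, 600, 80),
    "last_name": (50, 150, 250, 180),
}


def extract_fields(words, template):
    """Build metadata by collecting the OCR'd words that fall inside each zone."""
    metadata = {}
    for field, (x0, y0, x1, y1) in template.items():
        hits = [w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        metadata[field] = " ".join(hits)
    return metadata


page = [Word("PN-10422", 410, 60), Word("Smith", 60, 160), Word("Signature", 60, 700)]
metadata = extract_fields(page, TEMPLATE)
# metadata == {'policy_number': 'PN-10422', 'last_name': 'Smith'}
```

Because each field is tied to fixed coordinates, a form whose layout shifts even slightly pushes words outside their zones, which is why this approach suits standardized forms and little else.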
Arrival of Advanced Capture
Enter advanced capture. Advanced capture is designed to tackle documents known as semi-structured and unstructured. Examples include invoices, bills of lading and other document types where the data is similar from document to document, but the format and location of the data are highly variable. Advanced capture introduced techniques such as keyword searches and pattern-matching algorithms like regular expressions, both to classify documents and to locate the information held within them.
These techniques, which tell the software how to operate on specific data, belong to the branch of AI known as expert systems. Expert systems require a subject matter expert to give the system specific instructions, which are stored in a knowledge base. As documents are presented to the system, the software uses the knowledge base to determine the correct course of action. When results are reviewed and either verified or corrected, the knowledge base is updated.
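To make the keyword and pattern-matching approach concrete, here is a minimal Python sketch of keyword-based classification and regular-expression extraction. The keyword lists, patterns, labels and function names are hypothetical stand-ins for the rules a subject matter expert would write into a knowledge base; real products store and apply such rules in their own ways.

```python
import re

# Hypothetical classification rules: keywords an expert associates with each document type.
CLASS_KEYWORDS = {
    "invoice": ["invoice", "amount due", "remit to"],
    "bill_of_lading": ["bill of lading", "shipper", "consignee"],
}

# Hypothetical extraction rules: patterns that find data by its shape, wherever it sits on the page.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"invoice\s*(?:no\.?|number|#)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "total": re.compile(r"(?:total|amount due)\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def classify(text):
    """Pick the label whose keywords appear most often; 'unknown' if none appear."""
    lowered = text.lower()
    scores = {label: sum(kw in lowered for kw in kws) for label, kws in CLASS_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"


def extract(text):
    """Apply each pattern and keep the first captured value it finds."""
    found = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            found[field] = match.group(1)
    return found


doc = "ACME Corp\nInvoice No: INV-2024-17\nRemit to: ACME Corp\nAmount Due: $1,250.00"
label = classify(doc)   # 'invoice'
fields = extract(doc)   # {'invoice_number': 'INV-2024-17', 'total': '1,250.00'}
```

Every keyword and pattern here is a hand-written rule. A supplier that prints “Inv. Ref.” instead of “Invoice No” would need yet another rule, which is how such knowledge bases grow.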
Expert Systems and Their Challenges
While advanced capture based on expert systems is a leap forward in the ability to work with more complex types of data, the problem with expert systems of any type is that the techniques on which they are built add significant complexity. Gone are the simple templates that mapped the precise location of data; in their place are rules, encoded by one or more people, that handle classification and data location. All of these rules must be stored somewhere, which means using a database, and over years of use that database can grow very large as it amasses more and more rules.
Not only are the systems more complex, but so, too, are the documents on which they operate, so building rules is also more laborious and error-prone. A system that can reliably classify documents and locate data requires analyzing a large set of representative data and spending a lot of time encoding each rule. Once encoded, the system can be fairly brittle: new documents, or new variants of known documents, mean system updates. With these systems, it is not unrealistic to spend two to three times as much on configuration as on the software itself. Once in production, these systems often degrade over time as new document variants are encountered, and they become more expensive to manage.
Advanced Capture Gains Cognition
In practically any technology solution, the word cognitive equates to the area of AI known as machine learning. From here on, I will use cognitive to mean the machine learning branch of AI. With machine learning, the aim is to let the computer take over developing the rules, treating them as inferences drawn through reasoning and extrapolation instead of as hard-coded sets of if-then statements. The benefit is obvious: no more tedious rule-making.
Another benefit is that machine learning can readily process large amounts of data to develop its inferences, which leads to more comprehensive and reliable rules. These rules are often more abstract and flexible, more closely emulating the way humans solve problems. For instance, if I use an expert system to encode a rule that identifies a purchase order by the presence of the words “purchase order” in the upper right-hand portion of the document, then purchase orders that do not have those precise words in that precise location will be missed.
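The brittleness of such a rule is easy to see in code. The following Python sketch is a hypothetical rendering of the purchase order rule, assuming OCR output as (text, x, y) tuples and defining the “upper right-hand portion”, arbitrarily, as the right half and top quarter of the page.

```python
# Each OCR'd word is (text, x, y), with (x, y) its top-left corner in pixels.
# This representation and the function below are hypothetical illustrations.


def is_purchase_order(words, page_width, page_height):
    """Hand-coded rule: the exact phrase "purchase order" must start in the upper-right area."""
    for (first, x, y), (second, _, _) in zip(words, words[1:]):
        phrase = f"{first} {second}".lower()
        in_upper_right = x >= page_width / 2 and y <= page_height / 4
        if phrase == "purchase order" and in_upper_right:
            return True
    return False


WIDTH, HEIGHT = 1200, 1600  # assumed page size in pixels

standard = [("Purchase", 900, 40), ("Order", 1010, 40)]
new_wording = [("P.O.", 900, 40), ("Number", 960, 40)]
new_layout = [("Purchase", 100, 40), ("Order", 210, 40)]

for name, page in [("standard", standard), ("new_wording", new_wording), ("new_layout", new_layout)]:
    print(name, is_purchase_order(page, WIDTH, HEIGHT))
# standard True
# new_wording False
# new_layout False
```

Both variants are genuine purchase orders, yet the rule rejects them; covering them means writing and maintaining more rules.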
By training on a large sample set, a machine learning system develops a more abstract view of purchase orders, one that can draw on many different hints or clues to tell a purchase order from a remittance and vice versa. Just as importantly, the same machine learning process used to configure the system can be run again and again, allowing the system to adapt and improve.
All of this makes it possible to manage a much larger variety of document-based information and increases the likelihood that a new variant of a purchase order will be correctly identified. Unlike expert systems-based approaches, which add technology burden over time and therefore become more costly and less valuable, machine learning-based systems, with their ability to adapt and improve, grow more valuable over time.
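For contrast with a hand-written rule, the following Python sketch learns to separate purchase orders from remittances using labeled examples. It uses multinomial naive Bayes purely as a simple, well-known stand-in; the article does not say which learning technique any product uses, and the training snippets, labels and class name are invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace, treating periods and colons as separators."""
    return text.lower().replace(".", " ").replace(":", " ").split()


class NaiveBayesDocClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # number of training documents per label
        self.word_counts = defaultdict(Counter)  # token frequencies per label
        self.vocab = set()

    def fit(self, documents, labels):
        """Accumulate counts; calling fit again with new samples updates the model."""
        for text, label in zip(documents, labels):
            tokens = tokenize(text)
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        """Return the label with the highest log-posterior score for the text."""
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)
            n_tokens = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (n_tokens + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


samples = [
    "Purchase Order PO number ship to vendor quantity unit price",
    "P.O. number vendor ship to item quantity",
    "Purchase order vendor terms quantity delivery date",
    "Remittance advice payment amount invoice paid check number",
    "Remit payment amount paid invoice reference",
    "Remittance payment check amount invoice number",
]
labels = ["purchase_order"] * 3 + ["remittance"] * 3
model = NaiveBayesDocClassifier().fit(samples, labels)

# A variant that never says "purchase order" is still classified from other clues.
print(model.predict("Vendor quantity ship to delivery date"))  # purchase_order
print(model.predict("Payment advice amount paid"))             # remittance
```

Because `fit` only accumulates counts, feeding it newly verified documents and calling it again updates the model, a small-scale version of running the same learning process repeatedly so the system can adapt.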
Portability of Inferences
Another key difference with machine learning-based advanced capture is that the inferences are often very portable, both physically and logically. The results of machine learning, often called models, are stored in an abstract form rather than as rules in a traditional database. This means that a model can be exported from one system and imported into another fairly easily (provided both systems are of the same type).
Additionally, a trained model can operate on any corpus of document-based data similar to the one on which it was trained. This means that a model trained on one set of documents within a department or organization can be used to process a similar set of documents in another department or organization. It can truly be a case of a rising tide lifting all boats.
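Portability can be sketched in Python by treating a model as a set of learned parameters serialized to JSON. This is a deliberately simple, hypothetical example: the word-frequency “model”, the `MODEL_TYPE` tag and the function names are assumptions, and the type check stands in for the requirement that both systems be of the same type.

```python
import json
from collections import Counter

MODEL_TYPE = "keyword-frequency-v1"  # hypothetical identifier for this model format


def train(documents, labels):
    """Learn per-label word frequencies from labeled sample documents."""
    weights = {}
    for text, label in zip(documents, labels):
        weights.setdefault(label, Counter()).update(text.lower().split())
    return {"model_type": MODEL_TYPE, "weights": {k: dict(v) for k, v in weights.items()}}


def export_model(model):
    """Serialize the learned parameters to a JSON string."""
    return json.dumps(model, sort_keys=True)


def import_model(payload):
    """Load a model exported elsewhere, accepting it only if it is the same model type."""
    model = json.loads(payload)
    if model.get("model_type") != MODEL_TYPE:
        raise ValueError(f"incompatible model type: {model.get('model_type')!r}")
    return model


def predict(model, text):
    """Score each label by summing the learned weights of the words in the text."""
    tokens = text.lower().split()
    scores = {label: sum(w.get(t, 0) for t in tokens) for label, w in model["weights"].items()}
    return max(scores, key=scores.get)


# Department A trains the model and exports it.
dept_a_model = train(
    ["purchase order vendor quantity", "remittance payment amount paid"],
    ["purchase_order", "remittance"],
)
payload = export_model(dept_a_model)  # e.g. written to a file or sent to another system

# Department B, running the same type of system, imports and reuses it.
dept_b_model = import_model(payload)
print(predict(dept_b_model, "vendor quantity ship to"))  # purchase_order
```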
True Machine Learning
While machine learning is a part of AI, not all AI systems involve true machine learning. Many are built on the expert-systems form of AI, which is reliable but complex and brittle. There is still a place for expert systems, but as data becomes more complex and requirements for location precision and output accuracy increase, machine learning-based systems will be required to achieve both high accuracy and high levels of automation.