Clues to Applied Machine Learning
If there’s a knowledge base, chances are it is not real machine learning. Many technologies and techniques that can reduce the need for manual work fall under the umbrella of artificial intelligence (AI). Some have been around for decades, while others are newer. Each has attributes that make it well suited to certain applications and less so to others.
Expert Systems
Take, for instance, the category of AI called expert systems. These rules-based systems automate manual tasks by drawing conclusions from known facts. Before anything can be automated, a person must evaluate the task and its associated data, then build a system that achieves the task's objective. The larger the task or the amount of data to operate on, the greater the human effort required.
Document Automation World
Within the document automation world, examples of expert systems include templates that tell the software where data is located on standardized forms, and keywords or regular expressions that help find data on more variable documents such as invoices. All of these rules are based on human reasoning. They are typically stored in what is called a knowledge base. Knowledge bases are simply the storage of a number of rules based upon facts.
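To make this concrete, here is a minimal Python sketch of the keyword and regular-expression style of rule described in this section. The field names, the patterns, and the `extract_fields` helper are hypothetical and not taken from any particular product. The point is only that every rule encodes a person's reading of sample documents, so a layout the rule author did not anticipate yields nothing.

```python
import re
from typing import Dict, Optional

# Hypothetical hand-written rules: each pattern was written by a person
# who inspected sample invoices from known vendors.
INVOICE_RULES = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(
        r"Date\s*:?\s*(\d{4}-\d{2}-\d{2}|\d{1,2}/\d{1,2}/\d{4})", re.IGNORECASE
    ),
    "total": re.compile(
        r"Total\s*(?:Due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE
    ),
}


def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Apply every rule; a field whose rule does not match comes back as None."""
    results: Dict[str, Optional[str]] = {}
    for field, pattern in INVOICE_RULES.items():
        match = pattern.search(text)
        results[field] = match.group(1) if match else None
    return results


known_layout = "ACME Corp\nInvoice No: INV-1042\nDate: 2023-03-15\nTotal Due: $1,250.00"
new_layout = "Bill #: 77\nAmount payable 300.00"

print(extract_fields(known_layout))
print(extract_fields(new_layout))  # every field is None: no rule anticipated this wording
```

Handling the second layout would require a person to write yet another rule, which is exactly the manual effort that grows with document variety.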
Where Machine Learning Comes In
Machine learning excels at problems where few facts are known and where the task is to discover relevant attributes that are then stored as generalized representations, or models. While machine learning can use knowledge bases, it generally does so to store and access the data set it processes. Machine learning can also be used to create knowledge bases, and the two can be combined. For example, Amazon Alexa uses machine learning for generalized capabilities and a knowledge base to answer specific questions that involve domain expertise.
Knowledge Base + Expert System
When it comes to document automation, however, knowledge bases have mostly been employed in combination with expert systems. The rationale is simple: if creating rules or facts from a sample data set is complex and burdensome, collect the data during processing and store it for later use. This combination produced systems that can “learn” document types and data when we humans direct them.
A document cannot be processed automatically? A person tells the system that it is an invoice. A data element fails to be located and extracted? A person shows the system where it is. All of this information is stored in a knowledge base, typically a relational database. This approach can work for applications with a small number of document types or little variation within a particular document type. Beyond that, it struggles, because the system records and stores specific rules and templates, which do not scale well to large data sets. This is especially true when the knowledge base cannot be heavily optimized with indexes, which enable quick look-ups, and document automation scenarios typically allow little such optimization.
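A knowledge base of human-taught rules can be sketched with Python's built-in `sqlite3` module. The `field_rules` table, its columns, and the `teach` and `lookup` helpers are illustrative assumptions, not the schema of any real capture product. Each correction a person makes becomes a stored row, every incoming document starts with a look-up, and the index on `vendor` stands in for the kind of optimization the text mentions.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical field location: (x, y, width, height) in page pixels.
Box = Tuple[int, int, int, int]


def create_knowledge_base() -> sqlite3.Connection:
    """Create an in-memory knowledge base of human-taught field locations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE field_rules ("
        " vendor TEXT NOT NULL, doc_type TEXT NOT NULL, field TEXT NOT NULL,"
        " x INTEGER, y INTEGER, width INTEGER, height INTEGER)"
    )
    # An index on the look-up key keeps per-document queries fast as rows accumulate.
    conn.execute("CREATE INDEX idx_field_rules_vendor ON field_rules (vendor)")
    return conn


def teach(conn: sqlite3.Connection, vendor: str, doc_type: str, field: str, box: Box) -> None:
    """Record a person's correction: on this vendor's documents, this field is here."""
    conn.execute(
        "INSERT INTO field_rules VALUES (?, ?, ?, ?, ?, ?, ?)",
        (vendor, doc_type, field, *box),
    )


def lookup(conn: sqlite3.Connection, vendor: str) -> List[tuple]:
    """Fetch every stored rule for a vendor; an unknown vendor returns an empty list."""
    return conn.execute(
        "SELECT doc_type, field, x, y, width, height FROM field_rules"
        " WHERE vendor = ? ORDER BY field",
        (vendor,),
    ).fetchall()


conn = create_knowledge_base()
teach(conn, "ACME Corp", "invoice", "total", (420, 700, 120, 20))
teach(conn, "ACME Corp", "invoice", "invoice_number", (400, 80, 150, 20))
print(lookup(conn, "ACME Corp"))
print(lookup(conn, "New Vendor Ltd"))
```

The unseen vendor returns no rows, so that document goes back to a person, who must teach the system yet another set of rules. The table grows with every vendor and layout variant.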
Machine Learning-based Systems
Machine learning-based systems, on the other hand, typically store explicit rules only where they are warranted. Instead, machine learning works from a limited set of information to identify the procedures needed to complete a task, and it can discover information that a manual rules-based approach might miss entirely. The system therefore uses representations rather than rules, which translates into much faster performance when dealing with many document types or high document variance.
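The difference between rules and representations can be illustrated with a toy nearest-centroid classifier over word counts, written in plain Python. This is a deliberately simple technique chosen for illustration, not the method used by FormXtra.AI or any other product. Training averages word frequencies per document type into one vector, and classification compares a new document with those vectors by cosine similarity rather than checking it rule by rule. The training phrases and labels are made up.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokens(text: str) -> List[str]:
    """Lower-case alphabetic words only; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def train_centroids(examples: List[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Average the word counts of all examples per label into one vector each."""
    sums: Dict[str, Counter] = {}
    counts: Counter = Counter()
    for text, label in examples:
        sums.setdefault(label, Counter()).update(tokens(text))
        counts[label] += 1
    return {
        label: {word: c / counts[label] for word, c in total.items()}
        for label, total in sums.items()
    }


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse word vectors."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(centroids: Dict[str, Dict[str, float]], text: str) -> str:
    """Pick the label whose learned representation is most similar to the text."""
    vec = dict(Counter(tokens(text)))
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


TRAINING = [
    ("Invoice number 1001 amount due payment terms net 30", "invoice"),
    ("Invoice total due remit payment to", "invoice"),
    ("Purchase order PO number ship to deliver quantity ordered", "purchase_order"),
    ("Purchase order requested delivery date quantity ordered", "purchase_order"),
]

centroids = train_centroids(TRAINING)
print(classify(centroids, "Please remit the amount due by the payment date"))
print(classify(centroids, "Please ship the ordered quantity to our warehouse"))
```

Neither sample query contains "invoice" or "purchase order", yet each is assigned a type because the model compares overall word patterns instead of matching an explicit keyword rule. Real systems use far richer features and models; this only shows the shape of the idea.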
An analogy is an accounts payable (AP) clerk who is given a work manual describing the rules and tasks that must be completed each day. If the AP clerk uses a knowledge base, every task starts with looking up the associated rules. Every time. The more rules there are, the slower the clerk works. If the subject of a task changes, work stops altogether until the rule is updated.
Models Based on Experience
A machine learning-based system behaves more like a person who reads the manual and gradually learns the routines without having to remember or consult each rule. Both the person and the system gain experience, from which they build a model. The result is a higher level of performance. Using the model, both can also adapt and generalize, which makes the workflow more flexible than a rules-based approach.
Ultimately, knowledge bases are best left to storing facts, since they don’t work well for highly complex document automation.
Fast, Flexible and Comprehensive
Faster, more flexible and more comprehensive: that is what machine learning systems bring to document automation, and it is the only practical way to truly achieve a high level of accuracy and straight-through processing.
To find out more about self-learning capture with zero configuration, check out this FormXtra.AI demo here: Self-learning Document Capture Demo