Part 3 in Advanced Capture Stack series
Is Machine Learning all the same? Let’s delve into the most common machine learning techniques, explore how they are used and where. As covered in a previous blog, “cognitive” more than likely refers to a branch of artificial intelligence called machine learning. Machine learning has been around for decades in one form or another. There are many different machine learning techniques, each with its own strengths and weaknesses.
In Part 1 of our “Advanced Capture Stack” series, we covered the evolution of advanced capture and what “Cognitive Capture” means. In Part 2, the topic of OCR and where it fits within Advanced Capture was discussed. In this Part 3, we dig into the specifics of different machine learning technologies involved and how they are applied to advanced capture.
Techniques Outside the World of Machine Learning
First, it is easiest to start with the types of techniques that do not belong in the world of machine learning.
Rules-based approaches developed by humans are not machine learning. When applied to advanced capture, rules-based approaches generally fall into two categories: explicit rules—such as supplying the actual location of expected data on a given page (often referred to as “templates”)—and more-lenient rules using regular expressions and other types of pattern matching techniques such as “find the value that has 9 numerals and label it as social security number.” There are many variations of these rules-based approaches. However, in every case, these rules require a person to construct them. There are novel ways to construct these rules that don’t incur as much up-front effort such as the use of a knowledge base where individual staff make corrections. These corrections turn into specific rules. In this case, there is no machine actually doing any “learning.”
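To make the "more-lenient rule" idea concrete, here is a minimal sketch of the nine-numeral example written as a regular expression. The function name `find_ssn` and the accepted layouts (nine digits, optionally hyphenated as 3-2-4) are assumptions made for illustration. Note that a person wrote the pattern; nothing here is learned from data.

```python
import re

# Hand-written rule (an illustrative assumption): nine digits,
# optionally grouped 3-2-4 with hyphens, not embedded in a longer number.
SSN_PATTERN = re.compile(r"\b(\d{3})-?(\d{2})-?(\d{4})\b")

def find_ssn(text):
    """Return every nine-digit match in `text`, normalized as NNN-NN-NNNN."""
    return ["-".join(match.groups()) for match in SSN_PATTERN.finditer(text)]

print(find_ssn("Applicant SSN: 123-45-6789, phone 555-0100"))
```

A rule like this will also label any other nine-digit value as a social security number, which is exactly the kind of brittleness that pushes people toward learned approaches.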
Machine Learning Applied to Advanced Capture
With that out of the way, we can focus on machine learning techniques as applied to advanced capture. The two most common areas to apply machine learning are with document classification (or document ID) and data extraction tasks. For document classification, the objective is to supply the software with examples of each document type. The system goes about identifying key unique attributes for each document type so that it can reliably perform class assignments. For data extraction, the objective is to provide the software with tagged examples of document-based data that need to be found and presented. Again, the system analyzes the samples and derives its own methods of reliably parsing documents to find needed data.
Supervised and Unsupervised Learning
Within machine learning for advanced capture, there are two common categories: supervised learning and unsupervised learning. Supervised learning requires input sample data along with the “answer key” that describes the desired output. Together, these are often referred to as the training data. For document classification, it would be a set of documents along with the actual class to which each belongs. For data extraction, it might be the document along with the location and value of each data field that needs to be extracted. From here, the software develops its own models for how to optimize for the desired output.
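As a rough sketch of what "sample data plus answer key" looks like for document classification, the snippet below pairs a few lines of document text with their class and builds a simple word-count profile per class. The sample texts, the labels, and the scoring method are all assumptions chosen for illustration; real capture products use far richer features and models.

```python
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Build one word-count profile per class from (text, label) pairs."""
    profiles = defaultdict(Counter)
    for text, label in examples:
        profiles[label].update(tokenize(text))
    return profiles

def classify(profiles, text):
    """Pick the class whose profile gives the words in `text` the most weight."""
    words = Counter(tokenize(text))

    def score(label):
        profile = profiles[label]
        total = sum(profile.values())
        return sum(count * profile[word] / total for word, count in words.items())

    return max(profiles, key=score)

# Training data: each sample comes with its "answer key" (the class label).
training_data = [
    ("invoice number total amount due", "invoice"),
    ("invoice date bill to amount due", "invoice"),
    ("employee wages tax withheld employer", "tax_form"),
    ("wages tips federal income tax withheld", "tax_form"),
]
model = train(training_data)
print(classify(model, "total amount due on this invoice"))
```

Nobody told the program that "amount due" signals an invoice; it picked that up from the labeled samples, which is the essential difference from a hand-written rule.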
The most common types of supervised learning algorithms are classification and regression. Classification is commonly used in (wait for it!) document classification, where the class assignment options are limited to a fixed set. Regression is used to handle scenarios where the answer is a numeric value and there could be many potential answers, such as a score or an amount. There are many variations of classification and regression algorithms that can be used and/or combined to optimize results.
Unsupervised Learning
The other type that can be used in advanced capture is called unsupervised learning. Strictly speaking, these algorithms do not learn a mapping from inputs to known answers. Instead, they are employed to find structure in data, such as grouping documents by likeness. There is no need for labeled training data because there is no function to be learned or preserved. These algorithms can be used in advanced capture to segment documents based upon likeness prior to using other machine learning algorithms, in order to reduce the number of potential variables.
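Grouping by likeness can be sketched in a few lines. The example below measures likeness as word overlap (Jaccard similarity) and greedily assigns each document to the first sufficiently similar group. The similarity measure, the 0.3 threshold, and the sample texts are assumptions for illustration, not a description of how any particular product segments documents.

```python
def jaccard(a, b):
    """Share of distinct words two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_by_likeness(documents, threshold=0.3):
    """Greedily put each document in the first group whose first member is similar enough."""
    groups = []  # each group is a list of (index, word_set) pairs
    for index, text in enumerate(documents):
        words = set(text.lower().split())
        for group in groups:
            if jaccard(words, group[0][1]) >= threshold:
                group.append((index, words))
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([(index, words)])
    return [[index for index, _ in group] for group in groups]

docs = [
    "invoice number total amount due",
    "wages tips federal tax withheld",
    "invoice date total amount due",
    "federal tax withheld from wages",
]
print(group_by_likeness(docs))
```

No labels were supplied anywhere: the groups fall out of the documents themselves, and a person or a downstream model still has to decide what each group means.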
Reinforcement Learning
Reinforcement learning is growing in awareness in the industry, but there is no real practical application for advanced capture as of yet. These algorithms are more suitable to problems such as autonomous vehicles and game playing. DeepMind's AlphaGo program, which plays the board game Go, is a good example of reinforcement learning.
Artificial Neural Networks
Digging deeper, within the supervised learning branch, there are several underlying machine learning models, the first of which is perhaps the best known: the artificial neural network (ANN). This type of model is loosely inspired by the human brain, in which a network of neurons is involved. ANNs take input training data and process it; each node can communicate with other nodes to influence the final output. Over a large amount of data, some connections between nodes become “stronger” while others get “weaker,” based on successful output. Over time (and a lot of data), the ANN can become better at providing output.
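The "stronger" and "weaker" idea can be seen in the smallest possible network: a single artificial neuron (a perceptron). In the sketch below, the two input features (whether the word "invoice" appears and whether the word "wages" appears) and the tiny data set are hypothetical. Each wrong answer nudges the weights, so the weight on the useful feature grows while the other shrinks.

```python
def train_neuron(samples, epochs=20, learning_rate=0.1):
    """Train one artificial neuron (a perceptron) on (features, target) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            activation = sum(w * x for w, x in zip(weights, features)) + bias
            output = 1 if activation > 0 else 0
            error = target - output  # 0 when the neuron was right
            # Strengthen or weaken each weight in proportion to its input.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias

def predict(weights, bias, features):
    activation = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 if activation > 0 else 0

# Hypothetical features: [contains "invoice", contains "wages"]; target 1 = invoice.
samples = [([1, 0], 1), ([0, 1], 0), ([0, 0], 0), ([1, 1], 1)]
weights, bias = train_neuron(samples)
print(weights, bias)
```

A real ANN stacks many such neurons in layers and trains them together, but the principle of adjusting weights based on successful output is the same.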
Machine Learning Models
A form of ANN that has garnered increased excitement is the Deep Learning Artificial Neural Network. These networks are roughly similar in design to traditional ANNs but have many more layers of nodes, which gives them the ability to process more data. Therefore, they are better at more complex problems. Arguably, deep learning networks can perform much better than other machine learning models. However, their weakness is that they require a significant amount of training data, so they are not always the best choice for a particular task.
Another popular machine learning model is the support vector machine (SVM). SVMs are used mostly for classification and regression analysis where the goal is to assign an input to one or another group. In many respects, SVMs work best at document classification.
Bayesian networks are a third model that can be applied to advanced capture. This type of model is probabilistic. From input data, it can deduce the probability that a given set of features belongs to a particular document type, or that the amount at the bottom of the page is the total amount.
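A full Bayesian network involves a graph of dependencies between variables, but the underlying arithmetic can be sketched with its simplest special case, naive Bayes, which assumes the features are independent of one another given the document type. The document types, feature names, and every probability below are made up for illustration.

```python
def posterior(priors, likelihoods, observed_features):
    """Return P(document type | observed features) using Bayes' rule.

    Assumes features are independent given the document type (naive Bayes).
    """
    joint = {}
    for doc_type, prior in priors.items():
        probability = prior
        for feature in observed_features:
            probability *= likelihoods[doc_type][feature]
        joint[doc_type] = probability
    evidence = sum(joint.values())  # normalizing constant
    return {doc_type: p / evidence for doc_type, p in joint.items()}

# Illustrative numbers only: how often each document type occurs, and how
# often each feature appears on documents of that type.
priors = {"invoice": 0.5, "receipt": 0.5}
likelihoods = {
    "invoice": {"has_po_number": 0.8, "has_total": 0.9},
    "receipt": {"has_po_number": 0.1, "has_total": 0.9},
}
result = posterior(priors, likelihoods, ["has_po_number", "has_total"])
print(result)
```

With these numbers, seeing a purchase order number makes "invoice" far more probable than "receipt," while seeing a total changes nothing because both types show one equally often.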
By now you might be asking, “Where is NLP in this discussion?” The answer is that NLP, or natural language processing, is an area of AI devoted to building systems that can interpret language. This task may or may not involve machine learning, but increasingly it does, because there is often too much data to process with hand-built approaches. Ultimately, NLP is an area of applied AI, not a specific technology or technique unto itself. As such, it is another approach that can be used to aid with classification or data extraction to automate document-oriented tasks.
Moving Forward
As the adoption of various models grows, advanced capture systems will become much more self-sufficient at configuring themselves using a variety of training data inputs and adapting to gradual changes to documents. Just as with any application of machine learning, the most important prerequisite is training data. Without it, there is no learning and ultimately no automation. As such, use of real machine learning in advanced capture is still in its infancy, with much of it being used to tackle specific tasks such as document classification or handwriting recognition rather than serving as a black box that does everything in an automated fashion. But this is just the beginning.