“No Easy Button, But We Are Getting Closer” is Part 1 of 2 on machine learning and deep learning in Intelligent Document Processing (IDP).
In just about every corner of the technology and business media, you find breathless articles extolling the magic of machine learning, especially a type called deep learning. With new machines that can learn and make inferences from vast amounts of data, practically any process can be automated with high rates of accuracy.
For the most part, this is true. It is true that machines can enable processes to be automated, some almost completely, and that automation can achieve levels of accuracy higher than what we humans, who get bored and tired, can sustain. It is also true that machines learn from vast amounts of data. But while expectations within organizations are focused on levels of automation and accuracy, a key ingredient is often overlooked: those vast amounts of data.
Depending on Heuristics
You see, unlike us humans, who can learn from a relatively small set of data and some basic instruction, machine learning, and especially deep learning, requires a large data set to analyze and develop inferences from. Without delving into neuroscience concepts, humans are able to create “heuristics” that machines cannot. Heuristics are essentially shortcuts for cognitive tasks that enable us to be more efficient. A heuristic is like an “intuition” you have about taking an action based on relatively little information. Developing heuristics doesn’t take much data.
You can hand a person five different invoices with instructions on what data you need, and they quickly develop heuristics for locating the same data on each of them, and on the hundreds or thousands of invoices that come after. The flip side is that heuristics aren’t always correct, and they are definitely not comprehensive, since they do not work in all situations.
Crunching Vast Amounts of Data
Conversely, machine learning cannot jump to conclusions about how to act based on intuition. If you give it the same five invoices, it cannot automatically develop a reliable inference about where your required data is located. Rather, the power of machines is that they can crunch significantly larger amounts of data than humans can, and they can detect even seemingly invisible attributes in that data to come to a conclusion.
So what does this mean? Machine learning is not useful in all situations; it works better (and is better than humans) in situations where there is a high degree of variance in a large amount of data that needs to be processed to specific requirements. Think of handwriting recognition. Think of weather forecasting. Think of understanding language.
Each of these represents a problem where there is a significant amount of variance in the data. For handwriting, there is a different “font” for every person. For weather forecasting, there is a seemingly infinite number of variables that impact outcomes. For language understanding, in addition to the different ways in which words are spoken, there are thousands of ways to string words together.
Machine learning can crunch the enormous amount of data involved with each and develop “models” for how to produce output, whether that is a transcribed handwritten letter, a 10-day forecast for Colorado, or a response to a verbal request.
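To make “developing a model” a little more concrete, here is a minimal sketch of the handwriting case. It is purely illustrative, not taken from any particular IDP product: it assumes scikit-learn is installed and uses its small bundled digits dataset as a stand-in for real documents.

# Minimal, illustrative sketch of "learning a model" from labeled handwriting samples.
# Assumption: scikit-learn is available; its bundled digits dataset stands in for real documents.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

digits = load_digits()  # roughly 1,800 labeled 8x8 images of handwritten digits
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0
)

# The "model" is simply a set of parameters fit to the training examples.
model = LogisticRegression(max_iter=2000)
model.fit(X_train, y_train)

# How well it generalizes to handwriting it has never seen.
print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.1%}")

Even this toy example only generalizes because it sees well over a thousand labeled samples, which brings us right back to the input data problem.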
How Much Data Is Enough?
Back to the input data problem. Does every machine learning project require hundreds of thousands of samples to be a success? Machines do generally work better with more data, but luckily there are shortcuts to that problem as well. So the answer to the 100,000-sample question is “no”.
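One rough way to see this for yourself is a simple learning curve: train the same model on progressively larger slices of your labeled data and watch where accuracy stops improving. The sketch below does this with the same illustrative digits data used above; the sample sizes and classifier are assumptions chosen only for demonstration.

# Illustrative learning-curve sketch: how accuracy changes as the training set grows.
# Assumption: scikit-learn's bundled digits dataset stands in for real document samples.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.3, random_state=0
)

# Train on progressively larger slices of the shuffled training data.
for n in (100, 250, 500, 1000, len(X_train)):
    model = LogisticRegression(max_iter=2000)
    model.fit(X_train[:n], y_train[:n])
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{n:>5} training samples -> {acc:.1%} held-out accuracy")

On a curve like this, the gains typically flatten out well before the sample count becomes enormous. Where that plateau falls, and how to reach it with fewer labeled samples, is the subject of Part 2.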
Next week, in Part 2 of this article, we delve into the factors that govern the necessary number of samples as well as strategies and shortcuts to reduce the effort.