Within IDP vendor marketing, if “machine learning” and “artificial intelligence” take the top two spots, coming in a close third has to be the term “template-less”. The gist of the term is that organizations shouldn’t have to spend time mapping out the various data fields they need from a document. You see, using templates is just plain evil. Why should users have to locate the specific coordinates of data within a document when machine learning can do it for them?
But are templates really evil, and how do they differ from machine learning? Let’s dig into how we got here.
A long time ago, companies were scanning documents to save storage space and, after some time, decided that they also needed a better way to find their newly digitized information. Early document management systems solved the problem by letting people index the scanned documents so they could be found through structured search. But then the search for improvement started. Realizing that many documents were forms, people figured out that rather than indexing documents manually, they could apply OCR to capture key index information automatically. The template was born!
Templates work great for standardized documents where the data you need is always located in the same place. Using a visual tool, a person can define a template for a document by simply drawing boundaries around the needed index data on a sample scanned document. The boundaries are converted into image X-Y coordinates that tell the software where to apply OCR. The problem with templates is that if you have more complex documents where the index data is not always in the same location, you need to create a template for each variant. To add insult to injury, scanning often introduces quality problems such as shifting the image up, down, left, or right, or resizing it. These distortions render predefined templates useless, even for standardized forms, because OCR is applied in the wrong locations. The result is garbage.
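To make that fragility concrete, here is a minimal Python sketch of zone-based extraction. It assumes the OCR engine returns each word with a single X-Y position; the `Word`, `Zone`, `apply_template`, and `shift` names, the field names, and all coordinates are hypothetical, chosen for illustration rather than taken from any IDP product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One OCR'd word and its position on the page (hypothetical format)."""
    text: str
    x: int  # horizontal position in pixels
    y: int  # vertical position in pixels


@dataclass(frozen=True)
class Zone:
    """A rectangle drawn around a field on a sample document."""
    name: str
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, word: Word) -> bool:
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def apply_template(template: list[Zone], words: list[Word]) -> dict[str, str]:
    """Return the text of every word inside each zone, joined by spaces."""
    return {
        zone.name: " ".join(w.text for w in words if zone.contains(w))
        for zone in template
    }


def shift(words: list[Word], dx: int, dy: int) -> list[Word]:
    """Simulate a scan that moved the whole image by (dx, dy) pixels."""
    return [Word(w.text, w.x + dx, w.y + dy) for w in words]


# A template built from one sample invoice (illustrative coordinates).
TEMPLATE = [
    Zone("invoice_number", left=400, top=50, right=550, bottom=80),
    Zone("total_amount", left=400, top=700, right=550, bottom=730),
]

page = [
    Word("Invoice", 300, 60), Word("Number:", 350, 60), Word("INV-1001", 420, 60),
    Word("Total", 300, 710), Word("Amount:", 350, 710), Word("$1,250.00", 420, 710),
]

print(apply_template(TEMPLATE, page))
# {'invoice_number': 'INV-1001', 'total_amount': '$1,250.00'}
print(apply_template(TEMPLATE, shift(page, 70, 0)))
# {'invoice_number': 'Number: INV-1001', 'total_amount': 'Amount: $1,250.00'}
```

With the page scanned as expected, each zone captures exactly the intended value. Shift the same page 70 pixels to the right and each zone also swallows its label, so downstream systems receive “Number: INV-1001” instead of “INV-1001”, even though nothing about the form itself changed.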
Then some smart folks figured out that many documents provide clues to the location of data through labels such as “Invoice Number” and “Total Amount”. Using these labels as keywords, the software can be programmed to locate them and then search nearby for the actual value. The template-less solution was born, and this capability worked reasonably well for over a decade. The problem was that it replaced the arduous work of creating templates for each variant with a set of complex rules that are hard to implement and harder to manage. Rules also aren’t as precise as templates, so accuracy suffered; still, many viewed the approach as better than nothing and superior to managing potentially hundreds of templates.
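The keyword approach can be sketched in a few lines of Python as well. This is a minimal illustration, not any vendor’s rules engine: it assumes OCR words arrive in reading order with X-Y positions, and the `RULES` table, the helper names, and the “value is the nearest word to the right on the same line” heuristic are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Word:
    """One OCR'd word and its position on the page (hypothetical format)."""
    text: str
    x: int
    y: int


# Hypothetical rules: each field lists the label variants that may precede its value.
RULES = {
    "invoice_number": ["Invoice No", "Invoice Number"],
    "total_amount": ["Total Amount", "Amount Due"],
}


def find_label(words: list[Word], label: str) -> Optional[Word]:
    """Return the last word of the first run of consecutive words matching the label."""
    tokens = label.lower().split()
    for i in range(len(words) - len(tokens) + 1):
        window = words[i:i + len(tokens)]
        if [w.text.lower().strip(":") for w in window] == tokens:
            return window[-1]
    return None


def value_right_of(anchor: Word, words: list[Word], line_tolerance: int = 10) -> Optional[str]:
    """Return the nearest word to the right of the anchor on roughly the same line."""
    candidates = [
        w for w in words
        if abs(w.y - anchor.y) <= line_tolerance and w.x > anchor.x
    ]
    return min(candidates, key=lambda w: w.x).text if candidates else None


def extract(rules: dict[str, list[str]], words: list[Word]) -> dict[str, Optional[str]]:
    """Try each label variant in order; the first one that yields a value wins."""
    result: dict[str, Optional[str]] = {}
    for field, labels in rules.items():
        result[field] = None
        for label in labels:
            anchor = find_label(words, label)
            value = value_right_of(anchor, words) if anchor else None
            if value is not None:
                result[field] = value
                break
    return result


vendor_a = [
    Word("Invoice", 300, 60), Word("Number:", 350, 60), Word("INV-1001", 420, 60),
    Word("Total", 300, 710), Word("Amount:", 350, 710), Word("$1,250.00", 420, 710),
]
vendor_b = [
    Word("Invoice", 50, 100), Word("No:", 110, 100), Word("A-77", 180, 102),
    Word("Amount", 50, 500), Word("Due:", 120, 500), Word("99.50", 200, 498),
]
shifted_a = [Word(w.text, w.x + 70, w.y + 15) for w in vendor_a]

print(extract(RULES, vendor_a))   # {'invoice_number': 'INV-1001', 'total_amount': '$1,250.00'}
print(extract(RULES, shifted_a))  # same result: the rules follow the labels, not coordinates
print(extract(RULES, vendor_b))   # {'invoice_number': 'A-77', 'total_amount': '99.50'}
```

Because the value is found relative to its label, a shifted scan still extracts correctly, which is the tolerance the paragraph describes. The cost shows up in `RULES`: every new label wording, and every layout where the value sits below its label instead of beside it, means another rule or another heuristic to write and maintain.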
So why all the fuss about being “template-less” if this concept has been around for over 15 years? After all, well-designed templates deliver a much higher degree of precision because you essentially tell the software exactly where to look. Comparing the precision of a template to a rules-based template-less approach is like comparing a detailed map for getting from point A to point B with a loose set of directions. I’m much more likely to succeed if I have a map of exactly where to go than if I rely on signage and other visual clues. It’s the difference between “from southbound Federal Blvd, turn right on 54th Street” and “going southbound, turn right at the Burger King and then turn left at the large pine tree”. Which one would you use?
So templates equal precision without a lot of tolerance and template-less rules equal a lot of tolerance without as much precision. And yet “template-less” is held out as a panacea. Why?
Stay tuned for next week’s post, where we will uncover the reality of “template-less”. In the meantime, check out our IDP Buyer’s Guide!