This is part 1 of a two-part series on templates. Part 2 dives deeper into that most derided approach: the template.
“No templates required.” “Template-less extraction.” Everywhere you look, vendors claim that “unlike other software, this doesn’t require templates,” and prospective customers are buying the argument.
Okay, I’m here to tell you that any intelligent document automation that DOESN’T offer a way to locate and extract data from documents without templates is living in the age of the 386 computer. For highly variable documents like the classic use case of an invoice, using templates is not always a good solution. Note, though, that I said “not always,” not “never.” The reality with IDP software tools is the same as with any tool: you fit the right one to the task at hand.
Fitting IDP Tools to the Right Job
The common wisdom about fitting IDP tools to the job is that templates work for structured forms, while template-less approaches work for non-form documents, often called semi-structured documents. Examples include invoices, bills of lading, and remittances. The rationale is that creating templates for standardized documents is fairly straightforward, so templates make sense there. But for documents like invoices, where data can appear in various positions on a page, creating a template for each variant would be too time-consuming. On the surface, this makes sense. But it is an oversimplification, one that vendors either don’t like discussing or don’t fully appreciate. Let me clue you in and give you an advantage: sometimes templates work just fine for semi-structured documents, and sometimes you need template-less approaches for structured forms.
Dealing with Structured Forms
Take, for instance, a structured form like an insurance prior authorization form. When you see a form, you typically think, “Great, I can use a template for data extraction.” That may be true when an organization controls the form, so there are few or no variations. But for service providers that must process many different forms from various insurers, setting up and managing potentially hundreds of templates quickly becomes challenging. And then there is the problem of varying image quality.
Increasingly, organizations allow documents to be submitted from devices that do not produce reliable image quality, such as mobile phones. Even with a mobile app carefully designed to help the user capture a good-quality picture, it is impossible to produce images so consistent that the form fields always land in the same location; shifts up, down, left, and right are always going to occur.
You might draw larger zones on your templates to tolerate these shifts, but doing so creates another problem: a larger zone can also capture neighboring data. The result is a mess. Organizations quickly find that using templates for otherwise simple forms is exhausting, and the templates don’t meet performance expectations.
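To make the zone trade-off concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical model in which OCR returns words with pixel bounding boxes and a template zone captures every word whose center falls inside a rectangle. The field values (`ABC123`, `G-998`) and coordinates are made up for illustration and do not reflect any particular product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Word:
    """An OCR'd word with its bounding box in pixels (hypothetical model)."""
    text: str
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height


@dataclass(frozen=True)
class Zone:
    """A rectangular template zone: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float

    def grow(self, margin: float) -> "Zone":
        # Enlarge the zone on every side to tolerate image shifts.
        return Zone(self.x0 - margin, self.y0 - margin,
                    self.x1 + margin, self.y1 + margin)

    def contains(self, word: Word) -> bool:
        # A word belongs to the zone if its center point lies inside it.
        cx, cy = word.x + word.w / 2, word.y + word.h / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


def extract(zone: Zone, words: List[Word]) -> str:
    """Join the text of every captured word, in the order given in `words`."""
    return " ".join(w.text for w in words if zone.contains(w))


def shift(words: List[Word], dx: float, dy: float) -> List[Word]:
    """Simulate a phone capture that moved the whole page by (dx, dy)."""
    return [Word(w.text, w.x + dx, w.y + dy, w.w, w.h) for w in words]


if __name__ == "__main__":
    # A member ID, with a group number printed 30 px below it.
    page = [Word("ABC123", 100, 200, 80, 20), Word("G-998", 100, 230, 60, 20)]
    member_id_zone = Zone(90, 195, 200, 225)

    print(repr(extract(member_id_zone, page)))                         # 'ABC123'
    print(repr(extract(member_id_zone, shift(page, 5, 25))))           # ''
    print(repr(extract(member_id_zone.grow(30), shift(page, 5, 25))))  # 'ABC123'
    print(repr(extract(member_id_zone.grow(30), page)))                # 'ABC123 G-998'
```

With the tight zone, a 25-pixel downward shift pushes the member ID out of the zone entirely. Growing the zone by 30 pixels recovers it on the shifted image, but on a well-aligned image the same enlarged zone also picks up the group number printed just below it.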
Processing Semi-structured Documents
Now let’s turn our attention to semi-structured documents. We’ll set aside unstructured documents like promissory notes, insurance policies and powers of attorney, since they need entirely different methods. If you need to parse health insurance remittances, you might think, “These are not forms, so I need something better than templates.” That is true if you deal with many insurers and the remittances vary widely. But if you deal with only a few insurance companies, there is less variance, so constructing templates could be the more straightforward and reliable approach. The same goes for invoices: fewer vendors means fewer variants.
By now you might be thinking, “If templates don’t always work for forms and template-less approaches aren’t always a sure-fire approach for semi-structured documents, what is the best way to ensure success for a project?” Again, it is a matter of using the appropriate tool for the job.
Enter Machine Learning
Increasingly, with advances in computing power coupled with advanced machine learning algorithms, it is not necessary to choose one method or the other. Instead, you can let the software analyze the data and choose the best option. Even more interesting, the software may try multiple approaches and decide which works best, something few organizations are willing to spend the time and effort to do themselves. Software can even decide on the fly, using a free-form approach for data on one part of a document and a more specific, template-like approach for other data, all in the same document or even on the same page!
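The idea of letting software pick an approach per field can be illustrated with a deliberately small, hypothetical Python sketch. It is not Parascript's Smart Learning, whose internals this article does not describe; it simply scores two candidate extractors per field on a few labeled sample documents and keeps whichever is most accurate. Documents are modeled as a grid of text cells, a simplification of real OCR output, and the extractor names, labels, and sample values are all made up.

```python
from typing import Callable, Dict, List, Optional, Tuple

Doc = Dict[Tuple[int, int], str]      # (row, col) -> text in that grid cell
Truth = Dict[str, str]                # field name -> correct value
Extractor = Callable[[Doc], Optional[str]]


def template_extractor(row: int, col: int) -> Extractor:
    """Template-like: always read the same fixed position."""
    return lambda doc: doc.get((row, col))


def anchor_extractor(label: str) -> Extractor:
    """Template-less style: find a label anywhere, read the cell to its right."""
    def extract(doc: Doc) -> Optional[str]:
        for (r, c), text in doc.items():
            if text == label:
                return doc.get((r, c + 1))
        return None
    return extract


def accuracy(extractor: Extractor, samples: List[Tuple[Doc, Truth]], field: str) -> float:
    """Fraction of labeled samples on which the extractor returns the right value."""
    hits = sum(extractor(doc) == truth[field] for doc, truth in samples)
    return hits / len(samples)


def choose_per_field(candidates: Dict[str, Dict[str, Extractor]],
                     samples: List[Tuple[Doc, Truth]]) -> Dict[str, str]:
    """For each field, keep the name of the most accurate candidate extractor."""
    return {
        field: max(options, key=lambda name: accuracy(options[name], samples, field))
        for field, options in candidates.items()
    }


# Two hypothetical vendor layouts: vendor B prints the invoice number
# without a label, and its total sits lower on the page.
SAMPLES = [
    ({(0, 0): "Invoice#", (0, 1): "INV-1", (5, 0): "Total", (5, 1): "100.00"},
     {"invoice_number": "INV-1", "total": "100.00"}),
    ({(0, 1): "INV-2", (8, 0): "Total", (8, 1): "42.50"},
     {"invoice_number": "INV-2", "total": "42.50"}),
]

CANDIDATES = {
    "invoice_number": {"template": template_extractor(0, 1),
                       "anchor": anchor_extractor("Invoice#")},
    "total": {"template": template_extractor(5, 1),
              "anchor": anchor_extractor("Total")},
}

if __name__ == "__main__":
    print(choose_per_field(CANDIDATES, SAMPLES))
```

In this toy data the fixed-position extractor wins for the invoice number, because one vendor prints it without a label, while the label-anchored extractor wins for the total, because its position moves between layouts. Ties go to the first candidate listed, since Python's `max` returns the first maximal item.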
The Real Answer
So the real answer to the question “Which is better, the template or the template-less approach?” is one, or the other, or both, or much, much more. That is what we at Parascript are doing with Smart Learning: letting the software do all the heavy lifting while you reap the benefits of optimally extracted data.