Accounts Payable (AP) automation offers far more than back-office efficiency and spend management through invoice, receipt, remittance and check processing. Reliable and actionable extracted data also provides the essential bedrock for corporate reporting and forecasting. It is estimated that over 400 billion invoices are processed globally each year (Billentis 2017 Market Report), and this volume is only growing. Unfortunately, not all automation is created equal. It’s critical to find the solution that best meets your enterprise needs.
First of all, AP processing remains heavily paper-based even today, according to the Aberdeen Group. Even among the increasing number of invoices that are generated electronically, the majority are received as PDF or HTML documents without adequate tagging to enable full automation. As a result, AP staff continue to spend time entering data into multiple systems, a manual process naturally prone to error. Invoices can easily be misplaced, and identifying where a specific document is in a highly manual workflow can be challenging. Secondly, even with automation, the vast majority of AP automation solutions require significant effort to devise, implement, revise and maintain the rules needed for accurate data extraction. For your value chain to end in meaningful business analytics with actionable insights, it must begin with high-quality data.
The goals of almost any AP automation are to reduce paper handling and workload, simplify processes down to one system for all invoice and other types of documents whether paper or digital, and gain visibility into where each invoice is in the system.
Traditional Automation
Traditional automation starts with manual evaluation. A properly managed project starts with an inventory of all types of invoices and other documents that the Accounts Payable team is processing. Next, staff collect examples of all document types from different sources. They organize hundreds of these documents by type and review them to find the unique characteristics that can be used to identify each type automatically during a typical AP workflow.
Once the staff identifies all the documents’ characteristics, an analyst encodes them as rules within a document capture system. Once rules are encoded, they must be tested in order to uncover any misclassifications that require adding new rules or fine-tuning existing ones. After testing and refining are completed, the rules go into the AP workflow.
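As a rough illustration of what encoding and testing classification rules involves, here is a minimal Python sketch. The keyword lists, the `classify` function and the sample documents are all hypothetical and are not taken from any particular document capture system; real products use richer rule languages.

```python
# Hypothetical hand-written classification rules: each document type is
# identified by keywords an analyst found during manual review.
RULES = {
    "invoice": ["invoice number", "amount due"],
    "remittance": ["remittance advice"],
    "receipt": ["receipt", "thank you for your purchase"],
}

def classify(text):
    """Return the first document type whose keywords all appear, else None."""
    lowered = text.lower()
    for doc_type, keywords in RULES.items():
        if all(keyword in lowered for keyword in keywords):
            return doc_type
    return None  # unclassified: needs manual review or a new rule

# Testing the rules against labeled samples surfaces misclassifications.
samples = [
    ("Invoice Number: 1001  Amount Due: $250.00", "invoice"),
    ("Remittance Advice for payment 7781", "remittance"),
    ("INV# 1002  Balance: $90.00", "invoice"),  # a layout the rules never saw
]
misses = [(text, expected) for text, expected in samples
          if classify(text) != expected]
```

Here the third sample ends up in `misses`, which is the signal that an analyst has to add or tune a rule by hand.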
Next, it’s necessary to create rules that locate the needed data within the invoices before the data can be extracted and validated. A similar process of analysis, testing and tuning takes place to ensure that the maximum amount of data is located and extracted, and to establish the accuracy thresholds that determine when manual review is needed. Because many of the documents are not standardized, a wide range of rules must be created.
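To make the idea of hand-written extraction rules concrete, the following Python sketch uses one regular expression per field and flags any field it cannot find for manual review. The field names, patterns and `extract` function are assumptions made for illustration only.

```python
import re

# Hypothetical extraction rules: one regular expression per field.
FIELD_RULES = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*(\w+)", re.I),
    "total": re.compile(
        r"(?:Total|Amount Due)\s*:?\s*\$?([\d,]+\.\d{2})", re.I),
}

def extract(text):
    """Apply each rule; fields that are not found are flagged for review."""
    found, needs_review = {}, []
    for field, pattern in FIELD_RULES.items():
        match = pattern.search(text)
        if match:
            found[field] = match.group(1)
        else:
            needs_review.append(field)
    return found, needs_review

# A layout the rules anticipate, and one they do not.
known = extract("Invoice No: A123\nTotal: $1,250.00")
unknown = extract("INV A124  Balance 90.00")
```

The second document uses labels that no pattern anticipates, so both fields fall through to manual review until someone writes another rule.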
This is a tremendous amount of work to get to production. If any document changes, whether in document type or in data layout, the rules have to be re-evaluated and re-tuned. So the work is never finished. It is no wonder that many organizations have not invested in traditional automation.
Machine Learning: Invoices, Receipts & Checks
Now let’s look at the same set of requirements using machine learning technologies. For initial document discovery, a technique called “clustering” can be used to automate the logical grouping of like documents: different categories of like invoices, checks, receipts and remittances. Documents can be organized automatically. Invoices from one vendor can be grouped together; receipts can be grouped with travel documents and so on. The result is a set of documents grouped by likeness that can then be further evaluated.
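The grouping step can be sketched in a few lines of Python. This is a deliberately simple stand-in that assumes documents are already available as text: it compares word sets with Jaccard similarity and groups greedily, whereas production systems typically use more robust clustering algorithms and layout features. The function names and the 0.5 threshold are illustrative choices.

```python
import re

def tokens(text):
    """Lowercased alphabetic words; digits are dropped so amounts are ignored."""
    return set(re.findall(r"[a-z]+", text.lower()))

def jaccard(a, b):
    """Overlap of two token sets: 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster(docs, threshold=0.5):
    """Greedy single pass: join the first cluster whose seed is similar enough."""
    clusters = []  # each cluster is a list of document indexes
    for i, doc in enumerate(docs):
        for members in clusters:
            seed = docs[members[0]]
            if jaccard(tokens(doc), tokens(seed)) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])  # nothing similar yet: start a new group
    return clusters

docs = [
    "Acme Corp invoice number 1001 amount due 250.00",
    "Acme Corp invoice number 1002 amount due 75.10",
    "Taxi receipt fare 23.50 thank you",
    "Taxi receipt fare 41.00 thank you",
]
groups = cluster(docs)  # [[0, 1], [2, 3]]
```

No labels are supplied anywhere: the two vendor invoices and the two taxi receipts end up in separate groups purely because of their shared wording.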
Next, each grouping that corresponds to a required document can be given a document type (or class). The samples can then be imported into machine learning software designed to automatically identify the key characteristics of each document type (often technically called “feature extraction”). The result is an automatically generated set of rules for each document type. When performance is not good for a specific class, the staff can add the misclassified or unclassified documents to that class’s sample set to “re-train” the software.
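The train-and-retrain loop might look like the following Python sketch. It assumes bag-of-words counts as the extracted features and a nearest-centroid comparison by cosine similarity; both are simple stand-ins chosen for illustration, and the `CentroidClassifier` class is hypothetical rather than any vendor's API.

```python
import math
import re
from collections import Counter

def features(text):
    """Feature extraction stand-in: bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class CentroidClassifier:
    def __init__(self):
        self.samples = {}    # class name -> list of training texts
        self.centroids = {}  # class name -> summed feature counts

    def add_samples(self, doc_class, texts):
        """Add labeled samples to a class and re-train that class."""
        self.samples.setdefault(doc_class, []).extend(texts)
        total = Counter()
        for text in self.samples[doc_class]:
            total.update(features(text))
        self.centroids[doc_class] = total

    def predict(self, text, min_score=0.3):
        """Return the best class, or None when no class scores well enough."""
        doc = features(text)
        scored = [(cosine(doc, c), name) for name, c in self.centroids.items()]
        score, name = max(scored, default=(0.0, None))
        return name if score >= min_score else None

clf = CentroidClassifier()
clf.add_samples("invoice", ["Invoice number 1001 amount due 250.00"])
clf.add_samples("receipt", ["Taxi receipt fare 23.50 thank you"])
odd = "Statement of charges, balance payable 90.00"
before = clf.predict(odd)  # None: no class recognizes this wording
clf.add_samples("invoice", ["Statement of charges, balance payable 88.00"])
after = clf.predict(odd)   # "invoice" once a similar sample is added
```

The point of the sketch is the last three lines: fixing a missed document means adding a sample to the class, not writing a new rule.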
Data extraction is simplified by taking sample invoices that have been processed along with the data required for each document. Together these automatically train the software to locate the matching data and derive positional rules for each data field. The software uses the processed data for each page and locates the corresponding data on every document. It will do this for each sample and then automatically create algorithms based upon exact location, changes in placement across each example and relative position to other data, among other elements. The staff simply examines the results.
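As a rough sketch of how a positional rule could be derived from processed samples, the Python below locates each known value on its page, takes the nearest word to its left as an anchor label, and averages the horizontal offset. It assumes OCR output is available as words with coordinates and that words on one line share the same `y` value; the `Word` type, function names and coordinates are all hypothetical. It covers only the "relative position to other data" element, not exact location or placement drift.

```python
from collections import defaultdict, namedtuple

Word = namedtuple("Word", "text x y")  # an OCR word and its page position

def learn_rule(samples, field):
    """Derive (anchor label, mean x offset) for a field.

    Each sample is (words, truth); truth maps field names to known values.
    """
    offsets = defaultdict(list)  # anchor text -> observed offsets
    for words, truth in samples:
        value = next(w for w in words if w.text == truth[field])
        # Anchor: the closest word to the left of the value on the same line.
        left = [w for w in words if w.y == value.y and w.x < value.x]
        anchor = max(left, key=lambda w: w.x)
        offsets[anchor.text].append(value.x - anchor.x)
    label = max(offsets, key=lambda k: len(offsets[k]))  # most common anchor
    return label, sum(offsets[label]) / len(offsets[label])

def apply_rule(words, rule):
    """Find the anchor, then take the same-line word nearest the offset."""
    label, offset = rule
    anchor = next(w for w in words if w.text == label)
    line = [w for w in words if w.y == anchor.y and w.x > anchor.x]
    return min(line, key=lambda w: abs(w.x - (anchor.x + offset))).text

samples = [
    ([Word("Invoice", 10, 5), Word("Total:", 300, 40), Word("250.00", 380, 40)],
     {"total": "250.00"}),
    ([Word("Invoice", 10, 5), Word("Total:", 310, 60), Word("75.10", 394, 60)],
     {"total": "75.10"}),
]
rule = learn_rule(samples, "total")  # ("Total:", 82.0)

new_page = [Word("Total:", 290, 55), Word("USD", 330, 55),
            Word("1,024.00", 371, 55)]
total = apply_rule(new_page, rule)   # "1,024.00"
```

Because the rule is relative to the label rather than tied to fixed page coordinates, it still finds the total on a page where the whole line has shifted and an extra word has appeared.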
The machine learning technology used to configure the system also makes adjustments. Complicated projects that typically would take weeks, if not months, are significantly reduced, saving both time and money. Machine learning technology streamlines the manual processes used in production and helps eliminate the labor.
All of this effort can be applied to automate both paper-based and electronic document-based processes in a single workflow.
Parascript Machine Learning
Parascript offered the first commercially available data location, extraction and verification software solution that deploys template-less, neural network-based document extraction. Parascript has productized its machine learning platform to support custom-developed recognition projects with much quicker turnaround than traditional rules-based approaches. The result is significantly faster production with more reliable and refined results for Parascript clients. Parascript machine learning offers:
- Dynamic data location and extraction of information on complex documents
- Region of interest location for parcels and flats to support post automation
- Automated indicia location to support verification of official marks for information governance
- Image comparison to support check fraud applications
- Feature extraction and classification to support medical imaging diagnostics
Our neural network-based solutions leverage machine learning to create form definitions in the lab, which provides a more flexible and efficient way to develop new document types, either for customer-specific needs or for the introduction of new document type modules. For instance, with invoice samples and ground truth data, the system is trained and returns a fully configured form definition. We tailor our already-developed document types, such as receipts or invoices, to a customer’s specific production stream.