“Invoice recognition” makes most people think of accounts payable processes that are improved through the automation of invoice data entry. This is most commonly referred to as accounts payable (AP) automation. This type of technology uses OCR to locate and extract specific information on invoices, such as the date, invoice number, tax, total, and even purchased item data.
These automation systems typically have no way of knowing what data to expect, only that they must find it. For instance, the system doesn’t know the specific invoice number that will appear on an invoice. The same is true for the date and any other field. So to provide a good level of automation, a significant number of rules have to be created to help the system locate this data without much knowledge of what will be found on any given invoice.
Establishing the Rules for Automation
These rules tell the system the likely location of data (e.g., “the invoice number is typically in the top-right of the page”) or the likely format of data (e.g., “the invoice number is often preceded by the label INVOICE #”), and they are mandatory. This process is a “pure” data location and extraction task. Results can vary widely based on the amount of variance involved. The performance of a system on 100 different vendor invoices will differ significantly from its performance on 10,000 different vendor invoices.
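As a rough illustration of the two kinds of rules described above, the following Python sketch combines a format rule (a label such as “INVOICE #”) with a location rule (a preference for the top-right of the page). The `Line` structure, the normalized coordinates, the regular expression, and the scoring weights are all assumptions made for this example; they do not describe how any particular AP automation product works.

```python
import re
from typing import NamedTuple, Optional, Sequence


class Line(NamedTuple):
    """One line of OCR output with a normalized position (hypothetical format)."""
    text: str
    x: float  # 0.0 = left edge of page, 1.0 = right edge
    y: float  # 0.0 = top edge of page, 1.0 = bottom edge


# Format rule: the invoice number is often preceded by a label like "INVOICE #".
LABEL_RULE = re.compile(
    r"INVOICE\s*(?:#|NO\.?|NUMBER)\s*:?\s*([A-Z0-9][A-Z0-9-]*)",
    re.IGNORECASE,
)


def find_invoice_number(lines: Sequence[Line]) -> Optional[str]:
    """Return the best-scoring candidate, or None if no rule fires."""
    best, best_score = None, 0.0
    for line in lines:
        match = LABEL_RULE.search(line.text)
        if not match:
            continue
        score = 1.0  # the format rule matched
        # Location rule: the invoice number is typically in the top-right.
        if line.x > 0.5 and line.y < 0.3:
            score += 0.5
        if score > best_score:
            best, best_score = match.group(1), score
    return best


page = [
    Line("Ref invoice no. 111 from last month", 0.2, 0.8),
    Line("INVOICE # A-10045", 0.8, 0.1),
]
print(find_invoice_number(page))
```

Both lines satisfy the format rule, so the location rule decides between them. Every new vendor layout that breaks these assumptions needs another rule, which is why results vary so much as the number of vendors grows.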
More recently, the technology for AP automation has been adopted by another financial practice: the tax auditor. Services organizations, such as the “big four” accounting firms, offer what are called “indirect tax recovery” services, which aim to recover as much tax reimbursement as possible by scouring all of a client’s invoices for qualified purchases. You can imagine that this is a time-consuming, complex task. Not only must a person spend time reviewing invoices, but the required knowledge of various tax codes means that the person is often a more expensive tax professional. Automating the process of locating specific data on invoices seems like a good idea. However, there is a twist that departs significantly from typical AP automation technology.
Invoice Data Entry Automation for Auditing
In tax auditing, the focus of the staff involved is not pure invoice data entry but scouring invoices for specific qualified purchases. This means that, rather than a “pure data location and extraction process,” the solution is actually a data verification process. With data verification, the task is to locate specific pre-defined information within a document. So rather than trying to extract all purchased items where the data is not known in advance, the task is to scan a document and detect the presence of specific items. For example, if a client selling computer hardware purchases an item intended for resale, such as a hard disk, that purchase is tax exempt. A verification process can use a pre-defined list of such exempt items to detect their presence on any invoice. The invoice either has the items present or it does not.
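The verification idea above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any vendor’s implementation: the `EXEMPT_ITEMS` list, the sample text, and the 0.85 similarity threshold are hypothetical, and the fuzzy comparison uses `difflib.SequenceMatcher` from the standard library simply as one way to tolerate OCR misreads.

```python
import re
from difflib import SequenceMatcher

# Hypothetical pre-defined list of tax-exempt resale items for one client.
EXEMPT_ITEMS = ["hard disk", "memory module"]


def normalize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).split()


def item_present(item, ocr_text, threshold=0.85):
    """Check whether `item` appears in the OCR text, allowing small misreads."""
    target = normalize(item)
    tokens = normalize(ocr_text)
    target_str = " ".join(target)
    n = len(target)
    # Slide a window of the same token length across the OCR text.
    for i in range(len(tokens) - n + 1):
        window = " ".join(tokens[i:i + n])
        if SequenceMatcher(None, target_str, window).ratio() >= threshold:
            return True
    return False


def verify_invoice(ocr_text, items=EXEMPT_ITEMS):
    """Map each listed item to True (present) or False (absent)."""
    return {item: item_present(item, ocr_text) for item in items}


# "D1sk" stands in for a typical OCR misread of "Disk".
result = verify_invoice("2 x Hard D1sk 500GB   $120.00")
print(result)
```

Because the list of items is known ahead of time, the question asked of each invoice is a yes/no one per item, and a slightly misread word can still be matched against its known spelling.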
Unfortunately, when verification is the primary need, many still attempt to apply data location and extraction as a first step and only then apply a verification stage. This approach is unnecessary and actually leads to subpar performance. The reason is that data extraction includes all of the difficulty and resulting error associated with locating and extracting unknown values. This error can be as much as 50 percent, leaving any subsequent verification severely hampered.
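To see why an extraction stage limits a later verification stage, consider a rough back-of-the-envelope calculation. It assumes the two stages fail independently, and the 95 percent verification accuracy is a made-up figure used only for illustration; the 50 percent extraction error is the worst case mentioned above.

```python
def end_to_end_detection_rate(extraction_accuracy: float,
                              verification_accuracy: float) -> float:
    """Share of qualified items that survive both stages (independence assumed)."""
    return extraction_accuracy * verification_accuracy


# 50 percent extraction error (worst case cited above) means 0.5 accuracy.
# 0.95 is a hypothetical verification accuracy, not a measured value.
rate = end_to_end_detection_rate(0.5, 0.95)
print(f"{rate:.1%} of qualified items detected")
```

Under these assumptions, fewer than half of the qualified items are detected, because an item that the extraction stage misses can never reach the verification stage.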
Data Verification Automation
If an organization approaches a project as a pure data verification process, performance is much better because the tasks of data location and extraction are combined with the process of verifying presence. While the workflow is more complex due to the need to include specific lists during the OCR process, a “verification first” process can yield much higher levels of automation in terms of the accuracy of detecting the presence of data. We are not locating all data for extraction, but rather identifying the presence of specific data. The process can also be much faster since only specific data is required; no additional time is spent applying OCR to the entire invoice. If the specific data is not present, the process moves on to the next invoice.
Data verification for tax audit automation provides superior throughput and performance, leading to lower-cost, more efficient, and more comprehensive services that can be offered not only to a firm’s largest clients but also to its smaller ones.
If you would like to find out more about Parascript SDK for Invoices, check out Parascript Invoice Data Extraction.