Addresses are everywhere, and they matter for more than getting a letter or package from point A to point B. They are also a great way to identify and segment data in relational databases or data stored within documents. Whether you segment by ZIP code, state, city, street, or even business or recipient, the data in an address is valuable. But gathering this information during document processing can be a big challenge.
Addresses can appear in different places depending on the document, can follow different formats, and often aren’t even labeled as addresses.
Take, for instance, an invoice like the one below. It contains three addresses: one for the billing vendor, one for the billing recipient, and one for the shipping location. Two of them, the billing and shipping addresses, are not very standard: each sits on two unlabeled lines, and both are handwritten. Ouch! Worse, the vendor address doesn’t have any label at all.
It is common for accounts payable staff to use the vendor address or name to look up the account and then review and verify the items purchased from that vendor. An invoice automation system therefore needs to both locate the vendor address and extract its data well enough to perform a database look-up, without human assistance. This seemingly simple problem is difficult even for the most sophisticated solutions.
Why? Because these systems rely on keywords or other cues to first find the address, and then must perform complex look-ups to validate each field. If the address doesn’t have a keyword or some other identifier, it often won’t be located at all. If it is located, there are still a lot of “data maneuvers” to be done, with many points of failure. And if the addresses are handwritten, it gets really hard.
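To see why keyword-driven location is fragile, here is a minimal Python sketch of that style of rule. It is an illustration only, not how any particular product works: the label list, the "next two lines are the address" rule, and the sample invoice text are all invented for this example.

```python
# Hypothetical label keywords a rule-based system might search for.
LABELS = ("bill to", "ship to", "remit to")

def find_labeled_addresses(lines):
    """Return {label: [address lines]} for every label keyword found.

    Assumes the two lines after a label are the address, which is
    exactly the kind of rigid rule that breaks on real documents.
    """
    found = {}
    for i, line in enumerate(lines):
        text = line.strip().lower().rstrip(":")
        if text in LABELS:
            found[text] = [l.strip() for l in lines[i + 1:i + 3]]
    return found

# Invented sample: the vendor block at the top carries no label.
invoice_lines = [
    "Acme Supply Co.",
    "12 Main St",
    "Springfield, IL 62701",
    "Bill To:",
    "Jane Doe",
    "400 Oak Ave, Denver, CO 80202",
    "Ship To:",
    "Warehouse 7",
    "9 Dock Rd, Aurora, CO 80010",
]

labeled = find_labeled_addresses(invoice_lines)
```

In this sample the billing and shipping blocks are picked up, but the unlabeled vendor address never appears in `labeled`, even though it is the one needed for the account look-up.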
A last resort is to send each document where an address wasn’t located or extracted correctly to a person assigned to review these documents. This person updates a set of rules that is then applied as a very strict template the next time this specific document format is encountered. If the document changes ever so slightly or has scaling issues (as with faxed invoices), the results will not be good.
“So everyone should just give up,” you say? No. There is very good news: technology now exists that can dynamically locate addresses based on how they look and the data they contain, and that can quickly and logically validate this data before any third-party data look-up is performed. It works on machine print and handwriting, too!
Check this out:
With Parascript dynamic address location and recognition, you can solve a very complex task in a simple and reliable way. In two steps, actually.
First, add what we call a “dynamic address” field anywhere on the sample document, as shown below. In this case we are going to search the entire page, but if you know the address will always be in the top half, you can constrain the analysis to that region.
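For readers who want a feel for what locating an address by its content (rather than by a label) can look like, here is a rough Python sketch. It is not FormXtra’s method or API. It assumes OCR has already produced text lines with a vertical position, and the `OcrLine` type, the two regular expressions, and the sample page are all hypothetical.

```python
import re
from dataclasses import dataclass

@dataclass
class OcrLine:
    text: str
    y: float  # vertical position as a fraction of page height (0.0 = top)

# "City, ST 12345" or "City, ST 12345-6789" at the end of a line.
CITY_STATE_ZIP = re.compile(r"[A-Za-z .'-]+,\s*[A-Z]{2}\s+\d{5}(?:-\d{4})?\s*$")
# Assumption: a street line starts with a house number.
STREET = re.compile(r"^\s*\d+\s+\S+")

def locate_addresses(lines, y_min=0.0, y_max=1.0):
    """Return every address-looking block whose last line lies in [y_min, y_max]."""
    results = []
    for i, line in enumerate(lines):
        if not (y_min <= line.y <= y_max):
            continue
        if not CITY_STATE_ZIP.search(line.text):
            continue
        block = [line.text.strip()]
        # Pull in the preceding line when it looks like a street line.
        if i > 0 and STREET.match(lines[i - 1].text):
            block.insert(0, lines[i - 1].text.strip())
        results.append(block)
    return results

# Invented sample page: no address carries a label.
page = [
    OcrLine("Acme Supply Co.", 0.05),
    OcrLine("12 Main St", 0.08),
    OcrLine("Springfield, IL 62701", 0.11),
    OcrLine("Jane Doe", 0.55),
    OcrLine("400 Oak Ave", 0.58),
    OcrLine("Denver, CO 80202", 0.61),
    OcrLine("Total due: 118.40", 0.90),
]

whole_page = locate_addresses(page)            # search everywhere
top_half = locate_addresses(page, y_max=0.5)   # constrain to the top half
```

Searching the whole page returns both address blocks, while the top-half search returns only the vendor block. A simple pattern like this handles only clean, US-style machine print; it says nothing about handwriting or unusual layouts.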
Now, because there could be more than one address and we’re analyzing the entire page, we add a small line of code in the business rules section to output every address that FormXtra finds. What’s really neat is that FormXtra’s scripting can access the FormXtra .NET API. That means you can create a full, self-contained application entirely within the form definition, with no need for “wrapper .exe’s” or other external programming.
And that’s it. From there, the data can be fed into a search query against a vendor database to automate order look-ups and to perform additional validation against the actual purchase order. Cool, huh?
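As a rough idea of what that downstream step might look like, the Python sketch below sanity-checks an extracted address and then queries a vendor table. Everything here is assumed for illustration: the `vendors` table and its columns, the normalization rule, the shortened state list, and the sample record. It uses an in-memory SQLite database and has no connection to the FormXtra API.

```python
import re
import sqlite3

# Abbreviated for the sketch; a real list would hold every state code.
US_STATES = {"CO", "IL", "NY"}

def parse_last_line(line):
    """Split 'City, ST 12345' into parts, or return None if it does not fit."""
    m = re.match(r"^\s*(.+?),\s*([A-Z]{2})\s+(\d{5})(?:-\d{4})?\s*$", line)
    if not m:
        return None
    city, state, zip_code = m.groups()
    if state not in US_STATES:
        return None  # fails the logical check, so skip the database call
    return {"city": city, "state": state, "zip": zip_code}

def normalize(street):
    """Upper-case, drop periods, and collapse whitespace."""
    return " ".join(street.upper().replace(".", "").split())

def find_vendor(conn, street, last_line):
    """Return (vendor_id, name) for a validated address, else None."""
    parsed = parse_last_line(last_line)
    if parsed is None:
        return None
    return conn.execute(
        "SELECT vendor_id, name FROM vendors WHERE street = ? AND zip = ?",
        (normalize(street), parsed["zip"]),
    ).fetchone()

# Hypothetical vendor table with one invented record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE vendors (vendor_id INTEGER, name TEXT, street TEXT, zip TEXT)"
)
conn.execute("INSERT INTO vendors VALUES (1, 'Acme Supply Co.', '12 MAIN ST', '62701')")

match = find_vendor(conn, "12 Main St.", "Springfield, IL 62701")
no_match = find_vendor(conn, "12 Main St.", "Springfield, ZZ 62701")
```

Checking the state code and ZIP format before querying means an obviously misread address never reaches the database, so it can be routed for review instead of returning a wrong or empty match.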
Parascript technology reads both machine-printed and handwritten addresses, so everything is covered in one easy-to-use solution.
Ready to learn more? Download our e-book, Understanding Recognition Technology, below or request a demo.