As the address recognition provider to the USPS and other major postal operators around the world, we can safely say we know a thing or two about reading addresses.
If what comes to mind is business correspondence with neatly machine-printed characters, you may ask, what’s the big deal? Any OCR software should be able to do that! But then think about the lovely handwritten letter you got from your mother. It might look something like this:
Then it gets a bit more complicated.
Our software not only reads handwritten addresses such as the one above on millions of mail pieces around the world; it can also locate and read multiple addresses on business documents, such as invoices, and improve accuracy by verifying addresses across documents, for example checking a driver’s license against an application form.
It is a sophisticated process that we have refined over the years, and it now lets us claim recognition rates over 90% for handwritten addresses.
How does it work?
The first step is finding the address. You might be surprised to learn that locating the address is one of the most challenging tasks. Take, for example, the front and back cover of a typical magazine: one large piece of paper featuring headlines, photos, advertisements, and logos in a variety of colors and sizes. Finding the small address block among these images requires a very precise approach. At the same time, the process must be flexible enough to accommodate different applications.
Mail pieces pose other challenges, such as skewed images and noise from dirt and jams. In any case, the address block needs to be located and isolated from the background.
Business documents have their own set of difficulties. In a highly structured form, locating the address might be an easy task. However, business documents go far beyond structured forms to include semi-structured and unstructured formats as well. In those cases, locating the addresses requires more sophisticated techniques, such as keyword location.
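To make keyword location a little more concrete, here is a minimal Python sketch. It is an illustration only, not our production method: the keyword list, the ZIP-line pattern, and the function name find_address_blocks are all assumptions chosen for the example, and it presumes the document has already been turned into lines of text by OCR.

```python
import re

# Assumed anchor keywords; a real deployment would tune this list per document type.
KEYWORDS = re.compile(r"^\s*(bill to|ship to|remit to)\s*:?\s*$", re.IGNORECASE)
# Assumed end-of-address signal: a line ending in a 5-digit or ZIP+4 code.
ZIP_LINE = re.compile(r"\b\d{5}(?:-\d{4})?\s*$")


def find_address_blocks(lines, max_lines=4):
    """Return (keyword, address_lines) pairs found after anchor keywords."""
    blocks = []
    for i, line in enumerate(lines):
        match = KEYWORDS.match(line)
        if not match:
            continue
        block = []
        # Collect the lines that follow the keyword until a blank line,
        # a ZIP line, or the maximum block length is reached.
        for candidate in lines[i + 1 : i + 1 + max_lines]:
            if not candidate.strip():
                break
            block.append(candidate.strip())
            if ZIP_LINE.search(candidate):
                break
        if block:
            blocks.append((match.group(1).lower(), block))
    return blocks


invoice = [
    "INVOICE 1001",
    "Bill To:",
    "Jane Doe",
    "12 Main St",
    "Springfield, IL 62701",
    "Total due: $40",
    "Ship To",
    "Acme Corp",
    "9 Elm Ave",
    "Springfield, MA 01103",
]
for keyword, address in find_address_blocks(invoice):
    print(keyword, "->", address)
```

A keyword anchor like this only narrows down where to look. On unstructured documents it would typically be combined with layout cues and with validation of the candidate block against a postal database.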
Once we have located the address block, we are ready for recognition. During this process, the address field is divided into a city-state-zip line and a street address line. Based on the interpretation rules for the specific application, the recognition engine processes the city, state, and zip code first. This information is then used to verify the recognition of the street address. The software identifies all of the possible combinations of city, state, and zip code and cross-validates them against the U.S. Postal database. The same process occurs for street address recognition, with additional verification against the final city, state, and zip code. Based on the customer’s unique needs, the software can either flag the address as invalid or incomplete, or update the recognition result to match the USPS record.
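The cross-validation idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the actual engine: POSTAL_DB is a tiny hypothetical stand-in for the postal database, the recognizer is assumed to return candidate readings with confidence scores, and multiplying the confidences is just one plausible way to rank combinations.

```python
from itertools import product

# Hypothetical stand-in for a postal reference database:
# (city, state, zip) -> set of known street names.
POSTAL_DB = {
    ("SPRINGFIELD", "IL", "62701"): {"MAIN ST", "ELM AVE"},
    ("SPRINGFIELD", "MA", "01103"): {"MAIN ST", "STATE ST"},
}


def best_city_state_zip(city_cands, state_cands, zip_cands):
    """Pick the highest-scoring combination that exists in the database.

    Each argument is a list of (text, confidence) pairs from the recognizer.
    """
    best, best_score = None, 0.0
    for (city, pc), (state, ps), (zip_code, pz) in product(
        city_cands, state_cands, zip_cands
    ):
        key = (city, state, zip_code)
        score = pc * ps * pz  # assumed scoring rule
        if key in POSTAL_DB and score > best_score:
            best, best_score = key, score
    return best


def best_street(street_cands, locality):
    """Keep only street readings that exist for the validated locality."""
    known = POSTAL_DB.get(locality, set())
    valid = [(text, p) for text, p in street_cands if text in known]
    return max(valid, key=lambda c: c[1])[0] if valid else None


locality = best_city_state_zip(
    [("SPRINGFIELD", 0.9)],
    [("IL", 0.4), ("MA", 0.5)],
    [("62701", 0.8), ("01103", 0.3)],
)
street = best_street([("MAIN 5T", 0.6), ("MAIN ST", 0.5)], locality)
print(locality, street)
```

In this toy run the state reading "MA" has the higher individual confidence, but the combination with zip code 62701 only validates as Springfield, IL, so the database lookup overrides the raw score. Likewise, the misread "MAIN 5T" is discarded because it is not a known street for that locality. A result of None corresponds to the case where the address would be flagged as invalid or incomplete.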
With the prevalence of addresses on practically every business document, the opportunities for automated address recognition are endless. See the software in action: