Automation Maturity Continuum
If you are a business process outsourcer or SaaS provider that automates processes involving document-based data, you undoubtedly use some method of extracting key data from those documents. The most common and easiest way, but the most costly in time and expense, is manual data entry. Many of you might have employed OCR to extract the content so that it can be searched.
Moving up the capability scale, you might have designed OCR templates that locate key data by coordinates (or zones). You might even use databases to validate the extracted data automatically. Some very advanced BPOs and SaaS providers use multiple OCR technologies and apply business rules that dictate where to look for the data.
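To make the zone idea concrete, here is a minimal sketch of template-based extraction in Python. Everything named here is an assumption for illustration: the coordinates are invented, `ocr_text()` stands in for whatever OCR engine you actually use, and the image object is assumed to have a PIL-style `crop()` method.

```python
# Minimal sketch of zone-based (template) extraction. Zones are pixel
# rectangles on a known form layout; ocr_text() is a hypothetical wrapper
# around the OCR engine, and the coordinates are illustrative.

INVOICE_TEMPLATE = {
    "invoice_number": (900, 60, 1180, 110),    # (left, top, right, bottom)
    "invoice_date":   (900, 120, 1180, 170),
    "total_due":      (900, 1500, 1180, 1560),
}

def extract_fields(page_image, template):
    """Crop each zone from a PIL-style image and OCR it."""
    fields = {}
    for name, box in template.items():
        fields[name] = ocr_text(page_image.crop(box)).strip()
    return fields
```

The fragility is easy to see: if a client moves a field half an inch down the page, every zone tied to those coordinates silently extracts the wrong text.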
Apart from manual data entry, every move up the “automation maturity continuum” brings complexity and hidden costs with it. With manual data entry, calculating your costs is easy: divide your total data volume by the throughput of your data entry operators to get the labor hours required, then multiply by the wage rate. Or simply total your raw data entry wage bill.
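As a back-of-the-envelope illustration, the arithmetic looks like this (every number below is an assumption, not a benchmark):

```python
# Rough manual data entry cost model with illustrative numbers.
documents_per_month = 500_000
fields_per_document = 10
avg_keystrokes_per_field = 12
keystrokes_per_hour = 9_000          # assumed operator throughput
hourly_wage = 18.00                  # assumed fully loaded rate

hours_needed = (documents_per_month * fields_per_document
                * avg_keystrokes_per_field) / keystrokes_per_hour
monthly_cost = hours_needed * hourly_wage
print(f"{hours_needed:,.0f} operator-hours, about ${monthly_cost:,.0f}/month")
```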
Complex and Costly
When you apply automation through OCR and validation, even the most well-designed system can develop “efficiency leaks” that end up costing your company millions, or even tens of millions, of dollars in unnecessary data entry and validation expenses.
At Parascript, we have worked with the largest organizations to uncover these hidden inefficiencies and bring more accuracy to document-oriented data extraction processes. What we find is that a system that once provided good accuracy no longer does, or that advances in technology could deliver more efficiency than the system currently achieves. These technologies are complex: they are hard to configure initially, let alone to monitor and tune continually. Even the largest, most experienced organizations can incur significant hidden costs within their document processing. While tight controls over data quality can be implemented initially, a variety of factors can, over time, contribute to increased costs of data extraction processes.
Hidden Costs of Data Extraction
Here are seven potential causes of increased costs for your organization to consider:
- Changes in document format. One of the most common and costly events to manage is when a document format (the layout of information) changes or a new format is introduced. When this occurs, significant analysis is required to understand the effect of the new layout, adjust existing field location rules, and then measure performance.
- Changes in image quality due to hardware degradation, new hardware, or configuration changes. Service providers commonly find that the way images are acquired changes over time, even for the same client. The client may deploy new hardware, or their existing hardware may be poorly maintained. Either leads to changes in image quality that affect data extraction performance.
- Changes in data formats or types. Sometimes a client will change the way their data is entered. For instance, a date field might change from MMDDYY to MMDDYYYY. If these changes are not tracked, the output may no longer meet the client’s requirements (see the date-normalization sketch after this list).
- Changes in validation requirements. For some fields, the list of acceptable entries can change over time to suit changing business requirements. For instance, the transition from ICD-9 to ICD-10 brought a wholesale change to the codes that automated validation should accept. Sometimes these changes are well managed and communicated, but often they are not (see the versioned-validation sketch after this list).
- Upgrades of component software that introduce errors or accuracy problems. Technology marches on, and software vendors constantly push customers to upgrade to the latest and greatest. But upgrades often bring unintended consequences for data location and extraction: improvements in one part of the software can mean degradation in others. The only way to be sure is to stringently measure the performance of each system, which is time-consuming and complex.
- Staff turnover, which affects both the ability to manage configurations of data extraction software and the quality control side. Data entry and validation staff often become very knowledgeable about specific document types and data, and they are highly efficient at catching problems, resulting in fewer errors. With each new staff member, however, errors can increase, and if each staff member’s performance is not stringently measured, data quality suffers.
- Samples and ground truth data used to measure system performance that do not represent production documents. This is probably the single biggest contributor to poor performance: the service provider simply doesn’t know that data quality has declined because the data sets used to test the system are not statistically representative. In addition, the ground truth data must be maintained so that it always reflects the most current production data and requirements. Without it, service providers might as well be flying at night without instruments (see the accuracy-measurement sketch after this list).
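To make the date-field example concrete, here is a minimal sketch of a normalizer that accepts both the old MMDDYY and the new MMDDYYYY formats and emits one canonical date. The function name and format choices are illustrative, not any specific product’s API:

```python
from datetime import datetime

def normalize_date(raw: str) -> str:
    """Accept MMDDYY or MMDDYYYY input and return an ISO date string."""
    raw = raw.strip()
    fmt = "%m%d%Y" if len(raw) == 8 else "%m%d%y"  # pick format by length
    try:
        return datetime.strptime(raw, fmt).strftime("%Y-%m-%d")
    except ValueError as exc:
        raise ValueError(f"Unrecognized date value: {raw!r}") from exc

print(normalize_date("031524"))    # 2024-03-15 (old MMDDYY format)
print(normalize_date("03152024"))  # 2024-03-15 (new MMDDYYYY format)
```

Routing unrecognized values to an exception, rather than guessing, is what surfaces an untracked format change instead of letting it corrupt output silently.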
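Likewise, list-based validation is easier to manage when the acceptable code set is versioned explicitly, so a transition like ICD-9 to ICD-10 becomes a dated configuration change rather than a silent break. A sketch under assumed data (the sample codes are real ICD entries, but the tiny sets and the structure are illustrative):

```python
from datetime import date

CODE_SETS = {
    "ICD-9":  {"250.00", "401.9"},   # tiny illustrative subsets only
    "ICD-10": {"E11.9", "I10"},
}
ICD10_CUTOVER = date(2015, 10, 1)    # U.S. ICD-10 compliance date

def is_valid_diagnosis(code: str, service_date: date) -> bool:
    """Validate a code against the set in force on the service date."""
    code_set = "ICD-10" if service_date >= ICD10_CUTOVER else "ICD-9"
    return code in CODE_SETS[code_set]

print(is_valid_diagnosis("E11.9", date(2016, 1, 5)))   # True
print(is_valid_diagnosis("250.00", date(2016, 1, 5)))  # False: ICD-9 code after cutover
```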
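Finally, measuring field-level accuracy against ground truth is straightforward once you have representative, paired samples. A minimal sketch, assuming extracted and ground-truth records arrive as parallel lists of dicts; run it on every release and on a regularly refreshed production sample so a drop in any field flags a leak before it compounds:

```python
def field_accuracy(extracted: list[dict], truth: list[dict]) -> dict:
    """Per-field accuracy across paired extracted/ground-truth records."""
    correct, total = {}, {}
    for ext, gt in zip(extracted, truth):
        for field, expected in gt.items():
            total[field] = total.get(field, 0) + 1
            if ext.get(field) == expected:
                correct[field] = correct.get(field, 0) + 1
    return {f: correct.get(f, 0) / total[f] for f in total}

extracted = [{"invoice_date": "2024-03-15", "total_due": "118.00"}]
truth     = [{"invoice_date": "2024-03-15", "total_due": "112.00"}]
print(field_accuracy(extracted, truth))  # {'invoice_date': 1.0, 'total_due': 0.0}
```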
These are a few of the factors that can contribute to significant cost increases even when everything appears to be running smoothly. Next, we’ll go into each area in more detail and provide recommendations on how to manage them.