Even though around 94 percent of claims are submitted electronically, as reported by the latest CAQH Index, many of these are submitted by non-facility providers. Instead of a pure end-to-end electronic transaction, a good portion of claim submissions still start out as paper-based claims. Currently, almost 1.5 billion claims are submitted annually, and best estimates are that between 20 and 40 percent of these claims start out as a paper form even if they end up being transmitted as an electronic claim. This amounts to between 300 and 600 million paper claims per year.
Claims Data Extraction Automation
Automation of paper forms processing is a tried-and-true capability, so transforming a paper claim into an EDI 837 transaction is not a terribly difficult task. It is still a costly operation, however, because data needs to be extracted at an accuracy of 99 percent or greater. Making this process even more costly is the percentage of claim forms that are submitted as hard-to-process images.
Within the “paper claim initiated” realm, the vast majority are submitted as “drop-out” forms. This means that the form structure that guides the entry of the claim data disappears when the paper claim form is scanned. This works through a combination of a specially printed form that uses a particular color of ink and a scanner that can detect this ink and remove it. The result is that only the claim data remains, which allows optical character recognition (OCR) to work with a pristine image.
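To make the drop-out concept concrete, here is a minimal sketch of how the same effect can be simulated in software. It assumes a color scan of a form printed in red drop-out ink (as on a standard CMS-1500 claim) and uses OpenCV; the HSV thresholds are illustrative values that would need tuning for real form stock:

```python
import cv2
import numpy as np

def drop_out_red_form(scan_path: str) -> np.ndarray:
    """Simulate drop-out scanning: remove red form ink, keep entered data."""
    img = cv2.imread(scan_path)  # BGR color scan of the claim
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

    # Red wraps around the hue axis, so mask both ends of the range.
    # These thresholds are assumptions for illustration only.
    mask_lo = cv2.inRange(hsv, (0, 60, 60), (10, 255, 255))
    mask_hi = cv2.inRange(hsv, (170, 60, 60), (180, 255, 255))
    red_ink = cv2.bitwise_or(mask_lo, mask_hi)

    # Paint the form structure white, leaving only the filled-in data.
    result = img.copy()
    result[red_ink > 0] = (255, 255, 255)
    return result
```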
Extracting Data from Black-and-White Claims
Since many paper claims are initiated by individual providers, investment in these special forms and scanners is far from universal, leading to form images that leave both the data and the underlying form structure intact, commonly referred to as “black-and-white” claims. Some estimates are that between 10 and 25 percent of paper-initiated claims are black-and-white images, translating to up to around 150 million annually.
Depending upon the quality of the scanner, the resulting images can also contain significant noise or suffer from shifting or rescaling. This leads to suboptimal OCR performance: the form structure can confuse the OCR engine, or the engine cannot process the claim at all because of significant distortion. Overall, the inability to automate with a high level of precision creates significant costs. Perhaps only 20 percent of all black-and-white claims can be even partially automated.
Achieving Quality Data Results
Yet our experience dealing with a wide variety of these paper-initiated claims, including the poorest-quality faxed forms, has shown that a high degree of automation is achievable, even with black-and-white images. In fact, it is possible to achieve processing performance close to that of pristine drop-out claims, at 99 percent accuracy. So how can this high performance be achieved? It is useful to dig into how paper claims are processed.
Preprocessing Claims
First, incoming images undergo what is called preprocessing. Preprocessing often involves correcting orientation and removing the general noise that results from lower-quality scanners. Next, the image is converted from color or grayscale to a high-contrast “bitonal” image, which optimizes it for OCR. Even though we refer to “black-and-white” claims, the original images can be in color or grayscale. Generally, the more information within the image, the better the software can improve overall image quality prior to converting to a bitonal image.
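As a rough illustration of this stage, the following sketch corrects orientation, reduces noise, and binarizes the scan. It uses OpenCV with a simple minimum-area-rectangle deskew heuristic chosen for brevity; a production pipeline would use more robust skew and noise estimation:

```python
import cv2
import numpy as np

def preprocess(scan_path: str) -> np.ndarray:
    """Deskew, denoise, and binarize a scanned claim for OCR."""
    gray = cv2.imread(scan_path, cv2.IMREAD_GRAYSCALE)

    # Reduce scanner noise before thresholding.
    denoised = cv2.fastNlMeansDenoising(gray, h=10)

    # Estimate page skew from the minimum-area rectangle around ink pixels.
    ink = cv2.threshold(denoised, 0, 255,
                        cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(ink > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    # Normalize: OpenCV reports the rectangle angle in (0, 90] or [-90, 0)
    # depending on version; only the small residual skew should be corrected.
    if angle > 45:
        angle -= 90
    elif angle < -45:
        angle += 90

    h, w = gray.shape
    rotation = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(denoised, rotation, (w, h),
                              flags=cv2.INTER_CUBIC, borderValue=255)

    # Convert to a high-contrast bitonal image for the OCR engine.
    return cv2.threshold(deskewed, 0, 255,
                         cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
```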
Optimal Alignment
Next comes the alignment process. For structured forms such as claims, software relies upon examples with which to match incoming forms. These examples, often called “templates,” serve as a means to align the claim to known data structures that dictate the actual location of the data to be extracted. Successful alignment drives all subsequent extraction processes and is crucial to automation; an inability to align claims means a completely manual data entry process. Some software has little tolerance for images that do not align perfectly, largely because it cannot transform the image to adjust the actual location of the data. A key capability for addressing this issue is to match an incoming claim to the template at the field level, shifting the data up, down, left, or right so that it conforms to the known structure. This allows a higher level of OCR automation since the data can now be successfully located.
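One common way to implement this kind of alignment, shown here as an illustrative sketch rather than any particular vendor’s method, is feature-based registration: match keypoints between the incoming claim and the template image, then warp the claim onto the template’s coordinate system so that field locations line up:

```python
import cv2
import numpy as np

def align_to_template(claim: np.ndarray, template: np.ndarray) -> np.ndarray:
    """Warp a scanned claim onto a known template layout."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp_c, des_c = orb.detectAndCompute(claim, None)
    kp_t, des_t = orb.detectAndCompute(template, None)

    # Match descriptors and keep the strongest correspondences.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des_c, des_t), key=lambda m: m.distance)
    good = matches[:200]

    src = np.float32([kp_c[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_t[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # A RANSAC homography absorbs shift, rotation, and rescaling so every
    # field lands at the coordinates the template expects.
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    h, w = template.shape[:2]
    return cv2.warpPerspective(claim, H, (w, h), borderValue=255)
```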
Field-level OCR: Going Templateless
The last step is applying field-level OCR. With a claim aligned to the correct template, OCR is used to extract each individual data element. Unfortunately, with black-and-white images where the claim structure is left intact, the presence of preprinted lines and text can interfere with the actual data. A novel approach to this problem is the application of field-level template structure removal. Since the claim was aligned to the template in the previous step, all preexisting print can also be aligned and “scrubbed” from the image, leaving only the data, in a manner very similar to what can be achieved with drop-out forms. Overall, removing the underlying form structure can improve OCR success rates by 5 to 15 percent over traditional OCR processing.
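A simplified sketch of this scrubbing step follows. It assumes the claim has already been aligned to a blank template image, uses pytesseract as a stand-in OCR engine, and takes a hypothetical field_box coordinate from the template definition; a production system would also repair data strokes broken where they cross form lines:

```python
import cv2
import numpy as np
import pytesseract

def scrub_and_read(aligned: np.ndarray, blank_template: np.ndarray,
                   field_box: tuple) -> str:
    """Remove preprinted form structure, then OCR one field region."""
    # Dilate the blank template's ink slightly so its lines and captions
    # fully cover their slightly blurred counterparts on the scanned claim.
    template_ink = cv2.threshold(blank_template, 0, 255,
                                 cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    template_ink = cv2.dilate(template_ink, np.ones((3, 3), np.uint8))

    # Wherever the template has ink, paint the claim white: only the
    # filled-in data survives, much like a drop-out form.
    scrubbed = aligned.copy()
    scrubbed[template_ink > 0] = 255

    # field_box is an assumed (x, y, width, height) from the template.
    x, y, w, h = field_box
    field = scrubbed[y:y + h, x:x + w]
    return pytesseract.image_to_string(field, config="--psm 7").strip()
```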
While each individual technique described here can be implemented on its own for a significant performance improvement over traditional OCR, applying all three optimization techniques together achieves a level of performance close to that of drop-out forms. Automation can go from between 10 and 20 percent all the way to 85 percent, an improvement of several hundred percent. That equates to significant cost savings while simultaneously improving data quality.
Parascript Virtual Drop-out for Claims
Parascript offers virtual drop-out for claims processing that has set a new standard for classification and recognition of black-and-white claims, overcoming image quality and scaling challenges. Black-and-white forms are converted to simulate drop-out ink forms; images are analyzed, and those that vary in size are reformatted to conform to expected layouts. Mobile images are transformed into high-quality scans, even without specialized mobile apps. Parascript deep learning algorithms improve out-of-the-box accuracy to the industry’s highest level. If you’d like to find out more, go to Parascript Claims Data Extraction or watch this video.