Dark Data. It’s big and elusive, and still a new concept. Sounds a bit like Big Data a few years ago. This article will help define dark data, what it means to the document capture space and why it presents such critically important opportunities.
What is Dark Data?
Dark Data is a pretty mysterious term. In fact, we just received hundreds of poll responses from members of ARMA International. The answers were evenly distributed among 3 different definitions and a fill-in-the-blank, with a slight lead going to “it’s just a buzzword for data we don’t use.”
Obviously there’s no agreement, so let’s see what Gartner says:
“…the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”
That’s a better start, but for those of us who are more visual, I think Anne Tulek of Access Sciences did a really nice job of putting it to a chart in an AIIM Webinar, Capturing the Value of Your Information and Data, a few weeks ago.
As Anne shows in this chart, there is data we have (left) and data we don’t have (right). Of that, there’s only a small portion of each that is actually currently being understood or utilized, or that we could go get if we are willing to take on the effort. The rest of the data that we have but don’t use (left) is dark.
What is Dark Data as it Relates to Document Capture?
While email may be one of the largest repositories of dark data (especially as it pertains to legal discovery needs), the document capture process also contains a lot of unused information, and much of it has value because the capture process is very often attached to a transaction, making it critically important to a business.
In recent research conducted with AIIM, we found that about 50% of responding organizations are using text recognition (the fastest way to unlock data and make use of it), while 25% are just scanning to archive and another 25% are manually re-keying everything. Because of the time and expense involved in manual keying, there is likely a lot of information being skipped during the keying process for efficiency’s sake.
This identifies a few big opportunities for getting at dark data during the document capture process. Foremost, for the 50% not using text recognition, there is an opportunity to convert and index text using basic OCR. For those that already use OCR, there is a lot of valuable information being left on the page that can be addressed by some ICR engines. These include:
- Signatures – usually required to authorize a transaction or a document, a signature (and its very presence) is data that is most often not reviewed or validated in a capture process
- Annotations – ranging from subtle, hard-to-detect strikeouts to full-blown commentary, annotations are critical datapoints on legal documents
- Keywords – often scribbled in the margins or in a comments section, keywords contain valuable information including customer sentiment or special service requests (such as change-of-address)
How can I use the Dark Data in My Capture Process?
Here’s a few suggestions for leveraging this data. Typically these suggestions follow two paths – supporting information governance, compliance, and risk management, or helping you make better operating processes and decisions.
- First, if you’re among the 50% not using whole page OCR in your incoming capture process, make the modest investment in running OCR, as it will yield a keyword universe that will support automatic classification, indexing and retrieval. This will make archival more efficient, and provide for easier discovery
- Second, utilize the signatures on your documents. If you’re processing transaction orders, or claims, validate that a signature even exists. If it was overlooked, send the document to exception handling so customer service can be proactive in follow-up. Also validate that the signature is authentic (signature verification software, an extension of ICR capabilities, can do this very accurately)
- Third, flag the presence of annotations. This in itself could kick off an exception handling process (annotations may indicate that this is a version of a contract and not the final). And use the software as a second set of eyes to find small annotations that may not be noticed in a cursory glance by a human
- Finally, identify keywords for tagging, routing and classification. This has the potential to greatly augment archival, process management and discovery. For example, sentiment words in comment functions (love, hate) are opportunities for customer service to engage the customer for a case study or to fix a brewing product problem
Regardless of the decision you make, be aware that there is information at your disposal that can help your business operate better with less risk.
Want to learn more? Download a copy of our recent AIIM research: “Shedding Light on the Dark Data in Your Document Capture Process”.