The importance of metadata cannot be overemphasized, as John Horodyski's most recent contribution to CMS Wire shows. Horodyski, of Optimity Advisors, makes a strong case that metadata is a critical first step in any project that involves managing content. By some estimates, workers waste more than 40 percent of their time searching for existing assets. To make matters worse, businesses operate in a changing environment. To manage content effectively, the metadata and the taxonomies that govern it must also be allowed to evolve to meet new needs.
Metadata represents a "snapshot" in time, as Horodyski points out. Managing metadata is an ongoing "journey" to keep it as dynamic as the organization's business needs. In the past, search tools were expected to solve the challenges of managing information content. Unfortunately, relying on traditional search tools has proven to be an ineffective strategy: "dumping data in a bucket" and letting search be the answer has failed to meet business needs. With proper taxonomies and good descriptive metadata, businesses can support functional, effective document organization and search. The need to evolve that metadata over time adds an extra layer of complexity to an already challenging problem.
For organizations that have gone to great lengths to create a good content taxonomy, the last thing anyone wants is to adopt a data management system that backs them into a corner with no exit strategy. Both your taxonomy and your metadata systems must evolve with organizational needs. That makes sense, but here's a note of caution: just because a "living taxonomy" has been created doesn't mean the underlying systems can accommodate it.
In fact, here is a real-world scenario we see all too frequently. Ace Company adopts an enterprise-wide document system that implements a solid information architecture (IA) based on evaluations of existing documents, their uses, and how best to describe and organize them. The program also includes annual reviews of document use, user feedback on the efficiency of the system, and any need to add new document categories or metadata. During the first review, the company finds that a certain category of documents requires two new metadata tags to support a new business line. Adding the new metadata fields to the IA and to the document control and retrieval system is no problem. And yet, what about populating the new fields with meaningful data from the documents themselves? How does the company manage this task efficiently?
Ace Company also finds that it needs to add a new category of documents for this same business line. Again, adding the new document type to the company's IA and document system is straightforward. And yet, how does the company identify which documents in its existing repository belong to this new type?
These are tasks tailor-made for document classification and data extraction. While most companies view these capabilities as a "front-end" need, they are just as necessary for maintaining and evolving the information architecture, to ensure that what is built today will satisfy tomorrow's business needs.
At Parascript, we're interested in this problem because it is a real need that is more complex than most organizations realize, and solving complex problems is what we do. Stay tuned: the next article will walk through a more detailed example of how to address this problem using classification.
If you'd like to find out more about the Parascript software platform, contact us.