Here we cover how to address the problems posed by a legacy system whose metadata is inadequate in both detail and coverage. Existing document types lack the metadata needed to support efficient governance and use, and new document types need to be created.
Unless an organization is familiar with the latest technologies and how to implement them, the typical way to solve the problem is brute force: hire new staff or use existing staff to manually review each document, assign it to a category, catalog it, and then update the new document control system. The cost of this path is often too high, whether in workload or in budget, so in reality the documents are migrated into the new system as-is. The organization lives with the consequences and hopes to straighten things out in a future project.
The good news is that there is proven software that can automate this task at a fraction of the labor and time of manual review and tagging. Here’s a step-by-step process for running a technology-driven classification and tagging project.
Step One: Take an inventory of your document assets. As stated earlier, one way is the brute force method: have staff manually review every file. Alternatively, software can group documents by likeness in much the same way a human would. This capability is called document clustering, and it can group by visual elements, by content, or by both, which dramatically reduces the amount of work required. You import the set of documents you want grouped, and the software sifts through them, evaluates each document, and places it into a cluster.
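To make the idea of content-based clustering concrete, here is a minimal sketch. It is not how any particular product works: it assumes plain-text documents, uses word overlap (Jaccard similarity) as the measure of likeness, and the function names, file names, and the 0.3 threshold are all hypothetical choices for illustration.

```python
import re


def tokens(text):
    """Lowercased word set used as a crude content fingerprint."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Overlap of two word sets, from 0.0 (disjoint) to 1.0 (identical)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering (an assumed, simplified technique).

    Each document joins the first cluster whose seed document is similar
    enough; otherwise it starts a new cluster of its own.
    """
    clusters = []  # list of (seed_tokens, [document names])
    for name, text in docs.items():
        doc_tokens = tokens(text)
        for seed_tokens, members in clusters:
            if jaccard(seed_tokens, doc_tokens) >= threshold:
                members.append(name)
                break
        else:
            clusters.append((doc_tokens, [name]))
    return [members for _, members in clusters]


# Hypothetical sample inventory
docs = {
    "a.txt": "Invoice number 1001 total amount due payment terms",
    "b.txt": "Invoice number 1002 total amount due payment terms net 30",
    "c.txt": "Employment agreement between employer and employee salary",
    "d.txt": "Employment agreement between employer and employee start date",
}
clusters = cluster_documents(docs)
print(clusters)
```

With this sample inventory the two invoices land in one cluster and the two employment agreements in another. Real clustering software typically uses richer signals, such as page layout, but the workflow is the same: import the documents, then review the groups that come out.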
Step Two: Identify your document types. For this, you will want staff familiar with the types of documents used in the business and their names; we’ll call these folks subject matter experts, or SMEs. Again, you can manually assign documents to their respective types, or you can take a small subset of each type (identified by the SMEs, of course) and use it to create rules automatically with classification software. An automated approach lets you forgo the time-consuming task of analyzing each type of document, and its variants, to discover usable rules. Instead, the software itself can accomplish this task with a good deal of accuracy. You locate a handful of each type, taking care to include variants where possible, and import them into the system. From there the system analyzes the samples and creates the rules. You can then review the accuracy of the results and make adjustments by moving documents from one class to another or by assigning a document an entirely new class. If you used clustering software to obtain groups in Step One, you can use those groups as your samples, which makes for a very efficient classification process.
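As a rough illustration of rules being derived from a handful of SME-labeled samples, the sketch below builds a keyword list per document type and classifies new text by keyword overlap. This is a deliberately simple stand-in for what classification software does; the sample texts, type names, and function names are assumptions made for the example.

```python
import re


def tokens(text):
    """Distinct lowercased words in a document."""
    return set(re.findall(r"[a-z]+", text.lower()))


def learn_rules(samples):
    """Derive one keyword set per document type from labeled samples.

    Words that also appear in another type's samples are dropped, so each
    rule keeps only the terms distinctive for its own type.
    """
    vocab = {
        label: set().union(*(tokens(t) for t in texts))
        for label, texts in samples.items()
    }
    rules = {}
    for label, words in vocab.items():
        others = set().union(*(w for other, w in vocab.items() if other != label))
        rules[label] = words - others
    return rules


def classify(text, rules):
    """Return (label, score); score is the share of the document's words
    that match the winning type. Returns (None, 0.0) if nothing matches."""
    words = tokens(text)
    if not words:
        return None, 0.0
    scores = {label: len(words & kw) / len(words) for label, kw in rules.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None, 0.0
    return best, scores[best]


# Hypothetical samples chosen by SMEs, two per type
samples = {
    "invoice": ["Invoice number 1001 total amount due",
                "Invoice total due upon receipt"],
    "contract": ["This agreement is made between the parties",
                 "Agreement term and termination between parties"],
}
rules = learn_rules(samples)
label, score = classify("Invoice number 2044 amount due in 30 days", rules)
print(label, round(score, 2))
```

A low or zero score is the signal to send a document back to an SME, who can move it to the right class or define a new one; adding that document to the samples and re-running learn_rules is the adjustment loop described above.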
Step Three: Identify your metadata. Once you are confident you have a comprehensive view of your document types, the next step is to figure out what key data about each document type will be useful for management. It’s important to think beyond activities such as search and retrieval and to include governance-related data as well: whether the type belongs to a record class, whether the document type contains any sensitive data, and the dates associated with the document’s creation. Once you have your types and metadata, you have the basis of a true document taxonomy.
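One way to write such a taxonomy down is as a small data structure that records, for each document type, its record class, whether it carries sensitive data, and which fields serve retrieval versus governance. The sketch below is only an example layout; every type name, field name, and record class in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class MetadataField:
    name: str
    purpose: str  # "retrieval" or "governance"


@dataclass
class DocumentType:
    name: str
    record_class: Optional[str]    # None if the type is not a record
    contains_sensitive_data: bool
    fields: List[MetadataField] = field(default_factory=list)

    def fields_for(self, purpose: str) -> List[str]:
        """Names of the fields that serve the given purpose."""
        return [f.name for f in self.fields if f.purpose == purpose]


# Hypothetical taxonomy with two document types
taxonomy = {
    "invoice": DocumentType(
        name="invoice",
        record_class="financial",
        contains_sensitive_data=False,
        fields=[
            MetadataField("invoice_number", "retrieval"),
            MetadataField("vendor_name", "retrieval"),
            MetadataField("creation_date", "governance"),
        ],
    ),
    "employment_agreement": DocumentType(
        name="employment_agreement",
        record_class="personnel",
        contains_sensitive_data=True,
        fields=[
            MetadataField("employee_name", "retrieval"),
            MetadataField("creation_date", "governance"),
        ],
    ),
}

print(taxonomy["invoice"].fields_for("governance"))
```

Separating retrieval fields from governance fields makes it easy to check that no document type was defined with search in mind only.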
Step Four: Implement metadata rules. Now that you have your document types and the necessary metadata, you can create metadata rules for each document type so that when a document is automatically assigned a class, data can be either extracted from it or applied to it. Once again, software that handles this task makes it a lot easier. Essentially, each document class drives the metadata extraction and assignment process, so you only need to establish the rules once per class. From there, documents automatically get their class and metadata assigned.
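The distinction between extracted and applied metadata can be sketched as a per-class rule table: "extract" rules pull values out of the document text, while "assign" rules stamp fixed values on every document of that class. The regular expressions, field names, and sample text below are illustrative assumptions, not patterns from any real system.

```python
import re

# Hypothetical rules, defined once per document class
METADATA_RULES = {
    "invoice": {
        "extract": {
            "invoice_number": r"Invoice\s+(?:No\.?|Number)\s*:?\s*(\w+)",
            "invoice_date": r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})",
        },
        "assign": {"record_class": "financial", "contains_sensitive_data": False},
    },
}


def apply_metadata_rules(doc_class, text, rules=METADATA_RULES):
    """Return the metadata for one document of the given class.

    Assigned values come straight from the class; extracted values come
    from the first regex match, or None when the pattern is not found.
    """
    class_rules = rules[doc_class]
    metadata = dict(class_rules["assign"])
    for field_name, pattern in class_rules["extract"].items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        metadata[field_name] = match.group(1) if match else None
    return metadata


sample_text = "Invoice Number: INV1001\nDate: 2023-04-05\nTotal due: 100"
metadata = apply_metadata_rules("invoice", sample_text)
print(metadata)
```

A field that comes back as None is a useful flag for review, since it means the rule did not find what it expected on that document.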
Step Five: Start the classification and metadata assignment process. If you don’t have software that can do this, it’s a load of work, both initially and on an ongoing basis. But just as you had software for clustering, classification, and metadata rules, you have tools to automate this process too. Start importing your full volume of documents, and the system will go to work assigning documents to their respective classes, extracting and assigning metadata, and then exporting the results to your destination of choice. Documents that are hard to classify are automatically routed through a quality control workflow, where a staff member reviews each one and assigns its class. Once assigned, the document goes through the metadata assignment process and then a final quality review. Automating the quality control process is itself a time saver, because all documents go through the same process and you end up with a complete audit record.
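The flow described in this step (classify, route low-confidence documents to review, assign metadata, export, and log everything) might be sketched as follows. The classifier and extractor here are toy stand-ins passed in as functions, and the 0.5 confidence threshold, file names, and result structure are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineResult:
    exported: list = field(default_factory=list)      # ready for the target system
    review_queue: list = field(default_factory=list)  # needs a staff decision
    audit_log: list = field(default_factory=list)     # one entry per document


def run_pipeline(docs, classify, extract_metadata, min_confidence=0.5):
    """Classify each document, then export it or route it to review."""
    result = PipelineResult()
    for name, text in docs.items():
        label, confidence = classify(text)
        if label is None or confidence < min_confidence:
            result.review_queue.append(name)
            result.audit_log.append((name, "routed_to_review", confidence))
            continue
        metadata = extract_metadata(label, text)
        result.exported.append({"name": name, "class": label, "metadata": metadata})
        result.audit_log.append((name, "auto_classified", confidence))
    return result


# Toy stand-ins for the classification and metadata steps (hypothetical)
def toy_classify(text):
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice", 0.9
    if "agreement" in lowered:
        return "contract", 0.8
    return None, 0.0


def toy_extract(label, text):
    return {"record_class": "financial" if label == "invoice" else "legal"}


docs = {
    "inv-1.txt": "Invoice number 1001 total due",
    "msa.txt": "This agreement is made between the parties",
    "memo.txt": "Reminder: the office is closed on Friday",
}
result = run_pipeline(docs, toy_classify, toy_extract)
print(result.review_queue)
```

Because every document produces an audit entry whether it is exported or sent to review, the log doubles as the complete record of how each class was assigned.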
Following these steps with software and automation will cut down on the cost, time, and complexity of any document classification and metadata project, and set you up for efficient information governance.