Faster, better, more accurate. It seems as though the typical response to the need to improve speed, quality, or accuracy is to introduce technology into the mix. Does that mean humans don’t need to be involved?
Autonomous Machine Learning
Whether it’s artificial intelligence, machine learning, deep learning, or a mix, the notion that computers and software can take over activities that typically require a human has been popular for decades and, if we broaden the scope to mechanical machines, for centuries.
Increasingly, research institutions and businesses are bringing advanced technology into the mainstream with the intent to supplement, or even eliminate, human-based work. Take, for instance, IBM’s Watson Services. Earlier this year, it added several services designed to reduce the need for humans to classify and tag images and documents. Companies such as Narrative Science use advanced machine-based content production to relegate journalists, and now business analysts, to more complex topics.
This progression toward more autonomous machine learning and artificial intelligence has both detractors and supporters. Futurists are sounding the alarm over everything from AI used in weapons to economic disruption, while supporters look in awe at what humanity might achieve.
Deep Learning: Applied Uses
How does deep learning relate to Parascript and what we’re doing? Our staff has been working with advanced machine vision and machine learning technologies for over 20 years, applying them to very practical needs in banking and government. Our hallmark is locating handwritten or printed information and interpreting it into transactional or actionable data. Recently, we have turned our attention to the problem of efficiently and accurately identifying documents and then describing them. This problem affects every organization and is a core issue for customer service, operational effectiveness, and information governance.
A recent article in Xconomy places the machines-versus-humans debate in the context of managing the explosion of information. The author compares it to the techniques used to curate music playlists, an analog to identifying documents and organizing them into similar groupings. In both cases, you must be able to accurately describe the item in question, whether a music file or a document, and then relate it to other similar items. While the initial fervor centered on algorithms that automate this process, their accuracy has increasingly come into question, resulting in a preference for a symbiotic blend of human curation and computer-based analysis. As a result, humans can assemble curated playlists and associate one artist’s style with another far more efficiently, without spending an inordinate amount of time reviewing every song.
Have Our Cake and Eat It
We want to have our cake and eat it too. With document classification and curation, we don’t want to review documents and tag them with metadata, yet we want all the benefits of accurate search results. While there is no silver bullet, there is a silver lining, and it’s called assistive technology. Put simply, machine learning and artificial intelligence are not yet at the point where we can enjoy document classification that is both highly automated and highly accurate. AI does very well at well-defined tasks that have a defined answer, but where transposition is required, humans still do a superior job. The answer is to intelligently blend computer-based heuristics with human-based workflows.
Computer-based Heuristics & Human-based Workflows
With regard to automated machine learning for classifying documents, the best and most practical approach is to employ both computers and humans in the process. Software that lets humans automate the creation of classification rules through machine learning makes the process more efficient. You don’t have to spend countless hours investigating document properties and devising or updating rules; the machine does it for you, and you can test the results. Software that shepherds a classification review process makes humans more efficient as well. Only documents that evade classification get reviewed, and an auditing process can be established to tightly control quality. We supervise the machines, not the other way around.
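The review workflow described above can be sketched as a simple confidence-threshold router. This is a minimal illustration under stated assumptions, not Parascript’s implementation: the classifier, the 0.9 confidence threshold, the 5% audit rate, and the names route_documents and toy_classifier are all hypothetical. Documents the machine cannot classify confidently go to a human review queue, and a random sample of the machine’s accepted results is set aside for human audit.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical classifier signature: takes document text, returns (label, confidence).
Classifier = Callable[[str], Tuple[str, float]]


@dataclass
class RoutingResult:
    auto_classified: List[Tuple[str, str]] = field(default_factory=list)  # (doc, label) accepted without review
    needs_review: List[str] = field(default_factory=list)                 # docs a human must classify
    audit_sample: List[Tuple[str, str]] = field(default_factory=list)     # accepted docs spot-checked by a human


def route_documents(docs: List[str], classify: Classifier,
                    threshold: float = 0.9, audit_rate: float = 0.05,
                    seed: int = 0) -> RoutingResult:
    """Send low-confidence documents to human review; sample accepted ones for audit."""
    rng = random.Random(seed)  # seeded so the audit sample is reproducible
    result = RoutingResult()
    for doc in docs:
        label, confidence = classify(doc)
        if confidence < threshold:
            # The machine could not classify this document confidently: a human decides.
            result.needs_review.append(doc)
        else:
            result.auto_classified.append((doc, label))
            # Randomly audit a fraction of accepted results to monitor quality.
            if rng.random() < audit_rate:
                result.audit_sample.append((doc, label))
    return result


def toy_classifier(text: str) -> Tuple[str, float]:
    """Hypothetical stand-in for a learned model: keyword match with made-up confidences."""
    lowered = text.lower()
    if "invoice" in lowered:
        return "invoice", 0.97
    if "contract" in lowered:
        return "contract", 0.93
    return "unknown", 0.40


if __name__ == "__main__":
    docs = ["Invoice #1001", "Service contract", "Handwritten note"]
    routed = route_documents(docs, toy_classifier, audit_rate=1.0)
    print(routed.auto_classified)  # [('Invoice #1001', 'invoice'), ('Service contract', 'contract')]
    print(routed.needs_review)     # ['Handwritten note']
```

Raising the threshold sends more documents to people and lowers the risk of silent misclassification, at the cost of more review work; the audit sample gives reviewers a quality check on documents the machine accepted on its own.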
We’re also interested in more advanced technologies, such as deep learning, that can provide further efficiencies. But nothing we have seen yet will remove us from the equation. When planning document classification or data extraction projects, design the processes around the staff, and treat technology as a welcome assistant, not the boss.