Ontology at the Core of Intelligent Capture
Another day, another buzzword. This time, there is an increased use of the word ontology in the intelligent capture solution domain. Sounds cool. But what does it mean? Does it really change things for the better?
Let’s first tackle the definition of the word. According to the Oxford English Dictionary, ontology has two definitions, the second of which is “a set of concepts and categories in a subject area or domain that shows their properties and the relations between them.”
To apply this to documents, an ontology can simply be a definition of a set of documents by document type, together with the nature of the related data within each document type. So far that doesn’t sound very sophisticated, does it? It shouldn’t.
Most intelligent capture systems are based on identifying document types and their constituent data in a similar relational way. For example, an invoice has a document title, “Invoice,” along with other data such as “Invoice Number” and “Invoice Total.” These two data elements, together with the document title and document type, form an ontology. The reality is that ontological principles lie at the heart of intelligent capture. Use of ontologies is not a new feature of intelligent capture. So it is correct to use the word “ontology” in relation to intelligent capture, but it is misleading to present it as something new.
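To make the invoice example concrete, here is a minimal Python sketch of that ontology written as a plain data structure. The class name DocumentType, the relation labels has_title and has_field, and the helper relations are illustrative assumptions, not the API of any capture product.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DocumentType:
    """A document type plus the data elements related to it (hypothetical model)."""
    name: str
    title: str
    fields: Tuple[str, ...]


# The invoice example from the text: a title and two data elements.
INVOICE = DocumentType(
    name="invoice",
    title="Invoice",
    fields=("Invoice Number", "Invoice Total"),
)


def relations(doc_type: DocumentType) -> List[Tuple[str, str, str]]:
    """List the ontology as (subject, relation, object) triples."""
    triples = [(doc_type.name, "has_title", doc_type.title)]
    triples += [(doc_type.name, "has_field", f) for f in doc_type.fields]
    return triples


if __name__ == "__main__":
    for triple in relations(INVOICE):
        print(triple)
```

Nothing here involves machine learning. The "ontology" is just a document type and the data related to it, which is the point of the paragraph above.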
Classifying Documents Using a Domain Ontology
I recently came across the word in the description of a competing product, which stated, “It classifies documents using a domain ontology.” That sounds fairly complex and powerful, so it must use machine learning, most likely deep learning, right? Not necessarily.
Again, an ontology is simply a set of relationships between data and a particular document. In the case of document classification, the relationship can be as simple as the specific words or phrases that are most likely to appear in a particular document and that reliably help to identify it. Using the invoice example again, the word “invoice” is a highly reliable indicator that can be used to classify a document. So the invoice document type, along with the keyword “invoice,” forms an ontology.
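As an illustration of how little machinery this can involve, the following Python sketch classifies text by counting keyword hits per document type. The KEYWORDS table, the purchase_order entry, and the classify function are hypothetical and chosen for the example. This is not how any particular product works.

```python
import re
from typing import Optional

# Hypothetical "domain ontology": each document type and the phrases tied to it.
KEYWORDS = {
    "invoice": ["invoice", "invoice number", "invoice total"],
    "purchase_order": ["purchase order", "po number"],
}


def classify(text: str) -> Optional[str]:
    """Return the document type whose phrases match most often, or None."""
    lowered = text.lower()
    scores = {}
    for doc_type, phrases in KEYWORDS.items():
        # Count how many of this type's phrases occur as whole words.
        scores[doc_type] = sum(
            1
            for phrase in phrases
            if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
        )
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(classify("INVOICE\nInvoice Number: 1001\nInvoice Total: $250.00"))
```

A lookup table and a substring search are enough to claim, accurately, that documents are classified "using a domain ontology."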
Natural Language Processing and Ontologies
Lastly, some vendors might nestle Natural Language Processing (NLP) alongside ontology, probably in hopes of making the reader believe that the solution contains a significant amount of artificial intelligence. And yet, an ontology does not necessarily involve NLP. Depending on how NLP is defined, it could mean the simple act of searching for a list of verbs.
On the flip side, true NLP relies on ontologies. In this case, the ontology exists as the relationships among words, phrases and sentences, organized using knowledge about a specific written language. With true machine learning, the language context supplied by NLP tools such as phrase chunkers and part-of-speech taggers is used as input data by machine learning algorithms to identify the key information needed within a document.
So go ahead and feel good about using the word “ontology” when you create your document classification and data extraction configurations. You just might impress someone! But don’t be fooled by others doing the same.