In basic terms, OCR software examines a scanned image and translates the text within it into an editable file. The first OCR systems could recognize text set in only a single font and size. Today’s more ambitious programs attempt to reproduce not only the fonts but also complex layout features, such as columns, tables, headers and footers, and even graphics.
OCR Software Reads Text One Character at a Time
There are many different types of pattern recognition schemes, and each OCR package uses its own set of models and implements them in its own way. One thing is true of all of them, however: OCR software reads text one character at a time.
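To make "one character at a time" concrete, here is a minimal sketch of one of the simplest recognition schemes, template matching. It is not the method of any particular OCR product. It assumes each glyph has already been isolated as a small binary bitmap (a tuple of strings, with `#` for ink), and the 3x5 templates and the names `TEMPLATES`, `recognize_character`, and `read_text` are hypothetical, chosen only for illustration.

```python
# Template matching: store one small binary bitmap per known character and
# assign an unknown glyph to the template it agrees with on the most pixels.
# The 3x5 templates are hypothetical examples drawn by hand for this sketch.
TEMPLATES = {
    "I": ("###",
          ".#.",
          ".#.",
          ".#.",
          "###"),
    "L": ("#..",
          "#..",
          "#..",
          "#..",
          "###"),
    "T": ("###",
          ".#.",
          ".#.",
          ".#.",
          ".#."),
}


def similarity(a, b):
    """Count the pixels on which two bitmaps of the same size agree."""
    return sum(pa == pb
               for row_a, row_b in zip(a, b)
               for pa, pb in zip(row_a, row_b))


def recognize_character(glyph):
    """Return the label of the template most similar to a single glyph."""
    return max(TEMPLATES, key=lambda label: similarity(glyph, TEMPLATES[label]))


def read_text(glyphs):
    """Recognize a sequence of already-isolated glyphs one character at a time."""
    return "".join(recognize_character(g) for g in glyphs)


# An "L" with one stray pixel still matches the L template best (14 of 15 pixels).
noisy_l = ("#..",
           "#..",
           "#.#",
           "#..",
           "###")
print(read_text([TEMPLATES["T"], TEMPLATES["I"], noisy_l]))  # TIL
```

Each glyph is classified independently of its neighbours, which is exactly why a separate contextual stage is useful for catching mistakes. Real systems use far richer features and trained classifiers rather than raw pixel agreement.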
Components of OCR Software
Typical OCR systems include three components: an image scanner, OCR software and hardware, and an output interface. The image scanner optically captures the text images to be recognized. These images are then processed by the OCR software and hardware. The process involves three operations: image analysis (extracting individual character images from a document), image recognition (recognizing these images based on their shape), and contextual image processing (either to correct misclassifications made by the recognition algorithm or to limit recognition choices). The output interface passes the OCR results to the outside world.
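The third operation, contextual processing, is the least obvious of the three, so the sketch below illustrates only that step. It assumes, hypothetically, that the recognizer returns a short list of `(character, confidence)` candidates for each character position, and that a lexicon of valid words is available. The function names and the sample data are made up for illustration; real systems use language models or dictionaries far more efficiently than this brute-force search.

```python
from itertools import product


def top_choice(candidates):
    """Take the highest-confidence candidate at every position, ignoring context."""
    return "".join(max(options, key=lambda o: o[1])[0] for options in candidates)


def contextual_correct(candidates, lexicon):
    """Pick the lexicon word whose letters the recognizer scored highest overall.

    `candidates` holds, for each character position, a list of
    (character, confidence) pairs from the shape-based recognizer. Every
    combination is tried, which is only feasible for short words.
    """
    best_word, best_score = None, 0.0
    for combo in product(*candidates):
        word = "".join(ch for ch, _ in combo)
        if word not in lexicon:
            continue  # context rules this reading out
        score = 1.0
        for _, confidence in combo:
            score *= confidence
        if score > best_score:
            best_word, best_score = word, score
    # Fall back to the raw recognizer output when no reading is in the lexicon.
    return best_word if best_word is not None else top_choice(candidates)


# Hypothetical recognizer output for the printed word "HELLO", in which the
# shape classifier slightly prefers "1" over "L" and "0" over "O".
candidates = [
    [("H", 0.9)],
    [("E", 0.8), ("F", 0.2)],
    [("L", 0.9), ("1", 0.1)],
    [("1", 0.6), ("L", 0.4)],
    [("0", 0.55), ("O", 0.45)],
]
print(top_choice(candidates))                     # HEL10
print(contextual_correct(candidates, {"HELLO"}))  # HELLO
```

The same lexicon plays both roles mentioned above: it corrects the misread "1" and "0", and it limits the recognition choices to readings that actually occur in the language.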
Categories of Commercial OCR Software
There are two main categories of commercial OCR software: task-specific software and general-purpose page reading software. Task-specific OCR software deals only with particular document types and fields, such as machine-printed fields on bank checks, letter mail, or credit-card slips. This type of OCR software uses scanning hardware that captures only a few predefined document regions. For example, a bank-check reader may scan just the courtesy-amount field (where the amount of the check is written numerically), and a postal OCR system may scan just the machine-printed address block on a mail piece.
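Because a task-specific reader knows its form layout in advance, it can ignore everything except the fields it cares about, and it can also restrict each field to the characters that may legally appear there. The sketch below assumes a page stored as a list of strings (with `#` for ink) and a hypothetical check layout; the field name, the coordinates, the `Field` class, and the character set are all invented for illustration and do not describe any real check standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    """A predefined region of a fixed form, plus the characters it may contain."""
    top: int
    left: int
    height: int
    width: int
    charset: str


# Hypothetical layout: the coordinates and field name are made up for this sketch.
CHECK_LAYOUT = {
    "courtesy_amount": Field(top=1, left=10, height=2, width=6,
                             charset="0123456789.,$"),
}


def crop(image, field):
    """Cut one field's rectangle out of a page stored as a list of strings."""
    return [row[field.left:field.left + field.width]
            for row in image[field.top:field.top + field.height]]


def extract_fields(image, layout):
    """Return only the predefined regions; the rest of the page is ignored."""
    return {name: crop(image, field) for name, field in layout.items()}


def restrict(candidates, charset):
    """Drop recognizer candidates that cannot occur in this field."""
    return [(ch, conf) for ch, conf in candidates if ch in charset]


page = [
    "................",
    "..........#.##..",
    "..........#.##..",
    "................",
]
print(extract_fields(page, CHECK_LAYOUT))
# {'courtesy_amount': ['#.##..', '#.##..']}
print(restrict([("O", 0.6), ("0", 0.4)], CHECK_LAYOUT["courtesy_amount"].charset))
# [('0', 0.4)]
```

In a numeric field the letter "O" is ruled out even though the recognizer scored it higher, which is one reason task-specific readers can be more accurate than general-purpose ones on their narrow job.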
General-purpose page reading OCR software is designed to handle a broader range of machine-printed documents, in particular unstructured documents such as business letters, technical writings, and newspapers. These systems capture an image of a document page and separate the page into text and non-text zones. Text zones are segmented into lines, words, and characters, and the characters are passed to the recognizer for one-by-one recognition. Non-text zones such as graphics and line drawings are often saved separately from the text and associated recognition results. Recognition results are output in a format that can be later processed by application software.
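The segmentation of a text zone into lines, words, and characters can be sketched with projection profiles, a common simple technique: count the ink in each row to find the blank rows between lines, then count the ink in each column of a line to find the blank columns between characters. This is a minimal sketch under strong assumptions: clean, upright, binary text in which characters never touch, with `#` marking ink. The word-gap threshold `min_word_gap=2` and all function names are hypothetical; real page readers must also cope with skew, touching or broken characters, and proportional spacing.

```python
def is_ink(pixel):
    """Treat '#' as a dark (ink) pixel and anything else as background."""
    return pixel == "#"


def runs(profile):
    """Return (start, end) pairs, end exclusive, of consecutive nonzero entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def segment_lines(image):
    """Split a text zone into lines using its horizontal projection profile."""
    row_profile = [sum(is_ink(p) for p in row) for row in image]
    return [image[start:end] for start, end in runs(row_profile)]


def segment_line(line, min_word_gap=2):
    """Split one line into words, and each word into character bitmaps.

    Characters are separated by blank columns; a gap of at least
    `min_word_gap` blank columns (a hypothetical threshold) starts a new word.
    """
    col_profile = [sum(is_ink(row[c]) for row in line) for c in range(len(line[0]))]
    words, previous_end = [], None
    for start, end in runs(col_profile):
        if previous_end is None or start - previous_end >= min_word_gap:
            words.append([])  # a wide gap begins a new word
        words[-1].append([row[start:end] for row in line])
        previous_end = end
    return words


zone = [
    "#.#...##",
    "#.#...#.",
    "........",
    "###.#...",
]
for line in segment_lines(zone):
    print([len(word) for word in segment_line(line)])
# [2, 1]
# [2]
```

Each character bitmap produced this way is what would then be handed to the recognizer for one-by-one recognition.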