When setting up your scanner or mobile device to feed a recognition engine, whether for machine print recognition (Optical Character Recognition – OCR) or handwriting recognition (Intelligent Character Recognition – ICR), is more DPI better?
When scanning images, there can be too much resolution or too little. Beyond a certain point, extra resolution exceeds what image analysis needs, while too little resolution produces poor-quality input for the analysis engine.
First, let’s define DPI. Dots Per Inch is a term from the print industry describing how many ink dots a printing system lays down per linear inch of paper. Printing is a layering process where ink is built up: typically 4 to 6 color passes which, combined, produce the visual effect the naked eye interprets on paper. Each color used in a print process has its own DPI, and together they add up to the printed density.
Recognition, by contrast, happens purely in the digital domain, where DPI does not strictly apply. The term has, however, taken on a different connotation in business document processing, where images are captured, presented to an OCR/ICR engine, and archived. Image resolution should technically be stated in Pixels Per Inch, or PPI. But given the legacy scan-save-print workflow, DPI has come to mean PPI, so DPI is the term most businesses and consumers use when capturing an image.
Now, since DPI more or less means PPI, how does that translate? Converting precisely between print-oriented and screen-oriented measures is difficult, with multiple variables affecting how input settings map to various output uses. In practice, we simply use DPI as a general reference for how many pixels are captured per linear inch, and therefore how good the image quality will be when the recognition engine processes it. Like ink dots, the number of pixels in a given space determines how good or bad the image will look.
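The pixels-per-inch idea above boils down to simple multiplication: physical size times DPI gives pixel dimensions. A minimal sketch (the function name `page_pixels` is illustrative, not from any particular scanning API):

```python
# Sketch: converting a physical page size to pixel dimensions at a given
# scan resolution. "DPI" is used here in the everyday sense described
# above, i.e. pixels per linear inch (PPI).

def page_pixels(width_in, height_in, dpi):
    """Return the (width, height) pixel dimensions of a scanned page."""
    return round(width_in * dpi), round(height_in * dpi)

# A US Letter page (8.5 x 11 inches) at common scan settings:
print(page_pixels(8.5, 11, 200))  # (1700, 2200)
print(page_pixels(8.5, 11, 300))  # (2550, 3300)
```

Doubling the DPI doubles each dimension, which quadruples the total pixel count, a point that matters for file size below.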
So, for document capture, what are the optimal scanner settings? Given that the image will only be used for recognition analysis and eventual archival storage, AND that it will most likely be converted to a binary image for the analysis phase, any image scanned between 200 and 300 dpi is optimal. Below 200 dpi you risk jagged strokes, missing data, and poorly defined letter edges. Above 300 dpi, image files grow large and become cumbersome to transport and store.
Is more DPI good for recognition? Not necessarily. A recognition engine can only process so much data; when there is too much, it must discard the excess for the sake of speed, performance, and optimization, and in some cases the extra data simply slows processing down. Higher scan settings should be avoided for most workflows unless the image will be resized to fit an expected page.
Setting scanners to capture at actual size AND between 200 and 300 dpi will satisfy most document management capture workflows. More is only better in specific workflows that require significant resizing from the original to the sample template image. Final answer? It’s best to be “just right” by setting the scanner between 200 and 300 dpi.
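The "just right" rule above can be captured in a trivial pre-flight check, useful when validating incoming scans before handing them to a recognition engine. This is an illustrative sketch (the function `dpi_advice` and its messages are my own, not part of any scanner SDK):

```python
# Sketch: classify a scan's DPI setting against the 200-300 dpi
# sweet spot recommended for OCR/ICR capture.

def dpi_advice(dpi, lo=200, hi=300):
    """Return a short verdict on a scanner DPI setting."""
    if dpi < lo:
        return "too low: expect jagged strokes and dropped detail"
    if dpi > hi:
        return "too high: larger files with no recognition benefit"
    return "just right"

print(dpi_advice(150))  # too low: expect jagged strokes and dropped detail
print(dpi_advice(240))  # just right
print(dpi_advice(600))  # too high: larger files with no recognition benefit
```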