WHAT-IS.NET
Information and answers to all your common and special questions.
OCR (Optical Character Recognition) is the process of turning a picture of words (such as a scan of a typed letter) into an editable document that you can open and use in your desktop publishing software, word processor, or other text editor. Today's OCR software packages support multiple languages, PDF and HTML output, and format retention.

Optical Character Recognition, or OCR, is a technology that enables you to convert different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.

Imagine you have a paper document, such as a magazine article or a brochure, or a PDF contract your partner sent you by email. A scanner alone is not enough to make this information available for editing, say in Microsoft Word. All a scanner can do is create an image, or snapshot, of the document that is nothing more than a collection of black-and-white or colour dots, known as a raster image.
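To make the idea of a raster image concrete, here is a minimal sketch in Python. The 5x5 grid and the names RASTER, render, and count_dark_dots are made up for illustration; a real scan has millions of dots and usually many brightness levels. The point is that the data holds only dots, with nothing that says "this is the letter T".

```python
# Hypothetical 5x5 black-and-white scan of the letter "T".
# 1 = dark dot, 0 = light dot. Nothing here says "T"; it is only a grid.
RASTER = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]

def render(image):
    """Draw the grid as text so a human can see the shape."""
    return "\n".join("".join("#" if dot else "." for dot in row) for row in image)

def count_dark_dots(image):
    """Count the dark dots, one of the few things a raster can tell us directly."""
    return sum(sum(row) for row in image)

print(render(RASTER))
print("dark dots:", count_dark_dots(RASTER))
```

A person looking at the rendered grid sees a letter at once, but a word processor sees only rows of numbers. Closing that gap is the job of OCR software.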
Copyright ©2009 What-is.Net  All rights reserved.
Last Updated: Sep 2009
What is OCR Software?
To extract and repurpose data from scanned documents, camera images, or image-only PDFs, you need OCR software that singles out the letters in the image, assembles them into words, and then assembles the words into sentences, so that you can access and edit the content of the original document.

The exact mechanisms that allow humans to recognize objects are yet to be understood, but three basic principles are already well known to scientists: integrity, purposefulness, and adaptability (IPA). These principles constitute the core of ABBYY FineReader OCR, allowing it to replicate natural, human-like recognition.

Let's take a look at how OCR software recognizes text. First, the program analyzes the structure of the document image. It divides the page into elements such as blocks of text, tables, and images. The lines are divided into words, and the words into characters. Once the characters have been singled out, the program compares them with a set of pattern images. It advances numerous hypotheses about what each character is. Based on these hypotheses, the program analyzes different variants of breaking lines into words and words into characters. After processing a huge number of such probabilistic hypotheses, the program finally makes its decision and presents you with the recognized text.
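The steps above can be sketched as a toy program. This is only an illustration under heavy assumptions, not how any real OCR product works. The three 3x5 pattern images, the test line, and the names split_into_characters, score, and recognize are all hypothetical. It tries a single way of cutting the line into characters (at blank columns), whereas real software weighs many alternative splits.

```python
# Toy illustration only. Hypothetical 3x5 pattern images ('#' = ink, '.' = blank).
PATTERNS = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def split_into_characters(line_image):
    """Cut a line image into character images at fully blank columns."""
    width = len(line_image[0])
    chars, start = [], None
    for x in range(width + 1):
        blank = x == width or all(row[x] == "." for row in line_image)
        if not blank and start is None:
            start = x                      # a character begins here
        elif blank and start is not None:
            chars.append([row[start:x] for row in line_image])
            start = None                   # the character ended
    return chars

def score(char_image, pattern):
    """Fraction of dots that agree between a character and a pattern image."""
    if len(char_image) != len(pattern) or len(char_image[0]) != len(pattern[0]):
        return 0.0
    total = len(pattern) * len(pattern[0])
    same = sum(a == b for r1, r2 in zip(char_image, pattern) for a, b in zip(r1, r2))
    return same / total

def recognize(line_image):
    """Rank the hypotheses for each character and keep the best one."""
    text, details = "", []
    for char_image in split_into_characters(line_image):
        hypotheses = sorted(
            ((score(char_image, p), c) for c, p in PATTERNS.items()), reverse=True
        )
        details.append(hypotheses)
        text += hypotheses[0][1]
    return text, details

# A line reading "TLI"; the last letter has one missing dot to imitate scan noise.
LINE = [
    "###.#...###",
    ".#..#....#.",
    ".#..#......",
    ".#..#....#.",
    ".#..###.###",
]

text, details = recognize(LINE)
print(text)
for hypotheses in details:
    print([(char, round(s, 2)) for s, char in hypotheses])
```

Even with a dot missing from the last letter, the "I" hypothesis still scores higher than "T" or "L", so the damaged character is recognized. Scoring several candidates instead of demanding an exact match is what lets the program cope with imperfect images.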

Images captured by a digital camera differ from scanned documents and image-only PDFs. They often have defects, such as distortion at the edges and dim lighting, that make it difficult for most OCR applications to recognize the text correctly. Some OCR software supports adaptive recognition technology designed specifically for processing camera images. Such software offers a range of features for improving the quality of these images, so that you can make full use of your digital devices.
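To see why dim or uneven lighting causes trouble, consider the step of deciding which dots are ink and which are paper. The sketch below uses local-mean adaptive thresholding, a common image clean-up technique. It is shown here only as an assumption about the kind of correction involved; it is not the adaptive recognition technology of any particular product. The brightness values and the function names are hypothetical.

```python
def global_threshold(gray, limit=128):
    """Mark a dot as ink (1) when it is darker than one fixed limit."""
    return [[1 if p < limit else 0 for p in row] for row in gray]

def adaptive_threshold(gray, radius=1, offset=10):
    """Mark a dot as ink (1) when it is clearly darker than its neighbourhood."""
    height, width = len(gray), len(gray[0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            window = [gray[j][i] for j in ys for i in xs]
            local_mean = sum(window) / len(window)
            row.append(1 if gray[y][x] < local_mean - offset else 0)
        result.append(row)
    return result

# Hypothetical camera image (0 = black, 255 = white): the light fades from left
# to right, and two vertical ink strokes sit in columns 1 and 6.
GRAY = [[220, 60, 180, 160, 140, 120, 20, 80] for _ in range(3)]

print(global_threshold(GRAY)[0])    # the dim paper on the right is mistaken for ink
print(adaptive_threshold(GRAY)[0])  # only the two strokes are marked as ink
```

The fixed limit treats the poorly lit paper on the right as ink, while the local comparison finds only the two real strokes. The radius and offset values here suit this tiny example and would need tuning for real photographs.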