
Getting Started with OCR

Optical character recognition (OCR) refers to the ability to detect text in an image. OCR is useful in many computer vision applications, such as image search, document analysis, and robot navigation. The images can be of any type of document, or of a scene that contains text (for example, license plates). Computer Vision Toolbox™ provides functionality to detect and recognize text in multiple languages and to train OCR models to recognize custom text. The ocr function is at the core of this functionality, performing both text detection and recognition in an image.

OCR to recognize text on a handicap sign.

Text Detection

The first step for OCR is to detect regions of text in an image. Computer Vision Toolbox includes these approaches.

Built-in Layout Analysis

The ocr function uses the Tesseract OCR engine to perform automatic text detection and recognition, which performs best when the text is located on a uniform background and formatted like a scanned document. When the text has a different layout, use the LayoutAnalysis name-value argument of the ocr function to specify the layout of the text in the image. You can set the layout to "auto", "page", "block", "line", "word", or "character". When the layout is "line", "word", or "character", it is best to also specify the locations of the text regions. For more details, see Specify Text Regions.
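
For example, this minimal sketch specifies that the image contains a single block of text. The file name "document.png" is only an illustrative assumption, not a file shipped with the toolbox.

I = imread("document.png");                  % hypothetical scanned image
ocrResults = ocr(I,LayoutAnalysis="block");  % treat the image as one block of text
recognizedText = ocrResults.Text;            % recognized characters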

The built-in layout analysis in the ocr function analyzes a binarized input image by assuming the image has dark text against a uniform, light background. If the image contains a nonuniform background or lighting, use binarization to prepare the input image for text recognition. For more information, see the Troubleshoot OCR Function Results section.

CRAFT Text Detector

The detectTextCRAFT function offers robust text detection based on the Character-Region Awareness For Text detection (CRAFT) model, which can detect text regions in images regardless of factors such as image background, contrast, and intensity values. When you have difficulty segmenting the text regions in an image, use the pretrained, deep-learning-based CRAFT model. This model requires more computational resources than other text detection approaches, and also requires a Deep Learning Toolbox™ license. For more information, see Automatically Detect and Recognize Text Using Pretrained CRAFT Network and OCR.
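
For example, this sketch combines the CRAFT detector with the ocr function. The file name and the choice of word-level layout are illustrative assumptions.

I = imread("sceneText.jpg");                       % hypothetical natural-scene image
bboxes = detectTextCRAFT(I);                       % detect text regions with the pretrained CRAFT model
ocrResults = ocr(I,bboxes,LayoutAnalysis="word");  % recognize the text within each detected region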

Custom Text Detection

Computer Vision Toolbox provides several tools for developing custom algorithms to detect text in complex image scenes, and the related examples demonstrate different image preprocessing approaches.
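
As one illustration of such a custom pipeline, this sketch uses MSER regions as candidate text detections before recognition. The file name is an illustrative assumption, and this is only one of many possible preprocessing approaches.

I = imread("scene.jpg");                % hypothetical image of a scene that contains text
Igray = im2gray(I);                     % MSER detection requires a grayscale image
[~,cc] = detectMSERFeatures(Igray);     % candidate text regions as connected components
stats = regionprops(cc,"BoundingBox");  % bounding box of each candidate region
bboxes = vertcat(stats.BoundingBox);
ocrResults = ocr(Igray,bboxes);         % recognize text within the candidate regions

In practice, you would typically also filter the candidate regions by geometric properties, such as aspect ratio and solidity, before recognition.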

Text Recognition

The ocr function supports text recognition in 64 languages. By default, the function recognizes text in English, and it works well on scanned documents when you use the default values for all of its arguments. For example, this image shows a scan of a business card.

MathWorks business card with the company name, address, and website. The information on the card has been recognized and reproduced in a rectangular element.
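
A minimal sketch of recognizing the text in such a scan with all default settings follows. The file name "businessCard.png" is an illustrative assumption.

I = imread("businessCard.png");  % hypothetical scanned business card image
ocrResults = ocr(I);             % default English model and automatic layout analysis
recognizedText = ocrResults.Text % recognized text, returned as a character array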

Specify Text Regions

Using text regions for OCR can improve performance, especially when performing OCR on images of a natural scene that contains words. To recognize text in these kinds of images, specify ROI bounding boxes around the text regions and then use the ocr function with an ROI input. For example, this code snippet uses an ROI to specify the location of the text on an accessible parking sign:

I = imread("handicapSign.jpg");
roi = [360 118 384 560];  % [x y width height] region that contains the sign text
ocrResults = ocr(I,roi);  % recognize text only within the ROI

Handicap parking sign
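
To inspect or display the recognized text, you can use the Words and WordBoundingBoxes properties of the returned ocrText object. This short sketch, which builds on the snippet above, is one way to overlay the results.

recognizedWords = ocrResults.Words;                % recognized words within the ROI
Iocr = insertObjectAnnotation(I,"rectangle", ...
    ocrResults.WordBoundingBoxes,recognizedWords); % draw a labeled box around each word
figure
imshow(Iocr)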

Specify OCR Model

The ocr function supports text recognition in 64 languages through built-in language model files. To use these models, specify the Model name-value argument of the ocr function. For faster, but less accurate, performance with the built-in models, you can append "-fast" to the language model string, for example, "english-fast", "japanese-fast", or "seven-segment-fast". This code snippet recognizes the seven-segment characters in an image.

I = imread("sevSegDisp.jpg");
roi = [506 725 1418 626];
ocrResults = ocr(I,roi,Language="seven-segment");
Image of a digital number with a bounding box around it.
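
Similarly, a minimal sketch of selecting a faster, but less accurate, built-in English model follows. The file name is an illustrative assumption.

I = imread("scannedDoc.png");              % hypothetical scanned document image
ocrResults = ocr(I,Model="english-fast");  % fast variant of the built-in English model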

Note

Computer Vision Toolbox ships with language model files for recognizing English, Japanese, and seven-segment characters. To perform recognition of characters in other languages using the ocr function, you must install the OCR Language Data Files support package. For more details, see Install OCR Language Data Files.

Troubleshoot OCR Function Results

If your OCR results are not what you expect, you can try one or more of these options:

  • The built-in layout analysis in the ocr function analyzes a binarized input image by assuming a uniform, light background with dark text. If the image contains a nonuniform background or lighting, use binarization to prepare the input image for text recognition, as described in the binarization item later in this list.

  • Increase the image size by a factor of 2 to 4.

  • If the characters in the image are too close together or their edges are touching, use morphology to thin out the characters, which separates them.

  • Use binarization to check for nonuniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the binarized result, the image might have a nonuniform lighting issue. Try applying top-hat filtering by using the imtophat function, or use other techniques that address nonuniform illumination. For one possible approach, see the sketch after this list.

  • Use the region-of-interest option to isolate the text. Specify the ROI manually or use text detection.

  • If your image looks like a natural scene that contains words, such as a street scene, rather than a scanned document, try setting the LayoutAnalysis name-value argument to either "block" or "word".

  • Ensure that the image contains dark text on a light background. If the image instead contains light text on a dark background, you can binarize the image and invert it before passing the image to the ocr function.
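
The following sketch combines several of these suggestions: it enlarges the image, binarizes it, and inverts it when the text appears to be light on a dark background. The file name and the mean-intensity inversion heuristic are illustrative assumptions.

Igray = im2gray(imread("lowContrastDoc.png"));  % hypothetical image with hard-to-read text
Igray = imresize(Igray,3);                      % enlarge the image 2-4 times
BW = imbinarize(Igray,graythresh(Igray));       % global threshold; check that the characters are visible in BW
if mean(BW,"all") < 0.5                         % mostly dark pixels suggest light text on a dark background
    BW = imcomplement(BW);                      % invert so the text is dark on a light background
end
ocrResults = ocr(BW);                           % recognize text in the prepared binary image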

Train Custom OCR Models

In some cases, to get accurate recognition results, you must train a custom OCR model, for example, when the text in your images uses a proprietary font that is significantly different from any of the available fonts, or when the OCR results are not what you expect even after trying the troubleshooting steps. For more details on how to train a custom OCR model, see Train Custom OCR Model.

Create Ground Truth Data

You can use the Image Labeler app to interactively label images for ground truth data for training and evaluating OCR models. For more details, see Prepare Training Data.

Evaluate and Quantize OCR Results

Optionally, you can quantize a trained model for faster performance by using the quantizeOCR function, although quantization can decrease the accuracy of the model. To evaluate the quality of the OCR results, use the metrics generated by the evaluateOCR function.
