ocr
Recognize text using optical character recognition
Description
[___] = ocr(___,Name,Value) uses additional options specified by one or more Name,Value pair arguments, using any of the preceding syntaxes.
Examples
Recognize Text Within an Image
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
recognizedText = ocrResults.Text;
figure;
imshow(businessCard);
text(600, 150, recognizedText, 'BackgroundColor', [1 1 1]);
Recognize Text in Regions of Interest (ROI)
Read image.
I = imread('handicapSign.jpg');
Define one or more rectangular regions of interest within I.
roi = [360 118 384 560];
% You can also use imrect to select a region with the mouse:
% figure; imshow(I); roi = round(getPosition(imrect))
ocrResults = ocr(I, roi);
Insert the recognized text into the original image.
Iocr = insertText(I,roi(1:2),ocrResults.Text,'AnchorPoint',...
    'RightTop','FontSize',16);
figure;
imshow(Iocr);
Display Bounding Boxes of Words and Recognition Confidences
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '‘ MathWorks®...'
    CharacterBoundingBoxes: [103x4 double]
      CharacterConfidences: [103x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
Iocr = insertObjectAnnotation(businessCard, 'rectangle', ...
    ocrResults.WordBoundingBoxes, ...
    ocrResults.WordConfidences);
figure;
imshow(Iocr);
Find and Highlight Text in an Image
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard);
bboxes = locateText(ocrResults, 'MathWorks', 'IgnoreCase', true);
Iocr = insertShape(businessCard, 'FilledRectangle', bboxes);
figure;
imshow(Iocr);
Input Arguments
I
— Input image
M-by-N-by-3 truecolor image | M-by-N 2-D grayscale image | M-by-N binary image
Input image, specified as an M-by-N-by-3 truecolor, M-by-N 2-D grayscale, or M-by-N binary image. The input image must be real and nonsparse. Before recognition, the function converts truecolor and grayscale input images to binary images using Otsu's thresholding technique. For best OCR results, the height of a lowercase 'x', or a comparable character in the input image, must be greater than 20 pixels. To improve recognition results, correct any text rotation greater than ±10 degrees from the horizontal or vertical axis.
Data Types: single | double | int16 | uint8 | uint16 | logical
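The 20-pixel guideline above can be checked with a first recognition pass. The following is a minimal sketch and not part of the original examples: using the median character height as a stand-in for the height of a lowercase 'x', and the rule for choosing the scale factor, are assumptions made for illustration.

```matlab
% Sketch: enlarge the image when the recognized characters look too small.
I = imread('businessCard.png');

% First pass to measure character sizes. Each row of
% CharacterBoundingBoxes is [x y width height] in pixels.
firstPass = ocr(I);
charHeights = firstPass.CharacterBoundingBoxes(:, 4);
charHeights = charHeights(charHeights > 0);   % drop empty or invalid boxes

minHeight = 20;                        % guideline from the description above
typicalHeight = median(charHeights);   % assumed proxy for the height of an 'x'

if typicalHeight < minHeight
    % Assumed heuristic: scale so the typical character reaches the guideline.
    scaleFactor = ceil(minHeight / typicalHeight);
    I = imresize(I, scaleFactor);
end

% Second pass on the (possibly enlarged) image.
results = ocr(I);
disp(results.Text)
```

If the first pass finds no characters, the median is NaN, the comparison is false, and the image is left unchanged.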
roi
— Region of interest
M-by-4 element matrix
One or more rectangular regions of interest, specified as an M-by-4 element matrix. Each of the M rows specifies a region of interest within the input image as a four-element vector, [x y width height]. The vector specifies the upper-left corner location, [x y], and the size of the rectangular region of interest, [width height], in pixels. Each rectangle must be fully contained within the input image, I. Before recognition, the function uses Otsu's thresholding to convert truecolor and grayscale regions of interest to binary regions. The function returns the text recognized in the rectangular regions as an array of ocrText objects.
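As an illustration of passing more than one region, the sketch below splits an image into a top and a bottom half and recognizes each separately. The choice of regions is arbitrary and only meant to show the [x y width height] layout and the indexing of the returned array.

```matlab
% Sketch: recognize text in two regions of the same image.
I = imread('businessCard.png');
[h, w, ~] = size(I);

% Each row is [x y width height]; both rectangles lie inside the image.
halfH = floor(h / 2);
roi = [1 1         w halfH;        % top half
       1 halfH + 1 w h - halfH];   % bottom half

results = ocr(I, roi);   % one ocrText object per ROI row

for k = 1:numel(results)
    fprintf('Region %d:\n%s\n', k, results(k).Text);
end
```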
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: ocr(I,'TextLayout','Block')
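The two calling styles described above look like this side by side. This is a small sketch that assumes R2021a or later for the first call; the second form works in earlier releases as well.

```matlab
I = imread('businessCard.png');

% Name=Value syntax (R2021a and later).
resultsNew = ocr(I, TextLayout='Block');

% Comma-separated syntax with the name in quotes (all releases).
resultsOld = ocr(I, 'TextLayout', 'Block');

% Both calls request the same option, so the text is expected to match.
sameText = isequal(resultsNew.Text, resultsOld.Text);
disp(sameText)
```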
TextLayout
— Input text layout
'Auto' (default) | 'Block' | 'Line' | 'Word' | 'Character'
Input text layout, specified as the comma-separated pair consisting of 'TextLayout' and one of the following:
TextLayout | Text Treatment |
---|---|
'Auto' | Determines the layout and reading order of text blocks within the input image. |
'Block' | Treats the text in the image as a single block of text. |
'Line' | Treats the text in the image as a single line of text. |
'Word' | Treats the text in the image as a single word of text. |
'Character' | Treats the text in the image as a single character. |
Use the automatic layout analysis to recognize text from a scanned document that contains a specific format, such as a double column. This setting preserves the reading order in the returned text. You may get poor results if your input image contains a few regions of text or the text is located in a cluttered scene. If you get poor OCR results, try a different layout that matches the text in your image. If the text is located in a cluttered scene, try specifying an ROI around the text in your image in addition to trying a different layout.
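The advice to combine an ROI with a different layout can be tried as follows. This sketch reuses the image and ROI from the region-of-interest example earlier on this page. Comparing mean word confidence is only one assumed way to judge which layout works better.

```matlab
% Sketch: compare the default layout with 'Block' inside an ROI.
I = imread('handicapSign.jpg');
roi = [360 118 384 560];

resultsAuto  = ocr(I, roi);                         % default 'Auto' layout
resultsBlock = ocr(I, roi, 'TextLayout', 'Block');  % treat the ROI as one block

% 'omitnan' guards against NaN confidences.
fprintf('Auto:  %d words, mean confidence %.2f\n', ...
    numel(resultsAuto.Words), mean(resultsAuto.WordConfidences, 'omitnan'));
fprintf('Block: %d words, mean confidence %.2f\n', ...
    numel(resultsBlock.Words), mean(resultsBlock.WordConfidences, 'omitnan'));
```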
Language
— Language
'English' (default) | 'Japanese' | character vector | string scalar | cell array of character vectors | string array
Language to recognize, specified as the comma-separated pair consisting of 'Language' and the character vector 'English', 'Japanese', or a cell array of character vectors. You can also install the Install OCR Language Data Files package for additional languages, or add a custom language. Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language may reduce the accuracy and increase the time it takes to perform OCR.
To specify any of the additional languages contained in the Install OCR Language Data Files package, use the language character vector the same way as for the built-in languages. You do not need to specify the path.
txt = ocr(img,'Language','Finnish');
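To see the cost of recognizing several languages at once, the sketch below times a single-language call against a call with both built-in languages. The image is the sample used elsewhere on this page, and timings depend on the machine, so no particular numbers are implied.

```matlab
% Sketch: single language versus two simultaneous languages.
img = imread('businessCard.png');

tic;
txtEnglish = ocr(img, 'Language', 'English');
tEnglish = toc;

tic;
txtBoth = ocr(img, 'Language', {'English', 'Japanese'});
tBoth = toc;

fprintf('English only:       %.2f s, %d words\n', tEnglish, numel(txtEnglish.Words));
fprintf('English + Japanese: %.2f s, %d words\n', tBoth, numel(txtBoth.Words));
```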
List of Support Package OCR Languages
'Afrikaans'
'Albanian'
'AncientGreek'
'Arabic'
'Azerbaijani'
'Basque'
'Belarusian'
'Bengali'
'Bulgarian'
'Catalan'
'Cherokee'
'ChineseSimplified'
'ChineseTraditional'
'Croatian'
'Czech'
'Danish'
'Dutch'
'English'
'Esperanto'
'EsperantoAlternative'
'Estonian'
'Finnish'
'Frankish'
'French'
'Galician'
'German'
'Greek'
'Hebrew'
'Hindi'
'Hungarian'
'Icelandic'
'Indonesian'
'Italian'
'ItalianOld'
'Japanese'
'Kannada'
'Korean'
'Latvian'
'Lithuanian'
'Macedonian'
'Malay'
'Malayalam'
'Maltese'
'MathEquation'
'MiddleEnglish'
'MiddleFrench'
'Norwegian'
'Polish'
'Portuguese'
'Romanian'
'Russian'
'SerbianLatin'
'Slovakian'
'Slovenian'
'Spanish'
'SpanishOld'
'Swahili'
'Swedish'
'Tagalog'
'Tamil'
'Telugu'
'Thai'
'Turkish'
'Ukrainian'
To use your own custom languages, specify the path to the trained data file as the language character vector. You must name the file in the format <language>.traineddata. The file must be located in a folder named 'tessdata'. For example:
txt = ocr(img,'Language','path/to/tessdata/eng.traineddata');
To load multiple custom languages, specify a cell array of character vectors:
txt = ocr(img,'Language', ...
    {'path/to/tessdata/eng.traineddata',...
    'path/to/tessdata/jpn.traineddata'});
All of the traineddata files in the cell array must be contained in the same folder. In the preceding example, that folder is 'path/to/tessdata'. Because the following code points to two different containing folders, it does not work.
txt = ocr(img,'Language', ...
    {'path/one/tessdata/eng.traineddata',...
    'path/two/tessdata/jpn.traineddata'});
Some language files depend on another language. For example, Hindi training depends on English, so to use Hindi, the English traineddata file must also exist in the same folder as the Hindi traineddata file. The ocr function only supports traineddata files created using tesseract-ocr 3.02 or using the OCR Trainer.
For deployment targets generated by MATLAB® Coder™:
- The generated ocr executable and the language data file folder must be colocated.
- The language data file folder must be named tessdata:
  - For English: C:/path/tessdata/eng.traineddata
  - For Japanese: C:/path/tessdata/jpn.traineddata
  - For custom data files: C:/path/tessdata/customlang.traineddata
  - C:/path/ocr_app.exe
- You can copy the English and Japanese trained data files from:
  fullfile(matlabroot, 'toolbox','vision','visionutilities','tessdata');
CharacterSet
— Character subset
'' (all characters) (default) | character vector | string scalar
Character subset, specified as the comma-separated pair consisting of 'CharacterSet' and a character vector. By default, CharacterSet is set to the empty character vector, ''. The empty vector sets the function to search for all characters in the language specified by the Language argument. You can set this argument to a smaller set of known characters to constrain the classification process.
The ocr function selects the best match from the CharacterSet. Using prior knowledge about the characters in the input image helps to improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, '0123456789', the function attempts to match each character to only digits. In this case, a non-digit character can be incorrectly recognized as a digit.
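The effect of restricting the character set can be observed by running the same image with and without the restriction. This is a sketch on the sample business card. Because that image contains letters as well as digits, the restricted result is expected to show letters forced into digits, which illustrates the caveat above.

```matlab
% Sketch: unrestricted recognition versus digits only.
I = imread('businessCard.png');

resultsAll    = ocr(I);                                % all characters
resultsDigits = ocr(I, 'CharacterSet', '0123456789');  % digits only

disp('All characters:');
disp(resultsAll.Text);
disp('Digits only:');
disp(resultsDigits.Text);
```

Restricting the set is most useful together with an ROI that contains only the kind of characters listed, for example a region holding a phone number.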
Output Arguments
txt
— Recognized text and metrics
ocrText object
Recognized text and metrics, returned as an ocrText object. The object contains the recognized text, the location of the recognized text within the input image, and metrics indicating the confidence of the results. The confidence values are in the range [0, 1] and represent a percent probability. When you specify an M-by-4 roi, the function returns an M-by-1 array of ocrText objects.
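The confidence metrics can be used to discard unreliable words. The sketch below keeps only words above a threshold and annotates them on the image. The threshold of 0.8 is an assumed value for illustration, not a recommendation from this page.

```matlab
% Sketch: keep only words recognized with high confidence.
businessCard = imread('businessCard.png');
ocrResults = ocr(businessCard);

minConfidence = 0.8;   % assumed threshold, tune for your data
keep = ocrResults.WordConfidences > minConfidence;

confidentWords  = ocrResults.Words(keep);
confidentBoxes  = ocrResults.WordBoundingBoxes(keep, :);
confidentScores = ocrResults.WordConfidences(keep);

% Annotate only when at least one word passed the threshold.
if any(keep)
    Iocr = insertObjectAnnotation(businessCard, 'rectangle', ...
        confidentBoxes, confidentScores);
    figure;
    imshow(Iocr);
end
```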
If your ocr results are not what you expect, try one or more of the following options:
- Increase the image size to 2 to 4 times the original size.
- If the characters in the image are too close together or their edges are touching, use morphology to thin out the characters. Thinning the characters separates them.
- Use binarization to check for non-uniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the results of the binarization, it indicates a potential non-uniform lighting issue. Try top-hat filtering, using the imtophat function, or other techniques that deal with removing non-uniform illumination.
- Use the region of interest roi option to isolate the text. Specify the roi manually or use text detection.
- If your image looks like a natural scene containing words, like a street scene, rather than a scanned document, try using an ROI input. Also, you can set the TextLayout property to 'Block' or 'Word'.
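Several of these options can be chained into a preprocessing pass. The following is a sketch under stated assumptions: the text is dark on a light background (so the image is complemented around the top-hat step), the structuring element radius of 15 pixels is larger than the character stroke width, and a scale factor of 2 is adequate. None of these values come from this page.

```matlab
% Sketch: lighting check, illumination correction, and enlargement.
I = imread('businessCard.png');
if size(I, 3) == 3
    gray = rgb2gray(I);
else
    gray = I;
end

% Step 1: global Otsu binarization as a visual lighting check.
% Characters missing from bw suggest non-uniform illumination.
bw = imbinarize(gray, graythresh(gray));
figure; imshow(bw); title('Global Otsu binarization');

% Step 2: top-hat filtering. imtophat keeps bright details smaller than
% the structuring element, so complement first to make the text bright,
% then complement back to restore dark text on a light background.
se = strel('disk', 15);   % assumed radius
corrected = imcomplement(imtophat(imcomplement(gray), se));

% Step 3: enlarge the corrected image (assumed factor of 2).
corrected = imresize(corrected, 2);

results = ocr(corrected);
disp(results.Text)
```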
Extended Capabilities
C/C++ Code Generation
Generate C and C++ code using MATLAB® Coder™.
Usage notes and limitations:
- 'TextLayout', 'Language', and 'CharacterSet' must be compile-time constants.
- Generated code for this function uses a precompiled platform-specific shared library.
See Also
OCR Trainer | ocrText | insertShape | graythresh | imbinarize | imtophat | detectTextCRAFT