Using OCR to find handwritten digits

Odey Yousef on 20 Oct 2020
Commented: Shashank Gupta on 30 Oct 2020
I'm working on a project to automatically calculate the position of a laser relative to a ruler. I can locate the laser with high accuracy using HSV masking, but I've hit a roadblock using OCR to find the centroids of the ruler's markings. I've used HSV thresholding to completely eliminate the background, but OCR still has trouble detecting the handwritten numbers. I've tried ROI analysis and restricting the character set to the digits I expect (1 to 4), but OCR either returns too many detections on the original image (none of them correct) or nothing at all on the segmented image.

Answers (1)

Shashank Gupta on 23 Oct 2020
Hi Odey,
Have you looked at this example? If not, give it a try; it might resolve some of your problems. Let me know whether it helps.
  2 Comments
Odey Yousef on 27 Oct 2020
Edited: Odey Yousef on 27 Oct 2020
I followed it initially but found that, even across a wide range of parameters, pre-processing the image didn't improve OCR detection. Instead, I used a region-of-interest analysis, which let me identify candidate text regions and filter them. Although this works well for finding regions that contain text, OCR still fails to recognize the digits accurately (using a character set of 2, 3, 4, 5). Ultimately, I need to be able to do one of the following:
  • Accurately determine the location of a specified digit
  • Accurately filter the ROIs so that they contain only the digits in the frame
I have considered the following solutions but had no success with any of them:
  • Adjusting the filters on the ROIs
  • Requiring a minimum confidence in character recognition
  • Disregarding frames that show excessive displacement relative to the previous frame
  • Using the mean location of the detected ROIs to infer the pixel location of 2, 3, or 4, depending on the number of ROIs
I've attached a series of images from successive stages of the algorithm and a code snippet showing the attempted pre-processing and ROI analysis. Let me know if there are clear issues or alternative approaches.
maskedGrayImage = rgb2gray(maskedRGBImage);
% Remove the non-uniform background with a top-hat filter.
Icorrected = imtophat(maskedGrayImage,strel('rectangle',[60 150]));
BW1 = imbinarize(Icorrected);
figure;
imshowpair(Icorrected,BW1,'montage');
% Perform morphological reconstruction and show binarized image.
marker = imerode(Icorrected, strel('line',40,0));
Iclean = imreconstruct(marker, Icorrected);
BW2 = imbinarize(Iclean);
figure;
imshowpair(Iclean,BW2,'montage');
% Detect MSER regions.
[mserRegions, mserConnComp] = detectMSERFeatures(maskedGrayImage, ...
'RegionAreaRange',[80 300]);
figure
imshow(maskedGrayImage)
hold on
plot(mserRegions, 'showPixelList', true,'showEllipses',false)
title('MSER regions')
hold off
% Use regionprops to measure MSER properties
mserStats = regionprops(mserConnComp, 'BoundingBox', 'Eccentricity', ...
'Solidity', 'Extent', 'Euler', 'Image');
% Compute the aspect ratio using bounding box data.
bbox = vertcat(mserStats.BoundingBox);
w = bbox(:,3);
h = bbox(:,4);
aspectRatio = w./h;
% Threshold the data to determine which regions to remove. These thresholds
% may need to be tuned for other images.
filterIdx = aspectRatio' > 2 ;
filterIdx = filterIdx | [mserStats.Eccentricity] > .995;
filterIdx = filterIdx | [mserStats.Solidity] < .3;
filterIdx = filterIdx | [mserStats.Extent] < 0.2 | [mserStats.Extent] > 0.6;
filterIdx = filterIdx | [mserStats.EulerNumber] < -4;
% Remove regions
mserStats(filterIdx) = [];
mserRegions(filterIdx) = [];
% Show remaining regions
figure
imshow(maskedGrayImage)
hold on
plot(mserRegions, 'showPixelList', true,'showEllipses',false)
title('After Removing Non-Text Regions Based On Geometric Properties')
hold off
% Get bounding boxes for all the regions
bboxes = vertcat(mserStats.BoundingBox);
% Convert from the [x y width height] bounding box format to the [xmin ymin
% xmax ymax] format for convenience.
xmin = bboxes(:,1);
ymin = bboxes(:,2);
xmax = xmin + bboxes(:,3) - 1;
ymax = ymin + bboxes(:,4) - 1;
% Expand the bounding boxes by a small amount.
expansionAmount = 0.02;
xmin = (1-expansionAmount) * xmin;
ymin = (1-expansionAmount) * ymin;
xmax = (1+expansionAmount) * xmax;
ymax = (1+expansionAmount) * ymax;
% Clip the bounding boxes to be within the image bounds
xmin = max(xmin, 1);
ymin = max(ymin, 1);
xmax = min(xmax, size(maskedGrayImage,2));
ymax = min(ymax, size(maskedGrayImage,1));
% Show the expanded bounding boxes
expandedBBoxes = [xmin ymin xmax-xmin+1 ymax-ymin+1];
IExpandedBBoxes = insertShape(rgbImage,'Rectangle',expandedBBoxes,'LineWidth',3);
figure
imshow(IExpandedBBoxes)
title('Expanded Bounding Boxes Text')
% Compute the overlap ratio
overlapRatio = bboxOverlapRatio(expandedBBoxes, expandedBBoxes);
% Set the overlap ratio between a bounding box and itself to zero to
% simplify the graph representation.
n = size(overlapRatio,1);
overlapRatio(1:n+1:n^2) = 0;
% Create the graph
g = graph(overlapRatio);
% Find the connected text regions within the graph
componentIndices = conncomp(g);
% Merge the boxes based on the minimum and maximum dimensions.
xmin = accumarray(componentIndices', xmin, [], @min);
ymin = accumarray(componentIndices', ymin, [], @min);
xmax = accumarray(componentIndices', xmax, [], @max);
ymax = accumarray(componentIndices', ymax, [], @max);
% Compose the merged bounding boxes using the [x y width height] format.
textBBoxes = [xmin ymin xmax-xmin+1 ymax-ymin+1];
% Show the final text detection result.
ITextRegion = insertShape(rgbImage, 'Rectangle', textBBoxes,'LineWidth',3);
figure
imshow(ITextRegion)
title('Detected Text')
[bboxesRow, bboxesCol] = size(textBBoxes);
results = ocr(maskedGrayImage, textBBoxes, 'TextLayout', 'Line', 'CharacterSet', '2345');
digits = zeros(bboxesRow,1);
for j = 1:bboxesRow
digits(j,1) = str2double(results(j,1).Text);
end
digits(isnan(digits)) = 0;
% draw boxes around the digits
Idigits = insertObjectAnnotation(frame,'rectangle',textBBoxes,digits);
figure;
imshow(Idigits);
title('Recognized Text')
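The confidence idea listed above (requiring a minimum confidence in character recognition) could look roughly like the sketch below. It is only an illustration: the boxes and confidences are made-up values standing in for what `results(j).WordBoundingBoxes` and `results(j).WordConfidences` would return from `ocr`, and the threshold `minConf` is an assumed value that would need tuning.

```matlab
% Hypothetical stand-ins for OCR output. With the snippet above these
% would come from results(j).WordBoundingBoxes and results(j).WordConfidences.
wordBoxes = [120 40 18 30;    % each row is [x y width height]
             300 42 20 29;
             480 41 19 31];
wordConf  = [0.91; 0.35; 0.78];

minConf = 0.6;                % assumed threshold, tune for your images

% Keep only detections whose confidence meets the threshold.
keep      = wordConf >= minConf;
keptBoxes = wordBoxes(keep, :);

% Approximate centre of each kept box: corner plus half the box size.
centroids = [keptBoxes(:,1) + keptBoxes(:,3)/2, ...
             keptBoxes(:,2) + keptBoxes(:,4)/2];

disp(centroids)
```

Thresholding only helps if the correct digit is actually among the detections, so it filters false positives but does not fix missed digits.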
Shashank Gupta on 30 Oct 2020
Hey Odey,
I haven't tried it myself; if you haven't solved it yet, I will give it a try. At first glance, one point I can make is that code like this, which involves a lot of hyperparameters, is generally hard to generalize. Even if you solve the problem for this specific image, it will be difficult to carry the solution over to other images, because many things can affect it, such as lighting exposure, contrast changes, and essentially any other image property. Nevertheless, if you are sure these conditions are consistent across all the images you are working with, then I'll certainly give it a try for you.
Thanks.
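On the frame-to-frame idea raised earlier in the thread (disregarding frames that show excessive displacement relative to a previous frame), a minimal sketch might look like the following. The positions in `xDetected` and the limit `maxJump` are hypothetical values chosen for illustration, not taken from the original data.

```matlab
% Hypothetical per-frame x-positions (pixels) of one digit; NaN = not found.
xDetected = [210 212 211 340 213 NaN 214];
maxJump   = 15;      % assumed largest plausible shift between frames (pixels)

xAccepted = nan(size(xDetected));
lastGood  = NaN;     % most recent accepted position
for k = 1:numel(xDetected)
    x = xDetected(k);
    if isnan(x)
        continue     % nothing detected in this frame
    end
    if isnan(lastGood) || abs(x - lastGood) <= maxJump
        xAccepted(k) = x;    % plausible: accept and update the reference
        lastGood = x;
    end
end

disp(xAccepted)
```

Note that this gate trusts the first detection it sees, so a wrong first frame would cause later correct detections to be rejected; seeding `lastGood` from a manually checked frame is one way around that.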
