Regular expression for Arabic text in MATLAB

I used OCR in MATLAB to read Arabic text from an image. Now I want to write a regular expression that matches a word in the Arabic text, but it does not work.

Comments: 19

Could you give us some specific examples to experiment with?
N Rh on 21 Dec 2017
For example, I want to extract the text "فاتورة عدد" from the attached image.
Stephen23 on 21 Dec 2017
"...but it does not work"
This tells us nothing about what you have tried so far, nor about what the difference is between working/not working.
What have you tried? Do you get an output? How are you checking this output? If no output, do you get any error message?
N Rh on 21 Dec 2017
The output is "????????". It means that the regular expression does not support Arabic.
Guillaume on 21 Dec 2017
"it does not work"
As Stephen said, this is a useless statement if you don't even tell us what the "it" is. How can we know whether you've made a mistake with the "it", whether you're using the "it" incorrectly, or whether the "it" indeed does not support Arabic?
So show us the "it", that is, the exact code you're using and ideally an example input where "it" doesn't work.
N Rh on 21 Dec 2017
Edited: N Rh on 21 Dec 2017
The "it" means, for example:
Pattern = '(فاتورة عدد)';
P = regexp(Lines, Pattern, 'match');
P = [P{:}];
Seems to work for me (R2017b):
>> Pattern = '(فاتورة عدد)';
>> Lines = {Pattern(2:end-1); [Pattern(2:end-1), '2015/02 ']; Pattern(4:5)}
>> P = regexp(Lines,Pattern,'match');
>> P = [P{:}]
Lines =
3×1 cell array
{'فاتورة عدد'}
{'فاتورة عدد2015/02 '}
{ 'تو'}
P =
1×2 cell array
{'فاتورة عدد'} {'فاتورة عدد'}
N Rh on 21 Dec 2017
Thank you, I will install R2017b and try it.
N Rh on 21 Dec 2017
Edited: N Rh on 21 Dec 2017
With R2017a I get this result:
Lines = 3×1 cell array
'?????? ???'
'?????? ???2015/02 '
'??'
P =
1×9 cell array
'??' '??' '??' '??' '??' '??' '??' '??' '??'
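One way to tell whether the characters were really replaced by '?' or are only displayed that way is to look at their numeric code points. A minimal sketch, assuming a sample string built here for illustration (the variable names are not from the thread):

```matlab
% Sample text built from code points U+0639 U+062F U+062F (assumption:
% in practice, replace s with an element of Lines or P).
s = char([1593 1583 1583]);

codes = double(s);                                               % numeric code of each character
isArabic = codes >= hex2dec('0600') & codes <= hex2dec('06FF');  % Unicode Arabic block
isLiteralQuestionMark = codes == 63;                             % '?' has code 63

fprintf('Arabic code points: %d, literal question marks: %d\n', ...
    nnz(isArabic), nnz(isLiteralQuestionMark));
```

If every code is 63, the Arabic characters were most likely lost before regexp ever ran (for example when the file was saved or read). If the codes fall between 1536 and 1791, the data is probably intact and only the display is affected.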
Guillaume on 21 Dec 2017
I don't think it is an issue with your MATLAB version, as it also works for me in R2016a, R2016b and R2017a. It's probably more related to your operating system.
I'm using Win 7 (Enterprise) and didn't have to do anything special to get the above to work.
N Rh on 21 Dec 2017
Can you tell me how you use OCR in MATLAB for recognition?
One thing to note is that if your operating system is set to English, then MATLAB might not store .m files with UTF-8 encoding, so when you save the .m file, close it and open it again, any Arabic characters you had in the file might be gone. With newer versions there is apparently a way to force MATLAB to permit UTF-8 for .m files, but it involves editing an obscure configuration file.
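To sidestep the .m file encoding question, the pattern can be built from Unicode code points so that the source file stays pure ASCII. A sketch along the lines of the example above, with variable names chosen here for illustration:

```matlab
% Build the two words of the search phrase from Unicode code points.
word1 = char([1601 1575 1578 1608 1585 1577]);  % U+0641 U+0627 U+062A U+0648 U+0631 U+0629
word2 = char([1593 1583 1583]);                 % U+0639 U+062F U+062F

% The two words separated by one or more whitespace characters.
pattern = [word1 '\s+' word2];

% Test data mirroring the example in this thread.
lines = {[word1 ' ' word2]; [word1 ' ' word2 '2015/02 ']; word1(3:4)};

matches = regexp(lines, pattern, 'match');  % one cell of matches per input line
matches = [matches{:}];                     % flatten into a single cell array
disp(numel(matches));
```

Because no Arabic literal appears in the file, the result does not depend on how the editor saves it.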
Just to be sure we are all referring to the same thing:
It is not possible to use regexp() on an image, only on character vectors or cell array of character vectors or on string() arrays.
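The order of operations can be sketched as follows: recognize the text first, then run regexp() on the recognized text. This assumes the Computer Vision Toolbox ocr function with Arabic OCR language data installed, and an image named fac.jpg; the exact argument names may differ between releases, and the fallback string is a stand-in invented here so the sketch runs without the toolbox.

```matlab
% Stand-in for recognized text (assumption), used when the toolbox or
% the image file is not available.
phrase = char([1601 1575 1578 1608 1585 1577 32 1593 1583 1583]);
txt = [phrase ' 2015/02'];

if exist('ocr', 'file') && exist('fac.jpg', 'file')
    I = imread('fac.jpg');
    % Assumption: Arabic language data for ocr is installed.
    results = ocr(I, 'Language', 'Arabic');
    txt = results.Text;   % recognized text as a character vector
end

% regexp operates on the text, never on the image itself.
found = regexp(txt, phrase, 'match');
fprintf('Number of matches: %d\n', numel(found));
```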
N Rh on 21 Dec 2017
Yes, I use regexp() on a cell array. I don't know what the problem is!
Please attach a .mat containing the cell array and also containing the pattern you are trying to search for.
N Rh on 21 Dec 2017
Edited: Walter Roberson on 21 Dec 2017
This is the code I use. You can run it with the image in the attached file.
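clear; close all; clc;
% Run Tesseract with the English and Arabic language data; writes output.txt
!tesseract -l eng+ara fac.jpg output
slCharacterEncoding('UTF-8')   % Simulink function; sets the MATLAB character encoding
% Read the raw bytes of the OCR output and decode them as UTF-8
fid = fopen('output.txt');
b = fread(fid, 'uint8')';
fclose(fid);
str = native2unicode(b, 'UTF-8');
disp(str);
% Split the text into whitespace-delimited words
C = textscan(str, '%s');
data = cellstr(C{1});
% Print the word that follows each word starting with 'عدد'
for i = 1:numel(data)-1
    if strncmp(data{i}, 'عدد', 3)
        % Use data{i+1} (the next word), not str(i+1) (a single character)
        fprintf('Numero de la facture : %s\n', data{i+1});
    end
end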
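As an alternative to reading bytes and calling native2unicode, fopen accepts an encoding argument, so the file can be read directly as UTF-8 text. A self-contained sketch that writes a small temporary file as a stand-in for output.txt (the file contents and variable names are assumptions for illustration):

```matlab
% Build the phrase from code points so this script stays ASCII-only.
phrase = char([1601 1575 1578 1608 1585 1577 32 1593 1583 1583]);
keyword = phrase(8:10);                    % second word of the phrase

% Write a stand-in for Tesseract's output.txt, encoded as UTF-8.
fname = [tempname '.txt'];
fid = fopen(fname, 'w', 'n', 'UTF-8');
fprintf(fid, '%s 2015/02\n', phrase);
fclose(fid);

% Read it back as characters, letting fopen handle the decoding.
fid = fopen(fname, 'r', 'n', 'UTF-8');
str = fread(fid, '*char')';
fclose(fid);
delete(fname);

% Split into words and take the word that follows the keyword.
words = strsplit(strtrim(str));
idx = find(strcmp(words, keyword), 1);
invoiceNumber = words{idx + 1};
fprintf('Numero de la facture : %s\n', invoiceNumber);
```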
I had to hunt around for the Arabic training files for Tesseract; perhaps I did not find the right ones. And I got a whole bunch of messages about
Cube ERROR (ConvNetCharClassifier::RunNets): NeuralNet is NULL
The output.txt file contained only English for me.
N Rh on 21 Dec 2017
That is because you need the files ara.cube.bigrams, ara.cube.fold, ara.cube.lm, ara.cube.nn, ara.cube.params, ara.cube.size, ara.cube.word-freq and ara.traineddata.
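A quick way to see which of these files are missing is to test for each one in the Tesseract data folder. A sketch, assuming a hypothetical folder location in tessdataDir that you would replace with your own installation's path:

```matlab
% Hypothetical location of the Tesseract language data (assumption).
tessdataDir = fullfile(pwd, 'tessdata');

% File names listed in this thread for Arabic recognition.
needed = {'ara.cube.bigrams', 'ara.cube.fold', 'ara.cube.lm', 'ara.cube.nn', ...
          'ara.cube.params', 'ara.cube.size', 'ara.cube.word-freq', 'ara.traineddata'};

% exist(...) returns 2 when the path refers to an existing file.
present = cellfun(@(f) exist(fullfile(tessdataDir, f), 'file') == 2, needed);
missing = needed(~present);

if isempty(missing)
    disp('All Arabic data files were found.');
else
    fprintf('Missing: %s\n', strjoin(missing, ', '));
end
```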
N Rh on 21 Dec 2017
Edited: N Rh on 21 Dec 2017
The attached file contains the cell array of words: فاتورة, عدد, 2015/02


Answers (0)

Asked: 20 Dec 2017

Edited: 21 Dec 2017
