textscan or import of a Unicode-encoded text file
Question 1: Are textscan and importdata supposed to work with Unicode-encoded text files?
Question 2: After a UTF-8 encoded file is opened with the correct encoding specified in the fopen arguments, the textscan output contains three extra characters preceding the very first valid data in the file. Is this expected behavior, and is it documented anywhere?
Answers (2)
Anne
5 December 2011
I have the same problem with my old MATLAB 7.3.0. textscan does not read Unicode files correctly, but it can handle Unicode strings.
So a simple (but slow) workaround is to read the text first with fscanf and then run textscan on that text.
[f,msg]=fopen(nomfic,'r','n','UTF-8');
LIGNES=textscan(f,'%[^\n]','delimiter','\n');
does not handle Unicode-encoded characters, but
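[f,msg]=fopen(nomfic,'r','n','UTF-8');
if f < 0, error(msg); end                        % stop if the file could not be opened
txt=fscanf(f,'%c');                              % read the whole file as one char string
fclose(f);
LIGNES=textscan(txt,'%[^\n]','delimiter','\n');  % split the decoded string into lines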
will.
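The same workaround can be packaged as a reusable function. This is only a sketch: the name readUnicodeLines and its optional encoding argument are hypothetical, not part of MATLAB. It decodes the whole file into a char string with fscanf first and only then lets textscan split that string into lines, exactly as in the code above. onCleanup needs a newer release than 7.3.0 (I believe it arrived in R2008a); on older versions, call fclose directly instead.

```matlab
function lines = readUnicodeLines(filename, encoding)
% Hypothetical helper: read a Unicode text file into a cell array of lines
% by decoding the whole file first, then running textscan on the string.
if nargin < 2
    encoding = 'UTF-8';                    % assumed default encoding
end
[fid, msg] = fopen(filename, 'r', 'n', encoding);
if fid < 0
    error('readUnicodeLines:open', '%s', msg);
end
cleaner = onCleanup(@() fclose(fid));      % closes the file even if reading fails
txt = fscanf(fid, '%c');                   % whole file as one char array
C = textscan(txt, '%[^\n]', 'delimiter', '\n');
lines = C{1};                              % one line per cell
end
```

Usage: `LIGNES = readUnicodeLines(nomfic);` returns the lines directly, so there is no need to index into the cell array that textscan returns.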
Walter Roberson
22 September 2011
Answer 1: textscan() is; I do not know about importdata.
Answer 2: When you explicitly specify one of the UTF-* encodings, MATLAB does not look for a Byte Order Mark and leaves any Byte Order Mark in the file stream. If you do not explicitly specify the encoding, the byte stream is examined for a Byte Order Mark and, if one is found, the encoding is determined from it.
A Byte Order Mark is not recommended for UTF-8, but some Windows editors insert one anyway. The Byte Order Mark encoded in UTF-8 is the byte sequence 0xEF 0xBB 0xBF, which shows up as exactly the three characters you noticed. See reference.
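One way to sidestep the Byte Order Mark, sketched under the assumption that the file is UTF-8: read it as raw bytes, drop a leading 0xEF 0xBB 0xBF if present, and decode the remaining bytes with native2unicode. Reading with fread's '*uint8' precision bypasses character decoding entirely, so the check does not depend on how fopen's encoding argument treats the BOM. The function name readUtf8WithoutBom is hypothetical.

```matlab
function txt = readUtf8WithoutBom(filename)
% Hypothetical helper: return the text of a UTF-8 file as a char row vector,
% with any leading UTF-8 Byte Order Mark (0xEF 0xBB 0xBF) removed.
[fid, msg] = fopen(filename, 'r');
if fid < 0
    error('readUtf8WithoutBom:open', '%s', msg);
end
bytes = fread(fid, Inf, '*uint8')';        % raw bytes as a uint8 row vector
fclose(fid);
bom = uint8([239 187 191]);                % 0xEF 0xBB 0xBF
if numel(bytes) >= 3 && isequal(bytes(1:3), bom)
    bytes = bytes(4:end);                  % strip the BOM before decoding
end
txt = reshape(native2unicode(bytes, 'UTF-8'), 1, []);  % decode as UTF-8
end
```

The result can then go straight to textscan, e.g. `LIGNES = textscan(readUtf8WithoutBom(nomfic), '%[^\n]', 'delimiter', '\n');`, and the first line no longer starts with the extra characters.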
I have not checked whether it makes a difference if you open the file with 'r' rather than 'rt'. I use 'rt' for text files, as it can make a difference in some instances.
Category: Data Import and Export