Can't read formatted data (textread, textscan, others)

For the life of me, I can't figure out how to properly use textread, textscan and other similar formatted text functions. I'd like to read a file like this ('test2.txt'):
header1 header2 header3 header4
abc 1 2 3
def 4 5 6
ghi 7 8 9
And build a matrix = [1 2 3; 4 5 6; 7 8 9] and a cell array containing {abc; def; ghi}. From examples posted here and elsewhere, this should work:
fid = fopen('test2.txt');
data = textscan(fid,'%s %f %f %f','delimiter',' ','headerlines',1)
fclose(fid);
But it doesn't. Output:
data =
{1x1 cell} [0x1 double] [0x1 double] [0x1 double]
and the 1x1 cell contains just ''
I've since tried about a dozen other examples of this function and similar functions and haven't gotten any to work!
For example: http://www.mathworks.com/matlabcentral/answers/21810-reading-a-text-file - Using Jan's code and the OP's data which is formatted similarly to mine, I get the same problem as above: a bunch of empty cells/vectors.
Another recent post: http://www.mathworks.com/matlabcentral/answers/24995-simple-file-i-o-problem-help-needed - Same deal. Friedrich's solution doesn't give me the same output as he shows.
Finally, I copied/pasted the example in 'help textscan'. Same problem, except the first cell does contain some gibberish 'ÿþS'. What am I missing here? Thanks for your time.
SOLUTION (from the comments on Walter Roberson's answer): Notepad defaulted to saving the file in its "Unicode" format, which turned out to be UTF-16 Little Endian. When saving a text file in Notepad, changing the "Encoding" option (near the bottom of the Save As dialog) from Unicode to either ANSI or UTF-8 resulted in proper code execution. Thank you!

Accepted Answer

Walter Roberson on 2 Jan 2012


The gibberish 'ÿþ' tells us that your file is encoded in UTF-16 Little Endian: ÿ and þ are how the two byte order mark bytes FF FE are displayed.
Please try with
fopen('test2.txt', 'rt')
so that your file is treated as a text file rather than as a binary file.
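One way to check the encoding up front is to look at the first few bytes of the file for a byte order mark. The following is only a minimal sketch: the scratch file name and the variables `head` and `enc` are made up for illustration, and a file with no BOM could still be in any encoding.

```matlab
% Create a tiny scratch file that starts with the UTF-16LE BOM (hypothetical file).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, [255 254 97 0], 'uint8');   % FF FE, then 'a' as two bytes
fclose(fid);

% Read at most the first three bytes as raw uint8 values.
fid = fopen(fname, 'r');
head = fread(fid, 3, '*uint8')';
fclose(fid);
delete(fname);

% Compare against the common byte order marks.
% Note: a UTF-32LE file also starts with FF FE; this sketch does not distinguish it.
if numel(head) >= 3 && isequal(head(1:3), uint8([239 187 191]))
    enc = 'UTF-8 (with BOM)';
elseif numel(head) >= 2 && isequal(head(1:2), uint8([255 254]))
    enc = 'UTF-16LE';
elseif numel(head) >= 2 && isequal(head(1:2), uint8([254 255]))
    enc = 'UTF-16BE';
else
    enc = 'no BOM found';
end
disp(enc)
```

To check a real file, replace the scratch-file part with the actual file name, for example 'test2.txt'.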

10 Comments

M S on 2 Jan 2012
No dice. Same result. Note, I don't get that gibberish with the first test case, just an empty cell. Still get an empty cell.
Walter Roberson
There should be no need to use ' ' as the delimiter, and it could be that it is interfering with the parsing. The default is "whitespace", which includes tab, vertical tab, spaces and newlines.
M S on 3 Jan 2012
OK, tried that. Also no change! The call now is:
fid = fopen('test2.txt','rt');
data = textscan(fid,'%s %f %f %f','headerlines',1)
Walter Roberson
Do you have a small file you could try with? If so, could you run
fid = fopen('test2.txt','r');
reshape( fread(fid, 'char=>uint8'), 1, [])
and show us the output of that?
M S on 3 Jan 2012
Sure.
Columns 1 through 22
255 254 104 0 101 0 97 0 100 0 101 0 114 0 49 0 32 0 104 0 101 0
Columns 23 through 44
97 0 100 0 101 0 114 0 50 0 32 0 104 0 101 0 97 0 100 0 101 0
Columns 45 through 66
114 0 51 0 32 0 104 0 101 0 97 0 100 0 101 0 114 0 52 0 13 0
Columns 67 through 88
10 0 97 0 98 0 99 0 32 0 49 0 32 0 50 0 32 0 51 0 13 0
Columns 89 through 110
10 0 100 0 101 0 102 0 32 0 52 0 32 0 53 0 32 0 54 0 13 0
Columns 111 through 130
10 0 103 0 104 0 105 0 32 0 55 0 32 0 56 0 32 0 57 0
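The pattern in this dump can be decoded by hand: the first two bytes are the byte order mark, and each character after that is stored as a low byte followed by a zero high byte. A small sketch using the first 16 values from the output above (the variable names are only illustrative, and taking every second byte is valid only because every high byte here is 0):

```matlab
% First 16 bytes of the dump: BOM (255 254) followed by 7 two-byte characters.
bytes = uint8([255 254 104 0 101 0 97 0 100 0 101 0 114 0 49 0]);

% FF FE at the start marks UTF-16 Little Endian.
hasBOM = isequal(bytes(1:2), uint8([255 254]));

% After the BOM, odd positions hold the low bytes and even positions the high bytes.
lowBytes  = bytes(3:2:end);
highBytes = bytes(4:2:end);

% With all high bytes zero, the low bytes are plain ASCII codes.
txt = char(lowBytes);
disp(txt)
```

The 13 0 10 0 pairs later in the dump are the carriage return and line feed that end each line, stored the same way.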
M S on 3 Jan 2012
Walter, you're definitely onto something. I tried using dlmwrite() to write a matrix, then read it using textscan, and it works perfectly. No header though, and not mixed data, all %f. But it's the first time I've gotten it to read right. So this has something to do with Notepad's formatting?
Walter Roberson
I have recreated the file here, but unfortunately I do not have access to MATLAB tonight to experiment with it.
In the meantime, you might want to see if Notepad has a way to save as plain text, or as UTF-8.
The file is currently in UTF-16 Little Endian for sure.
M S on 3 Jan 2012
Walter, solved. The code now works if I save as ANSI or UTF-8, but NOT as Unicode (what it was, and that must be little endian?) and not as Unicode big endian. Thank you so much. Those are the only 4 options; which do you consider "plain text"? Thanks again.
Walter Roberson
"Plain text" is ASCII or ISO-8859-1.
Walter Roberson on 3 Jan 2012
Edited: per isakson on 4 Sep 2015
Which MATLAB version are you using? I found a thread indicating a textscan issue in some earlier versions and showing a work-around: http://www.mathworks.com/matlabcentral/answers/16493-textscan-or-import-of-unicode-encoded-textfile
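If re-saving the file is not an option, one possible workaround is to read the raw bytes, convert them to ordinary characters, and let textscan parse the resulting text. This is only a sketch and not necessarily what the linked thread does: it builds its own scratch copy of the sample data, the variable names are made up, and it assumes that native2unicode accepts the encoding name 'UTF-16LE' and that textscan honors 'HeaderLines' for text input in your MATLAB version.

```matlab
% Build a UTF-16LE scratch copy of the sample data (hypothetical file name).
txt = sprintf('header1 header2 header3 header4\r\nabc 1 2 3\r\ndef 4 5 6\r\nghi 7 8 9');
utf16 = reshape([double(txt); zeros(1, numel(txt))], 1, []);  % low byte, then 0
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, [255 254 utf16], 'uint8');   % BOM FF FE followed by the text
fclose(fid);

% Read the file back as raw bytes.
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8')';
fclose(fid);
delete(fname);

% Drop the BOM if present, then decode to MATLAB characters.
if numel(bytes) >= 2 && isequal(bytes(1:2), uint8([255 254]))
    bytes = bytes(3:end);
end
str = native2unicode(bytes, 'UTF-16LE');
str = strrep(str, sprintf('\r\n'), sprintf('\n'));   % normalize line endings

% Parse the decoded text with the format from the question.
data  = textscan(str, '%s %f %f %f', 'HeaderLines', 1);
names = data{1};      % cell array of the first column
M     = [data{2:4}];  % numeric columns side by side
```

With the sample data, names should hold abc, def and ghi, and M should be the 3-by-3 matrix from the question.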


Asked: M S on 2 Jan 2012
Edited: 4 Sep 2015
