Problems when using textscan to read csv files

I have multiple CSV files that need to be loaded into MATLAB. Below is my script. Oddly enough, it works flawlessly for some of them, but for others it reads only the very first row of data and refuses to read the rest. I tried opening the CSV files in Excel (xlsx format) and re-saving them as tab-delimited .txt files; then the script always works. But this is not a good option, as I have too many CSV files. Does anyone have a clue?
file1 = ['sko', name, '_final.csv']
fid = fopen(file1, 'rt');
% skip the headerlines:
line1=fgetl(fid)
fmt = [repmat('%s', 1, 25) '%*[^\n]'];
nn = 0;
while ~feof(fid);
    nn=nn+1;
    C = textscan(fid, fmt, 1, 'delimiter', ',', 'collectoutput', true);
    S{nn}.group_ship = C{1}{2};
    S{nn}.cruiseid = C{1}{3};
    S{nn}.jday = str2num(C{1}{5});
end;
fclose(fid);


dpb, 27 Nov 2013
The only way to diagnose this specifically would be to see a small sample of the failing file(s).
One guess would be missing values, and perhaps in that case not every column has its delimiter.
Alternatively, double-check for control characters in the file, particularly if it came from Excel or some other formatting program with a penchant for "prettying up" output.
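A minimal sketch of that check: read the raw bytes and list any control characters other than tab, LF and CR. The test file and its contents are hypothetical. Counting CR and LF bytes as well shows which line-ending convention a file uses; that is an extra check, not something suggested in the thread.

```matlab
% Hypothetical test file with a stray form feed (char(12)) in the second row
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fwrite(fid, ['a,b,c' char(10) '1,2' char(12) ',3' char(10)]);
fclose(fid);

% Read raw bytes in binary mode so no line-ending translation happens
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, 'uint8=>double')';
fclose(fid);
delete(fname);

% Flag control characters other than tab (9), LF (10) and CR (13)
isCtrl = bytes < 32 & ~ismember(bytes, [9 10 13]);
badPos = find(isCtrl);   % 1-based byte offsets of suspicious characters
badVal = bytes(isCtrl);  % their character codes

% Line-ending census: a CR-only file would show nCR > 0 and nLF == 0
nLF = sum(bytes == 10);
nCR = sum(bytes == 13);
fprintf('%d suspicious byte(s); %d LF, %d CR\n', numel(badPos), nLF, nCR);
```

On a real file, replace the test-file block with the path of one of the failing CSVs.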
Leon, 27 Nov 2013 (edited 27 Nov 2013)
Thank you for the reply.
Here is one of the failing files.
dpb, 27 Nov 2013
1) There are only 22 columns of data for a fmt string of 25 fields... that can't help, although by itself it doesn't seem to confuse textscan.
2) The killer seems to be the attempt to skip to the end of the line; it's not surprising to me that this confused textscan, since it is already past the EOL trying to satisfy the format string.
Removing the trailing '%*[^\n]' seems to work whether or not the field count is correct.
Leon, 27 Nov 2013
Wow, it works after removing the trailing '%*[^\n]'. I thought that was the golden key for finding the end of the line without worrying about the number of columns. Apparently it isn't a good idea.
Thank you so much for the help. I wish I could accept your answer, but the solution is in a comment :)
dpb, 28 Nov 2013
OK, if it'll make you feel better, I'll move the comment to Answers. :)
BTW, since you're doing it the way you are (although I wondered why you don't just read the whole file and then select the columns you want, instead of going line by line), why not count the number of commas in the header line and build the format string dynamically to fit the actual data?
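A minimal sketch of the read-everything-then-select approach, using a small hypothetical file. It assumes the column count is known (4 here) and that fields contain no embedded commas or quotes; the column names only echo the question.

```matlab
% Hypothetical sample file: one header line, three data rows, four columns
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,group_ship,cruiseid,jday\n1,A,C01,100\n2,B,C02,101\n3,A,C03,102\n');
fclose(fid);

nCols = 4;                       % assumed known for this sketch
fmt = repmat('%s', 1, nCols);    % one %s per field, no trailing %*[^\n]

% One call, no loop: textscan reapplies fmt until the end of the file
fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 1, 'CollectOutput', true);
fclose(fid);
delete(fname);

data = C{1};                     % 3x4 cell array, one row per record
group_ship = data(:, 2);         % select the wanted columns afterwards
cruiseid   = data(:, 3);
jday       = str2double(data(:, 4));
```

Because 'CollectOutput' merges the consecutive %s fields, C{1} is a single N-by-nCols cell array rather than nCols separate columns.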
Think I'll leave this here for my fellow Googlers: I found that using
C= textscan(fid,fmt,'Delimiter',',','CollectOutput',true);
with no loop to check for the end of the file created a cell containing a correct 2D array. Specifying any sort of new line or whitespace screwed it up; it's up to smarter people to say why.
dpb, 14 Aug 2014
Indeed, the default behavior is to apply the format string repeatedly until the end of the file; combining that default with too much manual help in accounting for records causes problems, because textscan then tries to do exactly what was asked for, even when it isn't needed.
The only time things like skipping the rest of the record are needed or useful is when, for example, one is ignoring one or more fields in each record; then textscan needs the extra help to be told to do that.
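A tiny illustration of that default, using a hypothetical in-memory string instead of a file: with no repeat count and no loop, textscan keeps reapplying the format until the input is exhausted.

```matlab
% Two records of three comma-separated numbers each
txt = sprintf('1,2,3\n4,5,6\n');

% No N argument and no loop: the three-field format is applied repeatedly
C = textscan(txt, '%f%f%f', 'Delimiter', ',');

% C is a 1x3 cell array holding one column per cell:
% C{1} is [1;4], C{2} is [2;5], C{3} is [3;6]
```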


Accepted Answer

dpb, 28 Nov 2013 (edited 28 Nov 2013)


1) There are only 22 columns of data for a fmt string of 25 fields... that can't help, although by itself it doesn't seem to confuse textscan. (ADDENDUM: but it does return 3 empty cells at the end, which seems pointless.)
2) The killer seems to be the attempt to skip to the end of the line; it's not surprising to me that this confused textscan, since it is already past the EOL trying to satisfy the format string.
Removing the trailing '%*[^\n]' seems to work whether or not the field count is correct.
BTW, since you're doing it the way you are (although I wondered why you don't just read the whole file and then select the columns you want, instead of going line by line), why not count the number of commas in the header line and build the format string dynamically to fit the actual data?
Or, since you seem to be picking a fixed set of columns, why not read only those and use the skip facility (%*) to avoid returning the unwanted ones? That is where the "skip to end of line" would come into play: after the last desired column in the format string.
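A minimal sketch of that idea, assuming (as in the question's loop) that columns 2, 3 and 5 are the ones wanted; the six-column file is hypothetical. %*s reads a field and discards it, and the trailing %*[^\n] discards whatever is left of the record after the last wanted column. jday is read directly with %f instead of converting strings afterwards.

```matlab
% Hypothetical six-column file; only columns 2, 3 and 5 are wanted
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'id,group_ship,cruiseid,lat,jday,lon\n');
fprintf(fid, '1,A,C01,10.5,100,20.1\n2,B,C02,11.0,101,20.2\n3,A,C03,11.5,102,20.3\n');
fclose(fid);

% skip col 1, read cols 2-3 as text, skip col 4, read col 5 as a number,
% then skip the remainder of the line
fmt = '%*s%s%s%*s%f%*[^\n]';

fid = fopen(fname, 'r');
C = textscan(fid, fmt, 'Delimiter', ',', 'HeaderLines', 1);
fclose(fid);
delete(fname);

group_ship = C{1};   % 3x1 cell of character vectors
cruiseid   = C{2};   % 3x1 cell of character vectors
jday       = C{3};   % 3x1 double
```

Here the trailing skip is doing real work (discarding column 6), which is the situation dpb describes, unlike the original script where every field was already consumed.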


Leon, 28 Nov 2013 (edited 28 Nov 2013)
"I wondered why not just read the whole file then select the columns wanted instead of line-by-line), why not count the number of commas in the header line and make the format field fit the actual data dynamically?"
I'd love to do what you suggested, but I don't know how. Do you have an example script to share? Thanks. I do need all the columns; I didn't spell out my whole script here.
Instead of
fmt = [repmat('%s', 1, 25) '%*[^\n]'];
try
fmt = [repmat('%s', 1, numberOfCommasYouCounted) '%*[^\n]'];
where the number of commas you counted is stored in a variable called numberOfCommasYouCounted (or whatever you want).
There is one more field in the line than the number of commas, though, so
fmt = [repmat('%s', 1, numberCommasCounted+1) '%*[^\n]'];
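A minimal sketch of the comma-counting approach on a hypothetical file. It assumes the header contains no quoted fields with embedded commas, and, following dpb's finding above, it leaves off the trailing %*[^\n], since the field count now matches the data.

```matlab
% Hypothetical sample file whose column count is not known in advance
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'a,b,c,d,e\n1,2,3,4,5\n6,7,8,9,10\n');
fclose(fid);

fid = fopen(fname, 'r');
hdr = fgetl(fid);                          % read (and thereby skip) the header
nFields = numel(strfind(hdr, ',')) + 1;    % one more field than commas
fmt = repmat('%s', 1, nFields);            % format sized to the actual data
C = textscan(fid, fmt, 'Delimiter', ',', 'CollectOutput', true);
fclose(fid);
delete(fname);

data = C{1};                               % 2x5 cell array of character vectors
```

Since fgetl has already consumed the header, textscan starts at the first data row and no 'HeaderLines' option is needed.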


More Answers (1)

Simon, 28 Nov 2013 (edited 28 Nov 2013)


Hi!
Even though the question has already been answered, here is my suggestion:
% read in file
fid = fopen('myfile.csv');
FC = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
FC = FC{1};
% prepend separator ';' (assumes ';'-separated data; use ',' for comma-separated files)
FC = strcat(';', FC);
% read all columns: CSV{i}{j}{1} is field j of line i
CSV = regexp(FC, ';([^;]*)', 'tokens');
It is quite fast, needs no loops and you can read in any file regardless of the number of columns.
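The same line-based idea, adapted to the comma-separated files in the question and sketched with regexp's 'split' option instead of the prepend-and-tokens trick (a swapped-in variant, not Simon's code). 'split' keeps empty fields between consecutive delimiters, so missing values stay in place; like the original, it does not handle quoted fields containing commas. The file is hypothetical.

```matlab
% Hypothetical file with an empty field in the last data row
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'x,y,z\n1,2,3\n4,,6\n');
fclose(fid);

fid = fopen(fname, 'r');
FC = textscan(fid, '%s', 'Delimiter', '\n');   % one cell per line
fclose(fid);
delete(fname);
FC = FC{1};

rows = regexp(FC, ',', 'split');   % each element is a 1xK cell of fields
data = vertcat(rows{:});           % N-by-K cell; requires equal K on every line
```

If lines can have different numbers of fields, keep rows as a cell of cells instead of calling vertcat.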

Category: Data Import and Export

Asked: 27 Nov 2013
Last comment: dpb, 14 Aug 2014
