Import large CSV as datastore with formatting errors

Russell Shomberg on 6 Oct 2021
Commented: Russell Shomberg on 13 Oct 2021
Hi,
I am trying to import a very large CSV file as a tall table. Right now I am just testing the demonstration code, and I have already run into a problem: at least one row in my CSV file has a formatting error, and that error prevents any of the code from working.
ds = tabularTextDatastore('data.csv');
tt = tall(ds);
ids = unique(tt{:,25});
ids = gather(ids);
This code produces the following error:
Evaluating tall expression using the Local MATLAB Session:
- Pass 1 of 1: 99% complete
Evaluation 90% complete
Error using matlab.io.datastore.TabularTextDatastore/readData (line 77)
Mismatch between file and format character vector.
Trouble reading 'Numeric' field from file (row number 17128, field number 50) ==> " .","",""," ."," .",".",132400,"-1.0","X",2008\n
Learn more about errors encountered during GATHER.
Error in matlab.io.datastore.TabularDatastore/read (line 120)
[t, info] = ds.readData();
Error in tall/gather (line 50)
[varargout{:}, readFailureSummary] = iGather(varargin{:});
Caused by:
Reading the variable name 'Var50' using format '%f' from file: 'data.csv' starting at offset 2583691484.
I know from previous experience with this data set that one of the lines has a formatting error, which then causes further errors. The file is too large to open and edit manually. Is there any way for me to fix this error in MATLAB, or to throw out the bad lines?
Thanks!

Accepted Answer

per isakson on 7 Oct 2021
Edited: per isakson on 7 Oct 2021
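Comment out the offending rows:
fid     = fopen( 'data.csv', 'r' );        % original file
fid_out = fopen( 'data_clean.csv', 'w' );  % new file; the name is only an example
while not( feof(fid) )
    chr = fgetl(fid);
    is_ok = analyse_row( chr );  % your own check; returns true for a well-formed row
    if is_ok
        fprintf( fid_out, '%s\n', chr );
    else
        fprintf( fid_out, '%% %s\n', chr );  % prefix the bad row with '%'
    end
end
fclose( fid );
fclose( fid_out );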
Test with small files first, for example five good rows and a couple of faulty rows.
Then read the new file with 'CommentStyle','%' so that the commented-out rows are skipped.
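A minimal end-to-end sketch of the steps above on a small test file. The temporary file names, the three-column layout, and the body of analyse_row are assumptions made for illustration only. Here a row counts as well-formed when it has the expected number of comma-separated fields, which ignores commas inside quoted fields and would need adapting to the real data.

```matlab
% Hypothetical test file: header, five good rows, two faulty rows.
in_name  = [tempname '.csv'];
out_name = [tempname '.csv'];
rows = {'a,b,c', '1,2,3', '4,5,6', '7,8', '9,10,11', ...
        '12,13,14,15', '16,17,18', '19,20,21'};
fid = fopen(in_name, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Assumed check: a good row has exactly n_fields comma-separated fields.
n_fields = 3;
analyse_row = @(chr) nnz(chr == ',') + 1 == n_fields;

% Copy the file, prefixing every faulty row with '%'.
fid     = fopen(in_name, 'r');
fid_out = fopen(out_name, 'w');
while ~feof(fid)
    chr = fgetl(fid);
    if ~ischar(chr), break; end   % fgetl returns -1 at end of file
    if analyse_row(chr)
        fprintf(fid_out, '%s\n', chr);
    else
        fprintf(fid_out, '%% %s\n', chr);
    end
end
fclose(fid);
fclose(fid_out);

% Read the cleaned copy; rows starting with '%' are treated as comments.
ds = tabularTextDatastore(out_name, 'CommentStyle', '%');
t  = readall(ds);

delete(in_name);
delete(out_name);
```

For the real file, the same datastore call can be followed by tall(ds) as in the question instead of readall.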
1 Comment
Russell Shomberg on 13 Oct 2021
This is great! I can also write a file with all the bad rows!
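One way this could look, as a sketch rather than the code actually used in this thread: good rows go to one file and rejected rows, with their row numbers, go to a second file for later inspection. The file names, the three-column test data, and the field-count check in analyse_row are all assumptions.

```matlab
% Hypothetical test file: header, two good rows, one faulty row.
in_name   = [tempname '.csv'];
good_name = [tempname '.csv'];
bad_name  = [tempname '.csv'];
fid = fopen(in_name, 'w');
fprintf(fid, '%s\n', 'a,b,c', '1,2,3', '4,5', '6,7,8');
fclose(fid);

% Assumed check: a good row has exactly n_fields comma-separated fields.
n_fields = 3;
analyse_row = @(chr) nnz(chr == ',') + 1 == n_fields;

fid      = fopen(in_name, 'r');
fid_good = fopen(good_name, 'w');
fid_bad  = fopen(bad_name, 'w');
row_no = 0;
while ~feof(fid)
    chr = fgetl(fid);
    if ~ischar(chr), break; end   % fgetl returns -1 at end of file
    row_no = row_no + 1;
    if analyse_row(chr)
        fprintf(fid_good, '%s\n', chr);
    else
        % Record the row number so the bad row can be located in the source.
        fprintf(fid_bad, '%d: %s\n', row_no, chr);
    end
end
fclose(fid);
fclose(fid_good);
fclose(fid_bad);

% Collect the rejected rows for a quick look, then clean up.
bad_rows = strtrim(fileread(bad_name));
delete(in_name);
delete(good_name);
delete(bad_name);
```

Because the bad rows are dropped from the good file rather than commented out, the good file can be read without the 'CommentStyle' option.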

Release

R2019b
