Read specific rows from a large .csv

Lorenzo on 6 Jul 2016
Commented: Steven Hunsinger on 14 Sep 2022
Hi,
I'm looking for a fast way to handle a big .csv file (35 MB). The good part is that I only need a certain part of the file. Basically, I would like to read only the rows that start with a certain name.
Unfortunately the file is composed like this:
Varname_1 timestring(t=0) valueX valueY
Varname_2 timestring(t=0) valueX valueY
...
Varname_n timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_2 timestring(t=1) valueX valueY
...
Varname_n timestring(t=1) valueX valueY
...
... and so on
My idea would be to read the .csv file line by line, check whether Varname equals, e.g., Varname_1, and write the matching rows to a cell array (or 4 vectors) like this:
Varname_1 timestring(t=0) valueX valueY
Varname_1 timestring(t=1) valueX valueY
Varname_1 timestring(t=2) valueX valueY
...
Any idea for smart code? Thank you! (Additional notes: varname = string, time = string, value = number with a comma as the decimal separator.)
------------------------------------ EDIT: example data
The output would be, e.g.:
var2 10:10:10 16,1010138923
var2 10:10:20 89,1560542863
var2 10:10:30 69,557621819
var2 10:10:40 9,9246195517
  Comments: 3
Lorenzo on 6 Jul 2016
Sorry! I mean the decimal delimiter is not a point; it's a comma. Example: 12,34 instead of 12.34
dpb on 6 Jul 2016
That, I think, you'll have to fix up outside MATLAB; I don't think it knows how to handle it. If the file is comma-separated, that's a problem for certain.


Accepted Answer

Image Analyst on 6 Jul 2016
Use readtable() and then search column 1 for the name you want. Attach a small example file with wanted and unwanted names if you can't figure it out.
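
A minimal sketch of this suggestion, assuming a semicolon-delimited file with three columns (name; time; value) and a decimal comma in the value column. The sample rows, the temporary file, and the name `var2` are placeholders for illustration only; the real file layout may differ.

```matlab
% Write a tiny sample file (hypothetical data modelled on the question).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'var1;10:10:10;3,5', ...
    'var2;10:10:10;16,1010138923', ...
    'var1;10:10:20;4,25', ...
    'var2;10:10:20;89,1560542863');
fclose(fid);

% Read every column as text so the decimal commas survive the import.
T = readtable(fname, 'Delimiter', ';', 'ReadVariableNames', false, ...
    'Format', '%s%s%s');

% Keep only the rows whose first column equals the wanted name.
wanted = 'var2';
T2 = T(strcmp(T{:, 1}, wanted), :);

% Convert the value column from '16,10' style text to doubles.
values = str2double(strrep(T2{:, 3}, ',', '.'));

delete(fname);  % clean up the sample file
```

Note that readtable() still loads the whole file before the filter is applied, so this trades memory for simplicity.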

More Answers (2)

dpb on 6 Jul 2016
Edited: dpb on 6 Jul 2016
Untested, but check that the pattern matching format string doesn't solve the problem directly...
vName='Varname_1'; % the variable name you're looking for
fmt=[vName '%s %f %f']; % match vName, string, two numerics
fid=fopen('yourbigfile.csv','r');
data=textscan(fid,fmt,'delimiter',',');
fid=fclose(fid);
As said I'm not positive, but I think there's at least a reasonable chance the pattern-matching will do what you're looking for. Worth a shot methinks...
Well, doggonit, magic doesn't happen, joy didn't ensue... :(
But, the original idea isn't difficult...
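fid=fopen('yourbigfile.csv','r'); % reopen; fid was closed above
data={}; % one cell per matching line
while ~feof(fid)
  l=fgetl(fid);
  if ischar(l) && ~isempty(strfind(l,vName)) % line contains vName
    data{end+1}=textscan(l,fmt);
  end
end
fid=fclose(fid);
worked for a sample file, albeit I used space-delimited data and '.' as the decimal indicator; I think the decimal comma will still be a problem.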
I thought
while ~feof(fid)
  l=fgetl(fid); % read the next line
  try
    data{end+1}=textscan(l,fmt);
  catch
  end
end
fid=fclose(fid);
would work around the issue, but it didn't; textscan simply gave up and quit reading anything once it failed; it doesn't throw an error, it just throws up its hands silently. :(
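
One way to sidestep the decimal-comma problem is to leave textscan out of the numeric conversion: read each line as text, keep the lines that start with the wanted name, and convert the value explicitly. A sketch under assumptions, not a tested solution for the original file: it assumes a semicolon as the field delimiter and three fields per line, and the sample rows and the name `var2` are made up for illustration.

```matlab
% Write a tiny sample file (hypothetical data modelled on the question).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'var1;10:10:10;3,5', ...
    'var2;10:10:10;16,1010138923', ...
    'var1;10:10:20;4,25', ...
    'var2;10:10:20;89,1560542863');
fclose(fid);

vName  = 'var2';
prefix = [vName ';'];        % include the delimiter so 'var2' does not match 'var20'
times  = {};                 % time strings of the matching rows
vals   = [];                 % numeric values of the matching rows

fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)          % fgetl returns -1 (not char) at end of file
    if strncmp(tline, prefix, numel(prefix))
        parts = strsplit(tline, ';');
        times{end+1, 1} = parts{2};
        % Swap the decimal comma for a point before converting.
        vals(end+1, 1) = str2double(strrep(parts{3}, ',', '.'));
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);               % clean up the sample file
```

Growing `times` and `vals` inside the loop is simple but not the fastest option; for very large files, preallocating or collecting matching lines first may be worth trying.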
  Comments: 3
dpb on 6 Jul 2016
I used textscan not csvread, IA???
He's also got comma as the decimal indicator and says he's got a .csv file in which case it's indeterminable--which comma is a delimiter and which is a decimal point?
Image Analyst on 6 Jul 2016
Oh, sorry - I didn't notice.



Lorenzo on 6 Jul 2016
Got it. readtable() works lightning fast. This is my approach:
1) Overwrite , with . as the decimal delimiter (not strictly necessary, but I need the values as numbers for postprocessing)
2) readtable
comma2point_overwrite('bigdata.csv')
T = readtable('bigdata.csv', 'Delimiter', ';');
T2 = T(strcmp('Durchflussmessung-H2-163bar_real', T{:,1}),:)
clearvars T;
where comma2point_overwrite() is:
function comma2point_overwrite( filespec )
% replaces all occurrences of comma (",") with point (".") in a text file.
% Note that the file is overwritten, which is the price for high speed.
file = memmapfile( filespec, 'writable', true );
comma = uint8(',');
point = uint8('.');
file.Data( transpose( file.Data==comma) ) = point;
end
Thanks for your help!!
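
If overwriting the original file is not acceptable, a similar effect can probably be had in memory. The sketch below is an assumption-laden variant, not the approach used above: it reads the file with fileread(), swaps commas for points in the text, and parses the result with textscan(). It assumes a semicolon delimiter and three columns; the sample rows and the name `var2` are hypothetical.

```matlab
% Write a tiny sample file (hypothetical data modelled on the question).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'var1;10:10:10;3,5', ...
    'var2;10:10:10;16,1010138923', ...
    'var1;10:10:20;4,25', ...
    'var2;10:10:20;89,1560542863');
fclose(fid);

% Load the text, fix the decimal separator in memory, then parse it.
txt = fileread(fname);
txt = strrep(txt, ',', '.');                 % file on disk stays untouched
C = textscan(txt, '%s %s %f', 'Delimiter', ';');

% Select the rows for one variable name.
keep   = strcmp(C{1}, 'var2');
times  = C{2}(keep);
values = C{3}(keep);

delete(fname);                               % clean up the sample file
```

This holds the whole file in memory as text, so it gives up the speed and memory advantage of memmapfile in exchange for leaving the source file unchanged.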
  Comments: 1
Steven Hunsinger on 14 Sep 2022
Not so lightning fast if you get your company network involved. 67.5MB with a breakpoint after readtable. 10 minutes. This might be OK if I need all that data loaded into RAM, but seems excessive for reading the first line or so. Is there a better way?
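
For the case where only the first line or so is needed, one option is to stop reading early instead of importing the whole table. A small sketch, with a made-up sample file and a hypothetical `nWanted`; whether it actually helps on a slow network share is untested and depends on where the time is spent.

```matlab
% Write a tiny sample file (hypothetical data modelled on the question).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', ...
    'var1;10:10:10;3,5', ...
    'var2;10:10:10;16,1010138923', ...
    'var1;10:10:20;4,25');
fclose(fid);

nWanted    = 2;                    % number of leading lines to read
firstLines = cell(nWanted, 1);
n = 0;

fid = fopen(fname, 'r');
while n < nWanted
    tline = fgetl(fid);
    if ~ischar(tline)              % end of file reached early
        break;
    end
    n = n + 1;
    firstLines{n} = tline;
end
fclose(fid);
firstLines = firstLines(1:n);      % drop unused slots

delete(fname);                     % clean up the sample file
```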
