Read specific rows from a large .csv

Question

0 votes

data.bsp.csv

Hi,

I'm trying to find a fast solution for handling a big .csv file (35 MB). The good part is that I only need a certain part of the file. Basically, I would like to read only the rows that start with a certain name.

Unfortunately, the file is structured like this:

Varname_1   timestring(t=0)   valueX   valueY
Varname_2   timestring(t=0)   valueX   valueY
...
Varname_n   timestring(t=0)   valueX   valueY
Varname_1   timestring(t=1)   valueX   valueY
Varname_2   timestring(t=1)   valueX   valueY
...
Varname_n   timestring(t=1)   valueX   valueY
...
... and so on

My idea would be to read the .csv file line by line, check whether Varname equals, e.g., Varname_1, and write the matching lines to a cell array (or four vectors) like this:

Varname_1   timestring(t=0)   valueX   valueY
Varname_1   timestring(t=1)   valueX   valueY
Varname_1   timestring(t=2)   valueX   valueY
...

Any ideas for some smart code? Thank you! (Additional notes: varname = string, time = string, value = number with a comma as the decimal separator.)

------------------------------------ EDIT: example data

The output would be, e.g.:

var2 10:10:10 16,1010138923
var2 10:10:20 89,1560542863
var2 10:10:30 69,557621819
var2 10:10:40 9,9246195517

3 Comments

Lorenzo on 6 Jul 2016

Sorry! I mean the decimal separator is not a point, it's a comma. Example: 12,34 instead of 12.34.

dpb on 6 Jul 2016

That, I think, you'll have to fix up outside MATLAB; I don't think it can handle it?? If the file is also comma-separated, that's certainly a problem.
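For what it's worth, when the comma is only the decimal separator (Lorenzo's own answer below reads the file with ';' as the delimiter), the conversion can also be done inside MATLAB by swapping the comma for a point before the numeric conversion. A minimal sketch on values taken from the example output in the question:

```matlab
% Text values that use a comma as the decimal separator (from the question).
% Assumes the comma is ONLY the decimal separator, not the field delimiter.
s = {'16,1010138923'; '89,1560542863'; '69,557621819'};

% strrep works element-wise on a cell array; str2double then parses each cell.
v = str2double(strrep(s, ',', '.'));
disp(v)
```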

Answer 1

Image Analyst on 6 Jul 2016

1 vote

Use readtable() and then search column 1 for the variable-name pattern you want. Attach a small example with wanted and unwanted names if you can't figure it out.
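A minimal sketch of this suggestion, assuming the file is ';'-delimited with no header row (Lorenzo's answer below uses ';') and a MATLAB release in which readtable accepts the 'DecimalSeparator' option for text files. The sample file and the name 'var2' are hypothetical. With that option the comma decimals come back as numbers directly.

```matlab
% Build a tiny ';'-delimited sample file with comma decimals (hypothetical data).
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5;2,5\n');
fprintf(fid, 'var2;10:10:10;16,1010138923;3,0\n');
fprintf(fid, 'var1;10:10:20;4,5;5,5\n');
fprintf(fid, 'var2;10:10:20;89,1560542863;6,0\n');
fclose(fid);

% Read everything; 'DecimalSeparator' makes "16,1010138923" parse as 16.1010138923.
T = readtable(fname, 'Delimiter', ';', 'ReadVariableNames', false, ...
    'DecimalSeparator', ',');
delete(fname);

% Keep only the rows whose first column equals the wanted name.
vName = 'var2';
T2 = T(strcmp(T{:,1}, vName), :);
disp(T2)
```

Logical indexing with strcmp keeps the filter to a single vectorized comparison over the first column.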

Answer 2

dpb on 6 Jul 2016

Edited: dpb on 6 Jul 2016

1 vote

Untested, but check whether the pattern-matching format string solves the problem directly...

vName='Varname_1';       % the variable name you're looking for
fmt=[vName '%s %f %f'];  % match vName, string, two numerics
fid=fopen('yourbigfile.csv','r');
data=textscan(fid,fmt,'delimiter',',');
fid=fclose(fid);

As I said, I'm not positive, but I think there's at least a reasonable chance the pattern matching will do what you're looking for. Worth a shot, methinks...

Well, doggonit, magic doesn't happen, joy didn't ensue... :(

But, the original idea isn't difficult...

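fid=fopen('yourbigfile.csv','r');   % reopen; the file was closed above
data={}; i=0;                       % i must be initialized (otherwise it is sqrt(-1))
while ~feof(fid)
  l=fgetl(fid);
  if strncmp(l,vName,numel(vName))  % line starts with vName
    i=i+1;
    data{i}=textscan(l,fmt);        % parse just the matching line
  end
end
fid=fclose(fid);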

This worked for a sample file, albeit one that was space-delimited and used '.' as the decimal indicator; I think the comma decimal will still be a problem.

I thought

while ~feof(fid)
  try
    data{i}=textscan(l,fmt);
  catch
  end
end
fid=fclose(fid);

would work around the issue, but it didn't; textscan simply gave up and quit reading anything once it failed. It doesn't throw an error, it just throws up its hands silently. :(
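Because textscan stops at the first line that does not match a literal in the format, one workaround is to make every line match: read all four fields as text, filter on the first column, and only then convert the numbers. A minimal sketch, assuming a ';'-delimited file with comma decimals (the sample file and 'var2' are hypothetical); the result is the four vectors the question asks for.

```matlab
% Hypothetical sample file: ';'-delimited, comma decimals, four columns.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5;2,5\nvar2;10:10:10;16,1010138923;3,0\n');
fprintf(fid, 'var1;10:10:20;4,5;5,5\nvar2;10:10:20;89,1560542863;6,0\n');
fclose(fid);

% Read all four columns as text, so no row can fail to match the format.
fid = fopen(fname, 'r');
C = textscan(fid, '%s %s %s %s', 'Delimiter', ';');
fclose(fid);
delete(fname);

% Keep only the wanted variable, then convert the comma-decimal columns.
vName = 'var2';
keep  = strcmp(C{1}, vName);
names = C{1}(keep);
times = C{2}(keep);
valX  = str2double(strrep(C{3}(keep), ',', '.'));
valY  = str2double(strrep(C{4}(keep), ',', '.'));
```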

3 Comments

dpb on 6 Jul 2016

I used textscan, not csvread, IA???

He's also got a comma as the decimal indicator and says he's got a .csv file, in which case it's indeterminable: which comma is a delimiter and which is a decimal point?

Image Analyst on 6 Jul 2016

Oh, sorry - I didn't notice.

Answer 3

Lorenzo on 6 Jul 2016

0 votes

Got it. readtable() works lightning fast. This is my approach:

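1) Overwrite ',' with '.' as the decimal separator (not strictly necessary, but I need the values as numbers for post-processing).

2) Read the file with readtable and keep only the matching rows.

comma2point_overwrite('bigdata.csv');                % step 1: ',' -> '.' in place (overwrites the file!)
T = readtable('bigdata.csv', 'Delimiter', ';');      % step 2: read the whole file
T2 = T(strcmp('Durchflussmessung-H2-163bar_real', T{:,1}), :)   % logical index, no find() needed
clearvars T;                                         % free the full table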

where comma2point_overwrite() is:

function comma2point_overwrite(filespec)
    % Replaces all occurrences of comma (",") with point (".") in a text file.
    % Note that the file is overwritten in place, which is the price for high speed.
    file  = memmapfile(filespec, 'Writable', true);   % maps the file bytes as uint8
    comma = uint8(',');
    point = uint8('.');
    file.Data(file.Data == comma) = point;            % logical indexing, no transpose needed
end
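If overwriting the original file is not acceptable, the same conversion can be done on an in-memory copy: fileread returns the file as a character vector, strrep swaps the commas, and textscan parses a character vector just as it parses a file. A minimal sketch with a hypothetical sample file. It holds the whole text in RAM, and like comma2point_overwrite it replaces every comma, including any inside text fields.

```matlab
% Hypothetical sample file: ';'-delimited with comma decimals.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'var1;10:10:10;1,5;2,5\nvar2;10:10:10;16,1010138923;3,0\n');
fclose(fid);

% Read the raw text and convert the decimals in memory; the file is untouched.
txt = fileread(fname);
txt = strrep(txt, ',', '.');
delete(fname);

% textscan can parse a character vector directly.
C = textscan(txt, '%s %s %f %f', 'Delimiter', ';');

% Keep only the wanted variable.
vName = 'var2';
keep  = strcmp(C{1}, vName);
valX  = C{3}(keep);
valY  = C{4}(keep);
```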

Thanks for your help!!

1 Comment

Steven Hunsinger on 14 Sep 2022

Not so lightning fast if your company network is involved: a 67.5 MB file, with a breakpoint after readtable, took 10 minutes. That might be OK if I need all that data loaded into RAM, but it seems excessive for reading just the first line or so. Is there a better way?
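One option when only the first lines are needed is to avoid loading the whole file. fgetl returns a single line, and textscan accepts a repeat count N (textscan(fid, formatSpec, N, ...)) that makes it stop after applying the format N times. A minimal sketch with a hypothetical header line and sample rows. Whether this helps over a particular company network is untested here, but it should read far less of the file than a full readtable call.

```matlab
% Hypothetical sample file: one header line followed by four data rows.
fname = [tempname '.csv'];
fid = fopen(fname, 'w');
fprintf(fid, 'name;time;x;y\n');
fprintf(fid, 'var1;10:10:10;1.5;2.5\nvar2;10:10:10;16.1;3.0\n');
fprintf(fid, 'var1;10:10:20;4.5;5.5\nvar2;10:10:20;89.2;6.0\n');
fclose(fid);

fid = fopen(fname, 'r');
header = fgetl(fid);              % read just the first line (newline stripped)
nRows  = 2;                       % number of data rows wanted
% The repeat count nRows makes textscan stop after two rows.
C = textscan(fid, '%s %s %f %f', nRows, 'Delimiter', ';');
fclose(fid);
delete(fname);
```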
