Read csv-File fast / convert csv to mat

Question

David 2014년 4월 9일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/125185-read-csv-file-fast-convert-csv-to-mat

편집: per isakson 2017년 8월 23일

Hello everybody,

From a datalogger i have a csv-file which looks like this:

"s";"m";"";"";"deg";"";"m/s≤";"m/s≤";"m/s≤";"bar";"bar";"";"";"";"";"%";"1/s";"";"km/h";"";"V";"bar";"bar";"∞C";"∞C";"";"∞";"∞";"m";"∞";"Km/h";"km/h";"m/s≤";"deg/s";"m/s≤";"∞";"sec";"m"
   "0,000";"0";"2,00";"10,48";"193,01";"11,08";"0,99";"0,21";"5,49";"0,85";"0,86";"50,00";"54,00";"53,00";"71,00";"5,5";"0,00";"1,0";"0,00";"0";"12,46";"0,00";"3,85";"-40,0";"-40,0";"0,00";"9,272901";"48,995935";"800,40";"1,00";"0,07";"0,1";"-0,01";"0,00";"0,00";"0,00";"0,00";"0,00"

What i've done yet is i created a little GUI tool to plot the data in different graphs, that works fine. To read the data, i've used the import tool from matlab itself, but the csv file is more than 200MB big, to import this takes realy long. And the csv will is becoming bigger in the next version because there will be more channels added.

So my question is the following: first, i want to load the csv-file faster and second it would be nice if the script imports automatically all columms, so it doesn't mater if the csv has 3 or 30 columms.

It would be nice if someone could help me with this problem :)

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

dpb 2014년 4월 9일

Well, you can try importdata and see how it goes for speed but probably best you'll do will be with textread or textscan

The other delimited-file routines such as csvread aren't able to deal with the header row. If you could do without it, keeping it separately it might help.

Of course, if speed and file size is the issue, not writing a text file to begin with would be the way around the problem. If at all possible, use a stream file instead.

If you must stay formatted, I'd seriously recommend ditching the embedded headerline for a separate info file and then I'm very partial to the (deprecated) textread for the job as it returns a "regular" array instead of cell array by default.

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

per isakson 2014년 4월 9일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/125185-read-csv-file-fast-convert-csv-to-mat#answer_132879

편집: per isakson 2014년 4월 10일

MATLAB Online에서 열기

Warning: These functions change your data files. Operate on a copies!

Here are two functions to test regarding speed. I believe the first function will do the job in less time. This approach with a separate step, c2p, works well when there is enough memory for the entire file to fit in the system cache.

Both functions return

    >> whos M
      Name      Size            Bytes  Class     Attributes
      M         3x38              912  double

Run

M = cssm_q( 'cssm.txt' );

where (in one m-file)

    function  num = cssm_q( filespec )
        c2p( filespec );
        fid = fopen( filespec, 'r' );
        str = fgetl( fid );
        ncl = length( strfind( str, ';' ) ) + 1; 
        frm = repmat( '%f', [1,ncl] );
        cac = textscan( fid, frm, 'Delimiter',';', 'CollectOutput',true );
        num = cac{1};
        fclose( fid );
    end
    function c2p( filespec )
        file    = memmapfile( filespec, 'writable', true );
        comma   = uint8(',');
        point   = uint8('.');
        file.Data(( file.Data==comma)' ) = point;
        quote   = uint8('"');
        space   = uint8(' ');
        file.Data(( file.Data==quote)' ) = space;
    end

and where cssm.txt contains your header row together with a few copies of your data row.

.

Second function is

    function  num = cssm( filespec )
        c2p( filespec );
        fid = fopen( filespec, 'r' );
        str = fgetl( fid );
        ncl = length( strfind( str, ';' ) ) + 1; 
        frm = repmat( '%q', [1,ncl] );
        cac = textscan( fid, frm, 'Delimiter',';' );
        num = nan( length(cac{1}), length(cac) );
        for jj = 1 : length(cac)
            num( :, jj ) = str2num( char( cac{jj} ) ); 
        end
    end
    function c2p( filespec )
        file    = memmapfile( filespec, 'writable', true );
        comma   = uint8(',');
        point   = uint8('.');
        file.Data(( file.Data==comma)' ) = point;
    end

This is a few columns of the data file after running cssm_q

    s ; m ;  ;  ; deg ;  ; m/s? ; m/s? ; m/s? ; bar ; 
000 ; 0 ; 2.00 ; 10.48 ; 193.01 ; 11.08 ; 0.99 ;
000 ; 0 ; 2.00 ; 10.48 ; 193.01 ; 11.08 ; 0.99 ;
000 ; 0 ; 2.00 ; 10.48 ; 193.01 ; 11.08 ; 0.99 ;

The "?" in the header shows that I have made a mistake regarding the text encoding.

.

"but i can also export the data without the double quotes, so the data is just seperated by commas"

And I assume "." as decimal separator. That's more Matlab-friendly. Try dlmread, which actually calls textscan, and cssm_c, which I believe to be somewhat faster. To be fair dlmread has better error handling.

    function  num = cssm_c( filespec )
        fid = fopen( filespec, 'r' );
        str = fgetl( fid );
        ncl = length( strfind( str, ',' ) ) + 1; 
        frm = repmat( '%f', [1,ncl] );
        cac = textscan( fid, frm, 'Delimiter',',', 'CollectOutput',true );
        num = cac{1};
        fclose( fid );
    end

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

per isakson 2014년 4월 10일

Your timing was that with cssm_q? Did you compare cssm_q and cssm?

Joseph Cheng 2014년 4월 10일

Oopse, it's late in the day and thought i should stop playing and get back to work. Didn't notice i just kind of left it vague. The test was for cssm_q. cssm took longer than my trip to and from the vending machine so... i killed the process.

댓글을 달려면 로그인하십시오.

Answer 2

Image Analyst 2014년 4월 9일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/125185-read-csv-file-fast-convert-csv-to-mat#answer_132881

편집: Image Analyst 2014년 4월 9일

Doesn't look like a csv file to me, which should be numbers separated by commas. You have double quotes and semicolons in between the commas in addition to the numbers. It is supposed to be only numbers. You might be able to use dlmread, if your delimeters are actually semicolons and not commas. You might also try readtable(), which is what you use for reading in tables that have mixed data types such as character strings along with numbers on the same row.

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

per isakson 2014년 4월 9일

편집: per isakson 2014년 4월 9일

In some parts of the world "," is used as decimal delimiter and ";" as list delimiter. Despite these delimiter characters we often call the file "csv". It certainly causes some confusion and extra work.

댓글을 달려면 로그인하십시오.

Answer 3

David 2014년 4월 9일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/125185-read-csv-file-fast-convert-csv-to-mat#answer_132882

First, a big thank you for your answers.

@isakson, I will try your code tomorrow.

@Image Analst, you're right, the posted lines are not realy "csv", but i can also export the data without the double quotes, so the data is just seperated by commas ;)

댓글 수: 2
없음 표시없음 숨기기

Image Analyst 2014년 4월 9일

What about the semicolons? It almost looks like some European way of doing it where everything is different, like decimals are commas, and commas are semicolons.

per isakson 2014년 4월 9일

http://www.iso.org/schema/isosts/v1.0/doc/n-cdf0.html

댓글을 달려면 로그인하십시오.

Answer 4

David 2014년 4월 10일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/125185-read-csv-file-fast-convert-csv-to-mat#answer_132968

MATLAB Online에서 열기

Well done! Thank you for the code isakson, it works very well. Now it it takes just 48 seconds, instead of round about 20 minutes. For a CSV with 637824x40.

Now the only thing is, the header has now 2 lines: The two lines look like that:

if true

Time;Dist;ACC;ACC;ACC;...;...;...;...;voltage1;pressure1;pressure2;pressure3;...;sensor1;...;sensor2;sensor3;sensor4;sensor5;...;speed;...;...;...;...;...;...;...;...;...;...;...;temp1;pressure;temp2 s;m;m/s≤;m/s≤;m/s≤;m;m/s≤;m/s≤;∞;V;bar;bar;bar;deg;%;;;;;;;km/h;∞;deg/s;;;∞;∞;;1/s;;km/h;Km/h;∞C;bar;∞C

end

But I don't understand how to skip two lines when reading the data with the following code:

if true
     function  num = cssm_c( filespec )
      fid = fopen( filespec, 'r' );
      str = fgetl( fid );
      ncl = length( strfind( str, ',' ) ) + 1; 
      frm = repmat( '%f', [1,ncl] );
      cac = textscan( fid, frm, 'Delimiter',',', 'CollectOutput',true );
      num = cac{1};
      fclose( fid );
  end
end

And then there is another thing, it would be nice if the Data would be stored in Vectors for each columm, which are named with the names of the columms from the first header line.

Could you help me also a little for this problem?

And again a huge thank you for your help!

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

per isakson 2014년 4월 10일

편집: per isakson 2014년 4월 10일

MATLAB Online에서 열기

One option is to replace

str = fgetl( fid );

by

    st1 = fgetl( fid );
    st2 = fgetl( fid );

If you do not need to keep the header line you may use the option, Headerlines, of textscan, i.e

cac = textscan( fid, ....., 'Headerlines', 2 );

Mistake: I forgot that the header line is needed to find the number of columns

    ncl = length( strfind( str, ';' ) ) + 1; 
    frm = repmat( '%f', [1,ncl] );

However, those two solutions will break when you get a file with a different number of header lines. A more flexible solution will include something like

    str = 'dummy'; n = -1;
    while isempty( regexp( str, '^[0-9\+\-\.\,;" ]{100,}$', 'match' ) )
        str = fgetl( fid );
        n = n + 1;
    end
    ....
    frewind( fid )
    cac = textscan( fid, frm, ....., 'Headerlines', n );

not tested

댓글을 달려면 로그인하십시오.

Answer 5

David 2014년 4월 11일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/125185-read-csv-file-fast-convert-csv-to-mat#answer_133100

MATLAB Online에서 열기

Well again a big thank you to you!

The file has everytime two headerlines, so everything's ok :)

The only thing that isn't working is the part that should assign each columm to a vector with the name from the first header line.

I'm working with this code:

if true
 function  num = cssm_c( filespec )
    fid = fopen( filespec, 'r' );
    st1 = fgetl( fid );
    st2 = fgetl( fid );
    col_names = strsplit(st1,';');
    ncl = length( strfind( st2, ';' ) ) + 1; 
    frm = repmat( '%f', [1,ncl] );
    cac = textscan( fid, frm, 'Delimiter',';', 'CollectOutput',true );
    num = cac{1}; 
    fclose( fid );
    [num_length num_width] = size(num);
    for i=1:num_width    
        assignin( 'base' , col_names(i) , num(:,i) );
    end  
 end
end

When debugging it I can see everything is right, in col_names are the names for the columms, but when it comes to the assignin code, i get the following error:

if true

Error using assignin Unknown command option.

Error in cssm_c (line 20) assignin( 'base' , col_names(i) , num(:,i) );

end

And i don't understand where the problem is with the assignin. Have you any idea?

Just for information, the vectors should be stored in a mat-file later.

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

David 2014년 4월 12일

편집: David 2014년 4월 12일

MATLAB Online에서 열기

Again, thank you, this works also.

But I've got now other problems, in the GUI I've got the following function for an button:

% --- Executes on button press in pushbutton_load_csv. function pushbutton_load_csv_Callback(hObject, eventdata, handles) 
% hObject handle to pushbutton_load_csv (see GCBO) 
% eventdata reserved - to be defined in a future version of MATLAB 
% handles structure with handles and user data (see GUIDATA)
csv_file = uigetfile('*.csv', 'Pick a *.csv Logging File'); %%%csv_file = [filename, pathname];
my_struct = cssm_c(csv_file); 
assignin('base', 'raw_data', my_struct);

This works, in the base workspace I get the struct raw_data with the data in it.

And then I want to add a few more information to the struct, the date, location and a a comment for the test, so I made this:

% --- Executes on button press in pushbutton_save. function pushbutton_save_Callback(hObject, eventdata, handles) 
% hObject handle to pushbutton_save (see GCBO) 
% eventdata reserved - to be defined in a future version of MATLAB 
% handles structure with handles and user data (see GUIDATA) global raw_data
raw_data.test_date = get(handles.edit_test_date, 'String'); raw_data.test_location = get(handles.edit_test_location, 'String'); raw_data.test_comment = get(handles.edit_test_comment, 'String');
%only for debugging disp(raw_data.test_date); 
disp(raw_data.test_location); 
disp(raw_data.test_comment);

But none of the three variables are added to the struct, have you an idea why?

per isakson 2014년 4월 13일

I never use GUIDE and I'm not able to quickly find an answer. Please post this comment as a new question with "GUIDE" as one of the tags.

댓글을 달려면 로그인하십시오.

Answer 6

Dahai Xue 2015년 1월 28일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/125185-read-csv-file-fast-convert-csv-to-mat#answer_166156

for system with Excel installed, this simply works

[~,~,myTable] = xlsread('myFile.csv');

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

Read csv-File fast / convert csv to mat

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

답변 (6개)

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

Read csv-File fast / convert csv to mat

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

답변 (6개)

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 2 없음 표시없음 숨기기

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기