Make this script faster

samy rima on 9 Dec 2015
Edited: Colin Edgar on 17 Dec 2015
Dear all,
I have a text file (an eye-tracker log) with 12 columns and 2,398,068 rows.
The first line is a header with the variable names. Only column 9 contains strings; the rest are doubles.
I import it with the code below. Is there a way to make this script run faster?
Thanks for any insight.
filename = 'file.txt' ;
% - Get structure from first line.
fid = fopen( filename, 'r' ) ;
line = fgetl( fid ) ;
fclose( fid ) ;
% - Build formatSpec for TEXTSCAN.
fmt = {'%f%f%f%f%f%f%f%f%s%f%f%f'} ;
% - Read full file.
fid = fopen( filename, 'r' ) ;
data = textscan( fid, fmt, Inf, 'Delimiter', ';' ) ;
fclose( fid ) ;
data = ([data{:}]) ;
data(2:end,9)=num2cell((strcmp(data(2:end,9),'Event 1 > Stimulation')));
data=cellfun(@str2double,data(2:end,[1:8 10:end]),'un',0);
Comments: 5
jgg on 17 Dec 2015
I had a similar issue. I ended up doing the initial data cleaning in Stata or R since it was easier to reformat the columns.
Colin Edgar on 17 Dec 2015
I can't make fscanf ignore the first "" string, for example:
frmt = '%*s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
A = fscanf(fid, frmt, [12, inf]);
A = ''
Unless I do this:
A = fscanf(fid, '%s', [12, inf]);
A = 12 x 16833 (Char)
What I want is:
A = 12 x 16833 double


Answers (1)

Colin Edgar on 17 Dec 2015
Edited: Colin Edgar on 17 Dec 2015
Here is my solution; it takes only about 1 s per file (~2 MB, 12 columns x 18,000 rows). It is written for the example data I posted above, with the initial "timestamp" removed. I believe it also addresses the OP's issue, since the data were very similar.
formatSpec = '%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f\n';
fid = fopen(flnm,'r');
for k = 1:4
    fgetl(fid); % read past the 4 header lines; I know it's a hack but...
end
mat = fscanf(fid, formatSpec, [12,inf]); % fscanf fills column by column, giving 12 x nRows
mat = mat'; % transpose to the correct layout (nRows x 12)
fclose(fid);
Compare this with my old version, which took about 15 s (similar to the OP's approach):
formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s'; % every column is read as text
fid = fopen(flnm,'r');
C = textscan(fid,formatSpec,'HeaderLines',4,'Delimiter',',');
fclose(fid);
% converting each text column with str2double is the slow step
mat = cell2mat(cellfun(@str2double,C,'UniformOutput',false));
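
For the OP's file (semicolon-delimited, one header line, text only in column 9), the same idea of parsing numbers directly instead of converting strings afterwards might be sketched with textscan as follows. This is only a sketch under assumptions: the sample file contents and the variable names isStim and mat are made up for illustration, and it has not been timed on the real 2.4-million-row log.

```matlab
% Hypothetical sample file standing in for the OP's log (header names and values are invented).
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'c1;c2;c3;c4;c5;c6;c7;c8;event;c10;c11;c12\n');
fprintf(fid, '1;2;3;4;5;6;7;8;Event 1 > Stimulation;10;11;12\n');
fprintf(fid, '1;2;3;4;5;6;7;8;Other;10;11;12\n');
fclose(fid);

% The format is a char vector, not a cell. Numeric columns are parsed
% as doubles directly; only column 9 is read as text.
fmt = '%f%f%f%f%f%f%f%f%s%f%f%f';
fid = fopen(filename, 'r');
C = textscan(fid, fmt, 'Delimiter', ';', 'HeaderLines', 1); % skip the header line
fclose(fid);

% One vectorized strcmp over the text column replaces the per-cell conversions.
isStim = strcmp(C{9}, 'Event 1 > Stimulation');
mat = [C{1:8}, double(isStim), C{10:12}]; % nRows x 12 double matrix

delete(filename); % remove the temporary sample file
```

Column 9 of mat then holds 1 where the event string matched and 0 elsewhere, so no cellfun or str2double pass over the data is needed.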
