Read large dat file without intermediate text lines

Prodip Das on 17 May 2019
Commented: Prodip Das on 17 May 2019
I have a rather large .dat file (~10 GB) that I import into MATLAB. It contains around 15 columns of numeric data, with blocks of 5 text lines inserted at irregular intervals.
Something like this:
TextLine1
TextLine2
TextLine3
TextLine4
TextLine5
-63.314206 26.987558 -7.394694 0.77476 0.388499 0.026758 0.05441 0.393202 0 1.7368 -10.191015 -30.842446 32.528906
-63.269659 19.946536 -3.917253 2.545467 1.074432 0.049985 0.030154 1.076017 1 -1.12505 12.880929 35.52709 37.806854
AnotherTextLine1
AnotherTextLine2
AnotherTextLine3
AnotherTextLine4
AnotherTextLine5
-61.372225 10.331962 -5.238261 0.403139 1.152237 -0.02932 -0.114794 1.158313 7 -3.856155 10.778429 47.319591 48.684578
-61.169832 18.776033 0.266432 0.746247 0.691331 0.001851 -0.015114 0.691499 8 1.08805 29.457441 77.521452 82.936724
-61.084236 13.463656 -0.325214 0.725108 0.980495 -0.116528 0.051806 0.988753 9 -5.922746 7.106945 0.756297 9.282218
-62.760365 24.293523 -0.478205 0.532399 0.881855 0.031047 -0.076085 0.885675 2 1.734042 -12.800895 66.549994 67.79212
I use textscan to read the file, skipping the text lines and loading only the numeric columns (taking a cue from https://stackoverflow.com/questions/21160220/how-to-read-only-numerical-data-into-matlab-and-ignore-any-text).
But as you can guess, for a file this size it takes forever to run. Is there a better way of accomplishing this?
Thanks!

Answers (1)

KSSV on 17 May 2019
Edited: KSSV on 17 May 2019
Since the number of text lines in each block is fixed (5), we can take advantage of this: skip the text lines and read the numeric data block by block using textscan. See the code below:
fmt = repmat('%f', 1, 13);            % 13 numeric columns per data row
nHeaderLines = 5;                     % every text block is 5 lines long
fid = fopen('data.txt', 'r');
if fid == -1
    error('Could not open data.txt');
end
result = {};                          % one numeric block per cell
while true
    % Skip one 5-line text block, then read rows until the next text line
    S = textscan(fid, fmt, 'HeaderLines', nHeaderLines, 'CollectOutput', true);
    if isempty(S{1})                  % nothing read: assume end of file
        break
    end
    result{end+1, 1} = S{1};          % N-by-13 matrix for this block
end
fclose(fid);
result = vertcat(result{:});          % stack all blocks into one matrix
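A possible variant of the same block-by-block idea swaps textscan for fgetl (to discard the text lines) and fscanf (to read numbers until the next text line stops it). This is only a sketch, not a tested replacement: the function name `readBlocksFscanf` and its arguments are made up for illustration, and it assumes every text block has exactly `nHeaderLines` lines and that no text line begins with something fscanf could parse as a number (a digit, a sign, a decimal point, or text such as `Inf` or `NaN`).

```matlab
function data = readBlocksFscanf(filename, nCols, nHeaderLines)
% Hypothetical helper: read numeric blocks separated by fixed-size text blocks.
% Assumes each text block has exactly nHeaderLines lines and never starts
% with characters that fscanf could interpret as a number.
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
cleanup = onCleanup(@() fclose(fid));   % close the file even if an error occurs

blocks = {};
while ~feof(fid)
    for k = 1:nHeaderLines
        fgetl(fid);                     % discard one text line
    end
    % fscanf fills column by column and stops at the first non-numeric text
    vals = fscanf(fid, '%f', [nCols, Inf]);
    if isempty(vals)
        break                           % no numbers followed the text block
    end
    blocks{end+1, 1} = vals.';          % transpose so each file row is a matrix row
end
data = vertcat(blocks{:});
end
```

For the sample above this would be called as `data = readBlocksFscanf('data.txt', 13, 5);`. Whether it is actually faster than the textscan loop depends on the file and the MATLAB release, so it is worth timing on a small extract first.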
Comments: 1
Prodip Das on 17 May 2019
Thanks for the reply.
From what I understand, even this would scan through the entire dataset, right?
I am looking for a faster way to achieve this. For example, a 10 GB dataset with just a header on top (which I remove using fgetl) takes around 10-20 minutes to load using fscanf. But with multiple text blocks in the middle, and using textscan, it takes forever: a smaller dataset of 0.2 GB takes hours.
So given that there are around 150 million (or more!) lines in my 10 GB dataset, is there a faster way of approaching this?
Thanks again!
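One approach that might be worth timing is to read the file in large chunks with fread, drop the text lines with vectorized indexing, and convert everything that remains with a single sscanf call per chunk. The sketch below is an illustration under stated assumptions, not a benchmarked solution: the function name `readNumericLinesChunked` and the `chunkBytes` parameter are hypothetical, the file is assumed to be plain ASCII, every data row is assumed to start in the first column with a digit, sign, or decimal point, and no text line is assumed to start with one of those characters. It does not rely on the text blocks having a fixed length.

```matlab
function data = readNumericLinesChunked(filename, nCols, chunkBytes)
% Hypothetical helper: keep only lines whose first character looks numeric.
% Assumes ASCII text and that data rows start in column one.
fid = fopen(filename, 'r');
if fid == -1
    error('Could not open %s', filename);
end
cleanup = onCleanup(@() fclose(fid));    % close the file even if an error occurs

LF = char(10);                           % line feed
blocks = {};
leftover = '';                           % incomplete last line of the previous chunk
while true
    raw = fread(fid, chunkBytes, 'uint8=>char').';
    if isempty(raw)
        if isempty(leftover), break; end
        txt = [leftover, LF];            % terminate the final line of the file
    else
        txt = [leftover, raw];
    end
    nl = find(txt == LF);                % positions of line terminators
    if isempty(nl)
        leftover = txt;                  % no complete line yet, read more
        continue
    end
    leftover = txt(nl(end)+1:end);       % carry the partial line forward
    starts = [1, nl(1:end-1) + 1];       % first character of each complete line
    first = txt(starts);
    isNum = (first >= '0' & first <= '9') | first == '-' | first == '+' | first == '.';
    % Build a mask covering every numeric line, including its line feed
    d = zeros(1, nl(end) + 1);
    d(starts(isNum)) = 1;
    d(nl(isNum) + 1) = d(nl(isNum) + 1) - 1;
    keep = cumsum(d(1:nl(end))) > 0;
    vals = sscanf(txt(keep), '%f');
    if mod(numel(vals), nCols) ~= 0
        error('A data row does not contain %d values.', nCols);
    end
    if ~isempty(vals)
        blocks{end+1, 1} = reshape(vals, nCols, []).';
    end
end
data = vertcat(blocks{:});
end
```

A call might look like `data = readNumericLinesChunked('data.txt', 13, 64e6);`, where the 64 MB chunk size is an arbitrary starting point. Note that the full result for roughly 150 million rows of 13 doubles would need on the order of 15 GB of memory, so processing or saving each chunk separately may be necessary.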
