Out of memory using textscan - failing to read the file in one part at a time

Jesper Kamp Jensen on 6 Nov 2015
Commented: Walter Roberson on 8 Nov 2015
Hi,
I hope someone can help, as I have been struggling with this problem for a while now.
I have huge data files (3-4 GB) that I want to read in, saving parts of them to new, smaller files.
When I run the code below, it eventually runs out of memory. But surely the matrices G1 and MAT are reassigned on each pass through the while loop? Or am I missing something obvious? Another problem is that the files I manage to save before the computer runs out of memory are not separated by N lines: each new file is shifted by only 4 lines, so only 4 new lines are added per saved file.
I really hope someone can help! Thanks for your time.
Best regards, Jesper
N=10*60*60*48;
while ~feof(fid)
G1 = textscan(fid,formatSpec,N,'HeaderLines','8','Delimiter',' '); % Read in one block at a time and store it temporarily
MAT=cell2mat(G1(:,[6 8 10 11 13 15 17 18 19 20 21 22 23 24 25])); % Collect the wanted columns into one matrix
k=k+1 % Counter so that each file is saved under a new name
k_str=num2str(k);
sti1 = ['C:\Data\Youngsund\Dat files\' name '-' k_str '_new.dat'];
dlmwrite(sti1,MAT);
end

Answers (2)

Walter Roberson on 6 Nov 2015
Yes, all of your variables except k will get overwritten each time through the while loop. You could be even more explicit about that by adding a "clear" statement.
If only 4 lines at a time are getting added, the implication is that your formatSpec fails to match the input, either somewhere in the 4th line or at the beginning of the 5th line.
It is not obvious to me why you would be running out of memory, but one thing I would suggest is that you replace the dlmwrite() with a few lines of code that write the output the way you want. For example, before the loop:
outfmt = repmat('%g,', 1,15);
outfmt(end:end+1) = '\n';
Then replace the dlmwrite with
outfid = fopen(sti1, 'wt');
fprintf(outfid, outfmt, MAT.'); %you need the .' that is there!
fclose(outfid);
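
As a rough illustration of why the transpose matters: fprintf consumes its data argument column by column, so passing MAT.' makes each row of MAT come out as one line of the file. The sketch below is self-contained and uses a small made-up MAT and a temporary file name; both are assumptions for the demo, not part of the original code.

```matlab
% Hypothetical 2-by-15 block standing in for the real MAT.
MAT = reshape(1:30, 15, 2).';          % row 1 = 1..15, row 2 = 16..30

% Build the output format: 15 comma-separated fields, then a newline.
outfmt = repmat('%g,', 1, 15);
outfmt(end:end+1) = '\n';              % overwrite the trailing comma with \n

% Write to a temporary file (assumed name, for illustration only).
outfile = [tempname '.dat'];
outfid = fopen(outfile, 'wt');
fprintf(outfid, outfmt, MAT.');        % transpose so rows of MAT become lines
fclose(outfid);

% Read the file back to check that the layout survived the round trip.
readback = dlmread(outfile);
delete(outfile);
```

If the transpose is left out, fprintf walks down the columns of MAT instead, and the values end up on the wrong lines.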

Jesper Kamp Jensen on 6 Nov 2015
Thanks for your answer. When I apply the changes you suggest (only for a few runs so far), I get good results, so I need to check whether it can run all the way through the file without running out of memory.
However, something must be wrong with formatSpec. I tried setting N=4: for k=1 I get the four lines 1, 2, 3 and 4, but for the following iterations I get:
  • k=2, lines 5, 6, 7 and 8
  • k=3, lines 8, 9, 10 and 11
  • k=4, lines 8, 9, 10 and 11
  • k=5, lines 8, 9, 10 and 11
  • k=6, lines 9, 10, 11 and 12
  • k=7, lines 10, 11, 12 and 13
  • k=8, lines 11, 12, 13 and 14
  • k=9, lines 11, 12, 13 and 14
  • k=10, lines 12, 13, 14 and 15
Is it only the formatSpec that is causing that?
formatSpec='%4s %f %f %f %f %f %c %f %c %f %f %c %f %c %f %c %f %f %f %f %f %f %f %f %f %*[^\n]';
  Comments: 1
Walter Roberson on 8 Nov 2015
I do not understand what the chart of k and line numbers is intended to indicate.
Could you post about 9 lines of input?
My guess is that you have a problem with blanks and using %s and %c.
Note: you can simplify your processing by using
formatSpec = '%*4s %*f %*f %*f %*f %f %*c %f %*c %f %f %*c %f %*c %f %*c %f %f %f %f %f %f %f %f %f %*[^\n]';
G1 = textscan( fid, formatSpec, N, 'HeaderLines', '8', 'Delimiter',' ', 'CollectOutput', 1);
MAT = G1{1};
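
For reference, here is a minimal sketch of the block-wise reading pattern on a tiny made-up file. It assumes that textscan resumes from the current file position on each call and that a 'HeaderLines' option would be applied again on every call, so the header is skipped once with fgetl before the loop instead. The file contents, the three-column format and N = 4 are all hypothetical stand-ins for the real data.

```matlab
% Create a small test file: 2 header lines followed by 10 rows of 3 numbers.
infile = [tempname '.txt'];
fid = fopen(infile, 'wt');
fprintf(fid, 'header line 1\nheader line 2\n');
fprintf(fid, '%d %d %d\n', reshape(1:30, 3, 10));
fclose(fid);

N = 4;                      % rows per block (the real N would be far larger)
formatSpec = '%f %f %f';    % simplified stand-in for the real formatSpec

fid = fopen(infile, 'rt');
for h = 1:2                 % skip the header once, before the loop
    fgetl(fid);
end

blocks = {};
while ~feof(fid)
    G1 = textscan(fid, formatSpec, N, 'CollectOutput', 1);
    MAT = G1{1};            % numeric matrix with up to N rows
    if isempty(MAT)         % nothing left to read (or the format stopped matching)
        break;
    end
    blocks{end+1} = MAT;    % the real code would write MAT to its own file here
end
fclose(fid);
delete(infile);
```

The isempty check guards against looping forever when a call returns no rows, which can happen if the format stops matching partway through the file.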
