Accessing data from a file without storing it

Question

Sebastian Ciuban on 16 Nov 2014


Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/162919-accessing-data-from-a-file-without-storing-it

Edited: dpb on 17 Nov 2014

Accepted Answer: dpb

read.zip


Hello guys,

I have this function that reads and stores data from a file:

    function [data] = read()
    [file,path] = uigetfile('.txt');
    fid = fopen(fullfile(path, file), 'rt');
    % READS UNTIL END OF HEADER
    tline = fgets(fid);
    while (~strcmp(tline(1:3),'END'))
        tline = fgets(fid);
    end
    k=0;
    while ~feof(fid)
        tline = fgets(fid);
        sat=tline;
        index=strfind(tline,'G');
        [~,n]=size(index) ;
        A = zeros(4,n);
        for i=1:n
            A(1,i)=str2num(sat(index(i)+1:index(i)+2));
            tline = fgets(fid);
            a= sscanf(tline,'%f');
            A(2:4,i) = [a; NaN(2-length(a),1)];
        end
        k = k+1;
        data(k).info=A;
    end
    end

and this is an example of the file that contains my data:

THIS HEADER CONTAINS
INFORMATION AND OTHER
THINGS THAT ARE NOT NEEDED
AT THE MOMENT
END OF HEADER
4G05G16G21G25
60000 30000 10000
60001 30001 10002
60002 30002 10001
60003 30003 10003
3G02G03G15
50000 20000 10001
50001 20001 10002
50002 20002 10001
4G06G11G17G20
30000 10000 10001
30001 10005 10008
30002 10002 10001
30003 10003 10004

The function works fine, BUT I wonder if I can access the data from my file WITHOUT storing it. With a larger file, this function takes too long to run. The example I posted has 3 blocks of data, but my original file has 2880 blocks. It takes about 10 seconds for my function to read and store the data, and afterwards I need to use that data in different algorithms. Is there any way to access the information directly from the file and use it in formulas/algorithms?

Thank you very much.

PS: I have attached both of the files in case someone has some spare time to take a look.

Comments: 1

dpb on 16 Nov 2014

Edited: dpb on 16 Nov 2014


1) Is the number of header lines known a priori?

2) In the section header 4G05G16G21G25 the number of components is given as the first numeric value; reading that number would be faster than the strfind solution used.

3) With those two pieces (*), you could get rid of fgets and go directly to textscan or fscanf.

4) The big slowdown for larger files, however, is the dynamic reallocation at the end:

    A(2:4,i) = [a; NaN(2-length(a),1)]

Either use a cell array entry for each section and cell2mat when done, or allocate a large array initially and reallocate another big chunk only when it is full and you are not yet through the file. You can probably estimate the number of blocks reasonably closely from the file size and a known average block size.
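To make the chunked allocation in point 4 concrete, here is a minimal sketch rather than a drop-in replacement. The chunk size, the names blocks and count, and the stand-in loop that replaces the actual file reading are all assumptions made for illustration.

```matlab
% Sketch: grow a cell array one chunk at a time instead of one element at a time.
chunk  = 1000;              % assumed chunk size; estimate from file size if known
blocks = cell(chunk, 1);    % preallocate the first chunk
count  = 0;                 % number of slots actually filled

for k = 1:2880              % stand-in for "while ~feof(fid)"
    if count == numel(blocks)
        blocks = [blocks; cell(chunk, 1)];  % full: add another big chunk
    end
    count = count + 1;
    blocks{count} = k * ones(4, 3);         % stand-in for one parsed block
end

blocks = blocks(1:count);   % trim the unused slots at the end
```

With 2880 blocks and a chunk of 1000, the array is extended only twice instead of 2880 times.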

(*) If the header isn't a fixed size, the fgets solution is OK; it won't be the major bottleneck, as it runs only at the beginning and the header is (at least presumably) not terribly long. If the header were a sizable fraction of the whole file, that would be a different matter and other strategies might pay off. But, unless told otherwise, that's simply a diversion at the moment.
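As a rough illustration of point 2, the count and the IDs can be pulled from a section header with a single sscanf call instead of strfind plus str2num. This is only a sketch: the variable names vals, n and ids are made up here, and it assumes every header looks like the ones in the sample file.

```matlab
% Sketch: parse a section header such as '4G05G16G21G25' in one pass.
tline = '4G05G16G21G25';                     % header line from the sample file
vals  = sscanf(strrep(tline, 'G', ' '), '%d');  % 'G' -> space, then read all integers
n     = vals(1);                             % first value: number of entries in the block
ids   = vals(2:end);                         % remaining values: the IDs after each 'G'
```

Replacing the G separators with spaces keeps the format string trivial; %d reads the leading-zero fields such as 05 as plain decimal numbers.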


Answer 1

dpb on 16 Nov 2014


Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/162919-accessing-data-from-a-file-without-storing-it#answer_159149

Edited: dpb on 17 Nov 2014


Per comments above, my initial pass would look something like

    % fid is the handle returned by fopen, as in the question's function
    n=cell2mat(textscan(fid,'%dG%*s',1,'headerlines',5));  % first n, skip header
    idx=0;    % cell array indexing var
    while ~feof(fid)
      idx=idx+1;
      data(idx)={fscanf(fid,'%d',[3,n]).'};  % read all n rows of the block in one call
      n=cell2mat(textscan(fid,'%dG%*s',1)); % textscan handles line for us...
    end

At the end, you can combine the cells within the data array via cell2mat or, if it's more useful, keep them as they are, separated by group.
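A small sketch of that last step, using made-up values in the shape of the sample file. Note that it uses vertcat on the cell contents rather than cell2mat, purely because it is the shorter way to stack blocks that share a column count; the names allRows and blockLen are hypothetical.

```matlab
% Sketch: stack per-block matrices (each n-by-3) into one array.
data = {[60000 30000 10000; 60001 30001 10002], ...  % block 1: 2 rows
        [50000 20000 10001]};                        % block 2: 1 row

allRows  = vertcat(data{:});                % all rows stacked, 3 columns
blockLen = cellfun(@(b) size(b, 1), data);  % rows per block, to map rows back to blocks
```

Keeping blockLen around means the stacked array can still be split back into its groups later if an algorithm needs them separately.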

ADDENDUM

As for the question regarding "any way to access the information directly from the file and use it in formulas/algorithms?", that's what reading the file is. And no, unless the data are in memory, there's no way you can operate on them. Reading a value from disk individually for each operation would certainly be orders of magnitude slower than even the present line-by-line processing.

The solution is to read as much data as possible in each operation, leaning on the built-in features of the formatted I/O libraries as heavily as you can rather than parsing individual lines as at present. There are times when line-by-line parsing is the only way, but, other than possibly for finding the end of the header, this isn't one of those times; this is a very regular file format.
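For what it's worth, here is a minimal end-to-end sketch of that idea for the case where the header length is not known in advance. It is an illustration under assumptions, not the accepted code above: the temporary file, its shortened contents, and the names fname and blocks are invented for the demo, and the section headers are parsed with fgetl plus sscanf instead of textscan.

```matlab
% Demo only: write a shortened version of the sample file to a temp file.
fname = [tempname '.txt'];
fid = fopen(fname, 'wt');
fprintf(fid, '%s\n', 'SOME HEADER TEXT', 'END OF HEADER', ...
    '2G05G16', '60000 30000 10000', '60001 30001 10002', ...
    '1G02', '50000 20000 10001');
fclose(fid);

fid = fopen(fname, 'rt');

% Line-by-line parsing is used only to find the end of the header.
tline = fgetl(fid);
while ischar(tline) && ~strncmp(tline, 'END', 3)
    tline = fgetl(fid);
end

% Each block is then read with a single fscanf call.
blocks = {};                      % grown one by one here only for brevity
tline = fgetl(fid);
while ischar(tline)               % fgetl returns -1 at end of file
    n = sscanf(tline, '%d', 1);   % leading count in a section header
    if ~isempty(n)                % skips the empty remainder left after fscanf
        blocks{end+1} = fscanf(fid, '%f', [3, n]).';  % n rows, 3 columns
    end
    tline = fgetl(fid);
end

fclose(fid);
delete(fname);
```

The isempty check is there so the loop does not depend on exactly where fscanf leaves the file position after the last number of a block.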