Why does fread of a 2 GB file need more than 8 GB of RAM?

Gabriel on 4 Jun 2013
textscan is too slow.
So I want to load a 2 GB file into RAM with fread (which is fast) and then scan it.
fread works well with small files, but when I try fread(fid, '*char') on a 2 GB file, RAM usage spikes past my 8 GB limit for some reason and I get an out-of-memory error.
Any ideas?
  2 comments
Jan on 4 Jun 2013
Please post the full code, because there might be unexpected problems.
Gabriel on 4 Jun 2013
Well, the code is simple:
fid = fopen(filename);
test = fread(fid, '*char');


Answers (3)

Jan on 4 Jun 2013
Reading a 2 GB file into a CHAR array requires 4 GB of RAM, because MATLAB uses 2 bytes per char. Depending on how you store the data, the contents of a temporary array may then be copied, so that 8 GB would be the expected memory consumption. I would actually expect this copy to be avoidable, though, so it would help if you showed us the code fragment.
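The 2-bytes-per-char point can be checked on a small file. The following is a minimal sketch, assuming the file holds plain ASCII text so that reading it as uint8 loses nothing; the temporary file and the variable names are illustrative and not part of the original code.

```matlab
% Write a small demo file (hypothetical temp file; 1000 ASCII bytes).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(repmat('a', 1, 1000)), 'uint8');
fclose(fid);

% Read the same bytes once as char and once as uint8.
fid = fopen(fname, 'r');
asChar = fread(fid, Inf, '*char');
frewind(fid);                          % go back to the start of the file
asByte = fread(fid, Inf, '*uint8');
fclose(fid);
delete(fname);

% Compare the memory each variable occupies in the workspace.
infoChar = whos('asChar');
infoByte = whos('asByte');
fprintf('char: %d bytes in memory, uint8: %d bytes in memory\n', ...
        infoChar.bytes, infoByte.bytes);
```

If the data only needs to be scanned as bytes, reading with '*uint8' should therefore need roughly half the memory of '*char' for the final array.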
  2 comments
Gabriel on 4 Jun 2013
Precisely. I expect it to require 4 GB, yet according to the system monitor the whole thing goes over 8 GB and into swap.
I also understand the point about arrays being copied when passed into functions, etc. But shouldn't FREAD be able to load a 2 GB file into a 4 GB char array without needing more than 8 GB of RAM?
Jan on 4 Jun 2013 (edited 4 Jun 2013)
I've seen equivalent behavior in another FREAD implementation (not in MATLAB): the required final size was not determined by FSEEK; instead, the file was read in chunks until the buffer was full, and then the buffer was re-allocated at double the size. After the obvious drawbacks had been pointed out in a discussion, the author decided to replace the doubling method with a smarter Fibonacci sequence. :-)
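Whether MATLAB's fread grows its buffer in this way is not confirmed anywhere in this thread. As a sketch under that caveat, one way to avoid depending on it is to look up the file size first and request exactly that many elements; the temporary file and the variable names here are hypothetical.

```matlab
% Demo file (hypothetical temp file; 1000 bytes).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(repmat('0123456789', 1, 100)), 'uint8');
fclose(fid);

% The file size is known before any data is read.
info = dir(fname);
nBytes = info.bytes;

% Request exactly nBytes elements instead of reading "until end of file".
fid = fopen(fname, 'r');
raw = fread(fid, [1 nBytes], '*uint8');
fclose(fid);
delete(fname);
```

This only states the size explicitly; whether it reduces peak memory on a given MATLAB version would need to be measured.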



Iain on 4 Jun 2013
As Jan implied, passing variables around often leads to memory duplication: 2 GB arrays get COPIED when passed into functions.
The "Out of memory" error normally comes up when MATLAB cannot find a single contiguous chunk of RAM big enough for a variable.
Use much smaller chunks of memory: read the file in and parse it in chunks of, say, 64 MB.
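A chunked read along these lines could look like the sketch below. It assumes byte-wise reading is acceptable; the temporary file, the small demo chunk size, and the byte counting that stands in for real parsing are all illustrative.

```matlab
% Demo file (hypothetical temp file); a real file would be far larger.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fwrite(fid, uint8(repmat('x', 1, 2500)), 'uint8');
fclose(fid);

chunkSize  = 1024;   % bytes per read in this demo; e.g. 64*2^20 for 64 MB
totalBytes = 0;
nChunks    = 0;

fid = fopen(fname, 'r');
while true
    chunk = fread(fid, [1 chunkSize], '*uint8');   % at most chunkSize bytes
    if isempty(chunk)
        break;                                      % nothing left to read
    end
    % Parse the chunk here; this sketch only counts bytes.
    totalBytes = totalBytes + numel(chunk);
    nChunks    = nChunks + 1;
end
fclose(fid);
delete(fname);
```

Note that a fixed-size chunk can end in the middle of a line, so a real parser has to carry the incomplete tail over to the next chunk.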
  2 comments
Walter Roberson on 4 Jun 2013
The arrays will only get copied if they are modified; otherwise the data pointer will point to the original storage.
Gabriel on 4 Jun 2013
I think I did not express myself well, I apologize. Parsing is not the issue. I fully expect scanning functions to be (relatively) memory-hungry.
fread, on the other hand: I don't quite get why it needs so much overhead to load a 2 GB+ file into the workspace.



Gabriel on 4 Jun 2013 (edited 4 Jun 2013)
In any case, I have found a workaround for textscanning large ASCII files (4 GB and beyond) that contain numbers.
The trick is to pad the numbers with Perl or sed before trying to read them into MATLAB. If you pad your numbers with leading zeros, every line has the same number of characters, so FREAD is easy to execute in chunks of whole lines.
Example (pseudocode):
while not eof
    tmp = fread X lines
    data = textscan(tmp)
    process(data)
end
With this trick, I went from 3 MB/sec to 130 MB/sec for processing a file.
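The loop above could be written out roughly as follows. This is a sketch, not the poster's actual code: it assumes one zero-padded 6-digit number per line with a single newline terminator, and the temporary file, the chunk size, and the running sum that stands in for process(data) are hypothetical.

```matlab
% Demo file: every line is a zero-padded 6-digit number plus a newline,
% so each line occupies exactly 7 bytes (hypothetical file and format).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%06d\n', 1:10);
fclose(fid);

bytesPerLine  = 7;   % 6 digits + newline
linesPerChunk = 4;   % lines per read in this demo; much larger in practice
total = 0;

fid = fopen(fname, 'r');
while true
    % Fixed-width lines mean every chunk ends on a line boundary.
    tmp = fread(fid, [1 bytesPerLine*linesPerChunk], '*char');
    if isempty(tmp)
        break;                       % nothing left to read
    end
    data  = textscan(tmp, '%f');     % parse the chunk held in memory
    total = total + sum(data{1});    % stand-in for process(data)
end
fclose(fid);
delete(fname);
```

Because the line width is fixed, the number of bytes to request per chunk is just bytesPerLine times the number of lines, which is what makes the padding step worthwhile.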
