How to read, reshape, and write large data?

2 views (last 30 days)
NeuronDB on 29 Jul 2021
Commented: Chunru on 29 Jul 2021
Hello!
I have an m-by-n data matrix data, which I want to reshape into a single column vector data(:) and write to an output file.
% Read 100 rows of data
data = [];
for idx_row = 1:100
    A = fscanf(fileID, formatSpec);
    data = cat(1, data, A);
end
% Scale and convert to int16
data = data*10^6;
data = int16(data);
% Write to file
fp = fopen([filepath 'data.dat'], 'wb');
fwrite(fp, data(:), 'int16');
fclose(fp);
The problem is that data is too large to fit in memory (e.g. 100 x 1e10). Also, each row of the data is saved in a separate file, so I must read the rows separately.
I can read a single row, which works fine, but when I try to add more rows, the computer runs out of memory rather quickly. :(
Preallocating a large array to fill the data into runs into the same out-of-memory problem:
data = nan(100,1e10)
Error using nan
Requested 100x10000000000 (7450.6GB) array exceeds maximum array size preference. Creation of arrays greater
than this limit may take a long time and cause MATLAB to become unresponsive.
How can I make it work? Thanks in advance!
2 Comments
Rik on 29 Jul 2021
If you don't have 8TB of RAM, you can't create such a large array (and even if you did, it could still be a problem, as the memory needs to be contiguous). Using int16 to preallocate your array will help, but only by a factor of 4.
You will have to do this chunk by chunk.
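For a single input file, the chunk-by-chunk idea can be sketched as follows. The file names and the assumption that the input is plain text read with %f are mine, not from the thread; the point is only that at most one chunk is in memory at a time:

```matlab
% Minimal chunk-by-chunk sketch (hypothetical file names):
% read a block, scale, convert to int16, append to the output file.
blocksize = 1e6;                           % samples per chunk
fin  = fopen('row1.txt', 'r');             % one input row, as text
fout = fopen('data.dat', 'w');             % binary output
while ~feof(fin)
    chunk = fscanf(fin, '%f', blocksize);  % read up to blocksize values
    fwrite(fout, int16(chunk*1e6), 'int16');
end
fclose(fin);
fclose(fout);
```

Peak memory is then bounded by blocksize, regardless of how long the file is.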
Chunru on 29 Jul 2021
8TB is way too big for today's systems. However, the array size is not strictly limited by the RAM size; it is limited by the virtual memory size that the OS manages (which may use the hard disk as part of the memory hierarchy). Of course, speed suffers when data are exchanged between RAM and disk very frequently.


Accepted Answer

Chunru on 29 Jul 2021
Edited: Chunru on 29 Jul 2021
You can read a small portion at a time and write it to the file. This way you will not use a lot of memory.
blocksize = 1e6;
nfiles = 100;
for i = 1:nfiles
    % fileID(i) = fopen(...)
end
fp = fopen([filepath 'data.dat'], 'wb');
data = zeros(nfiles, blocksize);
% You may need special treatment for the last (partial) block
for iblock = 1:nblocks
    for i = 1:nfiles
        % For large files, use fwrite and fread for speed;
        % fscanf and fprintf are slow and take much more disk space
        data(i, :) = fread(fileID(i), ...);  % read a block of data from each file
        %A = fscanf(fileID(i), formatSpec);
        %data = cat(1, data, A);
    end
    % Write the interleaved data
    fwrite(fp, int16(data(:)*1e6), 'int16');
end
fclose all
2 Comments
Rik on 29 Jul 2021
The problem is that you need the first element from every file, then the second element from every file, etc.
And about the coding style: I would suggest using fclose(fp); instead of closing all files. That habit will trip you up when you do have multiple files open.
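Reading one block per file into the rows of a matrix already produces the ordering Rik describes: MATLAB is column-major, so linearizing an nfiles-by-blocksize matrix with A(:) interleaves the files sample by sample. A tiny illustration:

```matlab
% Each row holds consecutive samples from one file; A(:) walks the
% matrix column by column, interleaving the files sample by sample.
A = [ 1  2  3;     % samples from file 1
     10 20 30];    % samples from file 2
A(:).'             % 1 10 2 20 3 30
```

So writing data(:) block by block yields the same output as the (infeasible) full in-memory reshape.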
Chunru on 29 Jul 2021
Instead of reading the first element from every file, we read a block of data from every file (obviously for speed). You don't need all the data from a single file before doing the partial reshaping. "fclose all" is a lazy way here, as I am tired of writing another for loop to close all the files.
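The explicit cleanup Rik suggests is only a few lines. Assuming the fileID array and fp from the answer above:

```matlab
% Close each input handle explicitly, then the output file,
% instead of relying on `fclose all`.
for i = 1:nfiles
    fclose(fileID(i));
end
fclose(fp);
```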


More Answers (0)

Category: Low-Level File I/O
Release: R2020b
