Populating a tall array in a for loop

Still Learning Matlab, June 6, 2018
Edited: KAE, April 22, 2021
*I acknowledge that my approach is flawed, but I am curious whether this solution exists.
Can I populate a tall array in a for loop?
I am running a large number of calculations and want to store the results in a vector. Let's say the results form a 1-by-10^12 vector of doubles. Clearly this exceeds the memory of my laptop.
The way I have it coded now, I keep track of how many calculations have been performed. Once a particular number is exceeded, I save the workspace variable to a MAT-file and then clear it from memory.
chunkSize = 784000000;                 % elements held in memory per chunk
A = zeros(1, chunkSize);               % preallocate the first chunk
for count = 1:1e12
    if count > 2*chunkSize
        if exist('B', 'var')
            save('TestsaveforProject2.mat', 'B', '-v7.3')
            clear B
            C = zeros(1, chunkSize);
        end
        C(count - 2*chunkSize) = calculation;   % calculation = per-element result
    elseif count > chunkSize
        if exist('A', 'var')
            save('TestsaveforProject.mat', 'A', '-v7.3')
            clear A
            B = zeros(1, chunkSize);
        end
        B(count - chunkSize) = calculation;
    else
        A(count) = calculation;
    end
    % each further chunk needs another branch like the two above
end
Can I replace the series of 'if' statements with a few lines that populate a tall table? For 10^12 cases I would need more than 100 'if' statements like these... plus the save function is pretty clunky. I'm open to any other suggestions on data storage.
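For scale, a quick back-of-the-envelope check, assuming 8 bytes per double and the chunk size of 784000000 used in the code: the full result is about 8 TB, each chunk is about 6.3 GB, and covering 10^12 results takes 1276 chunks, so the if-ladder would indeed need well over 100 branches. The sketch also shows that the ladder collapses to two lines of index arithmetic; all variable names here are illustrative.

```matlab
% Scale of the problem (8 bytes per double)
nTotal    = 1e12;
chunkSize = 784000000;
totalGB = nTotal * 8 / 1e9;            % 8000 GB, roughly 8 TB
chunkGB = chunkSize * 8 / 1e9;         % about 6.27 GB per chunk
nChunks = ceil(nTotal / chunkSize);    % 1276 chunks, one branch each

% The if-ladder reduces to index arithmetic on the global counter
count    = 2*chunkSize + 5;                     % example global index
fileIdx  = ceil(count / chunkSize);             % which chunk/file it falls in
localIdx = count - (fileIdx - 1) * chunkSize;   % position inside that chunk
```

With `fileIdx` and `localIdx` computed this way, one code path (for example a file name built with `sprintf` from `fileIdx`) can replace every branch.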
Thanks
1 Comment
dpb, June 6, 2018
Do you need the large vector of values to do the calculation, or is it the result of the calculations (which is what I gather)?
If it is the latter, look at the example of using matfile under "Growing an Array" in the documentation. Also read up on MATLAB's tools for large data (see "Large Files and Big Data") to get an overview of the facilities that exist, and see which fits your problem most naturally.
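A minimal sketch of the matfile approach applied to the question's loop: each chunk is computed in memory and written into a single variable in one MAT-file, growing it on disk, so the if-ladder and the per-chunk save/clear disappear. The file name, chunk size, total count and `computeChunk` are hypothetical stand-ins (the total is kept small here); the real calculation would replace `computeChunk`.

```matlab
% Hypothetical file location; matfile creates a v7.3 MAT-file if none exists
fname = fullfile(tempdir, 'results.mat');
if isfile(fname)
    delete(fname);                 % start from an empty file
end
m = matfile(fname, 'Writable', true);

nTotal    = 10010;                 % total number of results (small demo value)
chunkSize = 1000;                  % results computed and written per pass
computeChunk = @(idx) sqrt(idx);   % hypothetical stand-in for the real calculation

for first = 1:chunkSize:nTotal
    last = min(first + chunkSize - 1, nTotal);
    idx  = first:last;
    if first == 1
        m.results = computeChunk(idx);          % first chunk creates the variable
    else
        % Indexed assignment past the current end grows the variable on disk;
        % only the current chunk is held in memory
        m.results(1, idx) = computeChunk(idx);
    end
end
```

Parts of the result can later be read back without loading the whole vector, for example `m.results(1, 1:100)`.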


Answers (1)

Edric Ellis, June 7, 2018
What I think you should do is something like the following:
% Choose a directory to store the files
outDir = '/tmp/tall_eg';
% Make sure the directory exists; save errors out if it does not
if ~exist(outDir, 'dir')
    mkdir(outDir);
end
% Counter indicating which file we'll save to next
fileIdx = 1;
% How many rows of data to save in each file
rowsPerFile = 100;
% How many rows have been written so far
rowsWritten = 0;
% How many rows to write in total
totalRows = 10010;
while rowsWritten < totalRows
    % Choose how many rows to write to this file
    rowsThisTime = min(rowsPerFile, totalRows - rowsWritten);
    % Build the rows
    data = rand(rowsThisTime, 1);
    % Choose a file name - ensure these progress in order
    fname = fullfile(outDir, sprintf('data_%05d.mat', fileIdx));
    % Save the data and increment counters
    save(fname, 'data');
    fileIdx = 1 + fileIdx;
    rowsWritten = rowsThisTime + rowsWritten;
end
% Read the data back in as a tall array. First create a datastore ...
ds = fileDatastore(fullfile(outDir, '*.mat'), ...
    'ReadFcn', @(fname) getfield(load(fname), 'data'), ...
    'UniformRead', true);
% ... and then a tall array
tdata = tall(ds)
Note that the 'ReadFcn' argument to fileDatastore is a little tricky: it loads the file and then simply extracts and returns the 'data' field. 'UniformRead' is required to ensure that we get a tall numeric vector rather than a tall cell array.
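Operations on a tall array such as `tdata` are deferred: MATLAB only records them, and `gather` evaluates them, combining the work into as few passes over the data as it can. A small sketch of that pattern; to keep it self-contained it builds the tall array from an in-memory vector instead of the fileDatastore files, which is an assumption for illustration only.

```matlab
% Self-contained stand-in for tall(ds): a tall array from in-memory data
x = (1:1000)';
tdata = tall(x);

% These lines only build deferred (unevaluated) tall expressions
tMean = mean(tdata);
tMax  = max(tdata);

% gather evaluates both expressions and returns ordinary in-memory values
[mu, mx] = gather(tMean, tMax);
```

Gathering several results in one call, as here, lets MATLAB share passes over the underlying files instead of reading them once per result.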
2 Comments
Margarita Martínez Coves, April 3, 2020
Thank you so so so much! I have a bunch of .mat files that are 1.2 GB each in memory, so they don't all fit in memory at the same time. I only needed the last two lines of code, but they were very helpful and let me create a datastore from my files. Then I ran:
write(foldername, tdata)
This saved the data as a new datastore (a TallDatastore). Of course, this duplicates all the data, but it's worth it since MATLAB adjusts the block size so that each call to read returns a small block of data.
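That step can be sketched end to end: `write` evaluates a tall array and stores it in the given folder, and `datastore` on that folder returns a TallDatastore that `tall` can wrap again. The folder name and the in-memory input below are illustrative assumptions, not part of the original workflow.

```matlab
% Hypothetical output folder; start from a fresh one so write has a clean target
outDir = fullfile(tempdir, 'tall_written');
if exist(outDir, 'dir')
    rmdir(outDir, 's');
end

% Stand-in for the tall array built from the original .mat files
tdata = tall((1:1000)');

% Evaluate tdata and store it on disk in a binary format meant for datastore
write(outDir, tdata);

% Reading the folder back gives a TallDatastore; wrap it as a tall array again
ds2 = datastore(outDir);
t2  = tall(ds2);
total = gather(sum(t2));
```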
Again, thank you so much for helping people in this community.
KAE, April 22, 2021
Edited: KAE, April 22, 2021
This should be added as an example in the Matlab documentation.


Categories
Large Files and Big Data

Release
R2018a
