Internal problem while evaluating tall expression (requested 40.5 GB array)
Hi, I'm working with a large data set with approximately 500k rows and 6k columns. I'm using a datastore and a tall array to handle the loading. The file is comma-separated, and most of its values are coded as integers or strings. I have a dictionary for decoding these values. What I am trying to do is replace the codes with their actual meanings and save the decoded file locally.
Below is the structure of my program:
classdef myTable < handle
    % ...
    methods
        function this = myTable
        end
        % ...
    end
    methods
        function loadCsv(this)
            % ...
            ds = datastore(this.csvSource);
            ds.SelectedFormats = repmat({'%q'}, 1, length(ds.VariableNames));
            this.csvTable = tall(ds);
        end
        % ...
        function decoding(this)
            % ...
        end
        function export(this)
            % ...
            write([this.outputDir '/' this.csvTableName '_decoded_*.csv'], this.csvTable, 'WriteFcn', @myWriter);
        end
    end
end
%% helper
function myWriter(info, data)
    filename = info.SuggestedFilename;
    writetable(data, filename, 'FileType', 'text', 'Delimiter', ',')
end
The error occurred in this.export:
Error using digraph/distances
Internal problem while evaluating tall expression. The problem was:
Requested 73733x73733 (40.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive.
Question: I thought the write function would partition the data while exporting. Isn't that true? Why did MATLAB still try to create such a big array?
I am using a Windows machine with 16 GB of RAM and MATLAB R2020a (I tried R2019a first and just upgraded to R2020a).
Thank you!
Comments: 16
Peng Li
23 Mar 2020
Peng Li
23 Mar 2020
Peng Li
24 Mar 2020
Peng Li
24 Mar 2020
per isakson
24 Mar 2020
Edited: per isakson
24 Mar 2020
You are asking for too much. I have looked at your code and made a working example based on an example in the documentation. It seems to work. I fail to understand what is going wrong for you. Your code includes a lot of irrelevant stuff.
Proposal
- present an MWE (minimal working example) that reproduces this error
- upload one (or a few) rows of your data set.
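A minimal example along the lines proposed above might look like the sketch below. It is only an illustration: the file names, the 20-by-5 synthetic data, and the anonymous writer function are assumptions standing in for the real data set and for myWriter, and it is not confirmed to reproduce the error.

```matlab
% Hypothetical stand-in data: 20 rows x 5 columns of text-coded integers.
workDir = tempname;
mkdir(workDir);
srcFile = fullfile(workDir, 'sample.csv');
writetable(array2table(string(randi(9, 20, 5))), srcFile);

% Same loading steps as in the question: read every column as text.
ds = datastore(srcFile);
ds.SelectedFormats = repmat({'%q'}, 1, numel(ds.VariableNames));
tt = tall(ds);

% Same export step, with the writer expressed as an anonymous function.
writer = @(info, data) writetable(data, info.SuggestedFilename, ...
    'FileType', 'text', 'Delimiter', ',');
outPattern = fullfile(workDir, 'out', 'sample_decoded_*.csv');
write(outPattern, tt, 'WriteFcn', writer);

% List the files that were written.
outFiles = dir(outPattern);
disp({outFiles.name});
```

If this small case runs cleanly, adding the decoding step and more columns one at a time should help narrow down where the large array request comes from.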
Sean de Wolski
24 Mar 2020
Yes, please provide a few sample rows.
Peng Li
24 Mar 2020
Peng Li
25 Mar 2020
Sean de Wolski
25 Mar 2020
Your understanding is correct.
But we need to know why digraph is trying to create a 73733x73733 array. It could be that you have something shadowed, so it's not calling a builtin; it could be expected, and you need to partition differently. I don't know.
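One way to check the shadowing possibility is to ask MATLAB which files it resolves for the names in the error message. This is a small diagnostic sketch, not a confirmed diagnosis of the problem in this thread.

```matlab
% List every digraph and distances visible on the path. The first entry is
% the one MATLAB calls; other entries are flagged as shadowed in the output.
which -all digraph
which -all distances

% Programmatic variant: collect the locations of every digraph on the path.
digraphLocations = which('digraph', '-all');
if numel(digraphLocations) > 1
    disp('More than one digraph is visible; check for user files shadowing the built-in class.');
end
```

If the first entry points into a user folder rather than the MATLAB installation, renaming or removing that file from the path would be the next thing to try.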
Peng Li
25 Mar 2020
Peng Li
25 Mar 2020
Walter Roberson
25 Mar 2020
A complete error message showing traceback would help.
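A full traceback can be captured by wrapping the failing call in try/catch and printing the extended report. In the sketch below, exportStep is a hypothetical stand-in that simply throws; in the real program it would be the call to this.export.

```matlab
% Hypothetical stand-in for the failing call (replace with the real export call).
exportStep = @() error('demo:export', 'Stand-in for the failing export call');

try
    exportStep();
catch err
    % 'extended' includes the message and the full call stack;
    % turning hyperlinks off gives plain text that can be pasted into a post.
    report = getReport(err, 'extended', 'hyperlinks', 'off');
    disp(report);
end
```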
Peng Li
25 Mar 2020
Sean de Wolski
26 Mar 2020
Tall uses a digraph to figure out the fewest lower-level operations that need to be done, so it can traverse the data set as few times as possible and without repetition.
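The size in the error message is consistent with a dense distance matrix: distances on a graph with n nodes returns an n-by-n array of doubles. The sketch below shows this on a tiny stand-in graph and then checks the arithmetic for n = 73733. That the internal execution graph here had 73733 nodes is an inference from the error text, not something stated in this thread.

```matlab
% Tiny stand-in graph: 4 nodes in a chain. distances returns a 4-by-4 matrix.
G = digraph([1 2 3], [2 3 4]);
D = distances(G);
disp(size(D));

% Memory for an n-by-n matrix of doubles (8 bytes each), in units of 2^30 bytes.
n = 73733;
gib = n^2 * 8 / 2^30;
fprintf('%d-by-%d doubles need about %.1f GB\n', n, n, gib);   % about 40.5 GB
```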
Peng Li
26 Mar 2020
Peng Li
27 Mar 2020
Answers (0)