How to remove columns in very large matrices.
Hi all,
I have run into a problem: it takes a huge amount of time to remove columns from a large matrix.
I am using the standard approach of:
matrix(:,badcols) = [];
This works, but I am currently handling large matrices (~1,750,000 x 2,000), and it is taking much longer than expected to simply remove a few columns. Is this just a practical limitation of working with large datasets, or is there a way I can better handle the data to do this more efficiently?
Thanks for the help!
J
Comments: 1
dpb
15 Feb 2025
>> x=zeros(1750000,2000);
Error using zeros
Requested 1750000x2000 (26.1GB) array exceeds maximum array size preference (15.9GB). This might cause MATLAB to become unresponsive.
Related documentation
>>
You're quite possibly running into disk thrashing from exceeding the available memory...
It probably won't make any difference, but you could try the alternative syntax below (note that this form requires badcols to be a logical mask, not a list of column indices):
matrix=matrix(:,~badcols);
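The `~badcols` form only selects the intended columns when `badcols` is a logical vector with one entry per column. If `badcols` is a list of column numbers instead, one way to build the mask first is sketched below. The small `magic(6)` matrix and the names `badIdx` and `reduced` are illustrative assumptions, not part of the original question.

```matlab
% Small stand-in for the real matrix (size is illustrative only).
matrix = magic(6);
badIdx = [2 5];                        % hypothetical list of column numbers to drop

% Convert the index list into a logical mask with one entry per column.
badcols = false(1, size(matrix, 2));
badcols(badIdx) = true;

% Keep every column that is not flagged.
reduced = matrix(:, ~badcols);
disp(size(reduced))
```

The result is written to a new variable here only so the two can be compared; assigning back to `matrix` works the same way.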
Other than that, breaking it down to process the data in smaller chunks either by row or column depending on what the operations are is the classic approach, yes.
Or, check into the tools for large datasets under the "Large Files and Big Data" section under "Data Import and Analysis".
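As a rough sketch of the chunked approach mentioned above: if the downstream operation can be done block by block (a per-column mean is assumed here purely for illustration), the unwanted columns can be skipped while reading each row block, so a full reduced copy of the matrix is never created. The block size, the sizes, and all variable names are assumptions.

```matlab
x = rand(1000, 50);                       % small stand-in for the large matrix
badcols = [3 17 42];                      % hypothetical columns to ignore
keep = setdiff(1:size(x, 2), badcols);    % columns that should survive

blockRows = 256;                          % hypothetical block size, tune to available RAM
colSum = zeros(1, numel(keep));

for r1 = 1:blockRows:size(x, 1)
    r2 = min(r1 + blockRows - 1, size(x, 1));
    block = x(r1:r2, keep);               % only this block is copied
    colSum = colSum + sum(block, 1);      % accumulate the per-column sums
end

colMean = colSum / size(x, 1);            % per-column mean over the kept columns
```

Whether this helps depends on the real operation; anything that needs all rows of all kept columns at once would not fit this pattern.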
Answers (3)
John D'Errico
15 Feb 2025
GET MORE RAM!!!!!!!!!
RAM IS CHEAP!!!!!!!!
Need I say it again? You have a quite large array.
(1.75e6*2000*8)/1e9
That array, in double precision form, uses approximately 28 gigabytes of RAM. When you remove some random columns, you force MATLAB to completely reallocate the entire array and copy all 28 gigabytes to the new addresses. That means you need roughly 56 gigabytes of memory, because temporarily there will be two versions of your data.
What can you do? The simplest thing is to work in single precision. If your array holds integers, int16 or int8 will be even better in terms of memory required. But single might be sufficient, since it cuts the memory load in half and is still a floating point form.
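To put numbers on that, the same arithmetic as above can be repeated for each storage class. This is only a back-of-the-envelope sketch using the 1,750,000 x 2,000 size from the question; the variable names are made up for illustration.

```matlab
rows = 1.75e6;
cols = 2000;

types = {'double', 'single', 'int16', 'int8'};
bytesPerElement = [8 4 2 1];              % storage size of one element of each class

% Footprint of a single copy of the array, in gigabytes (1e9 bytes).
gigabytes = rows * cols * bytesPerElement / 1e9;

for k = 1:numel(types)
    fprintf('%-6s : %5.1f GB\n', types{k}, gigabytes(k));
end
```

That works out to 28, 14, 7 and 3.5 GB respectively for one copy; a deletion that reallocates the array temporarily needs about twice that.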
What else can you do? Learn to use tall arrays! A tall array is designed to work on huge arrays like this. By way of comparison, done on my own computer with only 40 GB of RAM, I see these times:
A = rand(1e6,1000);
tic,A(:,3) = [];toc
Elapsed time is 9.746878 seconds.
A = rand(1e6,1000,'single');
tic,A(:,3) = [];toc
Elapsed time is 2.885848 seconds.
A = int8(20*randn(1e6,1000));
tic,A(:,3) = [];toc
Elapsed time is 0.825612 seconds.
A = tall(rand(1e6,1000));
tic,A(:,3) = [];toc
Elapsed time is 0.006552 seconds.
As you can see, by use of single, or int8 where possible, I was able to seriously cut the time required to remove the specified column. That is entirely due to the smaller footprint of the array itself. However, the tall array put them all to shame, and it did not force me to reduce the precision in any way. Of course, tall arrays do take some additional effort to learn and to use.
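One caveat on reading the tall timing: my understanding is that tall array operations are deferred until the result is requested with gather, so the 0.0066 seconds may mostly reflect queuing the deletion, not performing it. A hedged way to check is to time the gather step separately, as in the sketch below. The array is smaller than the one timed above, and the names `tQueue`, `tGather` and `B` are assumptions.

```matlab
A = tall(rand(1e5, 100));   % smaller stand-in than the arrays timed above

tic
A(:, 3) = [];               % same deletion syntax as in the timings above
tQueue = toc;               % expected to be tiny if the work is only queued

tic
B = gather(A);              % forces evaluation and returns an in-memory array
tGather = toc;

fprintf('queue: %.4f s, gather: %.4f s\n', tQueue, tGather);
```

If the gather time dominates, the fair comparison with the in-memory timings is the sum of the two steps.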
Walter Roberson
15 Feb 2025
Working by selecting columns to save is marginally slower than working with columns to delete, on average. The timing overlaps -- the slowest select-to-delete was worse than the slowest select-to-save.
Meanwhile, punting through a simple function took twice as long (!!). This is surprising as punting through a function should have invoked potential in-place modification.
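% Variant 1: keep the wanted columns by indexing
rng(655321)
x = rand(17500,2000);
badcols = randi(size(x,2), 1, 20);
tic
x = x(:,setdiff(1:size(x,2),badcols));
toc
% Variant 2: delete the unwanted columns directly
rng(655321)
x = rand(17500,2000);
badcols = randi(size(x,2), 1, 20);
tic
x(:,badcols) = [];
toc
% Variant 3: delete the unwanted columns inside a function
rng(655321)
x = rand(17500,2000);
badcols = randi(size(x,2), 1, 20);
tic
x = DeleteColumns(x, badcols);
toc
% Variant 4: zero the columns inside a function (array size is unchanged)
rng(655321)
x = rand(17500,2000);
badcols = randi(size(x,2), 1, 20);
tic
x = NullColumns(x, badcols);
toc
% Same input and output name, so in-place modification is possible
function x = DeleteColumns(x, badcols)
x(:,badcols) = [];
end
function x = NullColumns(x, badcols)
x(:,badcols) = 0;
end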
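Since single tic/toc measurements overlap as described above, a slightly more robust comparison repeats each variant and looks at the median. The sketch below does that for the delete and select forms only, on a smaller matrix; the run count, the sizes, and the names `tDelete`, `tSelect`, `y` and `z` are assumptions, and the numbers it prints will differ from machine to machine.

```matlab
rng(655321)
x = rand(2000, 500);                      % smaller stand-in for the test matrix
badcols = randi(size(x, 2), 1, 20);       % may contain repeats, as in the runs above

nRuns = 5;                                % hypothetical number of repetitions
tDelete = zeros(1, nRuns);
tSelect = zeros(1, nRuns);

for k = 1:nRuns
    y = x;                                % fresh copy for the delete variant
    tic
    y(:, badcols) = [];                   % delete the unwanted columns
    tDelete(k) = toc;

    tic
    z = x(:, setdiff(1:size(x, 2), badcols));   % keep the wanted columns
    tSelect(k) = toc;
end

fprintf('delete: median %.4g s, select: median %.4g s\n', ...
    median(tDelete), median(tSelect));
```

Both variants should produce the same reduced matrix, so any timing difference comes from how the copy is made.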