Fastest way to index large arrays

Vittorio Picco, 4 Oct 2022
Commented: dpb, 5 Oct 2022
I have two sets of arrays, A and B. The "A" arrays have about 1 million elements. The "B" arrays have about 65 thousand elements. For every element in A I need to find the corresponding element in B and pull a related value. Here's a crude minimal working example:
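PhiA = round(359*rand(1,1e6));                % query angles, integers 0..359
ThetaA = round(179*rand(1,1e6));              % query angles, integers 0..179
PhiB = repmat(0:359,1,180);                   % lookup grid, phi cycles fastest
ThetaB = reshape(repmat(0:179,360,1),1,[]);   % lookup grid, theta steps every 360 entries
VarB = 1:180*360;                             % value attached to each grid point
out = nan(1e6,1);
tic
for loop = 1:length(PhiA)
    idx = PhiA(loop) == PhiB & ThetaA(loop) == ThetaB;   % logical mask over all of B
    out(loop) = VarB(idx);                               % exactly one match per query
end
toc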
Given the size of the arrays this is not very fast, over 40 seconds on my machine. The profiler tells me that those two lines in the for loop are the slowest in my code, and surprisingly they split the burden almost exactly 50/50.
This is actually my already faster version: originally A and B were tables and the profiler told me that the slow operations were accessing and storing into the tables. Switching to arrays has sped up things a little but not as much as I hoped.
How could I make this faster?

Accepted Answer

dpb, 4 Oct 2022
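With the lookup arrays structured as they are, you don't need a lookup at all; you can calculate the row directly. PhiB cycles through 0:359 while ThetaB steps up by one every 360 entries, so the pair (phi, theta) sits at index phi + 360*theta + 1: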
fnRow=@(phi,theta)phi+360*theta+1;
so, with this,
PhiA = round(359*rand(1,1e6));
ThetaA = round(179*rand(1,1e6));
PhiB = repmat(0:359,1,180);
ThetaB = reshape(repmat(0:179,360,1),1,[]);
VarB = 1:180*360;
tic
out=VarB(fnRow(PhiA,ThetaA));
toc
Elapsed time is 0.012699 seconds.
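A quick way to gain confidence in the index formula is to compare it with the original logical lookup on a small sample. The sketch below does that; the sample size nTest and the test vectors PhiT and ThetaT are illustrative names that do not appear in the thread.

```matlab
% Sketch: check the direct index against the original logical lookup.
PhiB   = repmat(0:359,1,180);
ThetaB = reshape(repmat(0:179,360,1),1,[]);
VarB   = 1:180*360;
fnRow  = @(phi,theta) phi + 360*theta + 1;

nTest  = 1000;                      % assumed sample size, small so the loop stays quick
PhiT   = randi([0 359],1,nTest);    % hypothetical test angles on the grid
ThetaT = randi([0 179],1,nTest);

outFast = VarB(fnRow(PhiT,ThetaT)); % vectorized direct indexing

outSlow = nan(1,nTest);             % reference result from the original lookup
for k = 1:nTest
    idx = PhiT(k) == PhiB & ThetaT(k) == ThetaB;
    outSlow(k) = VarB(idx);
end

isSame = isequal(outFast,outSlow);  % true when both methods agree
```

The same linear index is what sub2ind([360 180],phi+1,theta+1) returns, which may read more clearly if B is thought of as a 360-by-180 grid.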
4 Comments
Vittorio Picco, 5 Oct 2022
Yeah, it worked out. I can round to get integers, so that's not a problem. The problem was that array A occasionally contains NaN, which marks entries I need to skip, and that made the last line fail. I dealt with it by appending a dummy value to the end of the VarB array and replacing each NaN index with the index of that new entry; that made the out= assignment work. Then I replaced the dummy entries in out with NaN. All of that can be done without for loops, so my execution time was almost unaffected. I wonder how you would have dealt with it. I'm not good with anonymous functions, so I never think of them.
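The dummy-entry workaround described here could look roughly like the sketch below. The names VarBpad, iDummy and row are illustrative, the sample data are made up, and NaN is used directly as the dummy value, a small variation that skips the final replacement step.

```matlab
% Sketch of the dummy-entry approach (all new names are illustrative).
VarB  = 1:180*360;
fnRow = @(phi,theta) phi + 360*theta + 1;

PhiA   = [0 359 NaN  10];           % hypothetical data with missing entries
ThetaA = [0 179   5 NaN];

VarBpad = [VarB NaN];               % append a dummy value at the end of VarB
iDummy  = numel(VarBpad);           % index of the dummy entry

row = fnRow(PhiA,ThetaA);           % NaN inputs propagate to NaN indices
row(isnan(row)) = iDummy;           % redirect missing entries to the dummy

out = VarBpad(row);                 % same size as PhiA, NaN where A was missing
```

If VarB held a type that cannot store NaN, a different dummy value plus a final replacement pass, as described in the comment, would be needed instead.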
dpb, 5 Oct 2022
I probably would have simply used logical addressing in the calculation selection...
isOK=all(isfinite(A),2);   % isfinite first, then all: true for rows with no NaN/Inf
out=VarB(fnRow(PhiA(isOK),ThetaA(isOK)));
The above assumes A is a single array with one row per entry (Phi and Theta as columns) and keeps only the rows that have no missing values.
If out must be the same size as A in the row dimension, then you need to preallocate it and assign through the mask, as in out(isOK)=...; otherwise an assignment through the mask grows out only as far as the last non-missing row of A. It only matters if the last N rows are the missing ones, but you may have no way of knowing that won't be the case, so defensive coding would preallocate.
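As a rough illustration of the preallocation point, the sketch below assumes A is an N-by-2 array of [Phi Theta] rows; the sample values and the name outShort are made up for the example.

```matlab
% Sketch: keep out the same height as A by preallocating (A assumed N-by-2).
VarB  = 1:180*360;
fnRow = @(phi,theta) phi + 360*theta + 1;

A = [  0   0
      10   2
     NaN   5
       3 NaN];                      % hypothetical [Phi Theta] rows, last two incomplete

isOK = all(isfinite(A),2);          % true for rows with no missing value

out = nan(size(A,1),1);             % preallocate so trailing missing rows are kept
out(isOK) = VarB(fnRow(A(isOK,1),A(isOK,2)));

% Without preallocation the result only reaches the last valid row:
outShort(isOK) = VarB(fnRow(A(isOK,1),A(isOK,2)));
```

Here out has four elements with NaN in the last two, while outShort stops after the second element because nothing beyond the last valid row was ever assigned.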
If instead the data are held as separate vectors, as in the question, then
isOK=all(isfinite([PhiA.' ThetaA.']),2);
looks ominous but will be fast and is easier to write than the two conditions on each vector with &.
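For comparison, here is a small sketch of both ways of building the mask from separate row vectors, with isfinite applied elementwise before all reduces across the columns. The sample data and the names isOK1, isOK2 and sameMask are illustrative.

```matlab
% Sketch: two equivalent ways to flag entries where both angles are finite.
PhiA   = [0 359 NaN  10 Inf];       % hypothetical data
ThetaA = [0 179   5 NaN   3];

% Stack as columns, test each element, then require every column to pass.
isOK1 = all(isfinite([PhiA.' ThetaA.']),2);   % N-by-1 logical

% One condition per vector, combined with &.
isOK2 = isfinite(PhiA) & isfinite(ThetaA);    % 1-by-N logical

sameMask = isequal(isOK1.',isOK2);            % true when the masks agree
```

The stacked form scales more comfortably when there are more than two vectors to check; with only two, the & form is arguably just as readable.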

Release: R2022a
