More Efficient ismember Calculation

Question


Hello, I'm working on an interpolation routine for a specific project and have managed to produce a very fast interpolation function for the N-D array for042. One final bottleneck takes up 84% of the total run time of the program. The profiler result and example code are included below. Any and all optimizations are greatly appreciated, but the particular line is obvious from the profiler.

alpha = -5:5:30;
mach  = 1:6:25;
beta = -5:2.5:5;
num = zeros(length(alpha),length(mach),length(beta));
for cnt = 1:numel(alpha)
    for cnt1 = 1:numel(mach)
        for cnt2 = 1:length(beta)
            num(cnt,cnt1,cnt2) = 3*alpha(cnt)+mach(cnt1)/2+beta(cnt2);
        end
    end
end
for cnt2 = 1:length(beta)
    data.for006{cnt2}.cd = num(:,:,cnt2);
end

The code above builds an example of the data set that we're working with.

And the line in question from the profiler is part of this block:

a = zeros(1,2^size(CURRENT,1)-1);
curridx = zeros(1,size(CURRENT,1));
for cnt = 1:(2^size(CURRENT,1))
    change = find((rem(cnt-1,2.^(0:size(CURRENT,1)-1))==0)==1);
    curridx(change) = 1*curridx(change)==0;
    idx = lidx.*(curridx==0) + uidx.*(curridx==1);
    chkvals = zeros(1,length(idx));
    for cnt1 = 1:length(idx)
        B = RANGES{cnt1};
        chkvals(cnt1) = B(idx(cnt1));
    end
    chkvals = [chkvals(2),chkvals(4:end)];
% INEFFICIENT LINE %
    a(cnt) = DATA.for042{find(ismember(DATA.permutation,chkvals,'rows'))}.cn(sub2ind_a(siz,idx));
end

Comments: 11 (9 earlier comments hidden)

Walter Roberson on 16 Apr 2020


Guidelines for using find():

If you have a relational test that is used exactly once, and you use the result of find() only to index an array, then skip the find() and use the result of the relational test as a logical index.
If you are doing computation on the indices returned by find(), then it might be worth retaining the find(). For example, you might want to compute the distance between events.
If you are updating corresponding locations, matrix(locations) = f(matrix(locations)), then one could hypothesize that using find(locations) might in some cases be faster than logical indexing, because there would be less for the assignment to examine (just change a few locations directly, right?). However, in my tests with large arrays, logical indexing is faster even for a small number of output locations, and is notably faster if there are a large number of output locations.

Note: this timing test takes a few minutes to execute due to the size of the arrays. For each test, a new output array the same size as the input has to be created.

data = rand(83,19,207,51,3);
data(12345) = -1;
data(876543) = -1;
N = 10;
t1 = zeros(N,1);
t2 = zeros(N,1);
t3 = zeros(N,1);
t4 = zeros(N,1);
f1 = @()fun1(data);
f2 = @()fun2(data);
f3 = @()fun3(data);
f4 = @()fun4(data);
for K = 1 : N; t1(K) = timeit(f1,0); end
for K = 1 : N; t2(K) = timeit(f2,0); end
for K = 1 : N; t3(K) = timeit(f3,0); end
for K = 1 : N; t4(K) = timeit(f4,0); end
plot([t1,t2,t3,t4]);
legend({'find0', 'mask0', 'find5', 'mask5'});
m1 = mean(t1); m2 = mean(t2); m3 = mean(t3); m4 = mean(t4);
ms = [m1;m2;m3;m4];
disp('timings')
disp(ms);
m = min(ms);
disp('ratios')
disp(ms ./ m);
function fun1(data)
    %small number of changes, find
    idx = find(data<0);
    data(idx) = data(idx) * 2;  %#ok<NASGU>
end
function fun2(data)
    %small number of changes, logical indexing
    idx = data<0;
    data(idx) = data(idx) * 2; %#ok<NASGU>
end
function fun3(data)
    %many places, find
    idx = find(data<0.5);
    data(idx) = data(idx) * 2; %#ok<NASGU>
end
function fun4(data)
    %many places, logical indexing
    idx = data<0.5;
    data(idx) = data(idx) * 2; %#ok<NASGU>
end
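
For readers who want something smaller than the timing test, the first two guidelines above could be sketched roughly as follows. The vector v and all variable names are made up for illustration and are not from the original code.

```matlab
% Hypothetical sample data, chosen only for illustration.
v = [3 -1 4 -1 5 -9 2 6];

% Guideline 1: the relational test is used once, purely for indexing,
% so the logical mask can be used directly.
negMask = v < 0;
viaMask = v(negMask);

% Same result, but with a find() call that adds nothing here.
viaFind = v(find(v < 0)); %#ok<FNDSB>

% Guideline 2: the positions themselves are what we compute on,
% e.g. the spacing between successive negative entries,
% so keeping find() is reasonable.
negPos = find(v < 0);
gaps   = diff(negPos);
```

Both indexing forms select the same elements; the difference is only whether an intermediate index list is built.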

Walter Roberson on 17 Apr 2020

But the length of idx does not depend upon the contents of CURRENT, only on the size of CURRENT, right? So you can pre-compute it.

In the great majority of cases, if you can move a computation out of a loop, doing so will result in more efficient code. This is not always the case, but most of the time.
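
As a rough sketch of what hoisting could look like for the corner-toggling part of the loop in the question: the CURRENT below is a hypothetical placeholder (only its row count matters), and n, nCorner, divs and corners are names introduced here for illustration, not part of the original code.

```matlab
% Hypothetical stand-in for the poster's CURRENT; only size(CURRENT,1) is used.
CURRENT = zeros(3,1);

% Loop-invariant quantities, computed once instead of on every iteration.
n       = size(CURRENT,1);
nCorner = 2^n;
divs    = 2.^(0:n-1);

curridx = zeros(1,n);
corners = zeros(nCorner,n);   % records curridx at each iteration
for cnt = 1:nCorner
    % Logical mask of the positions to toggle; no find() is needed
    % because the mask is used only for indexing.
    change = rem(cnt-1,divs) == 0;
    curridx(change) = curridx(change) == 0;
    corners(cnt,:) = curridx;
end
```

The toggling rule is the same as in the question, so each of the 2^n lower/upper combinations is still visited exactly once; only the repeated size() and power computations have moved out of the loop.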

Ayden Clay on 20 Apr 2020

Ahh, I understand! I believe I've now moved as much as I can outside of the loops.


Accepted Answer

Steven Lord on 16 Apr 2020


What's the most time-consuming part of the line that you've identified as the bottleneck? Is it the ismember call, the indexing into DATA.for042, the indexing into the result of that indexing to retrieve part of the cn field, or the assignment into a section of a? To tell, break that line into four parts for performance profiling purposes:

ind = ismember(DATA.permutation,chkvals,'rows');
data1 = DATA.for042{ind};
data2 = data1.cn(sub2ind_a(siz,idx));
a(cnt) = data2;

My guess is that the ismember call still might be the most time consuming part of that process, but it's not going to be all 83.4% of the total runtime of your code.
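
If the ismember call does turn out to dominate, one option that is sometimes used when matching a single query row is a direct comparison instead of the general-purpose membership test. This is only a sketch under assumptions: the permutation and chkvals values below are invented stand-ins for DATA.permutation and chkvals, and nothing in this thread confirms that this is the change that was eventually adopted.

```matlab
% Hypothetical stand-ins for DATA.permutation and chkvals.
permutation = [0 -5; 0 0; 0 5; 1 -5; 1 0; 1 5];
chkvals     = [1 0];

% Approach from the question: general-purpose row membership test.
indIsmember = ismember(permutation, chkvals, 'rows');

% Alternative for a single query row: compare every row directly.
% (Implicit expansion of == requires R2016b or later.)
indDirect = all(permutation == chkvals, 2);

% Row number, if a numeric index is wanted rather than a mask.
rowNumber = find(indDirect, 1);
```

Both masks mark the same row here. Whether the direct comparison is actually faster for a given table size is something to confirm with the profiler or timeit rather than assume.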

Comments: 6 (4 earlier comments hidden)

Walter Roberson on 17 Apr 2020

But you should still make the other improvements I noted about not computing the same value multiple times.

Ayden Clay on 20 Apr 2020

I completely agree. I've implemented a large number of those changes too (there may be more), and I've tried to replace some of the for loops with vector operations. There is still some work to be done, but this is much closer to what I needed. Thank you.
