Permutations Inversions Counting ... speed up
조회 수: 3 (최근 30일)
이전 댓글 표시
Hi, this is my code for permutation inversions counting. Any idea how to optimize it to get some additional speedup for large set of permutations? See my previous problem: here
function count = PermInvsCount(p)
[nR,nE] = size(p);
count = zeros(nR,1);
for i = 1:nE-1
for j = i+1:nE
count = count + (p(:, i) > p(:, j));
end
end
end
Some testing data:
[~,perms] = sort(rand(1000000,60),2);
tic;c= PermInvsCount(perms);toc
Elapsed time is 1.500214 seconds.
댓글 수: 0
답변 (1개)
John D'Errico
2017년 10월 23일
편집: John D'Errico
2017년 10월 23일
Why do people always think there is something they could have done that was faster? You are doing something on the order of 2 billion comparisons, and as many additions. 1.5 seconds does not seem wildly out of line for that.
If you need a speedup, you might look into the parallel toolbox. The fact is however, your code is already multithreading. A quick test on my machine showed all of my cores running when I execute that code. MATLAB automatically sets many processes up for multithreading IF it sees a gain by doing so. In this case, it did. That means you won't gain by use of forcing MATLAB to use the spare cores on your computer, because it will do so already.
You can check the value of
maxNumCompThreads
to make sure that MATLAB is able to use all of your cores. (Hyper-threaded cores don't give you any real boost, so don't go there.)
If you can push those computations onto your GPU, there might be some gain. I can't say about that, since I don't have the parallel toolbox.
So, you need to do a few billion computations there. You are already doing them. Doing them in some different order won't help a lot.
Ok, I can already hear you wondering if or why this should have been better:
function count = PermInvsCount2(p)
[nR,nE] = size(p);
count = zeros(nR,1);
for i = 1:nE-1
count = count + sum((p(:, i) > p(:, (i+1):end))),2);
end
end
As it turns out, that code, even though it is in theory vectorized more, also does more work. It needs to extract sub-arrays of different sizes in a loop.
The loop overhead here is simply not important. These loops below are trivial, compared to what happens inside the loop.
for i = 1:nE-1
for j = i+1:nE
% nada
end
end
So unless you can use the Parallel Computing toolbox, and you have a decent GPU, I don't see you getting much of a gain.
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Logical에 대해 자세히 알아보기
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!