Permutations Cycles Counting ... speed up

Hi, this is my code for permutation cycles counting. Any idea how to vectorize it to get some additional speedup for large set of permutations?
function count = PermCyclesCount(p)
[nR,nE] = size(p);
count = zeros(nR,1);
for j = 1:nR
i = 0;
count(j) = 0;
while i < nE
i = i + 1;
if p(j,i) > 0
flag = false; % flag is just to trick the computer into entering the while loop
first = i;
while (first ~= i) || ~flag
flag = true;
r = first;
first = p(j,first);
p(j,r) = 0; % The entries in the cycle are changed to zero to indicate that they have been `used'
end
count(j) = count(j) + 1;
end
end
end
end
Some testing data:
[~,perms] = sort(rand(1000000,60),2);
tic;c= PermCyclesCount(perms);toc
Elapsed time is 1.559758 seconds.

 채택된 답변

Jan
Jan 2017년 10월 20일
편집: Jan 2017년 10월 20일

1 개 추천

% VERSION 1
[nR, nE] = size(p);
count = zeros(nR, 1);
for j = 1:nR
q = p(j, :); % <== Cut out a vector to operate on
i = 0;
while i < nE
i = i + 1;
if q(i) > 0
flag = false;
first = i;
while (first ~= i) || ~flag
flag = true;
r = first;
first = q(first);
q(r) = 0;
end
count(j) = count(j) + 1;
end
end
end
This tiny change let the code run in 1.41 instead of 2.37 sec on my computer.
And this in 1.25 sec without the flag:
% VERSION 2
[nR, nE] = size(p);
count = zeros(nR, 1);
for j = 1:nR
q = p(j, :);
for i = 1:nE
if q(i) > 0
f = q(i);
q(i) = 0;
while f ~= i
r = f;
f = q(f);
q(r) = 0;
end
count(j) = count(j) + 1;
end
end
end

댓글 수: 3

But your last code is about 2 times faster than mine. Thanks!
tic;c_= PermCyclesCount_(perms);toc
Elapsed time is 0.797457 seconds.
Jan
Jan 2017년 10월 20일
편집: Jan 2017년 10월 20일
This code is 20% faster, if the data a provided in columnwise order:
[~,perms] = sort(rand(60, 1000000), 1);
% VERSION 2.1
[nE, nE] = size(p); % <== Swapped
count = zeros(nR, 1);
for j = 1:nR
q = p(:, j); % <== Copy column
for i = 1:nE
if q(i) > 0
f = q(i);
q(i) = 0;
while f ~= i
r = f;
f = q(f);
q(r) = 0;
end
count(j) = count(j) + 1;
end
end
end
The C code is even 33% faster - see in the other answer.
Does the creation of perms matter also? Then:
[~,perms] = sort(rand(60, 1000000),1); % 4.88 sec
perms = Shuffle(repmat((1:60).', 1, 1000000), 1); % 2.95 sec
Michal
Michal 2017년 10월 23일
One again Jan, tahnk you very much for your help. Creation is not a critical part, but anyway, the Shuffle function looks very useful for me, too.

댓글을 달려면 로그인하십시오.

추가 답변 (3개)

Jan
Jan 2017년 10월 20일
편집: Jan 2017년 10월 23일

1 개 추천

A C-Mex has about the double speed of the VERSION 2:
// VERSION 3
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
size_t nR, nE, j, i;
double *c, *p, *pr;
int *q, f, r;
// Get input:
nR = mxGetM(prhs[0]);
nE = mxGetN(prhs[0]);
p = mxGetPr(prhs[0]);
// Create output:
plhs[0] = mxCreateDoubleMatrix((mwSize) nR, 1, mxREAL);
c = mxGetPr(plhs[0]);
q = (int *) mxMalloc((nE + 1) * sizeof(int));
for (j = 0; j < nR; j++) {
// Copy row p(j, :) to q:
pr = p++;
for (i = 1; i <= nE; i++) {
q[i] = (int) (*pr); // One based indexing for q
pr += nR;
}
for (i = 1; i <= nE; i++) {
if (q[i] != 0) {
f = q[i];
q[i] = 0;
while (f != i) {
r = f;
f = q[f];
q[r] = 0;
}
*c += 1.0;
}
}
c++;
}
mxFree(q);
return;
}
This is 33% faster, if you provide the input in columnwise order instead:
[~,perms] = sort(rand(60, 1000000),1);
// VERSION 3.1
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
// !!! FOR [nE x nR] !!!
size_t nR, nE, j, i;
double *c, *p, *pr;
int *q, f, r, ic;
// Get input:
nE = mxGetM(prhs[0]);
nR = mxGetN(prhs[0]);
p = mxGetPr(prhs[0]);
// Create output:
plhs[0] = mxCreateDoubleMatrix((mwSize) nR, 1, mxREAL);
c = mxGetPr(plhs[0]);
q = (int *) mxMalloc((nE + 1) * sizeof(int));
for (j = 0; j < nR; j++) {
// Copy row p(j, :) to q:
pr = p;
for (i = 1; i <= nE; i++) {
q[i] = (int) (*pr++); // One based indexing for q
}
ic = 0;
for (i = 1; i <= nE; i++) {
if (q[i] != 0) {
f = q[i];
q[i] = 0;
while (f != i) {
r = f;
f = q[f];
q[r] = 0;
}
ic++;
}
}
*c++ = (double) ic;
pr += nE;
}
mxFree(q);
return;
}

댓글 수: 1

Michal
Michal 2017년 10월 23일
Jan, thanks a lot for your extraordinary help!!! The codes you provide here are perfect. Especially attached MEX files.

댓글을 달려면 로그인하십시오.

Jos (10584)
Jos (10584) 2017년 10월 20일

0 개 추천

Not vectorised, but less checking, not sure if it is faster
nR = size(p,1) ;
count2 = zeros(nR,1) ;
tf = ~isnan(p) ;
for j = 1:nR
r1 = find(tf(j,:), 1 ,'first') ; % find the first cycle
while ~isempty(r1) % is there a cycle
count2(j) = count2(j) + 1 ;
while tf(j,r1)
tf(j,r1) = false ;
r2 = p(j,r1) ;
r1 = r2 ;
end
r1 = find(tf(j,:),1,'first') ; % find the next cycle
end
end

댓글 수: 9

[~,perms] = sort(rand(1000000,60),2);
tic;c_= PermCyclesCount_(perms);toc
Elapsed time is 5.393064 seconds.
More than 3x slower!!!
Jos (10584)
Jos (10584) 2017년 10월 20일
haha! Good for you that you checked :)
Did you profile your code to see where the bottlenecks are?
Most critical lines:
0.23 6000000 11 tf(j,r1) = false ;
0.29 6000000 12 r2 = p(j,r1) ;
0.22 6000000 13 r1 = r2 ;
0.23 6000000 14 end
0.38 468495 15 r1 = find(tf(j,:),1,'first') ; % find the next cycle
Jos (10584)
Jos (10584) 2017년 10월 20일
and your own code?
Michal
Michal 2017년 10월 20일
편집: Michal 2017년 10월 20일
0.21 6000000 10 i=i+1;
0.23 6000000 11 if p(j,i)>0
0.02 468495 12 flag=0; % flag is just to trick the computer into entering the while loop
0.02 468495 13 first=i;
0.02 468495 14 while (first ~=i) || (flag==0)
0.21 6000000 15 flag=1;
0.21 6000000 16 r=first;
0.27 6000000 17 first=p(j,first);
0.23 6000000 18 p(j,r)=0; % The entries in the cycle are changed to zero to indicate that they have been `used'
0.23 6000000 19 end
0.02 468495 20 count(j) = count(j) + 1;
0.02 468495 21 end
0.22 6000000 22 end
But, the profiler info is now (from version 2014a , I think) very confusing, because enhanced JIT engine in non-profile mode performed some lines significantly faster than is showed in profile report. Now I am using R2017b.
This is really terrible problem, because your code is in profile mode faster, than my code!!?? From my point of view is profiler tool completely useless now!!!
So, the relevant timings are available only for non-profile mode!!!
I recommend use only run-time results:
[~,perms] = sort(rand(1000000,60),2);
tic;c= PermCyclesCount(perms);toc
Elapsed time is 1.559758 seconds.
Stephen23
Stephen23 2017년 10월 20일
편집: Stephen23 2017년 10월 20일
@Michal Kvasnicka: if you expect that running the profiler somehow magically will use no system resources and that all code will run at exactly the same speed when it is boing profiled then you will always be disappointed. Nowhere in the profiler documentation
does it state that everything will run at exactly the same speed, but it does state that "Profiling is a way to measure where a program spends time." Running extra code (i.e. the profiler itself) requires system resources, and therefore this can affect the timing.
If you want to measure the time that a function takes to run, then use the correct tool: timeit.
Michal
Michal 2017년 10월 20일
So, what is the main purpose to use profiler now?
I have two codes, doing same thing. First code is in profiler mode slower than second code. But, in non-profiler mode is the first code significantly faster than second code.
Please, let me know what the Matlab user is supposed to do in this situation? Or, what is the reason to use profiler, when its results are completely distorted?
Jan
Jan 2017년 10월 20일
The profiler disables the JIT acceleration, which could re-order the code lines to improve speed. This cannot be allowed, if the time per line should be measured.
But the JIT accelerator is essential for the run time of such loops. Therefore the profiler has a limited use here. You can determine the commands which take the most time, but not the code blocks.
Michal
Michal 2017년 10월 20일
@Jan Simon Yes, you are right. The profiler has a limited use, now!!! But frankly speaking, this limitation is really terrible.

댓글을 달려면 로그인하십시오.

Jos (10584)
Jos (10584) 2017년 10월 20일
편집: Jos (10584) 2017년 10월 20일

0 개 추천

This is upto 10 times faster than your code (and indeed about 100 times faster than my previous code):
[nR, nE] = size(p) ;
count3 = zeros(nR,1) ;
tf = true(nR,nE) ;
for Ri = 1:nR
for Ei = 1:nE,
if tf(Ri,Ei)
count3(Ri) = count3(Ri) + 1 ;
while tf(Ri,Ei)
tf(Ri,Ei) = false ;
r = p(Ri,Ei) ;
Ei = r ;
end
end
end
end

댓글 수: 2

This code is definitely still a bit slower than mine :)
[~,perms] = sort(rand(1000000,60),2);
My code:
tic;c= PermCyclesCount(perms);toc
Elapsed time is 1.552508 seconds.
Your code:
tic;c_= PermCyclesCount_(perms);toc
Elapsed time is 1.664062 seconds.
Jan
Jan 2017년 10월 20일
편집: Jan 2017년 10월 20일
An improved version of Jos' code:
[nR, nE] = size(p) ;
count = zeros(nR,1) ;
tf = true(nE, nR) ; % Transposed to operate on column
for Ri = 1:nR
q = p(Ri, :); % Vector cut out
for Ei = 1:nE
k = Ei;
if tf(k, Ri)
count(Ri) = count(Ri) + 1 ;
while tf(k, Ri)
tf(k, Ri) = false ;
r = q(k) ;
k = r ;
end
end
end
end
To my surprise using a vector tf = true(nE, 1) is not faster.

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 C Shared Library Integration에 대해 자세히 알아보기

제품

질문:

2017년 10월 20일

편집:

Jan
2017년 10월 23일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by