I have some simple code for testing parfor with my local profile (4 cores):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% code 1
matlabpool open 4 % 2 or 1
tic;
parfor i = 1:30
    res = 0;
    for n = 1 : 3000000
        res = res + sin(n) + cos(n);
    end
    A(i) = res;
end
toc;
matlabpool close
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% code 2
tic;
for i = 1:30
    res = 0;
    for n = 1 : 3000000
        res = res + sin(n) + cos(n);
    end
    A(i) = res;
end
toc;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I ran code 1 with 8, 4, 2, and 1 labs, and also ran code 2. The results are:
code-1 - 8 labs (4 cores plus 4 hyperthreads) --> 15 sec
code-1 - 4 labs --> 22 sec
code-1 - 2 labs --> 35 sec
code-1 - 1 lab --> 65 sec
code-2 - --> 18 sec
Judging by these results, it is better to use code 2 and leave the other cores free (and that is before counting the time needed to run 'matlabpool open' and 'matlabpool close'). I have read this: http://www.mathworks.co.uk/matlabcentral/answers/44734-there-is-aproblem-in-parfor
but in this case the execution time seems to be much longer than the setup time of the parallel mechanism.
If there is nothing wrong with my results, my main question is: when is it better to use parfor?

17 Comments

Matt J on 3 Feb 2014
I can't reproduce that, I'm afraid. I see close to linear speed-up with 2, 4, and 12 workers in the pool. What version of MATLAB are you using and what CPU(s)?
amir on 4 Feb 2014
Edited: amir on 4 Feb 2014
I have checked with both R2013a and R2012a (hyperthreading only with R2012a) and the results were very close. My processor is "Intel® Core™ i7-3610QM CPU @ 2.30GHz, 2301 Mhz, 4 Core(s), 8 Logical Processor(s)". OS: Windows 7 64-bit.
This may be related: I executed the following:
matlabpool open
numlabs
matlabpool close
and the result is:
Starting matlabpool using the 'local' profile ... connected to 4 workers.
ans =
1
Sending a stop signal to all the workers ... stopped.
I think something is wrong with 'numlabs' here.
Edric Ellis on 4 Feb 2014
NUMLABS is designed to return 1 inside PARFOR because you cannot use labSend/labReceive there. This is described in the documentation.
There is no parfor here:
matlabpool open
numlabs
matlabpool close
It just initializes the pool.
Matt J on 4 Feb 2014
NUMLABS will only return a meaningful value inside an SPMD...END block.
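The two replies above can be illustrated with a short sketch. It assumes a pool has already been opened with matlabpool, as in the question, and uses the function names of the releases discussed in this thread (later releases replaced matlabpool with parpool). The variable name nClient is illustrative only.

```matlab
% Assumption: a pool is already open, e.g. via "matlabpool open".
nClient = numlabs;   % queried on the client, outside spmd: returns 1

spmd
    % Inside spmd, each worker sees its own index and the number of
    % workers that are running this block.
    fprintf('lab %d of %d\n', labindex, numlabs);
end

fprintf('numlabs on the client: %d\n', nClient);
```

To query the pool size from the client, the test script later in this thread uses matlabpool('size') instead.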
Matt J on 4 Feb 2014
@mohammad
Are there any other machines available to you that you could test it on, to check whether the problem is platform-dependent?
Unfortunately I cannot run it on another system. I have changed the code to report all the timings. You can just paste and run it, then compare your results with mine.
-----------------------------------------------------
clear all;
clc;
max_counter = 3000000;
fprintf(1,'max count %d\n', max_counter );
times = zeros(5,1);
for poolsize = 0:4
    if poolsize > 0
        matlabpool('open',poolsize);
    end
    real_poolsize = matlabpool('size');
    fprintf(1,'pool size %d\nstarted ...\n', real_poolsize );
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    tic;
    if poolsize == 0
        for i = 1:30
            res = 0;
            for n = 1 : max_counter
                res = res + sin(n) + cos(n);
            end
            A(i) = res;
            fprintf(1,' %d ',i);
        end
    else
        parfor i = 1:30
            res = 0;
            for n = 1 : max_counter
                res = res + sin(n) + cos(n);
            end
            A(i) = res;
            fprintf(1,' %d ',i);
        end
    end
    t1 = toc;
    fprintf(1,'\npool size %d , time %f\n',poolsize,t1);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    if poolsize > 0
        matlabpool close
    end
    times(poolsize + 1) = t1;
end
disp(times);
-----------------------------------------------------
and here are my results:
max count 3000000
pool size 0
started ...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
pool size 0 , time 18.536493
Starting matlabpool using the 'local' profile ... connected to 1 workers.
pool size 1
started ...
20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 30 29 28 27 26 25 24 23 22 21
pool size 1 , time 68.053388
Sending a stop signal to all the workers ... stopped.
Starting matlabpool using the 'local' profile ... connected to 2 workers.
pool size 2
started ...
20 19 18 17 16 15 14 13 12 11 25 24 23 22 21
10 9 8 7 6 5 4 3 2 1 28 27 26 30 29
pool size 2 , time 40.965749
Sending a stop signal to all the workers ... stopped.
Starting matlabpool using the 'local' profile ... connected to 3 workers.
pool size 3
started ...
14 13 12 11 10 9 8 26 25 30
20 19 18 17 16 15 24 23 22 21
7 6 5 4 3 2 1 28 27 29
pool size 3 , time 28.554977
Sending a stop signal to all the workers ... stopped.
Starting matlabpool using the 'local' profile ... connected to 4 workers.
pool size 4
started ...
10 9 8 7 6 27 28
5 4 3 2 1 24 23
15 14 13 12 11 26 25 30
19 18 17 16 22 21 20 29
pool size 4 , time 23.066573
Sending a stop signal to all the workers ... stopped.
18.5365
68.0534
40.9657
28.5550
23.0666
As you can see, the fastest run is the one without parfor.
I have also run it on another system (Intel Core i5-3470 at 3.2 GHz):
16.1494 --> using for
59.5967 --> parfor (1 lab)
30.1803 --> parfor (2 labs)
20.4604 --> parfor (3 labs)
16.6839 --> parfor (4 labs)
As you can see, the plain for loop is faster than parfor even with 4 labs. I find that surprising.
Matt J on 5 Feb 2014
Edited: Matt J on 5 Feb 2014
Try upgrading to R2013b if you can. Also, try turning off hyperthreading. I can't explain it otherwise.
amir on 5 Feb 2014
Edited: amir on 5 Feb 2014
I do not think it is related to R2013a versus R2013b, and the second system has 4 cores without hyperthreading. Obviously something is wrong in this code, or maybe there are other things we must consider when using parfor in MATLAB. What I really want to learn is when and how I can benefit from parallel computing in MATLAB.
A simple question: did you run the code, and what was your result?
Matt J on 5 Feb 2014
Edited: Matt J on 5 Feb 2014
As I mentioned here, I ran the first version of the code and successfully achieved near linear speed-up with PARFOR. That was with R2013b. I haven't run the second version of the code yet, but I don't see any significant modification in it that would lead me to expect a different result.
So, the slow behavior you're seeing has to be environment-related.
Matt J on 5 Feb 2014
Edited: Matt J on 5 Feb 2014
Here are my results when I run the modified version of the test code for poolsize=0:12. The three columns correspond to R2011b, R2012b, and R2013b
Times =
19.9430 20.4689 21.0302
21.1632 21.8318 23.0208
10.6021 10.7968 11.5326
7.0738 7.3209 7.9293
5.7969 5.9354 6.1944
4.3994 4.5522 4.9174
3.7105 3.8611 4.1811
3.6653 3.7533 3.9924
3.0179 3.1299 3.2726
2.9612 3.0899 3.2563
2.3155 2.3643 2.5791
2.3111 2.3792 2.5677
2.3000 2.3633 2.6129
Interestingly, performance gets a bit slower with more recent releases. Not sure if that's a significant trend, though. This is on a machine with dual hexa-core Intel Xeon X5680 CPUs @ 3.33 GHz.
So... still baffled.
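The scaling in the table above can be quantified as speedup and parallel efficiency. The following is a small sketch using the R2011b column as reported; the variable names are illustrative, and the first entry is taken to be the plain for loop (poolsize = 0), as in the test script.

```matlab
% Times in seconds from the R2011b column, for poolsize = 0:12.
% Entry 1 is the plain for loop; entries 2:13 are parfor with 1..12 workers.
times = [19.9430 21.1632 10.6021 7.0738 5.7969 4.3994 3.7105 ...
         3.6653 3.0179 2.9612 2.3155 2.3111 2.3000];
workers    = 1:12;
speedup    = times(1) ./ times(2:end);   % relative to the serial for loop
efficiency = speedup ./ workers;         % 1.0 would be perfect linear scaling

for k = workers
    fprintf('%2d workers: speedup %5.2f, efficiency %4.2f\n', ...
            k, speedup(k), efficiency(k));
end
```

With 4 workers this gives a speedup of roughly 3.4 over the plain for loop, while parfor with a single worker is slightly slower than the for loop.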
amir on 6 Feb 2014
Edited: amir on 6 Feb 2014
I wish I had a system like that. Do you fly with it?
Anyway, I have checked your results, and I think something is wrong with my system that causes the big difference between the first and second rows ('for' versus 'parfor' with one worker).
my res : 18.5365 ('for') 68.0534 ('parfor')
your res : 19.9430 ('for') 21.1632 ('parfor')
Maybe it is related to the MATLAB version or some configuration on my system. I will try to test it on R2013b.
Matt J on 6 Feb 2014
Any difference if you pre-allocate A first?
Matt J on 6 Feb 2014
I wish I had a system like that. Do you fly with it?
Not always. Like you, I've also had cases where PARFOR mysteriously under-performs in environment-dependent ways. See this thread, for instance
Matt J on 6 Feb 2014
You're not doing any of this over a network are you? This is all on a local CPU?
amir on 6 Feb 2014
Edited: amir on 6 Feb 2014
I think I have found the reason:
If I run the code as a function, I get your results, and if I run it as a script (without a function declaration and name), I get bad results. My new results (poolnum = 4):
19.4899
20.7605
11.2675
7.8502
6.4180
Please tell me how you ran this code (as a function in a function file, or as a script)?

Accepted Answer

Matt J on 7 Feb 2014
Edited: Matt J on 7 Feb 2014

If I run the code as a function, I get your results, and if I run it as a script (without a function declaration and name), I get bad results. My new results (poolnum = 4):
I think you're on to something! I was indeed running inside a function. When I repeated the test, but running it as a script instead, I get bad behavior similar to what you were reporting.
Times =
20.7096
67.3600
33.4894
23.1408
18.0744
13.8923
11.6439
11.3326
9.2532
9.3852
7.0565
7.0984
7.1319
As can be seen here, PARFOR eventually does outperform a plain for-loop, but it takes a very large worker pool, and with very marginal benefits.
I'm just wondering now whether this is known/documented behavior, or a bug...
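Based on the finding in this thread, one way to compare the two loops is to put the benchmark in a function file rather than a script. A minimal sketch follows. The name timeSumLoop and its arguments are hypothetical, not something from the thread, and the sketch assumes any worker pool has been opened by the caller beforehand. It also preallocates A, as suggested earlier in the comments.

```matlab
function [t, A] = timeSumLoop(useParfor, numOuter, maxCounter)
% timeSumLoop  Time the sin/cos accumulation test from inside a function.
%   Hypothetical helper for illustration; save as timeSumLoop.m.
%   Assumes the caller has already opened a worker pool if one is wanted.
    A = zeros(1, numOuter);      % preallocate the output vector
    tStart = tic;
    if useParfor
        parfor i = 1:numOuter
            res = 0;
            for n = 1:maxCounter
                res = res + sin(n) + cos(n);
            end
            A(i) = res;          % sliced output variable
        end
    else
        for i = 1:numOuter
            res = 0;
            for n = 1:maxCounter
                res = res + sin(n) + cos(n);
            end
            A(i) = res;
        end
    end
    t = toc(tStart);             % elapsed seconds for the chosen loop
end
```

Calling timeSumLoop(false, 30, 3000000) and then timeSumLoop(true, 30, 3000000) reproduces the workload of the test script, so the two timings can be compared with those obtained from the script version on the same machine.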

2 Comments

amir on 7 Feb 2014
Maybe if I had 12 workers, 'parfor' would be better than 'for'. I think using 'parfor' in a script adds some overhead. You can see it in your results too (20.7096 --> 'for', 67.3600 --> 'parfor').
I think it is not known behavior, because otherwise it should at least show a warning or something.
Thanks for your help.
Matt J on 7 Feb 2014
Interestingly, I seem to be encountering the opposite behavior here
There, I have an example where putting the code inside a function is slower than inside a script.

Asked: 3 Feb 2014

Commented: 7 Feb 2014
