parfor calculations take longer than for

I am starting to work with the Parallel Computing Toolbox, and I just constructed a simple example to compare for and parfor:
tic
a=rand(4000,4000);
k=size(a,1);
(par)for i=1:k
for j=1:k
a(i,j)=2/(2+a(i,j));
end
end
toc
I ran this with both for and parfor (I wrote "par" in parentheses: the first run has no "par", the second does). The computation takes several seconds, but parfor is two or three times slower. I also opened a matlabpool with 4 workers beforehand.
What is the problem?
Thanks in advance

Comments: 3

It is much slower on my dual-core machine too. I am also curious why it is.
Mikhail, 25 Oct 2014
Elapsed time is 7.915533 seconds. with parfor
Elapsed time is 0.919463 seconds. with for.
This is fun: for is about 8 times faster (quad-core CPU, 4 workers).
Mikhail, 25 Oct 2014
Actually, I also tried running both variants on the same array a (so a was identical for for and parfor). Same result.


Accepted Answer

Mohammad Abouali, 25 Oct 2014
Edited: Mohammad Abouali, 25 Oct 2014

1 vote

The slowdown has to do with interprocess communication and with the way you are addressing memory, that is, how the variable is sliced.
In general, this kind of loop causes a lot of interprocess communication, and it also accesses memory in an uncoalesced (non-contiguous) fashion.
Note that MATLAB, unlike C, stores arrays with the first index varying fastest (column-major order). This means that if A is a double-precision array with 4000x4000 elements, then A(1,1) and A(2,1) are next to each other in memory, while A(1,1) and A(1,2) are separated by 4000*sizeof(double) bytes. (Perhaps this goes back to the original implementation of MATLAB, which was written in FORTRAN, and they just wanted to keep it that way; I have not fact-checked this, it is only what I have heard. FORTRAN also stores arrays with the first index varying fastest.)
This means that
for i=...
for j=...
A(i,j)=...
end
end
increases the cache misses (there is too much traffic between RAM and CPU), while
for j=...
for i=...
A(i,j)=...
end
end
works on memory addresses that are close to each other, which increases cache hits and consequently performance. To get a feeling for this, compare implementations 1 and 3 below: implementation 1 took 0.884058 seconds on my system, while implementation 3 took only 0.451835 seconds. This difference grows as the array size increases, and it is larger on a system with less CPU cache. By the way, the best way to implement this calculation is implementation 5. Let MATLAB handle the looping as much as possible; underneath, the built-in operations have been fine-tuned to use as much of the available resources as possible.
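To make the column-major layout described above concrete, here is a minimal sketch. The small matrix A and the variable names are only illustrative; sub2ind is the standard MATLAB function for converting subscripts to a linear (memory-order) index, and 8 bytes per double is assumed.

```matlab
% Small matrix so the layout is easy to inspect
A = reshape(1:12, 3, 4);         % filled column by column: first column is 1,2,3

% Linear (memory-order) index of two neighbours of (1,1) in a 4000x4000 array
sz = [4000 4000];
idxDown   = sub2ind(sz, 2, 1);   % one step down a column
idxAcross = sub2ind(sz, 1, 2);   % one step across a row

% Distance in bytes between element (1,1) and element (1,2), 8 bytes per double
strideBytes = (idxAcross - 1) * 8;

fprintf('A(2,1) = %d, A(1,2) = %d\n', A(2,1), A(1,2));
fprintf('linear index of (2,1): %d, of (1,2): %d, stride: %d bytes\n', ...
        idxDown, idxAcross, strideBytes);
```

Stepping down a column moves to the very next element in memory (linear index 2), while stepping across a row jumps 4000 elements ahead (linear index 4001, or 32000 bytes).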
Hope this helps. Below are a couple of different implementations.
disp('1')
tic
a=rand(4000,4000);
k=size(a,1);
for i=1:k
for j=1:k
a(i,j)=2/(2+a(i,j));
end
end
toc
disp('2')
tic
a=rand(4000,4000);
k=size(a,1);
parfor i=1:k
for j=1:k
a(i,j)=2/(2+a(i,j));
end
end
toc
disp('3')
tic
a=rand(4000,4000);
k=size(a,1);
for j=1:k
for i=1:k
a(i,j)=2/(2+a(i,j));
end
end
toc
disp('4')
tic
a=rand(4000,4000);
k=size(a,1);
parfor j=1:k
for i=1:k
a(i,j)=2/(2+a(i,j));
end
end
toc
disp('5')
tic
a=rand(4000,4000);
a=2./(2+a);
toc
1
Elapsed time is 0.884058 seconds.
2
Elapsed time is 8.502811 seconds.
3
Elapsed time is 0.451835 seconds.
4
Elapsed time is 3.407372 seconds.
5
Elapsed time is 0.216000 seconds.
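Note that the tic/toc timings above are single runs and also include the call to rand. A rough sketch of a slightly more careful check follows, assuming timeit is available in your MATLAB release. The 500x500 size and the variable names are arbitrary choices for illustration, not part of the original test.

```matlab
a0 = rand(500);               % smaller than 4000x4000 so the sketch runs quickly
k  = size(a0, 1);

% Column-wise double loop (inner loop walks down a column, as in implementation 3)
b = a0;
for j = 1:k
    for i = 1:k
        b(i,j) = 2/(2 + b(i,j));
    end
end

% Vectorized form (as in implementation 5)
c = 2 ./ (2 + a0);

% Both approaches should agree up to floating-point rounding
maxDiff = max(abs(b(:) - c(:)));

% timeit calls the function handle several times and returns a typical time,
% with the array creation kept outside the measurement
tVec = timeit(@() 2 ./ (2 + a0));

fprintf('max difference: %g, vectorized time: %g s\n', maxDiff, tVec);
```

The loop variants can be timed the same way by wrapping each one in a function and passing a handle to timeit.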

Comments: 5

Mikhail, 26 Oct 2014
Thanks a lot for such a detailed answer. I didn't know about this memory layout. However, in my example "for" is still faster than "parfor". I know that I could use implementation 5, but I used for loops intentionally: I created this simple problem in order to compare "for" and "parfor".
Maybe someone knows when it is better to use parfor? Could I solve my problem faster with parfor than with for? Thanks
Mikhail, 26 Oct 2014
By the way, in my MATLAB the difference between implementations 2 and 4 is much smaller:
2 Elapsed time is 7.020168 seconds.
4 Elapsed time is 6.894523 seconds.
Generally I don't get a good speedup when I slice a 2D array with only one index (the same case as explained here). It probably has something to do with the way MATLAB distributes the variable among the workers; maybe it treats it as a broadcast variable. Anyway, avoid these cases, or design your program from the beginning so that it uses a single index, such as a{i}(j) or something like this.
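As a rough sketch of the single-index idea applied to the original problem, each parfor iteration can take one whole column, process it in vectorized form, and write it back. This is only an illustration under assumptions: the 400x400 size and the names col, expected, and maxErr are made up here, and without an open parallel pool parfor may simply run serially.

```matlab
a = rand(400);                % smaller than 4000x4000 to keep the sketch quick
expected = 2 ./ (2 + a);      % reference result from the vectorized form
k = size(a, 2);

parfor j = 1:k
    col = a(:, j);            % sliced read: indexed only by the loop variable
    col = 2 ./ (2 + col);     % vectorized work on one contiguous column
    a(:, j) = col;            % sliced write back into the same column
end

% Check that the parallel result matches the reference
maxErr = max(abs(a(:) - expected(:)));
fprintf('max error vs. vectorized result: %g\n', maxErr);
```

Whether this beats a plain for loop depends on the array size and the number of workers; for work this cheap per element, the data transfer may still dominate.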
But if parfor is used properly, it does speed up the code. Try applying a certain function to multiple images, for example. Let's say you have 10 images and you want to apply a convolution to each of them.
Something like this:
parfor i=1:n
% read image i
% apply convolution to image i
% store the convolved result in a new image file or a new variable such as processedImage{i}
end
In this case you would see some speedup.
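A minimal runnable sketch of that pattern is shown below. It assumes synthetic random matrices as stand-ins for images read from disk and a 5x5 averaging kernel; both are hypothetical choices for illustration, as are the names images and kernel. conv2 is the standard MATLAB 2-D convolution function.

```matlab
n = 10;                              % number of images, as in the example above
kernel = ones(5) / 25;               % hypothetical 5x5 averaging kernel

% Synthetic stand-ins for images that would normally be read from files
images = cell(1, n);
for i = 1:n
    images{i} = rand(512);
end

processedImage = cell(1, n);
parfor i = 1:n
    % Each iteration is independent: one image in, one filtered image out
    processedImage{i} = conv2(images{i}, kernel, 'same');
end

fprintf('processed %d images of size %dx%d\n', ...
        numel(processedImage), size(processedImage{1}, 1), size(processedImage{1}, 2));
```

Here each iteration does a substantial amount of work on its own slice of the data, which is the situation where parfor has a chance to pay off; the actual speedup depends on image size, kernel size, and the number of workers.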
Mikhail, 26 Oct 2014
Edited: Mikhail, 26 Oct 2014
I'll try this, thanks
Great answer, thanks @Mohammad Abouali


More Answers (0)


Asked: 25 Oct 2014

Commented: 16 Jun 2022
