About parallel computation and inter-process communication
Hello all!
There is a piece of code that finds patterns in sequences of strings of varying length. Nothing overly complex, except that the main code includes three loops. The basic premise is as follows:
- Load the entire data set (essentially as a cell array) consisting of rows of these sequences.
- Run the main code
- Write the output to a file.
Run sequentially, without any parallel directives, this process takes "x" seconds.
Now: if I change this to:
- Load the entire data set
- Start matlabpool
- invoke spmd(n)
- Run the main code.
- Write the output to file.
The run time is approximately "10x"!
The machine on which this is being run: 12 GB RAM, a 6-core i7.
From my understanding, upon invoking spmd (since I am just interested in letting different workers perform the same job on different sets of data), the total data set is automatically divided among the workers. Logically, then, the run time should decrease.
However, while trying to figure this out, I also divided the data set into worker-specific files, each loaded according to the worker's "labindex". That did not provide any relief either, nor any answers.
I have some background with MPI and F90, so I am assuming that the significantly increased run time with more than one worker is probably due to inter-process communication. If that is so, is there any way to prevent it?
The problem I am trying to solve is a disjoint one: one set of data has no bearing on another, so there is no real need for one worker to talk to another.
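For context, one way to hand each worker only its own share of the data (instead of per-worker files) is a Composite; the following is a minimal sketch, assuming Parallel Computing Toolbox, where the random data and the per-chunk count are placeholders for the real sequences and pattern-matching code:

```matlab
% Sketch: distribute worker-specific chunks via a Composite.
% Assumes a pool of n workers is already open (e.g. via matlabpool).
allData = num2cell(rand(60, 1));     % placeholder for the sequence data
n = 6;
chunks = Composite(n);
for w = 1:n
    chunks{w} = allData(w:n:end);    % send worker w only its share
end
spmd
    local = chunks;                  % each worker sees only its own chunk
    localCount = numel(local);       % placeholder for the real work
end
```

Because each `chunks{w}` assignment transfers only that worker's slice, no worker ever receives the full cell array.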
Any insight would be greatly appreciated. This really has me intrigued.
Cheers!
0 Comments
Answers (1)
Edric Ellis
14 Jul 2014
What sort of data are you passing into SPMD? Inside SPMD, only distributed arrays are automatically operated on in parallel. For example:
x = rand(5000);
xd = distributed.rand(5000);
spmd
    x = x * x;    % every worker operates on its own full copy of 'x'
    xd = xd * xd; % each worker holds a slice of 'xd', and the workers collaborate
end
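To bring the distributed result back to the client afterwards, gather can be used; a minimal sketch ('x2' is just an illustrative name):

```matlab
xd = distributed.rand(5000);
spmd
    xd = xd * xd;    % workers collaborate, each operating on its slice
end
x2 = gather(xd);     % collect the full 5000x5000 result on the client
```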
3 Comments
Edric Ellis
15 Jul 2014
Edited: Edric Ellis, 15 Jul 2014
Unless you need the (MPI-style) communication available within SPMD, you might be better off using PARFOR, which can automatically divide up your problem. For example:
% build 'c', a 50x1 cell array where each cell is 100x100
c = mat2cell(rand(5000, 100), 100 * ones(50, 1), 100);
% operate on 'c' in parallel
out = cell(numel(c), 1); % preallocate the sliced output
parfor idx = 1:numel(c)
    out{idx} = max(abs(eig(c{idx})));
end
The key to getting PARFOR to work in this case is that you index into your cell array ("c" in the example above) using the loop variable; this ensures the data is 'sliced', and can therefore be operated on efficiently in parallel.
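As a sketch of the distinction (the variable names are illustrative, not from the original code): a variable indexed by the loop variable is sliced and sent to the workers piecewise, while a variable used whole inside the loop is broadcast in full to every worker:

```matlab
c = num2cell(rand(50, 1));
k = rand(100);                     % used whole: broadcast to every worker
out = cell(numel(c), 1);           % preallocate the sliced output
parfor idx = 1:numel(c)
    out{idx} = c{idx} * trace(k);  % 'c' is sliced; 'k' is broadcast
end
```

Keeping broadcast variables small is what keeps the per-worker communication cost low.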