MATLAB Answers

How are iterations assigned to workers in parfor?

I am currently using parfor to process multiple raw data files. In the loop body, it first checks whether the raw file has already been processed, and processes it only if there is no existing output, like this:
RawDatalist = dir(fullfile(RawDataFolder, '*.txt'));
NumRawData = length(RawDatalist);
parfor i = 1:NumRawData
    outputExists = false;  % placeholder: check whether output for RawDatalist(i).name already exists
    if outputExists
        Execute = false;
    else
        Execute = true;
    end
    if Execute
        % Process RawDatalist(i).name
    end
end
Obviously some iterations will take much less time than others, because no calculation is involved. I am wondering whether iterations are 1) divided among the workers at the start of parfor, or 2) handed out one by one as each worker becomes available. If it is the first case, some workers will sit idle while others are still busy, and I need to move the existence check out of the parfor loop.
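A minimal sketch of moving the existence check out of the loop: a cheap serial pass builds the list of files that still need processing, and parfor then iterates only over those, so every parallel iteration does real work. The output naming convention (`<name>_processed.mat` inside an `OutputFolder`) and the demo files created at the top are hypothetical assumptions for illustration; the question does not say how outputs are named or stored.

```matlab
% Demo setup (hypothetical): three raw files in a temporary folder, and an
% output for "b.txt" that already exists. Replace with your real folders.
RawDataFolder = tempname;
mkdir(RawDataFolder);
OutputFolder = fullfile(RawDataFolder, 'processed');
mkdir(OutputFolder);
demoNames = {'a', 'b', 'c'};
for n = 1:numel(demoNames)
    fclose(fopen(fullfile(RawDataFolder, [demoNames{n} '.txt']), 'w'));
end
fclose(fopen(fullfile(OutputFolder, 'b_processed.mat'), 'w'));

RawDatalist = dir(fullfile(RawDataFolder, '*.txt'));
NumRawData = numel(RawDatalist);

% Serial pass: the existence check is cheap, so do it outside parfor.
% ASSUMPTION: the output for "x.txt" is OutputFolder/x_processed.mat.
needsWork = false(1, NumRawData);
for i = 1:NumRawData
    [~, baseName] = fileparts(RawDatalist(i).name);
    outFile = fullfile(OutputFolder, [baseName '_processed.mat']);
    needsWork(i) = ~exist(outFile, 'file');
end
todo = RawDatalist(needsWork);

% Parallel pass: only files without an existing output reach the workers.
parfor k = 1:numel(todo)
    inFile = fullfile(RawDataFolder, todo(k).name);
    % ... process inFile and save its output here (omitted) ...
end
```

In this demo, `todo` ends up holding only `a.txt` and `c.txt`. Because the skipped files never enter the parfor range, the question of how cheap iterations are divided among workers no longer arises.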

  2 Comments

Mohammad Sami 8 Jan 2020
Please see here. Your code block in parfor should be independent of other iterations. This will ensure correct parallel execution. https://www.mathworks.com/help/releases/R2019b/parallel-computing/decide-when-to-use-parfor.html
Each execution of the body of a parfor-loop is an iteration. MATLAB workers evaluate iterations in no particular order and independently of each other. Because each iteration is independent, there is no guarantee that the iterations are synchronized in any way, nor is there any need for this. If the number of workers is equal to the number of loop iterations, each worker performs one iteration of the loop. If there are more iterations than workers, some workers perform more than one loop iteration; in this case, a worker might receive multiple iterations at once to reduce communication time.
Yi-xiao Liu 8 Jan 2020
"a worker might receive multiple iterations at once to reduce communication time."
I would like to know more details: exactly when will a worker receive multiple iterations at once?


Accepted Answer

Edric Ellis 8 Jan 2020
As @Mohammad already commented, the parfor implementation automatically divides up the iterations of the loop onto the workers. Since R2019a, you can have some control over this division using parforOptions. The default division works well in most situations, even when the loop iterations do not take equal amounts of time. However, if there is a large imbalance, the division might not work well, and it may indeed be worth pre-computing which iterations need real work to be done.
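A hedged sketch of using parforOptions to override the default division: with `'RangePartitionMethod','fixed'` and `'SubrangeSize',1`, each subrange contains a single iteration, so iterations are handed out one at a time as workers become free (option 2 in the question), at the cost of more communication. This assumes R2019a or later with Parallel Computing Toolbox, so it would not run on the R2018a release tagged on this question; the `pause` and `i^2` lines are made-up stand-ins for uneven work.

```matlab
% Requires R2019a or later (parforOptions) and Parallel Computing Toolbox.
pool = gcp;   % current pool, or a new one if automatic pool creation is enabled

% 'fixed' partitioning with SubrangeSize 1: each subrange holds a single
% iteration, so iterations are dispatched one by one as workers free up.
opts = parforOptions(pool, 'RangePartitionMethod', 'fixed', 'SubrangeSize', 1);

N = 20;
result = zeros(1, N);
parfor (i = 1:N, opts)
    if mod(i, 5) == 0
        pause(0.5);          % stand-in for the occasional expensive iteration
    end
    result(i) = i^2;         % stand-in for real per-iteration work
end
```

A larger `SubrangeSize` sits between the two extremes: fewer round trips to the client, but coarser load balancing.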

  2 Comments

Yi-xiao Liu 8 Jan 2020
Can you be more specific? What counts as a "large imbalance"? Is there any way to monitor or profile how well the division works?
Edric Ellis 8 Jan 2020
Unfortunately, it's not terribly straightforward to come up with a way of monitoring this. One possibility is to use mpiprofile - it's not really intended for profiling parfor loops, but it does work. For example:
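parpool('local', 4);                 % start a pool of 4 local workers
spmd, mpiprofile('on'); end          % turn on the profiler on every worker
parfor idx = 1:100
    if idx < 5
        pause(1);                    % make the first 4 iterations deliberately slow
    end
end
spmd, mpiprofile('viewer'); end      % open the parallel profiler viewer with the collected data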
If you then select "Compare (max vs.min TotalTime)" in the UI and look at the calls to remoteParallelFunction, they show how much time each worker spent executing loop iterations.
The degree of imbalance that becomes problematic is hard to quantify. In general, though, the default loop division splits the iterations into about 3 * NumWorkers contiguous "ranges" of differing sizes. The worst-case scenario is one where a single "range" takes longer than all the others put together. It also matters where the slower ranges turn up: if they run last, the normal dynamic scheduling doesn't have a chance to "hide" them by running other work in parallel.
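The effect of where a slow range lands can be illustrated with a simplified model. This sketch is an assumption, not parfor's actual partitioning: it splits N iterations into 3 * NumWorkers contiguous ranges of equal size (the real default uses ranges of differing sizes) and hands each range, in order, to whichever worker becomes free first. One range is made four times as expensive per iteration, and the model compares running it first versus last.

```matlab
% Simplified scheduling model (an ASSUMPTION, not parfor's real algorithm):
% contiguous equal-sized ranges, each given in order to the first free worker.
NumWorkers = 4;
N = 120;
nRanges = 3 * NumWorkers;                     % "about 3 * NumWorkers" ranges
edges = round(linspace(0, N, nRanges + 1));   % range r is edges(r)+1 : edges(r+1)

% Per-iteration cost in arbitrary units: 1 normally, 4 inside one slow range.
heavyFirst = ones(1, N);  heavyFirst(1:10) = 4;    % slow range runs first
heavyLast  = ones(1, N);  heavyLast(N-9:N) = 4;    % slow range runs last
scenarios = {heavyFirst, heavyLast};

makespan = zeros(1, numel(scenarios));
for s = 1:numel(scenarios)
    cost = scenarios{s};
    finish = zeros(1, NumWorkers);            % time at which each worker is free
    for r = 1:nRanges
        rangeCost = sum(cost(edges(r)+1 : edges(r+1)));
        [tFree, w] = min(finish);             % first worker to become free
        finish(w) = tFree + rangeCost;
    end
    makespan(s) = max(finish);                % wall-clock time of the whole loop
end
disp(makespan)
```

With these numbers the total work is 150 units, so a perfect split over 4 workers would finish at 37.5. Running the slow range first finishes at 40, because the other three workers absorb the remaining ranges meanwhile; running it last finishes at 60, because by then there is nothing left to run alongside it.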

Release: R2018a
