필터 지우기
필터 지우기

Parfor with Iterations Having Unequal Execution Times

조회 수: 10 (최근 30일)
Greg
Greg 2011년 12월 7일
답변: Martin Ryba 2021년 4월 7일
I've run into an interesting problem with parfor...
I'm trying to use parfor to evaluate a bunch of different input cases of a program simultaneously. I know beforehand that the cases are not going to take the same amount of time to evaluate, and I can tell you with reasonable accuracy which ones are going to take the longest. The number of cases that each worker is running is very low (about 40 total cases being distributed to 12 workers).
Some of these cases will take as much as three or four times longer than the faster ones, and so a lot of user time could be saved if I could predict what order the parfor loop was going to execute in and try to distribute the jobs intelligently.
The simple example would be this: say I have four jobs to run on two workers, three of which will take 10 seconds and one of which will take 30 seconds. Obviously the ideal solution is for one worker to evaluate the long job and the other worker to evaluate the other three in the same amount of time. What tends to happen, though, is that both workers will evaluate one of the short jobs first, and then one of them will wind up with 40 seconds of working time while the other worker sits idle for half that time.
Even more annoying, I think that sometimes there will be an available job sitting in the "queue" with a worker available, but it won't run because that job is supposed to go to a worker that is busy. The example here would be two 10-second jobs and two 30-second jobs, and one worker winds up evaluating both 30-second jobs.
Anyone have any clever ideas of how to work around this problem?

채택된 답변

Titus Edelhofer
Titus Edelhofer 2011년 12월 7일
Hi,
if your cases are input cases you might try to use the (slightly more elaborate way) of distributing the code using a job with 40 tasks. It helps at least for the distribution of the tasks, because they get distributed one by one. You might still end up that the longest run is the last in the row to be computed.
One step further would be to create more then one job where you distribute your tasks more clever.
Titus
  댓글 수: 1
Greg
Greg 2011년 12월 7일
After looking into the job scheduler stuff a bit, I think this is going to be the "right" way to do what I want, though it's going to take a little bit more effort to get there.

댓글을 달려면 로그인하십시오.

추가 답변 (4개)

Jaap
Jaap 2011년 12월 7일
is it possible to make an inner loop that goes through the faster cases?
or: if you know everything beforehand, make two parfor loops. while in the first, you will do all short/fast cases (hoping that rem(12,cases)==0). the next loop, you will do the rest in the same fashion.
  댓글 수: 1
Greg
Greg 2011년 12월 7일
That's not a bad idea, but it might come back to bite me if I'm a little bit wrong about which one is going to take the longest.

댓글을 달려면 로그인하십시오.


Titus Edelhofer
Titus Edelhofer 2011년 12월 7일
Hi,
here a rather ugly way which gives you full control on which worker processes which index:
function myparfortest
% this is a usual parfor loop
x = zeros(200, 1);
parfor i=1:200
x(i) = rank(magic(i));
end
% use spmd to do the same, although
spmd
y = zeros(200, 1);
% suppose we have 3 workers:
% worker1: process i=1, 4, 7, ...
% worker2: process i=2, 5, 8, ...
% worker3: process i=3, 6, 9, ...
for i=labindex:numlabs:200
y(i) = rank(magic(i));
end
end
% now collect the result
Y = zeros(200, 1);
for i=1:length(y)
yi = y{i};
Y(i:length(y):end) = yi(i:length(y):end);
end
% let's see if we did right:
isequal(x, Y)
Titus
  댓글 수: 1
Greg
Greg 2011년 12월 7일
This is a little hack-ish, but it's easy and it has promise if you know the execution order precisely beforehand. It won't dynamically re-evaluate like the job scheduler would, though, if some jobs start finishing earlier than expected.

댓글을 달려면 로그인하십시오.


Mikheil Azatov
Mikheil Azatov 2016년 12월 16일
A bit late to the party, but have you figured this out since ? I'm having the same issue with R2016b.

Martin Ryba
Martin Ryba 2021년 4월 7일
Hi I've noticed similar behavior all the way to r2020a. I have a big parfor loop where the execution time of each step is on the order of 10 minutes and I need to do 5000+ of them. I can get 100-150 workers. This gets submitted to a cluster (LSF) so if other jobs are on the node it will slow that node down. So, it appears the work unit allocation is frozen at the start of the loop, and you get to wait for the slowest of your workers to finish his list of jobs before the loop completes. Any good workarounds to give a more dynamic queue since the activity to manage the queue is small compared to the duration of each task?

카테고리

Help CenterFile Exchange에서 Parallel Computing Fundamentals에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by