Submit job to specified workers in an MJS cluster

Data Analysis on 21 Dec 2015
Commented: Peter Schwander on 7 Jan 2020
My tasks consume a large amount of memory, so I should not submit more than 5 tasks to each node in our group cluster (each node has 8 workers). Can I tell MJS to use only 5 workers on each node? If that is not possible, can we do it manually?
  Comments: 1
Peter Schwander on 7 Jan 2020
I have a somewhat related question. In MATLAB R2019a, I want successive tasks to go to different worker nodes. This was possible, e.g., in MATLAB R2015b by choosing appropriate names for the workers, i.e.
./startworker -name worker000 -jobmanagerhost boltzmann.phys.uwm.edu -remotehost compute-0-0 -v -clean
./startworker -name worker001 -jobmanagerhost boltzmann.phys.uwm.edu -remotehost compute-0-1 -v -clean
./startworker -name worker002 -jobmanagerhost boltzmann.phys.uwm.edu -remotehost compute-0-2 -v -clean
./startworker -name worker003 -jobmanagerhost boltzmann.phys.uwm.edu -remotehost compute-0-3 -v -clean
However, with MATLAB R2019a, this scheme does not work anymore.


Accepted Answer

Thomas Ibbotson on 4 Jan 2016
Unfortunately, this is not possible. You could temporarily reduce the number of workers on each node to 5. The MJS startworker and stopworker scripts can be run remotely from the client machine. Of course you would not want to stop a worker that was running someone else's job, so you would need to pause the queue and wait for all other jobs to finish running first.
Something like the following (untested) code should work (replace 'MyMJSProfile' with your MJS profile):
c = parcluster('MyMJSProfile');
% Pause the queue so no more jobs will run
pause(c);
% Create and submit our job to the queue of this cluster
j = batch(c, 'myJobScript');
% Promote the job until it is at the top of the queue so it will
% run next
while true
    [~, q, ~, ~] = findJob(c);
    if q(1) == j
        break;
    else
        promote(c, j);
    end
end
% Wait for all other jobs to finish running
[~, ~, runningJobs, ~] = findJob(c);
for k = 1:numel(runningJobs)
    wait(runningJobs(k));
end
% Now run stopworker on each node until there are only 5 workers on each node.
% Note I've missed out the code that loops through the nodes and stops the
% right number of workers.
system([matlabroot '/toolbox/distcomp/bin/stopworker -remotehost myNode1 -name myWorker1']);
% Resume the queue so the job runs
resume(c);
wait(j);
% The job has finished now so we can start the workers again
system([matlabroot '/toolbox/distcomp/bin/startworker -remotehost myNode1 -name myWorker1']);
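The loop that the comments above leave out could look roughly like the following. This is only a sketch and is just as untested against a real cluster as the code above. The node names and the worker naming scheme (<node>_worker<index>) are assumptions made for illustration, so substitute whatever names your workers were actually started with. The script path and the -remotehost and -name flags are taken from the commands above. With dryRun set to true, the commands are only printed so they can be checked first.

```matlab
% Hypothetical cluster layout: replace with your own node names.
nodes          = {'myNode1', 'myNode2'};
workersPerNode = 8;   % workers currently running on each node
workersToKeep  = 5;   % workers that should stay up on each node
dryRun         = true; % print the commands instead of running them

% Same script location as in the code above.
binDir = [matlabroot '/toolbox/distcomp/bin/'];

% Build one stop command and one matching start command per surplus worker.
stopCmds  = {};
startCmds = {};
for n = 1:numel(nodes)
    for w = (workersToKeep + 1):workersPerNode
        % Assumed naming scheme: <node>_worker<index>
        workerName = sprintf('%s_worker%d', nodes{n}, w);
        args = sprintf('-remotehost %s -name %s', nodes{n}, workerName);
        stopCmds{end+1}  = [binDir 'stopworker ' args];  %#ok<SAGROW>
        startCmds{end+1} = [binDir 'startworker ' args]; %#ok<SAGROW>
    end
end

% Stop the surplus workers (do this while the queue is paused and after
% all other running jobs have finished).
for k = 1:numel(stopCmds)
    if dryRun
        fprintf('%s\n', stopCmds{k});
    else
        status = system(stopCmds{k});
        if status ~= 0
            warning('Command failed: %s', stopCmds{k});
        end
    end
end
```

Once the job has finished, the same kind of loop over startCmds brings the stopped workers back. Keeping the two lists paired means exactly the workers that were stopped are restarted.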

More Answers (0)
