MATLAB Answers

Calling parpool with SpmdEnabled = False

Jose Sanchez 29 Jan 2020
Answered: Edric Ellis 30 Jan 2020
I am having an issue with my university cluster. When one worker crashes, for a reason like "communication with the worker is lost probably due to a network problem", all the remaining workers crash too, and this can happen again and again!
Because I am using parfor, which, unlike spmd, doesn't require communication among the workers, I would like all the other workers to finish their jobs.
I read online that you can solve this issue by starting the pool on your HPC cluster with the parameter "SpmdEnabled" set to false. I did that, but MATLAB ignored my request, as you can see in the warning message below.
My question is: how can I solve this issue?
parpool('HPCServerProfile1', 160, 'SpmdEnabled', false)
Starting parallel pool (parpool) using the 'HPCServerProfile1' profile ...
Warning: Disabling SPMD on parallel pools is not supported on this cluster type. Set 'SpmdEnabled' value to true.
Connected to the parallel pool (number of workers: 160).


Accepted Answer

Edric Ellis 30 Jan 2020
Unfortunately, only the MJS and Local cluster types support SpmdEnabled = false. You might be able to use the "cluster parfor" approach instead; see the documentation. Basically, you would transform your main parfor loop like so:
% Important: do *not* create a parallel pool prior to running this!
% In fact, you may wish to call "delete(gcp('nocreate'))" to be sure.
% Get the cluster object
c = parcluster('HPCServerProfile1');
% Create the parforOptions structure
opts = parforOptions(c);
% Run the parfor loop directly on the cluster with no parallel pool
parfor (idx = 1:10000, opts)
    % Body of the parfor loop
end
This should be more robust; however, it does incur more overhead than the interactive parpool approach.
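One possible way to trim some of that overhead is to group iterations into larger fixed-size subranges through the 'RangePartitionMethod' and 'SubrangeSize' options of parforOptions. The following is only a sketch: the profile name is taken from the question, while the iteration count, the subrange size of 100, the results variable, and the placeholder loop body are assumptions for illustration and would need tuning for a real workload.

```matlab
% Sketch only: the profile name comes from the question; the loop body,
% the iteration count, and the subrange size are illustrative assumptions.

% Make sure no interactive parallel pool is open
delete(gcp('nocreate'));

% Get the cluster object
c = parcluster('HPCServerProfile1');

% Ask for fixed-size subranges of iterations (assumed size: 100)
opts = parforOptions(c, ...
    'RangePartitionMethod', 'fixed', ...
    'SubrangeSize', 100);

numIterations = 10000;
results = zeros(1, numIterations);

% Run the loop directly on the cluster, without a parallel pool
parfor (idx = 1:numIterations, opts)
    results(idx) = idx^2;   % placeholder for the real loop body
end
```

Larger subranges should mean fewer, longer pieces of work sent to the cluster, at the cost of coarser load balancing, so the best value depends on how long each iteration takes.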
