Is it possible to run a batch job on an MJS Cloud Center cluster with SpmdEnabled set to false?

I'm running some parallel pool computations on an MJS cluster (created with Cloud Center) using the batch command. I set that up by following this help article: https://www.mathworks.com/help/parallel-computing/run-a-batch-job.html#bu62o45
I'm running with a pool of several hundred workers using the 'Pool' argument to the batch command. Unfortunately, the entire job will fail if any of the workers crash, which happens quite frequently.
Searching online, I've found that setting SpmdEnabled to false when using the parpool command will allow the task to complete on the remaining workers. I'd like to set this flag, but can't seem to find a way to do it using the batch command. Is there another way to disable SPMD support but also use the batch command with a parallel pool to submit a job to a cloud cluster?

Accepted Answer

Edric Ellis on 4 Jan 2023
Unfortunately, this option is not supported at the moment for batch jobs. I realise it's probably rather a big change to your code, but you could use independent tasks using createJob and createTask.
Comments
Hridu Jain on 5 Jan 2023
Thanks, Edric. If I were to use your suggestion and rework my code, would I need to eliminate use of parfor and parsim?
I believe parfor works when using job = createCommunicatingJob(...,'Type','pool',...) but this would once again have SPMD support, right?
So, I would instead need to rework the code to use createJob and call createTask in a for loop. Is that right? And would there be any way to use parsim with this setup?
Here's a rough outline of how my program is currently set up:
function y = run_simulations()
    % initialize
    [a,b] = initialize_stuff();
    % preprocess and prep for simulation
    parfor i = 1:500
        SimIn(i) = preprocess_simulation_inputs(a,b);
    end
    % run simulations
    SimOut = parsim(SimIn);
    % post-process
    y = postprocess(SimOut);
end
I call my run_simulations function with the batch command with the Pool argument.
Edric Ellis on 6 Jan 2023
I'm afraid you'd need to change your parfor loop into a series of independent tasks using createTask. (Likewise, you wouldn't be able to use parsim).
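As a rough illustration of what that conversion might look like, here is a minimal sketch that replaces the parfor loop from the outline above with one independent task per iteration. The profile name 'MyMJSProfile' is a hypothetical placeholder, and initialize_stuff and preprocess_simulation_inputs are the poster's own functions, assumed to be available on the workers. It covers only the preprocessing step. The simulation step would need to be moved into a task function in a similar way, since parsim is not available here.

```matlab
% Sketch only, under the assumptions stated above.
c = parcluster('MyMJSProfile');   % hypothetical cluster profile name
[a, b] = initialize_stuff();      % poster's own function

% An independent job: its tasks do not communicate with each other,
% so one failing task does not take the whole job down with it.
job = createJob(c);
for i = 1:500
    % One task per former parfor iteration: 1 output, inputs {a, b}
    createTask(job, @preprocess_simulation_inputs, 1, {a, b});
end
submit(job);
wait(job);

% Collect results task by task. Tasks that produced no output
% (for example because they errored) are recorded and skipped.
tasks = job.Tasks;
results = cell(1, numel(tasks));
failed = false(1, numel(tasks));
for i = 1:numel(tasks)
    out = tasks(i).OutputArguments;
    if isempty(out)
        failed(i) = true;
    else
        results{i} = out{1};
    end
end
fprintf('%d of %d tasks produced no output.\n', nnz(failed), numel(tasks));
```

Reading each task's OutputArguments, rather than calling fetchOutputs on the job, is a deliberate choice in this sketch: fetchOutputs throws an error if any task in the job failed, whereas the per-task loop lets the successful results be used and the failed indices be resubmitted or ignored.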

Release

R2022a
