Parallel cluster profile validation fails because the plugin function 'independentSubmitFcn.m' errored

I got an error while validating a parallel cluster profile. The error message is:
Error Report: Job submission failed because the plugin function 'independentSubmitFcn.m' errored.
Caused by: Brace indexing is not supported for variables of this type.
I am using the matlab-slurm plugins provided in this GitHub repo. This is confusing, because the same cluster profile validated successfully several days ago.
Thanks for any reply!
  Comments: 2
Damian Pietrus on 27 Oct 2023
A few questions before we do some troubleshooting:
  • Is the client that's submitting jobs on the cluster itself, or are you on a remote machine?
  • Have you made any edits to the plugin files themselves?
  • Does the error continue after restarting MATLAB?
We can try to manually submit the job to get more information from the log file. Please make sure that the Slurm cluster is set as your default from the "Parallel" drop-down menu, then try the following steps:
c=parcluster;
% Independent job
j=batch(c,@pwd,1,{});
If the job successfully submits, we can then wait for the job to finish before getting the log file. If the job does not submit, please let me know if the error message is the same as in your post or if it changed.
% If the job submitted, wait for it to finish
j.wait
% Get the log file for the independent job
c.getDebugLog(j.Tasks(1));
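As a rough sketch, the steps above could also be wrapped in a try/catch block so that a failed submission prints the full error report and the file and line where "Brace indexing is not supported" was raised. This assumes the Slurm profile is the default profile; the loop variables and the printed format are illustrative only.

```matlab
% Sketch only: assumes the Slurm profile is the default cluster profile.
c = parcluster;
try
    % Same test job as above: run pwd on the cluster, one output, no inputs.
    j = batch(c, @pwd, 1, {});
    wait(j);
    % Display the debug log for the single task of the independent job.
    disp(getDebugLog(c, j.Tasks(1)));
catch err
    % Print the full report, including the call stack and any causes.
    disp(getReport(err, 'extended'));
    % List the file and line of each stack frame of the top-level error.
    for k = 1:numel(err.stack)
        fprintf('%s (line %d)\n', err.stack(k).file, err.stack(k).line);
    end
    % Do the same for each underlying cause, if any were attached.
    for n = 1:numel(err.cause)
        s = err.cause{n}.stack;
        for k = 1:numel(s)
            fprintf('cause %d: %s (line %d)\n', n, s(k).file, s(k).line);
        end
    end
end
```

If the stack points into a plugin file, the reported line number may help narrow down which variable had an unexpected type at the time of the failure.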
Wei Jianwen on 9 Nov 2023
Hi Damian,
Here is some additional information:
  • The client is a remote machine that submits Slurm jobs; I need to enter a username and password when parpool starts
  • I haven't modified the plugin function 'independentSubmitFcn.m' mentioned in the error message
  • I can start a parallel pool normally after restarting MATLAB, but the same error may occur again after several days
  • I set the Slurm cluster as the default in the Cluster Profile Manager. Is that right?
Since restarting MATLAB fixes the problem, I haven't tried submitting jobs manually yet. I will add another comment to this post if I do.
Thanks for your reply! :D


Answers (1)

Damian Pietrus on 10 Nov 2023
Hey Wei,
Thanks for sending that additional information. MATLAB uses SSH to connect to and run commands on a remote cluster. When MATLAB is left open for a long period of time, that connection may break for one reason or another. Once it is broken, any further interaction with the cluster will fail until a new connection is established. To work around the issue, you can restart MATLAB, or you can try the following to see if it helps:
clear all force
c=parcluster;
% Interact with the cluster here. You can use the Job Monitor, submit a
% new job, etc.

Release

R2023a
