Distributed job validation passes but parallel job validation fails for Parallel Computing Toolbox.
Hi,
I am trying to use the MATLAB Parallel Computing Toolbox on a cluster. When I validate my scheduler configuration, the distributed job passes validation, but the parallel job fails with the following error:
Stage: Parallel Job
Status: Failed
Description: The given stage reached the default or user-specified timeout.
Command Line Output:
2346069.pbs001.palmetto.clemson.edu
Additionally, I find the following error in the log file on the cluster:
Node file: /var/spool/torque/aux//2346072.pbs001.palmetto.clemson.edu
Starting SMPD on node0218 node0219 node0275 node0276 ...
ssh node0218 "/opt/matlab-R2010a/bin/mw_smpd" -s -phrase MATLAB -port 26072
Warning: Permanently added 'node0218,10.125.1.218' (RSA) to the list of known hosts.
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
Launching smpd failed for node: node0218
Stopping SMPD on ...
Exiting with code: 0
The settings which I have used for the scheduler are:
set(sched, 'ClusterMatlabRoot', '/opt/matlab-new');
set(sched, 'HasSharedFilesystem', true);
set(sched, 'ClusterOsType', 'unix');
set(sched, 'SubmitFcn',{@pbsNonSharedSimpleSubmitFcn,clusterHost, remoteDataLocation});
set(sched, 'ParallelSubmitFcn',{@pbsNonSharedParallelSubmitFcn, clusterHost, remoteDataLocation});
I have also set up a passwordless ssh connection using an RSA key. Could anyone tell me what is wrong with my configuration?
Thanks in advance.
Comments: 1
Sarah Wait Zaranek
March 14, 2011
Did you set up passwordless ssh between all nodes of the cluster?
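One way to check this is to try a non-interactive ssh to every node in the job's allocation. The following is only a minimal sketch, not something from the toolbox: the function name check_nodes and the SSH_CMD override are made up for illustration, and it assumes it runs on a compute node inside a PBS/Torque job, where PBS_NODEFILE points at the node file shown in the log above.

```shell
#!/bin/sh
# Sketch: test non-interactive ssh to every node listed in a PBS node file.
# SSH_CMD is a hypothetical override so the loop can be dry-run without a cluster.
SSH_CMD="${SSH_CMD:-ssh}"

check_nodes() {
    nodefile="$1"
    failed=0
    # The node file lists a host once per allocated slot, so de-duplicate first.
    for node in $(sort -u "$nodefile"); do
        # BatchMode=yes makes ssh fail instead of prompting for a password,
        # which is the situation mw_smpd is in when it is launched by the job.
        if $SSH_CMD -o BatchMode=yes -o ConnectTimeout=5 "$node" true \
                </dev/null >/dev/null 2>&1; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            failed=1
        fi
    done
    return "$failed"
}
```

Inside a job script this could be called as check_nodes "$PBS_NODEFILE". Any node reported as FAIL would be expected to produce the same "Permission denied" message when the parallel job tries to start SMPD on it, since that launch goes from one compute node to another rather than from your desktop to the head node.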