Failed to start a 'local' parpool of 32 workers

Ninh DO on 29 Dec 2018
Answered: Jason Ross on 2 Jan 2019
Hi everyone,
I tried to start a 'local' parpool of 32 workers, but it failed. This is MATLAB R2017b on an HPC system that has 32 MATLAB DCS licenses.
>> c = parcluster()
c =
Local Cluster
Properties:
Profile: local
Modified: false
Host: n0060
NumWorkers: 32
NumThreads: 1
JobStorageLocation: /global/home/users/ninhdo/.matlab/local_cluster_jobs/R2017b
RequiresMathWorksHostedLicensing: false
Associated Jobs:
Number Pending: 0
Number Queued: 0
Number Running: 0
Number Finished: 0
>> p = c.parpool(32)
Starting parallel pool (parpool) using the 'local' profile ...
Error using parallel.Cluster/parpool (line 86)
Failed to start a parallel pool. (For information in addition to the causing
error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line
675)
Failed to start pool.
Error using parallel.Job/submit (line 351)
An unexpected error occurred accessing properties: "CaptureDiary"
"CreateDateTime" "CreateTime" "DependentFiles" "Diary" "Error"
"ErrorIdentifier" "ErrorMessage" "FinishDateTime" "FinishTime"
"Function" "InputArguments" "DiagnosticWarnings" "Name"
"NumOutputArguments" "OutputArguments" "StartDateTime" "StartTime"
"StateEnum" "Worker"
Error using save
Error closing file
/global/home/users/ninhdo/.matlab/local_cluster_jobs/R2017b/Job5/Task26.out.mat.
The file may be corrupt.
I tried distcomp.feature( 'LocalUseMpiexec', false ) but it didn't help solve the problem. Do you have any idea why?

Answers (1)

Jason Ross on 2 Jan 2019
There seems to be an issue accessing your home directory: the pool fails while saving a task file under the JobStorageLocation in /global/home/users/ninhdo. Try changing the JobStorageLocation to a local directory on the host, e.g. make a directory called /tmp/ninhdo/jobstorage and change the "JobStorageLocation" in the "local" profile to point there. You can change this property through the Parallel > Manage Cluster Profiles menu (the Cluster Profile Manager); just edit the "local" profile.
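The same change can also be made from the command line instead of the profile manager. The following is only a minimal sketch, assuming /tmp on the compute node is writable by your account and has enough free space; the directory name is the one suggested above, and the variable names are just for illustration.

```matlab
% Sketch: point the 'local' cluster at node-local job storage, then start the pool.
% Assumption: /tmp/ninhdo/jobstorage is writable on the host running MATLAB.
c = parcluster('local');

jobDir = fullfile('/tmp', 'ninhdo', 'jobstorage');
if ~exist(jobDir, 'dir')
    mkdir(jobDir);                 % creates missing parent folders as well
end

c.JobStorageLocation = jobDir;     % affects this cluster object only
% saveProfile(c);                  % optional: write the change back to the 'local' profile

p = parpool(c, 32);
```

Without the saveProfile call, the new location applies only to the cluster object c in the current session, which makes this a low-risk way to test whether the storage location is the cause.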
As for the underlying issue with your home directory, it could be that the file is corrupt, that the directory is being accessed by another local cluster elsewhere, that there is a permissions issue to/from this host, etc.
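To narrow down which of these applies, a quick probe of the current storage directory may help. This is a rough sketch, not a definitive diagnostic: it only checks that MATLAB can write and read back a MAT-file in the JobStorageLocation and lists any jobs already stored there. The file name write_probe.mat is a made-up placeholder.

```matlab
% Sketch: check whether the current JobStorageLocation accepts MAT-file writes.
c = parcluster('local');
storageDir = c.JobStorageLocation;

% Report the write permission as MATLAB sees it.
[ok, attrs] = fileattrib(storageDir);
if ok
    fprintf('UserWrite on %s: %d\n', storageDir, attrs.UserWrite);
else
    fprintf('Could not query attributes of %s\n', storageDir);
end

% Round-trip a small MAT-file (hypothetical file name) through the directory.
probeFile = fullfile(storageDir, 'write_probe.mat');
payload = rand(500);
try
    save(probeFile, 'payload');
    loaded = load(probeFile);
    assert(isequal(loaded.payload, payload), 'Round-trip mismatch');
    delete(probeFile);
    disp('Write probe succeeded.');
catch err
    fprintf('Write probe failed: %s\n', err.message);
end

% Jobs already in this location may come from other sessions using the same directory.
disp(c.Jobs);
```

If the probe fails here but succeeds after switching to a node-local directory, that points at the shared home file system rather than at the pool itself.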
