parfor loop errors on AMD cores limits
이전 댓글 표시
Hello,
I am trying to run a simple parfor script on nodes on our cluster. The code works fine until I try to use > 46 CPUs (workers) at once, on one server. Some of our latest nodes have 128 AMD cores. I can run up to 56 cores on our Intel CPU servers (nodes) , but on any AMD I get errors (java runtime and others) when using >46 cores. It would be great to use all 128 cores on these new nodes for our MATLAB code. I have tried increasing memory and I still get these errors when using > 46 cores.
I will attach the MATLAB crash dump, code and sbatch files.
My sbatch file (I have tried many, many different parameters) -
#!/bin/bash
#SBATCH -J pfor_matlab
#SBATCH -o pfor".%j".out
#SBATCH -e pfor".%j".err
#SBATCH -t 45:00
#SBATCH -N 1
#SBATCH -p normal
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=48
module load matlab
hostname -s
env | egrep SLURM
matlab -nosplash -nodesktop -r "pfor"
The sbatch produces this output in the SLURM .err file-
Error using parpool (line 145)
Parallel pool failed to start with the following error. For more detailed
information, validate the profile 'local' in the Cluster Profile Manager.
Error in pfor (line 5)
parpool('local', str2num(getenv('SLURM_CPUS_PER_TASK')))
Caused by:
Error using parallel.internal.pool.InteractiveClient>iThrowWithCause (line
670)
Failed to initialize the interactive session.
Error using
parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus
(line 781)
The interactive communicating job failed with no message
Thank you for any pointers!
Mark
댓글 수: 2
Walter Roberson
2021년 2월 9일
The volunteers are not likely to know the solution for this; you should open a support case.
Mark PIERCY
2021년 2월 9일
채택된 답변
추가 답변 (0개)
카테고리
도움말 센터 및 File Exchange에서 Third-Party Cluster Configuration에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!