Unable to start parallel pool for more than 12 cores

Asked by Xiaofan Cui on 19 Dec 2020
Latest comment: Raymond Norris on 13 Feb 2021
Hi
My MATLAB version is R2019a, and my server has 8 CPUs (Intel(R) Xeon(R) CPU E7-8860 @ 2.27GHz), each with 10 cores and hyperthreading. I therefore expected to be able to set "preferred number of workers in a parallel pool" to at most 80. However, whenever I set it higher than 12, MATLAB returns "failed to start parallel pool". This is my cluster profile:
Thanks

Answers (1)

Raymond Norris on 19 Dec 2020
I'm a bit confused about how setting the default size of a parallel pool would throw "failed to start parallel pool", since setting the size in the profile doesn't start a pool. I gather that your server has 8 Intel E7-8860 CPUs with 10 cores per socket plus hyperthreading (that is, the 10 cores don't include the hyperthreads). Where are you running your MATLAB client: on your local workstation or on one of the server nodes?
Although you can run a local pool on a single node of the server, I'm wondering whether you're running MATLAB on your local workstation, where there are fewer cores. Run the following in MATLAB on the machine where you're setting the profile.
feature numcores
The local profile provides the settings for a local pool on the machine where the MATLAB client is running. If you want to run the pool of workers on your 80-core server node, you either need to run MATLAB directly on the server (and use the 'local' profile) or create a new profile in your workstation MATLAB. This new profile would tell MATLAB how to submit to the scheduler (e.g. MJS, Slurm, etc.) on the cluster.
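As a rough sketch of that check, the following compares the number of physical cores MATLAB detects with the worker limit stored in the 'local' profile. It assumes Parallel Computing Toolbox is installed and that the profile is still named 'local', as it is in R2019a; the variable names are only illustrative.

```matlab
% Physical cores MATLAB detects on the machine running this client.
% (feature('numcores') is an undocumented but long-standing command.)
nCores = feature('numcores');

% The 'local' profile describes a pool on this same machine.
c = parcluster('local');

fprintf('Physical cores detected:  %d\n', nCores);
fprintf('Local profile NumWorkers: %d\n', c.NumWorkers);
```

If the first number is far below 80, the client is probably not running on the large server node.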
If this sounds about right, contact Technical Support (support@mathworks.com) -- they can walk you through the process of submitting parallel jobs on machines other than your local workstation.
Comments: 2
Xiaofan Cui on 19 Dec 2020
Edited: Xiaofan Cui on 19 Dec 2020
Thank you so much for your quick reply, Raymond. The problem occurs when I run my code. The "parfor" in my code triggers the parallel pool to start. MATLAB then keeps trying to start the parallel pool (sometimes for as long as 1 hour) before it fails and returns this error.
I guess I am using MATLAB on a server node.
Raymond Norris on 13 Feb 2021
If you're running MATLAB on a server node, how many cores did you allocate to it? That is, I'm going to assume you're running under some scheduled environment (e.g. PBS), and if so, can you post your job script? It's possible that you only requested 1-2 cores, but the local profile sees 80, so the pool is contending with other jobs running on the same node.
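One way to avoid oversubscribing the node is to size the pool from the scheduler allocation rather than from the profile default. The sketch below is only an illustration under assumptions: the environment variable name 'PBS_NP' is an example and may not exist on your cluster (other schedulers and PBS variants use different names), and falling back to a single worker is an arbitrary, conservative choice.

```matlab
% ASSUMPTION: the scheduler exports the allocated core count in this
% environment variable. 'PBS_NP' is only an example; check your site's docs.
envName = 'PBS_NP';
nAllocated = str2double(getenv(envName));   % NaN if unset or non-numeric

c = parcluster('local');
if isnan(nAllocated) || nAllocated < 1
    nWorkers = 1;   % conservative fallback when no allocation is visible
else
    % Never ask for more workers than the local profile allows.
    nWorkers = min(floor(nAllocated), c.NumWorkers);
end

% Start the pool explicitly so a failure surfaces here, not inside parfor.
if isempty(gcp('nocreate'))
    parpool(c, nWorkers);
end
```

Starting the pool explicitly before the first parfor also makes it easier to see whether the failure depends on the requested size.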
