How to set up a Matlab parallel cluster for thread-based environment

Hi,
I am starting to explore Matlab Parallel functionalities, and, I have to say, I am a bit confused about the process-based vs. thread-based environment.
First question: I have 2 clusters, namely, the local cluster and the "MatlabCluster" (remote cluster with 8 nodes, 32 workers). If I use
pool = parpool('MatlabCluster');
the default environment is the "process-based" environment. Correct? Can I use the remote cluster in a "thread-based" environment? If I do
pool = parpool('threads');
only the local cluster switches to 'thread'. Can I do the same with the remote cluster?
Second question: I am experimenting with distributed arrays. However, if I start the 'MatlabCluster' (remote cluster), I get a few errors, and the last error message is
No workers are available for FevalQueue execution
This happens on the line of code that uses distributed arrays. I read that FevalQueue is not supported in the "thread-based" environment. Does this error mean that, by default, the remote cluster is starting as "thread-based" (which would contradict my first hypothesis)?

Accepted Answer

Raymond Norris on 16 Jun 2021
The thread-based pool runs only on the same machine as the MATLAB client, like a local process-based pool, so it cannot be started on the remote cluster. However, unlike the local pool, the thread pool has a fixed startup size, which is the value returned by maxNumCompThreads. To start a thread pool with a different number of workers, set maxNumCompThreads first. For example:
% Let's assume you have 8 physical cores, but only want to start a threaded
% pool of 2 workers.
old_threads = maxNumCompThreads(2);
parpool("threads");
Starting parallel pool (parpool) ...
Connected to the parallel pool (number of workers: 2).
ans =
  ThreadPool with properties:
    NumWorkers: 2
Keep in mind that setting maxNumCompThreads, in addition to affecting the number of workers started, may affect your other MATLAB code.
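To limit that side effect, one option is to restore the previous thread limit once the pool is up. The following is only a sketch, assuming the thread pool size is taken from maxNumCompThreads at startup as described above, and assuming (not confirmed here) that the running pool keeps its size after the limit is restored. The variable names are illustrative.

```matlab
% Sketch only: assumes Parallel Computing Toolbox is installed.
delete(gcp("nocreate"));              % close any pool that is already open

oldThreads = maxNumCompThreads(2);    % cap threads; returns the previous value
pool = parpool("threads");            % thread-based pool on the client machine

% A thread-based pool is a parallel.ThreadPool object. Pools started from a
% cluster profile are process-based and will fail this check.
isThreadPool = isa(pool, "parallel.ThreadPool");
fprintf("Thread-based: %d, workers: %d\n", isThreadPool, pool.NumWorkers);

% Restore the earlier limit so other MATLAB code is not affected.
% (Assumption: the running pool keeps its size after this call.)
maxNumCompThreads(oldThreads);
```

The same isa check can be run on the pool returned by parpool('MatlabCluster') to confirm that it is not thread-based.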
You'll need to post a bit more (code, errors) to decipher the FevalQueue error.
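One way to collect the full error text for posting is to wrap the failing line in try/catch and print the extended report. This is a sketch only: the distributed(rand(4)) call is a stand-in for whichever line fails in your code, and it assumes a pool on the remote cluster is already open.

```matlab
% Sketch only: replace the line inside try with the statement that fails.
report = '';
try
    D = distributed(rand(4));         % stand-in for the failing line
catch err
    % The extended report includes the stack and any "Caused by" entries.
    report = getReport(err, "extended", "hyperlinks", "off");
end
disp(report);
```

Turning hyperlinks off keeps the text readable when it is pasted into a post.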
Comments: 2
Maria on 16 Jun 2021
Edited: Maria on 16 Jun 2021
Thank you for your answer. With respect to the FevalQueue error, I am running some more tests, and I am starting to think that there is a problem with the distributed memory of the nodes. I have some issues with the cluster that we set up, and I have been in contact with MathWorks support for a week or so. However, I am able to run the remote cluster to some extent. I tried some code with a couple of parfor loops and I could see that all 32 workers were working.
Now, I tried to run a very simple test:
A = magic(4);
B = distributed(A);
And I get the warning:
Warning: The SPMD infrastructure has been initializing for 94 seconds. This may indicate a problem in initialization.
You might need to restart the pool.
And then
Error using distributed (line 282)
One or more futures resulted in an error.
Caused by:
No workers are available for FevalQueue execution.
The cluster has 8 nodes and 32 workers, running Debian 10.9 (Buster). The client machine is also Linux-based. The job scheduler is MJS (MATLAB Job Scheduler). The firewall is disabled on all nodes; we have already run tests that included disabling the firewall on the client machine, and we excluded it as the cause.
During validation, the parpool "hangs" and I have to manually terminate Matlab because it does not respond anymore. This happens only when we use more than 1 node in the cluster.
How do I check the memory setup among the nodes of the cluster?
Raymond Norris on 16 Jun 2021
To get the memory on the Linux nodes, run
free -mth
This will give you the free & used memory.
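Since parfor loops reportedly run on the cluster, the same command could also be run from the client in one pass. This is a rough sketch, assuming a process-based pool on the cluster is already open and the workers are Linux machines. parfor does not guarantee that every worker receives an iteration, so some nodes may be missing from the output.

```matlab
% Sketch only: assumes an open process-based pool on Linux workers.
pool = gcp("nocreate");
assert(~isempty(pool), "Open a pool on the cluster first.");

n = pool.NumWorkers;
hosts = cell(1, n);
memory = cell(1, n);
parfor i = 1:n
    [~, h] = system('hostname');      % node the iteration ran on
    [~, m] = system('free -mth');     % memory report for that node
    hosts{i} = strtrim(h);
    memory{i} = m;
end

% Print one memory report per distinct host name.
[uniqueHosts, firstIdx] = unique(hosts);
for k = 1:numel(uniqueHosts)
    fprintf("%s\n%s\n", uniqueHosts{k}, memory{firstIdx(k)});
end
```

If fewer than 8 host names appear, increasing the number of loop iterations makes it more likely that every node is sampled.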

Release: R2021a
