R2024b parpool crashing when being activated with 24 workers.

Matteo D'Ambrosio on 25 Sep 2024
Commented: Sergio E. Obando on 2 Jan 2025
!!! Update: These crashes seem to be happening quite randomly, regardless of the number of workers that are used.
Dear all,
Whenever I try to start a parpool with more than 20 workers on the Processes profile, an error occurs and the parallel pool is shut down automatically. I have tried validating the profile in the Cluster Profile Manager, and any value above 20 workers seems to produce this error, even though my CPU has 24 cores. I never experienced this problem on MATLAB R2024a, where I was always able to start parallel pools with up to 24 workers.
Is there a known fix for this? It has only been happening since I updated to MATLAB R2024b. My CPU is an Intel Core i9-14900KF.
Thanks in advance. I have attached the error below in case it is useful, along with a few snapshots of the Cluster Profile Manager validations.
Command window output:
Starting parallel pool (parpool) using the 'Processes' profile ...
Error using parpool (line 133)
Parallel pool failed to start with the following error. For more detailed information,
validate the profile 'Processes' in the Cluster Profile Manager.
Error in parallel.internal.ui.PoolHelper.startPool (line 12)
parpool();
^^^^^^^^^
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>@()checker.checkState()
(line 121)
The parallel pool job errored with the following message: MATLAB worker shut down
unexpectedly with status 1 during task execution.
Parallel pool using the 'Processes' profile is shutting down.
This parallel pool has been shut down.
Caused by:
The client lost connection to worker 2 (Task 2; Host: localhost), potentially due to
network issues or errors during the interactive communicating job.
[Screenshot: Cluster Profile Manager validation with 16 workers (same output when using 20)]
[Screenshot: Cluster Profile Manager validation with 24 workers]
  Comments: 3
Matteo D'Ambrosio on 25 Sep 2024
Edited: Matteo D'Ambrosio on 25 Sep 2024
Thanks for the reply!
Yes, the error messages are the same; the only difference is the number (ID) of the worker that fails.
Chao on 26 Dec 2024
I faced the same problem and I have no idea why it happened.

Answers (1)

Sergio E. Obando on 25 Sep 2024
While the error there is not exactly the same, this post covers some good troubleshooting steps: "Validation Fails".
If you prefer or if those steps do not resolve your issue, I would highly recommend contacting Technical Support.
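For anyone who wants to narrow this down before contacting support, here is a minimal sketch that starts a pool from the command line at several sizes and records what happens. It assumes the 'Processes' profile from the question; the worker counts are simply the values mentioned in the post, and the variable names are placeholders of mine, not anything prescribed by MathWorks.

```matlab
% Sketch: start a pool at several sizes and record the outcome of each attempt.
c = parcluster('Processes');            % profile named in the question
sizesToTry = [16 20 24];                % worker counts mentioned in the post
sizesToTry = sizesToTry(sizesToTry <= c.NumWorkers);  % skip sizes the profile disallows
results = strings(size(sizesToTry));

for k = 1:numel(sizesToTry)
    delete(gcp('nocreate'));            % close any pool that is already open
    try
        pool = parpool(c, sizesToTry(k));
        results(k) = "ok";
        delete(pool);
    catch err
        results(k) = string(err.message);
        disp(getReport(err, 'extended'));   % full error stack, useful for a support case
    end
end

disp(table(sizesToTry(:), results(:), 'VariableNames', {'NumWorkers', 'Outcome'}));
```

Since the crashes appear to be random, it may be worth running the loop several times; a single clean pass does not prove that a given size is safe.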
  Comments: 8
Raffael on 2 Jan 2025
Same here:
Running a simulation with more than 60 workers crashes with R2024b on several machines.
The same simulation runs fine with R2024a using 700 cores/MATLAB workers.
I have no idea why R2024b crashes; it also happens when running the SPMD validation test.
The job log contains only a "Matlab crashed on worker XXX" message and no other useful information.
Raffael
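When the job log says nothing more than that a worker crashed, a crash dump file left behind by the worker process may hold more detail. The sketch below lists candidate files. Both the file-name pattern (matlab_crash_dump*) and the folders searched are assumptions on my part rather than documented guarantees, so treat an empty result as inconclusive.

```matlab
% Sketch: look for crash dump files that a crashed worker may have written.
% ASSUMPTION: dumps are named matlab_crash_dump* and land in the temp or
% home folder; neither is guaranteed for every platform or release.
folders = {tempdir, getenv('HOME'), getenv('USERPROFILE')};
folders = folders(~cellfun(@isempty, folders));   % drop unset environment variables

found = [];
for k = 1:numel(folders)
    found = [found; dir(fullfile(folders{k}, 'matlab_crash_dump*'))]; %#ok<AGROW>
end

if isempty(found)
    disp('No matlab_crash_dump* files found in the searched folders.');
else
    for k = 1:numel(found)
        fprintf('%s  %s\n', found(k).date, fullfile(found(k).folder, found(k).name));
    end
end
```

Any files whose timestamps match the crash would be worth attaching to a support request.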
Sergio E. Obando on 2 Jan 2025
Raffael, please reach out to technical support. They can help you debug this issue and see if the root cause is similar to the one from the original post.

Release: R2024b
