Why do I get an error initializing MPI when validating my cluster?

When I validate my MDCS cluster on Linux, the parallel pool stage fails with the following error:
 
Stage: Parallel pool test (parpool)
Status: Failed
Description:The validation stage encountered a MATLAB exception.
Command Line Output:(none)
Error Report:
Failed to initialize the interactive session.
 
Caused by:
    Error using parallel.internal.pool.InteractiveClient>iThrowIfBadParallelJobStatus (line 759)
    The interactive communicating job errored with the following message: Cannot rerun task because there are no rerun attempts left (The task has no rerun attempts left.).
    Original cancel message:
    The task was cancelled by user "matlab" on machine "c086.cm.cluster" with message: "MPI initialisation failed:
    Not enough worker Java memory to allocate data.  Refer to the troubleshooting section of the documentation for information on how to increase the size of the local Java memory and the memory on the head node and the workers.".
Debug Log:(none)

Accepted Answer

MathWorks Support Team, 16 Jul 2014


This issue is most likely caused by a limit on the number of processes that your user account is allowed to create. To check whether that is the cause:
1) Launch the MDCE service, the MJS, and the MDCS workers from a shell in which "ulimit -u unlimited" has been set.
2) Launch MATLAB from a shell in which "ulimit -u unlimited" has been set as well.

If validation then succeeds, ask your system administrator to raise your user process limit permanently in /etc/security/limits.conf and/or in the startup files of the system's default login shells.
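
Before relaunching anything, it can help to see what the process limit currently is in the shell you start MATLAB or the workers from. The following is a minimal bash sketch, not an official MathWorks script: the threshold MIN_PROCS and the helper limit_is_sufficient are hypothetical names chosen for illustration, and the right threshold depends on your cluster.

```shell
#!/bin/bash
# Sketch: inspect the per-user process limit of the current shell.
# MIN_PROCS is a hypothetical threshold for illustration, not a MathWorks value.
MIN_PROCS=4096

# Succeeds if the given limit is "unlimited" or at least the given minimum.
limit_is_sufficient() {
    local limit="$1" minimum="$2"
    [ "$limit" = "unlimited" ] && return 0
    # A non-numeric limit makes the comparison fail, which counts as insufficient.
    [ "$limit" -ge "$minimum" ] 2>/dev/null
}

# Soft limit on the number of processes; "unknown" if the shell cannot report it.
current_limit=$(ulimit -u 2>/dev/null || echo unknown)

if limit_is_sufficient "$current_limit" "$MIN_PROCS"; then
    echo "max user processes: $current_limit (looks sufficient)"
else
    echo "max user processes: $current_limit (try 'ulimit -u unlimited' in this shell)"
fi
```

Run it in the same shell session, and as the same user, that launches the MDCE service or MATLAB, because the limit is inherited per shell. Note that "ulimit -u unlimited" can be refused when the hard limit for your account is lower. In that case the permanent fix is an nproc entry in /etc/security/limits.conf, for example a line such as "matlab soft nproc unlimited", where the user name "matlab" is only assumed from the error message above.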
