Parallel pool fails to start with maximum number of available workers

Haroon Zafar on 28 Sep 2023
Commented: Damian Pietrus on 4 Oct 2023
Hi,
I have set up a MATLAB Parallel Server cluster with 14 workers. During validation, all stages pass with 14 workers except the last one, the parallel pool test (parpool), which fails with an interactive session initialization error.
However, if I explicitly start the parallel pool from the Command Window with 10 workers, parpool(10), the pool starts. This does not work for 14 workers, so I can access only 10 workers.
I tried changing the range of the number of workers in the cluster profile to [14 inf], so that the parallel pool takes 14 as the minimum number of workers, but the pool still fails to start.
Is there a fix for this, or is it a limitation of the license?
  Comments: 4
Haroon Zafar on 3 Oct 2023
Thanks for your reply.
No other parpool call starts a pool on the cluster. Only parpool(10) starts a pool, even though the parallel pool stage fails during cluster validation.
>> parpool(14)
Starting parallel pool (parpool) using the 'MJSProfile1' profile ...
Error using parpool
Parallel pool failed to start with the following error. For more detailed information, validate the profile 'MJSProfile1' in the Cluster Profile Manager.
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause
Failed to initialize the interactive session.
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus
The interactive communicating job errored with the following message: A timeout occurred setting up the parallel pool communication.
>> parpool(12)
Starting parallel pool (parpool) using the 'MJSProfile1' profile ...
Error using parpool
Parallel pool failed to start with the following error. For more detailed information, validate the profile 'MJSProfile1' in the Cluster Profile Manager.
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause
Failed to initialize the interactive session.
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus
The interactive communicating job errored with the following message: A timeout occurred setting up the parallel pool communication.
>> parpool(10)
Starting parallel pool (parpool) using the 'MJSProfile1' profile ...
Connected to the parallel pool (number of workers: 10).
ans =
ClusterPool with properties:
Connected: true
NumWorkers: 10
Busy: false
Cluster: MJSProfile1 (MJS Cluster)
AttachedFiles: {}
AutoAddClientPath: true
FileStore: [1x1 parallel.FileStore]
ValueStore: [1x1 parallel.ValueStore]
IdleTimeout: 120 minutes (120 minutes remaining)
SpmdEnabled: true
EnvironmentVariables: {}
>>
I am using MATLAB Job Scheduler, R2022b. The MATLAB versions on the parallel server PCs and the client PC are the same. The operating system on all PCs is Windows 10. I am trying to set up MATLAB Parallel Server for my group and can manage the cluster. The license is managed online.
Log/error files from cluster validation at the client end:
VALIDATION REPORT
Profile: MJSProfile1
Scheduler Type: MJS
Stage: Cluster connection test (parcluster)
Status: Passed
Start Time: Tue Oct 03 20:22:36 BST 2023
Finish Time: Tue Oct 03 20:22:47 BST 2023
Running Duration: 0 min 12 sec
Description:
Details: Network latency: 10 ms
Upload speed: 1519 KB/s
Download speed: 1312 KB/s
For more information on these results, see the "Troubleshooting Slow Network Connection" documentation page.
Error Report:
Command Line Output:
Debug Log:
Stage: Job test (createJob)
Status: Passed
Start Time: Tue Oct 03 20:22:47 BST 2023
Finish Time: Tue Oct 03 20:23:41 BST 2023
Running Duration: 0 min 53 sec
Description:
Details:
Error Report:
Command Line Output:
Debug Log:
Stage: SPMD job test (createCommunicatingJob)
Status: Passed
Start Time: Tue Oct 03 20:23:41 BST 2023
Finish Time: Tue Oct 03 20:24:37 BST 2023
Running Duration: 0 min 56 sec
Description: Job ran with 14 workers.
Details:
Error Report:
Command Line Output:
Debug Log:
Stage: Pool job test (createCommunicatingJob)
Status: Passed
Start Time: Tue Oct 03 20:24:37 BST 2023
Finish Time: Tue Oct 03 20:25:37 BST 2023
Running Duration: 1 min 0 sec
Description: Job ran with 14 workers.
Details:
Error Report:
Command Line Output:
Debug Log:
Stage: Parallel pool test (parpool)
Status: Failed
Start Time: Tue Oct 03 20:25:37 BST 2023
Finish Time: Tue Oct 03 20:27:11 BST 2023
Running Duration: 1 min 34 sec
Description: Failed to initialize the interactive session.
Details:
Error Report: Failed to initialize the interactive session.
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus
The interactive communicating job errored with the following message: A timeout occurred setting up the parallel pool communication.
Command Line Output:
Debug Log: CLIENT LOG OUTPUT
Created Task 1 of Job 5
Submitted Job 5
Checking communicating job status.
Checking communicating job status.
Checking communicating job status.
Checking communicating job status.
Checking communicating job status.
Checking communicating job status.
Checking communicating job status.
Checking communicating job status.
Checking communicating job status.
Session failed to start when creating InteractiveClient. Error: Error using parallel.internal.pool.AbstractInteractiveClient>iThrowWithCause
Failed to initialize the interactive session.
Error in parallel.internal.pool.AbstractInteractiveClient/start (line 138)
iThrowWithCause( 'parallel:convenience:FailedToInitializeInteractiveSession', err );
Error in parallel.internal.pool.AbstractClusterPool>iStartClient (line 872)
spmdInitialized = client.start(sessionBuildFcn, sessionInfo, numWorkers, cluster, ...
Error in parallel.internal.pool.AbstractClusterPool.hBuildPool (line 630)
iStartClient(client, sessionInfo, forceSpmdEnabled, cluster, supportRestart, argsList);
Error in parallel.internal.types.ValidationStages>iOpenPoolForCluster (line 511)
aPool = parallel.internal.pool.AbstractClusterPool.hBuildPool('Cluster', cluster, 'NumWorkers', numWorkers);
Error in parallel.internal.types.ValidationStages>@()iOpenPoolForCluster(runInfo)
Error in parallel.internal.types.ValidationStages>iCallWithNoHotlinks (line 391)
[varargout{1:nargout}] = fcn();
Error in parallel.internal.types.ValidationStages>iRunParpoolStage (line 302)
[commandWindowOutput, aPool] = evalc(iWrapForEvalc(openPoolFcn));
Error in parallel.internal.types.ValidationStages/run (line 74)
[eventData, runInfo] = obj.RunFunction(obj, runInfo);
Error in parallel.internal.validator.Validator/runValidationSuite (line 191)
[eventData, stageRunInfo] = currentStage.run(stageRunInfo);
Error in parallel.internal.validator.Validator/validate (line 103)
status = obj.runValidationSuite(profileName, suite);
Error in parallel.internal.ui.AbstractValidationManager/validate (line 36)
obj.Validator.validate(profileName, validationSuite);
Error in parallel.internal.ui.ValidationManager.validateProfile (line 36)
parallel.internal.ui.ValidationManager.getOrCreateInstance().validate(profileName, suite);
Caused by:
Error using parallel.internal.pool.AbstractInteractiveClient>iThrowIfBadParallelJobStatus
The interactive communicating job errored with the following message: A timeout occurred setting up the parallel pool communication.
Failed to run the DisarmableOncleanup callback due to the following error:
Unrecognized method, property, or field 'pStopLabsAndDisconnect' for class 'parallel.internal.pool.InteractivePoolClient'.
JavaBackedSession.delete()
Failed to run the DisarmableOncleanup callback due to the following error:
Dot indexing is not supported for variables of this type.
Head node Admin Center status
The only warnings at the head node are due to differences between the hostname and the canonical hostname. Everything else passed.
Head node connectivity test and warnings:
1 03-Oct-2023 20:24:33 03-Oct-2023 20:24:33 10.126.83.41 10.126.83.41 CLIENT_HOSTNAMES ResolveHostname Test SUCCESS
2 03-Oct-2023 20:24:33 03-Oct-2023 20:24:33 10.126.83.41 10.126.83.41 CLIENT_TEST OpenServerSocket Test (on port 27371+) SUCCESS
3 03-Oct-2023 20:24:33 03-Oct-2023 20:24:33 10.126.83.41 10.126.83.41 CLIENT_TEST GetClientInfo Test SUCCESS
4 03-Oct-2023 20:24:33 03-Oct-2023 20:24:33 10.126.83.41 10.126.83.41 CLIENT_TEST ResolveHostname Test SUCCESS
5 03-Oct-2023 20:24:33 03-Oct-2023 20:24:33 10.126.83.41 10.126.83.41 CLIENT_HOSTNAMES ResolveIPToHostname Test SUCCESS
6 03-Oct-2023 20:24:33 03-Oct-2023 20:24:33 10.126.83.41 10.126.83.41 MDCE_CONNECT LockDown Test SUCCESS
7 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.83.41 CLIENT_TEST ResolveIPToHostname Test SUCCESS
8 03-Oct-2023 20:24:33 03-Oct-2023 20:24:34 10.126.83.41 10.126.43.2 CLIENT_HOSTNAMES ResolveHostname Test SUCCESS
9 03-Oct-2023 20:24:33 03-Oct-2023 20:24:34 10.126.83.41 10.126.47.154 CLIENT_HOSTNAMES ResolveHostname Test SUCCESS
10 03-Oct-2023 20:24:33 03-Oct-2023 20:24:34 10.126.83.41 10.126.148.47 CLIENT_HOSTNAMES ResolveHostname Test SUCCESS
11 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.83.41 MDCE_HOSTNAME GetServerInfo Test SUCCESS
12 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.83.41 POOL2CLIENT ResolveHostname Test SUCCESS
13 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.47.154 CLIENT_HOSTNAMES ResolveIPToHostname Test SUCCESS
14 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.83.41 MDCE_HOSTNAME ResolveHostname Test SUCCESS
15 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.43.2 CLIENT_HOSTNAMES ResolveIPToHostname Test SUCCESS
16 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.148.47 CLIENT_HOSTNAMES ResolveIPToHostname Test SUCCESS
17 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.83.41 MPI_HOSTNAME ResolveHostname Test SUCCESS
18 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.83.41 10.126.83.41 POOL2CLIENT ResolveIPToHostname Test SUCCESS
19 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.43.2 10.126.43.2 MDCE_CONNECT LockDown Test SUCCESS
20 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.148.47 10.126.148.47 MDCE_CONNECT LockDown Test SUCCESS
21 03-Oct-2023 20:24:34 03-Oct-2023 20:24:34 10.126.47.154 10.126.47.154 MDCE_CONNECT LockDown Test SUCCESS
22 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.83.41 10.126.83.41 MDCE_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-4qmtc8k.) and canonical hostname (10.126.83.41) do not match. desktop-4qmtc8k. may be misconfigured or the domain name service may not be set up correctly.
23 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.47.154 10.126.83.41 POOL2CLIENT ResolveHostname Test SUCCESS
24 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.43.2 10.126.83.41 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
25 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.148.47 10.126.83.41 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
26 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.83.41 10.126.83.41 MPI_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-4qmtc8k.) and canonical hostname (10.126.83.41) do not match. desktop-4qmtc8k. may be misconfigured or the domain name service may not be set up correctly.
27 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.148.47 10.126.83.41 POOL2CLIENT ResolveHostname Test SUCCESS
28 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.83.41 10.126.83.41 POOL2CLIENT PingServerSocketHost Test SUCCESS
29 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.83.41 10.126.83.41 POOL2CLIENT ConnectToServerSocket Test (remote port 27371) SUCCESS
30 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.83.41 10.126.83.41 PORTS_AVAILABLE CheckServices Test (on port 27350) INFO 2 workers found.
31 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.43.2 10.126.43.2 MDCE_HOSTNAME GetServerInfo Test SUCCESS
32 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.47.154 10.126.47.154 MDCE_HOSTNAME GetServerInfo Test SUCCESS
33 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.148.47 10.126.148.47 MDCE_HOSTNAME GetServerInfo Test SUCCESS
34 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.47.154 10.126.83.41 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
35 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.43.2 10.126.83.41 POOL2CLIENT ResolveHostname Test SUCCESS
36 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.83.41 10.126.43.2 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
37 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.83.41 10.126.47.154 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
38 03-Oct-2023 20:24:35 03-Oct-2023 20:24:35 10.126.43.2 10.126.43.2 MPI_HOSTNAME ResolveHostname Test SUCCESS
39 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.47.154 10.126.47.154 MPI_HOSTNAME ResolveHostname Test SUCCESS
40 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.43.2 10.126.148.47 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
41 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.43.2 10.126.83.41 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-4qmtc8k.) and canonical hostname (desktop-4qmtc8k) do not match. desktop-4qmtc8k. may be misconfigured or the domain name service may not be set up correctly.
42 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.83.41 10.126.83.41 PORT_CONNECT PingServerSocketHost Test SUCCESS
43 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.83.41 10.126.83.41 PORTS_AVAILABLE OpenServerSocket Test (on port 27357+) INFO Opened server socket on port 27360.
44 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.47.154 10.126.83.41 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-4qmtc8k.) and canonical hostname (desktop-4qmtc8k) do not match. desktop-4qmtc8k. may be misconfigured or the domain name service may not be set up correctly.
45 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.47.154 10.126.43.2 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
46 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.43.2 10.126.47.154 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
47 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.83.41 10.126.148.47 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
48 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.43.2 10.126.83.41 POOL2CLIENT ResolveIPToHostname Test SUCCESS
49 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.47.154 10.126.148.47 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
50 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.47.154 10.126.83.41 POOL2CLIENT ResolveIPToHostname Test SUCCESS
51 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.148.47 10.126.83.41 POOL2CLIENT ResolveIPToHostname Test SUCCESS
52 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.148.47 10.126.43.2 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
53 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.148.47 10.126.47.154 INTERNODE_HOSTNAMES ResolveHostname Test SUCCESS
54 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.148.47 10.126.83.41 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-4qmtc8k.) and canonical hostname (desktop-4qmtc8k) do not match. desktop-4qmtc8k. may be misconfigured or the domain name service may not be set up correctly.
55 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.148.47 10.126.148.47 MPI_HOSTNAME ResolveHostname Test SUCCESS
56 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.47.154 MPI_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-g59pjvh.) and canonical hostname (10.126.47.154) do not match. desktop-g59pjvh. may be misconfigured or the domain name service may not be set up correctly.
57 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.43.2 10.126.83.41 POOL2CLIENT PingServerSocketHost Test SUCCESS
58 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.83.41 PORT_CONNECT ConnectToServerSocket Test (remote port 27360) SUCCESS
59 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.83.41 10.126.47.154 MDCE_HOSTNAME ResolveHostname Test SUCCESS
60 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.83.41 10.126.43.2 MDCE_HOSTNAME ResolveHostname Test SUCCESS
61 03-Oct-2023 20:24:35 03-Oct-2023 20:24:36 10.126.83.41 10.126.148.47 MDCE_HOSTNAME ResolveHostname Test SUCCESS
62 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.43.2 10.126.83.41 POOL2CLIENT ConnectToServerSocket Test (remote port 27371) SUCCESS
63 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.83.41 POOL2CLIENT ConnectToServerSocket Test (remote port 27371) SUCCESS
64 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.83.41 PORT_CONNECT PingServerSocketHost Test SUCCESS
65 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.43.2 10.126.47.154 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-g59pjvh.) and canonical hostname (desktop-g59pjvh) do not match. desktop-g59pjvh. may be misconfigured or the domain name service may not be set up correctly.
66 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.43.2 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-f175i16.) and canonical hostname (desktop-f175i16) do not match. desktop-f175i16. may be misconfigured or the domain name service may not be set up correctly.
67 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.83.41 POOL2CLIENT PingServerSocketHost Test SUCCESS
68 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.148.47 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-ojt73pk.) and canonical hostname (desktop-ojt73pk) do not match. desktop-ojt73pk. may be misconfigured or the domain name service may not be set up correctly.
69 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.43.2 10.126.83.41 PORT_CONNECT ConnectToServerSocket Test (remote port 27360) SUCCESS
70 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.43.2 10.126.83.41 PORT_CONNECT PingServerSocketHost Test SUCCESS
71 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.43.2 10.126.43.2 MPI_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-f175i16.) and canonical hostname (10.126.43.2) do not match. desktop-f175i16. may be misconfigured or the domain name service may not be set up correctly.
72 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.47.154 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-g59pjvh.) and canonical hostname (desktop-g59pjvh) do not match. desktop-g59pjvh. may be misconfigured or the domain name service may not be set up correctly.
73 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.148.47 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-ojt73pk.) and canonical hostname (desktop-ojt73pk) do not match. desktop-ojt73pk. may be misconfigured or the domain name service may not be set up correctly.
74 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.83.41 PORT_CONNECT ConnectToServerSocket Test (remote port 27360) SUCCESS
75 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.43.2 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-f175i16.) and canonical hostname (desktop-f175i16) do not match. desktop-f175i16. may be misconfigured or the domain name service may not be set up correctly.
76 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.47.154 MDCE_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-g59pjvh.) and canonical hostname (desktop-g59pjvh) do not match. desktop-g59pjvh. may be misconfigured or the domain name service may not be set up correctly.
77 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.43.2 MDCE_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-f175i16.) and canonical hostname (desktop-f175i16) do not match. desktop-f175i16. may be misconfigured or the domain name service may not be set up correctly.
78 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.148.47 MDCE_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-ojt73pk.) and canonical hostname (desktop-ojt73pk) do not match. desktop-ojt73pk. may be misconfigured or the domain name service may not be set up correctly.
79 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.43.2 10.126.148.47 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-ojt73pk.) and canonical hostname (desktop-ojt73pk) do not match. desktop-ojt73pk. may be misconfigured or the domain name service may not be set up correctly.
80 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.43.2 PORT_CONNECT PingServerSocketHost Test SUCCESS
81 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.148.47 PORT_CONNECT PingServerSocketHost Test SUCCESS
82 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.83.41 10.126.47.154 PORT_CONNECT PingServerSocketHost Test SUCCESS
83 03-Oct-2023 20:24:36 03-Oct-2023 20:24:36 10.126.47.154 10.126.43.2 PORT_CONNECT PingServerSocketHost Test SUCCESS
84 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.47.154 10.126.148.47 PORT_CONNECT PingServerSocketHost Test SUCCESS
85 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.47.154 10.126.47.154 PORT_CONNECT PingServerSocketHost Test SUCCESS
86 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.43.2 10.126.47.154 PORT_CONNECT PingServerSocketHost Test SUCCESS
87 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.148.47 10.126.148.47 MPI_HOSTNAME ResolveIPToHostname Test WARNING The hostname (desktop-ojt73pk.) and canonical hostname (10.126.148.47) do not match. desktop-ojt73pk. may be misconfigured or the domain name service may not be set up correctly.
88 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.148.47 10.126.47.154 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-g59pjvh.) and canonical hostname (desktop-g59pjvh) do not match. desktop-g59pjvh. may be misconfigured or the domain name service may not be set up correctly.
89 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.43.2 10.126.43.2 PORT_CONNECT PingServerSocketHost Test SUCCESS
90 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.148.47 10.126.83.41 POOL2CLIENT ConnectToServerSocket Test (remote port 27371) SUCCESS
91 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.148.47 10.126.83.41 PORT_CONNECT ConnectToServerSocket Test (remote port 27360) SUCCESS
92 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.148.47 10.126.43.2 INTERNODE_HOSTNAMES ResolveIPToHostname Test WARNING The hostname (desktop-f175i16.) and canonical hostname (desktop-f175i16) do not match. desktop-f175i16. may be misconfigured or the domain name service may not be set up correctly.
93 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.148.47 10.126.83.41 POOL2CLIENT PingServerSocketHost Test SUCCESS
94 03-Oct-2023 20:24:36 03-Oct-2023 20:24:37 10.126.148.47 10.126.83.41 PORT_CONNECT PingServerSocketHost Test SUCCESS
95 03-Oct-2023 20:24:37 03-Oct-2023 20:24:37 10.126.148.47 10.126.43.2 PORT_CONNECT PingServerSocketHost Test SUCCESS
96 03-Oct-2023 20:24:37 03-Oct-2023 20:24:37 10.126.83.41 10.126.47.154 PORTS_AVAILABLE CheckServices Test (on port 27350) INFO 4 workers found.
97 03-Oct-2023 20:24:37 03-Oct-2023 20:24:37 10.126.43.2 10.126.148.47 PORT_CONNECT PingServerSocketHost Test SUCCESS
98 03-Oct-2023 20:24:37 03-Oct-2023 20:24:37 10.126.83.41 10.126.43.2 PORTS_AVAILABLE CheckServices Test (on port 27350) INFO 4 workers found.
99 03-Oct-2023 20:24:37 03-Oct-2023 20:24:37 10.126.148.47 10.126.47.154 PORT_CONNECT PingServerSocketHost Test SUCCESS
100 03-Oct-2023 20:24:37 03-Oct-2023 20:24:37 10.126.83.41 10.126.148.47 PORTS_AVAILABLE CheckServices Test (on port 27350) INFO 4 workers found.
101 03-Oct-2023 20:24:37 03-Oct-2023 20:24:37 10.126.148.47 10.126.148.47 PORT_CONNECT PingServerSocketHost Test SUCCESS
102 03-Oct-2023 20:24:37 03-Oct-2023 20:24:38 10.126.43.2 10.126.43.2 PORTS_AVAILABLE OpenServerSocket Test (on port 27357+) INFO Opened server socket on port 27358.
103 03-Oct-2023 20:24:37 03-Oct-2023 20:24:38 10.126.47.154 10.126.47.154 PORTS_AVAILABLE OpenServerSocket Test (on port 27357+) INFO Opened server socket on port 27358.
104 03-Oct-2023 20:24:37 03-Oct-2023 20:24:38 10.126.148.47 10.126.148.47 PORTS_AVAILABLE OpenServerSocket Test (on port 27357+) INFO Opened server socket on port 27358.
105 03-Oct-2023 20:24:37 03-Oct-2023 20:24:38 10.126.83.41 10.126.83.41 CLIENT_TEST Cleanup Test SUCCESS
106 03-Oct-2023 20:24:37 03-Oct-2023 20:24:38 10.126.83.41 10.126.83.41 PORTS_AVAILABLE Cleanup Test SUCCESS
107 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.83.41 10.126.43.2 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
108 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.83.41 10.126.148.47 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
109 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.148.47 10.126.148.47 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
110 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.83.41 10.126.47.154 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
111 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.47.154 10.126.148.47 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
112 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.47.154 10.126.47.154 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
113 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.43.2 10.126.47.154 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
114 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.148.47 10.126.47.154 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
115 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.47.154 10.126.43.2 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
116 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.43.2 10.126.148.47 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
117 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.43.2 10.126.43.2 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
118 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.148.47 10.126.43.2 PORT_CONNECT ConnectToServerSocket Test (remote port 27358) SUCCESS
119 03-Oct-2023 20:24:38 03-Oct-2023 20:24:38 10.126.83.41 10.126.83.41 MDCE_CONNECT Cleanup Test SUCCESS
120 03-Oct-2023 20:24:38 03-Oct-2023 20:24:40 10.126.47.154 10.126.47.154 PORTS_AVAILABLE Cleanup Test SUCCESS
121 03-Oct-2023 20:24:38 03-Oct-2023 20:24:40 10.126.43.2 10.126.43.2 PORTS_AVAILABLE Cleanup Test SUCCESS
122 03-Oct-2023 20:24:38 03-Oct-2023 20:24:40 10.126.148.47 10.126.148.47 PORTS_AVAILABLE Cleanup Test SUCCESS
123 03-Oct-2023 20:24:40 03-Oct-2023 20:24:40 10.126.43.2 10.126.43.2 MDCE_CONNECT Cleanup Test SUCCESS
124 03-Oct-2023 20:24:40 03-Oct-2023 20:24:40 10.126.148.47 10.126.148.47 MDCE_CONNECT Cleanup Test SUCCESS
125 03-Oct-2023 20:24:40 03-Oct-2023 20:24:40 10.126.47.154 10.126.47.154 MDCE_CONNECT Cleanup Test SUCCESS
As noted above, the only warnings at the head node concern the mismatch between the hostname and the canonical hostname; everything else passed.
Please let me know if you need any other info.
Thanks,
Haroon
Damian Pietrus on 4 Oct 2023
Since you can open a pool of 10 but not 12 or 14, I suspect a communication issue tied to one of your compute nodes. For a parpool to open successfully, all of the workers on the compute nodes must be able to connect to one another and also to the client. Your communicating pool job succeeded with 14 workers, which indicates that the workers on the compute nodes can communicate with each other, so the communication most likely breaks down when they try to connect to the client.
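To see which node might be the odd one out, it can help to list how many workers each host contributes. The connectivity log above reports 2 workers on 10.126.83.41 and 4 on each of the other three nodes, so a pool of 10 can avoid one four-worker node while any larger pool cannot; that would be consistent with a single node failing to reach the client, though it is not proof. A minimal sketch, assuming the profile name 'MJSProfile1' from the output above and that the MJS cluster object exposes IdleWorkers and BusyWorkers arrays whose elements have a Host property:

```matlab
% Sketch: count workers per host on the MJS cluster.
% 'MJSProfile1' is the profile name taken from the output above.
c = parcluster('MJSProfile1');

% Assumption: IdleWorkers and BusyWorkers are arrays of worker objects
% and each worker object has a Host property.
workers = [c.IdleWorkers(:); c.BusyWorkers(:)];
hosts = arrayfun(@(w) char(w.Host), workers, 'UniformOutput', false);

% Group identical host names and print one line per host.
[uniqueHosts, ~, idx] = unique(hosts);
for k = 1:numel(uniqueHosts)
    fprintf('%-30s %d worker(s)\n', uniqueHosts{k}, sum(idx == k));
end
```

If the counts match the log, the next step is to find out which of the four-worker hosts cannot reach the client.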
For next steps, I'd recommend trying to ping the client's hostname from each of the compute nodes. The fact that there is a difference in hostname and canonical host name on your desktop machine could be causing the issue, so I'd try pinging both to see how the results compare.
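The ping check can also be run from MATLAB instead of logging in to each node. The sketch below submits a communicating job, the job type that passed validation with 14 workers, and has every worker ping the hostname the client currently advertises. It assumes the profile name 'MJSProfile1', the Windows form of ping (-n 1), and that each task's Worker property reports the host it ran on. Ping only checks name resolution and ICMP reachability, not the TCP port the pool uses, so a passing result narrows the problem rather than ruling it out.

```matlab
% Sketch: ask every worker to ping the client and report the result.
c = parcluster('MJSProfile1');      % profile name from the output above
n = 14;                             % worker count from the question

cfg = pctconfig();                  % settings the client currently uses
clientHost = cfg.hostname;          % hostname workers are told to connect to

% A communicating job runs the same task on all n workers and does not
% need the workers to connect back to the client.
job = createCommunicatingJob(c, 'Type', 'spmd');
job.NumWorkersRange = [n n];

% Windows syntax: send a single echo request to the client.
pingCmd = sprintf('ping -n 1 %s', clientHost);
createTask(job, @system, 2, {pingCmd});

submit(job);
wait(job);
out = fetchOutputs(job);            % one row per worker: {status, text}

for k = 1:size(out, 1)
    % Assumption: Tasks(k).Worker.Host names the machine that ran row k.
    fprintf('%-25s exit status %d\n', job.Tasks(k).Worker.Host, out{k, 1});
    if out{k, 1} ~= 0
        disp(out{k, 2});            % show ping output for failures
    end
end
delete(job);
```

A nonzero exit status on the workers of one host would point at that host's name resolution or route to the client.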
You could also use the pctconfig command to explicitly set the hostname of your client (the machine you run the validation from) before you try your validation:
pctconfig('hostname','hostname-of-client-goes-here');
If this works, we can confirm that the hostname was the issue. You can either modify your network/DNS, look further into why there are those hostname issues, or use the pctconfig command every time you start MATLAB to define your client's hostname.
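A minimal sketch of that check, assuming the profile name 'MJSProfile1' from the output above; the hostname string is a placeholder to be replaced with a name or IP address of the client that every compute node can resolve:

```matlab
% Sketch: set the client hostname, then retry the 14-worker pool.
clientName = 'hostname-of-client-goes-here';   % placeholder, replace it
pctconfig('hostname', clientName);

% Read the setting back to confirm what the client will advertise.
cfg = pctconfig();
fprintf('Client now advertises hostname: %s\n', cfg.hostname);

delete(gcp('nocreate'));            % close any pool that is already open
try
    pool = parpool('MJSProfile1', 14);
    fprintf('Pool started with %d workers.\n', pool.NumWorkers);
catch err
    % Report the failure without stopping the script.
    fprintf('Pool still failed to start:\n%s\n', err.message);
end
```

If the pool starts, placing the pctconfig call in a startup.m file on the MATLAB path is one way to run it at every MATLAB start until the DNS setup is corrected.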
If you do need some further support, you can also contact our Installation and Licensing Support Team. They can be reached at +1-508-647-7000. Dial option 3, then 1 at the menu or email support@mathworks.com.
