All workers aborted during execution of parfor loop

Birjit, 12 June 2023
Answered: Raymond Norris, 15 June 2023
" Error using distcomp.remoteparfor/rebuildParforController
All workers aborted during execution of the parfor loop.
Error in distcomp.remoteparfor/handleIntervalErrorResult (line 259)
obj.rebuildParforController();
Error in distcomp.remoteparfor/getCompleteIntervals (line 396)
[r, err] = obj.handleIntervalErrorResult(r);
Error in MTD_Finalinv (line 655)
parfor i=16:numberofplaintexts
The client lost connection to worker 1. This might be due to network problems,
or the interactive communicating job might have errored. "
This is the error I get when I run a parallel computation over 1 lakh (100,000) CSV files. The same code runs without any error on 50,000 CSV files. Does anyone know why?

Answers (1)

Raymond Norris, 15 June 2023
When I see
The client lost connection to worker 1. This might be due to network problems, or the interactive communicating job might have errored
I'm inclined to think one of the workers crashed, most likely because it ran out of memory. One suggestion is to use ticBytes/tocBytes to see how much data is being passed to and from the workers. However, this won't tell you how much memory is consumed inside the parfor loop. To get a sense of that, try the following.
Refactor your parfor so that it calls a single function, as follows:
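function birjit(numberofplaintexts)
% Measure how much data is transferred to and from the pool workers
ticBytes(gcp)
parfor idx = 16:numberofplaintexts
    unit_of_work()
end
tocBytes(gcp)
end

function unit_of_work
% Stand-in for the real body of the parfor loop
A = rand(10);
% List the variables (and their sizes) in this function's workspace
whos
end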
Because of transparency rules, you need to call whos within a function, not directly in parfor.
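If you would rather collect the numbers on the client than read the whos listing printed by each worker, one variant is to return the byte count from the function. This is a sketch, not part of the original answer: measure_worker_memory and unit_of_work_bytes are hypothetical names, and rand(10) is only a placeholder for the real loop body, so the value it reports (800 bytes for a 10x10 double matrix) says nothing about the actual workload.

```matlab
function bytesPerIter = measure_worker_memory(n)
% Hypothetical helper (not from the original answer): run n iterations in
% parfor and return, to the client, the bytes held in each iteration's
% function workspace at the point where whos is called.
bytesPerIter = zeros(1, n);
parfor idx = 1:n
    bytesPerIter(idx) = unit_of_work_bytes();
end
end

function b = unit_of_work_bytes()
% Placeholder work: replace rand(10) with the real per-iteration code.
A = rand(10);
% whos with an output argument returns a struct array describing the
% variables in this workspace; at this point it lists only A.
s = whos;
b = sum([s.bytes]);
end
```

Calling max(measure_worker_memory(n)) gives the largest per-iteration workspace seen. Note that whos only counts variables in that function's workspace, not temporaries or memory used by called functions, so treat the result as a lower bound.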
Another option is to use parforOptions:
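function birjit2(numberofplaintexts, N)
% Split the loop iterations into fixed subranges of N iterations each
opts = parforOptions(parcluster,'RangePartitionMethod','fixed','SubrangeSize',N);
parfor (idx = 16:numberofplaintexts,opts)
    % ... body of the original parfor loop
end
end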
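Here, we are running a job array of ceil((numberofplaintexts-15)/N) jobs, since 16:numberofplaintexts contains numberofplaintexts-15 iterations. Because each piece of work covers at most N iterations, each worker is less likely to run out of memory.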
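As a worked illustration only, assume numberofplaintexts = 100000 (matching the 1 lakh files in the question, which may not be how the variable is actually set) and a hypothetical SubrangeSize of N = 5000. The iterations then split into 19 full subranges of 5000 plus a final one of 4985:

```latex
\text{subranges}
= \left\lceil \frac{\texttt{numberofplaintexts} - 15}{N} \right\rceil
= \left\lceil \frac{100000 - 15}{5000} \right\rceil
= \lceil 19.997 \rceil
= 20
```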

Category: Parallel Computing Fundamentals
