
Parfor loop just hangs, CPU usage goes to zero

JohnDapper on 14 Mar 2016
Answered: 海粟 吴 on 30 May 2024
Hi all. Here is sample code for what I am trying to run:
parfor i = 1:num
answer(:,i) = someFunction(someData(:,i));
end
Key information: "someFunction" is a C++ MEX file, and "someData" is a memmapfile (memmapfilename.data) because it is too large to be loaded onto each worker.
Oddly, the parfor loop just hangs, CPU usage drops to zero, and when I press Ctrl+C, this is what I get:
Operation terminated by user during distcomp.remoteparfor/getCompleteIntervals (line 127)
In parallel_function>distributed_execution (line 820)
[tags, out] = P.getCompleteIntervals(chunkSize);
In parallel_function (line 587)
R = distributed_execution(...
This isn't an issue if I replace "parfor" with a simple "for": everything works fine. What seems to happen is that some of the workers become unresponsive. After the above issue occurs, even running a simple command such as
pctRunOnAll 1+1
will return "2" on only some, but not all, workers.
Any help would be great. A fresh reinstallation did not help, and validation for "parpool" passed.
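The partially responsive pool described above can also be detected from code. The sketch below is a workaround under assumptions, not a fix for the root cause: it assumes Parallel Computing Toolbox, runs the same trivial task as "pctRunOnAll 1+1" on every worker with parfevalOnAll, and, if that does not finish within an arbitrary 30-second limit, cancels it and restarts the pool. The timeout value is a hypothetical choice.

```matlab
pool = gcp;          % current pool, or a new one from the default profile
timeoutSec = 30;     % hypothetical limit; tune for your machine or cluster

% Run a trivial task on every worker of the pool.
f = parfevalOnAll(pool, @() 1 + 1, 1);

if wait(f, 'finished', timeoutSec)
    disp('All workers responded.');
else
    % At least one worker did not finish in time: cancel the task,
    % shut the pool down, and start a fresh one.
    cancel(f);
    delete(pool);
    pool = parpool;  % new pool with the default cluster profile
end
```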
6 Comments
David Saidman on 14 Jan 2020
Has anybody had any success with this? I'm having exactly the same problem in R2017b, and there is definitely no keyboard statement in my code.
If I wait a while, it does eventually run, but on a single CPU, even though I have 18 workers in my pool on a CPU with 20 physical and 40 logical cores, and Performance Monitor shows about 10 GB of spare memory.
海粟 吴 on 30 May 2024
parallel.internal.parfor.ParforEngine/getCompleteIntervals
In parallel_function>distributed_execution (line 746)
[tags, out] = P.getCompleteIntervals(chunkSize);
In parallel_function (line 578)
R = distributed_execution(...
I see the same problem in R2024a. It has persisted for 7 years, and no solution has come out yet.


Answers (8)

Dave Behera on 24 Mar 2016
It seems that there is a deadlock when the workers try to access the file through the same object (the one you got from memmapfile). As a result, progress stalls with zero CPU usage and no error message.
Can you try creating a separate memmapfile object within each parfor iteration and passing its data to someFunction? This may make the file access thread-safe.
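A minimal sketch of that suggestion, under assumptions: the data is a double matrix stored column-major in a raw binary file, someFunction takes one column and returns one column, and the temporary file, the sizes, and the anonymous function standing in for the C++ MEX file are all hypothetical, so the example runs on its own.

```matlab
% Hypothetical stand-ins: a small nRows-by-num double matrix written
% column-major to a temporary file, and an anonymous function in place
% of the MEX file someFunction.
nRows = 4;
num = 8;
filename = [tempname '.bin'];
fid = fopen(filename, 'w');
fwrite(fid, reshape(1:nRows*num, nRows, num), 'double');
fclose(fid);
someFunction = @(col) 2 * col;   % assumed: one column in, one column out

answer = zeros(nRows, num);
parfor i = 1:num
    % Each iteration creates its own mapping of the file instead of
    % reusing a memmapfile object created on the client.
    m = memmapfile(filename, 'Format', {'double', [nRows num], 'x'});
    answer(:, i) = someFunction(m.Data(1).x(:, i));
end
delete(filename);
```

Mapping the file on every iteration adds some overhead. In releases that provide parallel.pool.Constant (R2015b and later), constructing it from a function handle that calls memmapfile creates one mapping per worker, which the loop body then reads through the constant's Value property.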
Also, could you try the same workflow with spmd?
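An spmd version of the same idea might look like the sketch below. It makes the same kind of assumptions (hypothetical file, sizes, and a stand-in for someFunction). Each worker maps the file once and processes a strided subset of columns; the per-worker results come back to the client as Composite objects, indexed with braces. labindex and numlabs are used to match the releases in this thread; newer releases also provide spmdIndex and spmdSize.

```matlab
% Hypothetical stand-ins so the sketch runs on its own.
nRows = 4;
num = 8;
filename = [tempname '.bin'];
fid = fopen(filename, 'w');
fwrite(fid, reshape(1:nRows*num, nRows, num), 'double');
fclose(fid);
someFunction = @(col) 2 * col;   % stand-in for the MEX file

pool = gcp;   % spmd runs on every worker of this pool
spmd
    % One mapping per worker; worker w handles columns w, w+numlabs, ...
    m = memmapfile(filename, 'Format', {'double', [nRows num], 'x'});
    myCols = labindex:numlabs:num;
    localAnswer = zeros(nRows, numel(myCols));
    for k = 1:numel(myCols)
        localAnswer(:, k) = someFunction(m.Data(1).x(:, myCols(k)));
    end
    m = [];   % drop this worker's mapping before the file is deleted
end

% myCols and localAnswer are Composites: brace indexing fetches worker w's copy.
answer = zeros(nRows, num);
for w = 1:pool.NumWorkers
    answer(:, myCols{w}) = localAnswer{w};
end
delete(filename);
```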
9 Comments
Arabarra on 11 Jan 2021
Same problem here. In my case it is not reproducible: sometimes it works, sometimes it doesn't. There are no deadlocks or anything suspicious in the code.
海粟 吴 on 30 May 2024
I agree with what has been discussed above; it happens in R2024a too. When will MATLAB solve this problem: soon, or never?



arvid Martens on 9 Jan 2018
I noticed that the problem started to occur after I updated the drivers of the GPUs used during the calculations. Rolling back the drivers resolved the problem. However, new GPU hardware is on its way, as the current cards are pretty old, so I hope the problem will be resolved by then.
Is there a way to throw an error when this stalling occurs? I could then write error handling to reduce the time lost to the stall.
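One way to get an error instead of a silent stall is to submit the iterations with parfeval and collect the results with fetchNext, which accepts a timeout. This is a sketch under assumptions: someData is replaced by an in-memory stand-in matrix, someFunction by a hypothetical anonymous function, and the 60-second limit is arbitrary. The timeout applies to the wait for each next result, so a single iteration that legitimately runs longer than the limit also triggers the error.

```matlab
% Hypothetical stand-ins so the sketch runs on its own.
nRows = 4;
num = 8;
someData = reshape(1:nRows*num, nRows, num);   % in place of memmapfilename.data
someFunction = @(col) 2 * col;                 % in place of the MEX file
timeoutSec = 60;                               % arbitrary no-progress limit

% Submit one future per column instead of running a parfor loop.
pool = gcp;
for i = 1:num
    futures(i) = parfeval(pool, someFunction, 1, someData(:, i)); %#ok<SAGROW>
end

% Collect results in completion order; fetchNext returns an empty index
% if no future finishes within timeoutSec.
results = cell(1, num);
for k = 1:num
    [idx, col] = fetchNext(futures, timeoutSec);
    if isempty(idx)
        cancel(futures);
        error('stall:timeout', 'No result arrived within %d seconds.', timeoutSec);
    end
    results{idx} = col;
end
answer = [results{:}];
```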

Andrea Stevanato on 13 Jul 2018
I have the same error with MATLAB R2018a.

Sanjay Manohar on 3 Jun 2019
Edited: Sanjay Manohar on 3 Jun 2019
I was having the same parfor problem, until I noticed I had a "keyboard" instruction in my code.

DeepSea on 15 Aug 2021
I had been stuck on this problem for a couple of weeks, and fixed it by removing a "continue" inside an if statement within a for-loop:
for CondA
...
if CondB
continue; % Avoid using "continue"
end
...
end
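This report is anecdotal, and the thread does not establish why removing continue helped. For anyone who wants to try it, the restructuring is mechanical: invert the condition and wrap the rest of the loop body in the if block. The loop bounds, the even/odd test, and the variable names below are hypothetical; the inner loop gives the same result as the version that skips odd k with continue.

```matlab
total = zeros(1, 4);
parfor i = 1:4
    s = 0;
    for k = 1:10
        % Same effect as: if mod(k, 2) == 1, continue; end; s = s + k;
        if mod(k, 2) == 0
            s = s + k;
        end
    end
    total(i) = s * i;
end
disp(total)
```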
1 Comment
海粟 吴 on 30 May 2024
My code has a similar structure: it has a while loop inside a for-loop.



Aditya Shukla on 23 Oct 2021
I suddenly started getting this problem yesterday; before that, all the code ran fine. I really do not know why this happened, and it is very annoying. Has anyone found a solution?

Tianzong Wang on 27 Oct 2022
Same here: most cores are not working. Any suggestions? And what is JCEF?

海粟 吴 on 30 May 2024
Exactly same problem encountered in 2024a!!

Categories

Find more on Parallel for-Loops (parfor) in Help Center and File Exchange
