PCT: cannot cancel a running job without a PCT crash

Mikaël LE GRAND, 2 April 2019
Commented: Edric Ellis, 4 April 2019
Hello,
I think I have an infinite loop running in a worker. Each time I try to cancel it, PCT crashes. When I restart the parallel pool with:
if isempty(gcp('nocreate'))
p = parpool(1);
else
p = gcp('nocreate');
end
and then list the jobs with:
p.Cluster.Jobs
which gives:
ans =
  Job
  Properties:
      ID: 1
      Type: concurrent
      Username: blafa
      State: running
      SubmitDateTime: 02-Apr-2019 14:49:24
      StartDateTime: 02-Apr-2019 14:49:33
      Running Duration: 0 days 0h 14m 47s
      NumWorkersRange: [1 1]
      AutoAttachFiles: true
      Auto Attached Files: List files
      AutoAddClientPath: true
      AttachedFiles: {}
      AdditionalPaths: 9 paths
  Associated Tasks:
      Number Pending: 0
      Number Running: 1
      Number Finished: 0
      Task ID of Errors: []
      Task ID of Warnings: []
When I try to cancel it, PCT crashes:
p.Cluster.Jobs.cancel
The client lost connection to worker 1. This might be due to network problems, or the interactive communicating job might have errored.

Accepted Answer

Edric Ellis, 3 April 2019
When you run a parallel pool, PCT uses a parallel.Job behind the scenes to launch and co-ordinate the workers. By directly cancelling that Job, you're asking the PCT Cluster object to forcibly terminate all the worker processes. This causes the parallel pool session to abort, because the workers have been shut down. This is precisely what you're seeing here.
Could I ask: what is the actual problem you're encountering?
2 Comments
Mikaël LE GRAND, 4 April 2019
Hello Edric, thanks for your quick answer.
I understand your comments, but the Job we are talking about reappears each time I launch a parallel pool, even after rebooting my computer! As I said above, when I try to cancel it, the parallel pool crashes. Then, when I restart the parallel pool and refresh the Job Monitor, this damned job is still there, and so on.
I have found no way to get rid of it for good. I think it keeps running because, shame on me, there is an infinite loop in the worker code, so it never goes away.
Edric Ellis, 4 April 2019
The job reappears because the parallel pool uses it behind the scenes. This is entirely normal; you should leave it running, and when the parallel pool is deleted, the job will be deleted too.
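A minimal sketch of the approach described above, assuming a pool was started as in the question: instead of cancelling the concurrent Job that backs the pool, delete the pool itself. Deleting the pool shuts down its workers (which also stops any code running on them, such as a runaway loop), and the backing Job goes away with it.

```matlab
% Minimal sketch: shut down the current parallel pool, if one exists,
% rather than calling cancel on the pool's backing Job.
pool = gcp('nocreate');   % returns empty if no pool is currently running
if ~isempty(pool)
    delete(pool);         % stops the pool's workers; the backing Job is removed with the pool
end
```

A fresh pool can then be started again with parpool(1), as in the question.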


Release: R2018b
