How to stop all workers simultaneously when an error occurs in one of the workers?

Hi all,
I am running a parpool with n workers. One of the workers may throw an error at some point, so I would like to catch it like this:
parfor i = 1:length(Data)
    try
        Simulation(i);
    catch ME
        % Pseudocode for what I want to happen here:
        %   stop all workers (not the parpool: the workers should stop
        %   simulating, but I do not want them to be closed)
        %   change something in Simulation(i)
        %   start the workers again to do Simulation(i)
        continue;
    end
end
After making the changes, I want to start the workers again.
Could you please let me know how to do this?
Regards,
Vahid

Answers (2)

Edric Ellis on 27 Jul 2015
You can do this using parfeval to send off individual tasks for execution on the workers, and then you can call cancel() on those tasks if you spot an error. Something like this:
% Initiate the work on the workers:
for i = 1:length(Data)
    f(i) = parfeval(@Simulation, 1, i);
end
% Check the results, cancel all execution if an error is spotted
completedSuccessfully = true;
for i = 1:length(f)
    try
        [idx, result] = fetchNext(f);
    catch E
        % Get here if a simulation threw an error
        cancel(f);
        completedSuccessfully = false;
        break;
    end
end
if ~completedSuccessfully
    % do stuff...
end
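The "do stuff" step is where the question's "change something and start again" would go. Below is a minimal sketch of one way to do that, not something from the thread: `runWithRetry`, `submitAll`, `simulateCase`, `scale` and `maxAttempts` are all hypothetical names, and `simulateCase` is a stand-in for `Simulation` that fails for one case until `scale` is reduced. It assumes the simulation returns one output and that an open pool is available (or can be started automatically).

```matlab
function results = runWithRetry()
% Hypothetical driver: cancel everything on the first error, change a
% setting, then resubmit only the cases that have not finished yet.
n = 8;                 % stand-in for length(Data)
scale = 1;             % hypothetical setting that is changed after a failure
maxAttempts = 3;       % give up after this many rounds
results = cell(1, n);
done = false(1, n);

for attempt = 1:maxAttempts
    pending = find(~done);
    if isempty(pending)
        break;                          % everything has finished
    end
    f = submitAll(pending, scale);      % fresh futures for this round
    failed = false;
    for k = 1:numel(pending)
        try
            [idx, value] = fetchNext(f);
        catch ME
            % One simulation threw an error: stop the remaining ones.
            fprintf('Attempt %d failed: %s\n', attempt, ME.message);
            cancel(f);
            failed = true;
            break;
        end
        results{pending(idx)} = value;  % idx indexes into f, i.e. into pending
        done(pending(idx)) = true;
    end
    if failed
        scale = scale / 2;              % "change something" before retrying
    end
end
end

function f = submitAll(indices, scale)
% Queue one parfeval task per index on the current pool.
f(1:numel(indices)) = parallel.FevalFuture;   % preallocate the future array
for k = 1:numel(indices)
    f(k) = parfeval(@simulateCase, 1, indices(k), scale);
end
end

function out = simulateCase(i, scale)
% Hypothetical stand-in for Simulation: case 5 fails until scale <= 0.5.
if i == 5 && scale > 0.5
    error('demo:failed', 'Case %d failed with scale %g', i, scale);
end
out = i * scale;
end
```

Calling `results = runWithRetry();` keeps the pool open throughout; only the queued and running tasks are cancelled. Cases that finished but had not been fetched when the error appeared are simply run again in the next round, and results collected before the failure were computed with the old setting, so clear `done` after a failure if every case should use the new one.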

Walter Roberson on 24 Jul 2015
You can cancel() task objects. I think at one point I saw a way to determine all of the task IDs, but that is not something I have researched.
Comments: 2
Vahid Ghorbanian on 25 Jul 2015
Walter,
Thank you for the response. How can I create the object? I do not know whether each worker has to have its own object. Does the object have to be created inside the parfor or outside of it? Could you please send some sample code that does what I need?
Walter Roberson on 25 Jul 2015
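For example,
% j is a job object from createJob; 1 is the assumed number of outputs; one {i} cell per task
createTask(j, @Simulation, 1, arrayfun(@(i) {i}, 1:length(Data), 'UniformOutput', false))
and once you have found an error and want to restart, perhaps use recreate(j)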
At the moment I do not see a way to access the results of one task other than to know which state it is in. I have not used these facilities so I am likely overlooking something.
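For reference, a hedged sketch of the job/task route, including one way to read results back. It is not taken from the thread: `@sqrt` and the `values` vector are stand-ins for `Simulation` and `Data`, the default cluster profile is assumed, and the one-second polling interval is arbitrary. The job runs on its own workers, separate from any open parpool.

```matlab
c = parcluster();                 % default cluster profile (assumption)
j = createJob(c);

values = [1 4 9 16];              % stand-in for Data
% One inner cell of input arguments per task: {{1}, {4}, {9}, {16}}
inputs = arrayfun(@(v) {v}, values, 'UniformOutput', false);
createTask(j, @sqrt, 1, inputs);  % @sqrt stands in for @Simulation
submit(j);

% Poll once per second; stop as soon as any task reports an error.
failed = false;
finished = false;
while ~finished && ~failed
    finished = wait(j, 'finished', 1);   % true once the job has finished
    tasks = j.Tasks;
    failed = any(~cellfun(@isempty, {tasks.ErrorMessage}));
end

if failed
    if ~finished
        cancel(j);                % stops the tasks that are still running
    end
    disp('A task failed; change the inputs and create a new job.');
else
    out = fetchOutputs(j);        % one row of outputs per task
    disp(out);
end
delete(j);                        % remove the job from the cluster storage
```

Compared with parfeval, this route has more overhead per run, so it is probably the better fit only when the work should not depend on an interactive pool.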
