How to shut down all running workers of parpools?

24 views (last 30 days)
Felix on 6 Mar 2023
Commented: Davy Figaro on 16 May 2024
How can I find and shut down all workers of all parpools that might currently be running?
During debugging I frequently run into crashes and out-of-memory errors. Often, some worker processes keep running, and I would like to know how best to close all of them before starting another script.

Answers (3)

Raymond Norris on 6 Mar 2023
Hi @Felix. Even if only a single worker crashes, all workers will terminate. Can you elaborate a bit more on a couple of things?
  1. Are you using a local pool or a cluster? If cluster, MJS or your own scheduler (and if so, which)?
  2. Which parallel constructs are you using (parfor, parfeval, etc.)? Can you give a simple example of what might crash? I'm not interested in the details (I'm sure the worker(s) are crashing), more in how you're running the code.
  1 comment
Edric Ellis on 7 Mar 2023
Note that on "local" and MJS clusters, the parallel pool will not necessarily immediately terminate when a single worker crashes. On those clusters, pools that have not yet used spmd can survive losing workers.
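For illustration, a pool can also be created with spmd support disabled so that it tolerates losing workers from the start. A minimal sketch, assuming the local 'Processes' profile:
% Create a local pool without spmd support; such a pool can keep
% running even if individual workers crash.
pool = parpool('Processes', 'SpmdEnabled', false);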



Edric Ellis on 7 Mar 2023
You can shut down all remaining workers of the currently running pool by executing:
delete(gcp('nocreate'))
There should be no running workers other than in the current pool.
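As a usage sketch (not part of the original answer), this can go at the top of a script to guarantee a clean start:
delete(gcp('nocreate'));  % shuts down the current pool, if any; harmless when no pool exists
parpool;                  % start a fresh pool with the default profile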
  1 comment
Davy Figaro on 16 May 2024
This shuts down the current parallel pool (created with parpool). How can I stop and clear all the workers without shutting down the pool?
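One pattern that is sometimes suggested for this (a hedged sketch, not confirmed in this thread) is to run clear on every worker via parfevalOnAll, which frees state held on the workers while leaving the pool itself running:
pool = gcp('nocreate');                        % handle to the running pool, if any
if ~isempty(pool)
    f = parfevalOnAll(pool, @clear, 0, 'all'); % ask every worker to clear its state
    wait(f);                                   % block until all workers are done
end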



Felix on 8 Mar 2023
  1. I'm using local pools on my machine with default settings. On my machine this defaults to 12 workers.
  2. So far, I'm using parfor and the run command with MultiStart problems. I'll sometimes start a pool via parpool before running a script to reduce that script's runtime.
A simple, somewhat pseudocode example of my Monte Carlo code might be:
relevant_input = randn(1000, 1);
relevant_output = nan(height(relevant_input), 1);
param = 10;
parpool;
my_fun = @(x) elaborate_function(param, x);  % pass the parameter into the local function
parfor h = 1:height(relevant_input)
    relevant_output(h, 1) = my_fun(relevant_input(h));
end
function y = elaborate_function(par, x)
y = par*x.*sin(x);
end
Another use case is the MultiStart object, which I use with run:
ms = MultiStart('UseParallel', true, 'Display', 'iter');
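For context, a minimal (hypothetical) MultiStart run could look like this; the objective, bounds, start point, and number of starts are made up for illustration:
ms = MultiStart('UseParallel', true, 'Display', 'iter');
problem = createOptimProblem('fmincon', 'objective', @(x) x.^2 + sin(5*x), ...
    'x0', 0, 'lb', -5, 'ub', 5);
[xmin, fmin] = run(ms, problem, 20);  % 20 start points, evaluated across the pool workers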
My scripts sometimes crash and I have trouble restarting them, because some workers do not seem to clear their memory when they crash. When I try to restart I get warnings such as:
Starting parallel pool (parpool) using the 'Processes' profile ...
Preserving jobs with IDs: 10 12 13 because they contain crash dump files.
You can use 'delete(myCluster.Jobs)' to remove all jobs created with profile Processes. To create 'myCluster' use 'myCluster = parcluster('Processes')'.
However, these crash dump files and the preserved jobs hog up way too much memory on my machine. I am looking for a couple of lines of code to put at the start of my scripts that search for leftover jobs, such as the ones containing crash dump files, and terminate them if they exist, so I don't have to type delete(myCluster.Jobs) every time myself.
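A possible cleanup snippet for the top of a script (a hedged sketch, assuming the default 'Processes' profile) could be:
delete(gcp('nocreate'));      % shut down any pool that is still running; no-op otherwise
c = parcluster('Processes');  % cluster object for the local 'Processes' profile
if ~isempty(c.Jobs)
    delete(c.Jobs);           % remove leftover jobs, including those preserved for crash dumps
end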
  1 comment
Raymond Norris on 14 Mar 2023
I'm confused how the crash dump files and preserved jobs hog up too much memory. Do you mean disk space?
If a job is running, I'm not sure there would be a crash dump file (until the end). And do you want to delete the crash file or the job? If you're running a parallel pool and the pool crashes, there's no job to delete.


Release: R2022b
