Specify the parallel pool job timeout

Hi,
I am running some tests on a remote cluster (no local cluster). I submit my functions in batch mode. I know that the functions take a long time to execute, around 2 to 4 hours. When I try to run, I get the message:
'The parallel pool job was cancelled because it failed to finish within the specified parallel pool job timeout of 300 seconds'
I looked in the documentation for how to change the default timeout. The only way I could find is with the "wait" command, as in:
wait(job,"finished",18000);
However, I keep getting the same error. How can I change the default parallel pool job timeout on the remote cluster?

Answers (1)

Raymond Norris on 1 Oct 2021


So you're doing something like the following:
cluster = parcluster;
job = cluster.batch(@mycode,...., 'Pool',size);
Then what you're suggesting is that your code looks something like:
function mycode
pause(10 * 60)
parfor idx = 1:N
...
end
On the cluster, the workers have a default timeout of 5 minutes, so the job errors out because you're running code (the pause) for 10 minutes before the workers are used in the parfor:
'The parallel pool job was cancelled because it failed to finish within the specified parallel pool job timeout of 300 seconds'
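A minimal self-contained sketch of that scenario (the function name mycode, the pool size, and the 10-minute pause are placeholders; the 5-minute limit assumes the profile's default parallel pool job timeout):

```matlab
% mycode.m -- hypothetical reproduction of the timeout scenario:
% batch opens the pool workers up front, but they sit idle during the
% serial pause and exceed a 5-minute parallel pool job timeout before
% the parfor ever uses them.
function mycode
pause(10 * 60)        % 10 minutes of serial work; the pool workers are idle here
parfor idx = 1:100    % workers only become active at this point
    % ... per-iteration work ...
end
end

% Submitted from the client, e.g.:
%   c = parcluster;                          % default cluster profile
%   job = batch(c, @mycode, 0, {}, 'Pool', 31);
```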
I tried to reproduce this quickly with the local scheduler, but couldn't (it shouldn't matter that I'm using local). How are you getting the error message? And which scheduler are you using, MJS or a generic one (e.g. PBS)?

Comments (9)

Maria on 1 Oct 2021 (edited)
So, here is how my code looks:
delete(gcp('nocreate'));
parallel.defaultClusterProfile('MatlabCluster')
c = parcluster();
N = 32;
job = batch(c,@compute_H_matrix,1,{large data inputs},'Pool',N-1);
The function "compute_H_matrix" has some if/while conditions, then it calls another function, "internal_compute_H". In internal_compute_H there are two for-loops, the outermost of which is a parfor. In these loops I fill in the elements of the matrix H with a call to another function (I know, it is an involved code), "compute_integrals_H", which finally calls a MEX file, "compute_Ihp_mex" (the MEX is built on Linux, so it is compatible with the cluster). The reason for all the sub-calls is that the geometry of the problem is handled in each function based on criteria identified from the input.
After 5 minutes the job "fails" and, from getReport(job.Tasks(1).Error), I get this error message. The scheduler is MJS.
Maria on 1 Oct 2021 (edited)
I am thinking about what you said. The thing is that the checks and the steps done before entering the parfor are very quick. It is the parfor + for that requires the time, because the integral computation is slow and runs over 10000 x 10000 elements. As I said, all the checks done before the parfor are very quick; I can use the debugger and reach the parfor without any problem, because that is not the bottleneck, at least not on my computer...
In the function "internal_compute_H", I use a distributed array to allocate the matrix before the parfor. Could this be the problem?
Maria on 1 Oct 2021 (edited)
I took away the "distributed" array, but the job failed with this message:
['Cannot rerun job because at least one of its tasks has no rerun attempts left (The task has no rerun attempts left.).' ...
'Original cancel message:' ...
'MATLAB worker exited with status 9 during task execution.' ...
'Transport stopped. ']
Raymond Norris
OK, this timeout is specific to MJS. Open the Cluster Profile Manager and look at MatlabCluster. What are the timeouts set to? Inf, or 5 minutes?
Maria on 1 Oct 2021
5 min
Raymond Norris
That'll do it ;)
Secondly, parfor and distributed arrays aren't meant to be combined. Within an iteration of a parfor-loop, the entire unit of work is on its own; there is no inter-process communication with the other workers. So using distributed arrays within parfor is pointless (and possibly unsupported).
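A hedged sketch of the pattern this implies: preallocate an ordinary (non-distributed) array and let parfor treat it as a sliced output variable. compute_integrals_H and the 10000 x 10000 size come from this thread, but its signature here is an assumption:

```matlab
% Sketch: fill H inside parfor without a distributed array. Each
% iteration builds one full row locally, then assigns it as a sliced
% output (H(i, :)), which parfor supports; no inter-worker
% communication is needed.
n = 10000;
H = zeros(n, n);                 % ordinary array on the client
parfor i = 1:n
    row = zeros(1, n);           % local to this iteration's worker
    for j = 1:n
        row(j) = compute_integrals_H(i, j);  % hypothetical signature
    end
    H(i, :) = row;               % sliced output variable
end
```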
Maria
Aha, this is good to know!
But now I took away the distributed array and I get:
MATLAB worker exited with status 9 during task execution.
What does that mean?
Raymond Norris
One of the workers crashed, possibly because of an out-of-memory issue. Email Technical Support (support@mathworks.com) and they can walk you through debugging steps and getting the MJS log files to troubleshoot.
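For reference, one possible programmatic route around a short profile timeout (a sketch, not a verified fix: it assumes the MJS job Timeout property governs the same limit as the profile's parallel pool job timeout; 'MatlabCluster', compute_H_matrix, and the inputs are from this thread):

```matlab
% Sketch: build the pool-type communicating job explicitly so its
% Timeout can be set before submission, instead of relying on the
% profile default that batch picks up.
c = parcluster('MatlabCluster');
job = createCommunicatingJob(c, 'Type', 'pool', 'Timeout', Inf);
createTask(job, @compute_H_matrix, 1, {largeDataInputs});  % placeholder inputs
submit(job);
wait(job);                 % blocks on the client until the job finishes
out = fetchOutputs(job);   % cell array; out{1} is the H matrix
```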


Release: R2021a
Asked: 1 Oct 2021
Last commented: 2 Oct 2021
