- Is each of the MATLAB jobs running exactly the same code?
- How are parallel pools starting up? Is parpool being called explicitly and if so with what size?
- If parpool is not passed an argument, the default size is min(12,24) ==> 12 workers
- If parpool is not called, parfor/spmd will start a pool (by default), which follows the same pattern of min(12,24) ==> 12 workers.
- The min(12,24) is because 12 is the default pool size and 24 is the number of cores found on the node (or within cgroup).
Problem allocating 24 workers using parpool
I am trying to execute a few jobs in parallel on a computing cluster. The maximum number of workers available per node is 24. When I submit the jobs, some are allocated 24 workers, some are allocated 12, and some run on only one worker. What could be the reason for this, given that every job uses the same job script and requests 24 workers per node? I looked at the output of a job running on one worker and it says "
>> >> >> Starting parallel pool (parpool) using the 'local' profile ...
>> >> >> >> >> Starting parallel pool (parpool) using the 'local' profile ..."
This shows that parpool did start for this job, but only one worker was allocated. When I check the output of a job running on 24 workers, it says
">> >> >> Starting parallel pool (parpool) using the 'local' profile ...
Connected to the parallel pool (number of workers: 24)."
I want every job to run on all 24 workers, but I have not been able to achieve this. Any help would be greatly appreciated.
Comments: 3
Answers (1)
Mohammad Sami
18 Sep 2020
Edited: Mohammad Sami
18 Sep 2020
Perhaps you can try:
core = feature('numcores');          % number of cores MATLAB detects
pool = parpool('local',core);        % request one worker per core
disp(['Pool has been started with Num Workers ' num2str(pool.NumWorkers)]);
Additionally, you can try restarting the pool if it has fewer workers than the number of cores.
retries = 0;
retry_limit = 3;
% Restart the pool until it has one worker per core, at most retry_limit times.
while (pool.NumWorkers < core) && (retries < retry_limit)
    retries = retries + 1;
    disp('Restarting parallel pool');
    delete(pool);
    pool = parpool('local',core);
    disp(['Pool has been started with Num Workers ' num2str(pool.NumWorkers)]);
end
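The comments on the question point out that a pool started without an explicit size defaults to min(12, number of cores). As a rough sketch of how that could be checked inside the job script, the snippet below prints what MATLAB detects on the node and then requests the size explicitly. The variable names (requested, defaultPreferred, poolSize) are illustrative, the value 12 is taken from the comments rather than read from the settings, and the snippet assumes the 'local' profile shown in the job output.

```matlab
% Sketch only: compare what the node reports with what the pool delivers.
requested = 24;          % workers wanted per job (from the question)
defaultPreferred = 12;   % default preferred pool size cited in the comments (assumed)

cores = feature('numcores');   % cores MATLAB detects on this node
c = parcluster('local');       % the 'local' profile named in the job output

fprintf('Detected cores: %d, profile NumWorkers: %d\n', cores, c.NumWorkers);
fprintf('Expected size if parpool is given no size: %d\n', min(defaultPreferred, cores));

% An explicit request larger than the profile's NumWorkers raises an error,
% so cap the request before starting the pool.
poolSize = min(requested, c.NumWorkers);

pool = gcp('nocreate');        % reuse an already open pool, if any
if isempty(pool)
    pool = parpool(c, poolSize);
end
fprintf('Pool is running with %d workers\n', pool.NumWorkers);
```

If the detected core count or the profile's NumWorkers differs between the jobs that got 24, 12, and 1 worker, that would suggest the difference comes from the node or scheduler allocation rather than from the job script itself.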