Parallel computing on a cluster

Mads on 7 Jul 2014
Commented: Johannes Kalliauer on 3 Feb 2020
I have a script, test.m, that contains parfor-loops.
In MATLAB R2014a on my personal computer it runs the parallel job perfectly.
On a large Linux computer cluster it also runs test.m perfectly if I start MATLAB R2014a graphically through X on the frontend.
However, when I submit test.m to the queue, it ignores the parallelism and runs the parfor-loops as ordinary for-loops, on a single core of the assigned node.
What I type:
>> submat -q q12 test.m
q12 is the queue name.
Does anyone have a clue?
Kevin Claytor on 7 Jul 2014
This question is probably better suited to your cluster's sysadmin. My guess is that something is missing from your queue submission file. Your cluster probably has a help page or wiki (for example, Duke's is: https://wiki.duke.edu/display/SCSC/DSCR ); I'd start there.
If you still can't find anything, we'll need some more details. For instance, which scheduler are you using? Is 'submat' a script or a command? If it's a script, can you post it?
Mads on 8 Jul 2014
Hi Kevin,
I found out that submat is a wrapper; I didn't know that. It uses PBS for queueing.
This is what
submat -q q12 test.m
expands to:
#!/bin/sh
#PBS -q q12
#PBS -A mmatlab
#PBS -S /bin/bash -N test -j oe
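version=R2014a
# Third field of the kernel release, e.g. 18 for 2.6.18-xxx
rel=$(uname -r | tr '-' '.' | awk -F. '{print $3}')
# On old kernels (field <= 18) fall back to R2010b; if rel is empty, treat it as 25
if [ "${rel:-25}" -le 18 ] && [ "$version" = "R2014a" ]; then
version=R2010b
echo "NB: Using Matlab version R2010b"
echo ===============================
fi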
export PATH=/com/matlab/$version/bin:$PATH
cd /home/msv/Projects/SAR/W6
echo "======= Started at `date` ======="
echo
# Start MATLAB without Java (-nojvm) and without a display, run test.m, then exit
matlab -nojvm -nodisplay -r "test;exit"
echo
echo "======= Finished at `date` ======="
#


Accepted Answer

Shashank Prasanna on 7 Jul 2014
My guess is that when you queue the job, MATLAB is launched without Java, and Java is required to use Parallel Computing Toolbox (PCT). That would also explain why it works fine when you start MATLAB with X11 forwarding. From the documentation:
"The client session of MATLAB must be running the Java® Virtual Machine (JVM™) to use Parallel Computing Toolbox software."
Shashank Prasanna on 8 Jul 2014
Alternatively, you could consider "-nodesktop", which starts the JVM but not the desktop.
Johannes Kalliauer on 3 Feb 2020
Use -noFigureWindows -nosplash -nodesktop -nodisplay instead of -nojvm.
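Because a job started with -nojvm does not fail but simply runs the parfor-loops serially, a job script could check its MATLAB flags before launching. A minimal bash sketch; check_matlab_flags is a hypothetical helper, not part of MATLAB or PBS:

```shell
#!/bin/bash
# Hypothetical helper (not part of MATLAB or PBS): reject MATLAB startup
# flags that disable the JVM, because parfor then quietly runs as a
# serial for-loop on one core instead of raising an error.
check_matlab_flags() {
    local flag
    for flag in "$@"; do
        if [ "$flag" = "-nojvm" ]; then
            echo "error: -nojvm disables Java, which Parallel Computing Toolbox needs" >&2
            return 1
        fi
    done
    return 0
}
```

In a job script, pass it the same flags given to matlab, e.g. `check_matlab_flags -nodisplay -nosplash || exit 1`, so a misconfigured submission stops immediately instead of quietly running on one core.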


More Answers (2)

Thomas Ibbotson on 8 Jul 2014
We would need to see the code for 'submat', but my guess is that an independent job is being created rather than a communicating job. If you want to run a script with parfor loops on a cluster, you need a communicating 'pool' job. For example, you can submit one with 'batch' like this:
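myCluster = parcluster('myClusterProfile');   % replace with your cluster profile name
% Run test.m as a communicating 'pool' job; one worker is reserved as the client
job = batch(myCluster, 'test', 'Pool', myCluster.NumWorkers - 1);
wait(job);   % block until the job has finished
load(job);   % copy the script's workspace variables into the client workspace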
The 'Pool' argument instructs batch to create a communicating 'pool' job using the given number of workers to create the pool. You need to have at least 1 spare worker to act as the 'client', which is why I subtracted 1 from the total number of workers that the cluster has.
For more information, see: Run a batch parallel loop

Mads on 9 Jul 2014
Thanks for all the answers; they were all good, given the incomplete information I provided.
It turned out that enabling Java in the job submission did the trick. So I removed the
-nojvm
flag from the matlab command in the job script.
Best wishes
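For reference, a minimal bash sketch of the launch step after that change. run_matlab_script and its DRY_RUN switch are hypothetical, added only for illustration; -nosplash comes from Johannes's comment and is optional.

```shell
#!/bin/bash
# Hypothetical helper: launch a MATLAB script the way the submat job
# script does, but without -nojvm, so the JVM needed by Parallel
# Computing Toolbox is started. -nodisplay keeps MATLAB off X, and
# -nosplash suppresses the splash screen.
# With DRY_RUN=1 the command is printed instead of executed.
run_matlab_script() {
    local script="$1"
    local cmd=(matlab -nodisplay -nosplash -r "${script};exit")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "${cmd[*]}"
    else
        "${cmd[@]}"
    fi
}
```

In the job script, `run_matlab_script test` would replace the original matlab line after the `cd`. Building the command as an array keeps the `-r` argument as a single word without extra quoting.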
