Slow Training of RL Agent on HPC Compared to Local Machine

Gaurav on 6 Jun 2024
Commented: Harald on 7 Jun 2024
I am currently running a MATLAB R2021a script (execute.m, attached for reference) to train a reinforcement learning (RL) agent in Simulink to control a drone. On my local machine, training connects to 6 workers and runs much faster than on the HPC cluster, where it connects to 12 workers. I have ensured that the whole node, with 28 cores in total, is assigned to the job.
Here is the SLURM script:
#!/bin/bash -l
#SBATCH -J MATLAB_Execute # Job name
#SBATCH -N 1 # Number of nodes
#SBATCH -n 1 # Number of tasks (1 instance of the program)
#SBATCH -c 28 # Number of CPU cores per task
#SBATCH --gres=gpu:0 # Number of GPUs per node
#SBATCH --time=1:00:00 # Time limit (1 hour)
#SBATCH -p batch -C skylake # Partition name (batch), restricted to Skylake nodes
export JAVA_LOG_DIR=/scratch/users/gshetty/java_logs
mkdir -p $JAVA_LOG_DIR
# Load the MATLAB module
module load math/MATLAB/2021a
module load openssl/1.1.1k
export LD_PRELOAD=/usr/lib64/libcrypto.so.1.1
# Run the MATLAB script
srun matlab -nodisplay -nosplash -r execute -logfile execute.out
What could be the reason?
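One thing that can be checked from inside the job is how many cores the MATLAB process is actually allowed to use, since that may differ from the `-c 28` request. The following is only a minimal sketch: `report_cpu_allocation` is a hypothetical helper name, not part of the original script, and it assumes GNU `nproc` is available on the compute node.

```shell
#!/bin/bash
# Hypothetical helper (not from the original script): print the core count
# SLURM granted to the task next to the count the OS lets this process use.
report_cpu_allocation() {
    # SLURM sets SLURM_CPUS_PER_TASK when -c/--cpus-per-task is requested.
    local slurm_cpus="${SLURM_CPUS_PER_TASK:-unset}"
    # nproc reports the processing units available to the current process,
    # so it reflects any CPU affinity limit applied to the job step.
    local usable_cpus
    usable_cpus="$(nproc)"
    echo "SLURM_CPUS_PER_TASK=${slurm_cpus} usable_cpus=${usable_cpus}"
}

report_cpu_allocation
```

Calling it once directly in the batch script and once through `srun` (for example `srun nproc`) shows whether the job step sees the same cores as the batch allocation. If the step reports fewer than 28, the MATLAB workers may be sharing a small number of cores, which could explain slow training; depending on the SLURM version and site configuration, `srun` may need the core count passed explicitly.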
  4 Comments
Gaurav on 6 Jun 2024
I should also mention that I use R2021a because that is the version installed on the HPC cluster.
Harald on 7 Jun 2024
Hi,
that's a big difference, indeed. If it takes hours on HPC, I am surprised that it finishes at all, since you have specified a time limit.
If you get error messages, please copy the precise error message you get and the code that throws it. That makes it easier to investigate.
Assuming that we are speaking of run time and not any time that your job may be queued, waiting for resources to become available, I cannot imagine why it would take that long on HPC.
If there are no further ideas here, it may be worth reaching out to Technical Support: https://www.mathworks.com/support/contact_us.html
Best wishes,
Harald
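To separate queue time from run time as suggested above, the job's timestamps can be compared. Where job accounting is enabled, `sacct -j <jobid> --format=JobID,Submit,Start,End,Elapsed` prints them. The sketch below assumes GNU `date`; `split_job_time` is a hypothetical helper, and the timestamps in the example call are made up for illustration, not taken from the original job.

```shell
#!/bin/bash
# Hypothetical helper: split a job's total time into queue wait and run time.
# Arguments: Submit, Start and End timestamps in the YYYY-MM-DDTHH:MM:SS
# form that sacct prints.
split_job_time() {
    local submit_s start_s end_s
    # GNU date converts each timestamp to seconds since the epoch.
    # -u reads all three as UTC, so only their differences are meaningful.
    submit_s="$(date -u -d "$1" +%s)"
    start_s="$(date -u -d "$2" +%s)"
    end_s="$(date -u -d "$3" +%s)"
    echo "queued=$((start_s - submit_s))s ran=$((end_s - start_s))s"
}

# Example with made-up timestamps:
split_job_time 2024-06-06T10:00:00 2024-06-06T10:12:30 2024-06-06T11:12:30
# prints: queued=750s ran=3600s
```

Only the second number is relevant when comparing training speed with the local machine.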


Answers (0)
