
Improve GPU utilization during regression deep learning

11 views (last 30 days)
Adam Shaw on 11 Apr 2023
Commented: Joss Knight on 7 May 2023
I'm having trouble improving GPU utilization on what I think is a fairly straightforward deep learning example, and I wonder if there is anything clearly being done incorrectly. I'm not an expert in this field, so I'm not quite sure exactly what information is most relevant to provide.
I'm using an RTX 3090 GPU. The network architecture is a few fully connected layers, each with ~100 neurons. The input data is a featureInput with 3 features and ~20k points, going to a single regression output.
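Roughly, the layer stack is something like the following (the exact depth and activation functions here are only a sketch):
layers = [
    featureInputLayer(3)             % 3 input features
    fullyConnectedLayer(100)
    reluLayer
    fullyConnectedLayer(100)
    reluLayer
    fullyConnectedLayer(1)           % single regression output
    regressionLayer];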
The relatively sparse training options are as follows:
options = trainingOptions("adam", ...
    MaxEpochs=500, ...
    Shuffle="every-epoch", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=128);
However, when I train the network, I only reach ~10% GPU utilization. I'm assuming that I'm being bottlenecked by some other step of the process.
Ultimately, my goal is to train the model hundreds of times, each with a different choice of initial data. So even though my input data is relatively small (which perhaps is what's leading to a bottleneck?), I'm hoping to find some way to parallelize multiple trainings on the same GPU. Is this possible, or is there something else I've clearly overlooked when it comes to improving the utilization?
  1 Comment
Joss Knight on 12 Apr 2023
What is your data? What does the MATLAB Profiler say about where time is being spent? Have you tried to maximize the MiniBatchSize to improve throughput?
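For example, one run can be wrapped in the Profiler like this (XTrain, YTrain, layers, and options stand in for your own data, network, and training options):
profile on
net = trainNetwork(XTrain, YTrain, layers, options);   % one training run under the profiler
profile viewer                                         % opens the report showing where time is spent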


Answers (1)

Aishwarya Shukla on 2 May 2023
It's hard to say exactly what's causing the low GPU utilization without more information, but here are a few potential issues to consider:
  1. Batch size: With a mini-batch size of 128, it's possible that your GPU is underutilized because the batches are too small to fully occupy the GPU. You could try increasing the batch size to see if that improves GPU utilization (see the sketch after this list).
  2. Data loading: If your data loading is slow, the GPU may be left waiting for data to arrive during training, leading to low utilization. Consider preprocessing or pre-loading your data up front so that batches can be fed to the GPU without delay.
  3. Model complexity: Your neural network may not be complex enough to fully utilize the GPU. Consider adding more layers or increasing the number of neurons per layer to see if that improves GPU utilization.
  4. Other system constraints: It's possible that your GPU is being bottlenecked by other system constraints, such as CPU or memory bandwidth. You can monitor these metrics during training to see if they are limiting GPU utilization.
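As a sketch of point 1, you could increase MiniBatchSize and request GPU execution explicitly; the batch sizes here are just values to experiment with, not recommendations:
options = trainingOptions("adam", ...
    MaxEpochs=500, ...
    Shuffle="every-epoch", ...
    InitialLearnRate=0.001, ...
    MiniBatchSize=1024, ...            % try 512, 1024, 2048 and compare utilization/throughput
    ExecutionEnvironment="gpu");       % error out rather than silently falling back to the CPU
Note that larger batches can change convergence behavior, so the learning rate may need adjusting.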
Regarding parallel training, it is possible to train multiple models simultaneously on the same GPU using parallel computing libraries such as PyTorch's DistributedDataParallel or TensorFlow's MirroredStrategy. However, keep in mind that training multiple models on the same GPU will increase memory usage, potentially leading to memory errors or slower training times.
  1 Comment
Joss Knight on 7 May 2023
Or perhaps, since you're using MATLAB rather than Python, use MATLAB to train multiple models, as described in our documentation.
Even better, use the Experiment Manager app, which is specifically designed to help with this.
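A minimal sketch of that idea, loosely following the "Train Deep Learning Networks in Parallel" documentation; prepareRun is a hypothetical helper that returns the training data for run k, and layers and options are the network and training options defined earlier:
numRuns = 100;
trainedNets = cell(numRuns, 1);
parfor k = 1:numRuns                               % parfor opens the default parallel pool if needed
    [XTrain, YTrain] = prepareRun(k);              % hypothetical helper: data for the k-th run
    trainedNets{k} = trainNetwork(XTrain, YTrain, layers, options);
end
Bear in mind that several workers sharing a single GPU compete for its memory, so it may be necessary to limit the pool size.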
