Multiple GPUs perform slower than a single GPU when training a semantic segmentation network

Views: 3 (last 30 days)
I have at my disposal two NVIDIA Tesla V100 16 GB GPUs to train a deep neural network for semantic segmentation. I am training the Inception-ResNet-v2 network with the DeepLab v3+ architecture, and I am using a randomPatchExtractionDatastore to feed the network with training data. When I set the 'ExecutionEnvironment' option to 'multi-gpu', the processing time for each iteration is higher than with 'gpu', that is, a single GPU. I am working on Windows 10 with MATLAB R2019b. What should I do to use the full potential of both GPUs for training? Below is an example of my code:
patchSize = 512;
imageSize = [patchSize patchSize 3];
numClasses = 6;
lgraph = deeplabv3plusLayers(imageSize, numClasses, 'inceptionresnetv2', 'DownsamplingFactor', 16);
MaxEpochs = 10;
PatchesPerImage = 1500;
MiniBatchSize = 20;
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','gpu', ...   % set to 'multi-gpu' to reproduce the slowdown
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropPeriod',3, ...
    'LearnRateDropFactor',0.2, ...
    'Momentum',0.9, ...
    'InitialLearnRate',0.03, ...
    'L2Regularization',0.001, ...
    'MaxEpochs',MaxEpochs, ...
    'MiniBatchSize',MiniBatchSize, ...
    'Shuffle','every-epoch', ...
    'CheckpointPath',tempdir, ...
    'VerboseFrequency',2, ...
    'Plots','training-progress', ...
    'ValidationPatience',4);
imageAugmenter = imageDataAugmenter( ...
    'RandRotation',[-20 20], ...
    'RandXTranslation',[-10 10], ...
    'RandYTranslation',[-10 10]);
% Random patch extraction datastore (imds and pxds are the image and
% pixel label datastores for the training set)
PatchSize = [patchSize patchSize];
dsTrain = randomPatchExtractionDatastore(imds, pxds, PatchSize, ...
    'PatchesPerImage', PatchesPerImage, 'DataAugmentation', imageAugmenter);
[net, ~] = trainNetwork(dsTrain, lgraph, options);
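The snippet assumes imds and pxds already exist. For reference, a minimal sketch of how they might be built; the folder paths, class names, and label IDs here are placeholders, not from the original post:
numClasses = 6;
% Hypothetical datastore setup -- adjust paths, class names, and
% label IDs to your own data
imageDir = fullfile('data','images');   % placeholder path
labelDir = fullfile('data','labels');   % placeholder path
classNames = "class" + (1:numClasses);  % placeholder class names
labelIDs   = 1:numClasses;              % pixel values in the label images
imds = imageDatastore(imageDir);
pxds = pixelLabelDatastore(labelDir, classNames, labelIDs);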

Accepted Answer

Joss Knight on 9 March 2020
On Windows, due to GPU communication issues on that platform, it is difficult to get any benefit from multi-GPU training. This will be improved in a future release. In the meantime, try the following (a sketch of these settings follows below):
  • Maximize 'PatchesPerImage' and 'MiniBatchSize'
  • Scale up the learning rate to match the number of GPUs
If moving to Linux is an option for you, that is definitely the way to go.
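A minimal sketch of how those two suggestions might look in code; scaling by gpuDeviceCount is an illustration of the heuristic, not taken verbatim from the answer:
% Scale the batch size and learning rate with the number of GPUs --
% an illustrative sketch, reusing the question's baseline values
numGPUs = gpuDeviceCount;              % GPUs visible to MATLAB
MiniBatchSize = 20 * numGPUs;          % larger batches amortize inter-GPU communication
InitialLearnRate = 0.03 * numGPUs;     % common heuristic: scale LR with effective batch size
options = trainingOptions('sgdm', ...
    'ExecutionEnvironment','multi-gpu', ...
    'MiniBatchSize',MiniBatchSize, ...
    'InitialLearnRate',InitialLearnRate, ...
    'MaxEpochs',10, ...
    'Shuffle','every-epoch');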
3 Comments
Joss Knight on 23 March 2021
Yes, this situation has been much improved since R2020a, although you still cannot quite match the performance you get on Linux.
WSL works well and is an excellent solution, but you do need to update to the Windows Insider build. Follow the instructions here. Either way, it is worth confirming that MATLAB can see both devices before starting a long run; see the quick check below.
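A quick sanity check using the standard gpuDevice calls; nothing here is specific to this thread:
% Confirm both GPUs are visible to MATLAB before starting a long run
n = gpuDeviceCount;
fprintf('MATLAB sees %d GPU(s)\n', n);
for k = 1:n
    d = gpuDevice(k);    % note: selecting device k also resets it
    fprintf('  GPU %d: %s, %.1f GB\n', k, d.Name, d.TotalMemory/2^30);
end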
Preetham Manjunatha on 30 July 2022
I still notice the same issue in a classification problem using 2 GPUs (RTX 2080 Ti, 11 GB VRAM each). It is super slow (about 8 seconds per iteration) on Ubuntu 22.04 with MATLAB R2022a. It has taken 1228 minutes for 11,837 iterations, or 19 epochs (batch size of 256 images), with about 159,500 training images. At this speed, I expect the training to finish after a week or so!
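For context, the figures quoted above are roughly self-consistent; a quick back-of-the-envelope check using only the numbers from the comment:
% Back-of-the-envelope check of the figures quoted above
itersPerEpoch = ceil(159500 / 256);   % ~624 iterations per epoch
totalIters    = itersPerEpoch * 19;   % ~11,856 -- matches the ~11,837 reported
secPerIter    = 1228*60 / 11837;      % ~6.2 s per iteration on average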


More Answers (1)

junnet on 24 March 2021
Thanks! That is great news. Also, thanks for the link.
Now if only I could score a pair or trio of RTX 3060s somewhere, anywhere ...
