Volatile GPU-Util is 0% during Neural network training
Views: 9 (last 30 days)
Hello.
I would like to train my neural network with 4 GPUs (on a remote server). To utilize the GPUs, I set ExecutionEnvironment to 'multi-gpu' in the training options. However, Volatile GPU-Util remains at 0% during training, even though the data appears to have been loaded into GPU memory.
I would appreciate your help.
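For context, a minimal sketch of the setup being described, assuming an image-classification workflow; imdsTrain and layers are hypothetical placeholders, not the asker's actual code:

% Sketch only: imdsTrain and layers stand in for the real data and network.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="multi-gpu", ... % train on all available local GPUs
    MiniBatchSize=128, ...
    MaxEpochs=10);
net = trainNetwork(imdsTrain, layers, options);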
8 Comments
Joss Knight
12 Sep 2023
Edited: Joss Knight on 12 Sep 2023
Right, so the parfor is opening a pool with a lot of workers (presumably you have a large number of CPU cores), but unfortunately these are then not used for your preprocessing during training. You need to enable DispatchInBackground as well. Try that. You should have received a warning on the first run telling you that most of your workers were not going to be used for training.
It does look as though the general problem is that your data preprocessing is dominating the training time, meaning only a small proportion of each second is spent computing gradients, and that is what the utilization figure measures. If DispatchInBackground doesn't help, we can explore further how to vectorize your transform functions; you might also consider using augmentedImageDatastore, which provides most of what you need. Or you could preprocess the data on the GPU.
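A hedged sketch of both suggestions, assuming an image-classification setup; imdsTrain, layers, and the 224-by-224 input size are placeholder assumptions:

% Enable background dispatch so pool workers preprocess batches
% while the GPUs compute gradients.
options = trainingOptions("sgdm", ...
    ExecutionEnvironment="multi-gpu", ...
    DispatchInBackground=true, ...
    MiniBatchSize=128);

% Alternative: let augmentedImageDatastore handle resizing and augmentation.
aug = imageDataAugmenter(RandXReflection=true);
augimds = augmentedImageDatastore([224 224], imdsTrain, DataAugmentation=aug);
net = trainNetwork(augimds, layers, options);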
Accepted Answer
aditi bagora
25 Sep 2023
The error message indicates that there is an issue with distributing the data in parallel in the background. To fix it, the class "CustomImageDatastore" also needs to inherit from the mixin class "matlab.io.datastore.Subsettable", which adds the subsetting support required for parallel and multi-GPU environments.
For further details, refer to the documentation on developing custom datastores; a rough sketch of the change follows below.
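A minimal sketch of such a class, assuming the datastore simply reads images from a list of files; the properties, the read logic, and the exact set of abstract methods (subsetByReadIndices and maxpartitions here) are assumptions to verify against the Subsettable reference page for your release:

% Hypothetical stand-in for the asker's CustomImageDatastore.
classdef CustomImageDatastore < matlab.io.Datastore & ...
        matlab.io.datastore.Subsettable
    properties
        Files              % cell array of image file paths (assumption)
        CurrentIdx = 1     % index of the next file to read
    end
    methods
        function ds = CustomImageDatastore(files)
            ds.Files = files;
        end
        function tf = hasdata(ds)
            tf = ds.CurrentIdx <= numel(ds.Files);
        end
        function [data, info] = read(ds)
            info.Filename = ds.Files{ds.CurrentIdx};
            data = imread(info.Filename);
            ds.CurrentIdx = ds.CurrentIdx + 1;
        end
        function reset(ds)
            ds.CurrentIdx = 1;
        end
        function frac = progress(ds)
            frac = (ds.CurrentIdx - 1) / numel(ds.Files);
        end
    end
    methods (Access = protected)
        % Required for subsettability: return a datastore restricted to
        % the given read indices. This is what lets MATLAB split the data
        % across background workers and multiple GPUs.
        function subds = subsetByReadIndices(ds, indices)
            subds = CustomImageDatastore(ds.Files(indices));
        end
        function n = maxpartitions(ds)
            n = numel(ds.Files);
        end
    end
end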
Hope this helps you resolve the error.
More Answers (0)