Increase GPU Throughput During Training

Question

John, 20 Aug 2018


https://kr.mathworks.com/matlabcentral/answers/415570-increase-gpu-throughput-during-training

Commented: Joss Knight, 21 Aug 2018

Accepted Answer: John


I have a single Tesla GP100 GPU with 16 GB of RAM. When I train my neural network, I have two issues:

1. Using an imageDatastore spends a huge amount of time in fread (I'm using a custom ReadFcn because my data is asymmetric and that seemed easiest). I can work around this by reading all the data into memory before training, but that will not scale.
2. During training I am only using 2.2 GB of the 16 GB available on the GPU. When I use the exact same network and data with TensorFlow, I use all 16 GB. This is the case even if I preload all the data into memory as described above. I'm guessing that is because TensorFlow is "queuing up" batches and MATLAB is not. Is there a way to increase this?

Here is my minimal example code:

 function net = run_training_public(dims, nbatch, lr, nepoch)
    % Load Data
    ds = imageDatastore('./data/set3', 'IncludeSubfolders',true,...
                        'ReadFcn',@(x)reader_public(x,dims),...
                        'LabelSource','foldernames',...
                        'FileExtensions','.dat');
    % load neural network structure
    network = cnn1;
    % Setup options for training and execute training
    options = trainingOptions('adam','MaxEpochs',nepoch,'MiniBatchSize',...
                              nbatch,'Shuffle','every-epoch',...
                              'InitialLearnRate',lr,...
                              'ExecutionEnvironment','gpu','Verbose',true);
    net = trainNetwork(ds,network,options);
 end
 function data = reader_public(fileName, dims)
    f=fopen(fileName,'r');
    data  = fread(f,[dims(2) dims(1)],'*int16').';
    fclose(f);   
 end

2 Comments

Kevin Chng, 21 Aug 2018

Would enabling parallel computing help?

https://www.mathworks.com/help/nnet/ug/neural-networks-with-parallel-and-gpu-computing.html

John, 21 Aug 2018

Good thought. My understanding is that you must have an individual GPU for each CPU core. When I set 'ExecutionEnvironment' to 'parallel' instead of 'gpu', MATLAB says as much. Unless I am misunderstanding something.


Answer 1

John, 21 Aug 2018


https://kr.mathworks.com/matlabcentral/answers/415570-increase-gpu-throughput-during-training#answer_333588


I solved my problem with help from Joss. I had to create a custom image format via the imformats function:

https://www.mathworks.com/help/matlab/ref/imformats.html

I wrote my own binary file reader, based on the built-in PNG reader functions and my code above. It is a huge speedup, and I can now eliminate the ReadFcn while keeping my custom reader. The only issue I didn't solve is how to pass the dims variable into the reader instead of hard-coding it.

 function net = run_training_public(nbatch, lr, nepoch)
    % Add custom image type to imread registry
    create_custom_image_format()
    % Load Data
    ds = imageDatastore('./data/set3','IncludeSubfolders',true,...
                        'LabelSource','foldernames',...
                        'FileExtensions','.dat');
    % load neural network structure
    network = cnn1;
    % Setup options for training and execute training
    options = trainingOptions('adam','MaxEpochs',nepoch,'MiniBatchSize',...
                               nbatch,'Shuffle','every-epoch',...
                              'InitialLearnRate',lr,...
                              'ExecutionEnvironment','gpu','Verbose',true);
    net = trainNetwork(ds,network,options);
 end
 function create_custom_image_format()
    fmts = imformats; % don't add if already in registry
    if ~any(contains([fmts.ext],'dat'))
        out.ext = 'dat';
        out.isa = @isdat;
        out.info = [];
        out.read = @custom_image_reader;
        out.write = [];
        out.alpha = 0;
        out.description = 'Custom Data Format';
        imformats('add',out);
    end
 end
 function tf = isdat(filename)
    % Returns true if file is type .dat
    [~,~,extn] = fileparts(filename);
    tf = strcmp(extn,'.dat');
 end
 function [X, map] = custom_image_reader(filename)
    dims = [$m $n]; % <-HARD CODE DIMENSIONS OF DATA HERE
    f=fopen(filename,'r');
    X = reshape(fread(f,'*int16'),dims(2), dims(1)).';
    fclose(f);
    map = [];
 end
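
One possible way around the hard-coded dims, offered as a sketch and not as something tested in this thread: bind dims into the reader with an anonymous function and register that handle in the format struct. The helper name read_dat and the 3-by-5 size are made up for illustration, and whether imageDatastore keeps its fast path with an anonymous handle is an assumption that would need checking.

```matlab
% Sketch: bind dims into the reader through a closure instead of hard-coding.
% read_dat and the 3-by-5 size are illustrative, not from the original post.
dims = [3 5];                                  % [rows cols] of each record

% Write a small int16 test file, stored row by row as in the question.
expected = int16(reshape(1:prod(dims), dims(2), dims(1)).');
tmpFile = [tempname '.dat'];
fid = fopen(tmpFile, 'w');
fwrite(fid, expected.', 'int16');
fclose(fid);

% The closure captures dims; the caller only ever passes the file name.
readFcn = @(filename) read_dat(filename, dims);
[X, map] = readFcn(tmpFile);
delete(tmpFile);

% To register it, assign the handle in the format struct, for example:
%   out.read = readFcn;
%   imformats('add', out);      % or imformats('update', 'dat', out)

function [X, map] = read_dat(filename, dims)
    % Read a dims(1)-by-dims(2) int16 array that was stored row by row.
    fid = fopen(filename, 'r');
    assert(fid ~= -1, 'Could not open %s', filename);
    X = fread(fid, [dims(2) dims(1)], '*int16').';
    fclose(fid);
    map = [];   % no colormap for raw data
end
```

Because the original create_custom_image_format skips registration when 'dat' is already in the registry, a changed dims would only take effect after updating or removing the existing entry.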


Answer 2

Joss Knight, 21 Aug 2018


https://kr.mathworks.com/matlabcentral/answers/415570-increase-gpu-throughput-during-training#answer_333525

Edited: Joss Knight, 21 Aug 2018


The problem with using a ReadFcn is that it prevents MATLAB from being able to do I/O in batch and in the background, because it has to use the MATLAB compute thread. If you just need to resize your images you should use augmentedImageDatastore instead. You could see as much as 100x improvement in performance.
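
For the resize-only case, the suggestion above might look roughly like the following. This is a minimal sketch that assumes Deep Learning Toolbox is installed and uses its bundled digit images as stand-in data; the 32-by-32 target size is arbitrary.

```matlab
% Sketch: resize through augmentedImageDatastore instead of a custom ReadFcn.
% The digit images shipped with Deep Learning Toolbox stand in for real data.
digitPath = fullfile(matlabroot, 'toolbox', 'nnet', 'nndemos', ...
                     'nndatasets', 'DigitDataset');
imds = imageDatastore(digitPath, 'IncludeSubfolders', true, ...
                      'LabelSource', 'foldernames');

% Images are resized to 32-by-32 as each mini-batch is read, so no ReadFcn
% is needed and imageDatastore can keep reading files in its usual way.
augimds = augmentedImageDatastore([32 32], imds);

% The augmented datastore is then passed to training in place of imds:
%   net = trainNetwork(augimds, network, options);
```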

As for your second point, I wouldn't worry about it. TensorFlow basically preallocates all of your GPU memory up front whether you need it or not; MATLAB takes the approach that this is antisocial to other applications. It only reserves memory as you use it, up to a quarter of total memory. The rest it releases back to the system if it doesn't need it any more. If you increase your MiniBatchSize up to the point where you start to run out of memory, you should be using the GPU's memory with good efficiency.

Sometimes you can get better performance by allowing MATLAB to reserve more memory. You can try this using the following command:

feature('GpuAllocPoolSizeKb', intmax('int32'));
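
To judge how close a given MiniBatchSize gets to the memory limit, one option is to query the device before and after a training run. The snippet below is a small sketch that assumes Parallel Computing Toolbox and a supported GPU; the figures it prints may differ from what external tools report, since MATLAB manages its own memory pool.

```matlab
% Sketch: report GPU memory use as seen by MATLAB.
g = gpuDevice;                                  % currently selected GPU
usedGB  = (g.TotalMemory - g.AvailableMemory) / 2^30;
totalGB = g.TotalMemory / 2^30;
fprintf('%s: %.1f GB of %.1f GB in use\n', g.Name, usedGB, totalGB);
```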

2 Comments

John, 21 Aug 2018

Edited: John, 21 Aug 2018

Joss, very helpful. Thanks!

I'm not actually trying to reshape my images; in fact, they are not images at all. They are rectangular n x m arrays where n << m. I was assuming (as you pointed out) that MATLAB had a better way to handle my data if I could treat it as an image. So would there be a better way? When I use a plain imageDatastore without a custom ReadFcn, it throws the error "Unable to determine the file format." My files are just a binary sequence of 16-bit integers. Should I be using a different datastore? The fileDatastore seems to require a ReadFcn, but maybe the TallDatastore? I should note that my arrays are not so tall that they would use anywhere near all my memory. All arrays are < 50 kB.

I'm assuming that creating my own imformats would have the same issue of not allowing background processes?

Edit: I solved my problem with your help. I had to create a custom image format via the imformats function, using my own binary file reader based on the built-in PNG reader functions and my code above. It is a huge speedup, and I can now eliminate the ReadFcn while keeping my custom reader. I'm actually a little surprised that worked so well. I will post it to the main thread as the answer.

Joss Knight, 21 Aug 2018

Interesting. You were essentially using ReadFcn exactly as intended, to support an unsupported file format. But it still has the effect of preventing imageDatastore from performing mass prefetching in background threads because it needs to use the MATLAB interpreter to read each file. I would have suggested creating a custom MiniBatchable Datastore so you can at least load a whole batch of data at once (and this also has the option of using a parallel pool to load in the background, which should mean you can hide all the I/O). However, it looks like you found a solution just as good.
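
For reference, a custom mini-batchable datastore for this kind of data could look roughly like the sketch below. The class name DatBatchDatastore, its constructor arguments, and the assumption that every file holds one fixed-size int16 record are all hypothetical. It covers only the basic Datastore and MiniBatchable interface; shuffling every epoch and background loading would additionally need the Shuffleable and PartitionableByIndex mixins, which are left out here.

```matlab
% DatBatchDatastore.m  (hypothetical class name; a sketch, not a tested drop-in)
classdef DatBatchDatastore < matlab.io.Datastore & ...
                             matlab.io.datastore.MiniBatchable
    properties
        MiniBatchSize            % observations returned by each read
    end
    properties (SetAccess = protected)
        NumObservations          % total number of .dat files found
    end
    properties (Access = private)
        Files                    % column cell array of full file paths
        Labels                   % categorical labels taken from folder names
        Dims                     % [rows cols] of every record (assumed fixed)
        NextIndex                % index of the next unread file
    end

    methods
        function ds = DatBatchDatastore(folder, dims, miniBatchSize)
            listing = dir(fullfile(folder, '**', '*.dat'));   % recursive search
            ds.Files = fullfile({listing.folder}, {listing.name}).';
            [~, names] = cellfun(@fileparts, {listing.folder}, ...
                                 'UniformOutput', false);
            ds.Labels = categorical(names(:));
            ds.Dims = dims;
            ds.MiniBatchSize = miniBatchSize;
            ds.NumObservations = numel(ds.Files);
            reset(ds);
        end

        function tf = hasdata(ds)
            tf = ds.NextIndex <= ds.NumObservations;
        end

        function [data, info] = read(ds)
            % Load one whole mini-batch of files and return it as a table.
            last = min(ds.NextIndex + ds.MiniBatchSize - 1, ds.NumObservations);
            idx = (ds.NextIndex:last).';
            input = cell(numel(idx), 1);
            for k = 1:numel(idx)
                fid = fopen(ds.Files{idx(k)}, 'r');
                input{k} = fread(fid, [ds.Dims(2) ds.Dims(1)], '*int16').';
                fclose(fid);
            end
            response = ds.Labels(idx);
            data = table(input, response);
            info = struct('FileIndices', idx);
            ds.NextIndex = last + 1;
        end

        function reset(ds)
            ds.NextIndex = 1;
        end
    end

    methods (Hidden = true)
        function frac = progress(ds)
            frac = (ds.NextIndex - 1) / ds.NumObservations;
        end
    end
end
```

With the class file on the path, something like ds = DatBatchDatastore('./data/set3', dims, nbatch) could then be passed to trainNetwork in place of the imageDatastore, with 'Shuffle' set to a value that does not require a shuffleable datastore.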
