Reset GPU & Clear its Memory
I'm running simulations and computations in MATLAB using some reasonably big data sets, and the bulk of the work is done on the GPU. I can only get through about a third of the work I need to do before I receive an error saying the GPU memory is full:
Warning: An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_OUT_OF_MEMORY
I've had this problem for a while, and have tried to get around it by resetting the GPU between each simulation, using any and all of the following:
gpuDevice;
gpuDevice(1);
reset(gpuDevice(1));
wait(gpuDevice(1));
None of these work, either on their own or combined, and they don't help if I try them after my simulations have crashed out either. There seems to be no effective way to reset/flush the GPU other than rebooting my computer.
I'm getting work done this way, but it's slow, and annoying, and means I can't just leave my code running over the weekend as I'd like to - only half of it gets done. I'm sure there must be a way to reset the GPU in MATLAB, and if one of the methods I've tried is correct, what am I doing wrong?
Any ideas?
EDIT: Problem occurs on both R2016a and the R2017a Prerelease.
4 comments
Joss Knight
20 Jul 2017
I think you're going to have to create a minimal reproduction, a condensed version of your code; otherwise it's impossible to diagnose. Also see below for advice about monitoring your memory usage.
Answers (2)
Joss Knight
23 Jan 2017
Presumably your simulations are adding results continually to some output variables, which are getting larger and larger. Try gathering your results back to the CPU so that you're not clogging up GPU memory with data that isn't being used for computation any more.
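As a minimal sketch of that suggestion, each result can be gathered to host memory as soon as it is produced, so the growing output array never lives on the GPU. The names runSimulation, numRuns and resultLength are hypothetical stand-ins, not taken from the original code:

```matlab
% Sketch only: runSimulation, numRuns and resultLength are hypothetical
% stand-ins for the real simulation, not names from the original post.
numRuns = 5;
resultLength = 1000;
runSimulation = @(k) sum(rand(200, resultLength, 'gpuArray'), 1) * k;

% Preallocate the output in host (CPU) memory rather than on the GPU.
results = zeros(numRuns, resultLength);

for k = 1:numRuns
    gpuResult = runSimulation(k);        % computed on the GPU
    results(k, :) = gather(gpuResult);   % copy the result to host memory
    clear gpuResult                      % drop the last reference to the GPU copy
end
```

With this pattern only one run's worth of output occupies GPU memory at any time.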
3 comments
Joss Knight
20 Jul 2017
No, MATLAB releases variables as soon as they are no longer referenced. But it's common for users to run scripts rather than functions, and to aggregate results into a big output array that sits in their MATLAB workspace, e.g.
results(end+1,:) = myNewResults;
Why don't you run your simulation and monitor GPU memory in a separate terminal or command window using nvidia-smi, something like:
nvidia-smi -l 1 -q -d MEMORY
If memory usage is continually going up then you've got some sort of problem with your simulation not releasing variables.
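The same check can be sketched from inside MATLAB, assuming the AvailableMemory property of the gpuDevice object is a good enough proxy for what nvidia-smi reports. The work inside the loop is a hypothetical placeholder for one simulation step:

```matlab
% Sketch only: the work inside the loop is a placeholder for one simulation.
gpu = gpuDevice;                       % currently selected GPU
numRuns = 5;
freeBytes = zeros(numRuns, 1);

for k = 1:numRuns
    x = rand(1000, 1000, 'gpuArray');  % placeholder GPU work
    y = gather(sum(x(:)));             % keep only a small host-side result
    clear x
    wait(gpu);                         % let queued GPU work finish first
    freeBytes(k) = gpu.AvailableMemory;
    fprintf('run %d: %.1f MB free of %.1f MB\n', k, ...
        freeBytes(k) / 2^20, gpu.TotalMemory / 2^20);
end
```

If freeBytes falls steadily from run to run, something is probably still holding references to gpuArray data.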
Vitaly Bur
29 Oct 2020
I have the same problem with clearing GPU memory. After executing this code, 2 GB of GPU memory is in use, even though only the D matrix remains in GPU memory:
A=fix(gpuArray(rand(1,1000))*99)+1;
B=fix(gpuArray(rand(1,1000))*99)+1;
C=gpuArray(rand(100000,100));
E=C(:,A);
F=C(:,B);
D=E.*F;
clear E F C A B
However, if I execute this code instead:
D=gpuArray(rand(100000,1000));
there is also a D matrix of the same size in GPU memory, but now only 1 GB of GPU memory is in use. Why is there a difference, and how can I clear the memory in the first case?
Remi D
19 Jul 2017
I also think there is a problem. As soon as I call a CUDA MEX file, running reset(gpuDevice) throws an error:
Error using parallel.gpu.CUDADevice/reset
An unexpected error occurred during CUDA execution. The CUDA error was:
all CUDA-capable devices are busy or unavailable
If I don't call reset, I can call the MEX function again and it works fine. But as soon as I use reset, the only way to use the GPU again is to restart MATLAB.
I guess I have to go back to C and leave MATLAB in the drawer when I need parallel computing :(
1 comment
Joss Knight
20 Jul 2017
Edited: Joss Knight
20 Jul 2017
If you are using custom MEX functions then we'd have to know more about what they're doing to diagnose. Are you storing state, GPU memory, cuFFT plans? Are you spinning off threads that are using the GPU? You may need to register a listener to the GPUDeviceManager's DeviceDeselecting event (see the documentation here) in order to respond to a call to reset by tidying up your state or waiting for threads to finish.
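A minimal sketch of registering such a listener from MATLAB follows. It assumes the cleanup can be triggered from a MATLAB callback; cleanupMexState is a hypothetical placeholder for whatever releases the MEX function's GPU resources:

```matlab
% Sketch only: cleanupMexState is a hypothetical placeholder for whatever
% releases the MEX function's GPU resources (buffers, plans, threads).
cleanupMexState = @() disp('Releasing custom MEX GPU state before reset.');

mgr = parallel.gpu.GPUDeviceManager.instance;
listener = addlistener(mgr, 'DeviceDeselecting', ...
    @(src, evt) cleanupMexState());

% The callback should now run before the device is deselected, e.g. on
% reset(gpuDevice). Delete the listener when it is no longer needed:
% delete(listener);
```

In a real MEX function the placeholder would typically call back into the MEX file to free its buffers and plans.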
Another very common scenario is that your custom MEX function is erroring, perhaps seriously, and you are not checking or clearing up that error. If the next thing you do on the GPU is to call reset, then that will be the first place to detect and report the error. So ensure your MEX function ends with something like
// Wait for all queued GPU work to finish so asynchronous errors surface here.
cudaDeviceSynchronize();
// Fetch the last error recorded by the CUDA runtime.
auto err = cudaGetLastError();
if (err != cudaSuccess) {
    mexPrintf("CUDA error: %s\n", cudaGetErrorString(err));
}