UPDATE: I have since partially worked around the problem by using spmd instead of parfor, which lets me slice my variables manually. However, the GPU memory still isn't freed after the work is done.
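For reference, a minimal sketch of the spmd approach described above (variable names and the slicing scheme are illustrative; subFunction is my own function from the code below):

```matlab
% Sketch of the spmd workaround: each worker uploads only its own pages
% of the array, processes them on the shared GPU, then gathers the result
% back to CPU memory and releases its GPU allocation.
n = size(bigArray, 3);

spmd
    myPages = labindex:numlabs:n;              % manual slicing per worker
    mySlice = gpuArray(bigArray(:,:,myPages)); % upload only this worker's pages
    for k = 1:size(mySlice, 3)
        mySlice(:,:,k) = subFunction(mySlice(:,:,k));
    end
    myResult = gather(mySlice);                % bring result back to the CPU
    mySlice  = [];                             % try to release the worker's GPU memory
end

% Reassemble on the client from the Composite returned by spmd.
newBigArray = zeros(size(bigArray), 'like', bigArray);
nWorkers = numel(myResult);
for w = 1:nWorkers
    newBigArray(:,:, w:nWorkers:n) = myResult{w};
end
```

Even with the explicit `mySlice = [];`, the workers' GPU memory does not appear to be returned until I reset the device.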
Memory management when using GPU in parfor-loop?
Hi,
I have a machine with four CPU cores and one GPU. My code contains a parfor loop that performs computations on gpuArrays, i.e. four CPU workers share one GPU. I've already determined that this is faster than performing all the computations on the CPU, or using the GPU without the parfor loop.
My problem is that the GPU memory fills up inexplicably when using parfor. There's no problem when using regular "for".
Schematically, my code looks like this:
gpu = gpuDevice;
bigGpuArray = gpuArray(bigArray);  % size of bigArray is roughly 1000 x 1000 x 1000
n = size(bigArray, 3);
gpuMemBeforeLoop = gpu.AvailableMemory/gpu.TotalMemory;
for i = 1:n
    % subFunction is a function that is faster when run on the GPU.
    bigGpuArray(:,:,i) = subFunction(bigGpuArray(:,:,i));
    gpuMemDuringLoop = gpu.AvailableMemory/gpu.TotalMemory;
    disp(gpuMemDuringLoop);
end
newBigArray = gather(bigGpuArray);
gpuMemAfterLoop = gpu.AvailableMemory/gpu.TotalMemory;
When using a normal for loop, I can see from gpuMemBeforeLoop, gpuMemDuringLoop and gpuMemAfterLoop that the memory usage stays constant during the loop, as expected.
However, if I replace the "for" with "parfor", the memory usage increases linearly with the number of loop iterations and stays high until I call
pctRunOnAll reset(gpuDevice);
I'm surprised by this because I thought that the parallel workers could share the GPU memory smartly: I hoped that each worker would only have to receive a handle/pointer to the data that's already on the GPU. Instead, it looks as if each worker creates a separate copy of the GPU data. Even worse, the parallel workers seem to forget to delete these copies after the parfor loop (gpuMemAfterLoop in my example above shows higher memory use in the parfor than in the for case).
Is this behavior expected? Can I change my approach to avoid this memory leak?
Thanks, Matthias
Answers (2)
Matt J
27 October 2014
Edited: Matt J, 27 October 2014
Instead, it looks as if each worker creates a separate copy of the GPU data.
This part might not be so surprising. Parfor always makes duplicates of data needed by the workers, including data pointed to by handle objects. The only exceptions are sliced variables. I guess this is evidence that gpuArrays can't be sliced, but instead behave like handles.
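If that is the case, one possible workaround (a sketch, not tested against this exact case) is to keep the looped-over array on the CPU so that parfor can slice it, and move each slice to the GPU inside the loop body; subFunction is the asker's function:

```matlab
% Sketch: bigArray stays a CPU array, so parfor treats it as a sliced
% input and newBigArray as a sliced output. Each worker only ever holds
% its current slice on the GPU, and that allocation can be freed when
% the iteration's variables go out of scope.
n = size(bigArray, 3);
newBigArray = zeros(size(bigArray), 'like', bigArray);
parfor i = 1:n
    slice = gpuArray(bigArray(:,:,i));           % upload just this slice
    newBigArray(:,:,i) = gather(subFunction(slice));
end
```

The trade-off is one host-to-device transfer per iteration instead of one bulk transfer up front, so whether this is faster overall depends on how much work subFunction does per slice.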
Even worse, the parallel workers seem to forget to delete these copies after the parfor loop (gpuMemAfterLoop in my example above shows higher memory use in the parfor than in the for case).
That does seem like a bug.