gpuDevice command very slow

Question

0 votes

I am running CUDA kernels using the Parallel Computing Toolbox and R2012a. I recently upgraded to a 600-series (Kepler) GPU. To set up the CUDA kernel, we extract the maximum threads per block using:

gpu_han = gpuDevice(1);
k = parallel.gpu.CUDAKernel('gpu_tfm_linear_arb.ptx', 'gpu_tfm_linear_arb.cu');
k.ThreadBlockSize = gpu_han.MaxThreadsPerBlock;

This is now executing very slowly (on the order of 2 minutes). If I set the thread block size manually to the card's maximum (1024 in this case), it executes in 0.1 s.

This used to run quickly with a 400-series card. Any help gratefully received.


Answer 1

James Lebak on 17 Jun 2013

3 votes

MATLAB R2012a doesn't include code for the Kepler series GPUs. This means that the very first time you call any GPU command after upgrading to a Kepler card, be it gpuDevice or something else, MATLAB will wait for the CUDA driver to just-in-time compile all the PTX code that ships with MATLAB for the Kepler device. This behavior allows MATLAB to work with cards that weren't available when that version of MATLAB was released.

The good news is that this should be a one-time hit. The next time you start MATLAB the JIT'd code should be cached and you should not get the performance hit.

The other thing to point out is that you should consider recompiling your CUDA kernel and producing PTX for the new card, if you haven't already done so, or you may see a similar one-time hit the first time you launch your own kernel for the same reason.
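As a rough sketch of the recompilation step, the PTX can be regenerated with `nvcc -ptx`. The `ptx_command` helper below is hypothetical, and `sm_30` is an assumption (compute capability 3.0, which matches many 600-series Kepler cards); check your card's actual compute capability and that your CUDA toolkit version supports that target.

```shell
#!/bin/sh
# Hypothetical helper: print the nvcc command that builds PTX for one .cu file.
# $1 = source file, $2 = target architecture (assumed default: sm_30).
ptx_command() {
    src="$1"
    arch="${2:-sm_30}"
    printf 'nvcc -ptx -arch=%s %s -o %s.ptx\n' "$arch" "$src" "${src%.cu}"
}

cmd="$(ptx_command gpu_tfm_linear_arb.cu sm_30)"
echo "$cmd"

# Run the command only if nvcc is installed and the source file is present.
# Word splitting of $cmd is intentional; it assumes file names without spaces.
if command -v nvcc >/dev/null 2>&1 && [ -f gpu_tfm_linear_arb.cu ]; then
    $cmd
fi
```

The resulting .ptx file is the one passed as the first argument to parallel.gpu.CUDAKernel.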


Answer 2

Andrei Pokrovsky on 15 Sep 2016

Edited: Andrei Pokrovsky on 15 Sep 2016

3 votes

Try setting these environment variables:

export CUDA_CACHE_MAXSIZE=2147483647

export CUDA_CACHE_DISABLE=0

This cured the problem on my GTX 1080.

https://devblogs.nvidia.com/parallelforall/cuda-pro-tip-understand-fat-binaries-jit-caching/
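These variables only take effect for processes that inherit them, so MATLAB would need to be started from the shell where they are set. A minimal sketch, assuming a POSIX shell and that `matlab` is on the PATH; the `LAUNCH_MATLAB` switch is a hypothetical name used here only to keep the launch optional:

```shell
#!/bin/sh
# Keep the JIT cache enabled (0 = enabled, 1 = disabled).
export CUDA_CACHE_DISABLE=0
# Cache size limit in bytes; 2147483647 is 2 GiB minus one byte.
export CUDA_CACHE_MAXSIZE=2147483647

# Show what child processes will see.
env | grep '^CUDA_CACHE_' | sort

# Assumption: MATLAB is launched from this same shell so it inherits the settings.
# LAUNCH_MATLAB is a hypothetical opt-in switch, not a CUDA or MATLAB variable.
if [ "${LAUNCH_MATLAB:-0}" = "1" ] && command -v matlab >/dev/null 2>&1; then
    matlab
fi
```

To make the settings permanent, the two export lines could go in a shell startup file instead.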


Answer 3

Anthony on 17 Jun 2013

0 votes

Thanks, that is helpful. I don't have write access to the program data directory. Is there a way to alter where the cache of this data is stored?

Comments: 2

Edric Ellis on 18 Jun 2013

The cache is not stored where the program lives. This page from NVIDIA has all the gory details, including this:

on Windows, %APPDATA%\NVIDIA\ComputeCache,
on MacOS, $HOME/Library/Application\ Support/NVIDIA/ComputeCache,
on Linux, ~/.nv/ComputeCache
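If none of those default locations is writable, the same NVIDIA documentation describes a `CUDA_CACHE_PATH` environment variable that overrides where the cache is kept. A minimal sketch for Linux or macOS; the directory name `cuda_compute_cache` is a hypothetical choice, and any writable directory should do:

```shell
#!/bin/sh
# Hypothetical location for the JIT cache; pick any directory you can write to.
cache_dir="${HOME}/cuda_compute_cache"
mkdir -p "$cache_dir"

if [ -w "$cache_dir" ]; then
    # CUDA_CACHE_PATH tells the CUDA driver where to store JIT-compiled code.
    export CUDA_CACHE_PATH="$cache_dir"
    echo "CUDA JIT cache will be stored in: $CUDA_CACHE_PATH"
else
    echo "Cannot write to $cache_dir" >&2
fi
```

On Windows the same variable name would be set through the system environment variable settings rather than `export`, and MATLAB has to be restarted to pick it up.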

Anthony on 12 Jul 2013

Thanks a lot, don't know how I missed that. Problem fixed.
