Numerical instabilities in GPU results
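I run this code:
T=randn(10000,64);
data=randn(1000,64,10);
Tg=gpuArray(T);            % copy the inputs to the GPU
datag=gpuArray(data);
res=zeros(10000,1000);
resg=gpuArray(res);
for i=1:10                 % accumulate the products on the CPU
    res=res+T*data(:,:,i)';
end
for i=1:10                 % same accumulation on the GPU
    resg=resg+Tg*datag(:,:,i)';
end
resg=gather(resg);         % copy the GPU result back to host memory
norm(res-resg,'fro')/norm(res,'fro')   % relative difference in the Frobenius norm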
where I would expect "res" (computed on the CPU) and "resg" (computed on the GPU) to be the same, but they are not.
I am running this on a Tesla card:
gpuDevice
ans =
parallel.gpu.CUDADevice handle
Package: parallel.gpu
Properties:
Name: 'Tesla C1060'
Index: 1
ComputeCapability: '1.3'
SupportsDouble: 1
DriverVersion: 3.2000
MaxThreadsPerBlock: 512
MaxShmemPerBlock: 16384
MaxThreadBlockSize: [512 512 64]
MaxGridSize: [65535 65535]
SIMDWidth: 32
TotalMemory: 4.2948e+09
FreeMemory: 4.0671e+09
MultiprocessorCount: 30
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Methods, Events, Superclasses
Comments: 3
James Tursa
18 May 2011
I would presume that this is simply a difference in how the BLAS matrix multiply routines are coded on the GPU vs. the CPU (different blocking, etc.). What kind of differences are you seeing?
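As a minimal sketch of the effect described here (assuming MATLAB's own BLAS stands in for both libraries, with an arbitrary, hypothetical block size of 16), the same product can be computed once in a single call and once by summing partial products over blocks of the inner dimension. Both are valid ways to compute A*B, but the additions happen in a different order, so individual entries can differ in their last bits while the overall relative error stays near eps:

```matlab
% Sketch: the same matrix product computed with two different blockings.
% Assumption: bs = 16 is arbitrary; real CPU/GPU BLAS libraries pick their
% own blocking and kernels.
A = randn(1000, 64);
B = randn(64, 500);
C_direct = A * B;                        % one BLAS call
bs = 16;                                 % hypothetical block size along the inner dimension
C_blocked = zeros(1000, 500);
for k0 = 1:bs:64
    k = k0:min(k0 + bs - 1, 64);         % indices of the current inner block
    C_blocked = C_blocked + A(:, k) * B(k, :);   % partial sums added in a different order
end
relerr = norm(C_direct - C_blocked, 'fro') / norm(C_direct, 'fro');
nDiffer = nnz(C_direct ~= C_blocked);    % entries whose bits differ
fprintf('relative error %g, %d of %d entries differ\n', relerr, nDiffer, numel(C_direct));
```

This only illustrates the mechanism; it does not reproduce the exact accumulation order used on a C1060.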
Felix
18 May 2011
Gaszton
19 May 2011
I ran the code on my GT 425M:
ans =
2.4946e-016
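For scale, 2.4946e-16 is about one unit of double-precision roundoff (eps = 2^-52, roughly 2.22e-16). A rough way to decide whether a CPU/GPU relative error is plain roundoff is to compare it against a multiple of eps. The threshold n*eps below, with n = 640 because each entry of res is a sum of 10*64 products, is a rule-of-thumb assumption rather than a rigorous bound, and the value 1e-6 is purely hypothetical:

```matlab
% Sketch: classify a relative error as roundoff-level or not.
% Assumption: n*eps is a rule-of-thumb threshold, not a rigorous error bound.
n = 10 * 64;                               % each entry of res sums 10*64 products
isRoundoff = @(relerr) relerr <= n * eps;  % eps is 2^-52 for double precision
reported = 2.4946e-16;                     % value reported on the GT 425M above
fprintf('reported / eps = %.2f\n', reported / eps);
ok = isRoundoff(reported);                 % true: consistent with roundoff
suspicious = isRoundoff(1e-6);             % hypothetical value: false, far beyond roundoff
```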
Accepted Answer
More Answers (1)
Edric Ellis
19 May 2011
I've just run this using R2011a on Linux and Windows with C1060 cards, and in each case the final "norm" calculation gives a result of around 2e-16. So, this should work! Could you post the output of running
parallel.internal.gpu.CUDADriverVersion
and
ver distcomp
Comments: 4
Felix
19 May 2011
Edric Ellis
20 May 2011
Very strange. I've run this on a whole series of different x64 Linux machines here and have not seen the problem. That driver is slightly older than the ones we use here; perhaps you could try updating it. Also, do you know whether it is the matrix multiplication that is introducing the problem?
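One way to answer that question is to compare each individual product instead of only the accumulated sum. In this sketch, toDev and fromDev are hypothetical function handles: set them to @gpuArray and @gather on a machine with a supported GPU. With the identity handles shown, everything runs on the CPU, so the errors only reflect CPU repeatability and should be zero or at roundoff level:

```matlab
% Sketch: measure the CPU-vs-device error of each product separately.
% Assumption: toDev/fromDev are placeholders; use @gpuArray/@gather on a GPU.
toDev = @(x) x;      % replace with @gpuArray on a GPU machine
fromDev = @(x) x;    % replace with @gather on a GPU machine
T = randn(10000, 64);
data = randn(1000, 64, 10);
Tg = toDev(T);
stepErr = zeros(10, 1);                          % relative error of each product
for i = 1:10
    P  = T * data(:, :, i)';                     % CPU product
    Pg = fromDev(Tg * toDev(data(:, :, i))');    % device product, copied back
    stepErr(i) = norm(P - Pg, 'fro') / norm(P, 'fro');
end
disp(stepErr.')
```

If the per-product errors are already far above eps, the matrix multiplication is the likely culprit; if they sit near eps while the accumulated result does not, the addition step is worth checking.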
Felix
20 May 2011
Sean de Wolski
14 March 2012
Copying Felix's first post, with the license number censored:
Here it is:
parallel.internal.gpu.CUDADriverVersion
ans =
260.19.26
ver distcomp
-------------------------------------------------------------------------------------
MATLAB Version 7.12.0.635 (R2011a)
MATLAB License Number: ############
Operating System: Linux 2.6.30.10-105.2.23.fc11.x86_64 #1 SMP Thu Feb 11 07:06:34 UTC 2010 x86_64
Java VM Version: Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
-------------------------------------------------------------------------------------
Parallel Computing Toolbox Version 5.1 (R2011a)