Numerical instabilities in GPU results

I ran this code:
T = randn(10000, 64);
data = randn(1000, 64, 10);
Tg = gpuArray(T);                  % copy inputs to the GPU
datag = gpuArray(data);
res = zeros(10000, 1000);
resg = gpuArray(res);
for i = 1:10                       % accumulate on the CPU
    res = res + T * data(:,:,i)';
end
for i = 1:10                       % same accumulation on the GPU
    resg = resg + Tg * datag(:,:,i)';
end
resg = gather(resg);               % bring the GPU result back to the host
norm(res - resg, 'fro') / norm(res, 'fro')
where I would expect "res" (computed on the CPU) and "resg" (computed on the GPU) to be essentially the same, but they are not.
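[Editor's note] One way to narrow a discrepancy like this down is to check first whether the host-to-device transfer itself round-trips the data losslessly, which separates transfer problems from arithmetic problems. A minimal diagnostic sketch (not part of the original question, and requiring a CUDA-capable GPU):

```matlab
% Diagnostic sketch: verify that gpuArray/gather round-trips the input
% bit-for-bit before blaming the arithmetic.
T = randn(10000, 64);
Tback = gather(gpuArray(T));   % host -> device -> host
fprintf('transfer round-trip error: %g\n', max(abs(T(:) - Tback(:))));
% An error of exactly 0 means the transfer is clean, and any
% discrepancy comes from the GPU matrix multiply itself.
```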
I am running this on a Tesla card, i.e.
gpuDevice
ans =
parallel.gpu.CUDADevice handle
Package: parallel.gpu
Properties:
Name: 'Tesla C1060'
Index: 1
ComputeCapability: '1.3'
SupportsDouble: 1
DriverVersion: 3.2000
MaxThreadsPerBlock: 512
MaxShmemPerBlock: 16384
MaxThreadBlockSize: [512 512 64]
MaxGridSize: [65535 65535]
SIMDWidth: 32
TotalMemory: 4.2948e+09
FreeMemory: 4.0671e+09
MultiprocessorCount: 30
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

3 Comments

James Tursa on 18 May 2011
I would presume that this is simply the difference in how the BLAS matrix multiply routines are coded on the GPU vs CPU (different blocking, etc). What kind of differences are you seeing?
Felix on 18 May 2011
There are large numerical differences, i.e. norm(res-resg,'fro')/norm(res,'fro') returns something on the order of 1e234. These are clearly not subtle BLAS differences. I suspect something is going wrong when moving data between the CPU and the GPU?
Gaszton on 19 May 2011
I ran the code on my GT425M:
ans =
2.4946e-016


Accepted Answer

Felix on 20 May 2011

0 votes

I upgraded to the latest drivers (270.41.19), which seems to have fixed the problem.

1 Comment

James Tursa on 20 May 2011
FYI, it is bad form to accept your own answer when Edric was the one that suggested updating your drivers.


More Answers (1)

Edric Ellis on 19 May 2011

2 votes

I've just run this using R2011a on Linux and Windows using C1060 cards, and in each case the final "norm" calculation gives a result of around 2e-16. So, this should work! Could you post the output of running
parallel.internal.gpu.CUDADriverVersion
and
ver distcomp

4 Comments

Felix on 19 May 2011
I should add that I ran this code on 3 different devices, i.e. two C1060s and a GTX 285, all in the same computer, and I get the same discrepancy on all of them, so I suspect it is not a hardware problem.
Edric Ellis on 20 May 2011
Very strange; I've run this on a whole series of different x64 Linux machines here and not seen the problem. That driver is slightly older than the ones we use here, so perhaps you could try updating. Also, do you know whether it's the matrix multiplication that is introducing the problem?
Felix on 20 May 2011
What is your driver version?
When I run this:
T = randn(10000, 64);
A = randn(1000, 64);
Ag = gpuArray(A);
Tg = gpuArray(T);
res = gather(Tg * Ag');
norm(res - T*A', 'fro') / norm(T*A', 'fro')
I get ~1e-16 at first and ~0.05 on repeated runs, so there is a problem in the matrix multiplication.
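[Editor's note] Felix's observation can be checked systematically with a small loop: if the relative error varies between byte-identical runs, the GPU multiply is nondeterministically wrong rather than merely rounded differently from the CPU BLAS. A sketch, assuming the same single-GPU setup as above:

```matlab
% Sketch: repeat the identical GPU multiply and log the relative error
% against a fixed CPU reference. With a healthy driver every run should
% stay near eps (~1e-16); a large or drifting error reproduces the bug
% discussed in this thread.
T  = randn(10000, 64);
A  = randn(1000, 64);
Tg = gpuArray(T);
Ag = gpuArray(A);
ref = T * A';                       % CPU reference, computed once
for k = 1:5
    err = norm(gather(Tg * Ag') - ref, 'fro') / norm(ref, 'fro');
    fprintf('run %d: relative error = %g\n', k, err);
end
```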
Sean de Wolski on 14 March 2012
Copying Felix's first post with the license number censored:
Here it is:
parallel.internal.gpu.CUDADriverVersion
ans =
260.19.26
ver distcomp
-------------------------------------------------------------------------------------
MATLAB Version 7.12.0.635 (R2011a)
MATLAB License Number: ############
Operating System: Linux 2.6.30.10-105.2.23.fc11.x86_64 #1 SMP Thu Feb 11 07:06:34 UTC 2010 x86_64
Java VM Version: Java 1.6.0_17-b04 with Sun Microsystems Inc. Java HotSpot(TM) 64-Bit Server VM mixed mode
-------------------------------------------------------------------------------------
Parallel Computing Toolbox Version 5.1 (R2011a)



Asked on 18 May 2011
