Matrix algebra very slow on GPU

Bonnie

11 Nov 2013


I've been testing some of the MATLAB matrix routines on a Tesla K20 GPU. So far I've found that chol, lu, \, svd, and eig all run significantly slower on the GPU than on the CPU, even without including the time to transfer the data to the GPU. Is this a common experience? If not, what might I be doing wrong?

Comments: 7 (5 earlier comments not shown)

Bonnie, 11 Nov 2013

In single precision the \ function is faster on the GPU than on the CPU, but not by much. Also, in single precision the \ function takes twice as long on the GPU as the same calculation in double precision. That makes no sense to me.

Bonnie, 11 Nov 2013

In single precision, the SVD takes twice as long on the GPU as it does on the CPU. It's the same in double precision.


Answers (2)

Sean de Wolski, 11 Nov 2013

Edited: Sean de Wolski, 11 Nov 2013

How are you doing the timing?

If upgrading is an option, in R2013b we released gputimeit, which gives better measurements of GPU timing, along with a whole year's worth of other improvements:

http://www.mathworks.com/help/releases/R2013b/distcomp/gputimeit.html

And, as Jill asked: what exactly are you running?
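
As an illustration of the timing point above, here is a minimal sketch of how a GPU-versus-CPU comparison might be set up. It is not the questioner's code: the size n and the choice of a square system are assumptions, and it needs Parallel Computing Toolbox with a supported GPU. GPU calls can return before the device has finished, so a manual tic/toc measurement should call wait on the device first; gputimeit (R2013b and later) takes care of that.

```matlab
% Sketch only: n is an arbitrary size chosen for illustration.
n  = 2000;
A  = rand(n);      b  = rand(n, 1);      % CPU data (double precision)
Ag = gpuArray(A);  bg = gpuArray(b);     % transfer once, outside the timed region

% Manual timing: wait for the device to finish before stopping the clock.
dev = gpuDevice;
tic;
xg = Ag \ bg;
wait(dev);
tManual = toc;

% R2013b and later: timeit and gputimeit repeat the call and synchronize for you.
tCpu = timeit(@() A \ b);
tGpu = gputimeit(@() Ag \ bg);

fprintf('CPU %.4f s | GPU (tic/toc + wait) %.4f s | GPU (gputimeit) %.4f s\n', ...
        tCpu, tManual, tGpu);
```

A single tic/toc run can also include one-time start-up cost on the first GPU call in a session, which is one reason the repeated measurement from gputimeit tends to be more trustworthy.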

Comments: 18 (16 earlier comments not shown)

Bonnie, 13 Nov 2013

Edited: Bonnie, 13 Nov 2013

There are several reports in the literature of GPU SVD algorithms that show more than a 7-fold speedup over CPUs on commodity GPUs. It is very disappointing that the MATLAB algorithm performs so poorly on the Tesla K20.

Matt J, 13 Nov 2013

Edited: Matt J, 13 Nov 2013

You should cite that literature. Maybe the developers will look at it...

In case it's worth mentioning, I seem to be seeing the same speeds on the GTX 580. I suppose it's suspicious that the Tesla K20 doesn't offer more speed than a 500-series GPU.


Joss Knight, 27 Apr 2016

It might be worth answering this question for posterity.

It seems the questioner was testing at least the linear solves with a very unusual system: many right-hand sides but only one column in the system matrix. Since this is not a typical circumstance, MLDIVIDE is not optimised for it. To get an accurate answer it has to account for possible poor conditioning by using a QR factorisation, which is less parallelisable than other approaches to solving these equations, one of which is given in the comments to Sean's answer. Another is to solve the normal equations:

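% Solve A*X = B for X (least-squares sense) via the normal equations
R = chol(A'*A);        % A'*A = R'*R, with R upper triangular
X = R\(R'\(A'*B));     % two triangular solves: R'*Y = A'*B, then R*X = Y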
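
To make the comparison concrete, the sketch below checks the normal-equations solve against the QR-based backslash on a small tall, skinny problem. The sizes and random data are made up for illustration and are not the questioner's system. It runs on the CPU as written; wrapping A and B in gpuArray should exercise the same code on the device.

```matlab
% Illustrative sizes only (hypothetical, not from the original question).
rng(0);                      % reproducible random data
m = 5000; k = 4; nrhs = 200;
A = randn(m, k);             % tall, skinny, well-conditioned system matrix
B = randn(m, nrhs);          % many right-hand sides

Xqr = A \ B;                 % QR-based least-squares solve (the MLDIVIDE path)

R   = chol(A' * A);          % A'*A = R'*R, with R upper triangular
Xne = R \ (R' \ (A' * B));   % normal-equations solve

relDiff = norm(Xne - Xqr, 'fro') / norm(Xqr, 'fro');
fprintf('relative difference: %.2e\n', relDiff);
```

The two results should agree closely here because this A is well conditioned. Forming A'*A squares the condition number, so for ill-conditioned systems the normal equations lose accuracy, which is the trade-off MLDIVIDE avoids by using QR.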

For SVD and EIG it is possible that the same situation applies; perhaps the questioner was carrying out the SVD on a tall, skinny matrix. However, it is true that these functions do not parallelise well. I found that a 2000x2000 random matrix could be factored faster on my K20 than on the CPU, but the performance tails off for larger matrices, presumably due to resource contention on the device. It also makes a difference whether you ask for all three factors or just the singular values (or, in the case of EIG, whether you ask for eigenvectors or just the eigenvalues).
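
One way to see the effect of the number of requested outputs is to pass it to gputimeit as the second argument. This is a rough sketch rather than a benchmark: the size n matches the 2000x2000 case mentioned above, the symmetric test matrix for EIG is an assumption, and it needs Parallel Computing Toolbox with a supported GPU.

```matlab
% Sketch only: random test data, not the questioner's matrices.
n  = 2000;
Ag = gpuArray(rand(n));      % general square matrix on the device
Sg = Ag + Ag';               % symmetric matrix for the EIG comparison

tSvdValues = gputimeit(@() svd(Ag));      % singular values only (1 output)
tSvdFull   = gputimeit(@() svd(Ag), 3);   % U, S and V (3 outputs)

tEigValues = gputimeit(@() eig(Sg));      % eigenvalues only (1 output)
tEigFull   = gputimeit(@() eig(Sg), 2);   % eigenvectors and eigenvalues (2 outputs)

fprintf('svd: values %.3f s, all factors %.3f s\n', tSvdValues, tSvdFull);
fprintf('eig: values %.3f s, with vectors %.3f s\n', tEigValues, tEigFull);
```

Repeating the run for several values of n would show where, if anywhere, the GPU timing starts to tail off on a given device.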

For LU on a general matrix and CHOL on a symmetric positive definite matrix I found my K20 was much faster than the CPU, so it would be necessary to see exactly what the questioner was doing when timing these functions.
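
For anyone wanting to reproduce this kind of check, a minimal sketch follows. The size n and the way the positive definite matrix is built are assumptions for illustration, not the setup used for the timings reported above, and the code needs Parallel Computing Toolbox with a supported GPU.

```matlab
% Sketch only: n is arbitrary; S is symmetric positive definite by construction.
n  = 4000;
A  = rand(n);                % general matrix for LU
S  = A * A' + n * eye(n);    % symmetric positive definite matrix for CHOL
Ag = gpuArray(A);
Sg = gpuArray(S);

tLuCpu   = timeit(@() lu(A), 2);        % [L, U] on the CPU
tLuGpu   = gputimeit(@() lu(Ag), 2);    % [L, U] on the GPU
tCholCpu = timeit(@() chol(S));
tCholGpu = gputimeit(@() chol(Sg));

fprintf('lu:   CPU %.3f s, GPU %.3f s\n', tLuCpu, tLuGpu);
fprintf('chol: CPU %.3f s, GPU %.3f s\n', tCholCpu, tCholGpu);
```

The data is moved to the device before timing starts, so the figures exclude transfer cost, matching how the original question framed the comparison.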