Performance Loss of Matrix-Vector Multiplication on GPU with Array Indexing

Hi,
I have a large matrix A and a vector B. I want to multiply only a block of rows of A by B on the GPU using array indexing, but the performance is much lower than doing the full A*B. Below is a simple example of what I am trying to do:
A = rand(20000,'gpuArray');
B = rand(20000,1,'gpuArray');
C = A(8001:18000,1:end)*B;
GPU Device: Tesla V100
MATLAB R2020a
Any suggestion on how to improve the performance? Thank you.

Accepted Answer

Edric Ellis on 30 Apr 2020
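Unfortunately, the expression A(8001:18000,:) requires a strided memory copy. Matrices in MATLAB (even on the GPU) are stored in column-major format: consecutive elements of a row are separated by a stride equal to the number of rows, whereas each column is one contiguous block. Picking out only certain rows is therefore much less efficient than picking out only certain columns.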
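To make the layout concrete, here is a small CPU-only sketch. The 3-by-4 matrix M and the names rowIdx and colIdx are purely illustrative and not from the original post. Because M is built with reshape(1:12,3,4), each element's value equals its column-major linear index, so the memory locations touched by a row or column selection can be read off directly:

```matlab
% Build a matrix whose values equal their column-major linear indices.
M = reshape(1:12, 3, 4);      % M(i,j) is stored at linear index i + (j-1)*3

% Selecting one row visits memory with a stride of size(M,1) = 3.
rowIdx = sub2ind(size(M), [2 2 2 2], 1:4)   % 2 5 8 11
assert(isequal(M(2,:), rowIdx));

% Selecting one column visits consecutive memory locations.
colIdx = sub2ind(size(M), 1:3, [2 2 2])     % 4 5 6
assert(isequal(M(:,2).', colIdx));
```

For the 20000-by-20000 A in the question, the same reasoning means consecutive elements of a selected row are 20000 elements (160,000 bytes for double) apart.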
There's a trick you can use, though, that takes advantage of the fact that gpuArray matrix multiplication is optimised for the transposed-times case. Try pre-transposing A instead (this is relatively expensive, but perhaps you only need to do it once) and then index columns of the transposed copy:
At = A.';
C = At(:, 8001:18000).' * B;
This uses the much faster column-indexing pattern and is about 2x faster on my GPU.
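Before switching, it may be worth confirming that both forms compute the same product. The sketch below assumes a smaller, hypothetical size (n = 2000) and row block so that it also runs on the CPU; the names n, rows, C1, C2 and relErr are illustrative. On a GPU you would create A and B with the 'gpuArray' option, as in the question, and time each form with gputimeit.

```matlab
n = 2000;                     % hypothetical size; the question uses 20000
rows = 801:1800;              % hypothetical block of rows
A = rand(n);                  % on a GPU: rand(n, 'gpuArray')
B = rand(n, 1);               % on a GPU: rand(n, 1, 'gpuArray')

At = A.';                     % pre-transpose once and reuse it

C1 = A(rows, :) * B;          % row indexing: strided gather
C2 = At(:, rows).' * B;       % column indexing on At, then transposed-times

% The two results should agree up to floating-point rounding.
relErr = norm(C1 - C2) / norm(C1)

% GPU timing (requires Parallel Computing Toolbox and a supported GPU):
% t1 = gputimeit(@() A(rows, :) * B);
% t2 = gputimeit(@() At(:, rows).' * B);
```

The pre-transpose is only worthwhile if At is reused for many products, since computing A.' itself costs a full copy of the matrix.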
  Comments: 5
Edric Ellis on 4 May 2020
Strange. I just tried this on a WIN64 machine here with a V100 and got the following result:
t1 = 1.6677e-04
t2 = 4.4944e-04
(This was using R2020a.)
Afshin Ahmadi on 4 May 2020
I tried again, and it seems your solution is quite fast when the block size is small, which is exactly what I need. Thank you so much for the help! I'll include some timings here for anyone interested in doing the same thing.
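A = gpuArray.rand(20000);
B = gpuArray.rand(20000,1);
At = A.';   % pre-transpose once, outside the timed code
% Transposed trick: contiguous column block of At, then transposed-times
t1 = gputimeit(@() At(:,500:2000).'*B)
t2 = gputimeit(@() At(:,500:5000).'*B)
t3 = gputimeit(@() At(:,500:10000).'*B)
% Direct row indexing (strided copy)
t4 = gputimeit(@() A(500:2000,:)*B)
t5 = gputimeit(@() A(500:5000,:)*B)
t6 = gputimeit(@() A(500:10000,:)*B)
% Full product, for reference
t7 = gputimeit(@() A*B)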
Execution time (seconds, from gputimeit):
Rows        At(:,rows).'*B     A(rows,:)*B
500:2000    t1 = 4.4423e-04    t4 = 0.0035
500:5000    t2 = 0.0010        t5 = 0.0051
500:10000   t3 = 0.0020        t6 = 0.0076
Full A*B:   t7 = 0.0044
(MATLAB R2020a, Tesla V100, Linux)
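A quick back-of-the-envelope check of these timings shows how the advantage of the transposed trick shrinks as the block grows. The variable names below are introduced only for illustration, and because MATLAB's short display already rounded the reported times, the ratios are approximate:

```matlab
% Times reported above, in seconds (as displayed, i.e. already rounded).
tTrick = [4.4423e-04 0.0010 0.0020];   % t1, t2, t3: At(:,rows).'*B
tRows  = [0.0035     0.0051 0.0076];   % t4, t5, t6: A(rows,:)*B
tFull  = 0.0044;                       % t7: full A*B

speedup = tRows ./ tTrick              % approx. 7.88  5.10  3.80
vsFull  = tFull ./ tTrick              % approx. 9.90  4.40  2.20
```

The same numbers also show that, for the largest block (9501 rows), direct row indexing (t6) is slower than computing the full product (t7).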
