How to maximize MATLAB's GPU utility?
I've surveyed my GPU's performance against itself and the CPU for varying matrix sizes, and found the opposite of what most GPU literature suggests: the GPU's computing advantage diminishes with array size. Code, results, & specs shown below. Noteworthy observations:
(1) GPU utility remains sub-10%, according to Task Manager
(2) RAM usage is ~50% and CPU usage ~20% for large (K > 9000) arrays
(3) A considerable speed ratio drop is observed around K > 8000
(4) For K > 8000 (K = 9000), splitting the Xga matrix into four increases vectorized speed two-fold
(5) My GPU ranks far higher among GPUs than my CPU (#24 vs. #174); it thus seems an on-par CPU would outperform the GPU for larger arrays
(6) Last pic's GPU vs. CPU benchmark supports (5); GPU isn't as vastly superior as expected
What's the culprit: is my code, MATLAB, or the hardware configuration under-utilizing the GPU? How can I find out and resolve it?
m-files: testrun.zip (testrun compares performance for a single K; testrun0 for multiple)
%% CODE: centroid indexing in K-means algorithm
% size(X) = [16000, 3]
% size(c) = [K, 3]
% Xsg = single(X); csg = single(c);
% Xga = gpuArray(Xsg); cga = gpuArray(csg);
% Speed ratio = t2/t1, if t2 > t1 - else, t1/t2
%% TIMING
f1 = fasterFunction(...); % e.g. vectorized(Xga, cga, K, m)
f2 = slowerFunction(...); % e.g. forVectorized(X, c, m)
t1 = gputimeit(f1) % OR timeit(f1) for non-GPU arrays
t2 = timeit(f2) % OR gputimeit(f2) for GPU arrays
%% FUNCTIONS
function out = vectorized(X, c, K, m)
    [~, out] = min(reshape(permute(sum((X-permute(c,[3 2 1])).^2,2), ...
        [1 2 3]),m,K),[],2);
end
function out = forVectorized(X, c, m)
    out = zeros(m,1);
    for j=1:m
        [~,out(j)] = min(sum(((X(j,:))'-c').^2));
    end
end
function out = forFor(X,c,K,m)
    out = zeros(m,1); idxtemp = zeros(K,1);
    for i=1:m
        for j=1:K
            idxtemp(j) = sum((X(i,:)-c(j,:)).^2,2);
        end
        [~, out(i)] = min(idxtemp);
    end
end
%% PLOTS
% GPU vectorized = vectorized(Xga, cga, K, m) for varying K, timed w/ gputimeit
% CPU vectorized = vectorized(Xsg, csg, K, m) for varying K, timed w/ timeit
% for-loop = forFor(Xsg, csg, K, m) for varying K, timed w/ timeit
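
Observation (4) can be made concrete with a little arithmetic. The expression X - permute(c,[3 2 1]) builds an m-by-3-by-K intermediate, which for m = 16000, K = 9000 in single precision is 16000 * 3 * 9000 * 4 bytes, about 1.73 GB. Splitting X into four row blocks cuts that to about 0.43 GB per block. Whether this is what causes the slowdown near K > 8000 is only a guess; the post does not confirm it. The sketch below uses random stand-in data (the original X and c are not available), and the names nBlocks, edges and cp are made up for illustration.

```matlab
% Hypothetical stand-in data; the post's X and c are not available.
% Reduce K to try this quickly on a machine with little memory.
m = 16000; K = 9000; d = 3;
X = rand(m, d, 'single');
c = rand(K, d, 'single');

% Size of the m x d x K single-precision intermediate that
% X - permute(c,[3 2 1]) creates when all rows are processed at once.
bytesFull = m * d * K * 4;                      % 4 bytes per single
fprintf('Full intermediate:      %.2f GB\n', bytesFull / 1e9);

nBlocks = 4;                                    % "splitting into four"
fprintf('Per-block intermediate: %.2f GB\n', bytesFull / nBlocks / 1e9);

edges = round(linspace(0, m, nBlocks + 1));     % row boundaries of the blocks
cp = permute(c, [3 2 1]);                       % 1 x d x K, computed once
idx = zeros(m, 1);
for b = 1:nBlocks
    rows = edges(b)+1 : edges(b+1);
    D = sum((X(rows, :) - cp).^2, 2);           % numel(rows) x 1 x K
    [~, idx(rows)] = min(reshape(D, numel(rows), K), [], 2);
end
```

To try the same thing on the GPU, X and c would be wrapped in gpuArray before the loop; the rest of the code should not need to change.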

Comments: 5
It is hard to follow your descriptions. "GPU utility remains sub-10%", "My GPU ranks far higher among GPUs than my CPU (#24 vs. #174)", "Last pic's GPU vs. CPU benchmark supports (5)" - this might be clear to you, but it requires a lot of educated guessing from the readers. "f1 = fasterFunction(...)"? Please post running code. It is not clear which code creates which diagram. Most of all, I do not understand the actual question: "maximize MATLAB's GPU utility?"
What do you do? Which problem do you want to solve? What is your question?
idxtemp(j) = sum((X(i,:)-c(j,:)).^2,2);
The row-wise processing wastes time compared to column-wise processing on the CPU. Transpose the inputs to avoid this.
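
A minimal sketch of what the transposed version of that double loop could look like. MATLAB stores arrays column by column, so after transposing, each point and each centroid is a contiguous column. The data here is random stand-in data, smaller than in the question so the loop finishes quickly, and Xt, ct and xi are names made up for this example. How much time it saves would have to be measured.

```matlab
% Hypothetical stand-in data, not the data from the question.
m = 2000; K = 50;
X = rand(m, 3);
c = rand(K, 3);

Xt = X.';       % 3 x m: each point is now a contiguous column
ct = c.';       % 3 x K: each centroid is now a contiguous column

out = zeros(m, 1);
idxtemp = zeros(K, 1);
for i = 1:m
    xi = Xt(:, i);                              % read the point once per outer iteration
    for j = 1:K
        idxtemp(j) = sum((xi - ct(:, j)).^2);   % squared distance to centroid j
    end
    [~, out(i)] = min(idxtemp);                 % index of the nearest centroid
end
```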
John Muradeli
20 Mar 2019
Jan
20 Mar 2019
@John: Thanks for clarifying the question a little bit.
"those able to respond should understand" - yes, of course, we agree here: they should. I've mentioned that, at least for me, "maximize MATLAB's GPU utility" is too vague to be answered. Why not decrease the number of readers who do not understand the question?
You've spent some time producing the nice diagrams. If you post the complete code instead of letting the readers guess it based on some rough comments, the members of the forum can run it on their machines and maybe confirm your observations.
"the GPU's computing advantage diminishes with array size" - doesn't the last diagram "Single precision matrix-matrix multiply" show the opposite?
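
For illustration, a self-contained timing script of the kind asked for here might look roughly like the sketch below. It is not the poster's testrun code: the data is random stand-in data, K = 1000 is an arbitrary choice, and nearest is a made-up name for the same expression as vectorized() in the question. It also assumes R2019b or later for canUseGPU; on older releases, gpuDeviceCount > 0 from Parallel Computing Toolbox could be used instead.

```matlab
% Hypothetical stand-in data; sizes follow the comments in the question.
m = 16000; K = 1000;
Xsg = rand(m, 3, 'single');
csg = rand(K, 3, 'single');

% Same expression as vectorized() in the question, written as an anonymous
% function so that the script needs no other files.
nearest = @(X, c) min(reshape(sum((X - permute(c, [3 2 1])).^2, 2), ...
                              size(X, 1), size(c, 1)), [], 2);

% timeit and gputimeit expect a zero-argument function handle. The second
% argument requests two outputs, so the index output of min is computed too.
tCPU = timeit(@() nearest(Xsg, csg), 2);
fprintf('CPU: %.4f s\n', tCPU);

if canUseGPU()                                   % base MATLAB, R2019b or later
    Xga = gpuArray(Xsg);                         % needs Parallel Computing Toolbox
    cga = gpuArray(csg);
    tGPU = gputimeit(@() nearest(Xga, cga), 2);
    fprintf('GPU: %.4f s, CPU/GPU ratio: %.2f\n', tGPU, tCPU / tGPU);
end
```

Wrapping the call in a loop over K and storing tCPU and tGPU per iteration would give the data for a speed-ratio plot that others can reproduce.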
John Muradeli
20 Mar 2019
John Muradeli
20 Mar 2019
Edited: John Muradeli
20 Mar 2019
Answers (0)