- For-loop time = 0.010799
- Vectorized CPU time = 0.012376
- Vectorized GPU time = 0.069613
Make double loop run on GPU
조회 수: 3 (최근 30일)
이전 댓글 표시
Hi
I have the following code that I want to run somehow on the GPU.
In short, I have a 2D array (an MRI image). Starting from a user-defined centre-point, the the image is divided into radial concentric "rings" of specified width (f.eks. 0.1mm) and total number of pixels and total signal intensity (pixel value) in each ring are recorded in separate output arrays (called Profile and SumBinSignal). The code below does this using 2 for loops.
for X= Xmin:1:Xmax
for Y= Ymin:1:Ymax
if ~isnan(Image(Y,X)) %NaN pixels are just ignored
Dist = Resolution * ( (X-CentX)^2 + (Y-CentY)^2 )^0.5; %distance of point to centre measured in mm
TargetBin= ceil(Dist * BinDensity); %convert distance from mm to "bins/rings"
if (TargetBin>=1) && (TargetBin<=MaxTargetBin)
Profile(TargetBin)= Profile(TargetBin)+1; %record number of pixels in bin/ring
SumBinSignal(TargetBin)= SumBinSignal(TargetBin) + Image(Y, X); % record total signal in ring/bin
end
end
end
end
Now I know that in order to run on the GPU, I either must vectorize the calculations or use "arrayfun". But I really cant manage to do either. I maganged to vectorize the calculations only partly with no great speed advantage. The arrayfun approach I dont know even how to start with.
Help will be highly appreciated
best regards, Rozh
댓글 수: 0
채택된 답변
Bruno Luong
2023년 9월 6일
편집: Bruno Luong
2023년 9월 6일
GPU is the slowest on my PC. I knew for-loop is now away fast, but I'm still amazed by how fast it is. There is really a remarkable progress made by TMW on Excution Engine last few releases.
Code:
N = 1000;
Image = peaks(N);
Xmin = 1; Xmax = N;
Ymin = 1; Ymax = N;
CentX = 0;
CentY = 0;
Resolution = 1/N;
BinDensity = 100;
MaxTargetBin = 100;
tic
Profile = zeros(MaxTargetBin,1);
SumBinSignal = zeros(MaxTargetBin,1);
for X=Xmin:1:Xmax
for Y=Ymin:1:Ymax
if ~isnan(Image(Y,X)) %NaN pixels are just ignored
Dist = Resolution * ( (X-CentX)^2 + (Y-CentY)^2 )^0.5; %distance of point to centre measured in mm
TargetBin= ceil(Dist * BinDensity); %convert distance from mm to "bins/rings"
if (TargetBin>=1) && (TargetBin<=MaxTargetBin)
Profile(TargetBin)= Profile(TargetBin)+1; %record number of pixels in bin/ring
SumBinSignal(TargetBin)= SumBinSignal(TargetBin) + Image(Y, X); % record total signal in ring/bin
end
end
end
end
fprintf('For-loop time = %1.6f\n', toc)
% Vectorized code, CPU
tic
X = Xmin:Xmax;
Y = (Ymin:Ymax)';
Dist = Resolution * sqrt( (X-CentX).^2 + (Y-CentY).^2 );
TargetBin= ceil(Dist * BinDensity);
Im = Image;
keep = (TargetBin>=1) & (TargetBin<=MaxTargetBin) & ~isnan(Im);
TargetBin = TargetBin(keep);
Profile2 = accumarray(TargetBin, 1, [MaxTargetBin,1]);
if ~isequal([length(Y) length(X)], size(Im))
Im = Image(Y,X); % indexing is costly
end
SumBinSignal2 = accumarray(TargetBin, Im(keep), [MaxTargetBin,1]);
fprintf('Vectorized CPU time = %1.6f\n', toc)
if gpuDeviceCount("available")
% Vectorized code, GPU
tic
X = gpuArray(Xmin:Xmax);
Y = gpuArray((Ymin:Ymax))';
Dist = Resolution * sqrt( (X-CentX).^2 + (Y-CentY).^2 );
TargetBin= ceil(Dist * BinDensity);
Im = gpuArray(Image);
keep = (TargetBin>=1) & (TargetBin<=MaxTargetBin) & ~isnan(Im);
TargetBin = TargetBin(keep);
Profile3 = gather(accumarray(TargetBin, 1, [MaxTargetBin,1]));
if ~isequal([length(Y) length(X)], size(Im))
Im = Im(Y,X); % indexing is costly
end
SumBinSignal3 = gather(accumarray(TargetBin, Im(keep), [MaxTargetBin,1]));
fprintf('Vectorized GPU time = %1.6f\n', toc)
end
% Compare results
%norm(Profile-Profile2)
norm(SumBinSignal-SumBinSignal2)
if gpuDeviceCount("available")
%norm(Profile-Profile3)
norm(SumBinSignal-SumBinSignal3)
end
댓글 수: 2
Sam Marshalik
2023년 9월 6일
To add to Bruno's thorough response: In order to maximize your GPU, you will want to:
- Run only computationally intensive things on the GPU
- Run computations that do not require communication with other components on the motherboard (RAM, storage, etc.)
It seems that in this case the problem you are working on is not computationally intensive enough to really make use of the GPU. I imagine if you were to bump up the problem size you would eventually see the benefit of using a GPU.
추가 답변 (0개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Get Started with GPU Coder에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!