Converting parallel CPU processing into GPU processing

Douglas Miller on 11 Mar 2022
Commented: Walter Roberson on 12 Mar 2022
I am trying to convert code that ran in parallel on CPU cores so that it runs in parallel on the GPU.
I would like to process matrices in a cell array on the GPU in parallel for how many cores are present on the GPU. However, it runs significantly slower than on a 4-core parallel CPU pool: 25 cells were processed in 30 minutes on 4 CPU cores, while 5 cells have so far taken over 45 minutes on the GPU and are still not finished. I'm very new to GPU computing, and nothing obvious stood out as a way to speed this up.
GPU properties:
Data to be processed:
  • series is a 568x1 cell array
  • each cell is a 60x60 double (each entry is a value between -1 and 1)
Start processing
tic % start timer (test run on the first 5 cells only)
for i = 1:5
    cell_array{i} = gpuArray(cleanSeries{i});
end
Determine size of matrix within the first cell, equivalent to number of biological cells recorded
numCells = gpuArray(length(cell_array{1}));
Preallocate arrays for data
clust_mean = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_std = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_random_mean = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
clust_random_std = gpuArray(NaN(length(cell_array{1}),length(cell_array)));
Initiate the processing
parfor cellNumber = 1:length(cell_array)
    threshold_clust = gpuArray(NaN(numCells,100));
    random_clust = gpuArray(NaN(numCells,100));
    % process data over varying proportional thresholds, from the 25%
    % strongest connections to fully connected (100%) in 25% steps,
    % i.e. 25%, 50%, 75%, 100%
    for threshold = 25:25:100
        threshold_matrix = threshold_proportional(cell_array{cellNumber}, threshold/100); % proportional threshold matrix - custom function
        % clustering requires that all values be between 0 and 1 so remove
        % any negatives
        threshold_matrix(threshold_matrix < 0) = 0;
        % ensure that randomizing the matrix is possible
        [rowi,coli] = find(tril(threshold_matrix));
        bothi = [rowi coli];
        c = bothi(1,1);
        d = bothi(1,2);
        e = find(c==bothi);
        f = find(d==bothi);
        if length(e)==length(bothi) || length(f)==length(bothi)
            disp(['One cell has all the connections, skipping ', int2str(threshold), '% threshold.'])
            threshold_clust(:,threshold) = NaN(numCells,1);
            random_clust(:,threshold) = NaN(numCells,1);
        elseif length(bothi) <= 3
            threshold_clust(:,threshold) = NaN(numCells,1);
            random_clust(:,threshold) = NaN(numCells,1);
        else
            % create random matrix - custom function
            random_matrix = latmio_und(threshold_matrix,1000);
            % clustering coefficient per matrix - custom function
            threshold_clust(:,threshold) = clustering_coef_wu(threshold_matrix);
            random_clust(:,threshold) = clustering_coef_wu(random_matrix);
        end % if logic end
    end % for loop end
    % aggregate over thresholds
    clust_mean(:,cellNumber) = mean(threshold_clust,2,'omitnan');
    clust_std(:,cellNumber) = std(threshold_clust,0,2,'omitnan');
    clust_random_mean(:,cellNumber) = mean(random_clust,2,'omitnan');
    clust_random_std(:,cellNumber) = std(random_clust,0,2,'omitnan');
end % parfor loop end
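% gather returns the host copy, so assign the result back
clust_mean = gather(clust_mean);
clust_std = gather(clust_std);
clust_random_std = gather(clust_random_std);
clust_random_mean = gather(clust_random_mean);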
toc
6 Comments
Douglas Miller on 12 Mar 2022
According to the other post, it sounds like running in parallel isn't feasible on the GPU the way I was hoping. But I had never considered that zeros would process quicker. That will definitely help optimize the code. Thank you so much!
Walter Roberson on 12 Mar 2022
For operations other than pure copying, NaN has to go through a special "abort" path in all calculations; calculations involving it cannot stream the normal way. There also has to be a special check for whether the NaN is a signalling NaN, because signalling NaNs are required to raise an exception whenever they occur.
inf cannot readily stream either... but I guess a bit more readily than NaN.
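If you want to try the zeros-instead-of-NaN idea from these comments, one option is to preallocate with zeros and keep a logical mask of which threshold columns were actually filled. The sketch below is only an illustration under stated assumptions: `numNodes`, `thresholds`, `valid`, and the `rand` stand-in values are invented here and are not part of the original code, and it runs on the CPU as written. It also uses one column per threshold (4 columns) rather than writing into columns 25, 50, 75, and 100 of a 100-column array.

```matlab
% Minimal sketch: zeros plus a validity mask instead of NaN placeholders.
% All names and the rand() stand-in data are hypothetical.
rng(0);                                  % reproducible stand-in data
numNodes   = 60;                         % rows per matrix, as in the question
thresholds = 25:25:100;                  % proportional thresholds in percent
numThr     = numel(thresholds);

threshold_clust = zeros(numNodes, numThr);   % one column per threshold
valid           = false(1, numThr);          % true once a column holds real data

for k = 1:numThr
    if thresholds(k) == 50
        continue                         % pretend this threshold was skipped
    end
    threshold_clust(:, k) = rand(numNodes, 1);   % stand-in for the clustering_coef_wu output
    valid(k) = true;
end

% Statistics over the filled columns only, so 'omitnan' is not needed.
clust_mean = mean(threshold_clust(:, valid), 2);
clust_std  = std(threshold_clust(:, valid), 0, 2);
```

To allocate directly on the device rather than copying from the host, `zeros(numNodes, numThr, 'gpuArray')` should do the same job. Whether this is measurably faster than the NaN version is something to time on your own data.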


Answers (1)

Matt J on 12 Mar 2022
Edited: Matt J on 12 Mar 2022
"I would like to process matrices in a cell array on the GPU in parallel for how many cores are present on the GPU."
No, GPU cores cannot act like parpool workers. They are a completely different animal.
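To illustrate the difference: a GPU tends to pay off when a single array operation covers a lot of data at once, not when many small independent jobs are handed out the way parpool workers receive them. Here is a minimal sketch, assuming the 60x60 matrices can be stacked into one 3-D array and that the per-matrix step can be expressed as array operations. The `rand` data, `W`, and `strength` are stand-ins invented for this example, and `canUseGPU` (R2019b or later) keeps it runnable on a machine without a GPU.

```matlab
% Minimal sketch: one batched operation over a stack of matrices.
% The data and variable names are hypothetical stand-ins.
rng(0);
numNodes = 60;
numMats  = 5;
cleanSeries = cell(numMats, 1);
for i = 1:numMats
    cleanSeries{i} = 2*rand(numNodes) - 1;   % values between -1 and 1
end

W = cat(3, cleanSeries{:});              % numNodes x numNodes x numMats
if canUseGPU
    W = gpuArray(W);                     % single host-to-device transfer
end

W = max(W, 0);                           % remove negatives in every matrix at once
strength = squeeze(sum(W, 2));           % row sums, one column per matrix

strength = gather(strength);             % no-op if the data never left the host
```

Loop-heavy custom functions such as `latmio_und` may not map onto this pattern, in which case keeping that step on CPU workers is probably the safer choice.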

Release: R2021b
