
How to efficiently allocate memory using a parfor loop

tiwwexx on 28 Jun 2022
Commented: tiwwexx on 30 Jun 2022
Hello all, I have a quick optimization question.
I'm doing calculations on some very large point cloud data. The calculation I'm doing is
for n=1:size(E_mat,1)
Q_matrix(n,:,:) = sigmaE(n)/2/mass_density(n)*squeeze(E_mat(n,:,:))'*squeeze(E_mat(n,:,:));
end
where size(E_mat) is approximately [70000000, 3, 24]. This code should be highly parallelizable, but when I use parfor I run out of memory. I have access to a compute server with 40 cores and 512 GB of RAM. The current for loop uses about 300 GB of RAM but only 1.2% CPU. I'm fairly new to high-performance computing, but given the low CPU usage I'm fairly sure the for loop is running single-threaded. Is there a simple way to fix this?
Thanks so much for the help!!
  4 Comments
Walter Roberson on 28 Jun 2022
squeeze itself is fast. It is extracting the data that is slow. MATLAB stores arrays in column-major order, so the memory layout is
(1,1,1) (2,1,1) (3,1,1) (4,1,1) ... (70000000,1,1), (1,2,1) (2,2,1) ... (70000000,2,1) and so on. The data for (n,:,:) is scattered all over memory. If you make 70000000 the final dimension, then each 3x24 slice is stored in consecutive memory.
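To make the layout argument concrete, here is a minimal sketch that uses sub2ind to list where the elements of one slice sit in memory. The small size N = 5 and the variable names are illustrative assumptions, not taken from the question.

```matlab
% Small stand-in size; the real array in the question is about 70000000 x 3 x 24.
N = 5;
[jj, kk] = ndgrid(1:3, 1:24);
nn = 2 * ones(size(jj));                          % look at the slice n = 2

% Linear (memory) positions of slice (n,:,:) when n is the first dimension.
idxFirst = sub2ind([N, 3, 24], nn, jj, kk);
strideFirst = unique(diff(sort(idxFirst(:))));    % gap between neighbours

% Positions of the same slice when n is the last dimension: (:,:,n).
idxLast = sub2ind([3, 24, N], jj, kk, nn);
strideLast = unique(diff(sort(idxLast(:))));      % gap between neighbours

fprintf('stride with n first: %d, with n last: %d\n', strideFirst, strideLast);
```

With n first, neighbouring elements of a slice are N positions apart, so at N = 70000000 every element of a 3x24 slice lives in a different region of memory. With n last, the 72 elements are adjacent.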
tiwwexx on 29 Jun 2022
Thanks for the explanation!


Accepted Answer

Jan on 29 Jun 2022
Edited: Jan on 29 Jun 2022
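ET = permute(E_mat, [2,3,1]);   % 3 x 24 x N, so each slice is contiguous in memory
N = size(ET, 3);                % loop over the points, not over size(E_mat, 3) = 24
Q = zeros(size(ET, 2), size(ET, 2), N);   % each ET(:,:,n)' * ET(:,:,n) is 24 x 24
parfor n = 1:N
    Q(:,:,n) = sigmaE(n) / 2 / mass_density(n) * ET(:, :, n)' * ET(:, :, n);
    % Or maybe this is faster:
    % tmp = ET(:, :, n);
    % Q(:,:,n) = sigmaE(n) / 2 / mass_density(n) * tmp' * tmp;
end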
I'm curious: What do you observe?
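For context on the memory figures in the question, here is a rough back-of-the-envelope estimate. It assumes every array is stored in double precision (8 bytes per element), which the question does not state.

```matlab
N = 70000000;            % number of points, from the question
bytesPerDouble = 8;      % assumption: all arrays are double precision

bytesE = N * 3 * 24 * bytesPerDouble;     % input E_mat (or its permuted copy ET)
bytesQ = N * 24 * 24 * bytesPerDouble;    % output Q, one 24 x 24 page per point

fprintf('E_mat: %.1f GB, Q: %.1f GB\n', bytesE / 1e9, bytesQ / 1e9);
```

Under that assumption the output alone is about 323 GB, which is in line with the roughly 300 GB reported for the serial loop. A second array of that size would not fit in 512 GB, so it pays to avoid full-size temporaries of Q.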
Do you really mean ctranspose ('), or is ET real? For a plain transpose, .' is the operator to use.
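A small illustration of the difference between the two operators, using a made-up complex matrix A:

```matlab
A = [1+2i, 3-1i; 0, 4i];      % illustrative complex matrix

Ac = A';     % ctranspose: swaps rows and columns and conjugates
At = A.';    % transpose: only swaps rows and columns

% For complex data the two differ by a conjugation.
conjDiffers = isequal(Ac, conj(At)) && ~isequal(Ac, At);

% For real data both operators give the same result.
B = real(A);
sameForReal = isequal(B', B.');
```

So if E_mat is real, either operator yields the same Q. Writing .' just makes the intent explicit.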
What about using pagemtimes?
ET = permute(E_mat, [2,3,1]);
Q = pagemtimes(ET, 'transpose', ET, 'none');
% The factor sigmaE(n) / 2 / mass_density(n) still has to be applied to each page.
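One possible way to fold the per-point factor into the pagemtimes call is sketched below and checked against the loop from the question on small random data. The sizes, the random inputs, and the helper names s, Q_ref and maxErr are assumptions for illustration. The sketch also assumes real double data and a release that provides pagemtimes (R2020b or later).

```matlab
% Small random stand-ins for the arrays in the question (assumed real, double).
N = 50;
E_mat = rand(N, 3, 24);
sigmaE = rand(N, 1);
mass_density = rand(N, 1) + 1;

ET = permute(E_mat, [2, 3, 1]);                       % 3 x 24 x N

% Per-page factor, shaped 1 x 1 x N so that it expands over each page.
s = reshape(sigmaE ./ (2 * mass_density), 1, 1, []);

% Scaling the small 3 x 24 x N operand avoids a second 24 x 24 x N temporary.
Q = pagemtimes(s .* ET, 'transpose', ET, 'none');     % 24 x 24 x N

% Reference: the plain loop from the question, with n as the first dimension.
Q_ref = zeros(N, 24, 24);
for n = 1:N
    En = squeeze(E_mat(n, :, :));                     % 3 x 24
    Q_ref(n, :, :) = sigmaE(n) / 2 / mass_density(n) * (En' * En);
end

maxErr = max(abs(Q - permute(Q_ref, [2, 3, 1])), [], 'all');
```

The result Q has the point index last, so any code that consumed Q_matrix(n,:,:) would need to index Q(:,:,n) instead.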
  5 Comments
Jan on 30 Jun 2022
By the way: A = E_mat_pt(:,:,1:end) is less efficient than A = E_mat_pt.
tiwwexx on 30 Jun 2022
That was a by-product of my GPU running out of memory; I had to split the array into a few parts to fit it on the GPU.
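A minimal sketch of that kind of splitting, written for the CPU so it runs without a GPU. The chunk size, the data, and the variable names are hypothetical, and the per-point scale factor is omitted for brevity.

```matlab
% Small stand-in data; chunkSize is a hypothetical tuning value.
N = 103;
ET = rand(3, 24, N);          % already permuted so that each page is contiguous
chunkSize = 25;

Q = zeros(24, 24, N);
for first = 1:chunkSize:N
    last = min(first + chunkSize - 1, N);
    block = ET(:, :, first:last);             % a run of consecutive pages
    % On a GPU one would move block over with gpuArray(...) here and bring the
    % result back with gather(...); that step is left out of this sketch.
    Q(:, :, first:last) = pagemtimes(block, 'transpose', block, 'none');
end

% Check against processing everything in one call.
Q_all = pagemtimes(ET, 'transpose', ET, 'none');
chunksMatch = max(abs(Q - Q_all), [], 'all') < 1e-12;
```

Because the point index is the last dimension, each chunk is one contiguous block of memory, which keeps the copy for each part cheap.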


Release

R2021b
