Distributed arrays unevenly distributed

Maria on 13 Oct 2021
Commented: Oli Tissot on 12 Nov 2021
Hi,
I have a remote cluster with 8 nodes, each with 16 GB of memory.
I am running an example with a large 3D matrix of size around 10000 x 4500 x 8, which I have now tried to launch as a batch job. The matrix is created directly inside the function as a distributed array:
H_sym = zeros(m,m,LENGTH_BETA,'distributed')+1j*zeros(m,m,LENGTH_BETA,'distributed');
However, when I look at the status of each node (in Linux, with htop), I see that all cores of all nodes are working, and that every node except the first has a constant 4 GB of memory in use. On the first node, memory use fluctuates between 8 GB and 13 GB.
Why does only the first node use more memory, and why does its usage change over time? Shouldn't 'distributed' spread the matrix evenly across all nodes?
Best
Maria
1 Comment
Oli Tissot on 12 Nov 2021
Hi Maria,
When distributed arrays are constructed, they are distributed as evenly as possible along the second dimension. In your case, that means 4500 is split into 8 parts, so some workers get 10000x562x8 local parts whereas others get 10000x563x8 local parts. The workers therefore do not all use exactly the same amount of memory, but I believe that does not explain the discrepancy you're seeing. I suspect the computation you do afterwards on H_sym involves communication between workers, and workers receiving messages use more memory. That could explain what you are seeing. What computation are you doing on H_sym after creating it?
Finally, the way you're building H_sym is correct, but there is a more efficient way:
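To put numbers on the split described above, here is a rough sketch of the arithmetic. It assumes one worker per node (8 workers), the 10000 x 4500 x 8 size quoted in the question, complex double elements at 16 bytes each, and that the larger parts go to the first workers. The last point is an assumption about the ordering only; the part sizes themselves follow from the division.

```matlab
% Assumed problem size and worker count (one worker per node is an assumption).
rows = 10000; cols = 4500; pages = 8;
numWorkers = 8;
bytesPerElement = 16;   % complex double: 8 bytes real + 8 bytes imaginary

% Split the columns as evenly as possible across the workers.
base  = floor(cols / numWorkers);       % columns every worker gets
extra = mod(cols, numWorkers);          % workers that get one extra column
parts = base * ones(1, numWorkers);
parts(1:extra) = parts(1:extra) + 1;    % assumed: extras go to the first workers

% Memory held by each local part, in bytes.
bytesPerPart = rows * parts * pages * bytesPerElement;

fprintf('Columns per worker: %s\n', mat2str(parts));
fprintf('Local part size: %.2f to %.2f GB\n', ...
    min(bytesPerPart) / 1e9, max(bytesPerPart) / 1e9);
fprintf('Whole array: %.2f GB\n', sum(bytesPerPart) / 1e9);
```

Under these assumptions each local part is roughly 0.72 GB, and the difference between a 562-column and a 563-column part is only about 1.3 MB, so the uneven split alone cannot account for a gap of several GB on one node.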
H_sym = zeros(m,m,LENGTH_BETA,'like',distributed(1i));
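One way to check how the array is actually partitioned is to inspect the local parts from inside an spmd block. The following is a minimal sketch, not a definitive diagnostic: it assumes a parallel pool is available, uses hypothetical small values for m and LENGTH_BETA so it runs quickly, and uses labindex as in R2021a (newer releases offer spmdIndex instead).

```matlab
% Hypothetical small sizes for illustration; not the original problem size.
m = 64;
LENGTH_BETA = 8;

% Construction suggested above: complex zeros created directly as a distributed array.
H_sym = zeros(m, m, LENGTH_BETA, 'like', distributed(1i));

spmd
    % Inside spmd, each worker sees its own piece of the array.
    codist    = getCodistributor(H_sym);     % describes the distribution scheme
    localSize = size(getLocalPart(H_sym));   % size of this worker's piece
    fprintf('Worker %d: distribution dimension %d, local part %s\n', ...
        labindex, codist.Dimension, mat2str(localSize));
end
```

Running the same inspection on the real H_sym, before and after the computation that follows its creation, would show whether the stored data is balanced and whether the extra memory on the first node comes from something other than the array itself.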
Cheers,
Oli


Answers (0)

Release

R2021a
