# How can I perform large sparse matrix multiplication efficiently when one sparse matrix is block diagonal?

조회 수: 17(최근 30일)
Wenyu Zhang 2021년 10월 27일
댓글: Wenyu Zhang 2021년 10월 29일
Background:
I have a 3d matrix V of size [J,I,K], where J and I are large, say J=1e5 and I=1e4. I would like to get two matrices:
V_ik = sum(V,1);
V_jk = sum(V,2);
The 3d matrix V is actually very sparse (around 90% sparsity), but the full [J,I,K] matrix takes up too much memory usage.
To deal with this problem, I code V into a 2d sparse matrix V_sp = [V_1;V_2;...;V_I], where each block matrix V_i is a matrix of size [J,K]. The problem arises when I calculate V_ik and V_jk. From the view of matrix multiplication, I write the following code:
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
However, the sparse matrix kron(speye(I),ones(1,J)) is of size [I,J*I], and repmat(speye(J),1,I) is of size [J,J*I]. Either of them takes up 22.4G storage for J=1e5 and I=1e4.
My Question:
Is there an efficient way to perform this kind of matrix multiplication without too much memory usage? I tried using a for-loop to sum along the desired axis, the code with for-loop worked at the cost of around 5x more running time.
If there's no such efficient solution, can I code the 3d sparse matrix V in some other smart format such that I can reach a balance between the memory usage and running time? (I found ndSparse class that could store n-dimensional sparse arrays, but the class will store it internally as an ordinary 2D sparse array, and the sum function applied to it is not that fast.)
##### 댓글 수: 7표시숨기기 이전 댓글 수: 6
Wenyu Zhang 2021년 10월 28일
Thank you for comparison @Bruno Luong!
Although the blockdiag idea seems to work well, if what I have at hand is the non-zero values and the subindex [i,j,k], I don't need to go far to create a sparse diagonal matrix first, so the accumarray is faster. Perhaps the storage of my data is what I should reconsider as the highest priority.
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% James's method using V_blkdiag
tic
V_blkdiag=sparse(i+(k-1)*I,j+(k-1)*J,V,I*K,J*K);
V_ik = reshape(sum(V_blkdiag,2),[I K]);
V_jk = reshape(sum(V_blkdiag,1),[J K]);
toc
Elapsed time is 11.238796 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.329572 seconds.

댓글을 달려면 로그인하십시오.

### 채택된 답변

Bruno Luong 2021년 10월 27일
편집: Bruno Luong 2021년 10월 28일
You might rethink of the storage of your data, such as this (tic/toc result obtained from run on TMW online server):
I=1e4;
J=1e4;
K=10;
n=I*J*K;
density=0.1;
m=round(n*density); % number of non-zero elements
% Storage [i,j,k,V] as sparse 3D array, (i,j,k) are subindex, V are corresponding values
i=randi(I,[m,1],'uint32');
j=randi(J,[m,1],'uint32');
k=randi(K,[m,1],'uint32');
V=rand([m,1]);
% Sparse storage
V_sp=sparse(j+(i-1)*J,k,V,I*J,K);
% Your method using V_sp
tic
V_ik = kron(speye(I),ones(1,J)) * V_sp;
V_jk = repmat(speye(J),1,I) * V_sp;
toc
Elapsed time is 7.273694 seconds.
% Method using [i,j,k,V]
tic
V_ik=accumarray([i k],V,[I K]);
V_jk=accumarray([j k],V,[J K]);
toc
Elapsed time is 2.346202 seconds.
##### 댓글 수: 1표시숨기기 없음
Wenyu Zhang 2021년 10월 29일
Thank you so much! After comparing the running time of several methods, this works the best.

댓글을 달려면 로그인하십시오.

### 추가 답변(3개)

Matt J 2021년 10월 27일
편집: Matt J 2021년 10월 27일
With ndSparse, did you try the summl() method? It's more memory intensive, but it can also be faster. Here's what I see in a simple test comparing ndSparse.sum() and ndSparse.summl().
>> A=ndSparse.sprand([1e4,1e4,10],0.1);
>> tic; sum(A,1);toc
Elapsed time is 10.697812 seconds.
>> tic; summl(A,1);toc
Elapsed time is 0.927524 seconds.
##### 댓글 수: 2표시숨기기 이전 댓글 수: 1
Wenyu Zhang 2021년 10월 29일
I have tried this and found that summl is significantly faster than sum. However, it takes some time to build the ndSparse class. Anyway, thanks for your help!

댓글을 달려면 로그인하십시오.

Bjorn Gustavsson 2021년 10월 27일
Shouldn't straight summation be preferable for the calculations of sums rather than doing it by way of matrix multiplication. Perhaps something like this might work:
v_ik = zeros(I,K);
v_jk = zeros(J,K);
for i1 = 1:I,
v_ik(i1,:) = sum(v_sp((1+J*(i1-1)):(J*i1),:));
end
for j1 = 1:J,
v_jk(j1,:) = sum(v_sp(j1:J:end,:));
end
HTH
##### 댓글 수: 1표시숨기기 없음
Wenyu Zhang 2021년 10월 27일
The code with for-loop worked indeed, but at the cost of around 5x more running time.
I think a possible reason is that extracting submatrices from a sparse matrix takes time.

댓글을 달려면 로그인하십시오.

Matt J 2021년 10월 28일
편집: Matt J 2021년 10월 28일
The typical value of K is 50.
Since the third dimension is so small, it's probably worthwhile just storing the data as a length-K cell array of JxI matrices. You can then just do the summations with,
cellfun(@(v) sum(v,1), V,'uni',0);
cellfun(@(v) sum(v,2), V,'uni',0);
##### 댓글 수: 1표시숨기기 없음
Wenyu Zhang 2021년 10월 29일
Actually my V consists of J*I length-K vectors, each generated from a multinomial distribution using mnrnd. It may bring overhead to reshape the data into a length-K cell array of J*I. Thanks for your suggestion!

댓글을 달려면 로그인하십시오.

R2019a

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!