After exploring ndSparse, I'm still not certain that approach would offer any improvement over option 3 with B=accumarray(I, V.*beta(K).*X(J), N, N), as is suffers from the same problem of repeated multiplication before summation.
Without resorting to FEX/MEX, can the overhead of repeated calls to sparse() be avoided with operations on I, J, V directly?
조회 수: 1 (최근 30일)
이전 댓글 표시
I have a time-marching problem bottlenecked by matrix multiplication , where the sparse matrix is the sum of many time-independent sparse matrices with time-dependent scalar coefficients:
I'm aware of FEX solutions that would enable a 3D to be defined once prior to time-marching, though I would prefer to rule out native MATLAB first. My current thinking is to supplement the I, J, V vectors with an index vector K, and do one of the following:
- Generate a cell array of sparse matrices along index K, with multiplied with bsxfun, similar to this. (Probably the slowest option)
- Sparse matrix construction with B=sparse(I, J, V.*beta(K), N, N)*X
- Avoid sparse matrix entirely with B=accumarray(I, V.*beta(K).*X(J), N, N)
I haven't tested option 1, but option 3 with accumarray runs about 50x slower than option 2 (with or without issparse option). This makes sense, as the multiplication with X in accumarray is done before accumulation, hence on a much larger array.
However, option 2 still appears to be wasting performance. For example, sparse(I, J, V, N, N)*X runs about 50x slower than basic A*X. My initial interpretation of this is that much of the time in sparse is spent on unavoidable summation of common terms in I,J,V. I cant profile sparse internally, but tested this by accumulating V prior to sparse(I, J, V, N, N)*X, which comes out to ~7x slower than A*X.
My conclusion is that there is still some significant & (potentially) avoidable overhead from calling sparse many times unrelated to summation of duplicate indices, and the larger piece of summation that is unavoidable due to the summation of time-dependent terms co-located in .
To boil this down, I have two questions I cannot answer:
- Considering the pattern of summation of duplicate indices is always the same (static I/J), can the summation overhead in sparse be improved?
- What is the source of the remaining overhead in sparse and can it be avoided?
댓글 수: 20
Bruno Luong
2022년 8월 7일
@James Tursa yes you get it right. However
- In the range of density/size and N_k (m in your notation) your method is still 50% slower than the accumarray solution we come up with,
- The runtime scale with N_k and is not bounded by the assumption of limited "bandwidth" (density of A0) stated by @Joel Lynch. It is probably an important requirement for his application.
채택된 답변
Bruno Luong
2022년 8월 7일
편집: Bruno Luong
2022년 8월 7일
EDIT: Here is the final code. Method used in B2 is the fatest;
M = 10000;
N = 10000;
N_k = 100;
A_k = cell(1,N_k);
for k=1:N_k
A_k{k} = sprand(M,N,0.0005);
end
N_k = length(A_k);
[M,N] = size(A_k{1});
beta = rand(N_k,1);
X = rand(N,1);
I = cell(1,N_k);
J = cell(1,N_k);
V = cell(1,N_k);
K = cell(1,N_k);
for k=1:N_k
[I{k},J{k},V{k}] = find(A_k{k});
K{k} = k + zeros(size(I{k}));
end
I = cat(1,I{:});
J = cat(1,J{:});
K = cat(1,K{:});
V = cat(1,V{:});
[IJ,~,ic] = unique([I,J],'rows');
I2 = IJ(:,1);
J2 = IJ(:,2);
Y = sparse(ic, K, V, size(IJ,1), N_k);
tic
B1=sparse(I, J, V.*beta(K), M, N)*X ;
toc
tic
B3 = sparse(I2,J2,Y*beta,M,N)*X;
toc
tic
B2 = accumarray(I2,(Y*beta).*X(J2),[M,1]);
toc
댓글 수: 5
Bruno Luong
2022년 8월 7일
편집: Bruno Luong
2022년 8월 7일
It actualy entirely your method and credit. I simply put the code so as readers can test and understand better the problem and solutions.
추가 답변 (0개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Creating and Concatenating Matrices에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!