Matrix product and inversion speed up

조회 수: 22 (최근 30일)
Angelo Cuzzola
Angelo Cuzzola 2020년 6월 18일
댓글: Angelo Cuzzola 2020년 6월 19일
Hi,
my question is simple. I have a function where the most computational intesinve line is this simple matrix operation
iF = iR - iRZ*(iP - Z' *iRZ )\iRZ';
Dimesions of the involved objects vary at each call, but tipically are of the orders of magnitude listed here:
iR -> diagonal 40kx40k (saved as sparse)
iRZ -> 40k x 100 (saved as sparse)
Z -> 40k x 100
iP -> 100x100
Is there something one can do to speed up the code?
Thanks
  댓글 수: 8
Christine Tobler
Christine Tobler 2020년 6월 19일
What are you doing with the matrix iF in the next step? Since this is a dense 1e4x1e4 matrix, it will be comparatively expensive to compute, and just avoiding to store this explicitly altogether would probably be the fastest thing.
For example, if your next step is to do matrix-vector multiplication with iF, you could replace
y = iF*x
with
y = iR*y - iRZ*((iP - Z' *iRZ )\(iRZ'*y));
For some other possible operations, you will need to store the complete iF as a dense matrix - it depends on that next step.
Angelo Cuzzola
Angelo Cuzzola 2020년 6월 19일
Well, this seems to quite improve the situation. Thanks
I have always been taught the opposite, namely to do expensive computations once and store the results. I couldn't imagine that the time consuming part was the storing part.
tic
iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
Au = A + PZ*(iF*V);
Pu = P - PZ*(iF*PZ');
toc
Elapsed time is 2.076231 seconds.
tic
PZRZ = PZ*iRZ;
Au = A + PZ*(iR*V) - PZRZ*((iP + Z_t'*iRZ)\iRZ'*V);
Pu = P - PZ*(iR*PZ') + PZRZ*((iP + Z_t'*iRZ)\iRZ'*PZ');
toc
Elapsed time is 0.861536 seconds.
tic
PZRZ = PZ*iRZ;
iPRZ = (iP + Z_t'*iRZ)\iRZ';
Au = A + PZ*(iR*V) - PZRZ*(iPRZ*V);
Pu = P - PZ*(iR*PZ') + PZRZ*(iPRZ*PZ');
toc
Elapsed time is 0.774753 seconds.

댓글을 달려면 로그인하십시오.

답변 (2개)

John D'Errico
John D'Errico 2020년 6월 19일
편집: John D'Errico 2020년 6월 19일
What you seem not to appreciate is that 90-95% is NOT very sparse, not when you are talking matrix factorizations (i.e., backslash.) Backslash does not compute an inverse, although the inverse of a general not terribly sparse matrix will also probably be almost full too.
Inside this expression:
iRZ*(iP - Z' *iRZ )\iRZ'
we see a 100x100 matrix in the parens. (Roughly, since you say the sizes changes somewhat arbitrarily.) That matrix is what controls a LOT, since it will likely be fairly non-sparse, given the components. But once that 100x100 matrix is decomposed using a matrix factorization, it likely becomes close to a full matrix, or a product of a full lower and upper triangular pair.
In that case, then if you multiply that result using a pre multiply by iRZ as well as an implicit multiply by iRZ', you also get something fairly full.
My question in the comments was to ask what is the sparsity of those submatrices. That is, compute
A = iP - Z' *iRZ ;
nnz(A)/numel(A)
You might also then try for that same A:
[L,U] = lu(A);
nnz(L)/numel(L)
nnz(U)/numel(U)
If each of those matrices is nearly 50% non-zero, then they are essentially full triangular matrices.
Next, on the parent computation:
A = iRZ*(iP - Z' *iRZ )\iRZ';
nnz(A)/numel(A)
Those are the things you need to do. You should not be surprised to see a great deal of fill-in. In fact, consider yourself lucky if that does not happen.
You might consider sparsity enhancing permutations, applied to the inner 100x100 matrix, thus tools like colamd, dmperm, etc. (Sorry, but it has been many years since I was actively using those tools. They can help a great deal when used properly.)
Remember that once things start to become too filled in, a sparse matrix computation can start to become LESS efficient than a full one would have been. You might test if making that inner square matrix a full matrix helps too.
Are these things important? Yes.
For example:
A = rand(100);
As = sparse(A);
B = sprand(10000,100,0.01);
So B is 99% sparse. A is essentially a full matrix. And remember that even if A is only fairly full, a factorization of A will probably be quite close to full anyway.
Consider the term
C = A\B';
nnz(C)/numel(C)
ans =
0.6299
It does not matter whether or not A is sparse, backslash created a matrix that was 63% non-zero. It is effectively full in the world of sparse matrices. In fact, if we look at what happens with
Cs = As\B';
>> whos C Cs
Name Size Bytes Class Attributes
C 100x10000 8000000 double
Cs 100x10000 10158408 double sparse
So that 63% non-zero matrix used roughly 25% MORE space to store than the comparable full version.
Now compare times:
timeit(@() B*(A\B'))
ans =
0.3445460668125
timeit(@() B*(As\B'))
ans =
1.5696587328125
Finally, consider this:
Bf = full(B);
timeit(@() Bf*(A\Bf'))
ans =
0.1513420138125
And those times were only for a 10000x100 problem, and a sparse version of B that was 99% sparse, not just 90-95%.
As long as I could store that final result in memory as a full matrix, I actually gained by just keeping the computation completely full. To be useful, sparse matrices need to be seriously sparse. As well, sparsity can disappear quickly.
So if that inner part is full, or even close to being full, then sparsity of B does not help. In fact using sparse matrices may be LESS efficient, using more storage in those products, and using more time. And you need to look at the sparsity of the computed results. At least consider fill-in reducing permutations, as it is fill-in that can be a killer for sparse matrices.
  댓글 수: 1
Angelo Cuzzola
Angelo Cuzzola 2020년 6월 19일
Thanks for your detailed answer. Indeed you seem to be right
Cc = (iP + Z_t'*iRZ);
Cc_sparseness = nnz(Cc)/numel(Cc)
Cc_sparseness = 0.0198
[L U] = lu(Cc);
L_sparseness = nnz(L)/numel(L)
L_sparseness = 0.5033
U_sparseness = nnz(U)/numel(U)
U_sparseness = 0.5033
Sparseness seems to solve a bit (and I need it to save memory).
Non-sparse case:
tic
iRZ = iR*Z_t;
Cinv = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
toc
Elapsed time is 3.856927 seconds.
whos Z_t iRZ Cinv
Name Size Bytes Class Attributes
Z_t 10486x151 12667088 double
iRZ 10486x151 12667088 double
Cinv 10486x10486 879649568 double
Sparse case:
tic
Z_t = sparse(Z_t);
iRZ = sparse(iR*Z_t);
Cinv = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
toc
Elapsed time is 2.119886 seconds.
whos Z_t iRZ Cinv
Name Size Bytes Class Attributes
Z_t 10486x151 336768 double sparse
iRZ 10486x151 336768 double sparse
Cinv 10486x10486 879649568 double
I just hoped that, being the shape of iP + Z_t'*iRZ quite regular (see below), there could be a smart way to invert it.
1.0e+03 *
Columns 1 through 9
4.8288 0.0067 -0.0170 -0.0010 -0.0007 -0.0071 -0.0056 0.0119 0.0160
0.0067 0.0477 0 0 0 0 0 0 0
-0.0170 0 0.0768 0 0 0 0 0 0
-0.0010 0 0 0.0666 0 0 0 0 0
-0.0007 0 0 0 0.0732 0 0 0 0
-0.0071 0 0 0 0 0.0810 0 0 0
-0.0056 0 0 0 0 0 0.0405 0 0
0.0119 0 0 0 0 0 0 0.0765 0
0.0160 0 0 0 0 0 0 0 0.0693

댓글을 달려면 로그인하십시오.


Bjorn Gustavsson
Bjorn Gustavsson 2020년 6월 19일
My comment still seems relevant. A simple commandline test gives:
tic,AA = ones(3e4,2)*(ones(2,3e4)*ones(3e4,1));toc
% Elapsed time is 0.001027 seconds.
tic,BB = (ones(3e4,2)*ones(2,3e4))*ones(3e4,1);toc
% Elapsed time is 2.031760 seconds.
isequal(AA,BB)
% logical 1
If this is run inside a function where the JIT-optimization can work its wonders the multiplication might be a bit more cunning. But when you know the sizes ofthe matrices that should be multiplied together you can guide the order of the multiplications so that you avoid outer-products producing large intermediate matrices that then vanishes to produce a smaller matrix or vector in the end.
HTH
  댓글 수: 2
Angelo Cuzzola
Angelo Cuzzola 2020년 6월 19일
편집: Angelo Cuzzola 2020년 6월 19일
Ok. But I do not see how this applies to my case, where there is an inversion. It is possible to invert the median object with the function inv() and then you can exploit your associativeness, but inv() is a lot more inefficient. Here is an example (10k x 150 dimesions)
tic
iF = iR - iRZ*(inv(iP + Z_t'*iRZ)*iRZ');
toc
Elapsed time is 17.203595 seconds.
tic
iF = iR - (iRZ*inv(iP + Z_t'*iRZ))*iRZ';
toc
Elapsed time is 7.220762 seconds.
tic
iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
toc
Elapsed time is 1.312007 seconds.
Bjorn Gustavsson
Bjorn Gustavsson 2020년 6월 19일
That's a downer, I hoped that inverting a 100x100 matrix wouldn't be that expensive. Perhaps you can have some use of Factorize.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Sparse Matrices에 대해 자세히 알아보기

제품


릴리스

R2020a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by