Matrix product and inversion speed up

Question

Angelo Cuzzola, 18 June 2020

https://kr.mathworks.com/matlabcentral/answers/550767-matrix-product-and-inversion-speed-up

Commented: Angelo Cuzzola, 19 June 2020

Hi,

my question is simple. I have a function in which the most computationally intensive line is this matrix operation:

iF = iR - iRZ*((iP - Z'*iRZ)\iRZ');

The dimensions of the objects involved vary at each call, but they are typically of the orders of magnitude listed here:

iR -> diagonal 40kx40k (saved as sparse)

iRZ -> 40k x 100 (saved as sparse)

Z -> 40k x 100

iP -> 100x100

Is there something one can do to speed up the code?

Thanks

Comments: 8

Angelo Cuzzola, 19 June 2020

Edited: Angelo Cuzzola, 19 June 2020

To answer all the comments:

1. The typical time is 10 s per call, but I need to call it 96 times and then restart until convergence (this varies, but typically takes another 30 cycles).
2. Yes, everything varies from call to call. iR, Z and iRZ vary in a codifiable way, so I could compute the product once and then extract the call-specific results. The problem is that iP varies according to the results of the previous step of the cycle, and the inversion makes the problem non-decomposable. I could compute Z'*iRZ outside the cycle and extract the needed elements at each step, but I don't think this would reduce the computation time much.
3. Concerning the sparseness, it is around 90-95% (even more in some cases). Here is an example with 20 x 5 (20 is the dimension that reaches 40k). What I omitted is that the matrices are sparse with a pattern (note that typically the columns do not all contain the same number of filled elements, as they do in this example).

Z = 
    0.2039    0.1445         0         0         0
   -0.0907    0.6961         0         0         0
   -0.2556    1.0681         0         0         0
   -0.1729    0.9794         0         0         0
   -0.0755    0.8164         0         0         0
    0.3922         0    0.6513         0         0
    0.9542         0    0.8579         0         0
   -0.6814         0    0.1662         0         0
    0.3870         0    2.4420         0         0
   -0.6655         0   -0.4036         0         0
   -2.1358         0         0   -0.7712         0
    4.6064         0         0    4.3017         0
   -3.5872         0         0   -2.3074         0
    1.2298         0         0    1.7966         0
   -2.3530         0         0   -1.7511         0
    0.0485         0         0         0   -2.3801
   -0.2576         0         0         0    2.0950
   -0.1127         0         0         0   -0.3713
    0.1639         0         0         0   -4.9018
   -0.0776         0         0         0    0.1692  
    

4. Another useful piece of information is that iP and iR are diagonal and iRZ = iR*Z. Examples are:

Z_t'*iRZ  =
   41.9403   -3.7581    4.5266   31.1671   -0.4219
   -3.7581   20.9927         0         0         0
    4.5266         0   21.9035         0         0
   31.1671         0         0   27.5707         0
   -0.4219         0         0         0    5.9187
iP =
    0.9897         0         0         0         0
         0    1.0099         0         0         0
         0         0    0.9704         0         0
         0         0         0    1.0190         0
         0         0         0         0    0.9874

5. What I am looking for is straightforward: I need that product. What is tricky is that iP varies according to the results of a previous step, and the other objects are step-dependent extractions from bigger matrices (see point 2).

Thanks for the comments.

Christine Tobler, 19 June 2020

What are you doing with the matrix iF in the next step? Since it is a dense 1e4x1e4 matrix, it is comparatively expensive to compute, and avoiding forming it explicitly altogether would probably be the fastest option.

For example, if your next step is to do matrix-vector multiplication with iF, you could replace

y = iF*x

with

y = iR*x - iRZ*((iP - Z'*iRZ)\(iRZ'*x));

For some other possible operations, you will need to store the complete iF as a dense matrix - it depends on that next step.
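A minimal sketch of that idea, not taken from the thread's actual data: the sizes, the random test matrices and the handle name applyiF are made up for illustration, and the plus sign in the inner matrix follows the asker's later timing snippets. It wraps the action of iF on a vector in a function handle and checks it against the explicitly formed matrix on a problem small enough to store.

```matlab
% Illustrative sizes and random data (assumptions, not the asker's matrices)
rng(0);
n = 500;  k = 20;
iR  = spdiags(1 + rand(n, 1), 0, n, n);   % sparse diagonal, n-by-n
Z   = sprand(n, k, 0.1);                  % sparse, n-by-k
iP  = speye(k);                           % diagonal, k-by-k
iRZ = iR*Z;

S = full(iP + Z'*iRZ);                    % small k-by-k inner matrix

% Action of iF on x without ever forming the n-by-n matrix iF
applyiF = @(x) iR*x - iRZ*(S\(iRZ'*x));

% Check against the explicit dense iF (only feasible for small n)
iF = iR - iRZ*(S\iRZ');
x  = randn(n, 1);
relErr = norm(iF*x - applyiF(x)) / norm(iF*x);
fprintf('relative difference: %.2e\n', relErr);
```

The handle only ever touches n-by-k and k-by-k objects, so its cost grows linearly with n rather than quadratically.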

Angelo Cuzzola, 19 June 2020

Well, this seems to improve the situation quite a lot. Thanks.

I have always been taught the opposite, namely to do expensive computations once and store the results. I couldn't imagine that the time-consuming part was the storing.

          tic    
          iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
          Au  = A  + PZ*(iF*V);
          Pu  = P  - PZ*(iF*PZ');
          toc
          Elapsed time is 2.076231 seconds.
          
          tic
          PZRZ = PZ*iRZ;
          Au  = A  + PZ*(iR*V) - PZRZ*((iP + Z_t'*iRZ)\iRZ'*V);
          Pu  = P  - PZ*(iR*PZ') + PZRZ*((iP + Z_t'*iRZ)\iRZ'*PZ');
          toc
          Elapsed time is 0.861536 seconds.
             
          tic
          PZRZ = PZ*iRZ;
          iPRZ = (iP + Z_t'*iRZ)\iRZ';
          Au  = A  + PZ*(iR*V) - PZRZ*(iPRZ*V);
          Pu  = P  - PZ*(iR*PZ') + PZRZ*(iPRZ*PZ');
          toc
          Elapsed time is 0.774753 seconds.


Answer 1

John D'Errico, 19 June 2020

https://kr.mathworks.com/matlabcentral/answers/550767-matrix-product-and-inversion-speed-up#answer_454216

Edited: John D'Errico, 19 June 2020

What you seem not to appreciate is that 90-95% is NOT very sparse, not when you are talking about matrix factorizations (i.e., backslash). Backslash does not compute an inverse, although the inverse of a general, not terribly sparse matrix will probably be almost full too.

Inside this expression:

iRZ*((iP - Z'*iRZ)\iRZ')

we see a 100x100 matrix in the parens. (Roughly, since you say the sizes change somewhat arbitrarily.) That matrix controls a LOT, since it will likely be fairly non-sparse, given the components. And once that 100x100 matrix is decomposed using a matrix factorization, it likely becomes close to a full matrix, or a product of a full lower and upper triangular pair.

In that case, then if you multiply that result using a pre multiply by iRZ as well as an implicit multiply by iRZ', you also get something fairly full.

My question in the comments was to ask what is the sparsity of those submatrices. That is, compute

A = iP - Z' *iRZ ;
nnz(A)/numel(A)

You might also then try for that same A:

[L,U] = lu(A);
nnz(L)/numel(L)
nnz(U)/numel(U)

If each of those matrices is nearly 50% non-zero, then they are essentially full triangular matrices.

Next, on the parent computation:

A = iRZ*((iP - Z'*iRZ)\iRZ');
nnz(A)/numel(A)

Those are the things you need to do. You should not be surprised to see a great deal of fill-in. In fact, consider yourself lucky if that does not happen.

You might consider sparsity enhancing permutations, applied to the inner 100x100 matrix, thus tools like colamd, dmperm, etc. (Sorry, but it has been many years since I was actively using those tools. They can help a great deal when used properly.)

Remember that once things start to become too filled in, a sparse matrix computation can start to become LESS efficient than a full one would have been. You might test if making that inner square matrix a full matrix helps too.
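One way to run that test, as a rough sketch: the sizes and random matrices below are placeholders rather than the asker's data, and the plus sign follows the asker's later snippets. It solves the same inner system once with the k-by-k matrix stored as sparse and once as full, and times both with timeit.

```matlab
rng(1);
n = 2000;  k = 100;
iR  = spdiags(1 + rand(n, 1), 0, n, n);
Z   = sprand(n, k, 0.05);
iRZ = iR*Z;
S   = speye(k) + Z'*iRZ;     % inner matrix, stored sparse
Sf  = full(S);               % same matrix, stored full
B   = iRZ';                  % sparse right-hand side, k-by-n

tSparse = timeit(@() S \ B);
tFull   = timeit(@() Sf \ B);
fprintf('sparse inner: %.4f s, full inner: %.4f s\n', tSparse, tFull);

% Both variants should agree up to rounding
relDiff = norm(full(S\B) - Sf\B, 'fro') / norm(Sf\B, 'fro');
```

Which variant wins depends on the actual fill-in and on the machine, so the timings are worth repeating on the real matrices.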

Are these things important? Yes.

For example:

A = rand(100);
As = sparse(A);
B = sprand(10000,100,0.01);

So B is 99% sparse. A is essentially a full matrix. And remember that even if A is only fairly full, a factorization of A will probably be quite close to full anyway.

Consider the term

C = A\B';
nnz(C)/numel(C)
ans =
                    0.6299

It does not matter whether or not A is sparse, backslash created a matrix that was 63% non-zero. It is effectively full in the world of sparse matrices. In fact, if we look at what happens with

Cs = As\B';
>> whos C Cs
  Name        Size                  Bytes  Class     Attributes
  C         100x10000             8000000  double              
  Cs        100x10000            10158408  double    sparse    

So that 63% non-zero matrix used roughly 25% MORE space to store than the comparable full version.
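The byte count reported by whos can be reproduced by hand. The sketch below assumes 64-bit MATLAB, where a sparse double matrix stores an 8-byte value and an 8-byte row index per nonzero plus an 8-byte pointer per column (one extra at the end), and assumes no spare preallocated space; the variable names are illustrative.

```matlab
m = 100;  n = 10000;          % size of C and Cs
density = 0.6299;             % nnz(C)/numel(C) reported above
nz = round(density*m*n);      % 629900 stored nonzeros

bytesSparse = 16*nz + 8*(n + 1);   % values + row indices + column pointers
bytesFull   = 8*m*n;               % 8 bytes per entry
fprintf('%d %d\n', bytesSparse, bytesFull);   % prints 10158408 8000000

% Density at which sparse and full storage cost the same
breakEven = (bytesFull - 8*(n + 1)) / (16*m*n);   % just under 0.5
```

Under these assumptions, a sparse matrix only saves memory once fewer than about half of its entries are nonzero, and it needs to be far sparser than that before the arithmetic gets faster.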

Now compare times:

timeit(@() B*(A\B'))
ans =
           0.3445460668125
           
timeit(@() B*(As\B'))
ans =
           1.5696587328125

Finally, consider this:

Bf = full(B);
timeit(@() Bf*(A\Bf'))
ans =
           0.1513420138125

And those times were only for a 10000x100 problem, and a sparse version of B that was 99% sparse, not just 90-95%.

As long as I could store that final result in memory as a full matrix, I actually gained by just keeping the computation completely full. To be useful, sparse matrices need to be seriously sparse. As well, sparsity can disappear quickly.

So if that inner part is full, or even close to being full, then sparsity of B does not help. In fact using sparse matrices may be LESS efficient, using more storage in those products, and using more time. And you need to look at the sparsity of the computed results. At least consider fill-in reducing permutations, as it is fill-in that can be a killer for sparse matrices.

Comments: 1

Angelo Cuzzola, 19 June 2020

Thanks for your detailed answer. Indeed, you seem to be right:

  Cc = (iP + Z_t'*iRZ);
  Cc_sparseness = nnz(Cc)/numel(Cc)
  Cc_sparseness = 0.0198
          
  [L U] = lu(Cc);
  L_sparseness = nnz(L)/numel(L)
  L_sparseness = 0.5033
  U_sparseness = nnz(U)/numel(U)
  U_sparseness = 0.5033          

Sparse storage seems to help a bit (and I need it to save memory).

Non-sparse case:

  tic
  iRZ = iR*Z_t; 
  Cinv = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
  toc
  Elapsed time is 3.856927 seconds.
  
  whos Z_t iRZ Cinv
  
  Name          Size                Bytes  Class     Attributes
  Z_t       10486x151            12667088  double              
  iRZ       10486x151            12667088  double              
  Cinv      10486x10486            879649568  double              

Sparse case:

  tic
  Z_t = sparse(Z_t);
  iRZ = sparse(iR*Z_t); 
  Cinv = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
  toc
  Elapsed time is 2.119886 seconds.
  
  whos Z_t iRZ Cinv
  
  Name          Size              Bytes  Class     Attributes
  Z_t       10486x151            336768  double    sparse    
  iRZ       10486x151            336768  double    sparse    
  Cinv      10486x10486            879649568  double              

I just hoped that, since the structure of iP + Z_t'*iRZ is quite regular (see below), there might be a smart way to invert it.

   1.0e+03 *
  Columns 1 through 9
    4.8288    0.0067   -0.0170   -0.0010   -0.0007   -0.0071   -0.0056    0.0119    0.0160
    0.0067    0.0477         0         0         0         0         0         0         0
   -0.0170         0    0.0768         0         0         0         0         0         0
   -0.0010         0         0    0.0666         0         0         0         0         0
   -0.0007         0         0         0    0.0732         0         0         0         0
   -0.0071         0         0         0         0    0.0810         0         0         0
   -0.0056         0         0         0         0         0    0.0405         0         0
    0.0119         0         0         0         0         0         0    0.0765         0
    0.0160         0         0         0         0         0         0         0    0.0693
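The matrix printed above has an arrow shape: a dense first row and column plus a diagonal. Eliminating the dense column first fills in the whole remaining block, which is consistent with the L and U densities of 0.5033 reported earlier in this comment, since (151+1)/(2*151) = 0.5033 is exactly a full triangle of order 151. Moving the dense row and column to the end avoids that fill-in. The sketch below illustrates this on a synthetic arrow matrix; it assumes the matrix is symmetric positive definite so that chol applies (the thread does not confirm this), and all values are made up.

```matlab
rng(0);
n = 151;
d = 1 + rand(n, 1);                 % positive diagonal
a = 0.1 + 0.5*rand(n - 1, 1);       % nonzero border entries
A = spdiags(d, 0, n, n);
A(1, 1)   = n;                      % large corner keeps A positive definite
A(2:n, 1) = a;
A(1, 2:n) = a';

R1 = chol(A);                       % dense row/column eliminated first
p  = [2:n, 1];                      % move the dense row/column to the end
R2 = chol(A(p, p));

fprintf('natural order: %.4f\n', nnz(R1)/numel(R1));
fprintf('reordered:     %.4f\n', nnz(R2)/numel(R2));
```

Here the permutation is written by hand because the pattern is known. For less obvious patterns, symamd (or colamd for unsymmetric matrices) computes a fill-reducing ordering automatically.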


Answer 2

Bjorn Gustavsson, 19 June 2020

https://kr.mathworks.com/matlabcentral/answers/550767-matrix-product-and-inversion-speed-up#answer_454096

My comment still seems relevant. A simple command-line test gives:

tic,AA = ones(3e4,2)*(ones(2,3e4)*ones(3e4,1));toc
% Elapsed time is 0.001027 seconds.
tic,BB = (ones(3e4,2)*ones(2,3e4))*ones(3e4,1);toc
% Elapsed time is 2.031760 seconds.
isequal(AA,BB)
% logical 1

If this is run inside a function where the JIT optimization can work its wonders, the multiplication might be a bit more cunning. But when you know the sizes of the matrices to be multiplied, you can guide the order of the multiplications so that you avoid outer products that produce large intermediate matrices, which then collapse into a smaller matrix or vector in the end.

HTH
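The gap follows from a simple operation count. As a back-of-the-envelope sketch (counting 2*m*k*n floating-point operations for an m-by-k times k-by-n product, which ignores memory traffic and BLAS details), the two orderings in the test above compare as follows:

```matlab
m = 3e4;  k = 2;              % sizes from the test above

% (m-by-k times k-by-m) first, then times m-by-1
flopsOuterFirst = 2*m*k*m + 2*m*m*1;     % 5.4e9
% (k-by-m times m-by-1) first, then m-by-k times k-by-1
flopsInnerFirst = 2*k*m*1 + 2*m*k*1;     % 2.4e5

ratio = flopsOuterFirst / flopsInnerFirst;   % 22500
tempBytes = 8*m*m;                           % 7.2e9 bytes for the m-by-m intermediate
fprintf('flop ratio %g, temporary %.1f GB\n', ratio, tempBytes/1e9);
```

The measured speed-up need not match the flop ratio, but the count shows why the parenthesization matters so much.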

Comments: 2

Angelo Cuzzola, 19 June 2020

Edited: Angelo Cuzzola, 19 June 2020

OK, but I do not see how this applies to my case, where there is an inversion. It is possible to invert the middle object with inv() and then exploit associativity, but inv() is a lot less efficient. Here is an example (10k x 150 dimensions):

          tic    
          iF = iR - iRZ*(inv(iP + Z_t'*iRZ)*iRZ');
          toc
Elapsed time is 17.203595 seconds.
          tic    
          iF = iR - (iRZ*inv(iP + Z_t'*iRZ))*iRZ';
          toc
Elapsed time is 7.220762 seconds.
          tic    
          iF = iR - iRZ*((iP + Z_t'*iRZ)\iRZ');
          toc
Elapsed time is 1.312007 seconds.

Bjorn Gustavsson, 19 June 2020

That's a downer; I had hoped that inverting a 100x100 matrix wouldn't be that expensive. Perhaps Factorize could be of some use.
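Factorize is a File Exchange package. A built-in alternative, assuming MATLAB R2017b or later, is the decomposition object, which factorizes a matrix once and reuses the factors for several solves. The sketch below uses made-up sizes and random data; S stands in for the small inner matrix iP + Z_t'*iRZ, and B1 and B2 are two hypothetical right-hand sides.

```matlab
rng(0);
k = 150;  n = 2000;
M  = randn(n, k);
S  = eye(k) + (M'*M)/n;        % stand-in for the k-by-k inner matrix
B1 = randn(k, 300);            % two hypothetical right-hand sides
B2 = randn(k, 40);

dS = decomposition(S);         % factorize once
X1 = dS \ B1;                  % reuse the factors for each solve
X2 = dS \ B2;

% Same results as calling backslash on S directly, up to rounding
err1 = norm(X1 - S\B1, 'fro') / norm(X1, 'fro');
err2 = norm(X2 - S\B2, 'fro') / norm(X2, 'fro');
```

This keeps the accuracy and speed of backslash while giving the "compute once, apply several times" convenience that makes inv() tempting.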
