
Matrix multiplication-optimal speed and memory

11 views (last 30 days)
Ronni on 28 Apr 2011
I have large matrices that I'm multiplying; not only is it very slow, it also runs out of memory. I have to do this over many iterations, so I really need to find a better way to implement it.
Basically I have A = D'*W*D, where W is a diagonal matrix with dimensions MxM and D has dimensions MxN, with M >> N.
If that's unclear, someone else had the same problem and posted a nice diagram illustrating it. Could someone suggest how to improve this?
Thanks.
6 Comments
Ronni on 29 Apr 2011
No, D is not sparse. This is for an iterative optimization.
Oleg Komarov on 29 Apr 2011
What are you using A for? Can you post more code so that we can see the final result you want to get to?


Accepted Answer

James Tursa on 29 Apr 2011
Here is the mex routine. Comments should explain its intended use. You will need to install a C compiler as stated earlier. Then do this:
mex -setup
(then press Enter)
(then enter the number of the C compiler)
(then press Enter again)
mex sqrttimes.c
Now you are ready to use it. I didn't bother to parallelize this with OpenMP, but if you happen to install an OpenMP capable compiler then it will be very easy to put a #pragma omp parallel for in front of the loops (with appropriate private clauses) to potentially get more speed.
/*****************************************************************************/
/* sqrttimes.c */
/* Syntax: sqrttimes(A,B) */
/* Does the following: */
/* A = sqrt(A) in-place first, then */
/* B = diag(A) * B in-place second. */
/* Where A = a double vector of size 1 x M or M x 1 */
/* B = a double matrix of size M x N */
/* */
/* The idea is to call this function as follows: */
/* */
/* >> sqrttimes(A,B); */
/* >> C = B' * B; */
/* */
/* The resulting C matrix will be the same as if you had done this: */
/* */
/* >> C = B' * diag(A) * B */
/* */
/* But using sqrttimes will result in practically 0 extra temporary memory */
/* usage so it is very memory efficient. This can be important if the A and */
/* B matrices are very large. Note that the original A and B matrices have */
/* been changed in-place, which is necessary to avoid extra memory usage. */
/* */
/* Programmer: James Tursa */
/* Date: 2011-Apr-28 */
/*****************************************************************************/
#include <math.h>
#include "mex.h"
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    double *Apr, *Bpr, *Apr0;
    mwSize i, j, k, l, m, n;

    /* Check inputs */
    if( nrhs != 2 ) {
        mexErrMsgTxt("Need exactly two inputs.");
    }
    if( nlhs != 0 ) {
        mexErrMsgTxt("Too many outputs.");
    }
    if( !mxIsDouble(prhs[0]) || !mxIsDouble(prhs[1]) || mxIsSparse(prhs[0]) || mxIsSparse(prhs[1]) ) {
        mexErrMsgTxt("Inputs must be double non-sparse.");
    }
    if( mxGetNumberOfDimensions(prhs[0]) != 2 || mxGetNumberOfDimensions(prhs[1]) != 2 ) {
        mexErrMsgTxt("Inputs must be 2D.");
    }
    m = mxGetM(prhs[0]);
    k = mxGetN(prhs[0]);
    l = mxGetM(prhs[1]);
    n = mxGetN(prhs[1]);
    if( m != 1 && k != 1 ) {
        mexErrMsgTxt("1st Input must be a vector.");
    }
    m = m * k;
    if( m != l ) {
        mexErrMsgTxt("1st vector input must have same number of elements as 1st dim of 2nd input.");
    }

    /* Get pointers to the data */
    Apr = Apr0 = mxGetPr(prhs[0]);
    Bpr = mxGetPr(prhs[1]);

    /* Do A = sqrt(A) in-place */
    for( i=0; i<m; i++ ) {
        *Apr = sqrt(*Apr);
        Apr++;
    }

    /* Do B = diag(A)*B in-place */
    for( j=0; j<n; j++ ) {
        Apr = Apr0;
        for( i=0; i<m; i++ ) {
            *Bpr++ *= *Apr++;
        }
    }
}
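For a quick sanity check after compiling, something along these lines (a small made-up example, not from the original post) should confirm that the in-place approach reproduces the direct formula:
w = rand(5,1); D = rand(5,3);    % w plays the role of diag(W)
Aref = D' * diag(w) * D;         % direct reference computation
sqrttimes(w, D);                 % w and D are modified in-place
A = D' * D;                      % should match Aref to roundoff
max(abs(A(:) - Aref(:)))         % on the order of eps
Note that after the call, w and D no longer hold their original values.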
1 Comment
James Tursa on 29 Apr 2011
P.S. I need to caution you that sqrttimes modifies its input arguments in-place, meaning that if they are sharing data with another variable then the other variable will be changed also. You may not want this behavior. If you don't know what this means then do this test:
>> A = 1:3
A =
1 2 3
>> B = reshape(1:15,3,5)
B =
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
>> a = A % a is now sharing data memory with A
a =
1 2 3
>> b = B % b is now sharing data memory with B
b =
1 4 7 10 13
2 5 8 11 14
3 6 9 12 15
>> sqrttimes(A,B)
>> A
A =
1.0000 1.4142 1.7321
>> a
a =
1.0000 1.4142 1.7321
>> B
B =
1.0000 4.0000 7.0000 10.0000 13.0000
2.8284 7.0711 11.3137 15.5563 19.7990
5.1962 10.3923 15.5885 20.7846 25.9808
>> b
b =
1.0000 4.0000 7.0000 10.0000 13.0000
2.8284 7.0711 11.3137 15.5563 19.7990
5.1962 10.3923 15.5885 20.7846 25.9808
You can see that by changing A and B in-place, we have also changed a and b in-place. This is the side-effect of doing in-place operations. So if you don't want this behavior you need to be careful to not share A and B with any other variables.


More Answers (6)

Ronni on 28 Apr 2011
Also, the transpose of D (D') takes a while to calculate. Is there any faster way of doing this for large matrices?

Oleg Komarov on 28 Apr 2011
sum(D.^2.*diag(W))
or
(D.*diag(W)).'*D
diag(W) can be replaced by the vector containing the diagonal elements.
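A note on these one-liners: on releases without implicit expansion (which includes the 2011-era releases in this thread), multiplying an M-by-1 vector elementwise against an M-by-N matrix needs bsxfun. A small sketch of the second form, with made-up sizes, using bsxfun in place of the direct elementwise product:
w = rand(6,1); D = rand(6,4);        % w holds the diagonal of W
A1 = D' * diag(w) * D;               % reference
A2 = bsxfun(@times, D, w).' * D;     % same result without forming diag(w) or W
max(abs(A1(:) - A2(:)))              % on the order of eps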

James Tursa on 28 Apr 2011
I don't understand why the bsxfun-based solutions proposed in the link do not work for you. I would not expect this to run out of memory if you are implementing it correctly, unless you are up against the limit with just W and D. e.g.,
M = large number
N = small number
D = M x N matrix
W = 1 x M vector (i.e., diagonal stored as a vector)
bsxfun(@times,D',W) * D
or
D' * bsxfun(@times,W',D)
The latter will likely result in D' not being explicitly calculated. Rather, a transpose flag will simply be passed to the dgemm routine in the BLAS call. And the W' in the latter will not result in a data copy; it will just be a simple reshaped shared data copy of the W vector.
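One way to see the difference between the two forms is to time them at a moderate, made-up size (exact numbers depend on the machine and release):
M = 2e6; N = 20;
D = rand(M, N); W = rand(1, M);
tic; A1 = bsxfun(@times, D', W) * D;  toc   % forms the scaled N x M transpose explicitly
tic; A2 = D' * bsxfun(@times, W', D); toc   % transpose handled by the BLAS flag
max(abs(A1(:) - A2(:)))                     % results agree to roundoff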

Ronni on 28 Apr 2011
Thanks Oleg and James. I am using 70% of the memory with just D, so maybe that's why I'm not able to carry out further processing.
I'm now doing some basic experiments to understand memory allocation in MATLAB. For example, my D takes 9GB. So when I do A=D; the memory stays around 9GB, with a slight increase. However, if I do A=D' (after first clearing D), MATLAB's memory usage jumps to double, i.e. 18GB. I can understand that while the variable is being transposed, MATLAB may need to allocate extra memory temporarily, but once that process finishes, I thought the memory would still be only slightly over 9GB. Why is it 18GB? The size of A is the same as before, so why is MATLAB's memory usage now 18GB?
2 Comments
James Tursa on 29 Apr 2011
Your comment doesn't make sense. How can you do A=D' if you first clear D? If you start with a 9GB D and then do A=D', you will get 18GB total since a data copy takes place. If you then clear D, the memory should drop back to 9GB. Are you saying this doesn't happen?
Ronni on 29 Apr 2011
Oops, I meant to say clear A.
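As an aside, this copy-versus-shared-copy behavior can be checked at a smaller scale. A sketch, assuming a Windows installation where the memory command is available (on other platforms, watching the MATLAB process size in the OS task manager shows the same thing):
D = rand(20000, 500);             % roughly 80 MB of doubles (made-up size)
m0 = memory;
A = D;                            % shared-data copy: no new array is allocated yet
m1 = memory;
clear A
A = D';                           % explicit transpose: a full second copy is made
m2 = memory;
[m0.MemUsedMATLAB m1.MemUsedMATLAB m2.MemUsedMATLAB] / 2^20   % usage in MB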



James Tursa on 29 Apr 2011
OK, so how about this approach (it assumes the W elements are non-negative):
D' * W * D
= D' * sqrt(W) * sqrt(W) * D
= D' * sqrt(W') * sqrt(W) * D
= (sqrt(W) * D)' * (sqrt(W) * D)
Then do the sqrt(W) * D operation in-place on D in a mex routine. i.e., do D = sqrt(W) * D in-place inside the mex routine. Should be pretty fast and no extra memory requirement (could even parallelize it easily with OpenMP if you had a compatible compiler). Then do this operation:
A = D' * D
That will call a BLAS routine with a transpose flag to do the multiply, so no explicit transpose D' is formed. MATLAB will also recognize the symmetry and call a specialized BLAS routine that is faster than a generic matrix multiply.
The big IF in all of this is: Can you tolerate changing D in-place? i.e., do you need the original D explicitly after this operation? If so, you could recover it by undoing the sqrt(W) * D operation in-place with another mex routine.
Let me know how this sounds to you and if it looks like it will work for you I can write up the mex routine and post it here. You will need a 64-bit C compiler to do this. Since one does not ship with MATLAB you will have to install it yourself. e.g., see this thread:
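For comparison, the same square-root factoring can be written in plain MATLAB when a temporary the size of D is affordable; this is not the in-place mex approach above, and w is assumed to hold the diagonal of W as a vector:
Ds = bsxfun(@times, sqrt(w), D);   % Ds = sqrt(W) * D, at the cost of a temporary the size of D
A  = Ds' * Ds;                     % equals D' * W * D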
3 Comments
Ronni on 29 Apr 2011
James, this is excellent information. Thank you very much. I'm a bit of a beginner when it comes to MATLAB, so I don't really know what BLAS or mex are. I'll try to read up on them and digest the information you provided. Hopefully I can fix it.
I also have the option of parallelizing this code, since I have access to a cluster of computers with multiple MATLAB licenses. However, not all of those computers have 24GB of RAM. Also, since I have never really parallelized code before, I was thinking it might take longer due to overhead. I will look into this as well. It would really increase the speed of my optimization.
Again, thank you. I really appreciate it.
James Tursa on 29 Apr 2011
Update: I just checked in R2011a and the JIT is not yet smart enough to do the sqrt or the bsxfun(@times,...) operations in-place. So it looks like mex is the only option for this at present.



Ronni on 29 Apr 2011
Thanks James. I installed the C compiler and am now implementing the code as per your suggestions. This has been very helpful!
