Most Efficient Way to Construct the Matrices to Extract the Lower and Upper Triangle from a Vectorized Matrix

Question

Royi Avital on 20 Apr 2020

https://kr.mathworks.com/matlabcentral/answers/519408-most-efficient-way-to-construct-the-matrices-to-extract-the-lower-and-upper-triangle-from-a-vectoriz

Commented: Royi Avital on 24 Apr 2020

Accepted Answer: James Tursa

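Given a matrix X and its vector form x = vec(X) (that is, X(:) in MATLAB),

I am after the most efficient way to build the matrices L and U that extract the lower and upper triangles of X from x, so that L*x and U*x return the lower and upper triangular elements.

So in MATLAB code it would be something like this: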

clear();
numRows = 3;
numCols = numRows;
mX = randn(numRows, numCols);
vX = mX(:);
% Lower Triangle are indices 2, 3, 6
mL = [  0, 1, 0, 0, 0, 0, 0, 0, 0   ; ...
        0, 0, 1, 0, 0, 0, 0, 0, 0   ; ...
        0, 0, 0, 0, 0, 1, 0, 0, 0   ];
% Upper Triangle are indices 4, 7, 8
mU = [  0, 0, 0, 1, 0, 0, 0, 0, 0   ; ...
        0, 0, 0, 0, 0, 0, 1, 0, 0   ; ...
        0, 0, 0, 0, 0, 0, 0, 1, 0   ];
assert(isequal(mL * vX, mX(logical(tril(mX, -1)))));
assert(isequal(mU * vX, mX(logical(triu(mX, 1)))));

I am after a sparse representation of mU and mL, built in the most efficient way.
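As a minimal sketch of what the sparse form of the 3-by-3 example could look like, the column indices can be read off a logical triangle mask with find and passed to sparse. The names vJL and vJU are illustrative only, and this version allocates a full numRows-by-numRows logical mask, which is exactly the kind of allocation the question hopes to avoid for very large sizes.

```matlab
numRows = 3;
mX = randn(numRows);
vX = mX(:);

% Linear (column-major) indices of the strictly lower / upper triangle
vJL = find(tril(true(numRows), -1));   % lower triangle, diagonal excluded
vJU = find(triu(true(numRows),  1));   % upper triangle, diagonal excluded

% One row per extracted element, a single 1 in the matching column
numElements = numel(vJL);
mL = sparse(1:numElements, vJL, 1, numElements, numRows * numRows);
mU = sparse(1:numElements, vJU, 1, numElements, numRows * numRows);

% Multiplying by the extractor is the same as indexing the vector
assert(isequal(mL * vX, vX(vJL)));
assert(isequal(mU * vX, vX(vJU)));
```

Each matrix stores only numElements non-zeros, one per row.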

My current implementation is given by:

function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
    case(INCLUDE_DIAGONAL)
        numElements = 0.5 * numRows * (numRows + 1);
        diagIdx = 0;
    case(EXCLUDE_DIAGONAL)
        numElements = 0.5 * (numRows - 1) * numRows;
        diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
    elmntIdx = 0;
    for jj = 1:numRows
        for ii = (jj + diagIdx):numRows
            elmntIdx = elmntIdx + 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
    elmntIdx = numElements + 1;
    for jj = numRows:-1:1
        for ii = (jj - diagIdx):-1:1
            elmntIdx = elmntIdx - 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end

Is there a more efficient way to generate vJ without extensive memory allocation (in order to allow generating really large matrices)?

Thank You.

Comments: 24 (the 22 earlier comments are collapsed)

Matt J on 23 Apr 2020

Edited: Matt J on 23 Apr 2020

"@Matt, there are many cases for using those matrices."

I can't think of any. You can pursue this for hypothetical interest if you want, of course.

"Regarding fmincon(), solving a problem that has a dedicated solver with a general solver is usually a really bad choice."

The situations where that is true are those where calculating the objective and derivatives is faster in matrix form than in operator form. For large problems that won't always be the case, because the computational cost of implementing things in matrix form can start to outweigh the benefits of using a specialized algorithm.

"Not to mention defining a linear constraint in the form of a nonlinear one (think of the time spent calculating the derivative when it is so well defined)."

I don't think mU and mL are helpful for defining either linear or nonlinear constraints. A linear constraint on the lower triangle of your unknown matrix X will always be of the form sum(T.*X, 'all')<=b, where T is some lower triangular matrix that you know in advance. The matrix form of the constraint gradient is simply T(:), which doesn't require mL at all to set up.
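A small check of this point, as a sketch under assumptions: the coefficient vector a and the test matrix X below are made up for illustration. It compares the matrix form a.'*(mL*X(:)) with the operator form sum(T.*X, 'all'), and confirms that the constraint gradient T(:) equals mL.'*a.

```matlab
n  = 4;
B  = tril(true(n), -1);            % strictly lower triangle mask
vJ = find(B);
m  = numel(vJ);
mL = sparse(1:m, vJ, 1, m, n * n);

a = (1:m).';                       % hypothetical constraint coefficients
T = zeros(n);
T(B) = a;                          % same coefficients placed in the lower triangle

X = magic(n);                      % arbitrary integer test matrix

lhsMatrixForm   = a.' * (mL * X(:));      % needs mL
lhsOperatorForm = sum(T .* X, 'all');     % needs only T ('all' requires R2018b or later)

assert(lhsMatrixForm == lhsOperatorForm);
assert(isequal(full(mL.' * a), T(:)));    % the gradient is simply T(:)
```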

For nonlinear constraints c(mL*X(:))<=0 on the lower triangular part of X, the gradient can be expressed as mL.'*gradc(mL*X(:)), but this could be implemented efficiently and without mL as follows:

B=tril(true(numRows),-1);
Bd=double(B);
g=Bd;
g(B)=gradc(X(B));

So, for this, you really only need to pre-compute B and Bd, which can be done with much less time and memory allocation than mL:

N=3000;
tic;
B=tril(true(N),-1);
Bd=double(B);
toc
%Elapsed time is 0.056492 seconds.
tic;
mL=GenerateTriangleExtractorMatrix( N, 1, 2);
toc
%Elapsed time is 0.264385 seconds.

>> whos B Bd mL
  Name   Size                Kilobytes     Class     Attributes
                                                               
  B      3000x3000                8790     logical             
  Bd     3000x3000               70313     double              
  mL     4498500x9000000        140602     double    sparse    
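To see that the mask-based gradient matches the matrix expression mL.'*gradc(mL*X(:)), here is a minimal sketch. The function gradc is a stand-in chosen for illustration (the gradient of a hypothetical c(v) = sum(v.^2)); any function that maps a vector to a vector of the same size would do.

```matlab
numRows = 4;
X = magic(numRows);                       % arbitrary test matrix
gradc = @(v) 2 * v;                       % hypothetical gradient, for illustration only

% Mask-based version: no extractor matrix needed
B = tril(true(numRows), -1);
g = zeros(numRows);
g(B) = gradc(X(B));

% Reference version using the explicit sparse extractor
vJ = find(B);
mL = sparse(1:numel(vJ), vJ, 1, numel(vJ), numRows^2);
gRef = mL.' * gradc(mL * X(:));

assert(isequal(g(:), full(gRef)));
```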

Matt J on 23 Apr 2020

But that would mean your constraints are of the form mL*X(:)<=b. But since each row of mL contains only a single non-zero element, this means the constraint is equivalent to a simple bound X(j)<=b. In Matlab, you would never have to construct a matrix to represent such a constraint. You would use the vector input arguments lb and ub to specify those. I assume Gurobi has something similar.
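The equivalence can be sketched as follows, with a made-up right-hand side b and test points x; the upper-bound vector ub is filled only at the positions that mL would select, and left at Inf elsewhere.

```matlab
n  = 3;
vJ = find(tril(true(n), -1));      % positions selected by mL
m  = numel(vJ);
mL = sparse(1:m, vJ, 1, m, n * n);

b = [0.5; 1.5; 2.5];               % hypothetical right-hand side

% Equivalent simple bounds: only the selected entries are bounded
ub = inf(n * n, 1);
ub(vJ) = b;

xFeasible   = (1:n * n).' / 4;     % test point that satisfies the constraint
xInfeasible = xFeasible;
xInfeasible(3) = 10;               % violates the bound on element 3

assert(all(mL * xFeasible <= b)   == all(xFeasible <= ub));
assert(all(mL * xInfeasible <= b) == all(xInfeasible <= ub));
```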

Royi Avital on 23 Apr 2020

@Matt, I know that. Whenever I can use other features of the solver, I use them. I have cases where I need those extractors in matrix form. I appreciate the dialogue. I think others who read it will gain something. I still hope someone will bring a different point of view on the pattern of vJ, though I guess @James' solution is practically as good as it gets (I would also appreciate anything that makes it even faster).

Answer 1

James Tursa on 22 Apr 2020

https://kr.mathworks.com/matlabcentral/answers/519408-most-efficient-way-to-construct-the-matrices-to-extract-the-lower-and-upper-triangle-from-a-vectoriz#answer_427698

Edited: James Tursa on 22 Apr 2020

Here is a mex routine that generates the sparse double matrices mL and mU directly, so no wasted memory in creating them. Seems to run about 3x-5x faster than m-code for somewhat large sizes.

/* S = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag)
 *
 * S = double sparse matrix
 * numRows = integer > 0
 * triangleFlag = 1 , extract lower triangle
 *                2 , extract upper triangle
 * diagFlag = 1 , include diagonal
 *            2 , exclude diagonal
 * where
 *
 * M = a numRows X numRows matrix of non-zero terms
 * assert(isequal(S * M(:), M(logical(tril(M, -1))))); % for lower
 * assert(isequal(S * M(:), M(logical(triu(M,  1))))); % for upper
 *
 * Programmer: James Tursa
 * Date: 2020-April-22
*/
        
#include "mex.h"
void mexFunction( int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    mwSize numRows, triangleFlag, diagFlag, numElements;
    mwIndex *Ir, *Jc;
    mwIndex i, j, k, m;
    double *pr;
    
    if( nrhs != 3 || !mxIsNumeric(prhs[0]) || !mxIsNumeric(prhs[1]) || !mxIsNumeric(prhs[2]) ||
        mxGetNumberOfElements(prhs[0]) != 1 || mxGetNumberOfElements(prhs[1]) != 1 ||
        mxGetNumberOfElements(prhs[2]) != 1 ) {
        mexErrMsgTxt("Need three numeric scalar inputs");
    }
    if( nlhs > 1 ) {
        mexErrMsgTxt("Too many outputs");
    }
    numRows = mxGetScalar(prhs[0]);
    triangleFlag = mxGetScalar(prhs[1]);
    diagFlag = mxGetScalar(prhs[2]);
    if( numRows < 1 ) {
        mexErrMsgTxt("Invalid numRows, should be > 0");
    }
    if( triangleFlag != 1 && triangleFlag != 2 ) {
        mexErrMsgTxt("Invalid triangleFlag, should be 1 or 2");
    }
    if( diagFlag != 1 && diagFlag != 2 ) {
        mexErrMsgTxt("Invalid diagFlag, should be 1 or 2");
    }
    if( diagFlag == 1 ) {
        numElements = numRows * (numRows + 1) / 2; /* include diagonal */
    } else {
        numElements = (numRows - 1) * numRows / 2; /* exclude diagonal */
    }
    plhs[0] = mxCreateSparse(numElements, numRows*numRows, numElements, mxREAL);
    pr = (double *) mxGetData(plhs[0]);
    Ir = mxGetIr(plhs[0]);
    Jc = mxGetJc(plhs[0]);
    Jc[0] = 0;
    diagFlag--;
    k = 0;
    m = 1;
    if( triangleFlag == 1 ) { /* Lower */
        for( j=0; j<numRows; j++ ) {
            for( i=0; i<numRows; i++ ) {
                if( i >= j+diagFlag ) {
                    *pr++ = 1.0;
                    *Ir++ = k++;
                    Jc[m] = Jc[m-1] + 1;
                } else {
                    Jc[m] = Jc[m-1];
                }
                m++;
            }
        }
    } else { /* Upper */
        for( j=0; j<numRows; j++ ) {
            for( i=0; i<numRows; i++ ) {
                if( i+diagFlag <= j ) {
                    *pr++ = 1.0;
                    *Ir++ = k++;
                    Jc[m] = Jc[m-1] + 1;
                } else {
                    Jc[m] = Jc[m-1];
                }
                m++;
            }
        }
    }
}

You mex the routine as follows (you need a supported C compiler installed):

mex GenerateTriangleExtractorMatrixMex.c

And some test code:

% GenerateTriangleExtractorMatrix_test.m
n = 300;
disp('m-code timing')
tic
GenerateTriangleExtractorMatrix(10000,1,1);
toc
disp('mex code timing')
tic
GenerateTriangleExtractorMatrixMex(10000,1,1);
toc
for k=1:n
    numRows = ceil(rand*5000+100);
    numCols = numRows;
    triangleFlag = (rand<0.5) + 1;
    diagFlag = (rand<0.5) + 1;
    Mm = GenerateTriangleExtractorMatrix(numRows,triangleFlag,diagFlag);
    Mx = GenerateTriangleExtractorMatrixMex(numRows,triangleFlag,diagFlag);
    if( ~isequal(Mm,Mx) )
        error('Not equal');
    end
end
disp('Random tests passed')

With a sample run:

>> GenerateTriangleExtractorMatrix_test
m-code timing
Elapsed time is 9.964882 seconds.
mex code timing
Elapsed time is 1.901741 seconds.
Random tests passed
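For readers less familiar with the compressed sparse column layout that the mex routine fills in, here is a rough MATLAB sketch of the same three arrays for the 3-by-3, diagonal-excluded, lower-triangle case. The variable names mirror the C code, but this is only an illustration of the layout, not a replacement for the mex file: Jc holds zero-based column pointers, Ir the zero-based row index of each stored entry, and pr the stored values.

```matlab
numRows = 3;
B = tril(true(numRows), -1);             % strictly lower triangle mask
numElements = nnz(B);

% Column pointers: Jc(c+1) - Jc(c) is the number of entries in column c
Jc = [0; cumsum(double(B(:)))];          % length numRows^2 + 1
Ir = (0:numElements - 1).';              % k-th stored entry sits in row k (zero-based)
pr = ones(numElements, 1);               % every stored value is 1

% Rebuild the matrix from the three arrays to confirm the layout
colIdx = find(diff(Jc));                 % columns that hold exactly one entry
S = sparse(Ir + 1, colIdx, pr, numElements, numRows^2);

X = magic(numRows);
assert(isequal(full(S * X(:)), X(B)));
```

Because each selected column holds exactly one entry and rows are filled in order, Ir is just a running counter, which is why the C loops only need `k++`.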

Comments: 4 (the 2 earlier comments are collapsed)

Royi Avital on 22 Apr 2020

@James, I meant I want to use all the abstractions of the MATLAB C API. I just want to use it in my own C files. Not for MEX but for general computing. Yet I guess MATLAB blocks that kind of use.

Royi Avital on 23 Apr 2020

Edited: Royi Avital on 23 Apr 2020

By the way, I tried optimizing the code:

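	/* Fragment: presumably assumes kk = 0, ll = 0, Jc[0] = 0, vV pointing at the
	   value array, and diagFlag already decremented to 0 (include) or 1 (exclude) */
	if( triangleFlag == 1 ) { // Lower Triangle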
		for( jj = 1; jj < numRows + 1; jj++ ) {
			for( ii = 1; ii < jj + diagFlag; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1];
			}
			for( ii = jj + diagFlag; ii < numRows + 1; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1] + 1;
				vV[kk] = 1.0;
				Ir[kk] = kk;
				kk++;
			}
		}
	} else { // Upper Triangle
		for( jj = 1; jj < numRows + 1; jj++ ) {
			for( ii = 1; ii < jj + 1 - diagFlag; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1] + 1;
				vV[kk] = 1.0;
				Ir[kk] = kk;
				kk++;
			}
			for( ii = jj + 1 - diagFlag; ii < numRows + 1; ii++ ) {
				ll++;
				Jc[ll] = Jc[ll - 1];
			}
		}
	}

But for some reason even removing the branching inside the loop didn't improve results.

Really Nice! If nothing comes up I will mark this as an answer. Thank You!

Answer 2

Matt J on 23 Apr 2020

https://kr.mathworks.com/matlabcentral/answers/519408-most-efficient-way-to-construct-the-matrices-to-extract-the-lower-and-upper-triangle-from-a-vectoriz#answer_428017

Edited: Matt J on 23 Apr 2020

Another approach to consider is to use my MatrixObj class

https://www.mathworks.com/matlabcentral/fileexchange/26611-on-the-fly-definition-of-custom-matrix-objects

to construct an object that has the same effect as the operations mL*X and mL.'*Y, but doesn't require you to actually build the matrix,

    N=5000;
    
    tic;
     mL0=GenerateTriangleExtractorMatrix( N, 1, 2);
    toc
    %Elapsed time is 0.678702 seconds.
         
    tic;
    
        B=tril(true(N),-1);
        Bd=double(B(:));
        
        mL=MatrixObj;
        mL.Params.B=B;
        mL.Params.Bd=Bd;
        mL.Ops.mtimes=@(obj,z) z(obj.Params.B);
        mL.Trans.mtimes=@mtimesT;
    
    toc;
    %Elapsed time is 0.086228 seconds.
    
    function out=mtimesT(obj,z) 
        
        out=obj.Params.Bd;
        out(obj.Params.B)=z;
        
    end

In addition to requiring less time to construct, you can verify that it gives the same results as multiplications with mL and mL.',

        >> X=rand(N^2,1);   isequal(mL0.'*(mL0*X),mL.'*(mL*X))
        
        ans =
    
          logical
        
           1

but with considerably less memory consumption:

  >> whos mL mL0
  
  Name   Size                  Kilobytes     Class       Attributes
                                                                   
  mL     1x1                      219739     MatrixObj             
  mL0    12497500x25000000        390586     double      sparse    
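For readers who do not have the MatrixObj class, the same matrix-free idea can be sketched with plain function handles. The names applyL and applyLt are hypothetical, and accumarray is used here in place of the mtimesT helper above; this only illustrates the forward and transpose operations and does not overload the * operator.

```matlab
N   = 4;
idx = find(tril(true(N), -1));                  % positions of the strictly lower triangle

applyL  = @(x) x(idx);                          % acts like mL * x
applyLt = @(y) accumarray(idx, y, [N^2, 1]);    % acts like mL.' * y

% Compare against the explicit sparse extractor on small integer data
mL0 = sparse(1:numel(idx), idx, 1, numel(idx), N^2);
x = (1:N^2).';
y = (1:numel(idx)).';

assert(isequal(applyL(x),  full(mL0 * x)));
assert(isequal(applyLt(y), full(mL0.' * y)));
assert(applyL(x).' * y == x.' * applyLt(y));    % adjoint identity
```

Only idx is stored, so the memory cost is one index per extracted element rather than a full N-by-N mask plus a sparse matrix.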

Comments: 1

Royi Avital on 24 Apr 2020

This is really nice. Thank you for the effort!

Answer 3

Royi Avital on 21 Apr 2020

https://kr.mathworks.com/matlabcentral/answers/519408-most-efficient-way-to-construct-the-matrices-to-extract-the-lower-and-upper-triangle-from-a-vectoriz#answer_427294

My current solution:

function [ mLU ] = GenerateTriangleExtractorMatrix( numRows, triangleFlag, diagFlag )
EXTRACT_LOWER_TRIANGLE = 1;
EXTRACT_UPPER_TRIANGLE = 2;
INCLUDE_DIAGONAL = 1;
EXCLUDE_DIAGONAL = 2;
switch(diagFlag)
    case(INCLUDE_DIAGONAL)
        numElements = 0.5 * numRows * (numRows + 1);
        diagIdx = 0;
    case(EXCLUDE_DIAGONAL)
        numElements = 0.5 * (numRows - 1) * numRows;
        diagIdx = 1;
end
vJ = zeros(numElements, 1);
if(triangleFlag == EXTRACT_LOWER_TRIANGLE)
    elmntIdx = 0;
    for jj = 1:numRows
        for ii = (jj + diagIdx):numRows
            elmntIdx = elmntIdx + 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
elseif(triangleFlag == EXTRACT_UPPER_TRIANGLE)
    elmntIdx = numElements + 1;
    for jj = numRows:-1:1
        for ii = (jj - diagIdx):-1:1
            elmntIdx = elmntIdx - 1;
            vJ(elmntIdx) = ((jj - 1) * numRows) + ii;
        end
    end
end
mLU = sparse(1:numElements, vJ, 1, numElements, numRows * numRows, numElements);
end

I like that memory allocation is kept to a minimum.

I wonder if there is a more efficient way to generate vJ. It is trivial to remove the inner loop and just count the number of elements per column, yet in MATLAB that would mean each iteration allocates memory (as we don't have iterators).
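The inner-loop-free variant mentioned above could look roughly like this for the lower triangle. The counter numInCol is a name introduced for this sketch, and, as noted, each iteration still creates temporary index ranges, so no speed-up is claimed.

```matlab
numRows = 5;
diagIdx = 1;                                    % 0 = include diagonal, 1 = exclude it
numElements = (numRows - diagIdx) * (numRows - diagIdx + 1) / 2;

vJ = zeros(numElements, 1);
elmntIdx = 0;
for jj = 1:numRows
    numInCol = numRows - (jj + diagIdx) + 1;    % triangle entries in column jj
    % Fill the whole column at once; both ranges are empty when numInCol is 0
    vJ(elmntIdx + (1:numInCol)) = ((jj - 1) * numRows) + ((jj + diagIdx):numRows);
    elmntIdx = elmntIdx + numInCol;
end

% Cross-check against the mask-based indices
assert(isequal(vJ, find(tril(true(numRows), -diagIdx))));
```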

Comments: 2

Tommy on 21 Apr 2020

The two methods are fairly similar - I also like that yours minimizes memory allocation. I ran a few simple fun tests:

I didn't dare try higher than 20,000 for numRows. It seems that your code may possibly perform better at higher values of numRows. In the second case (calculating both the upper and lower triangles) I had your code running both sets of for loops, one after the other (shown in red). In green is the result from your code if only the first set of loops runs, and you recognize that vJ for one triangle is easy to determine if you have vJ for the other triangle (N^2+1-flip(vJ)). So the only thing I'll conclude from this is, if you will eventually calculate both the lower and upper triangle matrices for a given size, it might be better to calculate them together and only find vJ once. I suppose it depends on how expensive N^2+1-flip(vJ) is.
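The relation N^2+1-flip(vJ) can be checked on a small case. This sketch uses a mask only to produce reference indices; the names vJLower and vJUpper are illustrative.

```matlab
N = 4;

% Indices of the strictly lower triangle (any method of generating them works)
vJLower = find(tril(true(N), -1));

% Upper-triangle indices obtained from the lower ones, without a second set of loops
vJUpper = N^2 + 1 - flip(vJLower);

assert(isequal(vJUpper, find(triu(true(N), 1))));
```

The identity holds because linear index (jj-1)*N+ii maps to element (N+1-ii, N+1-jj) under k -> N^2+1-k, which swaps the two triangles, and flip restores ascending order.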

Royi Avital on 21 Apr 2020

@Tommy, I really liked your analysis. Yes, when dealing with sparse matrices the whole point is making sure allocation is kept to a minimum. I agree that if one wants both, it is better to use the trick you mentioned.

Let's see if someone can think of a different pattern to populate vJ which is more efficient.
