Why does the `sparse` function accumulate a large COO-format sparse matrix by index so fast?

wei zhang on 19 Feb 2021
Commented: wei zhang on 20 Feb 2021
Hi, I am dealing with a sparse matrix stored in a "repeated" COO format, and I want to accumulate its entries by index efficiently. I was inspired by a similar problem.
COO is one storage format for sparse matrices. In MATLAB, I can convert a matrix A to COO format with the "find" function. Here, I assume A is square.
[ii,jj,val] = find(A);
Acoo.coo = [ii,jj,val];
Acoo.N = size(A,1); % A is assumed to be square.
But my COO data has many duplicate entries: several values share the same index (row and column), and I need to sum them by index. Here is a mini example.
Acoo.N = 3;
Acoo.coo = [1,1,2;
1,2,3; % repeated
2,1,4;
1,2,5; % repeated
3,1,6;
3,3,2];
A = full(sparse(Acoo.coo(:,1),Acoo.coo(:,2),Acoo.coo(:,3),Acoo.N,Acoo.N));
% A =
% 2 8 0
% 4 0 0
% 6 0 2
My matrix A is 780000-by-780000 with an nnz of nearly 10,000,000. The repeated COO array is about 60,000,000-by-3, so each index has about 6 repeated entries to accumulate on average.
I planned two ways to accumulate this COO data. One uses the "sparse" function, which may use "accumarray" internally. The other is a C++ MEX file with an unordered hash map. I expected the MEX to be faster; the idea came from a comment on a similar MATLAB question. But in fact the "sparse" method is about 2 times faster than the MEX method, which confuses me. Could I do something more to speed it up? Why is the "sparse" method so fast?
Method 1: using "sparse"
function A1 = acumCOO(A)
% method 1 using sparse
As = sparse(A.coo(:,1),A.coo(:,2),A.coo(:,3),A.N,A.N);
[ii,jj,val] = find(As);
A1.N = A.N;
A1.coo = [ii,jj,val];
end
% time profile: 9.107291s
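For reference, here is a sketch of the same accumulation written directly with accumarray on grouped linear indices; the helper name acumCOOAccum is hypothetical and I have not timed it.
function A1 = acumCOOAccum(A)
% sketch: accumulate by grouping duplicate linear indices with unique + accumarray
N = A.N;
idx = sub2ind([N,N],A.coo(:,1),A.coo(:,2)); % 2-D index -> 1-D key (N*N < 2^53, so exact in double)
[uidx,~,g] = unique(idx);                   % g maps every entry to its group of duplicates
val = accumarray(g,A.coo(:,3));             % sum the values within each group
[ii,jj] = ind2sub([N,N],uidx);
A1.N = N;
A1.coo = [ii,jj,val];
end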
Method 2: using a C++ unordered hash map.
The MATLAB wrapper:
function A2 = acumCOOMex(A)
% c++ method wrapper
N = A.N;
index = sub2ind([N,N],A.coo(:,1),A.coo(:,2));
[key2,val2] = acumMex(index,A.coo(:,3));
[ii,jj] = ind2sub([N,N],key2);
A2.N = N;
A2.coo = [ii,jj,val2];
end
The C++ MEX file "acumMex.cpp":
// MATLAB related
#include "mex.h"
// c++ related
#include <unordered_map>
#include <time.h>
// Input Arguments
#define KEY1 prhs[0]
#define VAL1 prhs[1]
// Output Arguments
#define KEY2 plhs[0]
#define VAL2 plhs[1]
using namespace std;
void mexFunction(int nlhs, mxArray * plhs[], int nrhs, const mxArray * prhs[])
{
    clock_t t0, t1, t2;
    t0 = clock();
    size_t i;
    size_t N1 = mxGetM(KEY1);
    mxDouble *key = mxGetPr(KEY1);
    mxDouble *val = mxGetPr(VAL1);
    unordered_map<size_t,double> coo;
    for (i = 0; i < N1; i++)
    {
        coo[key[i]] = coo[key[i]] + val[i];
    }
    t1 = clock() - t0;
    mexPrintf("Map session takes %d clicks (%f seconds).\n", t1, ((float)t1)/CLOCKS_PER_SEC);
    mexEvalString("drawnow");
    // output
    size_t N2 = coo.size();
    KEY2 = mxCreateDoubleMatrix(N2,1,mxREAL);
    VAL2 = mxCreateDoubleMatrix(N2,1,mxREAL);
    mxDouble *key2 = mxGetPr(KEY2);
    mxDouble *val2 = mxGetPr(VAL2);
    auto coo_it = coo.cbegin();
    for (i = 0; i < N2; i++)
    {
        key2[i] = coo_it->first;
        val2[i] = coo_it->second;
        ++coo_it;
    }
    t2 = clock() - t0;
    mexPrintf("whole session takes %d clicks (%f seconds).\n", t2, ((float)t2)/CLOCKS_PER_SEC);
    mexEvalString("drawnow");
}
%% Map session takes 20583 clicks (20.583000 seconds).
%% whole session takes 21705 clicks (21.705000 seconds).
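As a sanity check, both functions should return the same set of rows on the mini example above. The MEX output comes back in hash-map iteration order, so the rows are sorted before comparing; this snippet is only illustrative and not part of my timing.
Acoo.N = 3;
Acoo.coo = [1,1,2; 1,2,3; 2,1,4; 1,2,5; 3,1,6; 3,3,2];
A1 = acumCOO(Acoo);
A2 = acumCOOMex(Acoo);
isequal(sortrows(A1.coo),sortrows(A2.coo)) % should return logical 1 (true)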
2 Comments
James Tursa on 19 Feb 2021
Edited: James Tursa on 19 Feb 2021
Can you back up a step and restate your overall goal? Are you trying to write code that takes a MATLAB sparse matrix and converts it into a different format? If so, what is the complete description of this other format? Will this be passed to some 3rd party code for processing? Will you have to write the reverse conversion code also?
wei zhang on 20 Feb 2021
@James Tursa My goal is to accumulate an array based on a 2-D index. MATLAB's "sparse" is a built-in function; I guess it may use "accumarray" internally. I want to speed it up.
As for the COO-format matrix, it may be passed to a CUDA MEX file to do some GPU-based matrix multiplication, but that requires no repeated rows, so I need the accumulation.
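A quick, illustrative check that the accumulated result really has no repeated (row, column) pairs before it goes to the GPU code (A1 here is the output of acumCOO above):
idx = sub2ind([A1.N,A1.N],A1.coo(:,1),A1.coo(:,2));
assert(numel(unique(idx)) == numel(idx),'COO data still contains duplicate indices');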


Answers (1)

Bjorn Gustavsson on 19 Feb 2021
Because sparse is a built-in function that MathWorks has most likely spent many man-months (we wouldn't know) optimizing. You have spent much less time than that.
3 Comments
Bjorn Gustavsson on 19 Feb 2021
I suspected that it was way more than a couple of months' worth of work, but I intend to weasel around that by claiming that there is no upper bound on "many"...
wei zhang on 20 Feb 2021
Thank you for your reply. Undoubtedly, MATLAB is a great piece of work built on many developers' efforts. The goal of this post is to discuss ideas or algorithms for "accumarray based on a 2-D index". I want to speed up my code, which is mainly MATLAB with some self-made MEX files. The sparse function may be an extension of "accumarray", so I assumed a hash map could be a better way; in fact, MATLAB does a better job. I would like to know whether MATLAB handles this problem with another strategy. For now, I am trying CUDA Thrust for a speedup.
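For what it is worth, one common alternative to a hash map is to sort the linear keys and then reduce each run of equal keys; this is essentially what thrust::sort_by_key followed by thrust::reduce_by_key does on the GPU. A rough MATLAB sketch of that strategy, with no claim that sparse works this way internally:
% sketch: sort-then-reduce accumulation of the repeated COO data in struct A
N = A.N;
idx = sub2ind([N,N],A.coo(:,1),A.coo(:,2));
[sidx,order] = sort(idx);                 % sort the keys (like sort_by_key)
sval = A.coo(order,3);                    % carry the values along with the keys
runStart = [true; diff(sidx) ~= 0];       % first element of each run of equal keys
val = accumarray(cumsum(runStart),sval);  % sum each run (like reduce_by_key)
[ii,jj] = ind2sub([N,N],sidx(runStart));
Aout = struct('N',N,'coo',[ii,jj,val]);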
