Fast way to retrieve nonzero entries of each row in a sparse matrix

Question

Dominik Mattioli 2017년 8월 30일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/354660-fast-way-to-retrieve-nonzero-entries-of-each-row-in-a-sparse-matrix

편집: Matt J 2017년 8월 31일

I'm working with very large sparse matrices (10's of thousands to millions of entries) and I'd like to efficiently retrieve and store the unique nonzero entries of each row. For my application, the rows will have anywhere between 1 and 9 unique and nonzero entries.

I've decided to utilize "sprand" for an example - I'm fairly new to using sparse matrices and am not the best programmer so forgive my naivety - and I've tried to choose a density within the aforementioned range. My code is:

numrows = 1000000;
numcols = 1000000;
density = numrows^(-1.87);
num_entries_per_row  = numrows*numcols*density   %  This is an average.
rsm = sprand(numrows,numcols,density);           % random sparse matrix
tic
out = cell(numrows,1);                           % output.
for idx = 1:numrows
  [~,~,value] = find(rsm(idx,:));
  out{idx} = unique(value);
end
toc

I've had this running for a few minutes and still no result. Previously when the number of rows and columns were both 100,000 the result took about 32 seconds - way too slow for my purposes. Clearly not the fastest, but it is simple. Is there a better and more clever way to achieve this result in MATLAB?

댓글 수: 2
없음 표시없음 숨기기

James Tursa 2017년 8월 31일

편집: James Tursa 2017년 8월 31일

Can you double check your density equation? It seems to produce a very small number compared to what you seem to be working with.

Also, do the resulting values need to retain any order or can they be sorted?

And can you build and run mex routines?

The idea would be to transpose the matrix, then sort it, then have a mex routine do the unique calculation by column. Result would be a single matrix with the unique row data contained in the columns of the result.

Are you sure you want a cell array containing millions of elements?

Dominik Mattioli 2017년 8월 31일

Can you double check your density equation? It seems to produce a very small number compared to what you seem to be working with.

>> I set the density equation to give me a num_entries (per row) that is approximately equal to 6, which is a good number for me. Sorry for the confusion, I will edit the variable name in my question.

Also, do the resulting values need to retain any order or can they be sorted?

>> No order necessary, as long as they are unique.

And can you build and run mex routines?

>> I have zero experience with other languages, so unfortunately I cannot know how to build these.

The idea would be to transpose the matrix, then sort it, then have a mex routine do the unique calculation by column. Result would be a single matrix with the unique row data contained in the columns of the result.

>> This sounds simple enough. Do you think it is more efficient than Matt J's answer?

Are you sure you want a cell array containing millions of elements?

>> Is there a better storage method considering the rows of rsm will have varying numbers of unique nonzero entries?

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Matt J 2017년 8월 31일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/354660-fast-way-to-retrieve-nonzero-entries-of-each-row-in-a-sparse-matrix#answer_279771

편집: Matt J 2017년 8월 31일

MATLAB Online에서 열기

[I,~,values]=find(rsm);
u=unique([I,values],'rows');
    N=size(u,1);
   tmp=diff([0;find(diff(u(:,1)));N]);
out=mat2cell(u(:,2),tmp);

댓글 수: 5
이전 댓글 3개 표시이전 댓글 3개 숨기기

James Tursa 2017년 8월 31일

With that timing result, I think we can dispense with the mex entirely. Transposing the matrix, working with columns, and keeping the result in a sparse matrix rather than millions of cell elements seems to work out just fine.

Matt J 2017년 8월 31일

편집: Matt J 2017년 8월 31일

MATLAB Online에서 열기

keeping the result in a sparse matrix rather than millions of cell elements

With further testing, I'm actually seeing that cells overtake matrices in efficiency when the density of rsm gets above a certain threshold. Not sure why...

    numrows = 1000000;
    numcols = 1000000;
    rsm = sprand(numrows,numcols,30/numcols);  
    rsm0=rsm.';
    tic;
        rsm=sort(rsm0);
        [~,J,values]=find(rsm);
        u=unique([J,values],'rows');
            N=size(u,1);
           tmp=diff([0;find(diff(u(:,1)));N]);
        out1=mat2cell(u(:,2),tmp);
    toc;
    %Elapsed time is 8.225773 seconds.
    tic;
        rsm=sort(rsm0);
        [I,J,values]=find(rsm);
        [u,k]=unique([J,values],'rows');
         out2=sparse(I(k),u(:,1),u(:,2));
    toc;
    %Elapsed time is 16.365144 seconds.

댓글을 달려면 로그인하십시오.

Fast way to retrieve nonzero entries of each row in a sparse matrix

댓글 수: 2
없음 표시없음 숨기기

채택된 답변

댓글 수: 5
이전 댓글 3개 표시이전 댓글 3개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

Fast way to retrieve nonzero entries of each row in a sparse matrix

댓글 수: 2 없음 표시없음 숨기기

채택된 답변

댓글 수: 5 이전 댓글 3개 표시이전 댓글 3개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 5
이전 댓글 3개 표시이전 댓글 3개 숨기기