Running out of memory selecting parts of a large sparse matrix

조회 수: 17 (최근 30일)
Benjamin Lender
Benjamin Lender 2021년 5월 25일
댓글: Benjamin Lender 2021년 6월 17일
I have a sparse matrix of around 400k x 400k rows/columns containing similarities (symmetric, sparse, double values between 0 and 1, diagonal is always 1). I'm hoping to eventually be able to work with 1000k x 1000 k matrices.
I need to select all entries above a given threshold. My problem is that I keep running out of memory (I have 64 GB available).
I found that I get the best results so far if I do not set values below the threshold to zero, but rather if I select all values at or above the threshold and build a new sparse matrix with those. However with matrix sizes increasing, I'm hitting the memory limit.
Can you point me to a way to reduce memory consumption for this process?
With the help of this forum, I tried several ways:
% Get relevant values from matrix
rel_values = matrix(matrix>threshold);
% Get index to relevant values
[i,j] = find(matrix>threshold);
% Form sparse matrix
matrix = sparse(i,j,rel_values,columns_mat,columns_mat);
I have run into limits both with matrix(matrix>threshold) and with find(matrix>threshold).
To avoid matrix(matrix>threshold) I tried:
% get relevant positions
rel_values_pos = matrix>threshold;
% Get index to relevant values
[i,j] = find(rel_values_pos);
% Get relevant values (i.e. values over threshold)
idx = sub2ind(size(matrix),i,j);
rel_values = matrix(idx);
% Form sparse matrix
matrix = sparse(i,j,rel_values,columns_mat,columns_mat);
This doesn't work either, both first lines have run into memory shortages.
My hope was to use the symmetry and only consider the following:
matrix = triu(matrix,1); % I know the diagonal to be one, so I can neglect those entries and append them later
However, triu(matrix,1) runs out of memory itself.
Using the symmetry seems to me the most potent approach, as it reduces the amount of entries by more than half (considering that I drop the diagonal), however I'm unaware of a smooth way to select the upper right triangle without triu.
If there is an aspect I'm missing entirely, I'm thankful for those hints as well of course.
  댓글 수: 11
James Tursa
James Tursa 2021년 6월 1일
A mex routine is compiled C/C++ or Fortran code that, once compiled, can be called like a normal MATLAB function for inputs/outputs. If I can find a way to test it, I would be supplying you with C code. You would need to install a supported C compiler and compile the code at your end. E.g., see this link:
Benjamin Lender
Benjamin Lender 2021년 6월 17일
Hello world,
is there any new developmet, is there additional information I could provide that would help? @James Tursa a routine that would not necessitate large intermediate variables sounds really really good, nevermind that I hadn't heard of mex routines at all previously.

댓글을 달려면 로그인하십시오.

답변 (1개)

David Goodmanson
David Goodmanson 2021년 5월 26일
편집: David Goodmanson 2021년 5월 26일
Hi Benjamin,
I believe this works as required. This example is 4e5 x 4e5, with 99.99% zeros. On my pc it takes 3 sec to make the initial sparse matrix and 1 sec to make the new one.
tic
n = 4e5;
fracnz = 1e-4;
m = round(n^2*fracnz);
r = randi(n,m,1);
c = randi(n,m,1);
p = rand(m,1);
s = sparse(r,c,p,n,n);
nnz(s)/n^2 % check
toc
tic
F = find(s~=0);
S = full(s(F));
ind = S<1/2; % or whatever the required condition is;
% these elements are eliminated
F(ind) = [];
S(ind) = [];
[r1 c1] = ind2sub([n n],F);
snew = sparse(r1,c1,S,n,n);
toc
  댓글 수: 1
Benjamin Lender
Benjamin Lender 2021년 6월 1일
편집: Benjamin Lender 2021년 6월 1일
Hi David,
thank you! I have tried your code. At the point where I can make the calculation task manager shows 14 GB available memory. I run out of memory at:
F = find(s~=0)
@Bjorn Gustavsson with this in mind my problem is getting to the subset above zero.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Programming Utilities에 대해 자세히 알아보기

제품


릴리스

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by