pidst and sparse vectors/graphs

The pdist function allows the user to specify the function coding the similarity between rows of matrices (called DISTFUN in the documentation).
With the increasing diffusion of large datasets, techniques to sparsify graphs are increasingly being explored and used in practical applications. It is easy to code one own's DISTFUN such that it returns a sparse vector. Unfortunately, pdist (and pdist2) will return a dense vector in the output, which for very large graphs will cause an out of memory error. The offending code is
(lines 434 etc.)
% Make the return have whichever numeric type the distance function
% returns, or logical.
if islogical(Y)
Y = false(1,n*(n-1)./2);
else % isnumeric
Y = zeros(1,n*(n-1)./2, class(Y));
end
To have pdist return a sparse vector, the only modification that is required is
if islogical(Y)
Y = false(1,n*(n-1)./2);
elseif issparse(Y)
Y = sparse(1,n*(n-1)./2);
else % isnumeric
Y = zeros(1,n*(n-1)./2, class(Y));
end
It is a bit more work to modify squareform to produce a sparse matrix, given a sparse vector produced by the modified pdist. Squareform includes several checks on the inputs, but the core functionality for sparse vectors would be given by something like
% given a sparse vector d returned by pdist, compute a sparse squareform
[~,j,v] = find(d);
[m,n] = pdist_ind2sub(j, nobs);
W = sparse(m, n, v, nobs, nobs);
W = W + W';
Here, pdist_ind2sub is a function that given a set of indices into a vector produced by pdist, returns the corresponding subscripts in a (triangular) matrix. Computing this requires information about the number of observations given to pdist, i.e. what was n in the preceding code. I could not figure out a way to use the function adjacency to accomplish this.

답변 (0개)

카테고리

도움말 센터File Exchange에서 Sparse Matrices에 대해 자세히 알아보기

태그

질문:

2025년 2월 8일

편집:

2025년 2월 8일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by