Find the K most orthogonal vectors in a set of vectors

Question

Peter Cook on 19 May 2016

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/285024-find-the-k-most-orthogonal-vectors-in-a-set-of-vectors

Commented: Peter Cook on 8 Jun 2016

Hello All,

The context of this search is a step in tuning a spectral clustering routine à la Ng et al. (2002). Its purpose is to provide an initialization point for k-means clustering in a higher-dimensional space. In particular, well-separated data ought to sit in K tight clusters on the surface of a hypersphere, so the goal is to find the locations of these cluster centroids and use them to initialize k-means clustering of the same data.

I have a data matrix "Y" with O(10^4) rows and O(10^2-10^3) columns (the columns of this matrix are the [transformed and normalized a couple of times] eigenvectors for the K largest eigenvalues of the affinity matrix).

Ng et al. suggest: "Briefly, we let the first cluster centroid be a randomly chosen row of Y, and then repeatedly choose as the next centroid the row of Y that is closest to being 90 degrees from all the centroids already picked." I took this mathspeak to mean that I need to take a bunch of dot products and look for values close to zero. This was described as computationally cheap (perhaps it is when clustering fewer points into, say, O(10^1) clusters, or perhaps they meant that k-means needs fewer iterations once initialized), but my CPU is dragging ass at it.
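One way to write this selection rule down, matching the sum-of-absolute-dot-products criterion used in the code in this post, is sketched here. The symbols $y_i$ (row $i$ of Y), $C$ (the set of row indices already picked) and $s_i$ are notation introduced for illustration and are not taken from Ng et al. For unit-norm rows, $y_i^{\top} y_c = \cos\theta_{ic}$, so a row is close to 90 degrees from a centroid when the dot product is close to zero:

```latex
s_i = \sum_{c \in C} \left| y_i^{\top} y_c \right|, \qquad
i^{\ast} = \arg\min_{i \notin C} s_i
```

The next centroid is row $i^{\ast}$, which is then added to $C$. Other readings of "closest to being 90 degrees from all the centroids" are possible, for example minimizing the largest absolute dot product instead of the sum.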

So far I've tried 2 approaches:

Approach #1 - Compute everything then search

% "cheap" initialization of k-means
N = size(Y,1); % number of rows (data points); length(Y) returns the largest dimension instead
dotProductY = zeros(N); % preallocate
% compute dot product of every row with every other row first
for k = 1:N
  dotProductY(:,k) = sum(bsxfun(@times,Y(k,:),Y),2);
end
dotProductY(logical(eye(N))) = nan; % exclude dot product of a row with itself
dotProductY = abs(dotProductY); % use the absolute value because we want values close to zero
centroidIdx = randi(N); % initialize on a random row of Y
dotProductY(centroidIdx,:) = nan; % don't pick the same row twice
for k = 2:K
  [~,im] = min(sum(dotProductY(:,centroidIdx),2)); % find next best centroid (min skips NaN)
  centroidIdx(k) = im; % append the new centroid index
  dotProductY(centroidIdx,:) = nan; % don't pick the same row twice
end

Approach #2 - Simultaneous computation and search

    % try a cheaper one?
    N = size(Y,1); % number of rows (data points)
    dotProductY = zeros(N,K-1); % one column per centroid picked so far
    centroidIdx = randi(N); % initialize on a random row of Y
    for k = 1:K-1
        dotProductY(:,k) = sum(bsxfun(@times,Y(centroidIdx(k),:),Y),2); % inner products with the newest centroid
        dotProductY(centroidIdx,:) = nan; % don't pick the same row twice
        [~,im] = min(sum(abs(dotProductY),2)); % find next best centroid (min skips NaN)
        centroidIdx(k+1) = im; % append the new centroid index
    end

Neither of these approaches seems cheap to me. Has anyone else taken a stab at this before? Any suggestions?

Answer 1

Matt J on 19 May 2016

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/285024-find-the-k-most-orthogonal-vectors-in-a-set-of-vectors#answer_222786

Edited: Matt J on 19 May 2016

Your computation of dotProductY could be more efficient. The most vectorized way of computing it, I believe, is

dotProductY=abs(Y*Y.');

I expect that would have been the main bottleneck.
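As a quick check of the suggestion above, the following sketch compares the row-by-row loop from the question with the single matrix product on a small random matrix. The matrix size and the names `G_loop`, `G_mat` and `maxDiff` are made up for illustration. Keep in mind that for N rows the full product is N-by-N, so at N = 10^4 it holds 10^8 doubles (about 800 MB).

```matlab
% Toy data: 500 unit-norm rows in 20 dimensions (sizes are arbitrary).
Y = randn(500, 20);
Y = bsxfun(@rdivide, Y, sqrt(sum(Y.^2, 2)));   % normalize each row

% Loop version from the question: one column of dot products at a time.
N = size(Y, 1);
G_loop = zeros(N);
for k = 1:N
    G_loop(:, k) = sum(bsxfun(@times, Y(k, :), Y), 2);
end
G_loop = abs(G_loop);

% Single matrix product, as suggested in the answer.
G_mat = abs(Y * Y.');

% The two should agree up to floating-point round-off.
maxDiff = max(abs(G_loop(:) - G_mat(:)));
fprintf('max difference: %g\n', maxDiff);
```

Wrapping each version in `tic`/`toc` on your own data would show how much of the runtime the loop accounts for.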

Comments: 2

Matt J on 19 May 2016

Edited: Matt J on 22 May 2016

This part

for k = 2:K
    [~,im] = min(sum(dotProductY(:,centroidIdx),2)); %find next best centroid
    centroidIdx(k) = im; %reassign
    dotProductY(centroidIdx,:) = nan; %dont pick the same row twice
end

also looks like it could be made incremental, as follows

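im = randi(size(Y,1)); % random first centroid
centroidIdx(1) = im;
temp = dotProductY(:,im); % running sum of |dot products| with the chosen centroids
temp(im) = inf; % don't pick the first centroid again
for k = 2:K
    [~,im] = min(temp); % find next best centroid
    centroidIdx(k) = im; % append the new centroid index
    temp = temp + dotProductY(:,im); % update running sum
    temp(im) = inf; % don't pick the same row twice
end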
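The running-sum idea above can also be combined with Approach #2 from the question so that the full N-by-N matrix is never formed: each step needs only one column of dot products, which is a single matrix-vector product. The following is a minimal sketch under that assumption, on made-up toy data; the names `dirs`, `score` and `centroids` are illustrative and do not appear in the thread.

```matlab
% Toy data: three well-separated directions plus noise (sizes are arbitrary).
K = 3;
dirs = eye(K, 5);                               % K orthogonal directions in 5-D
Y = kron(dirs, ones(40, 1)) + 0.05 * randn(40 * K, 5);
Y = bsxfun(@rdivide, Y, sqrt(sum(Y.^2, 2)));    % unit-norm rows

N = size(Y, 1);
centroidIdx = zeros(1, K);
centroidIdx(1) = randi(N);                      % random first centroid
score = zeros(N, 1);                            % running sum of |dot products|

for k = 1:K-1
    c = centroidIdx(k);
    score = score + abs(Y * Y(c, :).');         % one N-by-1 column per step
    score(c) = inf;                             % never pick the same row twice
    [~, centroidIdx(k + 1)] = min(score);       % row closest to orthogonal so far
end

centroids = Y(centroidIdx, :);                  % K-by-p matrix of starting points
```

This keeps the extra memory at one N-by-1 vector and does K-1 matrix-vector products. The rows in `centroids` could then be handed to `kmeans` through its `'Start'` option, for example `kmeans(Y, K, 'Start', centroids)`, assuming the Statistics and Machine Learning Toolbox is available.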

Peter Cook on 8 Jun 2016

Thanks for the help, I can't believe I had that boneheaded dotProductY computation in there. The algorithm runtime is still quite slow, but that, I am accepting, is to be expected for most clustering algorithms with this amount of data.
