Understanding Gaussian Mixture Models

조회 수: 4 (최근 30일)

Faraz 2014년 9월 23일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/155944-understanding-gaussian-mixture-models

댓글: Image Analyst 2014년 9월 23일

I'm trying to understand GMM by reading the sources available online. I have achieved clustering using K-Means and was seeing how GMM would compare to K-means.

Here is what I have understood, please let me know if my concept is wrong:

GMM is like KNN, in the sense that clustering is achieved in both cases. But in GMM each cluster has their own independent mean and covariance. Furthermore k-means performs hard assignments of data points to clusters whereas in GMM we get a collection of independant gaussian distributions, and for each data point we have a probability that it belongs to one of the distributions.

To understand it better I have used MatLab to code it and achieve the desired clustering. I have used SIFT features for the purpose of feature extraction. And have used k-means clustering to initialize the values. (This is from the VLFeat documentation)

%images is a 459 x 1 cell array where each cell contains the training image
[locations, all_feats] = vl_dsift(single(images{1}), 'fast', 'step', 50); %all_feats will be 128 x no. of keypoints detected
for i=2:(size(images,1))
    [locations, feats] = vl_dsift(single(images{i}), 'fast', 'step', 50);
    all_feats = cat(2, all_feats, feats); %cat column wise all features
end
numClusters = 50; %Just a random selection.
% Run KMeans to pre-cluster the data
[initMeans, assignments] = vl_kmeans(single(all_feats), numClusters, ...
    'Algorithm','Lloyd', ...
    'MaxNumIterations',5);
initMeans = double(initMeans); %GMM needs it to be double
% Find the initial means, covariances and priors
for i=1:numClusters
    data_k = all_feats(:,assignments==i);
    initPriors(i) = size(data_k,2) / numClusters;
      if size(data_k,1) == 0 || size(data_k,2) == 0
          initCovariances(:,i) = diag(cov(data'));
      else
          initCovariances(:,i) = double(diag(cov(double((data_k')))));
      end
  end
% Run EM starting from the given parameters
[means,covariances,priors,ll,posteriors] = vl_gmm(double(all_feats), numClusters, ...
    'initialization','custom', ...
    'InitMeans',initMeans, ...
    'InitCovariances',initCovariances, ...
    'InitPriors',initPriors);

Based on the above I have means, covariances and priors. My main question is, What now? I am kind of lost now.

Also the means, covariances vectors are each of the size 128 x 50. I was expecting them to be 1 x 50 since each column is a cluster, wont each cluster have only one mean and covariance? (I know 128 are the SIFT features but I was expecting means and covariances).

In k-means I used the the MatLab command knnsearch(X,Y) which basically finds the nearest neighbour in X for each point in Y.

So how to achieve this in GMM, I know its a collection of probabilities, and ofcourse the nearest match from that probability will be our winning cluster. And this is where I am confused. All tutorials online have taught how to achieve the means, covariances values, but do not say much in how to actually use them in terms of clustering.

Thank you

댓글 수: 2
없음 표시없음 숨기기

Adam 2014년 9월 23일

편집: Adam 2014년 9월 23일

I don't have a clue what Sift is, but if it produces 128 'features' and you have 50 clusters then each cluster presumably has a mean in each feature.

When I did a GMM I was doing it on multi-dimensional images so the Gaussians in question were n-dimensional for n images as they had a value in each image for any given location.