Fixing the Silhouette Plot (for k-means)?
I'm working on k-means clustering in MATLAB. My file has three columns, and I have written the clustering code. I need a way to measure the clustering quality, so I chose the silhouette plot. I took the silhouette code from here (and I want my plot to look like the one shown there): http://stackoverflow.com/questions/6644445/equivalent-of-matlabs-cluster-quality-function
I adapted it to my variables. Here is the k-means clustering code:
load cobat.txt;  % read the file
k=input('Enter a number: ');        % number of clusters
isRand=0;   % 0 -> sequential initialization
            % 1 -> random initialization
[maxRow, maxCol]=size(cobat);
if maxRow<=k
    y=[cobat, (1:maxRow)'];   % no more rows than clusters: each row is its own cluster
elseif k>7
    h=msgbox('k cannot be more than 7');
else
    % initial value of centroid
    if isRand,
        p = randperm(size(cobat,1));      % random initialization
        for i=1:k
            c(i,:)=cobat(p(i),:) ; 
        end
    else
        for i=1:k
           c(i,:)=cobat(i,:);        % sequential initialization
        end
    end
      temp=zeros(maxRow,1);   % initialize as zero vector
      u=0;
      while 1,
          d=DistMatrix3(cobat,c);   % calculate the distance 
          [z,g]=min(d,[],2);      % set the matrix g group
          if g==temp,             % if the iteration doesn't change anymore
              break;              % stop the iteration
          else
              temp=g;             % copy the matrix to the temporary variable
          end
          for i=1:k
              f=find(g==i);
              if f                % calculate the new centroid 
                  c(i,:)=mean(cobat(find(g==i),:),1)
              end
          end
      end
      y=[cobat,g]
      %plot silhouette
      s = mySilhouette(cobat, g)
      [~,ord] = sortrows([g s],[1 -2]);
      indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});
      ytick = cellfun(@(ind) (min(ind)+max(ind))/2, indices);
      ytickLabels = num2str((1:K)','%d');           %#'
      h = barh(1:N, s(ord),'hist');
      set(h, 'EdgeColor','none', 'CData',IDX(ord))
      set(gca, 'CLim',[1 K], 'CLimMode','manual')
      set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
      xlabel('Silhouette Value'), ylabel('Cluster')
      %# compare against SILHOUETTE
      figure, silhouette(cobat,g)
end
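Here is the DistMatrix3 function (it is used to calculate the distance):
function d=DistMatrix3(A,B)
% Euclidean distance between every row of A (hA-by-3) and every row of B (hB-by-3)
[hA,wA]=size(A);
[hB,wB]=size(B);
if hA==1 && hB==1
   d=sqrt(dot((A-B),(A-B)));
else
   % selector matrices: each one copies a single coordinate column across all pairs
   C=[ones(1,hB);zeros(1,hB);zeros(1,hB)];
   D=[zeros(1,hB);ones(1,hB);zeros(1,hB)];
   E=flipud(C);
   F=[ones(1,hA);zeros(1,hA);zeros(1,hA)];
   G=[zeros(1,hA);ones(1,hA);zeros(1,hA)];
   H=flipud(F);
   I=A*C;   % 1st coordinate of A, size hA-by-hB
   J=A*D;   % 2nd coordinate of A
   K=A*E;   % 3rd coordinate of A
   L=B*F;   % 1st coordinate of B, size hB-by-hA
   M=B*G;   % 2nd coordinate of B
   N=B*H;   % 3rd coordinate of B
   d=sqrt((I-L').^2+(J-M').^2+(K-N').^2);
end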
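As a side note, the distance step above could probably be written without the selector matrices, which would also drop the three-column restriction. This is only a sketch, not the function I am actually running; the name distMatrixSketch is made up for illustration, and it assumes A and B have the same number of columns.

```matlab
function d = distMatrixSketch(A, B)
    % Hypothetical alternative to DistMatrix3 (name is an assumption).
    % Pairwise Euclidean distances between the rows of A (n-by-p)
    % and the rows of B (k-by-p); returns an n-by-k matrix.
    n = size(A, 1);
    k = size(B, 1);
    d = zeros(n, k);
    for j = 1:k
        diffs = bsxfun(@minus, A, B(j, :));   % n-by-p differences to row j of B
        d(:, j) = sqrt(sum(diffs.^2, 2));     % Euclidean norm of each row
    end
end
```

Called as d = distMatrixSketch(cobat, c), it should return the same kind of N-by-k matrix that the k-means loop passes to min(d,[],2).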
And here is the mySilhouette function code:
function s = mySilhouette(cobat, g)
    %# cobat: matrix of size N-by-p, data where rows are instances
    %# g    : vector of size N, cluster index of each instance (starting from 1)
    %# s    : vector of size N, silhouette score value of each instance
      N = size(cobat,1);            %# number of instances
      K = numel(unique(g));   %# number of clusters
      %# compute pairwise squared Euclidean distance matrix
      D = squareform( pdist(cobat,'euclidean').^2 );
      %# indices belonging to each cluster
      kIndices = accumarray(g, 1:N, [K 1], @(x){sort(x)});
      %# compute a,b,s for each instance
      %# a(i): average distance from i to all other data within the same cluster.
      %# b(i): lowest average dist from i to the data of another single cluster
      a = zeros(N,1);
      b = zeros(N,1);
      for i=1:N
          ind = kIndices{g(i)}; ind = ind(ind~=i);
          a(i) = mean( D(i,ind) );
          b(i) = min( cellfun(@(ind) mean(D(i,ind)), kIndices([1:K]~=g(i))) );
      end
      s = (b-a) ./ max(a,b);
  end
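To check my understanding of what a, b and s mean in the function above, here is a tiny hand-sized example for a single point. The data and labels are made-up toy values, not my cobat file, and the variable names are only for illustration. It uses squared Euclidean distances, like the function.

```matlab
% Toy data (assumed for illustration): four 1-D points in two clusters
X = [0; 1; 10; 11];
g = [1; 1; 2; 2];

i = 1;                               % point whose silhouette value we compute
sqDist = (X - X(i)).^2;              % squared distances from point i to every point

same = (g == g(i));
same(i) = false;                     % exclude the point itself
a = mean(sqDist(same));              % mean distance within its own cluster
b = mean(sqDist(g ~= g(i)));         % mean distance to the only other cluster
s_i = (b - a) / max(a, b);           % silhouette value of point i
```

Here a is 1 and b is 110.5, so s_i comes out at about 0.991, which is what I would expect for a point sitting well inside its own cluster.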
Here is the cobat.txt file:
65  80  55
45  75  78
36  67  66
65  78  88
79  80  72
77  85  65
76  77  79
65  67  88
85  76  88
56  76  65
When I run the code, I get this error: "??? Undefined function or variable 'K'. Error in ==> clustere at 54 indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});"
I know that this is caused by the K variable, but I don't know what K is for and I can't figure it out. Can anyone help me fix the error and make it work? Your help will be much appreciated.
Thank you.
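Looking at the linked Stack Overflow code again, my current guess is that K, N and IDX there stand for the number of clusters, the number of rows and the cluster labels, and that I never defined them in my own script (they only exist inside mySilhouette). Below is a sketch of the plotting part under that assumption. The g and s values at the top are hypothetical placeholders so the snippet runs on its own; in my script they would come from the k-means loop and from mySilhouette. It also assumes the labels run from 1 to K with no empty cluster.

```matlab
% Placeholder inputs (assumed values, for illustration only)
g = [1; 2; 2; 1; 1];                 % cluster label of each row
s = [0.5; 0.8; 0.3; 0.9; 0.1];       % silhouette value of each row

N = numel(g);                        % number of instances (N in the linked code)
K = numel(unique(g));                % number of clusters (K in the linked code)

% sort by cluster, then by descending silhouette value within each cluster
[~, ord] = sortrows([g s], [1 -2]);

% bar positions belonging to each cluster, used to centre the y tick labels
indices = accumarray(g(ord), 1:N, [K 1], @(x){sort(x)});
ytick = cellfun(@(ind) (min(ind) + max(ind))/2, indices);
ytickLabels = num2str((1:K)', '%d');

h = barh(1:N, s(ord), 'hist');
set(h, 'EdgeColor', 'none', 'CData', g(ord))   % g takes the place of IDX
set(gca, 'CLim', [1 K], 'CLimMode', 'manual')
set(gca, 'YDir', 'reverse', 'YTick', ytick, 'YTickLabel', ytickLabels)
xlabel('Silhouette Value'), ylabel('Cluster')
```

Compared with my script, the changes are: N and K are defined before use, 1:k becomes 1:N in the accumarray call, and IDX(ord) becomes g(ord). I am not sure this is the intended fix, so corrections are welcome.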