Fixing the Silhouette Plot (for k-means)?

Alvi Syahrin, 6 May 2013
I'm working on k-means clustering in MATLAB. My file has three columns, and I have written the clustering code. I need a way to measure the clustering quality, so I picked the silhouette plot. I took the silhouette code from here (and I want my plot to look like that one): http://stackoverflow.com/questions/6644445/equivalent-of-matlabs-cluster-quality-function
I adapted it to my variables. Here is the k-means clustering code:
load cobat.txt; % read the file
k=input('Enter a number: '); % determine the number of cluster
isRand=0; % 0 -> sequential initialization
% 1 -> random initialization
[maxRow, maxCol]=size(cobat);
if maxRow<=k,
y=[m, 1:maxRow];
elseif k>7
h=msgbox('cant more than 7');
else
% initial value of centroid
if isRand,
p = randperm(size(cobat,1)); % random initialization
for i=1:k
c(i,:)=cobat(p(i),:) ;
end
else
for i=1:k
c(i,:)=cobat(i,:); % sequential initialization
end
end
temp=zeros(maxRow,1); % initialize as zero vector
u=0;
while 1,
d=DistMatrix3(cobat,c); % calculate the distance
[z,g]=min(d,[],2); % set the matrix g group
if g==temp, % if the iteration doesn't change anymore
break; % stop the iteration
else
temp=g; % copy the matrix to the temporary variable
end
for i=1:k
f=find(g==i);
if f % calculate the new centroid
c(i,:)=mean(cobat(find(g==i),:),1)
end
end
end
y=[cobat,g]
%plot silhouette
s = mySilhouette(cobat, g)
[~,ord] = sortrows([g s],[1 -2]);
indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});
ytick = cellfun(@(ind) (min(ind)+max(ind))/2, indices);
ytickLabels = num2str((1:K)','%d'); %#'
h = barh(1:N, s(ord),'hist');
set(h, 'EdgeColor','none', 'CData',IDX(ord))
set(gca, 'CLim',[1 K], 'CLimMode','manual')
set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
xlabel('Silhouette Value'), ylabel('Cluster')
%# compare against SILHOUETTE
figure, silhouette(cobat,g)
Here is the DistMatrix3 function (used to calculate the distances):
function d=DistMatrix3(A,B)
[hA,wA]=size(A);
[hB,wB]=size(B);
if hA==1 & hB==1
    d=sqrt(dot((A-B),(A-B)));
else
    C=[ones(1,hB);zeros(1,hB);zeros(1,hB)];
    D=[zeros(1,hB);ones(1,hB);zeros(1,hB)];
    E=flipud(C);
    F=[ones(1,hA);zeros(1,hA);zeros(1,hA)];
    G=[zeros(1,hA);ones(1,hA);zeros(1,hA)];
    H=flipud(F);
    I=A*C;
    J=A*D;
    K=A*E;
    L=B*F;
    M=B*G;
    N=B*H;
    d=sqrt((I-L').^2+(J-M').^2+(K-N').^2);
end
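For comparison, the same row-to-row Euclidean distances can be computed without building the selector matrices. This is only a sketch and not the code used in the question: the variable names `d2` and `j` are introduced here for illustration, `A` holds the first three rows of the data file listed further down, and `B` holds the two centroids that sequential initialization would pick for k = 2.

```matlab
A = [65 80 55; 45 75 78; 36 67 66];   % first three rows of the data
B = [65 80 55; 45 75 78];             % two centroids (sequential init, k = 2)

% Accumulate squared differences one column (coordinate) at a time.
d2 = zeros(size(A,1), size(B,1));
for j = 1:size(A,2)
    % column of A minus row of B expands to a size(A,1)-by-size(B,1) matrix
    d2 = d2 + bsxfun(@minus, A(:,j), B(:,j)').^2;
end
d = sqrt(d2);   % d(i,m) is the distance from row i of A to row m of B
```

Unlike DistMatrix3, this loop does not assume exactly three columns.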
And here is the mySilhouette function code:
function s = mySilhouette(cobat, g)
    %# cobat: matrix of size N-by-p, data where rows are instances
    %# g    : vector of size N, cluster index of each instance (starting from 1)
    %# s    : vector of size N, silhouette score value of each instance
    N = size(cobat,1); %# number of instances
    K = numel(unique(g)); %# number of clusters
    %# compute pairwise squared Euclidean distance matrix
    D = squareform( pdist(cobat,'euclidean').^2 );
    %# indices belonging to each cluster
    kIndices = accumarray(g, 1:N, [K 1], @(x){sort(x)});
    %# compute a,b,s for each instance
    %# a(i): average distance from i to all other data within the same cluster.
    %# b(i): lowest average dist from i to the data of another single cluster
    a = zeros(N,1);
    b = zeros(N,1);
    for i=1:N
        ind = kIndices{g(i)}; ind = ind(ind~=i);
        a(i) = mean( D(i,ind) );
        b(i) = min( cellfun(@(ind) mean(D(i,ind)), kIndices([1:K]~=g(i))) );
    end
    s = (b-a) ./ max(a,b);
end
Here is the cobat.txt file:
65 80 55
45 75 78
36 67 66
65 78 88
79 80 72
77 85 65
76 77 79
65 67 88
85 76 88
56 76 65
When I run the code, I get this error: "??? Undefined function or variable 'K'. Error in ==> clustere at 54 indices = accumarray(g(ord), 1:k, [K 1], @(x){sort(x)});"
I know the error is caused by the variable K, but I don't know what K stands for and I can't figure it out. Can anyone help me fix the error and make this work? Your help will be much appreciated.
Thank you.
2 comments
José-Luis, 6 May 2013
Edited: José-Luis, 6 May 2013
Have you tried using the debugger?
doc dbstop
What's the value of K when the code fails?
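One way to act on this suggestion, sketched under assumptions: `dbstop if error` makes MATLAB pause at the failing line so the workspace can be inspected. The non-interactive lines below only illustrate that MATLAB variable names are case-sensitive, so a script that defines `k` has no variable `K`. The value 3 is an arbitrary stand-in for the number entered at the prompt, and `hasLowerK` and `hasUpperK` are names made up for this example.

```matlab
% Interactive use (shown as comments, not run here): pause at the line that
% raises the error, then inspect the workspace with whos at the K>> prompt.
%   dbstop if error
%   clustere            % script name taken from the error message
%   dbclear if error    % remove the breakpoint afterwards

% Non-interactive check: variable names are case-sensitive.
clear K k
k = 3;                           % stand-in for the value read by input()
hasLowerK = exist('k', 'var');   % 1 when a variable named k exists
hasUpperK = exist('K', 'var');   % 0 when no variable named K exists
```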
Alvi Syahrin, 7 May 2013
I don't understand why I should use doc dbstop. See my answer below: I have renamed the variables to match my code, but I still get an error. Your help would be appreciated. Thank you, José.


Accepted Answer

Alvi Syahrin, 8 May 2013
This problem is solved. If you have a similar problem, see this link: http://stackoverflow.com/questions/16399645/fix-silhouette-plot-for-k-means
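The linked thread is not reproduced here. As a rough sketch of what a consistent plotting block could look like with the variable names from the question, the code below maps K to `k`, N to `maxRow`, and IDX to `g`. The values of `g` and `s` are made up for illustration and are not results from the cobat data. The plotting calls are carried over from the question unchanged apart from the names, and their behavior on newer MATLAB releases has not been checked.

```matlab
% Illustrative inputs (assumed values, not computed from cobat.txt)
g = [1; 2; 2; 1; 3; 3; 1];                   % cluster index of each row
s = [0.6; 0.4; 0.7; 0.2; 0.5; 0.1; 0.3];     % silhouette value of each row
k = max(g);            % number of clusters (K in the borrowed snippet)
maxRow = numel(g);     % number of rows     (N in the borrowed snippet)

% Sort rows by cluster, then by descending silhouette value.
[~, ord] = sortrows([g s], [1 -2]);

% Bar positions per cluster: one value per sorted row, hence 1:maxRow.
indices = accumarray(g(ord), (1:maxRow)', [k 1], @(x){sort(x)});
ytick = cellfun(@(ind) (min(ind)+max(ind))/2, indices);   % label positions
ytickLabels = num2str((1:k)', '%d');

h = barh(1:maxRow, s(ord), 'hist');
set(h, 'EdgeColor','none', 'CData', g(ord))
set(gca, 'CLim',[1 k], 'CLimMode','manual')
set(gca, 'YDir','reverse', 'YTick',ytick, 'YTickLabel',ytickLabels)
xlabel('Silhouette Value'), ylabel('Cluster')
```

In the full script, `g` would come from the k-means loop and `s` from `mySilhouette(cobat, g)`.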

More Answers (1)

Alvi Syahrin, 7 May 2013
I have now renamed the variables to match my code: K becomes k, N becomes maxRow, and IDX becomes g. But now I get another error:
"??? Error using ==> accumarray Second input VAL must be a vector with one element for each row in SUBS, or a scalar.
Error in ==> clustere at 56 indices = accumarray(g(ord), 1:k, [k 1], @(x){sort(x)});"
Does anyone have any idea?
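The message describes a size rule in `accumarray`: the second argument needs one value per row of the first argument (or a scalar). In the call above, `g(ord)` has one row per data point, while `1:k` has one element per cluster. A small sketch of the rule, using made-up subscripts and the illustrative names `subs`, `val`, `rowsPerCluster`, and `failed`:

```matlab
subs = [1; 1; 2; 2; 2];       % cluster index of each of 5 rows
val  = (1:numel(subs))';      % one value per row of subs: row numbers 1..5

% Collect the row numbers that belong to each cluster into a cell array.
rowsPerCluster = accumarray(subs, val, [2 1], @(x){sort(x)});

% Passing one value per cluster (1:2) instead violates the size rule.
try
    accumarray(subs, (1:2)', [2 1], @(x){sort(x)});
    failed = false;
catch
    failed = true;            % the VAL/SUBS size check raised an error
end
```

Applied to the script, this suggests that the second argument should run over the rows (`1:maxRow`) rather than over the clusters (`1:k`).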
