
Why does the kmeans function give different answers?

Mahesh on 22 Sep 2014
Commented: Mahesh on 22 Sep 2014
I have noticed that, for a given value of k, a single call to kmeans gives different cluster indices than the same k gives inside a loop that varies k, say over 2:N. I do not understand why. I would appreciate an explanation.

Accepted Answer

José-Luis on 22 Sep 2014
Because, with the default settings, kmeans() chooses its initial centroids at random. The algorithm is therefore not deterministic from one run to the next, and the result can depend on those starting positions.
  2 Comments
Mahesh on 22 Sep 2014
So which default setting should I use? I have chosen:
rng('default');
Am I right?
Adam Filion on 22 Sep 2014
Try using the 'replicates' option for kmeans to automatically run the algorithm multiple times from different starting points and return the solution with the lowest total sum of distances:
>> doc kmeans
You can control the sequence of random numbers generated with the rng command:
>> doc rng
Putting something like rng(3) before kmeans will make the results repeatable even though it involves random starting points.
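
A minimal sketch of the suggestion above, assuming the Statistics and Machine Learning Toolbox is installed. The data matrix X, the value of k, and the seed are placeholders chosen for illustration, not values from the original question.

```matlab
% Placeholder data: two Gaussian blobs in 2-D (not the poster's data).
rng(0);
X = [randn(100,2); randn(100,2) + 3];

k = 3;   % hypothetical number of clusters

% Seed the generator immediately before each call so that both calls
% draw their random starting centroids from the same random stream.
rng(3);
idx1 = kmeans(X, k, 'Replicates', 5);

rng(3);
idx2 = kmeans(X, k, 'Replicates', 5);

% Same seed and same inputs, so the two index vectors should match.
disp(isequal(idx1, idx2))
```

Note that the seed has to be set right before each kmeans call. Seeding once at the top of a script only fixes the first call, because every later call starts from wherever the generator was left.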


More Answers (1)

Image Analyst on 22 Sep 2014
Like many other types of numerical minimizations, the solution that kmeans reaches often depends on the starting points. It is possible for kmeans to reach a local minimum, where reassigning any one point to a new cluster would increase the total sum of point-to-centroid distances, but where a better solution does exist. However, you can use the optional 'replicates' parameter to reduce that risk: kmeans then restarts from several different starting points and keeps the solution with the lowest total.
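
As a rough illustration of the local-minimum point, the sketch below compares the totals from several single-start runs with the total from one call that uses replicates. It assumes the Statistics and Machine Learning Toolbox; the data, k, and the number of runs are made up for the example.

```matlab
% Placeholder data: three overlapping Gaussian blobs (illustrative only).
rng(1);
X = [randn(150,2); ...
     randn(150,2) + 2.5; ...
     bsxfun(@plus, randn(150,2), [5 0])];

k     = 3;    % hypothetical number of clusters
nRuns = 20;   % hypothetical number of independent starts

% Run kmeans nRuns times with a single random start each and record the
% total within-cluster sum of point-to-centroid distances of every run.
totals = zeros(nRuns, 1);
for r = 1:nRuns
    [~, ~, sumd] = kmeans(X, k, 'Replicates', 1);
    totals(r) = sum(sumd);
end

% One call with replicates: kmeans keeps the replicate with the lowest total.
[~, ~, sumdBest] = kmeans(X, k, 'Replicates', nRuns);

fprintf('single-start totals: min %.2f, max %.2f\n', min(totals), max(totals));
fprintf('with %d replicates:  %.2f\n', nRuns, sum(sumdBest));
```

On easy, well-separated data every start may land on the same solution, so the minimum and maximum can coincide. A visible spread between them suggests that some starts ended in a worse local minimum.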
  1 Comment
Mahesh on 22 Sep 2014
Yes, I understand that. However, I get a different answer from a single call with one number of clusters, like
[idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
'replicates',8, 'display','iter');
compared with the call inside a loop, like
rng('default'); % For reproducibility (seeds once, before the loop)
param_sac = load('param2W_sac.cld');
size(param_sac);
dist_alg = 'sqEuclidean';
iditer = [];
sumdistitr = [];
meansil = [];
silhitr = [];
for nkmeans = 1:10
    % Number of replicates is nkmeans here, but 8 in the single call
    [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
        'replicates',nkmeans, 'display','iter');
    [silh,h] = silhouette(param_sac,idx);
    xlabel('Silhouette Value')
    ylabel('Cluster');
    meanh = mean(silh);
    iditer = [iditer idx]; % one column of cluster indices per k
    % cen = [cen cent];
    % sumdistitr = [sumdistitr sumdist];
    meansil = [meansil; nkmeans meanh];
    silhitr = [silhitr silh];
end
I get a totally different classification.
Thanks to all for the responses.
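
Judging only from the code shown, two things plausibly differ between the single call and the loop: the number of replicates (8 versus nkmeans), and the state of the random number generator, which is seeded once before the loop and then advanced by every earlier kmeans call. The sketch below is one way to line the two up, not a confirmed fix for the original data. It assumes the Statistics and Machine Learning Toolbox, and X, seed, nRep, and kTarget are placeholders.

```matlab
% Placeholder data (not the poster's param_sac).
rng(0);
X = [randn(100,2); randn(100,2) + 3];

seed    = 3;   % hypothetical seed
nRep    = 8;   % same replicate count in both places
kTarget = 4;   % the k whose results are compared

% Single call: seed immediately before kmeans.
rng(seed);
idxSingle = kmeans(X, kTarget, 'Distance', 'sqeuclidean', ...
    'Replicates', nRep);

% Loop over k: reseed before every call so each k starts from the same
% generator state as the single call, and keep the replicate count fixed.
kValues = 2:6;
idxLoop = cell(1, max(kValues));
for k = kValues
    rng(seed);
    idxLoop{k} = kmeans(X, k, 'Distance', 'sqeuclidean', ...
        'Replicates', nRep);
end

% With matching seed, options, and data, the indices for kTarget should agree.
disp(isequal(idxSingle, idxLoop{kTarget}))
```

Even when two runs find the same partition without matching seeds, the cluster numbers themselves can be permuted, so comparing raw index vectors may report a difference that is only a relabeling.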

