Why does the kmeans function give different answers?

Question

Mahesh on 22 Sep 2014

https://kr.mathworks.com/matlabcentral/answers/155830-why-kmeans-function-give-us-give-different-answer

I have noticed that kmeans, called once for a single value of k, gives different cluster indices than the same value of k gives when kmeans is called inside a loop over k = 2:N. I do not understand why. It would be great if someone could explain this.


Answer 1

José-Luis on 22 Sep 2014

Because, if you are using the default settings, kmeans() chooses its starting centroids at random. The algorithm is therefore not deterministic from run to run, and the result can depend on those starting positions.
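To see the effect of the random start, here is a minimal sketch. It assumes the Statistics and Machine Learning Toolbox is installed, and the matrix X is synthetic stand-in data made up for illustration, not the data from the question. Comparing the total within-cluster distance is more informative than comparing the index vectors directly, because two runs can find the same partition and still number the clusters differently.

```matlab
% Synthetic stand-in data (hypothetical): three loose groups in 2-D.
rng(0);                                   % fixed seed so the data are the same every time
X = [randn(50,2); randn(50,2) + 3; [randn(50,1) + 6, randn(50,1)]];

k = 4;                                    % number of clusters to request
totals = zeros(1, 5);
for s = 1:5
    rng(s);                               % a different seed gives different random starts
    [idx, C, sumd] = kmeans(X, k);        % default settings, single replicate
    totals(s) = sum(sumd);                % total within-cluster distance for this run
end
disp(totals)                              % differing values indicate different local minima
```

If the printed totals are not all equal, the runs ended in different local minima. If they are equal, the partitions probably match even if the cluster labels are permuted.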

Mahesh on 22 Sep 2014

So what is the default setting, then? I have used:

rng('default');

Am I right?

Adam Filion on 22 Sep 2014

Try using the 'replicates' option for kmeans to automatically run the algorithm multiple times and return the best answer:

>> doc kmeans

You can control the sequence of random numbers that is generated with the rng command:

>> doc rng

Putting something like rng(3) before kmeans will make the results repeatable even though it involves random starting points.
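A minimal sketch of that idea, assuming the Statistics and Machine Learning Toolbox and using made-up stand-in data X rather than the data from the question. Every kmeans call consumes random numbers, so if rng is called only once before a loop, a later iteration starts from a different generator state than a standalone call would. Reseeding immediately before each call, and keeping the other options identical, removes that difference.

```matlab
% Synthetic stand-in data (hypothetical).
rng(0);
X = [randn(50,2); randn(50,2) + 3];

% Standalone run for k = 3, seeded immediately before the call.
rng(3);
idxSingle = kmeans(X, 3, 'Replicates', 8);

% The same call inside a loop over k, reseeding before every call.
idxLoop = cell(1, 4);
for k = 2:4
    rng(3);                               % same generator state for every call
    idxLoop{k} = kmeans(X, k, 'Replicates', 8);
end

% Same data, same options, same generator state: the indices should match.
sameResult = isequal(idxSingle, idxLoop{3});
disp(sameResult)
```

The seed value 3 is arbitrary. What matters is that the generator state, the number of replicates and the remaining options are the same in both places.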


Answer 2

Image Analyst on 22 Sep 2014

http://www.mathworks.com/help/stats/k-means-clustering.html

Like many other types of numerical minimizations, the solution that kmeans reaches often depends on the starting points. It is possible for kmeans to reach a local minimum, where reassigning any one point to a new cluster would increase the total sum of point-to-centroid distances, but where a better solution does exist. However, you can use the optional 'replicates' parameter to overcome that problem.
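As a rough illustration of the 'replicates' option, the sketch below compares one run against ten. It assumes the Statistics and Machine Learning Toolbox, and X is made-up stand-in data. With several replicates, kmeans keeps the run with the lowest total sum of distances, so the second total is usually no larger than the first, although that depends on the random starts.

```matlab
% Synthetic stand-in data (hypothetical): three loose groups in 2-D.
rng(0);
X = [randn(50,2); randn(50,2) + 3; [randn(50,1) + 6, randn(50,1)]];

rng(1);
[idx1, C1, sumd1] = kmeans(X, 4);                       % one random start
rng(1);
[idx10, C10, sumd10] = kmeans(X, 4, 'Replicates', 10);  % best of ten random starts

fprintf('1 replicate:   total distance = %.4f\n', sum(sumd1));
fprintf('10 replicates: total distance = %.4f\n', sum(sumd10));
```

More replicates lower the chance of stopping in a poor local minimum, but they do not make the result repeatable on their own. For that, the generator still has to be seeded with rng.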

Mahesh on 22 Sep 2014

Yes, I understand. However, I get one answer when I run kmeans for a single number of clusters, like

[idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
    'replicates',8, 'display','iter');

and a different one when I run it inside a loop, like

rng('default');  % For reproducibility (seeds the generator once, before the loop)
param_sac = load('param2W_sac.cld');
size(param_sac);
dist_alg = 'sqEuclidean';
iditer = [];      % cluster indices, one column per value of k
sumdistitr = [];
meansil = [];     % rows of [k, mean silhouette value]
silhitr = [];     % silhouette values, one column per value of k
for nkmeans = 1:10
    % Note: the number of replicates equals k here, not 8 as in the single run
    [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
        'replicates',nkmeans, 'display','iter');
    [silh,h] = silhouette(param_sac,idx);
    xlabel('Silhouette Value')
    ylabel('Cluster');
    meanh = mean(silh);
    iditer = [iditer idx];
%     cen = [cen cent];
%     sumdistitr = [sumdistitr sumdist];
    meansil = [meansil; nkmeans meanh];
    silhitr = [silhitr silh];
end

I got a totally different classification.

Thanks to all for the responses.
