How to avoid uncertainty in processing result of MATLAB Statistics Toolbox

I'm annoyed by the nondeterministic results of my MATLAB program. My code is as follows.
%-----------------------------
clear all; close all;
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';  % column vector: kmeans treats each row as one observation
[idx, c] = kmeans(a,2)
rate = c(1)/c(2)
%-----------------------------
I ran this program several times and found the results quite interesting. Although the input data set is fixed, the results can differ from run to run. I found at least four distinct sets of answers.
%-----------------------------
idx = 1 1 1 2 2 1 c = 0.3658 0.6270 rate = 0.5833
idx = 1 1 1 1 1 2 c = 0.5109 0.1626 rate = 3.1419
idx = 2 2 2 1 1 2 c = 0.6270 0.3658 rate = 1.7143
idx = 2 2 2 2 2 1 c = 0.1626 0.5109 rate = 0.3183
%-----------------------------
Can anybody help me avoid this uncertainty? By the way, my MATLAB version is R2008a.
Thank you in advance for any response.
Best regards,
Jean

Accepted Answer

Mahmoud Hammoud on 24 Feb 2011
This is expected behavior: by default, KMEANS selects the initial cluster centroid positions at random from the observations. That is, the 'start' parameter defaults to 'sample', as described in the documentation. If you run your code enough times, you will also see another outcome: KMEANS errors out because an empty cluster is created at the first iteration (i.e., idx is all 1's or all 2's). You can always pass a matrix of initial positions as the value of the 'start' parameter, for example:
[idx c] = kmeans(a,2,'start',[0 0.5]')
This yields the same result every time. However, because the partition returned by KMEANS depends strongly on the initial centroid positions, you will probably get a sub-optimal partition (unless you provide a "lucky" vector for the 'start' parameter). The typical use of KMEANS is to set the 'Replicates' parameter to an integer n, the number of times to repeat the clustering, each time from a new set of initial centroids. KMEANS then returns the partition with the lowest sum, over all clusters, of the within-cluster sums of point-to-cluster-centroid distances.
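As a minimal sketch of the 'Replicates' approach, something like the following could be used. Two things here are my own additions rather than part of the answer above: the 'EmptyAction','singleton' option, used so that an empty cluster does not abort a run, and the relabeling step at the end. The relabeling is there because cluster numbers are arbitrary, so even when the best partition is found every time, c(1) and c(2) can swap between runs and change rate. The variable names order, dummy and newLabel are only illustrative.

```matlab
% Sketch only: repeat k-means several times and keep the best partition.
a = [0.3948 0.4644 0.4412 0.6270 0.6270 0.1626]';  % one observation per row

% 'Replicates' reruns the clustering from 10 different random starts and
% returns the solution with the lowest total within-cluster distance.
% 'EmptyAction','singleton' is an added assumption (not from the answer):
% it replaces an empty cluster with a single point instead of erroring.
[idx, c] = kmeans(a, 2, 'Replicates', 10, 'EmptyAction', 'singleton');

% Cluster numbering is still arbitrary, so order the clusters by centroid.
[c, order] = sort(c);            % c(1) is now the smaller centroid
[dummy, newLabel] = sort(order); % newLabel(k) = new number of old cluster k
idx = newLabel(idx);             % relabel the observations consistently

rate = c(1)/c(2)
```

With the centroids sorted, rate is always the smaller centroid divided by the larger one. Note that replicates make it much more likely, but not certain, that every run lands on the same partition, since the starting points are still random.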
Comments: 1
jean young on 25 Feb 2011
Thank you very much! I have modified my program and the problem is solved.
