Why does KMEANS return different results when invoked on the same input?

When I run the following code multiple times, KMEANS returns different partitions (and hence a different vector s of within-cluster sums of point-to-centroid distances) although the data matrix a is the same:
a = [0 -1 0 2 0]';   % column vector: 5 observations of 1 variable
[b, c, s] = kmeans(a,2,'distance','cityblock')
Output 1:
b =
2
2
2
1
2
c =
2
0
s =
0
1
Output 2:
b =
2
1
2
2
2
c =
-1
0
s =
0
2

Accepted Answer

This is expected behavior: by default, KMEANS chooses the initial cluster centroid positions at random from the observations. That is, the 'start' parameter defaults to 'sample', as described in the documentation. If you run your code enough times, you will also see KMEANS fail with an error because a cluster became empty at the first iteration (that is, every point was assigned to the same cluster, so b would be all 1's or all 2's). To get a repeatable result, you can pass a matrix of initial centroid positions as the value of the 'start' parameter, for example:
[b, c, s] = kmeans(a,2,'distance','cityblock','start',[0 1]')
This yields the same result every time. However, the partition returned by KMEANS depends strongly on the initial centroid positions, so a fixed start will probably give a sub-optimal partition (unless you provide a "lucky" value for the 'start' parameter). The typical approach is to set the 'Replicates' parameter to an integer n, the number of times to repeat the clustering, each time from a new set of initial centroids. KMEANS then returns the solution with the lowest total of s.
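As a rough sketch of the 'Replicates' approach, something like the following should work. The replicate count of 10 and the seed value are arbitrary choices for illustration, not values from the question. The call to rng and the 'EmptyAction' option are additions that assume a MATLAB release in which kmeans draws from the global random stream and accepts that option:

```matlab
a = [0 -1 0 2 0]';              % column vector: 5 observations of 1 variable

rng(1);                         % assumption: fix the seed so the random starts repeat
[b, c, s] = kmeans(a, 2, ...
    'Distance', 'cityblock', ...
    'Replicates', 10, ...           % arbitrary illustrative number of restarts
    'EmptyAction', 'singleton');    % assumption: replace an empty cluster instead of erroring

total = sum(s);                 % kmeans keeps the replicate with the smallest total
```

With the seed fixed, repeated runs of this snippet should return the same partition, while the replicates reduce the chance of ending up with the poorer of the two partitions shown in the question.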
