Question about kmeans centroid

Hi, I have a quick question about kmeans.
I randomly generated 1,000 numbers in the range (0,1) and clustered them into 20 clusters.
However, I found that the mean of each cluster is slightly different from its centroid. Why? By definition, they should be the same, right?
Thanks.

Accepted Answer

Star Strider on 20 Jul 2012
Edited: Star Strider on 20 Jul 2012

I wouldn't expect them to be the same. The mean is a probability measure (the ‘expected value’ of the set) and is a linear function of the individual probabilities of the members of the set. The centroid minimizes the Euclidean (or other metric) distance between itself and the members of the set, and is not specifically a probability measure.
The ‘cityblock’ metric might approximate the mean, but there is no reason to expect any metric based on a quadratic or other nonlinear function to do so.

More Answers (2)

Peter Perkins on 20 Jul 2012

Rebecca, are you seeing something like this?
>> x = rand(1000,1);
>> [idx,c] = kmeans(x,20);
>> c2 = grpstats(x,idx,@mean);
>> c - c2
ans =
0
0
-1.38777878078145e-17
0
0
0
0
0
0
-1.38777878078145e-17
0
0
0
-2.77555756156289e-17
0
0
-5.55111512312578e-17
0
0
0
That is to be expected. The differences are on the order of 1e-17 and come from floating-point rounding, which accumulates differently in the two computations. Consider this:
>> x = rand(1000,1);
>> ( sum(x) - sum(x(randperm(length(x)))) ) / sum(x)
ans =
-7.87959181618481e-16
The two sums differ only because the terms are added in a different order. Same idea.
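A smaller illustration of the same effect, as a minimal sketch in base MATLAB: floating-point addition is not associative, so grouping three numbers differently can change the last bit of the result. The variable names and the tolerance factor of 4 are choices made for this example, not something from the thread.

```matlab
% Floating-point addition is not associative, so the order of a sum matters.
a = (0.1 + 0.2) + 0.3;    % add left to right
b = 0.1 + (0.2 + 0.3);    % add the last two terms first

orderMatters = (a ~= b);  % true in IEEE double precision
gap = abs(a - b);         % about one unit in the last place

% Compare with a small tolerance instead of exact equality.
% The factor 4 is an arbitrary safety margin chosen for this sketch.
tol = 4 * eps(max(abs(a), abs(b)));
sameWithinTol = (gap <= tol);

fprintf('a == b: %d, |a - b| = %g, equal within tol: %d\n', ...
        a == b, gap, sameWithinTol);
```

The exact test `a == b` fails while the tolerance-based test passes, which is the usual way to compare quantities that were computed along different paths.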
If you're seeing something else, you'll have to provide more info. Hope this helps.
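To tell rounding noise apart from a genuine discrepancy, one option is to compare the centroids and the recomputed cluster means against a tolerance. The sketch below assumes the default kmeans distance ('sqeuclidean'), for which each centroid is the mean of the points in its cluster, and assumes the run converged. It uses accumarray in place of grpstats; the tolerance 1e-12 and the names k, tol and maxDiff are assumptions made for this example.

```matlab
% kmeans requires the Statistics and Machine Learning Toolbox.
x = rand(1000, 1);
k = 20;
[idx, c] = kmeans(x, k);                  % default distance: squared Euclidean

% Recompute each cluster mean directly from the cluster assignments.
c2 = accumarray(idx, x, [k 1], @mean);

% Compare with a tolerance rather than exact equality.
tol = 1e-12;                              % assumed threshold, far above rounding noise
maxDiff = max(abs(c - c2));
fprintf('largest |c - c2| = %g\n', maxDiff);

if maxDiff <= tol
    disp('Centroids match the cluster means up to rounding error.');
else
    disp('Difference is larger than rounding error; check the distance option and convergence.');
end
```

A difference well above the tolerance would point to something other than rounding, for example a non-default 'Distance' option such as 'cityblock', where the centroid is the component-wise median rather than the mean.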
rebecca on 20 Jul 2012

Thank you both.

Asked: 20 Jul 2012
