Initial centroids selection - Kmeans

Salad Box on 26 Sep 2019
Edited: Adam on 26 Sep 2019
Hi,
Am I allowed to choose k initial centroids that are not contained in the original data set, in other words, without using random sampling?
For instance, in the two graphs below, the coloured points in the middle are my original data set.
  • In the left graph, the 5 red points are the initial centroids I selected using my own method.
  • In the right graph, the initial centroids will be evenly distributed on the magenta circle. Note that, although my original data set contains only positive numbers, some initial centroids will have negative values in this case, depending on where they fall on the circle.
I wonder whether I have made any fundamental mistakes that I am not yet aware of in selecting initial centroids with the two methods proposed above.
Even if there are no fundamental mistakes, are there any disadvantages to selecting initial centroids in these two ways?

Accepted Answer

Adam on 26 Sep 2019
Edited: Adam on 26 Sep 2019
doc kmeans
shows the
idx = kmeans(X,k,Name,Value)
function signature. If you look at the options for the Name,Value pairs you will see that 'Start' allows you to input your own starting positions, as a k-by-p matrix with one row per initial centroid.
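As a minimal sketch of that option: the data X and the centroid values C0 below are made up purely for illustration, and the example assumes the Statistics and Machine Learning Toolbox is installed. The starting points are deliberately not rows of X.

```matlab
% Hypothetical 2-D data with all-positive coordinates (illustrative only)
rng(0);                          % fix the seed so X is reproducible
X = 5 + rand(200, 2) * 10;       % 200 points in [5, 15] x [5, 15]

k  = 5;

% Hand-picked starting centroids (assumed values); none of them is a row of X
C0 = [ 6  6;
      14  6;
      10 10;
       6 14;
      14 14];

% 'Start' accepts a k-by-p numeric matrix of initial centroid locations
[idx, C] = kmeans(X, k, 'Start', C0);

% idx holds the cluster index of each row of X, C the final centroids
disp(C);
```

If a starting point sits so far from the data that no observation is assigned to it, the cluster is empty; how kmeans reacts to that is controlled by the 'EmptyAction' option, so it is worth checking that setting when placing centroids well outside the data.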
As for what is a valid choice, the simplest way is to try them and find out. In some cases they may not converge to where you want; in others they may. Without random initialisation the algorithm is fully deterministic, though, so a single run gives you the one answer for each set of starting points (although there are, of course, infinitely many ways to place evenly distributed points around that circle).
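To try the circle placement, something along these lines may be a reasonable starting point. It is only a sketch: the data, the choice of circle centre and radius, and the rotation offset theta0 are all assumptions made for illustration, not values taken from the question.

```matlab
% Hypothetical all-positive 2-D data (illustrative only)
rng(1);
X = 5 + rand(200, 2) * 10;

k      = 5;
center = mean(X, 1);                           % circle centre (assumed choice)
dists  = sqrt(sum((X - center).^2, 2));        % distance of each point to the centre
radius = 1.5 * max(dists);                     % circle enclosing the data (assumed)

theta0 = 0;                                    % rotation offset; every value gives
                                               % another evenly spaced placement
theta  = theta0 + 2*pi*(0:k-1)'/k;             % k evenly spaced angles
C0     = center + radius * [cos(theta), sin(theta)];   % may contain negative values

% Two runs from the same explicit starting matrix
[idx1, C1] = kmeans(X, k, 'Start', C0);
[idx2, C2] = kmeans(X, k, 'Start', C0);

% With a fixed 'Start' matrix there is no random initialisation,
% so both runs should give the same assignment
sameResult = isequal(idx1, idx2)
```

Changing theta0 rotates the starting points around the circle, which is one way to explore how sensitive the final clusters are to the particular placement.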

More Answers (0)
