I'm working on unsupervised classification (clustering). I want to estimate K, the number of clusters, before running the k-means algorithm.

Accepted Answer

Walter Roberson on 16 May 2016

Votes: 0

You will probably not find any code already implemented for this purpose.
The theoretical answer for the "best" number of clusters to use is "one cluster for every unique point", as that will always have the best possible fit.
If you do not wish to use one cluster for every unique point, you need to have some kind of penalty term that favors fewer clusters. I read through the theory paper on that a few years ago, and it was clear to me that they were setting the weights arbitrarily (but usefully for the kinds of clustering they were doing), and that there was no way to calculate what the weights should be without some knowledge of the range of number of clusters that would be appropriate for the physical system being examined. The theoretical algorithms were not suitable for "unsupervised learning", only for "supervised learning". The work we were doing at the time required unsupervised learning, so there was no way for us to determine what the proper number of clusters should be.
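
To illustrate the point about fit alone favoring more clusters, here is a minimal sketch, not taken from the original answer. It assumes the Statistics and Machine Learning Toolbox is available; the data X and the list kValues are made up for the illustration. The total within-cluster sum of squares shrinks as k grows and reaches zero when every distinct point gets its own cluster, which is why some penalty for larger k is needed.

```matlab
% Sketch: fit quality alone cannot pick k, because it keeps improving as k grows.
rng(1);                                   % fixed seed so the run is repeatable
X = [randn(20,2); randn(20,2) + 5];       % hypothetical data: two blobs, 40 points
nUnique = size(unique(X, 'rows'), 1);     % number of distinct points
kValues = [1 2 5 10 20 nUnique];          % candidate cluster counts (illustrative)
wcss = zeros(size(kValues));
for i = 1:numel(kValues)
    % sumd holds the within-cluster sums of point-to-centroid distances
    % (squared Euclidean by default)
    [~, ~, sumd] = kmeans(X, kValues(i), 'Replicates', 5);
    wcss(i) = sum(sumd);
end
disp(table(kValues(:), wcss(:), 'VariableNames', {'k', 'TotalWithinSS'}));
```

The last row, with one cluster per distinct point, should show a total of zero: a perfect fit that says nothing useful about the data.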

More Answers (3)

the cyclist on 15 May 2016

Votes: 0

This is not really a MATLAB question, but rather a general data science question.
Googling "how to choose k in k means" found this Wikipedia page on the topic (and many others) that might help you.

Comments: 4

wisekily on 15 May 2016
There are several methods; I want to find the ones that are already implemented. I don't have enough time to spend on implementing them myself.
the cyclist on 15 May 2016
Edited: the cyclist on 15 May 2016
I don't know of any methods in MATLAB to help you choose K, other than plotting results post hoc to see how different choices of K did. See, for example, this page.
Image Analyst on 15 May 2016
There are MATLAB functions for estimating the best k. I don't remember what they were - I'd have to look them up in the Machine Learning course notes.
wisekily on 15 May 2016
I'm waiting for your answer.


Image Analyst on 15 May 2016

Votes: 0

The web page on kmeans explains how you can use silhouette() to determine the best number of clusters, k.
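
As a rough sketch of that idea (not copied from the documentation page, and assuming the Statistics and Machine Learning Toolbox), one can run kmeans for several candidate values of k and compare the mean silhouette value of each result. The data X, the range kValues, and the variable names are hypothetical.

```matlab
% Sketch: choose k by comparing the mean silhouette value across candidates.
rng(1);                                        % fixed seed so the run is repeatable
X = [randn(50,2); ...
     randn(50,2) + 6; ...
     [randn(50,1) + 6, randn(50,1) - 6]];      % hypothetical data: three blobs
kValues = 2:6;                                 % candidate cluster counts (illustrative)
meanSil = zeros(size(kValues));
for i = 1:numel(kValues)
    idx = kmeans(X, kValues(i), 'Replicates', 5);
    s = silhouette(X, idx);                    % one value per point, between -1 and 1
    meanSil(i) = mean(s);                      % higher means better-separated clusters
end
[~, best] = max(meanSil);
bestK = kValues(best);
fprintf('Mean silhouette is highest for k = %d\n', bestK);
```

Calling silhouette with an output argument returns the values without drawing the plot; calling it with no output draws the per-cluster silhouette plot, which is also worth inspecting by eye.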

Comments: 3

wisekily on 16 May 2016
Still looking and waiting for an answer!
Walter Roberson on 16 May 2016
Did you read through the link that Image Analyst posted?
the cyclist on 16 May 2016
Which is also the same link that I pointed you to earlier. So, uh, now you have 3 of the top 10 contributors to this forum telling you consistently the same thing.


kira on 2 May 2019

Votes: 0

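Old question, but I just found a way myself by looking at the MATLAB documentation:
klist = 2:n;  % the numbers of clusters you want to try (n is the largest k to test)
myfunc = @(X,K) kmeans(X, K);  % clustering function passed to evalclusters
eva = evalclusters(net.IW{1}, myfunc, 'CalinskiHarabasz', 'KList', klist)  % net.IW{1} is the data being clustered here; substitute your own matrix
classes = kmeans(net.IW{1}, eva.OptimalK);  % final clustering with the selected k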
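
For anyone without a net object to hand, here is a self-contained variant of the snippet above, offered as a sketch only. It assumes the Statistics and Machine Learning Toolbox; the matrix X is a made-up stand-in for net.IW{1}, and the range 2:6 is arbitrary. It passes the built-in 'kmeans' option to evalclusters instead of a function handle and reads the labels from the returned object.

```matlab
% Sketch: let evalclusters pick k using the Calinski-Harabasz criterion.
rng(1);                                   % fixed seed so the run is repeatable
X = [randn(50,2); randn(50,2) + 6];       % hypothetical stand-in for net.IW{1}
eva = evalclusters(X, 'kmeans', 'CalinskiHarabasz', 'KList', 2:6);
% Show each tested k next to its criterion value (higher is better here)
disp([eva.InspectedK(:), eva.CriterionValues(:)]);
fprintf('Selected k = %d\n', eva.OptimalK);
classes = eva.OptimalY;                   % cluster labels for the selected k
```

Other criteria accepted by evalclusters include 'silhouette', 'DaviesBouldin', and 'gap'; they can disagree with each other, so the selected k is a suggestion rather than a ground truth.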


Asked: 15 May 2016

Commented: 12 Jul 2019
