
How to use testing data to validate kmeans?

Mnr on 22 Mar 2014
Commented: Mnr on 23 Mar 2014
Hello there,
I have some data in 8 text files and would like to group the similar ones into the same classes. I am using k-means for now. I would like to use 5 of the files for training and 3 of them for testing. I have used the kmeans command to get k classes; however, I do not know how to validate my results. In other words, I do not know how to use my testing data to calculate the error. I would appreciate it if somebody could help me. Thanks in advance.

Accepted Answer

Image Analyst on 23 Mar 2014
If you do not know the "ground truth" of your data, then there's no way to tell if a classification is "wrong". The only thing you can do (I think) is to classify your "unknown" data and measure how far your data are from the means of the classes. For example, let's say you had a cluster of data, "class#1", around 30 +/- 5, and a second cluster, "class#2", at 100 +/- 20. You run kmeans with 2 classes and it finds those two classes, with the means at 30 and 100. Now you have a data point in the "non-training" set of data and it has a value of 70. You can say that the 70 belongs to class#2, because it's 40 from class#1 and 30 from class#2. You can do the same for all other data in your test sets.
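The nearest-mean assignment described above can be sketched in a few lines. This is only an illustration in plain Python (not MATLAB), using the class means 30 and 100 from the example. The test values other than 70, the helper name `assign_to_nearest`, and the summary statistic are assumptions made up for the sketch, not part of the original answer.

```python
# Class means as found by k-means on the training data (numbers from the example).
centroids = {"class#1": 30.0, "class#2": 100.0}


def assign_to_nearest(value, centroids):
    """Return (nearest_label, distances) for one 1-D test value.

    distances maps every class label to |value - class mean|.
    Hypothetical helper, written only for this illustration.
    """
    distances = {label: abs(value - mean) for label, mean in centroids.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances


# Held-out ("non-training") points: 70 is from the example, the rest are made up.
test_values = [70.0, 28.0, 110.0]
results = [assign_to_nearest(v, centroids) for v in test_values]

for value, (label, distances) in zip(test_values, results):
    print(value, "->", label, distances)

# One possible summary: the average distance from each test point to the mean
# of the class it was assigned to. This is not an error rate; without ground
# truth it only says how tightly the test data sit around the learned means.
mean_distance = sum(dist[label] for label, dist in results) / len(results)
print("mean distance to assigned class mean:", mean_distance)
```

In MATLAB, something similar could presumably be done by taking the centroid matrix that `kmeans` returns as its second output and computing distances from the test rows to it, for instance with `pdist2`, then taking the minimum per row.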
  Comments: 3
Image Analyst on 23 Mar 2014
To accurately get the error you have to know the true values, don't you? And you don't know those. So all you have is a guess.
Mnr on 23 Mar 2014
Thanks!


More Answers (0)
