
K-mode clustering algorithm to cluster categorical data?

9 views (last 30 days)
Dankur Mcgoo on 10 Aug 2018
Commented: Image Analyst on 12 Aug 2018
Has anyone come across a k-modes script in the Matlabsphere? I've seen people respond with links to supervised learning algorithms, but I need an unsupervised one. Even pseudocode would be okay, so I can build it myself.
I'm using R2017b.
I'm really trying to avoid using R.

Answers (1)

Image Analyst on 11 Aug 2018
I can't imagine why you'd use kmeans with categorical data. If it's categorical, you can simply use the category to classify the data point, right?
  4 Comments
Dankur Mcgoo on 12 Aug 2018
Edited: Image Analyst on 12 Aug 2018
I apologize for not stating my question clearly. I was just hoping someone had come across a k-modes script, but I'll try to pose the question better.
Here is an analogy that is close enough to my data set. I have 200 questionnaires, each with 40 categorical questions. I would like to cluster them so that similar questionnaires end up together: if two questionnaires differ in only 1-2 answers, the distance between those two data points should be small.
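One distance with the property described above is the simple mismatch count (Hamming distance): the number of questions on which two questionnaires disagree. The following is a minimal sketch on made-up toy data, not the real questionnaire set; the matrix X and its codes are assumptions for illustration only.

```matlab
% Toy data (hypothetical): 4 questionnaires (rows) x 5 questions (columns).
% The integer codes are arbitrary category labels, not quantities.
X = [1 3 2 2 1;
     1 3 2 2 4;    % differs from row 1 in one answer
     2 1 4 3 3;
     2 1 4 3 1];   % differs from row 3 in one answer

n = size(X, 1);
D = zeros(n);      % D(i,j) = number of questions answered differently
for i = 1:n
    for j = 1:n
        D(i, j) = sum(X(i, :) ~= X(j, :));
    end
end
disp(D)
```

Rows 1 and 2 (and rows 3 and 4) end up at distance 1, while rows from different groups are at distance 4 or 5. Only equality of codes is used, so the arbitrary numbering of the categories does not matter. If the Statistics and Machine Learning Toolbox is available, pdist(X, 'hamming') should return the same information as a fraction of differing columns.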
Here is how my question differs from what you replied (perhaps my interpretation is wrong): I can't simply group the questionnaires by one arbitrary question (i.e. just Question 1, or just the car makers). I need to consider all of them.
k-means is appropriate for numerical data. There is no way to translate my categorical data into meaningful numeric data. The answers are currently stored as numbers in my matrix, but the numbers are only labels: consecutive values are not related, so any numeric distance measure on them is meaningless.
Does that make more sense?
I've found this, https://shapeofdata.wordpress.com/2014/03/04/k-modes/, which looks useful, and it is what I am planning to try. I would just rather avoid coding it myself because of time constraints.
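For reference, the general k-modes idea (assign each row to the nearest mode by mismatch count, then recompute each cluster's per-column mode) can be sketched in a few lines of base MATLAB. This is a minimal sketch under stated assumptions, not the implementation from the linked post: the function name kmodesSketch, its arguments, and the random-row initialization are all hypothetical choices, and the data are assumed to be an integer-coded matrix with one questionnaire per row.

```matlab
function [labels, modes] = kmodesSketch(X, k, maxIter)
% KMODESSKETCH  Minimal k-modes sketch for integer-coded categorical data.
%   X        n-by-p matrix, each column a categorical variable coded as numbers
%   k        number of clusters
%   maxIter  maximum number of assign/update passes
%   labels   n-by-1 cluster index
%   modes    k-by-p matrix holding the mode of each cluster
    n = size(X, 1);
    modes = X(randperm(n, k), :);       % start from k randomly chosen rows
    labels = zeros(n, 1);
    for iter = 1:maxIter
        % Assignment step: mismatch count from every row to every mode.
        D = zeros(n, k);
        for c = 1:k
            D(:, c) = sum(X ~= modes(c, :), 2);   % implicit expansion (R2016b+)
        end
        [~, newLabels] = min(D, [], 2);
        if isequal(newLabels, labels)
            break;                      % assignments unchanged: stop
        end
        labels = newLabels;
        % Update step: per-column mode of the rows in each cluster.
        for c = 1:k
            members = X(labels == c, :);
            if ~isempty(members)        % keep the old mode if a cluster empties
                modes(c, :) = mode(members, 1);
            end
        end
    end
end
```

A call might look like [labels, modes] = kmodesSketch(X, 3, 100), where 3 is just an example cluster count. Because the starting modes are random, the result can be a poor local optimum, so it is probably worth running it several times and keeping the run with the smallest total mismatch count.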
I would also welcome any other suggestions for clustering this data. I am not sold on k-modes.
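One alternative that avoids hand-written clustering code is hierarchical clustering on Hamming distances. The sketch below assumes the Statistics and Machine Learning Toolbox is installed; the toy matrix X, the 'average' linkage, and the choice of 2 clusters are illustrative assumptions, not recommendations for the real data.

```matlab
% Requires Statistics and Machine Learning Toolbox.
% Toy data (hypothetical): 4 questionnaires x 5 integer-coded questions.
X = [1 3 2 2 1;
     1 3 2 2 4;
     2 1 4 3 3;
     2 1 4 3 1];

D = pdist(X, 'hamming');               % fraction of questions answered differently
Z = linkage(D, 'average');             % agglomerative tree, average linkage
labels = cluster(Z, 'maxclust', 2);    % cut the tree into 2 clusters
disp(labels')
```

On this toy data, rows 1 and 2 land in one cluster and rows 3 and 4 in the other. The same toolbox also provides kmedoids, which as far as I know accepts 'Distance', 'hamming' and is closer in spirit to k-modes, since each cluster is represented by an actual questionnaire.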
Image Analyst on 12 Aug 2018
I'm not an expert on questionnaires, though we have many statisticians in our company who spend their whole lives doing that. I'd suggest you try the Classification Learner app and pick the best one. Check out this page: https://www.mathworks.com/help/stats/machine-learning-in-matlab.html. Yours is an unsupervised learning problem because you have data but no ground truth: you don't know the classes/groupings of any of them in advance.
