Clustering - different size clusters
I have a fairly large data matrix that I want to cluster against the first column, which can be separated into six clusters/categories of different sizes. I know the k-means clustering algorithm takes the number of clusters as an input and then determines the clusters iteratively. Is there anything in MATLAB that would be suitable for my task?
Accepted Answer
Image Analyst
29 Oct 2015
Yes. silhouette() lets you graphically judge the quality of the clustering produced by kmeans(). evalclusters() lets you evaluate the quality of the clustering for a range of k values, so you can pick the best k if you don't know it for certain.
% Try values of k 2 through 5
clustev = evalclusters(X, 'kmeans', 'silhouette', 'KList', 2:5);
% Get the best value of k:
kBest = clustev.OptimalK
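A self-contained version of the snippet above, as a sketch that assumes the Statistics and Machine Learning Toolbox is installed. The matrix X here is made-up data: three well-separated 2-D clusters of different sizes (100, 40 and 15 points) standing in for the real data.

```matlab
% Hypothetical test data: three 2-D Gaussian clusters of different sizes
rng(0);                                   % make the example reproducible
X = [randn(100,2); randn(40,2) + 8; randn(15,2) - 8];

% Score k = 2..6 with the silhouette criterion (evalclusters runs kmeans for each k)
clustev = evalclusters(X, 'kmeans', 'silhouette', 'KList', 2:6);

kBest = clustev.OptimalK                  % k with the highest mean silhouette value
scores = clustev.CriterionValues          % one score per entry of KList
```

Unequal cluster sizes are not a problem in themselves, but k-means works best when the clusters are roughly spherical with similar spread, so it is worth confirming the chosen k with silhouette() as well.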
6 Comments
Bran
4 Nov 2015
Thank you, Image Analyst. I also wanted to ask whether you have any experience validating data that has already been clustered. I am reading a lot of conflicting advice about how this should be approached. I was hoping to produce p values for the clusters to say whether they are real or not, but I am not sure whether that is a sensible approach.
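An observation's silhouette value is a normalized measure (between -1 and 1) of how close the observation is to the other members of its own cluster, compared with the observations in the nearest other cluster. Values near 1 mean the observation sits well inside its cluster, values near 0 mean it lies on the boundary between two clusters, and negative values suggest it may have been assigned to the wrong cluster. The silhouette plot shows these values grouped by cluster, so its shape tells you how well separated the clusters are.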
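To get the silhouette values as numbers rather than a plot, you can request the output argument. This is a sketch on hypothetical data (the same made-up three-cluster matrix as above) and assumes the Statistics and Machine Learning Toolbox; averaging per cluster with accumarray is just one convenient way to summarize the values.

```matlab
rng(0);
X = [randn(100,2); randn(40,2) + 8; randn(15,2) - 8];   % hypothetical data

idx = kmeans(X, 3, 'Replicates', 5);   % cluster index for each row of X
s = silhouette(X, idx);                % one silhouette value per observation

meanS = mean(s)                        % overall quality of the partition
% Mean silhouette per cluster: a low value flags a poorly separated cluster
perCluster = accumarray(idx, s, [], @mean)
% Calling silhouette(X, idx) with no output argument draws the silhouette plot instead
```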
You can also use hierarchical clustering with linkage(), dendrogram(), and cluster() to see how close the various clusters are to each other.
Z = linkage(X);
dendrogram(Z);
You can divide the observations into groups according to the hierarchical cluster tree Z:
grp = cluster(Z, 'maxclust', 6);
With the 'maxclust' criterion, the observations are assigned to no more than the given number of groups.
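Because the groups in this question have different sizes, it can help to count how many observations land in each group after cutting the tree. A sketch on hypothetical data (three made-up clusters of 100, 40 and 15 points), assuming the Statistics and Machine Learning Toolbox and the default single linkage used above:

```matlab
rng(0);
X = [randn(100,2); randn(40,2) + 8; randn(15,2) - 8];   % hypothetical data

Z = linkage(X);                        % default: single linkage, Euclidean distance
grp = cluster(Z, 'maxclust', 3);       % cut the tree into at most 3 groups

nGroups = numel(unique(grp))           % can be fewer than 'maxclust'
groupSizes = accumarray(grp, 1)        % number of observations in each group
```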
To examine the quality of the hierarchical structure, you can compute the cophenetic correlation coefficient, which quantifies how accurately the tree represents the distances (dissimilarities) between the observations. The cophenet() function takes the linkage tree Z and the pairwise distances between the observations as input arguments:
Y = pdist(X);
C = cophenet(Z, Y);
C is the linear correlation between the cophenetic distances read off the tree and the original pairwise distances, so values close to 1 indicate a high-quality solution. I'm guessing this is what you would like.
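One practical use of C is comparing linkage methods on the same distances. This is a sketch on hypothetical data, assuming the Statistics and Machine Learning Toolbox; the three methods compared are an arbitrary illustrative choice, and linkage() accepts the pdist() vector directly.

```matlab
rng(0);
X = [randn(100,2); randn(40,2) + 8; randn(15,2) - 8];   % hypothetical data
Y = pdist(X);                          % pairwise Euclidean distances

linkMethods = {'single', 'average', 'complete'};
C = zeros(1, numel(linkMethods));
for i = 1:numel(linkMethods)
    Z = linkage(Y, linkMethods{i});    % build the tree from the same distances
    C(i) = cophenet(Z, Y);             % correlation between tree and data distances
end
[~, best] = max(C);
fprintf('Highest cophenetic correlation: %s linkage (C = %.3f)\n', ...
        linkMethods{best}, C(best));
```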
Hi,
Thank you for the suggestions. I should note that the data have already been separated into groups of different sizes, and in some cases the groups were assigned manually rather than produced by a clustering algorithm. As a result, I was thinking hypothesis testing might be appropriate. I am currently looking at the linkage values etc. for my clusters. Also, since in some cases it is unclear whether there is a cluster at all even though the points have been grouped together, I was wondering whether it would be OK to use ttest(). For example, I was considering testing whether the values in a group are simply random or whether they really differ from the normally distributed data, and getting a p value that way. The other method I have worked with is generating the p value via Monte Carlo sampling.
No, I don't believe so. I'm not a Ph.D. statistician, but I'm pretty sure you would not use ttest2() to create your model. If your scattered points are normally distributed, the function you want is fitcnb(), which creates a Naive Bayes classifier. Naive Bayes was one of the first formal classification algorithms and remains one of the most popular, largely because the classifier is easy to construct and its output is easy to interpret. Naive Bayes models are based on Bayes' rule of conditional probability. During training, the model estimates, for each class, the parameters of a normal distribution for each feature, assuming the features are independent of one another within each class.
nbModel = fitcnb(xTrain, yTrain);
To estimate the class of some non-training data:
yPredicted = predict(nbModel, xTest);
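Putting the two lines above together, a sketch of training and testing a naive Bayes classifier end to end. It assumes the Statistics and Machine Learning Toolbox; the labelled data, the 30% hold-out fraction and the accuracy measure are all hypothetical choices for illustration.

```matlab
% Hypothetical labelled data: three classes of different sizes
rng(0);
X = [randn(100,2); randn(40,2) + 8; randn(15,2) - 8];
y = [ones(100,1); 2*ones(40,1); 3*ones(15,1)];

% Stratified hold-out split: about 30% of each class is kept for testing
cv = cvpartition(y, 'HoldOut', 0.3);
xTrain = X(training(cv), :);   yTrain = y(training(cv));
xTest  = X(test(cv), :);       yTest  = y(test(cv));

% Gaussian naive Bayes: by default fitcnb models each numeric feature
% with a normal distribution per class
nbModel = fitcnb(xTrain, yTrain);
yPredicted = predict(nbModel, xTest);

accuracy = mean(yPredicted == yTest)   % fraction of test points classified correctly
```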
To compare data with a standard probability distribution, a probability plot can be used as a simple visual check:
probplot('normal', xTrain);
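If the points fall close to the reference line, the data are consistent with a normal distribution; if they curve away from it systematically, the data are probably not normal.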
Also look up jbtest(), lillietest(), and kstest() - they all deal with testing data for normality.
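A sketch of how those three tests are called, assuming the Statistics and Machine Learning Toolbox. The exponential sample is hypothetical, chosen only because it is clearly non-normal. Note that kstest() compares against a standard normal by default, so the data are standardized first; since the mean and standard deviation are then estimated from the same data, lillietest() (which corrects for that) is usually the better choice.

```matlab
rng(0);
xNormal = randn(200, 1);               % hypothetical sample from a normal distribution
xSkewed = exprnd(1, 200, 1);           % hypothetical, clearly non-normal sample

% h = 1 means normality is rejected at the default 5% significance level
[hJB, pJB] = jbtest(xSkewed)           % Jarque-Bera: based on skewness and kurtosis
[hLF, pLF] = lillietest(xSkewed)       % Lilliefors: KS test with estimated mean and variance
[hKS, pKS] = kstest(zscore(xSkewed))   % KS against a standard normal after standardizing

% The same Jarque-Bera test on the normal sample, for comparison
[hJBn, pJBn] = jbtest(xNormal)
```

jbtest() and lillietest() look up p-values in tables, so they may warn that the p-value lies outside the tabulated range; the h decision is still valid.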
Bran
6 Nov 2015
Thank you very much Image Analyst for all your help and advice. I've been looking at the various features offered by MATLAB and it is very useful. Just a final quick question, does MATLAB have a Mann-Whitney test that also accounts for clusters? For example comparing the distribution of two groups that may have several clusters within them?
Image Analyst
6 Nov 2015
This is all I could find:
p = ranksum(x,y) returns the p-value of a two-sided Wilcoxon rank sum test. ranksum tests the null hypothesis that data in x and y are samples from continuous distributions with equal medians, against the alternative that they are not. The test assumes that the two samples are independent. x and y can have different lengths. This test is equivalent to a Mann-Whitney U-test.
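A minimal sketch of calling ranksum() on two groups of different lengths, assuming the Statistics and Machine Learning Toolbox; both samples are hypothetical. Keep in mind that ranksum() treats every observation as independent and does not adjust for clusters inside each group, so it does not by itself answer the clustered case asked about above.

```matlab
rng(0);
x = randn(30, 1);                      % hypothetical group 1
y = randn(45, 1) + 1.5;                % hypothetical group 2: shifted, different length

[p, h, stats] = ranksum(x, y)          % two-sided Wilcoxon rank sum (Mann-Whitney U) test
% h = 1 rejects equal medians at the 5% level; stats.ranksum is the rank-sum statistic
```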
