Getting the data points of each cluster in kmeans.

After I run MATLAB's kmeans algorithm, I get the desired number of clusters. Now, if I want to see the data points in each cluster, how should I proceed? Is there a MATLAB command that would help me get them? Any suggestions would be really helpful. Thank you.

Accepted Answer

KSSV on 18 Oct 2017
Edited: KSSV on 18 Oct 2017
K = 4 ; % groups
N = 5000 ;
x = rand(N,1) ;
y = rand(N,1) ;
% apply kmeans
idx = kmeans([x,y],K) ;
% get each cluster
data = cell(K,1) ;
figure
hold on
for i = 1:K
data{i} = [x(idx==i),y(idx==i)] ;
plot(x(idx==i),y(idx==i),'.')
end

Comments: 18

Sir, if my F matrix is 35*150 and I have to cluster it into 15 groups, how will I get the data of each cluster (for higher dimensions)?
CC=F'
[idx,C]=kmeans(CC,15);
for i = 1:15
data{i} = [CC(idx==i)] ;
end
K = rand(35,150) ;
% apply kmeans
CC=K' ;
[idx,C]=kmeans(CC,15);
% get each cluster
data = cell(15,1) ;
figure
hold on
for i = 1:15
data{i} = CC(idx==i,:) ;
plot(data{i},'.')
end
Thank you very much sir. I will keep this technique in mind.
KSSV on 18 Oct 2017
Thanking is accepting the answer...
If it is just for plotting, and you want to plot all the data points according to membership, and you want to treat the data as y coordinates, one line per coordinate, perhaps using the cluster number as z to distinguish them, and you have R2015b or later, then
F = rand(35,150) ;
% apply kmeans
CC = F' ;
NC = 15;
[idx, C] = kmeans(CC, NC);
%plotting
ncol = size(CC,2);
x = (1:ncol) .';
z1 = ones(1, ncol);
cmap = parula(NC);
hold on
splitapply(@(sF, clustnumber) plot3(x, sF, clustnumber(1)*z1, 'Color', cmap(clustnumber(1),:)), F.', idx, idx);
hold off
view(3)
... but I don't think it will help much. Plotting all of the points according to membership is best handled with 2 or 3 dimensions, is tolerable for 4, difficult to understand for 5 dimensions, really complicated for 6 dimensions, and almost no hope for more dimensions.
MA-Winlab on 15 Mar 2019
Edited: MA-Winlab on 17 Mar 2019
@KSSV
I have a distance matrix (attached) with many rows and columns. I will feed the upper part of it to k-means to cluster the data into K clusters (say 6). How can I modify your code to find the data points of each cluster? In your example the matrix had 2 columns, but mine has more than that.
Appreciating your efforts
EDIT
I think I could find the data points that belong to each cluster, see below:
figure
hold on
for i = 1:K
data{i} = [SampleData(idx==i,idx==i)] ;
plot(SampleData(idx==i,idx==i),'.')
end
Not sure if this is correct!
Also, this will store the data points in the cell array (data), but how can I get them displayed in the original SampleData matrix?
See this post for an example that uses Python
KSSV on 16 Mar 2019
Edited: Image Analyst on 17 Mar 2019
[data,txt,raw] = xlsread('ONEUSER.xlsx') ;
k = 6 ;
idx = kmeans(data(:),k) ;
idx = reshape(idx,size(data)) ;
pcolor(idx) ;
colorbar
figure(2)
[nx,ny] = size(data) ;
[X,Y] = meshgrid(1:ny,1:nx) ;
scatter(X(:),Y(:),10,idx(:),'filled') ;
colorbar
Thank you @KSSV, I wish I knew how to vote for and accept this answer!
Really helpful
one more thing @KSSV
I would like to include only the elements that are above the diagonal (i.e. not including the 0s on the diagonal, as these may affect the clustering outcome).
I could not get the code you provided to work when flattening the upper part and trying to reshape it.
Your help is appreciated
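One possible way to restrict the clustering to the entries above the diagonal is sketched below. It is only an illustration under stated assumptions: the matrix D is made-up data standing in for the attached distance matrix, and the names mask, vals, idxVals and idxMat are hypothetical. It follows the same idea as the code above (clustering the individual matrix entries), but selects the entries with a logical mask instead of reshape, so the labels can be written back to their original positions.

```matlab
% Made-up symmetric distance matrix with a zero diagonal (stand-in for the real data).
rng(0);                            % reproducible example
n = 20;
P = rand(n, 2);
D = squareform(pdist(P));          % n-by-n pairwise distances

k = 6;
mask = triu(true(n), 1);           % true strictly above the diagonal
vals = D(mask);                    % column vector of the upper-triangle entries

idxVals = kmeans(vals, k);         % cluster the scalar distances

% Write the labels back to their original positions; NaN marks excluded entries.
idxMat = nan(n);
idxMat(mask) = idxVals;

% Distances that fall in each cluster.
data = cell(k, 1);
for i = 1:k
    data{i} = vals(idxVals == i);
end
```

Because the mask is used both to extract the values and to write the labels back, no reshape is needed; imagesc(idxMat) could then be used to view the labels in the layout of the original matrix.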
Underneath his picture at the top left, there is a Vote button.
@Image Analyst, I already did this, but this is for the original answer. Thank you though
MA-Winlab on 29 Mar 2019
Edited: MA-Winlab on 29 Mar 2019
@KSSV and @Image Analyst, I am confused about how to find the centroids (for n clusters, for example 6 clusters). kmeans returns the k cluster centroid locations in the k-by-p matrix C, and the data matrix is n-by-p.
In this case, if I have 100 data points (n x p = 10x10), then C will be 6x10, i.e., each row will have 10 values. Which one of these (in each row) represents the centroid? Please help me understand this, and correct me if I am mistaken.
Thank you
If you have a 10x10 set of points, then you have 10 data points (not 100), that are located in a 10 dimensional space, not in a 2-D (x,y) or 3-D (x,y,z) space like we all know so very well.
If you then have 6 centroids, that will be way too many for only 10 data points, since most "clusters" would contain only a single 10-dimensional point (which would then also be that cluster's centroid), though some of the clusters would have 2 or possibly 3 points in them.
More likely is that you have 100 data points in a 100 by 2 array where column 1 is x and column 2 is y. Then you might have 6 clusters (about 16 or 17 points per cluster roughly), and each centroid would be a 1x2 (x,y) pair, so you'd have an array of 6 rows (one for each cluster) and 2 columns (for x and y). One row per centroid/cluster.
Does that explain it?
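The shape of C described above can be checked with a small sketch. The data here are made up (100 random 2-D points), and Cmean and maxDiff are hypothetical names used only for this illustration.

```matlab
rng(1);                            % reproducible example
X = rand(100, 2);                  % made-up data: column 1 is x, column 2 is y
k = 6;
[idx, C] = kmeans(X, k);

disp(size(C))                      % k-by-p, here 6-by-2: one row per centroid

% With the default squared Euclidean distance, each centroid is the mean of
% the points assigned to its cluster, so the difference below should be tiny.
Cmean = zeros(k, size(X, 2));
for i = 1:k
    Cmean(i, :) = mean(X(idx == i, :), 1);
end
maxDiff = max(abs(C(:) - Cmean(:)));
```

The whole row C(i,:) is the centroid of cluster i; no single element of the row is "the" centroid. The same holds for p = 275: each centroid is then one point with 275 coordinates.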
That explains it very well. And you are right, I think I picked bad numbers to explain my problem. In fact I have a 275 x 275 matrix (275 points).
I cluster them using kmeans with K=6 (as an example, as I am still playing with K and computing the silhouette score to decide which k to use). In this case the centroids will be in a matrix of 6 rows and 275 columns, where each row represents the centroid of a cluster. How do I deal with this?
Thank you
Visualizing a point in 275 dimensional space is hopeless. You are not going to be able to do much better than the code I posted above at https://www.mathworks.com/matlabcentral/answers/361878-getting-the-data-points-of-each-cluster-in-kmeans#comment_494162
MA-Winlab on 29 Mar 2019
Edited: MA-Winlab on 29 Mar 2019
@Walter, thank you very much. I ran the code you provided in the 2017 post. It is really clear and explains the limitations as the number of dimensions increases.
Since we can't even visualize 275 dimensions, why do you think you have 6 centroids? What leads you to that conclusion/guess? Do you have any screenshots to explain that?
I am assuming 6 clusters. But I am working on testing different values of k and then computing the silhouette score to see which number to pick.
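A rough sketch of that kind of search over k is shown below. It assumes the Statistics and Machine Learning Toolbox (which kmeans already requires), uses made-up data, and picks the k with the highest mean silhouette value; kList, meanSil and bestK are hypothetical names, and the mean silhouette is only one possible selection rule.

```matlab
rng(2);                                        % reproducible example
% Made-up data: three well-separated 2-D blobs.
X = [randn(50, 2); randn(50, 2) + 4; randn(50, 2) - 4];

kList = 2:8;                                   % candidate numbers of clusters
meanSil = zeros(size(kList));
for j = 1:numel(kList)
    idx = kmeans(X, kList(j), 'Replicates', 5);
    s = silhouette(X, idx);                    % one silhouette value per point
    meanSil(j) = mean(s);
end

[~, best] = max(meanSil);
bestK = kList(best);                           % k with the highest mean silhouette
```

The toolbox function evalclusters, called with the 'silhouette' criterion, performs a similar search and may be more convenient than writing the loop by hand.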


More Answers (1)

shahab anjum on 2 Mar 2020


Do we get the data back in matrix form, as before, after applying kmeans? Can you help, please?

Comments: 4

Would you please explain more
No, the first output of kmeans() is a column vector, not a 2D array. Entry #K of the column vector tells you which cluster number the K'th row of the input coordinates is associated with.
There are several examples of plotting the results, above.
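The relationship between idx and the rows of the input can be sketched as follows. The data are made up (10 observations of 4 variables), and X1, X2, order and Xsorted are hypothetical names for this illustration only.

```matlab
rng(3);                            % reproducible example
X = rand(10, 4);                   % made-up data: 10 observations, 4 variables
idx = kmeans(X, 2);                % 10-by-1 vector of cluster numbers

disp(size(idx))                    % one label per row of X, not a matrix like X

% idx(r) is the cluster of row r, so logical indexing on the rows
% returns each cluster's members as an ordinary 2-D matrix.
X1 = X(idx == 1, :);
X2 = X(idx == 2, :);

% One matrix of the original size, with the rows grouped by cluster.
[sortedIdx, order] = sort(idx);
Xsorted = X(order, :);
```

X1 and X2 together contain every row of X exactly once, and Xsorted keeps the original n-by-p shape, so either form can be passed on to later processing as a 2-D matrix.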
Dear Mr. MA-Winlab,
I have a 1000x286 matrix. After kmeans I got a 1000x286 matrix of cluster labels (1 and 2). For the next step I want to get the data back as a 2-D 1000x286 matrix, the same shape as the clustered matrix from kmeans. When I use the code above, it gives the data back, but as a cell array with 2 cells rather than as a 2-D matrix. Please help me out.
Dear Mr. Walter, can you please explain it in a simpler way? I am really confused.


Asked: 18 Oct 2017

Commented: 4 Mar 2020
