How to change color of the data on biplot by the result of clustering

徹也 長島 on 11 Nov 2022
Commented: Adam Danz on 11 Nov 2022
Hello, everyone.
I'm a beginner with MATLAB.
I want to show the result of clustering together with PCA in a biplot.
However, I don't know how to color the data points in the biplot according to the clustering result. In the picture, all the data points are red. I want to show them in 3 colors, one per cluster (the number of clusters is 3).
Could you tell me how to do this?
D=readmatrix("Test.xlsx");
[coeff,score,latent]=pca(D);
[idx,H,sumd]=kmeans(D,3,MaxIter=1000,Display="final",Replicates=5);
Replicate 1, 10 iterations, total sum of distances = 10012.1.
Replicate 2, 12 iterations, total sum of distances = 10011.4.
Replicate 3, 11 iterations, total sum of distances = 10012.1.
Replicate 4, 10 iterations, total sum of distances = 10012.1.
Replicate 5, 13 iterations, total sum of distances = 10017.1.
Best total sum of distances = 10011.4
vbls = {'Depth','Sample','Ping','sea bottom mean','Length','Height','Perimeter','Area','BAmean','TAmean','Elongation','UNEVENNESS1','UNEVENNESS"','Lectangularity','Fractual demensiton','Circularity'};
figure
biplot(coeff(:,1:3),'scores',score(:,1:3),"VarLabels",vbls)

Answers (1)

Atsushi Ueno on 11 Nov 2022
Edited: Atsushi Ueno on 11 Nov 2022
For the attached data, the output of the biplot function is as shown below.
The graphics handle array "h" in this example contains 1013 object handles.
  • Handles h(1:16) correspond to line handles for the 16 variables.
  • Handles h(17:32) correspond to marker handles for the 16 variables.
  • Handles h(33:48) correspond to text handles for the 16 variables.
  • Handles h(49:1012) correspond to line handles for the 964 observations.
  • The last handle h(1013) corresponds to a line handle for the axis lines.
The cluster indices (idx), one of the outputs of the kmeans function, are used as the color index.
One drawback is that the label assigned to each cluster (1 to 3 in this case) can change every time the code is run, so the colors may be swapped between runs.
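The index ranges in the list follow from the number of variables and observations, so they can be computed rather than hard-coded. Here is a minimal sketch, assuming the layout described in the list (three handles per variable, then one per observation, then the axis lines) and the sizes implied by those ranges (16 variables, 964 observations). The variable names are illustrative only.

```matlab
% Assumed sizes, taken from the handle ranges listed in this answer
nVars = 16;    % number of variables
nObs  = 964;   % number of observations

% Index ranges into the handle array returned by biplot
varLineIdx   = 1:nVars;                 % lines for the variables
varMarkerIdx = nVars + (1:nVars);       % markers for the variables
varTextIdx   = 2*nVars + (1:nVars);     % text labels for the variables
obsIdx       = 3*nVars + (1:nObs);      % markers for the observations
axisIdx      = 3*nVars + nObs + 1;      % axis lines (last handle)
```

With real data, nVars and nObs would come from size(coeff,1) and size(score,1).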
D=readmatrix("https://jp.mathworks.com/matlabcentral/answers/uploaded_files/1188973/Test.xlsx");
[coeff,score,latent]=pca(D);
[idx,H,sumd]=kmeans(D,3,MaxIter=1000,Display="final",Replicates=5);
Replicate 1, 14 iterations, total sum of distances = 10080.8.
Replicate 2, 14 iterations, total sum of distances = 10804.3.
Replicate 3, 9 iterations, total sum of distances = 10014.8.
Replicate 4, 12 iterations, total sum of distances = 10796.3.
Replicate 5, 8 iterations, total sum of distances = 11103.3.
Best total sum of distances = 10014.8
vbls = {'Depth','Sample','Ping','sea bottom mean','Length','Height','Perimeter','Area','BAmean','TAmean','Elongation','UNEVENNESS1','UNEVENNESS"','Lectangularity','Fractual demensiton','Circularity'};
figure
h = biplot(coeff(:,1:3),'scores',score(:,1:3),"VarLabels",vbls); % output h has been added
% added from here
xlim([-0.1 0.5]); ylim([-0.1 0.5]); zlim([-0.5 0.3]); % to make it look good
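color = 'rgb'; % one color per cluster, just for this example
nVars = size(D,2); % h holds 3 handles per variable before the observation handles
for k = 1:size(D,1)
    h(k + nVars*3).MarkerEdgeColor = color(idx(k)); % change the color of observation k
end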
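Because the numbering of the clusters is arbitrary, one possible way to keep the colors consistent is to renumber the clusters by a property of their centroids. The sketch below is an assumption and not part of the original answer: it uses synthetic data and sorts the clusters by the first coordinate of their centroids, so that label 1 always goes to the cluster with the smallest value. The names newLabel, idxStable, and Cstable are illustrative.

```matlab
% Synthetic data with three well-separated groups (illustrative only)
X = [randn(50,2); randn(50,2) + [5 0]; randn(50,2) + [10 0]];

[idx, C] = kmeans(X, 3, Replicates=5);   % idx: raw labels, C: centroids

% order(j) is the raw label of the cluster with the j-th smallest centroid x
[~, order] = sort(C(:,1));

% newLabel(raw) gives the renumbered label for each raw label
newLabel = zeros(3,1);
newLabel(order) = 1:3;

idxStable = newLabel(idx);   % relabeled cluster indices
Cstable   = C(order,:);      % centroids in the new label order
```

idxStable could then be used in place of idx in the coloring loop.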
  Comments: 1
Adam Danz on 11 Nov 2022
I would encourage you to investigate this approach further by using a simpler data set with fewer points so you can see what's going on and confirm that this is what you want to do.
The demo below plots the results twice using the same data and same exact code but the results differ. This is because kmeans uses a random starting point so the grouping indices will likely differ each time you run it.
Load and compute data
load carsmall
X = [Acceleration Displacement Horsepower MPG Weight];
X = rmmissing(X);
Z = zscore(X); % Standardized data
Plot the results
[coefs,score] = pca(Z);
nClusters = width(Z);
[idx,H,sumd]=kmeans(Z,nClusters,MaxIter=1000,Replicates=5);
figure()
h = biplot(coefs(:,1:2),'Scores',score(:,1:2));
% Change color of varlines and observations according to kmeans results
colors = lines(width(Z));
tags = {h.Tag};
observationHandles = h(strcmp(tags, 'obsmarker'));
for i = 1:nClusters
h(i).Color = colors(i,:);
h(i).LineWidth = 2;
set(observationHandles(idx==i), 'Color', colors(i,:))
end
set(observationHandles, 'MarkerSize', 12)
Copy-pasted from the block above to plot this again
[coefs,score] = pca(Z);
nClusters = width(Z);
[idx,H,sumd]=kmeans(Z,nClusters,MaxIter=1000,Replicates=5);
figure()
h = biplot(coefs(:,1:2),'Scores',score(:,1:2));
% Change color of varlines and observations according to kmeans results
colors = lines(width(Z));
tags = {h.Tag};
observationHandles = h(strcmp(tags, 'obsmarker'));
for i = 1:nClusters
h(i).Color = colors(i,:);
h(i).LineWidth = 2;
set(observationHandles(idx==i), 'Color', colors(i,:))
end
set(observationHandles, 'MarkerSize', 12)
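If the two plots should agree, one option is to seed the random number generator before each kmeans call. Here is a minimal sketch using the same carsmall data, assuming a seed of 42 (an arbitrary choice) and 3 clusters for brevity.

```matlab
load carsmall
X = rmmissing([Acceleration Displacement Horsepower MPG Weight]);
Z = zscore(X);   % standardized data

% Resetting the generator to the same seed before each call
% makes kmeans start from the same initial centroids
rng(42);
idx1 = kmeans(Z, 3, MaxIter=1000, Replicates=5);
rng(42);
idx2 = kmeans(Z, 3, MaxIter=1000, Replicates=5);

sameResult = isequal(idx1, idx2);   % compare the two label vectors
```

Seeding makes a script repeatable, but the cluster numbers themselves remain arbitrary labels.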

Release

R2022a
