Clustering 3D data based on Euclidean distance turns out to be insufficient

Hi,
I have 3 variables that seem to cluster very well (the color code is the amplitude of an oscillation):
vars = [acc, fb.fullscore(:,1), fc.fullscore(:,1)];
So I tried to run a hierarchical clustering:
figure(1);
clf;
Z = linkage(vars, 'ward', 'euclidean');
cutvector = Z(~isnan(Z(:,3)),3);
cutoff = median(cutvector(end-10:end,1)); %define the cutoff
dendrogram(Z, 1000, 'ColorThreshold',cutoff); %hierarchical clustering
T1 = cluster(Z, 'cutoff', cutoff, 'Criterion', 'distance'); %define clusters
set(gca,'xticklabel',[])
title('Both Hemispheres');
When I then redraw the scatter plot using the cluster ID as the color code, I get something like this:
As you can see, the colors do not follow the expected clusters; they look more like bands along the beta dimension. The clustering looks good in one projection of the plot, but not in all of them:
Do you have any idea how I can improve my clustering? I have tried linkage with the centroid and median methods, with similar results.
Thank you very much!
Sebastian

Accepted Answer

Aditya on 26 Feb 2024
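Hierarchical clustering builds a hierarchy of clusters from a chosen distance metric and linkage criterion, so the result depends heavily on both. Euclidean distance is sensitive to scale: if one variable has a much larger range than the others, it dominates the distance, and the clusters can come out as bands along that variable, much like the ones you describe. The grouping can also be hard to judge from a single projection of 3D data, because clusters that are separated in one view may overlap in another.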
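One way to check whether the distance metric is being dominated by a single variable is to compare the spread of the columns before clustering. The snippet below is only a sketch: X is a synthetic stand-in for vars (an assumption, not your data), and the names colStd, spreadRatio and distShare are made up for illustration.

```matlab
% Hypothetical stand-in for vars: two unit-scale columns and one column
% with a much larger spread (replace X with your own N-by-3 matrix).
rng(1);                                   % reproducible example
X = [randn(200,2), 100*randn(200,1)];

% Spread of each column; a large max/min ratio means one variable
% dominates the Euclidean distance.
colStd = std(X, 0, 1);
spreadRatio = max(colStd) / min(colStd);

% Share of the squared Euclidean distance (summed over all pairs of points)
% contributed by each column. This equals each column's share of the variance.
colVar = var(X, 0, 1);
distShare = colVar / sum(colVar);

fprintf('std per column: %s\n', mat2str(colStd, 3));
fprintf('spread ratio: %.1f\n', spreadRatio);
fprintf('share of squared distance per column: %s\n', mat2str(distShare, 3));

% After z-scoring, every column has unit standard deviation, so no single
% variable dominates purely because of its units.
Xz = zscore(X);
```

If one column carries nearly all of the distance, standardizing is worth trying, although it will not help if that variable simply contains no cluster structure.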
Here's an example of how you might implement some preprocessing and cluster validation in MATLAB:
% Standardize variables
vars_standardized = zscore(vars);
% Perform hierarchical clustering
Z = linkage(vars_standardized, 'ward', 'euclidean');
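% Determine the cutoff from the inconsistency coefficient (column 4 of the output)
inconsistency = inconsistent(Z);
cutoff = prctile(inconsistency(:,4), 75); % 75th percentile as an example
% Create dendrogram. ColorThreshold is a linkage height, not an inconsistency
% value, so use the default threshold here rather than the cutoff above
figure(1);
clf;
dendrogram(Z, 1000, 'ColorThreshold', 'default');
title('Both Hemispheres');
% Cluster assignment: the cutoff is an inconsistency value, so the criterion must match
T1 = cluster(Z, 'cutoff', cutoff, 'Criterion', 'inconsistent');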
% Silhouette analysis
figure(2);
silhouette(vars_standardized, T1, 'Euclidean');
title('Silhouette Analysis');
% Multi-Dimensional Scaling for visualization
distMatrix = pdist(vars_standardized);
[Y, stress] = mdscale(distMatrix, 2);
figure(3);
gscatter(Y(:,1), Y(:,2), T1);
title('MDS Plot of Clusters');
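If you already have a rough idea of how many clusters to expect, an alternative to picking a cutoff is to ask for a fixed number of clusters with 'maxclust' and compare the mean silhouette value across a few candidates. This is a rough sketch on synthetic data: X, kList, meanSil and bestK are illustrative names, and the three blobs are an assumption, not your data.

```matlab
% Synthetic stand-in: three well-separated blobs in 3D (assumption).
rng(2);                                   % reproducible example
X = [randn(60,3); randn(60,3) + 5; randn(60,3) + [5 -5 0]];
Xz = zscore(X);                           % standardize as in the code above

% Build the tree once, then cut it at several candidate cluster counts.
Z = linkage(Xz, 'ward', 'euclidean');
kList = 2:6;
meanSil = zeros(size(kList));
for i = 1:numel(kList)
    T = cluster(Z, 'maxclust', kList(i));
    % With an output argument, silhouette returns the values without plotting.
    meanSil(i) = mean(silhouette(Xz, T, 'Euclidean'));
end

% Candidate with the highest mean silhouette value.
[~, best] = max(meanSil);
bestK = kList(best);
fprintf('mean silhouette per k: %s\n', mat2str(meanSil, 3));
fprintf('best k by mean silhouette: %d\n', bestK);
```

A clear peak in meanSil suggests a reasonable number of clusters; a flat curve suggests the data do not separate cleanly under this metric and linkage.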
Remember that clustering is exploratory in nature, and there is no one-size-fits-all approach. It's often a good idea to combine domain knowledge with various clustering techniques to find the most meaningful groupings for your data.
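As one possible way to compare linkage methods, the cophenetic correlation coefficient measures how faithfully each tree preserves the original pairwise distances (values closer to 1 are better). The sketch below uses synthetic data; X, linkMethods and coph are hypothetical names, and a high coefficient does not by itself mean the clusters are the ones you expect.

```matlab
% Synthetic stand-in: two blobs in 3D, standardized (assumption).
rng(3);                                   % reproducible example
X = zscore([randn(50,3); randn(50,3) + 4]);

% Pairwise Euclidean distances, computed once and reused for every method.
D = pdist(X, 'euclidean');

% Linkage methods to compare; 'ward' assumes Euclidean distances.
linkMethods = {'ward', 'average', 'complete', 'single'};
coph = zeros(size(linkMethods));
for i = 1:numel(linkMethods)
    Z = linkage(D, linkMethods{i});
    % Correlation between the tree's cophenetic distances and D.
    coph(i) = cophenet(Z, D);
    fprintf('%-8s cophenetic correlation: %.3f\n', linkMethods{i}, coph(i));
end
```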

Release: R2020b
