Find Optimal Number of Clusters Using Silhouette Criterion from Scratch in MATLAB

Question

Hammad Younas, 15 February 2023


Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/1912955-find-optimal-number-of-cluster-using-silhoutte-criterion-from-scratch-in-matlab

Commented: Gian23, 16 February 2023

Hello, I hope you are doing well. I am trying to find the optimal number of clusters using evalclusters with k-means and the silhouette criterion.

The built-in command takes a very long time to find the optimal number of clusters, so I am implementing the method from scratch. I have the following code, but the score obtained by my from-scratch algorithm differs from the one returned by the built-in function.

The dataset and the call to the built-in function are shown below. evaluation.CriterionValues holds the silhouette score for each candidate K.

x =[ [0.1 0.2 0.15 0.2 0.21 ]  1+[0.1 0.2 0.15 0.2 0.21 ]];
y =[ [0.1 0.2 0.15 0.2 0.21 ]  1+[0.1 0.2 0.15 0.2 0.21 ]];
X = [x.' y.'];
dataset_len = size(X,1);
num_kmeans = 6;
%%
evaluation = evalclusters(X,"kmeans","silhouette","KList",1:num_kmeans)
evaluation.CriterionValues

Here is the code that implements this from scratch. array_silhoutte holds the silhouette score for each candidate K.

array_silhoutte = zeros(1,num_kmeans);
distance_a = [];
distance_b = [];
for j=1:num_kmeans
    [cluster_assignments,centroids] = kmeans(X,j,'Distance','sqeuclidean','Start','sample');
    %[~,grps_11]=grp2idx(cluster_assignments);
 
 
    for i = 1:dataset_len
        distance_a = [];
        distance_b = [];
        
        current_datapoint = X(i,:);
        
        for k=1:dataset_len    
            
            if i~=k
                if  (cluster_assignments(i)== cluster_assignments(k)) 
                    dist = pdist2( current_datapoint,X(k,:),'squaredeuclidean') ;
                    distance_a = [distance_a;dist];
                else
                    dist = pdist2( current_datapoint,X(k,:),'squaredeuclidean') ;
                   distance_b=[distance_b;dist];
                end
            end
        
        end
        
        Average_a=mean(distance_a);
        Average_b=mean(distance_b);
        
    end
    array_silhoutte(j) = (Average_b-Average_a)./max(Average_b, Average_a); 
    
end

Can anybody help me make the from-scratch scores match those of the built-in function?

Comments: 1

Hammad Younas, 16 February 2023

@KSSV @Image Analyst @Walter Roberson @Jan Can you help me with this?


Answer 1

Marco Riani, 16 February 2023


Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/1912955-find-optimal-number-of-cluster-using-silhoutte-criterion-from-scratch-in-matlab#answer_1172890

Edited: Marco Riani, 16 February 2023


x =[ [0.1 0.2 0.15 0.2 0.21 ]  1+[0.1 0.2 0.15 0.2 0.21 ]];
y =[ [0.1 0.2 0.15 0.2 0.21 ]  1+[0.1 0.2 0.15 0.2 0.21 ]];
X = [x.' y.'];
dataset_len = size(X,1);
num_kmeans = 6;
evaluation = evalclusters(X,"kmeans","silhouette","KList",1:num_kmeans)

evaluation =
  SilhouetteEvaluation with properties:

    NumObservations: 10
         InspectedK: [1 2 3 4 5 6]
    CriterionValues: [NaN 0.9956 0.8842 0.7731 0.8798 0.9864]
           OptimalK: 2

disp("Criterion values from evalclusters")

Criterion values from evalclusters

disp(evaluation.CriterionValues)

NaN 0.9956 0.8842 0.7731 0.8798 0.9864

array_silhoutte = zeros(1,num_kmeans);

for j = 1:num_kmeans
    % [cluster_assignments,centroids] = kmeans(X,j,'Distance','sqeuclidean','Start','sample');
    [cluster_assignments,centroids] = kmeans(X,j,'Replicates',100);

    avgDWithin = zeros(dataset_len,1);   % stays 0 for a point that is alone in its cluster
    avgDBetween = Inf(dataset_len,j);    % Inf is kept in the column of the point's own cluster

    for i = 1:dataset_len
        % Mean squared distance to the other points of the same cluster
        % (the zero self-distance is part of the sum, hence the n-1 divisor)
        boo = cluster_assignments == cluster_assignments(i);
        Xsamecluster = X(boo,:);
        if size(Xsamecluster,1) > 1
            avgDWithin(i) = sum(sum((X(i,:)-Xsamecluster).^2,2))/(size(Xsamecluster,1)-1);
        end

        % Mean squared distance to the points of each other cluster
        boo1 = cluster_assignments ~= cluster_assignments(i);
        for jj = 1:j
            Xdifferentcluster = X(boo1 & cluster_assignments == jj,:);
            if ~isempty(Xdifferentcluster)
                avgDBetween(i,jj) = mean(sum((X(i,:)-Xdifferentcluster).^2,2));
            end
        end
    end

    % Calculate the silhouette values
    minavgDBetween = min(avgDBetween, [], 2);
    silh = (minavgDBetween - avgDWithin) ./ max(avgDWithin,minavgDBetween);
    array_silhoutte(j) = mean(silh);
end

disp("Criterion values computed manually")

Criterion values computed manually

disp(array_silhoutte)

NaN 0.9956 0.8841 0.7731 0.8798 0.9864
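The manual values above can also be cross-checked for a single K against the silhouette function from Statistics and Machine Learning Toolbox. The snippet below is only a minimal sketch: it assumes that toolbox is installed, and the seed passed to rng and the choice k = 2 are arbitrary picks for illustration, not part of the original code.

```matlab
% Sketch: compare a per-K mean silhouette with the built-in silhouette function.
% Assumes Statistics and Machine Learning Toolbox (kmeans, silhouette).
rng(0);   % arbitrary seed so that the kmeans restarts are repeatable

x = [[0.1 0.2 0.15 0.2 0.21] 1+[0.1 0.2 0.15 0.2 0.21]];
X = [x.' x.'];   % same 10 points as above (y equals x in this dataset)

k = 2;                                   % illustrative choice of K
idx = kmeans(X, k, 'Replicates', 100);   % best of 100 random starts
s = silhouette(X, idx, 'sqEuclidean');   % one silhouette value per point
meanSilh = mean(s);                      % average over all points

fprintf('k = %d, mean silhouette = %.4f\n', k, meanSilh);
```

For this dataset the printed mean should be close to the second entry of evaluation.CriterionValues, provided kmeans finds the same two-group partition.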

I slightly rewrote your code and added 'Replicates',100 to the call to kmeans. Please let me know whether everything is clear now. Of course, kmeans does not take into account the correlation among the variables, and it is not robust to atypical observations. Anyway, that is another story.
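For reference, the rewritten loop appears to follow the usual silhouette definition with squared Euclidean distance. In the sketch below the symbols are notation introduced here, not taken from the code: $C_I$ is the cluster containing point $i$, $d(i,m) = \lVert x_i - x_m \rVert^2$, and $n$ is the number of points. In the code, $a(i)$ corresponds to avgDWithin, $b(i)$ to minavgDBetween, and $s(i)$ to silh.

```latex
\begin{aligned}
a(i) &= \frac{1}{|C_I| - 1} \sum_{\substack{m \in C_I \\ m \neq i}} d(i, m), \\
b(i) &= \min_{J \neq I} \; \frac{1}{|C_J|} \sum_{m \in C_J} d(i, m), \\
s(i) &= \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}, \qquad
\mathrm{score}(k) = \frac{1}{n} \sum_{i=1}^{n} s(i).
\end{aligned}
```

Compared with this, the loop in the question differs in two visible ways: it evaluates array_silhoutte(j) after the loop over i has finished, so only the averages of the last point are used, and it pools the points of all other clusters into one average instead of taking the minimum over the per-cluster averages.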

Best

Marco

Comments: 1

Gian23, 16 February 2023

Great solution!
