Implementing the k-means algorithm on spike sorting data

cameron lord on 11 Feb 2021
Answered: Aditya Patil on 17 Feb 2021
Hi there, I am trying to implement my own k-means function without using the built-in function 'kmeans'.
I started with some complex waveform data, reduced the dimensionality to 2 principal components and plotted them on a scatter plot, where 3 distinct clusters emerge.
To do k-means, I first set random centroids within the range of the data, e.g.:
k=3
%state the number of clusters%
centroids = min(wav_pca) + (max(wav_pca)-min(wav_pca)).* rand(k,1) %create random centroids in the range of test data%
scatter(wav_pca(:,1),wav_pca(:,2))
hold on
scatter(centroids(:,1),centroids(:,2),'x');
hold off
This gives me starting centroids. However, I don't think the distribution is as random as I'd like.
Then I have to compute the Euclidean distance from each point to each centroid and assign the point to the centroid with the shortest distance:
for j=1:k
for i=1:length(wav_pca)
distance=sqrt( (centroids(j,1)- wav_pca(i,1))^2 + (centroids(j,2)- wav_pca(i,2)^2) )
end
end
For this I tried to use this for loop, but it's not creating the matrix of distances that I need.
Then each point must be assigned to its closest centroid, giving it a cluster ID.
The cluster centroids then need to be recomputed as the average of all their assigned points, and the points reassigned. This needs to be iterated until the assignments stop changing, and I am unsure how to do this.
Thanks for any help you can give. If you need any more info let me know, and apologies for being new to MATLAB.

Accepted Answer

Aditya Patil on 17 Feb 2021
Note that the parentheses in the second term of the equation are misplaced. The square must be applied to the whole difference y1 - y2, not just to y2 (wav_pca(i,2) in your case).
The correct expression would be:
sqrt((centroids(j,1) - wav_pca(i,1))^2 + (centroids(j,2) - wav_pca(i,2))^2)
You can further simplify the code by vectorizing it as follows:
sqrt((centroids(:,1) - wav_pca(i,1)).^2 + (centroids(:,2) - wav_pca(i,2)).^2)
This computes the distance from point i to every centroid at once, rather than to one centroid at a time. You can also do it the other way around, computing the distance from all points to one centroid at a time.
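For the "other way around" variant, a minimal sketch is shown here. It assumes wav_pca is an n-by-2 matrix and centroids is a k-by-2 matrix; the toy values are made up for illustration, and subtracting a row from a matrix relies on implicit expansion (MATLAB R2016b or later).

```matlab
% Toy data (hypothetical values standing in for the real PCA scores)
wav_pca   = [0 0; 3 4; 6 8];      % n-by-2 points
centroids = [0 0; 6 8];           % k-by-2 centroids
k = size(centroids, 1);
n = size(wav_pca, 1);

distances = zeros(n, k);          % distances(i,j): point i to centroid j
for j = 1:k
    % Subtracting a 1-by-2 row from an n-by-2 matrix uses implicit expansion
    diffs = wav_pca - centroids(j, :);
    distances(:, j) = sqrt(sum(diffs.^2, 2));
end
disp(distances)
```

Preallocating distances and writing into column j on each pass is what produces the full n-by-k matrix; assigning to a scalar inside the loop would overwrite the value on every iteration.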
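Further, the sqrt is unnecessary if you only need to find the closest centroid. You are only interested in the relative distance, not the exact value, and sqrt is monotonic, so comparing squared distances gives the same ordering.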
(centroids(:,1) - wav_pca(i,1)).^2 + (centroids(:,2) - wav_pca(i,2)).^2
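The assignment and update steps described in the question could then be sketched as below. This is an illustrative sketch under stated assumptions, not a replacement for kmeans: the synthetic data, the maxIter cap, and the choice to leave an empty cluster's centroid where it is are all assumptions made for the example. It also draws the starting centroids with rand(k, 2). With rand(k, 1), the same random number is reused for both coordinates, so every starting centroid lands on the diagonal of the data's bounding box.

```matlab
% Synthetic stand-in for wav_pca: three well-separated 2-D blobs (assumed data)
wav_pca = [randn(50,2); randn(50,2) + [8 0]; randn(50,2) + [0 8]];
k = 3;
maxIter = 100;                            % safety cap (assumed value)

% Independent random draw per coordinate, within the range of the data
lo = min(wav_pca);
hi = max(wav_pca);
centroids = lo + (hi - lo) .* rand(k, size(wav_pca, 2));

n = size(wav_pca, 1);
labels = zeros(n, 1);                     % 0 means "not yet assigned"
for iter = 1:maxIter
    % Squared distances, n-by-k (sqrt is not needed to find the nearest)
    d2 = zeros(n, k);
    for j = 1:k
        d2(:, j) = sum((wav_pca - centroids(j, :)).^2, 2);
    end
    [~, newLabels] = min(d2, [], 2);      % index of the nearest centroid

    if isequal(newLabels, labels)
        break                             % assignments stopped changing
    end
    labels = newLabels;

    % Move each centroid to the mean of its assigned points
    for j = 1:k
        members = wav_pca(labels == j, :);
        if ~isempty(members)              % empty cluster: keep old centroid
            centroids(j, :) = mean(members, 1);
        end
    end
end
```

The result depends on the random starting centroids, so it may be worth running the loop several times and comparing the outcomes.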
