Hi everyone, can someone help me use k-means clustering, or share suitable code for clustering wind speed data? I have wind speed data in the form Latitude, Longitude, Wind Speed. I want to cluster the data into 3 groups.

Accepted Answer

Image Analyst on 12 Nov 2021

Votes: 0

If you have all the lat and lon values, then just put each into kmeans separately:
numColumns = 26; % Or however many columns you know there to be.
[xIndexes, xCentroids] = kmeans(lon, numColumns);
numRows = 50; % Or however many rows you know there to be.
[yIndexes, yCentroids] = kmeans(lat, numRows);
The values of the columns (x or longitude values) will be in xCentroids.
The values of the rows (y or lat values) will be in yCentroids.
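
To make the idea above concrete, here is a minimal sketch on synthetic data. The grid positions, the jitter, and the 'Replicates' setting are assumptions chosen for illustration and are not taken from the poster's file. It requires the Statistics and Machine Learning Toolbox for kmeans.

```matlab
% Sketch with made-up grid positions (not the poster's data).
rng(0);                                    % reproducible random numbers
trueLon = [100 101 102 103];               % hypothetical column positions
trueLat = [1 2 3];                         % hypothetical row positions
[LON, LAT] = meshgrid(trueLon, trueLat);   % 3-by-4 grid of points
lon = LON(:) + 0.01*randn(numel(LON), 1);  % small jitter around each column
lat = LAT(:) + 0.01*randn(numel(LAT), 1);  % small jitter around each row

numColumns = numel(trueLon);
[xIndexes, xCentroids] = kmeans(lon, numColumns, 'Replicates', 10);
numRows = numel(trueLat);
[yIndexes, yCentroids] = kmeans(lat, numRows, 'Replicates', 10);

% kmeans returns centroids in arbitrary order, so sort before comparing.
xCentroids = sort(xCentroids);
yCentroids = sort(yCentroids);
disp(xCentroids.');
disp(yCentroids.');
```

Note that this recovers the grid geometry only; it does not group points by wind speed.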

Comments: 16

MAT NIZAM UTI on 12 Nov 2021
So will the centroids differ depending on the latitude and longitude, or will it produce the same centroids?
Image Analyst on 12 Nov 2021
You will have different locations. Imagine that you ran lines through every column and every row of your spots. Doing this will get you the x location of every column and the y location of every row. Isn't that what you want to achieve with kmeans?
Image Analyst on 14 Nov 2021
It doesn't make sense. Why is Y1 random? And why is Y1 a row vector while X1 is a column vector? Even if Y1 was also a column vector, it doesn't make sense to cluster random data.
And where is K in your kmeans() call? You read in the badly-named "k" but don't even consider it when you're doing kmeans? Did you realize you're calling kmeans without your data???
I would have fixed it for you but I realized I don't know what each row of k represents.
% Demo by Image Analyst
clc; % Clear the command window.
close all; % Close all figures (except those of imtool.)
clear; % Erase all existing variables. Or clearvars if you want.
workspace; % Make sure the workspace panel is showing.
format long g;
format compact;
fontSize = 22;
% Read in data
k = readmatrix('WIND_26YEARS.csv');
% Plot raw data
subplot(3, 1, 1);
plot(k, 'b-')
grid on;
xlabel('index', 'FontSize',fontSize);
ylabel('Value of k', 'FontSize',fontSize)
title('All the k Values', 'FontSize',fontSize)
% Plot histogram of k data.
subplot(3, 1, 2);
histogram(k);
grid on;
xlabel('k', 'FontSize',fontSize);
ylabel('Count', 'FontSize',fontSize)
title('Distribution of k. Note no clusters!', 'FontSize',fontSize)
% Original poster's (bad) code below:
subplot(3, 1, 3);
X1=(1:6943)';
Y1=randn(6943,1);
numClusters=3;
idx1=kmeans([X1, Y1],numClusters,'Replicates',5);
pointclust=repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors=hsv(numClusters);
for j=1:numClusters
plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'Color',colors(j,:));
if j==1
hold on;
end
end
hold off;
xlabel('X1', 'FontSize',fontSize);
ylabel('Y1', 'FontSize',fontSize)
title('Clusters are in different colors', 'FontSize',fontSize)
grid on;
g = gcf;
g.WindowState = 'maximized'
MAT NIZAM UTI on 15 Nov 2021
Okay, thanks a lot for your help. I tried to modify the parts that I want to use and that suit my data. I have tried a few versions of the code because this is also my first time using MATLAB for clustering.
Image Analyst on 15 Nov 2021
@MAT NIZAM UTI, I can continue to help but you'd have to tell me how to split apart your data. There might be clusters but until your data is organized correctly they may not be evident. Again, what does each row of your data represent? Can it be divided evenly into a number of subsets? Like morning and evening windspeeds, or by month or something?
MAT NIZAM UTI on 16 Nov 2021
Edited: MAT NIZAM UTI on 16 Nov 2021
Since my data is gridded, the actual format of my wind speed data is
Lat1, Long1, Wind Speed Value1
Lat2, Long2, Wind Speed Value2
and so on until the last latitude and longitude.
The wind speed value is actually the monthly average wind speed at each latitude and longitude during the northeast monsoon. So each location (latitude, longitude) has only one wind speed value, as shown in the Excel data. However, I only provided the wind speed values.
So the idea of the k-means clustering is to produce 3 clusters of data based on the wind speed value.
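
A minimal sketch of what clustering on the wind speed value alone could look like, assuming the data is available as three equal-length columns. The grid extent and the random speeds below are made up, and the variable names are illustrative. The relabeling step is optional; it only makes cluster 1 the lowest-speed group so results are easier to read.

```matlab
% Sketch: cluster gridded points into 3 groups by wind speed only.
rng(1);                                          % reproducible example
[LON, LAT] = meshgrid(100:0.25:102, 1:0.25:3);   % hypothetical 0.25-degree grid
lat = LAT(:);
lon = LON(:);
speed = 8*rand(numel(lat), 1);                   % made-up speeds in the range 0-8

numClusters = 3;
[idx, C] = kmeans(speed, numClusters, 'Replicates', 5);

% Relabel so that cluster 1 has the lowest centroid and cluster 3 the highest.
[Csorted, order] = sort(C);
newLabel = zeros(numClusters, 1);
newLabel(order) = 1:numClusters;
idx = newLabel(idx);

% Keep each cluster label next to its location.
T = table(lat, lon, speed, idx, 'VariableNames', {'Lat', 'Lon', 'Speed', 'Cluster'});
disp(T(1:5, :));

% Map of the clusters: color encodes the cluster label.
scatter(lon, lat, 36, idx, 'filled');
xlabel('Longitude');
ylabel('Latitude');
colorbar;
```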
Image Analyst on 16 Nov 2021
Then why is the data not a multiple of 3?
k = readmatrix('WIND_26YEARS.csv');
lats = k(1:3:end); % 2315 long
lons = k(2:3:end); % 2314 long
speeds = k(3:3:end); % 2314 long
MAT NIZAM UTI on 16 Nov 2021
By 'multiple of 3', do you mean the wind speed value times 3 (wind speed x 3), or are you referring to the 3 columns?
Image Analyst on 16 Nov 2021
If the format is as you said, all arrays (lats, lons, and speeds) should be the same length, right? Why is lats one element longer?
MAT NIZAM UTI on 17 Nov 2021
Based on the data I shared with you, latitude and longitude both have the same number of rows, which is 3527.
Image Analyst on 17 Nov 2021
I think you uploaded a different data file than you think. Look what happens when I run this code:
% Read in data
k = readmatrix('WIND_26YEARS.csv');
lats = k(1:3:end); % 2315 long
lons = k(2:3:end); % 2314 long
speeds = k(3:3:end); % 2314 long
whos k
whos lats
whos lons
whos speeds
  Name           Size      Bytes  Class     Attributes
  k           6943x1       55544  double
  Name           Size      Bytes  Class     Attributes
  lats        2315x1       18520  double
  Name           Size      Bytes  Class     Attributes
  lons        2314x1       18512  double
  Name           Size      Bytes  Class     Attributes
  speeds      2314x1       18512  double
As you can see, k is not a multiple of 3 so lats is one element longer than the other two. Why is that?
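
The length mismatch follows directly from the arithmetic of stride-3 indexing on a 6943-element vector. A small check, using only the length reported by whos above (no data file needed):

```matlab
% 6943 = 3*2314 + 1, so one element is left over after complete triplets.
n = 6943;
leftover = mod(n, 3);          % 1
numLats = numel(1:3:n);        % 2315 (picks up the leftover element)
numLons = numel(2:3:n);        % 2314
numSpeeds = numel(3:3:n);      % 2314
fprintf('leftover = %d, lats = %d, lons = %d, speeds = %d\n', ...
    leftover, numLats, numLons, numSpeeds);
```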
Moreover, the wind speeds are practically the same value as lats and lons (they are all around values 0-8), which is suspicious unless you measured the wind near the north pole. Please attach the actual data.
MAT NIZAM UTI on 18 Nov 2021
Edited: MAT NIZAM UTI on 18 Nov 2021
https://drive.google.com/drive/folders/1tFOl0ZHQo4XzB-VGvi-LBLPg_lERG98u?usp=sharing The reason the wind speed values are approximately 0-8 is that the location is in the equatorial region, which generally receives less wind than the northern and southern hemispheres. I have attached the actual data here.
Image Analyst on 18 Nov 2021
But what about this question that you didn't answer:
As you can see, k is not a multiple of 3 so lats is one element longer than the other two. Why is that?
MAT NIZAM UTI on 18 Nov 2021
Edited: MAT NIZAM UTI on 18 Nov 2021
I tried running this code:
% Read in data
%k = xlsread('ACTUAL DATA_WIND SPEED.csv');
k = xlsread('AVERAGE_WINDSPEEDS_1.csv');
lats = k(1:end,1); % 2315 long
lons = k(2:end,2); % 2314 long
speeds = k(3:end,3); % 2314 long
whos k
whos lats
whos lons
whos speeds
But when I compare the values of lats, lons and speeds with the actual data, I get this:
1) The number of elements in lats (or LATITUDE_AFTER READ) is the same as in the actual latitude.
2) For lons (or LONGITUDE_AFTER READ) and speeds (or WIND SPEEDS_AFTER READ), both the values and the number of elements differ from the actual longitude and wind speeds.
As you can also see, at the end of each column the number of elements in lons and speeds is not the same as in the actual data, so maybe this is the reason why lats is longer than lons and speeds.
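
One possible explanation for the differing lengths in the code above, offered as an assumption since the actual file is not shown here: indexing with k(2:end,2) and k(3:end,3) skips the first one and two rows of those columns. If the file really has three columns (Lat, Lon, Speed), taking whole columns with a colon keeps all three vectors the same length. A sketch with a small made-up matrix:

```matlab
% Hypothetical 4-row matrix standing in for the file contents: Lat, Lon, Speed.
k = [1.00 100.00 3.2;
     1.25 100.00 4.1;
     1.50 100.00 2.7;
     1.75 100.00 5.0];

% Whole columns: all three vectors have one element per row of k.
lats = k(:, 1);
lons = k(:, 2);
speeds = k(:, 3);

% Row offsets as in the posted code drop leading rows and misalign the data.
lonsShort = k(2:end, 2);      % 3 elements, first longitude is lost
speedsShort = k(3:end, 3);    % 2 elements, first two speeds are lost

fprintf('full: %d %d %d, offset: %d %d\n', numel(lats), numel(lons), ...
    numel(speeds), numel(lonsShort), numel(speedsShort));
```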
Image Analyst on 18 Nov 2021
So can we just take the first 2314 values and ignore the extra lat?
MAT NIZAM UTI on 18 Nov 2021
Edited: MAT NIZAM UTI on 18 Nov 2021
Sure. Well, I don't really know how MATLAB works here, because when I compared the actual values with the values after reading, both lons and speeds differed from the actual data.
https://drive.google.com/drive/folders/1tFOl0ZHQo4XzB-VGvi-LBLPg_lERG98u (this is my actual data). Columns C through LB contain the wind speed values.


More Answers (1)

H R on 9 Nov 2021

Votes: 1

If your data is in a matrix format X, then you can use the following:
[idx,C] = kmeans(X,3,'Distance','cityblock','Replicates',5);

Comments: 6

H R on 9 Nov 2021
Make sure to use a 'Distance' option that makes sense for your data. Try different distances and compare the results.
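
A minimal sketch of how two 'Distance' options could be compared, using made-up 2-D data with three well-separated blobs. 'sqeuclidean' (the default) and 'cityblock' are both documented kmeans options; the data and the choice of these two are assumptions for illustration. Cluster sizes are compared rather than the summed distances, because sums computed under different metrics are not on the same scale.

```matlab
% Sketch: run kmeans with two distance metrics on the same synthetic data.
rng(2);                                              % reproducible example
X = [randn(50, 2);                                   % blob around (0, 0)
     randn(50, 2) + 5;                               % blob around (5, 5)
     randn(50, 2) + repmat([0 8], 50, 1)];           % blob around (0, 8)

distances = {'sqeuclidean', 'cityblock'};
sizes = zeros(numel(distances), 3);
for i = 1:numel(distances)
    idx = kmeans(X, 3, 'Distance', distances{i}, 'Replicates', 5);
    % Sorted cluster sizes, so the arbitrary label order does not matter.
    sizes(i, :) = sort(accumarray(idx, 1, [3 1])).';
    fprintf('%-12s cluster sizes: %s\n', distances{i}, mat2str(sizes(i, :)));
end
```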
MAT NIZAM UTI on 12 Nov 2021
My wind speed data is in grid format: the points are spaced 0.25 degrees apart in latitude and longitude, and each point has one wind speed value. In your opinion, which "Distance" would be suitable for clustering the wind speed data?
H R on 12 Nov 2021
Edited: H R on 12 Nov 2021
It seems you basically have v=f(x,y). If that is the case, I don't think you can gain useful information by mixing dependent and independent variables in the clustering, i.e. (x,y,v). I think it's better to use a supervised method instead. Alternatively, you may try clustering on (x,y) and then see whether the resulting clusters tell you anything useful about v. For that, I guess Euclidean distance would be enough. There also seems to be a relevant paper on your subject: https://ieeexplore.ieee.org/document/7884477.
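
The second suggestion (cluster on position, then inspect wind speed per cluster) could be sketched as follows. The grid, the eastward speed trend, and the noise level are all made up for illustration. kmeans is called with its default squared Euclidean distance.

```matlab
% Sketch: cluster on (lon, lat) only, then summarize speed within each cluster.
rng(3);                                              % reproducible example
[LON, LAT] = meshgrid(100:0.25:103, 1:0.25:4);       % hypothetical 0.25-degree grid
lat = LAT(:);
lon = LON(:);
% Made-up wind field: speed increases eastward, plus a little noise.
speed = 2 + 0.5*(lon - 100) + 0.2*randn(numel(lon), 1);

numClusters = 3;
idx = kmeans([lon, lat], numClusters, 'Replicates', 5);

% Per-cluster point count and mean wind speed.
countPerCluster = accumarray(idx, 1, [numClusters 1]);
meanSpeed = accumarray(idx, speed, [numClusters 1], @mean);
summary = table((1:numClusters).', countPerCluster, meanSpeed, ...
    'VariableNames', {'Cluster', 'Count', 'MeanSpeed'});
disp(summary);
```

If the mean speeds of the spatial clusters are clearly different, position carries information about wind speed; if they are nearly equal, the spatial clusters say little about it.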
MAT NIZAM UTI on 12 Nov 2021
Let's say I remove the latitude and longitude data, use only the wind speed data, and arrange it into one matrix (100:1). Is that possible?
H R on 12 Nov 2021
Yes, everything is possible (even using 1D data), but in the end you have to decide what you are looking for from the clustering task and check whether the outcome makes sense to you.
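
One way to check whether a clustering of 1D data "makes sense" is to look at silhouette values. The sketch below uses evalclusters and silhouette from the Statistics and Machine Learning Toolbox on made-up speeds drawn around three levels. The data and the candidate range 2:6 are assumptions; real wind speeds may show no such separation.

```matlab
% Sketch: judge cluster quality for 1-D data with silhouette values.
rng(4);                                              % reproducible example
speed = [1 + 0.3*randn(60, 1);                       % made-up "low" speeds
         4 + 0.3*randn(60, 1);                       % made-up "medium" speeds
         7 + 0.3*randn(60, 1)];                      % made-up "high" speeds

% Let evalclusters suggest a cluster count from the candidates 2 to 6.
eva = evalclusters(speed, 'kmeans', 'silhouette', 'KList', 2:6);
fprintf('Suggested number of clusters: %d\n', eva.OptimalK);

% Silhouette values for the 3-cluster solution; values near 1 mean
% well-separated clusters, values near 0 mean overlapping clusters.
idx = kmeans(speed, 3, 'Replicates', 5);
s = silhouette(speed, idx);
fprintf('Mean silhouette for 3 clusters: %.2f\n', mean(s));
```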
MAT NIZAM UTI on 14 Nov 2021
Edited: Image Analyst on 14 Nov 2021
Here is my code, and I get an error from it:
Error using horzcat
Dimensions of matrices being concatenated are not consistent.
Error in k_mean (line 7)
idx1=kmeans([X1 Y1],numClusters,'Replicates',5);
This is the code:
k = xlsread('WIND_26YEARS.csv');
X1=(1:6943);
Y1=randn(6943,1);
numClusters=3;
idx1=kmeans([X1 Y1],numClusters,'Replicates',5);
pointclust=repmat(idx1,1,numClusters)==repmat(1:numClusters,numel(idx1),1);
colors=hsv(numClusters);
for j=1:numClusters,
plot(X1(pointclust(:,j)),Y1(pointclust(:,j)),'Color',colors(j,:));
if j==1,
hold on;
end;
end,
hold off;
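
The horzcat error above comes from the shapes of the two vectors: (1:6943) is a 1-by-6943 row vector, while randn(6943,1) is a 6943-by-1 column vector, so they cannot be placed side by side. A minimal sketch of the shape fix only (it does not address whether clustering random data is meaningful):

```matlab
% Sketch: make both inputs column vectors before concatenating horizontally.
X1 = (1:6943);            % 1-by-6943 row vector, as in the posted code
Y1 = randn(6943, 1);      % 6943-by-1 column vector

% [X1 Y1] would fail here because the row counts differ (1 versus 6943).
X1 = X1(:);               % reshape to a 6943-by-1 column vector
data = [X1 Y1];           % now a valid 6943-by-2 matrix
disp(size(data));
```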


Asked: 9 Nov 2021

Edited: 18 Nov 2021
