Gaussian Mixture Model - not working
조회 수: 1 (최근 30일)
이전 댓글 표시
Hi everybody this is the following code I wrote:
%flowdata = readtable("flodata.txt");
%%
% column 1: pressure at pipe start [bars]
% column 2: acoustic noise at start [standard deviation]
% column 3: pressure at pipe end (10km from start) [bars]
% column 4: acoustic noise at pipe end[standard deviation]
% column 5: flow rate [m3/hour]
load('flowdata.mat');
% summary(flowdata)
% Vectors of Pressures
pressureIN = flowdata.pressureAtPipeStart_bars_;
pressureOUT= flowdata.pressureAtPipeEnd_10kmFromStart__bars_;
% Vectors of Acoustic Noises at IN and at OUT
aNoiseIN=flowdata.acousticNoiseAtStart_standardDeviation_;
aNoiseOUT=flowdata.acousticNoiseAtPipeEnd_standardDeviation_;
% Vector of flowrate
flowrate = flowdata.flowRate_m3_hour_;
% % Table of flowRate
% flowRate = table(flowrate);
% rawData = [pressureIN aNoiseIN pressureOUT aNoiseOUT flowrate];
sample_distance = 10;
%t = (linspace(0, 7948790, 794879))/86400;
[samples,channels] = size(flowdata);
PHeadLoss = pressureIN-pressureOUT;
t = linspace(0, samples*sample_distance, samples)/86400;
figure()
plot(flowrate, PHeadLoss, '.')
densityplot(flowrate, PHeadLoss, [50, 50])
xlabel('flowrate')
ylabel('PHeadLoss')
% histogram on raw data
hist(flowrate, 5000)
title('Histogram on raw data flowrate and PHeadLoss')
hold on
%hist(flowrate + 0.1*randn(numel(flowrate), 1), 5000) % leggera perturbazione rispetto ai valori originali
hist(PHeadLoss, 5000)
%flowrate
xlabel('flowrate and PHeadLoss')
% all main curves overlaid
figure()
plot(t, pressureIN, '-g',t, pressureOUT, '-r',t, flowrate, '-b');
axis tight;
title(' Input , Output Pressure , FR vs time BEFORE EVERY CLEANING');
legend('Input Pressure','Output Pressure','Flow Rate');
xlabel('Tempo [Days]');
ylabel('all');
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
rawData = [pressureIN pressureOUT flowrate];
% Removing the NaN values
% Creo un array di indici logici che indica quali righe contengono almeno un NaN
idx_nan = any(isnan(rawData), 2);
cleanData = rawData;
% Elimino le righe selezionate
cleanData(idx_nan, :) = [];
[samples,channels] = size(cleanData);
t = linspace(0, samples*sample_distance, samples)/86400;
% Vectors of Pressures
pressureIN = cleanData(:,1);
pressureOUT= cleanData(:,2);
flowrate = cleanData(:,3);
PHeadLoss = pressureIN - pressureOUT;
%DATA
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
hold on
% After having removed NaN
densityplot(flowrate, PHeadLoss, [50,50]);
% Now I remove the negative values of pressures and flowrate (No physical
% meaning)
% Creo un array di indici logici che indica quali righe hanno almeno un valore negativo
idx_neg = any(cleanData < 0, 2);
% Elimino le righe selezionate
cleanData(idx_neg, :) = [];
[samples,channels] = size(cleanData);
t = linspace(0, samples*sample_distance, samples)/86400;
pressureIN = cleanData(:,1);
pressureOUT= cleanData(:,2);
flowrate = cleanData(:,3);
PHeadLoss = pressureIN - pressureOUT;
%DATA
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
%hold on
densityplot(flowrate, PHeadLoss, [50,50]);
sample_distance = 10;
%t = (linspace(0, 7948790, 794879))/86400;
[samples,channels] = size(cleanData);
t = linspace(0, samples*sample_distance, samples)/86400;
% all main curves overlaid
figure()
plot(t, pressureIN, '-g',t, pressureOUT, '-r',t, flowrate, '-b');
axis tight;
title(' Input , Output Pressure , FR vs time');
legend('Input Pressure','Output Pressure','Flow Rate');
xlabel('Tempo [Days]');
ylabel('all');
% First hypothesis
% large variations on the output pressure can be seen at days 1 to 15 ,
% not related to input pression
% we should not use that portion of the data
% It's a brutal cleaning but...
% I am sceptical about what happens during the first 15 days :
% lots of peaks are seen on the output pressure side that are
% not reflected / correlated to the input side ,
% so I prefer to discard the first 20 days (yes it's brutal !)
ind = t>13;
t = t(ind);
pressureIN = pressureIN(ind);
pressureOUT = pressureOUT(ind);
flowrate = flowrate(ind);
PHeadLoss = pressureIN-pressureOUT;
figure()
plot(t, pressureIN, '-g',t, pressureOUT, '-r',t, flowrate, '-b');
axis tight;
title(' Input , Output Pressure , FR vs time');
legend('Input Pressure','Output Pressure','Flow Rate');
xlabel('Tempo [Days]');
ylabel('all');
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
%DATA
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
densityplot(flowrate, PHeadLoss, [50,50]);
% Second Hypothesis
% keep data (valid) for gradient below a certain threshold (and non zero
% FR too btw)
% i also don't like those sudden drops to zero ,
% because to me its like dividing zero by zero ,
% when you want to make a relationship between two variables ,
% I avoid using background noise data , it will blurr the result.
% This function gradient calculates the gradient of a signal,
% which is the rate of change of the signal over time.
pressureIN_grad = abs(gradient(pressureIN));
pressureOUT_grad = abs(gradient(pressureOUT));
flowrate_grad = abs(gradient(flowrate));
threshold = 0.0073;
ind1 = pressureIN_grad<threshold*max(pressureIN_grad);
ind2 = pressureOUT_grad<threshold*max(pressureOUT_grad);
ind3 = flowrate_grad<threshold*max(flowrate_grad);
ind4 = flowrate> 0;
ind = ind1 & ind2 & ind3 & ind4;
t = t(ind);
pressureIN = pressureIN(ind);
pressureOUT = pressureOUT(ind);
flowrate = flowrate(ind);
figure()
plot(t, pressureIN, '.g',t, pressureOUT, '.r',t, flowrate, '.b');
axis tight;
title(' Input , Output Pressure , FR vs time');
legend('Input Pressure','Output Pressure','Flow Rate');
xlabel('Tempo [Days]');
ylabel('all');
PHeadLoss = pressureIN - pressureOUT;
densityplot(flowrate, PHeadLoss, [50,50])
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
%DATA
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
densityplot(flowrate, PHeadLoss, [50,50]);
% Third Hypothesis
% PRESSURE HEAD LOSS (APPROX _no fattore di attrito, flusso del fluido e
% densità del fluido)
PHeadLoss = pressureIN - pressureOUT;
% consider only positive values
ind = PHeadLoss> 0;
t = t(ind);
pressureIN = pressureIN(ind);
pressureOUT = pressureOUT(ind);
flowrate = flowrate(ind);
PHeadLoss = pressureIN - pressureOUT;
densityplot(flowrate, PHeadLoss, [50,50])
%DATA
figure('Name',' PRESSURE HEAD LOSS vs FLOWRATE')
plot(flowrate, PHeadLoss, '*k');
title('Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
densityplot(flowrate, PHeadLoss, [50,50]);
%% FITTING SU PRIMA PARABOLA
%flowrate = flowrate + 0.1*randn(numel(flowrate), 1)
figure()
plot(flowrate, PHeadLoss, '*k')
hold on
% Now I create the FITTING CURVE manually guided by data
% mettere passo 0.0001 anche se ci vuole troppo tempo
x1 = 0:0.01:max(flowrate);
%y1 = 0.002941548*x1.^2+45;
a1 = 0.0029415;
y1 = a1.*x1.^2+45;
plot(x1, y1, '.r')
title('FITTING CURVE Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
legend('DATA', 'FITTING CURVE 1')
%% FITTING SU SECONDA PARABOLA
figure()
plot(flowrate, PHeadLoss, '*k')
%% LA PERDITA DI CARICO DELLA PRESSIONE E' PROPORZIONALE AL QUADRATO DELLA
%% PORTATA.
x2= 0:0.1:max(flowrate);
%y2 = 0.005.*x2.^2+45; (DA RICORDARE!)
a2 = 0.005;
y2 = 0.005.*x2.^2+45;
hold on
plot(x2, y2, '.r')
title('FITTING CURVE Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
legend('DATA', 'FITTING CURVE 2')
%densityplot(x2, y2, [50,50])
% Now I create the FITTING CURVE manually guided by data
% mettere passo 0.0001 anche se ci vuole troppo tempo
x = 0:0.01:max(flowrate);
%y1 = 0.002941548*x1.^2+45;
figure()
plot(flowrate, PHeadLoss, '*k')
hold on
a = (a1+a2)/2;
y = a.*x.^2+45;
plot(x, y, '.r')
title('MEDIAN FITTING CURVE Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
legend('DATA', 'FITTING CURVE')
% plot(y1, x1)
figure()
plot(flowrate, PHeadLoss, '*k', x1, y1, '-r', x2, y2, '-r', x, y, '.r')
title('MEDIAN FITTING CURVES Pressure Head Loss vs FLOW RATE');
xlabel('flow rate');
ylabel('P Head Loss');
legend('DATA', 'FITTING CURVE 1', 'FITTING CURVE 2', 'FITTING CURVE')
% Now I compute the distances from each point to the fitted curve
[xy,distance_abs,t_a]= distance2curve([x' y'], [flowrate PHeadLoss]);
plot(xy(:,1), xy(:,2), '.g') % projections of minimum distances points on the fitted curve
dx = xy(:,1)-flowrate;
dy = xy(:,2)-PHeadLoss;
figure()
plot(dx, dy, '*k')
xlabel('dx')
ylabel('dy')
figure()
scatter(dx, dy, 'filled')
xlabel('dx');
ylabel('dy');
%ACCURACY and its mean value
accuracy = sqrt(dx.^2 + dy.^2);
figure()
hist(accuracy, 100)
accuracy_mean = mean(accuracy);
figure
hist(dx, 100)
xlabel('dx')
figure()
hist(dy, 100)
ylabel('dy')
figure()
histogram2(dx, dy, 100)
xlabel('dx');
ylabel('dy');
densityplot(dx, dy, [50,50])
xlabel('dx');
ylabel('dy');
dx = (dx-mean(dx));
dx = dx/prctile(abs(dx), 99);
dy = (dy-mean(dy));
dy = dy/prctile(abs(dy), 99);
% bond.x deve essere una riga!
bond.x = dx;
bond.y = dy;
% bond.x = flowrate;
% bond.y = PHeadLoss;
%% Visualize Data
% Visualization of the slice data in the component "corp".
% Coupon Rate vs. YTM plot, differentiated by credit rating
gscatter(bond.x,bond.y)
% Label the plot
xlabel('bondx')
ylabel('bondy')
%% Select Features to use for Clustering
% Getting the data from the colums Coupon rate, YTM, currente Yeld and
% credit rating.
% Features
% Number of Clusters to create
%eva=evalclusters([bond.x' bond.y'],'kmeans','CalinskiHarabasz','KList',[1:6]);
eva=evalclusters([bond.x bond.y],'gmdistribution','CalinskiHarabasz','KList',[1:6]);
numClust = eva.OptimalK;
%% Gaussian Mixture Models (GMM)
gmobj = fitgmdist([bond.x bond.y],numClust);
gidx = cluster(gmobj,[bond.x bond.y]);
gscatter(bond.x, bond.y, gidx)
% Visualize results
I am having problems since I don't get what I expect from GMM running.
What I get is sthing similar like this :
May someone Help me in solving this?
If you wanna try to run the data, don't hesitate to leave your mail so I can send you since they are too big to send here as attachment!
댓글 수: 1
답변 (1개)
Nipun
2024년 4월 15일
Hi Giuseppe,
I understand that you are getting a different plot for the mentioned Gaussian mixture model than expected.
However, it is not feasible for me to help you with a solution since you have not described your use case or shown the expected plot.
Can you clarify what are you trying to model (the physical relationship, constraints or any equations) and the expected model? I shall be able to help you further.
Regards,
Nipun
댓글 수: 0
참고 항목
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!