Fitting data to a Gaussian when data is a little bit skewed (but not too much)

Hello, I have some data below that should approx to a Gaussian.
1.00 NaN
2.00 4.00
3.00 3.00
4.00 6.00
5.00 9.00
6.00 7.00
7.00 9.00
8.00 57.00
9.00 141.00
10.00 205.00
11.00 204.00
12.00 190.00
13.00 86.00
14.00 23.00
15.00 5.00
16.00 4.00
17.00 7.00
18.00 5.00
19.00 7.00
20.00 4.00
21.00 5.00
I understand its not perfect, but when I give my gaussian fitting the parameters represented by the orange curve, it fails to do any kind of fit.
This is my Routine
b0 = 11.00
c0 = 1.73 % This is actually sqrt (fwhm guess = 3)
function [fwhm,xdataFine,fitGaus,xpeak,ypeak,rsquared]=myGaussianFit2(xdata,ydata, b0,c0)
%--------------------Gaussian Fit----------------------------------------
%Input is xdata, ydata and must be in column vector format
%only two initial guesses required as the following are determine by ydata
a0 = max(ydata(:)); %-min(ydata(:));
d0 = min(ydata(:));
%b0 is the guess at x-location of peak
%c0 is guess at width of peak
%Output gives the fitted parameters as well as xpeak location and its
%yvalue (i.e. xpeak & ypeak
%-------------------------------------------------------------------------
%Define Gauss Equation (remember the . notation
gaussEqn ='a*exp(-0.5*((x-b)/(c^2)).^2)+d';
%Use startpoint to define guess values (otherwise fit doesn't perform well)
try
%f = fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0]); %use c^2 instred of c to enforce postive sigma/fwhm
[f,gof]=fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0]); %use c^2 instead of c to enforce postive sigma/fwhm
coeffs=coeffvalues(f);
a=coeffs(1); b=coeffs(2);
c=coeffs(3); %need to square it as used c^2 in fitting equation to enforce +ve values
c=c^2;
d=coeffs(4);
rsquared=gof.rsquare;
fwhm=c*sqrt(log(256));
% %Increase resolution of x data (by 30)
xdataFine=(linspace(xdata(1),xdata(end),100))';
% Create high res Gauss function
fitGaus = a*exp(-0.5*((xdataFine-b)/c).^2)+d;
%Find max value and its x location
ypeak=max(fitGaus);
xpeak=b;
xpeak=mean(xpeak(:)); %Take mean incase more than one peak found
catch
a=0;b=0;c=0;d=0;xpeak=0;ypeak=0;
end
Is there anyhting I can do here to make the fit work, although I understand its not great, but I still think it should be able to fit something!

댓글 수: 1

This seems to help.
Assuming its the peak value that could be wrong, so "exclude" it
[idx,v]=find(ydata == a0); %a0 is max of ydata
try
[f,gof]=fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0]); %use c^2 instead of c to enforce postive sigma/fwhm
catch
[f,gof]=fit(xdata,ydata,gaussEqn,'Normalize','off', 'StartPoint',[a0,b0,sqrt(c0),d0],'Exclude',[idx]);
end

댓글을 달려면 로그인하십시오.

 채택된 답변

xy=[1.00 NaN
2.00 4.00
3.00 3.00
4.00 6.00
5.00 9.00
6.00 7.00
7.00 9.00
8.00 57.00
9.00 141.00
10.00 205.00
11.00 204.00
12.00 190.00
13.00 86.00
14.00 23.00
15.00 5.00
16.00 4.00
17.00 7.00
18.00 5.00
19.00 7.00
20.00 4.00
21.00 5.00];
f=fit(xy(2:end,1),xy(2:end,2),'gauss1')
f =
General model Gauss1: f(x) = a1*exp(-((x-b1)/c1)^2) Coefficients (with 95% confidence bounds): a1 = 224.6 (211, 238.2) b1 = 10.73 (10.62, 10.85) c1 = 2.37 (2.204, 2.536)
Simply removing the NaN observation and using the builtin gaussian seems to work. I didn' take time to try to see what might be going wrong otherwise...
plot(f,xy(:,1),xy(:,2))

댓글 수: 8

Thanks dpb. The reason I didn't use that originally is because it doesn't allow an offset (d) to be incuded in the fit.
out of interest, if I wanted to get the confidence bounds on the "c1" term,
c1 = 2.37 (2.204, 2.536)
how do I access these?
thanks
Jason
Dont worry, found it:
ci = confint(fitresult)
Im accepting dpbs answer, as although Im not using the "gauss1" fit as he suggested, he did point out its the NaNs that are messing with my fits.
So I now have this additional code:
D= [xdata,ydata];
rows = any(isnan(D),2);
D(rows,:) = [];
xdata=D(:,1); ydata=D(:,2);
"... it doesn't allow an offset (d) to be incuded in the fit."
Well, including that I can see why it could cause some issues in fitting; there's certainly nothing in the input data that makes it appear there should be any such term in the model.
If there's reason to think the mode of the fitted distribution should be at or more nearly at the location of the observed maximum(*) and the idea of the offset was to move the model results that way, adding an offset to try to move a symmetric distribution doesn't appear to me to be the solution; using an asymmetric distribution model instead would seem more appropriate.
(*) But, your initial guesses were based more on the middle of the peak it appears, anyway, so it doesn't seem that was the idea, either.
Which in "the MATLAB way" could be written more succinctly as
ix=isfinite(x) & isfinite(y); % test the two vectors instead of building array to test
x=x(ix); y=y(ix);
to save the creation of and subsequent extraction of the two column vectors back out of the temporary array. It does still create the temporary logical addressing vector...
This way keeps the finite values instead of removing the NaN and also would find any Inf values if those also would happen to ever happen.
Small, but just a little less code to write and one variable eliminated....
Jason
Jason 2024년 9월 30일
이동: John D'Errico 2024년 9월 30일
Awesome, thankyou!
It's always a goal of mine to not leave temporary arrays around if they're not really needed...I don't know if the JIT compiler is smart enough that if one were to instead write
x=x(isfinite(x) & isfinite(y));
y=y(isfinite(x) & isfinite(y));
that doesn't store the logical indexing vector it wouldn't actually do the comparison twice but create the temporary behind the scenes anyway. I'm sure there are those who know enough more about the JIT compiler that could answer that, but I'm not one of them...so I tend to write like the first and sacrifice some memory on the assumption am saving a few compute cycles. Whether that's true or not, I really don't know, though, and there's not enough detail in the documentation to give any hints as to whether one way or the other is better--in fact, they go out of their way to tell you to not try to optimize because "things change" and their suggestion is to write as straightforward code as one can and then and only then if it turns out to be too slow try to figure out ways to speed it up...
You might also be interested in the function fitgmdist
help fitgmdist
fitgmdist - Fit Gaussian mixture model to data This MATLAB function returns a Gaussian mixture distribution model (GMModel) with k components fitted to data (X). Syntax GMModel = fitgmdist(X,k) GMModel = fitgmdist(X,k,Name,Value) Input Arguments X - Data numeric matrix k - Number of components positive integer Name-Value Arguments CovarianceType - Type of covariance matrix 'diagonal' | 'full' (default) Options - Iterative EM algorithm optimization options statset options structure ProbabilityTolerance - Tolerance for posterior probabilities 1e-8 (default) | nonnegative scalar value in range [0,1e-6] RegularizationValue - Regularization parameter value nonnegative scalar | 0 (default) Replicates - Number of times to repeat EM algorithm positive integer | 1 (default) SharedCovariance - Flag indicating whether all covariance matrices are identical logical false (default) | logical true Start - Initial value setting method 'randSample' | 'plus' (default) | vector of integers | structure array Output Arguments GMModel - Fitted Gaussian mixture model gmdistribution model Examples openExample('stats/ClusterDataUsingAGaussianMixtureModelExample') openExample('stats/RegularizeGaussianMixtureModelEstimationExample') openExample('stats/ViewGaussianMixtureModelsOverComponentLevelsExample') openExample('stats/DetermineGaussianMixtureFitUsingAICExample') openExample('stats/SetInitialValuesWhenFittingGaussianMixtureModelsExample') See also gmdistribution, cluster Introduced in Statistics and Machine Learning Toolbox in R2014a Documentation for fitgmdist doc fitgmdist

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

도움말 센터File Exchange에서 Resizing and Reshaping Matrices에 대해 자세히 알아보기

제품

릴리스

R2023b

태그

질문:

2024년 9월 30일

댓글:

2024년 10월 1일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by