Fittype and fit do not work as expected (in a curve fitting)

조회 수: 78 (최근 30일)
Sim
Sim 2024년 10월 30일 13:42
댓글: Sim 2024년 10월 31일 10:31
Let's consider the following curve fitting of a set of data, by using the fittype and fit functions:
% inputs
x = [1 4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [0 335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
% curve fitting
ft = fittype('a*(x^n)','dependent','y','independent','x');
[fitresult, ~] = fit(x',y',ft,'Startpoint', [0 0]);
% plots
hold on
p1 = plot(x,y,'o-','LineWidth',1.5,'color','b','DisplayName','inputs');
p2 = plot(x,(fitresult.a .* (x.^fitresult.n)),'-','LineWidth',1.5,'color','k','DisplayName','best fit');
set(gca, 'XScale', 'log', 'YScale', 'log');
legend([p1 p2])
However if I perform the curve fitting with EzyFit, I get the following (correct) result, as expected:
Why don't fittype and fit functions work correctly in this example?
Did I use those two functions in a wrong way?

채택된 답변

John D'Errico
John D'Errico 2024년 10월 30일 14:12
편집: John D'Errico 2024년 10월 30일 14:23
x = [1 4 9 15 25 35 45 55 65 75 85 95 150 250 350 450 900];
y = [0 335 28 37 9 4 3.5 2 1 0.7 0.6 0.5 0.3 0.1 0.01 0.005 0.001];
Look CAREFULLY at your data!
What is y(1)? Do you see that it is tremendously out of character, inconsistent with the rest of your data?
y(1)
ans = 0
Of course, when you plot it on a log scale, MATLAB does not plot that point, because log(0) is undefined. (-inf if you insist.) HOWEVER, when you do a fit, fit sees it!
And what that does is completely drag your curve around. y(1) = 1000 or larger might be more consistent. But not zero.
ft = fittype('a*(x^n)','dependent','y','independent','x');
Next, starting ANY optimization at [0,0] is a HIGHLY, HUGELY dangerous thing to do.
[fitresult, ~] = fit(x(2:end)',y(2:end)',ft,'Startpoint', [1 -1])
fitresult =
General model: fitresult(x) = a*(x^n) Coefficients (with 95% confidence bounds): a = 1.222e+04 (4056, 2.038e+04) n = -2.596 (-3.072, -2.12)
p1 = plot(x,y,'o-','LineWidth',1.5,'color','b','DisplayName','inputs');
hold on
p2 = plot(x,(fitresult.a .* (x.^fitresult.n)),'-','LineWidth',1.5,'color','k','DisplayName','best fit');
set(gca, 'XScale', 'log', 'YScale', 'log');
legend([p1 p2])
hold off
Is the fit very good, when plotted on a log scale? Well, not really. But it never is a good idea to do this sort of thing. Why not? Because MATLAB will not worry about TINY deviations in those lower points, even though in proportional terms, it will be very obvious on a log scale.
What you should see is the fit fits the first two points quite well, and then misses the rest. But again, think about what a log scale does!!!!!
fit assumes an additive error model, with the errors presumed to be homogeneous, and additive. That is a bad thing when your dats spans 4 or 5 powers of 10. It would be far smarter to log your data and your model, THEN do the fit. Now instead of proportional error, you will have additive error. See how nice it gets?
When you log your model, the problem becomes trivial.
Your model: y = a*x^n * (proportional noise)
Logged model: log(y) = log(a) + n*log(x) + (additive homogeneous noise)
We still need to drop the bogus first data point of course. But now your model is just a linear 'poly1' model.
mdl = fit(log(x(2:end)'),log(y(2:end)'),'poly1')
mdl =
Linear model Poly1: mdl(x) = p1*x + p2 Coefficients (with 95% confidence bounds): p1 = -2.292 (-2.525, -2.059) p2 = 9.469 (8.431, 10.51)
Don't forget to exponentiate the coefficient out front!
a = exp(mdl.p2)
a = 1.2953e+04
n = mdl.p1
n = -2.2920
Or, we could use fittype
ft = fittype('log(a) + n*log(x)','dependent','y','independent','x');
Note that I logged x inside fittype, but not y.
[fitresult, ~] = fit(x(2:end)',log(y(2:end)'),ft,'Startpoint', [1 -1])
fitresult =
General model: fitresult(x) = log(a) + n*log(x) Coefficients (with 95% confidence bounds): a = 1.295e+04 (-488, 2.639e+04) n = -2.292 (-2.525, -2.059)
And you see this generates the same result for the parameters.
  댓글 수: 8
John D'Errico
John D'Errico 2024년 10월 31일 9:23
Bound constraints in optimization can sometimes be a problem. As with all constraints, they must employ a tolerance, such that the constraint can sometimes be exceeded by a tiny amount, the constraint tolerance. And while that amount can seem tiny, it is highly important.
For example, suppose you decide to compute the log(x), and you never wish to see a complex result? Is it acceptable to have any negative value of x?
log(-1e-300)
ans = -6.9078e+02 + 3.1416e+00i
The point is, a function like a log or sqrt can never accept ANY negative value at all as an argument, as you will see a complex result.
But when you set a lower bound of 0 for some parameter x, the optimization tool will allow that constraint to be exceeded by the constraint tolerance. And you cannot set the constraint tolerance to be exactly 0.
As I said, when working with logs, constraints can be problematic. You will need to set the lower bound high enough that the lower bound minus the constraint tolerance does not go below zero.
Sim
Sim 2024년 10월 31일 10:31
Very interesting! Yes working with logs can be messy sometimes.. To avoid as much as possible possible troubles, I am following the "path" with the weights that you showed me, thanks a lot :-)

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Linear and Nonlinear Regression에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by