Comparing two curve fits (using AIC?)
Hello all,
Let's say for a moment that I have some a priori ideas about a family of functions that might best describe a particular data set. After I fit the data with each candidate, I can simply look at the outputs (e.g., RMSE, r²) and choose the one with the best values. However, is there a way to gain some degree of confidence about that selection? For example, suppose I have reason to believe that the true distribution of a data set like the one below is described by a low-order integer exponent in the function y=b×xⁿ+c. I might capture some data and then try to fit it as follows:
% Generate some data
b = 1; % True coefficient
n = 3; % True exponent
c = 0; % True y-intercept
rx = randi(100, [100, 1])/10;
rn = n + rand([100, 1])/4 - 0.125; % Noisy exponent around the true value
rc = (rand([100, 1]) * 500) - 250 + c;
ry = b * rx.^(rn) + rc;
% Generate and fit the models
m = @(b, c, n, x) b * x.^n + c;
m1 = @(p, x) m(p(1), p(2), 1, x); % n=1
m2 = @(p, x) m(p(1), p(2), 2, x); % n=2
m3 = @(p, x) m(p(1), p(2), 3, x); % n=3
c1 = lsqcurvefit(m1, [1, 0], rx, ry)
c2 = lsqcurvefit(m2, [1, 0], rx, ry)
c3 = lsqcurvefit(m3, [1, 0], rx, ry)
% Plot the results
x = linspace(min(rx), max(rx));
plot(rx, ry, 'ok', x, m1(c1, x), '-g', x, m2(c2, x), '-r', x, m3(c3, x), '-b')
% Determine which model is most likely to be correct. Use AIC?
In the above example, m3 should generally fit the data best (unless the random number generation is very unlucky), because that's the model the data were generated from.
For those who are familiar with Prism: there, I might perform this test by starting an "analysis", choosing "compare", selecting "for each data set, which of two equations (models) fits best", and then choosing the "Akaike's Information Criterion" test. After running that on one set of data, I got
[Screenshot of Prism's AIC comparison results omitted.]
This is very helpful because it tells me how certain I can be about the fit choice (i.e., 86% sure n=3 is better than n=2). What's the best way to do something similar in MATLAB?
Thanks in advance.
Accepted Answer
Additional Answers (1)
lsqcurvefit returns the residual norm of the fit as its second output. The model with the smallest residual norm would be the "best" one by this criterion:
% Generate some data
b = 1; % True coefficient
n = 3; % True exponent
c = 0; % True y-intercept
rx = randi(100, [100, 1])/10;
rn = n + rand([100, 1])/4 - 0.125; % Noisy exponent around the true value
rc = (rand([100, 1]) * 500) - 250 + c;
ry = b * rx.^(rn) + rc;
% Generate and fit the models
m = @(b, c, n, x) b * x.^n + c;
m1 = @(p, x) m(p(1), p(2), 1, x); % n=1
m2 = @(p, x) m(p(1), p(2), 2, x); % n=2
m3 = @(p, x) m(p(1), p(2), 3, x); % n=3
[c1, resnorm1] = lsqcurvefit(m1, [1, 0], rx, ry);
[c2, resnorm2] = lsqcurvefit(m2, [1, 0], rx, ry);
[c3, resnorm3] = lsqcurvefit(m3, [1, 0], rx, ry);
% Plot the results
x = linspace(min(rx), max(rx));
plot(rx, ry, 'ok', x, m1(c1, x), '-g', x, m2(c2, x), '-r', x, m3(c3, x), '-b')
% Determine which model is most likely to be correct. Use AIC?
[~, idx] = min([resnorm1 resnorm2 resnorm3]) % Index of best model
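Those residual norms can also be converted into AIC values and Akaike weights, which give the same kind of "% sure" figure that Prism reports. This is only a sketch, continuing from the variables above and assuming normally distributed errors; the formula AIC = N·log(RSS/N) + 2k is the standard least-squares form.

```matlab
% Sketch: AIC from least-squares fits, assuming Gaussian residuals.
% RSS is lsqcurvefit's resnorm; k counts the fitted coefficients
% plus the error variance.
N = numel(ry);
k = 3;                              % b, c, and the noise variance
rss = [resnorm1, resnorm2, resnorm3];
aic = N*log(rss/N) + 2*k;
aicc = aic + 2*k*(k+1)/(N-k-1);     % small-sample correction (AICc)
% Akaike weights: relative probability that each candidate is the
% best model of the set.
d = aicc - min(aicc);
w = exp(-d/2) / sum(exp(-d/2))
```

The largest entry of w is the probability (among these three candidates only) that the corresponding model is the best one, analogous to Prism's "86% sure" output.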
8 Comments
Bjorn Gustavsson
13 Jan 2023
Sure, but the Akaike Information Criterion (and the Bayesian IC and all their siblings) adds a penalty cost to reduce the risk of overfitting; I think that's the feature the OP wonders about. (I'm by no means an expert on the model-selection problem, but I comment since you're MathWorks staff, just to say: MATLAB has a couple of aic functions, but in some specialised add-on toolboxes, not in the Statistics Toolbox where, it seems to me, it should have its "most basic home"...)
James Akula
13 Jan 2023
Bjorn Gustavsson
13 Jan 2023
(I'm getting out on dodgy-looking thin ice over scary deep waters here.) In your example case it should be as "simple" as looking at the parameter covariance matrix, since both models have the same number of parameters if you extend the model to also fit the exponent. If the best value for the exponent is 3 with a standard deviation much less than 1, then that ought to show it is better than an exponent of 2. When it comes to choosing between models with different numbers of parameters, my "statistics vision" is not good enough to bring any enlightenment; I just go with the best AICc model.
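For what it's worth, here is a hypothetical sketch of that idea, using the Jacobian that lsqcurvefit returns at the solution to approximate the parameter covariance matrix. The mfree handle and the starting point [1, 0, 2] are made up for illustration; rx and ry are the data from the question.

```matlab
% Sketch: fit the exponent as a free parameter and estimate its
% uncertainty from the Jacobian at the solution.
mfree = @(p, x) p(1)*x.^p(3) + p(2);          % b, c, n all free
[p, resnorm, ~, ~, ~, ~, J] = lsqcurvefit(mfree, [1, 0, 2], rx, ry);
dof = numel(ry) - numel(p);                   % residual degrees of freedom
sigma2 = resnorm / dof;                       % estimated error variance
C = sigma2 * inv(full(J'*J));                 % approximate covariance matrix
se = sqrt(diag(C));                           % parameter standard errors
fprintf('n = %.3f +/- %.3f\n', p(3), se(3))
```

If you have the Statistics and Machine Learning Toolbox, nlparci can compute confidence intervals from the same residuals and Jacobian instead of forming the covariance by hand.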
Bora Eryilmaz
13 Jan 2023
I am not an expert on the AIC, but since your models have the same number of parameters, it may not be very applicable in this scenario. In terms of model-ranking power, AIC (with the same number of parameters) would give similar results to looking at the residual norm: assuming a normal distribution of modeling errors, the log-likelihood needed for the AIC is basically determined by the residual norm.
In your scenario, you can instead rank the models by the residual norm on a held-out "test dataset", rather than on the "training data" as was done above.
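A minimal sketch of that hold-out idea, reusing the m1, m2, m3 handles and the rx, ry data from the answer above (the 70/30 split is an arbitrary choice for illustration):

```matlab
% Sketch: rank models by residual norm on held-out test data.
shuffled = randperm(numel(rx));
itr = shuffled(1:70);  ite = shuffled(71:end);   % 70/30 train/test split
models = {m1, m2, m3};
testErr = zeros(1, numel(models));
for i = 1:numel(models)
    p = lsqcurvefit(models{i}, [1, 0], rx(itr), ry(itr));
    testErr(i) = sum((ry(ite) - models{i}(p, rx(ite))).^2);
end
[~, best] = min(testErr)   % index of the model with the lowest test error
```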
Bjorn Gustavsson
14 Jan 2023
It really makes me queasy to say that one model is "true" with some probability here, when it is so simple to use the exponent as another free parameter and then calculate the parameter covariance matrix to get an estimate of the uncertainty of the exponent from its standard deviation. I know this is a simplified toy-model example, but it really makes me uneasy...
James Akula
17 Jan 2023
Bjorn Gustavsson
18 Jan 2023
@James Akula, sure, and my suggestion might work well enough for this (with reasonable certainty) simplified toy example, but it can run into severe problems in more complicated real-world cases. I mainly thought it was worth mentioning as a reminder for the case where someone has this specific choice to make. (And I don't need anyone to argue this quandary with; I come down on all three of the two sides in this discussion and have frequent and angry arguments with myself from last week...)