How to define a custom equation in fitlm function for linear regression?

Spirit on 22 Nov 2017
Commented: the cyclist on 8 Jun 2023
I'd like to define a custom equation for linear regression, for example y = a*log(x1) + b*x2^2 + c*x3 + k. This is a linear regression problem because the model is linear in the coefficients a, b, c and k, but how do I specify it with the fitlm function?
Thanks, Shriram

Answers (3)

the cyclist on 22 Nov 2017
Edited: the cyclist on 22 Nov 2017
% Set the random number seed for reproducibility
rng default
% Make up some pretend data
N = 100;
x1 = rand(N,1);
x2 = rand(N,1);
x3 = rand(N,1);
a = 2;
b = 3;
c = 5;
k = 7;
noise = 0.2*randn(N,1);
y = a*log(x1) + b*x2.^2 + c*x3 + k + noise;
% Put the variables into a table, naming them appropriately
tbl = table(log(x1),x2.^2,x3,y,'VariableNames',{'log_x1','x2_sqr','x3','y'});
% Specify and carry out the fit
mdl = fitlm(tbl,'y ~ 1 + log_x1 + x2_sqr + x3')
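Because the model is linear in a, b, c and k, the fit can be cross-checked without fitlm by building the design matrix by hand and solving the least-squares problem with the backslash operator. This is only a sketch that regenerates the same made-up data as above; the names X and coef are introduced here for illustration and are not part of the original answer.

```matlab
% Regenerate the same pretend data (same seed, same order of random draws)
rng default
N = 100;
x1 = rand(N,1);
x2 = rand(N,1);
x3 = rand(N,1);
noise = 0.2*randn(N,1);
y = 2*log(x1) + 3*x2.^2 + 5*x3 + 7 + noise;

% Design matrix: one column per term, with a column of ones for the intercept k
X = [ones(N,1), log(x1), x2.^2, x3];

% Ordinary least squares; coef is ordered as [k; a; b; c]
coef = X \ y;
```

The entries of coef should agree with mdl.Coefficients.Estimate from the fitlm call, which additionally reports standard errors, t-statistics and p-values.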
  Comments: 5
Walter Roberson on 8 Jun 2023
If you have a discontinuity in the first or second derivative of the model, then you surely do not have a linear regression situation.
You probably need to use ga(). Not fmincon() or similar -- those optimizers cannot handle discontinuities in derivatives either.
the cyclist on 8 Jun 2023
I suggest you search the keywords segmented regression matlab and/or piecewise regression matlab. Although I don't believe there are any built-in functions for this, you should find a few different threads that you might find useful. I also think you might want to start a brand-new question for this, after you have done that search. In that question, I would suggest posting your data, which makes it easier for people to try out code suggestions.
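As a rough illustration of what segmented (piecewise) regression can look like, the sketch below fits two connected line segments by scanning a grid of trial breakpoints and solving an ordinary least-squares problem at each one. It is one common approach, not necessarily what the threads mentioned above use; the data, the breakpoint value and all variable names are made up for the example.

```matlab
% Made-up data: a line whose slope changes at x = xc_true
rng default
n = 200;
x = linspace(0, 10, n)';
xc_true = 4;                               % hypothetical breakpoint
y = 1 + 0.5*x + 2*max(0, x - xc_true) + 0.2*randn(n,1);

% For a fixed breakpoint the model is linear in its coefficients,
% so scan trial breakpoints and keep the one with the smallest SSE
candidates = linspace(1, 9, 161);
sse = zeros(size(candidates));
for j = 1:numel(candidates)
    X = [ones(n,1), x, max(0, x - candidates(j))];   % hinge term keeps the segments connected
    coef = X \ y;
    sse(j) = sum((y - X*coef).^2);
end
[~, jbest] = min(sse);
xc_hat = candidates(jbest);

% Refit at the selected breakpoint: [intercept; slope; change in slope]
Xbest = [ones(n,1), x, max(0, x - xc_hat)];
coef_hat = Xbest \ y;
```

Statistics computed from the final refit treat the breakpoint as known, so they do not reflect the uncertainty in xc_hat itself.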



laurent jalabert on 19 Dec 2021
Edited: laurent jalabert on 19 Dec 2021
To fit a custom function, you can also use the nonlinear regression model (fitnlm).
The example below fits a basic second-order resistance versus temperature model, R = R0*(1 + alpha*(T-T0) + beta*(T-T0)^2), where the fit coefficients are b(1) = R0, b(2) = alpha and b(3) = beta.
The advantage here is that the standard errors (SE) are computed directly for R0, alpha and beta.
beta0 is the vector of initial guesses for [R0 alpha beta].
b(n) is retrieved using mdl.Coefficients.Estimate(n), for n = 1, 2, 3.
The standard errors of the coefficients are retrieved with mdl.Coefficients.SE(n).
(fitnlm requires the Statistics and Machine Learning Toolbox.)
clear tbl mdl
% your vector data T_T0 and R of same dimension
tbl = table(T_T0,R);
modelfun = @(b,x)b(1).*(1+b(2).*x(:,1)+b(3).*x(:,1).^2);
beta0 = [100 1e-3 1e-6];
mdl = fitnlm(tbl,modelfun,beta0,'CoefficientNames',{'R0';'alpha';'beta'})
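The snippet above assumes T_T0 and R already exist in the workspace. A self-contained sketch with synthetic data shows the whole workflow end to end; the numerical values used to generate the data (R0 = 100, alpha = 3.9e-3, beta = -5.8e-7, noise level 0.05) are made up for illustration and do not come from the original answer.

```matlab
% Synthetic resistance-versus-temperature data (all values are made up)
rng default
T_T0 = linspace(0, 200, 50)';             % temperature offset T - T0
R0_true = 100; alpha_true = 3.9e-3; beta_true = -5.8e-7;
R = R0_true*(1 + alpha_true*T_T0 + beta_true*T_T0.^2) + 0.05*randn(50,1);

% The last table variable (R) is used as the response by default
tbl = table(T_T0, R);
modelfun = @(b,x) b(1).*(1 + b(2).*x(:,1) + b(3).*x(:,1).^2);
beta0 = [100 1e-3 1e-6];                  % initial guesses for [R0 alpha beta]
mdl = fitnlm(tbl, modelfun, beta0, 'CoefficientNames', {'R0';'alpha';'beta'});

% Estimates and their standard errors, in the order R0, alpha, beta
est = mdl.Coefficients.Estimate;
se  = mdl.Coefficients.SE;
```

Here est(1) and se(1) refer to R0, est(2) and se(2) to alpha, and est(3) and se(3) to beta.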
  Comments: 1
Erin Evans on 7 Jun 2023
Would you be able to help me write this such that there is a conditional statement in it? I essentially need two connected trendlines such that the statistics are for both sections together.
My equation is:
log10(Qs) = log10(a) + b*log(Qr) + c*log10(Max(1, Qr / Qc)) + d*log10(Qr(i) / Qr(i - 1))
The code I have so far is:
AgaDisc = readtable("File Path");
%% Bring in data columns
AgaDischargeArray = table2array(AgaDisc(:,"Discharge"));
AgaLoadArray = table2array(AgaDisc(:,"SedimentLoad"));
%% Normalize the discharge and sediment data
meanDisc = mean(AgaDischargeArray);
medianLoad = median(AgaLoadArray);
AgaDischargeArray = AgaDischargeArray/meanDisc;
AgaLoadArray = AgaLoadArray/medianLoad;
%% log10 sediment
logQs = log10(AgaDischargeArray);
%% log10 discharge
logQt = log10(AgaLoadArray);
%% Create new table of log-normalized data
tbl = table(logQs, logQt);
%% Add weighting factor
weights = zeros(length(logQt), 1);
weightFactor = [0.5, 1, 5, 10];
Q = quantile(logQt, 3);
for i = 1:length(logQt)
    if logQt(i) > Q(3)
        weights(i) = weightFactor(4);
    elseif logQt(i) > Q(2)
        weights(i) = weightFactor(3);
    elseif logQt(i) > Q(1)
        weights(i) = weightFactor(2);
    else
        weights(i) = weightFactor(1);
    end
end
m = fitlm(tbl, 'logQs ~ logQt', 'RobustOpts', 'on', 'weight', weights);



laurent jalabert on 8 Jun 2023
Please check your expression carefully, because you use both log10 and log (which I take to be the natural logarithm here).
The equation has log10(Qs), but your program has logQs = log10(AgaDischargeArray); are these the same quantity?
d*log(Qr(i)/Qr(i-1)) can be written as d*diff(log(Qr)).
log10(Qs) = log10(a) + b*log(Qr) + c*log10(Max(1, Qr / Qc)) + d*log10(Qr(i) / Qr(i - 1))
logQs is x(:,1), the first column of tbl.
logQt is x(:,2), the second column of tbl.
a, b, c and d are unknown.
Qc is not defined.
d*diff(log(Qr)) will cause a problem because its length is length(Qr) - 1.
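To make the last two points concrete: if Qc were known, the equation would be linear in log10(a), b, c and d, so it could be fitted with fitlm after building each term as a predictor and dropping the first sample so that every column has length(Qr) - 1 rows. The sketch below uses synthetic data and makes several assumptions that are not from the thread: Qc = 1.5 is a hypothetical value, log10 is used for every term, and the coefficients used to generate the response are invented.

```matlab
% Synthetic discharge series (made up), strictly positive
rng default
n  = 200;
Qr = exp(randn(n,1));
Qc = 1.5;                                  % hypothetical threshold, ASSUMED known

% Build one predictor per term; drop the first sample to match diff()
x1 = log10(Qr(2:end));                     % log10(Qr)
x2 = log10(max(1, Qr(2:end)./Qc));         % log10(max(1, Qr/Qc))
x3 = diff(log10(Qr));                      % log10(Qr(i)/Qr(i-1)), length n-1

% Invented coefficients, used only to generate a response for the demo
logQs = log10(2) + 1.2*x1 + 0.8*x2 + 0.3*x3 + 0.05*randn(n-1,1);

% The intercept estimates log10(a); x1, x2, x3 estimate b, c, d
tbl = table(x1, x2, x3, logQs);
mdl = fitlm(tbl, 'logQs ~ 1 + x1 + x2 + x3');
est = mdl.Coefficients.Estimate;
```

If Qc is itself unknown, the model is no longer linear in its parameters, which is the situation the earlier comments about discontinuous derivatives refer to.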
