How to perform linear curve fitting based on distance from line instead of residuals

13 views (last 30 days)
I already know how to perform a classic 1st-order curve fit using the polyfit function. However, I find that the polyfit line does not accurately follow my current data, which has a high slope and a rounded trend. Based on the sum-of-squared-errors approach used by polyfit, it seems to be over-weighting certain points. Is there a way to do a 1st-order curve fit that is guided by the total distance of the points from the line, instead of being based on the vertical residuals?

Accepted Answer

John D'Errico on 5 Jul 2022
Edited: John D'Errico on 5 Jul 2022
This is the classic total least squares problem. It is sometimes called the errors-in-variables problem, and sometimes orthogonal regression.
Lacking your data, I can't demonstrate it on your actual numbers, but it is not difficult to do. Assume you have data in the form of vectors x and y; I'll make some up here:
t = rand(100,1);
x = t/10 + randn(size(t))/10;
y = 5*t + randn(size(t))/10;
plot(x,y,'o')
axis equal
The actual slope should have been 50 (since x = t/10 and y = 5*t, the noise-free relationship is y = 50*x). What happens when we use polyfit?
P1 = polyfit(x,y,1)
P1 = 1×2
2.9081 2.3460
The problem is that polyfit always produces biased estimates in these problems: the slope is biased towards zero. Do you see how small the polyfit-estimated slope is? Instead of 50, we got a number around 2.9081 from polyfit.
The trick is to use total least squares. We can get the coefficients of the line from:
mux = mean(x);
muy = mean(y);
[~,~,V] = svd([x(:)-mux,y(:)-muy],0);
coef = V(:,end)
coef = 2×1
-0.9998 0.0176
AEst = -coef(1)/coef(2)
AEst = 56.7846
So despite the rather high noise level in the data, we get a slope estimate of 56.7846, instead of the 2.9 value that polyfit predicted. That seems pretty reasonable.
The constant term in a regression model will be:
BEst = muy - AEst*mux
BEst = 0.0527
The line of best orthogonal fit is:
y = 0.0527 + 56.7846*x
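As a quick usage check (not part of the original answer), here is a minimal sketch that overlays the total least squares line and the ordinary polyfit line on the simulated data; it assumes x, y, P1, AEst, and BEst are still in the workspace from the snippets above.
% Compare the ordinary least squares fit (vertical residuals) with the
% total least squares fit (orthogonal residuals) on the simulated data.
plot(x,y,'o')
hold on
xs = linspace(min(x),max(x),100);
plot(xs,polyval(P1,xs),'-')      % polyfit line, slope ~ 2.9
plot(xs,AEst*xs + BEst,'-')      % orthogonal fit, slope ~ 56.8
legend('data','polyfit','total least squares','Location','best')
axis equal
hold off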
1 Comment
Damon Aboud on 18 Jul 2022
Wow, thank you very much for your thorough help on this task!! This is super helpful, and it totally solves my problem.


More Answers (5)

Sulaymon Eshkabilov on 5 Jul 2022
You can try a scaling factor here. Your data seem to be within [355 365]; thus, subtract 355 from x and then perform the linear fit.
You could also try fitlm(), which is very similar to polyfit().
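Not part of the original answer, but a minimal sketch of the fitlm() suggestion, assuming the Statistics and Machine Learning Toolbox is available, x and y are column vectors, and 355 is the offset mentioned above:
% Remove the large offset from x, fit an ordinary linear model, then map
% the intercept back to the original x units.
xs = x - 355;
mdl = fitlm(xs,y);                                      % fits y ~ 1 + xs
slope = mdl.Coefficients.Estimate(2);
intercept = mdl.Coefficients.Estimate(1) - slope*355    % intercept for the original x
Note that shifting x mainly improves numerical conditioning; the result is still an ordinary vertical-residual least squares fit.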

Torsten on 5 Jul 2022
Edited: Torsten on 5 Jul 2022
The residuals in an ordinary least squares fit are the vertical distances of the points from the line, and it is the sum of their squares that gets minimized.
If you want to use the orthogonal distances of the points from the line instead, you can proceed as follows:
Calculate the distance d_i of a point P = (x_i, y_i) from the line y = a*x + b; in closed form, d_i = |a*x_i + b - y_i| / sqrt(1 + a^2).
Once you have this distance, you can formulate the optimization problem as
min_{a,b} sum_i d_i^2
(a sketch of this appears just below).
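Not part of the original answer: a minimal sketch of this formulation, using the closed-form point-to-line distance with fminsearch on the same simulated data used elsewhere in this thread (Torsten's follow-up below uses an explicit projection instead).
rng("default")
t = rand(100,1);
x = t/10 + randn(size(t))/10;
y = 5*t + randn(size(t))/10;
% Sum of squared orthogonal distances from the line y = a*x + b,
% with p(1) = a (slope) and p(2) = b (intercept).
obj = @(p) sum((p(1)*x + p(2) - y).^2)/(1 + p(1)^2);
sol = fminsearch(obj,[1 1])   % should agree with the projection-based code below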

Torsten on 6 Jul 2022
Edited: Torsten on 6 Jul 2022
rng("default")
t = rand(100,1);
x = t/10 + randn(size(t))/10;
y = 5*t + randn(size(t))/10;
plot(x,y,'o')
sol = fminsearch(@(p)fun(p,x,y),[1 1])
sol = 1×2
54.4774 0.2832
fun(sol,x,y)
ans = 1.0309
hold on
plot(x,sol(1)*x+sol(2))
function res = fun(p,x,y)
% Sum of squared orthogonal distances of the points (x,y) from the line y = a*x + b
a = p(1);
b = p(2);
dsquared = zeros(size(x));                       % preallocate
for i = 1:numel(x)
    xproj = (x(i)+a*(y(i)-b))/(1+a^2);           % orthogonal projection of (x(i),y(i)) onto the line
    yproj = a*xproj+b;
    dsquared(i) = (xproj-x(i))^2+(yproj-y(i))^2;
end
res = sum(dsquared);
end
1 Comment
Torsten on 6 Jul 2022
Edited: Torsten on 6 Jul 2022
For comparison, the SVD-based total least squares approach from the accepted answer gives the same line for this data:
rng("default")
t = rand(100,1);
x = t/10 + randn(size(t))/10;
y = 5*t + randn(size(t))/10;
mux = mean(x);
muy = mean(y);
[~,~,V] = svd([x(:)-mux,y(:)-muy],0);
coef = V(:,end)
coef = 2×1
-0.9998 0.0184
AEst = -coef(1)/coef(2)
AEst = 54.4774
BEst = muy - AEst*mux
BEst = 0.2832



Matt J on 18 Jul 2022

Image Analyst on 18 Jul 2022
In cases like this, with mostly vertical data, I just swap x and y, and that seems to provide a good fit. The residuals are then horizontal rather than vertical, but the fitted line looks reasonable. Of course, you would only do this in situations where it physically makes sense, like fitting a line to the side of some blob in a binary image; you wouldn't do it for something like time-series measurements. In that case, "orthogonal" residuals would not have any theoretical reason for being right either.
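Not part of the original answer: a minimal sketch of the swap-x-and-y idea, fitting x as a function of y with horizontal residuals and then converting back to y = m*x + c form; it assumes the data are in vectors x and y and that the fitted slope Q(1) is nonzero.
% Fit x as a function of y, then invert the relationship.
Q = polyfit(y,x,1);        % x ~ Q(1)*y + Q(2)
m = 1/Q(1);                % slope of y vs x
c = -Q(2)/Q(1);            % intercept of y vs x
plot(x,y,'o',x,m*x+c,'-')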
