Line of Best Fit through Scattered Data

jgillis16 on 11 Oct 2015
Commented: Star Strider on 12 Oct 2015
I need to find the line of best fit through my scatterplot. I have attached my text file; my code is below.
clear all
fid = fopen( 'oo20.txt');
data = textscan(fid, '%f%f', 'Delimiter', '|', 'TreatAsEmpty','~');   % '~' fields are read as NaN
fclose(fid);
GalList.year = data{1};   % year of each event
D = data{2};              % distance of each event
X1 = GalList.year;
Y1 = D;
scatter(X1,Y1);
ylim([0 20])
  3 Comments
jgillis16 on 11 Oct 2015
Thanks for the reminder, Star.
X1 represents the year of each plotted event, while Y1 represents the distance at which the event occurs. I just wanted to see whether I could derive some sort of fit from the data, rather than stopping at a messy scatterplot.
Star Strider on 11 Oct 2015
I took a quick look at the file, and it seems reasonable to delete this line:
~|3.4028886
Does it have any specific significance?
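In the question's code, `'TreatAsEmpty','~'` makes `textscan` return a `~` field like the one in this line as `NaN` (its default `EmptyValue`), and a single `NaN` row is enough to make least-squares estimates from `\` or `polyfit` meaningless. Below is a minimal sketch, assuming such rows are the problem: the numbers are hypothetical stand-ins for `X1` and `Y1`, and every row containing a `NaN` is dropped before fitting.

```matlab
% Hypothetical data standing in for X1 (year) and Y1 (distance);
% the NaN mimics a '~' field read with 'TreatAsEmpty','~'
x = [1990; 1995; NaN; 2000; 2005; 2010];
y = [ 5.0;  7.5; 3.4; 10.0; 12.5; 15.0];

% Keep only the rows where both values are numbers
keep = ~any(isnan([x y]), 2);
x = x(keep);
y = y(keep);

% Biparametric (intercept + slope) least-squares fit with backslash
DesignMtx = [ones(size(x)) x];
B2 = DesignMtx \ y;   % B2(1) = intercept, B2(2) = slope
fprintf('intercept = %g, slope = %g, rows used = %d\n', B2(1), B2(2), numel(x));
```

With the NaN row removed, the five remaining points lie on a straight line, so the fit recovers it exactly.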

Accepted Answer

Star Strider on 11 Oct 2015
I don’t see anything strange about the data, but the regression is failing. Both variables are column vectors. With the full data set, the biparametric regression (intercept and slope) estimates both parameters as zero, and the uniparametric regression (intercept at the origin) estimates its single parameter as zero. Trying it with polyfit gives NaN for both parameters, so it’s not my code. (I deleted the polyfit calls from the posted code.) It’s not obvious to me what the problem is, but with three attempts failing, something is very wrong somewhere.
Interestingly, everything works fine with a random sample of 280 data pairs, giving an intercept of 31 BCE (so that’s when astronomy began!) and a slope of +0.022 (is that Galaxies Discovered/Year?). Using more than 280 pairs breaks the code for some reason.
I don’t believe the duplicated years should cause problems, since linear regression is usually robust to them. If you have any insights into what the problem may be with your full data set, please share them. You know the data better than I do, and what they should look like.
I plotted a linear fit tonight. Are there other models, perhaps more descriptive of whatever you’re observing, that you’d like to try?
This is the end of my day, so I’ll come back to this in the morning.
My code:
fidi = fopen('jgillis16 oo20.txt', 'rt');
D = textscan(fidi, '%f|%f', 'CollectOutput',1, 'TreatAsEmpty','~');   % '~' fields become NaN
fclose(fidi);
X1 = D{1}(:,1);                         % Year
Y1 = D{1}(:,2);                         % Distance
RandRows = randi(length(X1), 280, 1);   % 280 random row indices (drawn with replacement)
X1 = X1(RandRows);                      % Hypothesis: Works With Random Subset -> Accepted
Y1 = Y1(RandRows);
DesignMtx = [ones(size(X1)) X1];        % MODEL: DesignMtx*B2 = Y1
B2 = DesignMtx\Y1;                      % Linear Biparametric Regression: Estimate Parameters
Yhat2 = DesignMtx*B2;                   % Linear Biparametric Regression: Generate Line
B1 = X1\Y1;                             % Linear Uniparametric Regression: Estimate Parameter
Yhat1 = X1*B1;                          % Linear Uniparametric Regression: Generate Line
XTX = (DesignMtx'*DesignMtx);           % X'X (diagnostic only, not used below)
figure(1)
scatter(X1, Y1, 'bp')
hold on
plot(X1, Yhat2, '-r')
hold off
grid
xlabel('Year')
ylabel('Distance (Parsecs)')
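On the question of trying other models: one simple way to judge whether a curve describes the data better than a straight line is to compare the residual sum of squares (SSE) of polynomials of increasing degree. The sketch below is illustrative only. `x` and `y` are hypothetical stand-ins for `X1` and `Y1` with any NaN rows already removed, and the quadratic is just one candidate model.

```matlab
% Hypothetical year/distance pairs standing in for X1, Y1 (no NaN rows)
x = (1990:2010)';
y = 0.02*(x - 2000).^2 + 0.5*(x - 2000) + 10;

sse = zeros(1, 2);
for deg = 1:2
    % Three-output polyfit centres and scales x, which avoids
    % ill-conditioning when x holds calendar years
    [p, ~, mu] = polyfit(x, y, deg);
    yhat = polyval(p, x, [], mu);     % evaluate with the same centring/scaling
    sse(deg) = sum((y - yhat).^2);    % residual sum of squares
    fprintf('degree %d: SSE = %g\n', deg, sse(deg));
end
```

A higher degree can never increase the SSE on the same data, so only a substantial drop suggests that the extra term is worth keeping.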
  3 Comments
jgillis16 on 12 Oct 2015
Yes, it's nearly impossible to get anything out of this. I am going to try a different approach to this set of data. Thanks regardless, Star!
Star Strider on 12 Oct 2015
My pleasure!
Extrapolating back to the x-intercept, the first galaxy discovered was in January 1409. It was undoubtedly the Milky Way, because it has a distance of zero (we’re in it).
Having fun with the numbers...

More Answers (2)

Image Analyst on 11 Oct 2015
There are other, more sophisticated methods, but try polyfit() and polyval(). See the attached demo.
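The attached demo is not reproduced in this thread. As a minimal sketch of the polyfit()/polyval() workflow, assuming column vectors with no NaN entries (the data below are made up and lie exactly on a line, so the result is easy to check):

```matlab
% Hypothetical data standing in for X1 (year) and Y1 (distance)
x = [1990; 1995; 2000; 2005; 2010];
y = [ 5.0;  7.5; 10.0; 12.5; 15.0];

% Degree-1 polynomial fit: p(1) is the slope, p(2) the intercept
p = polyfit(x, y, 1);

% Evaluate the fitted line on a fine grid spanning the data
xfit = linspace(min(x), max(x), 100);
yfit = polyval(p, xfit);
```

To overlay the line on a scatterplot: `scatter(x, y); hold on; plot(xfit, yfit, '-r'); hold off`.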

Matt J on 11 Oct 2015
Edited: Matt J on 11 Oct 2015
The following attempts to fit the data to an equation A*X1+B*Y1=C:
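X1=X1(:);                        % force column vectors
Y1=Y1(:);
e=-ones(length(X1),1);           % constant column, so [X1,Y1,e]*[A;B;C] = A*X1 + B*Y1 - C
[~,~,V]=svd( [X1,Y1,e], 0);      % economy-size SVD
ABC=V(:,end);                    % unit vector minimizing norm([X1,Y1,e]*ABC)
A=ABC(1);
B=ABC(2);
C=ABC(3);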
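The result above is in implicit form. To draw it on the scatterplot, it can be solved for Y1 whenever B is not close to zero: Y1 = (C - A*X1)/B. Below is a minimal sketch on made-up points lying exactly on a known line, so the recovered values can be checked. The data and the names `slope` and `intercept` are illustrative and do not come from the thread.

```matlab
% Hypothetical points lying exactly on y = 0.5*x + 2
X1 = (0:5)';
Y1 = 0.5*X1 + 2;

% SVD fit of A*X1 + B*Y1 = C, as in the answer above
e = -ones(length(X1), 1);
[~, ~, V] = svd([X1, Y1, e], 0);
ABC = V(:, end);                 % right singular vector of the smallest singular value
A = ABC(1); B = ABC(2); C = ABC(3);

% Rewrite as Y1 = slope*X1 + intercept (valid only when B is not ~0)
slope = -A / B;
intercept = C / B;
```

Because only the ratios of A, B and C matter, the arbitrary sign of the singular vector does not affect `slope` or `intercept`.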
