Line of Best Fit through Scattered Data
I need to find the line of best fit through my scatterplot. I have attached my text file, and my code is the following.
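clear all
% Read two '|'-delimited numeric columns; '~' entries are read as NaN
fid = fopen('oo20.txt');
data = textscan(fid, '%f%f', 'Delimiter', '|', 'TreatAsEmpty', '~');
fclose(fid);
GalList.year = data{1};   % first column: year
D = data{2};              % second column
X1 = GalList.year;
Y1 = D;
scatter(X1, Y1);
ylim([0 20])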
Comments: 3
Star Strider
11 Oct 2015
I just peeked at it and it seems reasonable to delete this one:
~|3.4028886
Does it have any specific significance?
Accepted Answer
Star Strider
11 Oct 2015
I don’t see anything strange about the data, but the regression is failing. The data are both column vectors. With the full set of data, the parameters I estimate are both zero for the biparametric regression, and for the uniparametric (origin intercept) regression, the single parameter is zero. Trying it with polyfit results in both parameters being NaN, so it’s not my code. (I deleted the polyfit calls in the posted code.) It’s not obvious to me what problems there may be, but with three attempts failing, something is very wrong somewhere.
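One possible cause, offered as an assumption rather than something confirmed in this thread: because the file is read with 'TreatAsEmpty','~', a row such as ~|3.4028886 would come through with NaN in the first column, and a single NaN in the inputs is enough to spoil a least-squares fit. A minimal sketch of dropping incomplete pairs before fitting, using made-up stand-in data (the values and the names valid, Xc, Yc, and p are hypothetical):

```matlab
% Synthetic stand-in data, invented for illustration (not the poster's file)
X = [1990; 1995; NaN; 2000; 2005; 2010];   % NaN mimics a '~' entry
Y = [2.0; 4.1; 3.4; 6.2; 7.9; 10.1];

% Keep only rows where both values are present
valid = ~isnan(X) & ~isnan(Y);
Xc = X(valid);
Yc = Y(valid);

% First-degree fit on the cleaned data: p(1) is the slope, p(2) the intercept
p = polyfit(Xc, Yc, 1);
```

If the full data set fits cleanly after this filtering, the NaN row was the likely culprit; if not, the cause lies elsewhere.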
Of interest, everything works fine with a random sample of 280 data pairs, giving an intercept of 31 BCE (so that’s when astronomy began!), and a slope of +0.022 (is that Galaxies Discovered/Year?). Any more than 280 breaks the code for some reason.
I don’t believe the duplicated years should cause problems, since linear regression is usually robust to such. If you have any insights as to what the problem may be with your full data set, please share them. You know them better than I do, and what they should look like.
I plotted a linear fit tonight. Are there other models you’d like to try that might better describe what you’re observing?
This is the end of my day, so I’ll come back to this in the morning.
My code:
fidi = fopen('jgillis16 oo20.txt', 'rt');
D = textscan(fidi, '%f|%f', 'CollectOutput',1, 'TreatAsEmpty','~');
fclose(fidi);                                   % Close the file once it has been read
X1 = D{:}(:,1);
Y1 = D{:}(:,2);
RandRows = randi(length(X1), 280, 1);           % Random row indices (randi samples with replacement)
X1 = X1(RandRows);                              % Hypothesis: Works With Random Subset -> Accepted
Y1 = Y1(RandRows);
DesignMtx = [ones(size(X1)) X1];                % MODEL: DesignMtx*B = Y1
B2 = DesignMtx\Y1;                              % Linear Biparametric Regression: Estimate Parameters [intercept; slope]
Yhat2 = DesignMtx*B2;                           % Linear Biparametric Regression: Generate Line
B1 = X1\Y1;                                     % Linear Uniparametric Regression: Estimate Parameter (slope only)
Yhat1 = X1*B1;                                  % Linear Uniparametric Regression: Generate Line
XTX = (DesignMtx'*DesignMtx);                   % X'X (computed for inspection; not used below)
figure(1)
scatter(X1, Y1, 'bp')
hold on
plot(X1, Yhat2, '-r')
hold off
grid
xlabel('Year')
ylabel('Distance (Parsecs)')
Comments: 3
Star Strider
12 Oct 2015
My pleasure!
Extrapolating back to the x-intercept, the first galaxy discovered was in January 1409. It was undoubtedly the Milky Way, because it has a distance of zero (we’re in it).
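For reference, the x-intercept of a fitted line y = b0 + b1*x is the x at which y is zero, i.e. x = -b0/b1. A small sketch with illustrative coefficients (the values in B are made up for the example and are not the fit from this thread):

```matlab
% Hypothetical coefficients in the same [intercept; slope] order that
% DesignMtx\Y1 returns for DesignMtx = [ones(size(X1)) X1]
B = [-44.0; 0.022];

% Solve 0 = B(1) + B(2)*x for x
xIntercept = -B(1) / B(2);

% Sanity check: the fitted line evaluates to (numerically) zero there
yAtIntercept = B(1) + B(2)*xIntercept;
```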
Having fun with the numbers...
More Answers (2)
Image Analyst
11 Oct 2015
There are other more sophisticated methods, but try polyfit() and polyval(). See attached demo.
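The attached demo is not reproduced here. As a rough sketch of the polyfit()/polyval() approach, assuming synthetic data invented for illustration (the variable names and values are hypothetical, not taken from the demo):

```matlab
% Synthetic data, invented for illustration (not the poster's file)
x = (1990:2:2010)';
y = 0.4*(x - 1990) + 2 + [0.3 -0.2 0.1 0 -0.4 0.2 0.5 -0.1 0 -0.3 0.1]';

% Fit a first-degree polynomial. Requesting S and mu makes polyfit center
% and scale x, which helps conditioning when x holds large values like years.
[p, S, mu] = polyfit(x, y, 1);

% Evaluate the fitted line on a fine grid, passing the same S and mu back
xFit = linspace(min(x), max(x), 100)';
yFit = polyval(p, xFit, S, mu);

% Overlay the fitted line on the scatter plot
scatter(x, y)
hold on
plot(xFit, yFit, '-r')
hold off
xlabel('x')
ylabel('y')
```

Note that with the three-output form, p describes the line in the scaled variable (x - mu(1))/mu(2), so the slope in original units is p(1)/mu(2).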