Why does the RMSE obtained by fitlm in MATLAB not match the RMSE calculated in Excel?
Hi everyone. I used mdl = fitlm(x,y) to fit a linear regression model to my dataset. I also calculated the RMSE in Excel using the known formula. fitlm returns exactly the same R-squared as Excel and exactly the same coefficients as the Excel trendline, but the RMSE values from MATLAB and Excel do not match. I have searched widely but am still stuck. Any ideas? Thanks for your help. Best regards.
Comments: 3
the cyclist
27 May 2016
I suggest posting both a *.m file and a *.xls file containing the simplest example you can construct that reproduces the problem.
Answers (2)
John D'Errico
28 May 2016
Edited: John D'Errico
28 May 2016
The formula you know is not the only one in use. There is a subtly different alternative.
You presumably divided by the number of data points. A common alternative definition of RMSE divides instead by the number of data points less the number of parameters estimated, that is, by the error degrees of freedom. The easiest way to verify this is a simple test:
x = randn(100,1);
y = randn(100,1);
lm = fitlm(x,y,'linear')
lm =
Linear regression model:
y ~ 1 + x1
Estimated Coefficients:
                   Estimate              SE                    tStat                 pValue
                   __________________    __________________    __________________    _________________
    (Intercept)    -0.060640930787764    0.0951675587117722    -0.637201706218221    0.52547928403413
    x1             0.0370287221087163    0.0935517335328456    0.395810111799967     0.693105501934868
Number of observations: 100, Error degrees of freedom: 98
Root Mean Squared Error: 0.946
R-squared: 0.0016, Adjusted R-Squared -0.00859
F-statistic vs. constant model: 0.157, p-value = 0.693
lm.RMSE
ans =
0.946014427051301
sqrt(sum((y - lm.predict(x)).^2/100))
ans =
0.936506503055594
sqrt(sum((y - lm.predict(x)).^2/98))
ans =
0.946014427051301
As you can see, dividing by the degrees of freedom is what fitlm must be doing.
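The same comparison can be sketched without fitlm, using only base MATLAB. This is a minimal sketch, not a description of what fitlm does internally: the seed, the variable names (n, p, sse, rmse_n, rmse_dof), and the use of polyfit in place of fitlm are all assumptions made for illustration. It assumes a straight-line fit with an intercept, so two parameters are estimated.

```matlab
% Sketch: compute both RMSE conventions for a straight-line fit.
% Assumption: the Excel value was obtained by dividing by n.
rng(0);                          % hypothetical seed, only for repeatability
x = randn(100,1);
y = randn(100,1);

n = numel(y);                    % number of observations
p = 2;                           % estimated parameters: intercept and slope

c   = polyfit(x, y, 1);          % ordinary least-squares line
res = y - polyval(c, x);         % residuals
sse = sum(res.^2);               % sum of squared errors

rmse_n   = sqrt(sse / n);        % divides by the number of data points
rmse_dof = sqrt(sse / (n - p));  % divides by the error degrees of freedom

% The two conventions differ only by the factor sqrt(n/(n-p)),
% so the two printed numbers should agree.
fprintf('%.6f  %.6f\n', rmse_dof, rmse_n * sqrt(n / (n - p)));
```

If the fitlm value is needed for comparison with a spreadsheet that divides by n, multiplying lm.RMSE by sqrt((n - p)/n) should give the matching number.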
Comments: 3
Greg Heath
2 June 2016
You are asking which one is correct.
Well, they all are correct. They are just different measures of the same model. You are free to choose any one you want. HOWEVER, if the differences are significant then you should be able to explain why.
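For reference, the size of the difference can be written down exactly. The notation here is an assumption introduced for illustration: n is the number of observations, p the number of estimated parameters, and \hat{y}_i the fitted values.

```latex
\mathrm{RMSE}_{n} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\mathrm{RMSE}_{n-p} = \sqrt{\frac{1}{n-p}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\frac{\mathrm{RMSE}_{n-p}}{\mathrm{RMSE}_{n}} = \sqrt{\frac{n}{n-p}}
```

With n = 100 and p = 2 the ratio is about 1.0102, which agrees with 0.9460 / 0.9365 in the example above. The ratio approaches 1 as n grows relative to p, so the two measures differ noticeably only for small data sets or models with many parameters.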
Since I are un injuneer and not a statistician, I will refer you to Google and Wikipedia re the search words
tutorial degrees-of-freedom
Hope this helps.
Greg
Anurag Banerjee
4 July 2018
Engineers doing statistics. One day statisticians will design cars.