Different confidence intervals for regression slope

Question


Can anyone explain why I am getting different answers for the confidence limits for the slope of a linear regression when I use polyfit and polyparci compared with using fitlm and coefCI? For example, the following code generates some linearly correlated data with added noise, then does the least-squares fit three ways (directly, using polyfit, and using fitlm), extracting the key items of data at each step:

clear variables
x = (0:10)';
Y = 3.5*x + (((rand(size(x))-0.5)/3).*x);
% option 1: direct least-squares solve with the backslash operator
X = [ones(size(Y)), x];
B1 = X\Y;
Ycalc = X*B1;
R21 = 1 - sum((Y - Ycalc).^2)/sum((Y - mean(Y)).^2);
R2a1 = 1 - ((1-R21)*(length(Y)-1)/(length(Y)-length(B1)));
clear X Ycalc
% option 2: polyfit, with confidence limits from polyparci
[p,S] = polyfit(x,Y,1);
B2 = fliplr(p)';
coef = corrcoef(x,Y);
R22 = coef(1,2)^2;
R2a2 = 1 - ((1-R22)*(length(Y)-1)/(length(Y)-length(B2)));
ci2 = polyparci(p,S,0.95);
clear p S coef
% option 3: fitlm, with confidence limits from coefCI
mdl = fitlm(x,Y,'y ~ x1');
B3 = mdl.Coefficients{:,1};
R23 = mdl.Rsquared.Ordinary;
R2a3 = mdl.Rsquared.Adjusted;
ci3 = coefCI(mdl,0.05);
ci3 = fliplr(ci3');
clear mdl

As one would expect, all of the approaches produce the same regression coefficients, R-squared and adjusted R-squared values. However, the confidence intervals generated by polyparci and coefCI are different. In all cases I have tried, the range of the confidence limits returned by coefCI is wider than that from polyparci.
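As a reference point for the comparison, here is a minimal sketch of the textbook two-sided interval computed straight from the least-squares quantities in option 1. It assumes the Statistics and Machine Learning Toolbox is available for tinv, and the names ciManual and ciToolbox are illustrative only. It is not a description of what polyparci does internally.

```matlab
% Two-sided 95% confidence intervals from first principles, compared with coefCI.
% Assumes the Statistics and Machine Learning Toolbox (tinv, fitlm, coefCI).
rng(0);                                    % fixed seed so the run is repeatable
x = (0:10)';
Y = 3.5*x + (((rand(size(x))-0.5)/3).*x);  % same data model as in the question
X = [ones(size(x)), x];                    % design matrix: intercept, slope
B = X\Y;                                   % least-squares coefficients
df = numel(Y) - numel(B);                  % residual degrees of freedom
resid = Y - X*B;
s2 = sum(resid.^2)/df;                     % residual variance estimate
covB = s2 * ((X'*X)\eye(2));               % coefficient covariance matrix
se = sqrt(diag(covB));                     % standard errors
alpha = 0.05;                              % total probability outside the interval
tcrit = tinv(1 - alpha/2, df);             % two-sided critical value
ciManual = [B - tcrit*se, B + tcrit*se];   % rows: intercept, slope; cols: lower, upper
mdl = fitlm(x, Y);                         % default model is y ~ 1 + x1
ciToolbox = coefCI(mdl, alpha);            % same row and column layout
maxDiff = max(abs(ciManual(:) - ciToolbox(:)));
fprintf('largest difference from coefCI: %g\n', maxDiff);
```

If the two agree to rounding error, coefCI(mdl,0.05) is using the quantile at 1 - alpha/2, that is, 2.5% in each tail.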

Can anyone explain why the methods produce different results?

Thanks, Brian


Answer 1

Brian Scannell, 28 April 2017


Ah, I think I've resolved it. There appears to be a difference in the way that the two functions interpret the confidence level (alpha) argument. Calling polyparci(p,S,0.95) and calling coefCI(mdl,0.1) give the same answers.

I'm still not sure which set of limits is most appropriately described as the "95% confidence intervals" though - any views?
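The match between 0.95 in one call and 0.1 in the other is what you would expect if one function takes a one-sided quantile and the other splits alpha across both tails. A small sketch of that arithmetic follows, assuming the Statistics and Machine Learning Toolbox for tinv; the variable names are illustrative.

```matlab
% One-sided versus two-sided critical values of Student's t distribution.
% Assumes the Statistics and Machine Learning Toolbox (tinv).
df = 9;                               % 11 data points minus 2 fitted coefficients
tOneSided95 = tinv(0.95, df);         % 5% in the upper tail only
tTwoSided90 = tinv(1 - 0.10/2, df);   % 5% in each tail, 10% in total
tTwoSided95 = tinv(1 - 0.05/2, df);   % 2.5% in each tail, 5% in total
fprintf('one-sided 95%%: %.4f\n', tOneSided95);
fprintf('two-sided 90%%: %.4f\n', tTwoSided90);
fprintf('two-sided 95%%: %.4f\n', tTwoSided95);
```

The first two values are the same quantile (about 1.833 for 9 degrees of freedom), while the two-sided 95% value is larger (about 2.262), which is consistent with coefCI(mdl,0.05) returning the wider interval.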


Answer 2

Star Strider, 28 April 2017


I originally tested polyparci only with nlparci, and the estimates then were essentially the same. I posted it before fitlm appeared.

Change the ‘tstat’ assignment in polyparci to:

tstat = @(tval) (max(alpha,(1-alpha)) - t_cdf(tval,PolyS.df) );    % Function to calculate t-statistic for p = ‘alpha’ and v = ‘PolyS.df’

and the results are identical with nlparci, fitlm and regress.

Thank you for discovering this glitch with the ‘alpha’ argument. I’ll update polyparci and post it.

Comments: 2

Brian Scannell, 28 April 2017

I am less confused by the alpha versus 1 - alpha issue than by the fact that to get matching results I have to specify 0.95 in polyparci and 0.1 (effectively 0.9) in coefCI.

I am interpreting the results from polyparci as being "there is a 95% probability that the "true" gradient is less than the calculated upper limit". Similarly, "there is a 95% probability that the "true" gradient is more than the lower limit". Taken together, it means there is a 10% chance that the "true" gradient is outside the bounds defined by the upper and lower limits.

So if the alpha input to coefCI is for the probability of the "true" gradient being outside the returned limits, then the factor two difference in the alpha value for the two functions makes sense.

But is this a correct interpretation of the outputs from the two functions?

Is this a distinction between "confidence limits" and "confidence interval"?

Thanks for your help.

Star Strider, 28 April 2017

My pleasure.

With the correction I posted, there is no ambiguity, and the confidence interval will be the same.

My impression is that the confidence interval calculation in nlparci changed between the time I wrote the function and now. I changed my function to accord with the current behavior of the MATLAB Statistics and Machine Learning Toolbox functions.

‘Taken together, it means there is a 10% chance that the "true" gradient is outside the bounds defined by the upper and lower limits.’

That is incorrect, at least as I read it. For a 95% confidence interval (alpha = 0.05), there is a 95% probability that the true value is within those limits and a 5% probability (2.5% in each tail) that it lies outside them.
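One way to check which critical value produces an interval with 95% coverage is a small Monte Carlo experiment. The sketch below is illustrative only. It assumes homoscedastic Gaussian noise with unit standard deviation (unlike the x-proportional uniform noise in the question), a true slope of 3.5, and the Statistics and Machine Learning Toolbox for tinv.

```matlab
% Monte Carlo estimate of how often the slope interval contains the true slope.
% Assumptions: Gaussian noise with sigma = 1, true slope 3.5, tinv available.
rng(1);                                  % fixed seed so the run is repeatable
x = (0:10)';
X = [ones(size(x)), x];                  % design matrix: intercept, slope
trueB = [0; 3.5];
df = numel(x) - 2;                       % residual degrees of freedom
XtXinv = (X'*X)\eye(2);
t975 = tinv(0.975, df);                  % 2.5% in each tail
t950 = tinv(0.950, df);                  % 5% in each tail
nTrials = 20000;
hits975 = 0;
hits950 = 0;
for k = 1:nTrials
    Y = X*trueB + randn(size(x));        % new noisy sample
    B = X\Y;                             % least-squares fit
    s2 = sum((Y - X*B).^2)/df;           % residual variance estimate
    seSlope = sqrt(s2*XtXinv(2,2));      % standard error of the slope
    err = abs(B(2) - trueB(2));
    hits975 = hits975 + (err <= t975*seSlope);
    hits950 = hits950 + (err <= t950*seSlope);
end
cover975 = hits975/nTrials;
cover950 = hits950/nTrials;
fprintf('coverage with tinv(0.975,df): %.3f\n', cover975);
fprintf('coverage with tinv(0.950,df): %.3f\n', cover950);
```

Under these assumptions the first coverage should come out close to 0.95 and the second close to 0.90, so the interval built from the 0.975 quantile is the one conventionally called the 95% confidence interval.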

The terms ‘confidence limits’ and ‘confidence interval’ are essentially the same. The context must be clear if either term is used. I prefer the term ‘confidence limits’.
