Fastest way of computing standard errors / inverting X'X
Hello all,
I'm running a Monte Carlo study and need to evaluate many linear regressions y = X b + u very efficiently. Specifically, I need to estimate a regression and compute the standard errors of the estimates many times, so speed is very important while accuracy matters less. MATLAB's built-in functions such as lscov and regress take a fair amount of time to run, so I wrote a small function to do these tasks myself.
However, since I need the standard errors, the function is still quite slow. I use inv(X' * X) to get the inverse, as it appears in the standard-error calculation (and it is noticeably faster than X' * X \ eye(size(X, 2))). Is there a faster way of doing this, e.g. via some smart factorization? I saw a suggestion to use a QR decomposition and then inv(R) to obtain inv(X' * X), but that was even slower.
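[For reference, the quantity under discussion is the OLS coefficient vector and its standard errors, sqrt of the diagonal of s²(X'X)⁻¹. A minimal NumPy sketch of that computation — the data, dimensions, and coefficients here are made up for illustration, not the poster's actual setup:]

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 500, 3                             # hypothetical sample size and regressors
X = rng.standard_normal((n, k))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)

# OLS via least squares (avoids forming inv(X'X) just to get the coefficients)
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
sigma2 = resid @ resid / (n - k)          # residual variance s^2
XtX_inv = np.linalg.inv(X.T @ X)          # (X'X)^{-1}, needed for the SEs
se = np.sqrt(sigma2 * np.diag(XtX_inv))   # standard errors of the coefficients
```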
Any help is greatly appreciated!
Accepted Answer
Additional Answers (1)
Matt Tearle
3 January 2012
inv is evil -- avoid! The quickest way to do a regression is
c = X\y;
then you can calculate errors yourself
resid = X*c - y;
% etc
but are you doing this numerous times for different y but the same X? If so, maybe an LU decomposition would help.
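[If X really were fixed across replications, the factor-once idea could look like the following NumPy sketch — shown with QR rather than LU, and with an entirely hypothetical simulation setup:]

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, reps = 400, 3, 1000                  # made-up dimensions
X = rng.standard_normal((n, k))

# Factor once: economy QR, X = Q R with R k-by-k upper triangular
Q, R = np.linalg.qr(X)                     # 'reduced' mode by default
Rinv = np.linalg.solve(R, np.eye(k))       # R^{-1}, computed once
XtX_inv = Rinv @ Rinv.T                    # (X'X)^{-1} = R^{-1} R^{-T}

for _ in range(reps):                      # only cheap solves inside the loop
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n)
    b = Rinv @ (Q.T @ y)                   # equivalent to X \ y, since X = Q R
    resid = y - X @ b
    se = np.sqrt(resid @ resid / (n - k) * np.diag(XtX_inv))
```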
10 comments
mickey
3 January 2012
Matt Tearle
3 January 2012
Oh, sorry, I need to read more carefully. I was thinking of the standard error of the regression, rather than the standard errors of the coefficients.
How big is X? (And I assume it's changing each time?) Unless k is large, I don't think the inverse calculation is actually going to consume that much time. If N is large, the calculation of X'*X may be the time sink. Have you tried running it in the Profiler?
mickey
3 January 2012
Matt Tearle
3 January 2012
But have you separated out the calculations of X'*X and inv(...)? Not that it matters much unless there's some other way to calculate the standard errors without X'*X...
Long shot, but... if it's only 3 variables, is there any chance there are simple closed-form formulas for the SEs that don't require going through X'X? I seem to recall you can rewrite the SE formulas for the slope/intercept of a simple linear regression, but I don't remember whether that actually saves any sum-of-products calculations, or whether you still need all the "cross term" dot products. Probably :(
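[For the two-variable (simple regression) case at least, such closed-form expressions do exist: SE(slope) = s/√Sxx and SE(intercept) = s·√(1/n + x̄²/Sxx), with s² the residual variance. A NumPy sketch on made-up data — whether an analogous rewrite pays off for three regressors is the open question in the comment above:]

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.standard_normal(n)
y = 2.0 + 3.0 * x + rng.standard_normal(n)   # hypothetical data

xbar, ybar = x.mean(), y.mean()
Sxx = np.sum((x - xbar) ** 2)                # centered sum of squares
Sxy = np.sum((x - xbar) * (y - ybar))        # centered cross product

b1 = Sxy / Sxx                               # slope
b0 = ybar - b1 * xbar                        # intercept
resid = y - b0 - b1 * x
s2 = resid @ resid / (n - 2)                 # residual variance

se_b1 = np.sqrt(s2 / Sxx)                            # SE of slope
se_b0 = np.sqrt(s2 * (1.0 / n + xbar ** 2 / Sxx))    # SE of intercept
```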
mickey
3 January 2012
Matt Tearle
3 January 2012
Well, don't keep me in suspense -- what was the bottleneck?
Anyway, riffing off the approach in that paper, it seems like
[~,r] = qr(x);
rinv = r(1:3,:)\eye(3);
vcov2 = rinv*rinv';
works a bit faster than inv(x'*x). But you indicated in your initial question that this was slower for you. Odd.
mickey
3 January 2012
Matt Tearle
3 January 2012
When I test it, I get the same result for rinv*rinv' as inv(x'*x), to within roundoff. (Which is what should happen, in theory.) I thought extracting the non-zero rows of r might dispose of a few redundant calculations. But if inv() is the bottleneck... maybe a brute-force inverse calculation would help? This seems to be a tiny bit faster than inv:
[~,r] = qr(x);
rinv = diag(1./diag(r));
rinv(2,3) = -r(2,3)*rinv(3,3)/r(2,2);
rinv(1,2) = -r(1,2)*rinv(2,2)/r(1,1);
rinv(1,3) = (-r(1,2)*rinv(2,3)-r(1,3)*rinv(3,3))/r(1,1);
vcov = rinv*rinv';
But, after all that, I must have been doing something wrong before, because I'm now getting that inv(x'*x) is faster after all.
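[The hand-unrolled triangular inverse above can be checked against the identity inv(X'X) = R⁻¹R⁻ᵀ, since X'X = R'R for the QR factors. A NumPy translation of the same back-substitution, with indices shifted to 0-based and a made-up X:]

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 3))   # hypothetical design matrix

# Economy QR: X = Q R with R 3-by-3 upper triangular
_, R = np.linalg.qr(X)

# Explicit back-substitution for the inverse of a 3x3 upper triangular matrix
Rinv = np.diag(1.0 / np.diag(R))
Rinv[1, 2] = -R[1, 2] * Rinv[2, 2] / R[1, 1]
Rinv[0, 1] = -R[0, 1] * Rinv[1, 1] / R[0, 0]
Rinv[0, 2] = (-R[0, 1] * Rinv[1, 2] - R[0, 2] * Rinv[2, 2]) / R[0, 0]

vcov = Rinv @ Rinv.T   # equals inv(X'X) up to roundoff
```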
mickey
4 January 2012