Maximum size for linear regression
What is the maximum size for linear regression problems? One dependent variable.
Answers (1)
John D'Errico
12 Aug 2017
Edited: John D'Errico
12 Aug 2017
There is NO maximum size.
The only limit will be essentially a function of how much memory you have.
Are you asking about a model like y = a*x + b? A simple linear model like this is so trivial to compute, even for a huge number of data points, that I doubt you should be worried.
Do you have hundreds of millions of points for such a model?
How many parameters are you trying to estimate?
For example,
x = randn(1e8,1);
y = randn(size(x));
M = polyfit(x,y,1);
This quick test took roughly 4 seconds to estimate the model for 100 million points. That may seem like it is significant, but it took roughly that long just to generate the random data!
Each of the arrays x and y requires roughly 0.8 gigabytes of RAM just to store. The linear regression itself very temporarily required about 3 more gigabytes of RAM.
So, if I were trying to do this with, say, 500 million data points, I would be stretching the limits of the RAM I have installed, and MATLAB would start doing a bit of disk swapping. Even then, it would be doable, since I have a solid state drive.
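The 0.8 gigabyte figure follows from the 8 bytes MATLAB uses per double. A minimal sketch of that arithmetic, using a smaller vector (1e6 elements, an arbitrary choice) so it runs quickly; the variable names are only illustrative:

```matlab
% Measure the storage per element on a small double vector,
% then scale up to the 1e8 points used in the test above.
x = randn(1e6,1);
info = whos('x');                          % struct with a .bytes field
bytesPerElement = info.bytes / numel(x);   % 8 for a double
gbFor1e8 = 1e8 * bytesPerElement / 1e9;    % 0.8 gigabytes per vector
```

The same calculation gives about 4 gigabytes per vector at 500 million points, before any temporary workspace used by the fit.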
Again though, it is really only the memory you have installed that limits such a computation.
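If memory really did become the limit, one possible workaround (not something I used in the test above) is to accumulate the 2-by-2 normal equations one chunk at a time, so only one chunk needs to be in memory at once. A rough sketch, assuming synthetic data and a hypothetical chunk size; here x and y are held in memory purely for demonstration, where in practice each chunk would be read from disk:

```matlab
% Chunked least squares for y = a*x + b via accumulated normal equations.
% Synthetic data (assumed for illustration): true slope 2, intercept 1.
rng(0);
n = 1e6;
chunkSize = 1e5;                       % hypothetical chunk size
x = randn(n,1);
y = 2*x + 1 + 0.1*randn(n,1);

XtX = zeros(2);                        % running sum of X'*X
Xty = zeros(2,1);                      % running sum of X'*y
for k = 1:chunkSize:n
    idx = k:min(k+chunkSize-1, n);     % rows belonging to this chunk
    Xc = [x(idx), ones(numel(idx),1)]; % design matrix for the chunk
    XtX = XtX + Xc.'*Xc;
    Xty = Xty + Xc.'*y(idx);
end
coef = XtX \ Xty;                      % [slope; intercept]
```

Normal equations are less numerically stable than the QR-based approach used by polyfit and backslash, so treat this as a sketch for well-conditioned problems only.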
Comments: 5
Tom Graney
12 Aug 2017
John D'Errico
12 Aug 2017
Edited: John D'Errico
12 Aug 2017
How fast can I say utterly TRIVIAL? :)
John D'Errico
12 Aug 2017
Edited: John D'Errico
12 Aug 2017
Assuming you mean 5000 observations with 250 unknowns...
A = rand(5000,250);b = rand(5000,1);
timeit(@() A\b)
ans =
0.10432
I saw no more than a tiny blip on the memory monitor, so hardly any additional memory was consumed.
If you meant 250 observations with 5000 variables, then the problem is underdetermined as a linear regression. Still trivial to solve, though most of the variables will be zero in the result.
A = rand(250,5000);b = rand(250,1);
timeit(@() A\b)
ans =
0.15686
More variables, so a bit more time. Again, no serious blip on the memory consumed. Though in both cases, it was enough to fire up multiple cores to solve the problem.
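To illustrate the point about zeros in the underdetermined case: backslash returns a basic solution with at most rank(A) nonzero entries, whereas the minimum-norm solution from pinv generally has no zeros at all. A small sketch, with an arbitrary fixed seed and illustrative variable names:

```matlab
% Compare the basic solution from backslash with the minimum-norm solution.
rng(1);                            % fixed seed, arbitrary choice
A = rand(250,5000);
b = rand(250,1);

xBasic   = A\b;                    % basic solution: at most 250 nonzeros
xMinNorm = pinv(A)*b;              % minimum-norm solution, for comparison

nnzBasic   = nnz(xBasic);          % number of nonzero coefficients
nnzMinNorm = nnz(xMinNorm);
resBasic   = norm(A*xBasic - b);   % both residuals should be near zero
resMinNorm = norm(A*xMinNorm - b);
```

Both solutions fit the data essentially exactly; they differ only in which of the infinitely many exact solutions is returned.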
Tom Graney
12 Aug 2017
John D'Errico
13 Aug 2017
It is not even a large problem for MATLAB.