Why is LASSO in MATLAB so slow in the case of highly correlated predictors?

I am using LASSO with 4-fold cross-validation in a regression problem. As the number of predictors increases, the computation time of the MATLAB lasso function increases dramatically, to the point where it becomes infeasible for me, since I need to run lasso several thousand times. For example, with 100 predictors, lasso needs more than 60 s. The same example in Python takes only a few seconds. What could be the reason for such a difference in speed?

Update: I observed that it is not the number of predictors that drives the lasso computation time, but the degree of collinearity among the predictors. The MATLAB subfunction cdescentCycle takes almost all of the computation time. The MATLAB help suggests using elastic net (set Alpha < 1) for highly correlated predictors. Elastic net is a bit faster, but still infeasibly slow. I have not done further tests with the Python lasso implementation. I still don't know how to speed up lasso for highly correlated predictors: reducing the number of Lambda values or increasing the RelTol parameter saves only a few seconds.

Accepted Answer

Ilya, 1 December 2015
There could be many reasons. The lasso function has a lot of flexibility, so make sure you are comparing apples to apples. To make it run faster, you could:
  1. Use fewer values of lambda ('NumLambda').
  2. Increase the relative tolerance ('RelTol').
  3. Try standardizing or not standardizing the predictors ('Standardize').
  4. Try running in parallel ('Options', statset('UseParallel',true)) if you have a Parallel Computing Toolbox license.
The function would likely still be slower than C/C++ or Fortran code.
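Why collinearity, rather than the number of predictors, drives the run time can be seen from the coordinate-descent update. The block below assumes the elastic-net objective given in the MATLAB lasso documentation and the standard coordinate-descent update for it; judging by its name, cdescentCycle applies an update of this kind to one coefficient at a time, but its exact code is not reproduced here.

```latex
% Elastic-net objective (alpha = 1 gives the lasso)
\min_{\beta_0,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i-\beta_0-x_i^{T}\beta\right)^2
  + \lambda\sum_{j=1}^{p}\left(\frac{1-\alpha}{2}\beta_j^2+\alpha\lvert\beta_j\rvert\right)

% Coordinate-descent update for beta_j, holding all other coefficients fixed
r_i^{(j)} = y_i-\beta_0-\sum_{k\neq j}x_{ik}\beta_k, \qquad
\beta_j \leftarrow \frac{S\!\left(\frac{1}{N}\sum_{i=1}^{N}x_{ij}\,r_i^{(j)},\ \lambda\alpha\right)}
                        {\frac{1}{N}\sum_{i=1}^{N}x_{ij}^2+\lambda(1-\alpha)}, \qquad
S(z,\gamma)=\operatorname{sign}(z)\,\max\left(\lvert z\rvert-\gamma,\,0\right)
```

If x_j and x_k are nearly collinear, updating β_j changes the residual almost exactly along x_k, so the next update of β_k largely cancels it. The coefficients then creep toward the solution over many sweeps before the change per sweep falls below RelTol, which is consistent with almost all of the time being spent in cdescentCycle. The extra λ(1−α) term in the denominator, present only for elastic net, improves the conditioning of each update, which is consistent with elastic net being somewhat faster.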
Comments (2)
Marlis Hofer, 2 December 2015 (edited 2 December 2015)
Thanks for your answer! I have already tried different lasso options (e.g., increasing RelTol by one order of magnitude, decreasing NumLambda to 50, and using the parallel option). This improved the speed, but only by a small fraction of the total run time, so it is still too slow. I agree that I should not compare Python with MATLAB without specifying the exact options used in each implementation. However, I observed (as also noted in the update to my question) that it is not the number of predictors but the collinearity among them that affects the speed.
Ilya, 2 December 2015 (edited 2 December 2015)
If you are willing to experiment a bit, try this. Find the cdescentCycle function inside lasso and replace line 799 (line numbers may differ in your version):
for j=find(active);
with these 3 lines:
a = find(active);
a = a(randperm(numel(a)));
for j=a
Does this help?
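The change makes each sweep visit the active coefficients in a random order instead of always in index order. A minimal pure-Python sketch of the idea follows. It is not MATLAB's cdescentCycle: it fits the plain lasso (Alpha = 1) without an intercept, it sweeps all coefficients rather than only the active set, and the helper names (lasso_cd, make_correlated_data), the stopping rule and the synthetic data are hypothetical choices for illustration.

```python
import random


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, lam, shuffle=False, rel_tol=1e-8, max_sweeps=100000, seed=0):
    """Minimize (1/(2N))*||y - X b||^2 + lam*||b||_1 by coordinate descent.

    Assumes y and the columns of X are centered, so no intercept is fitted.
    Sweeps over all coefficients (a simplification; MATLAB loops over the
    active set). Returns (coefficients, number of sweeps performed).
    """
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    col_sq = [sum(v * v for v in c) / n for c in cols]
    b = [0.0] * p
    r = list(y)  # residual y - X b, with b = 0 initially
    for sweep in range(1, max_sweeps + 1):
        order = list(range(p))
        if shuffle:
            rng.shuffle(order)  # analogue of a = a(randperm(numel(a)))
        max_change = 0.0
        for j in order:
            c = cols[j]
            # (1/N) * x_j' * (partial residual that excludes coefficient j)
            z = sum(c[i] * r[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(z, lam) / col_sq[j]
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= c[i] * delta  # keep the residual in sync with b
                b[j] = new
            max_change = max(max_change, abs(delta))
        # stop once no coefficient moved by more than rel_tol (relative to the largest one)
        if max_change <= rel_tol * max(1.0, max(abs(v) for v in b)):
            return b, sweep
    return b, max_sweeps


def objective(X, y, b, lam):
    """Lasso objective value, used to compare solutions."""
    n = len(X)
    rss = sum((y[i] - sum(X[i][j] * b[j] for j in range(len(b)))) ** 2 for i in range(n))
    return rss / (2 * n) + lam * sum(abs(v) for v in b)


def make_correlated_data(n=200, p=5, noise=0.3, seed=1):
    """Hypothetical test data: each column is one shared signal plus small independent noise."""
    rng = random.Random(seed)
    shared = [rng.gauss(0.0, 1.0) for _ in range(n)]
    X = [[shared[i] + noise * rng.gauss(0.0, 1.0) for _ in range(p)] for i in range(n)]
    b_true = [2.0] + [0.0] * (p - 2) + [-1.0]
    y = [sum(X[i][j] * b_true[j] for j in range(p)) + 0.5 * rng.gauss(0.0, 1.0) for i in range(n)]
    # center the columns and the response so that no intercept is needed
    means = [sum(row[j] for row in X) / n for j in range(p)]
    X = [[row[j] - means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    return X, [v - y_mean for v in y]


if __name__ == "__main__":
    X, y = make_correlated_data()
    for shuffle in (False, True):
        b, sweeps = lasso_cd(X, y, lam=0.1, shuffle=shuffle)
        label = "random order" if shuffle else "cyclic order"
        print(label, "sweeps:", sweeps, "objective:", round(objective(X, y, b, 0.1), 6))
```

Whether the random order needs fewer sweeps than the cyclic order depends on the data and on how the active set is managed, so the script only reports both counts rather than assuming a winner; because the problem is convex, both orders should reach essentially the same objective value.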


Category: Get Started with Statistics and Machine Learning Toolbox
