Are lassoglm solutions independent of data order?

Question

Ken Johnson, 13 Aug 2024

Commented: Ken Johnson, 19 Aug 2024

I thought lassoglm solutions were unique, but I find that the solution from lassoglm depends on the X array order. Is there a way to avoid this? Here's my example. I have 3 X variables (3 columns) and the Y variable. I get different solutions with X(var1, var2, var3) or X(var1, var3, var2). In the example code, the fitted values are:

CONC123 = [21.54, 1.689, 0.726]

CONC132 = [21.94, 2.558, 0]

Which solution is most correct? The deviance for the 123 solution is a bit smaller.

load('YX123') % Y and X(var1, var2, var3)

load('YX132') % Y and X(var1, var3, var2)

lambda = 0.0005 ; % lambda was optimized at 0.0005 with a training set

reltol = 1e-4; % default value

alpha = 1; % forces lasso regression

[CONC123,FitInfo123]=lassoglm(X123,Y,'normal','Alpha',alpha,'Lambda',lambda,'RelTol',reltol);

[CONC132,FitInfo132]=lassoglm(X132,Y,'normal','Alpha',alpha,'Lambda',lambda,'RelTol',reltol);

Answer 1

Ayush, 16 Aug 2024

Hi Ken,

The issue you're encountering comes from the numerical stability and convergence behavior of the optimization algorithm used by the “lassoglm” function. In theory, Lasso regression should yield the same solution regardless of the order of the columns in “X”. In practice, the column order can affect the result, because the iterative solver stops as soon as it meets its convergence tolerance (“RelTol”), and with correlated predictors it can stop at slightly different points for different column orders.
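One way to check whether the difference is a convergence effect is to refit both column orders with a tighter “RelTol” and compare the coefficients after mapping them back to a common order. The snippet below is only a sketch on synthetic data (the variables and the correlation structure are made up for illustration, they are not Ken's data), and whether the gap closes will depend on the data.

```matlab
% Synthetic, hypothetical data with two strongly correlated predictors
rng(0);                                   % reproducible random numbers
n  = 200;
x1 = randn(n,1);
x2 = x1 + 0.1*randn(n,1);                 % nearly collinear with x1
x3 = randn(n,1);
X  = [x1 x2 x3];
Y  = 2*x1 + 0.5*x3 + 0.1*randn(n,1);

perm   = [1 3 2];                         % column order for the second fit
lambda = 0.0005;

for reltol = [1e-4 1e-8]                  % default tolerance, then a tighter one
    B123 = lassoglm(X,         Y, 'normal', 'Lambda', lambda, 'RelTol', reltol);
    B132 = lassoglm(X(:,perm), Y, 'normal', 'Lambda', lambda, 'RelTol', reltol);

    % Put the permuted coefficients back into the original column order
    B132_reordered       = zeros(size(B132));
    B132_reordered(perm) = B132;

    fprintf('RelTol = %g: max coefficient difference = %.3g\n', ...
            reltol, max(abs(B123 - B132_reordered)));
end
```

If the difference shrinks as “RelTol” is tightened, the two answers were just two approximations of the same minimizer. If the solver reports that it hit the iteration limit, “MaxIter” can be raised as well.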

I use several methods to mitigate this type of numerical instability:

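1. Standardize the features: Standardizing features, i.e., scaling them to zero mean and unit variance, can make the optimization more stable and less sensitive to the order of the columns. Note that “lassoglm” already standardizes the predictors internally by default (its 'Standardize' option), so explicit z-scoring mainly changes the scale on which the coefficients are reported. Here is the code for standardizing the features and running the Lasso regression on the standardized features.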

% Standardize the features 
X123_standardized = zscore(X123); 
X132_standardized = zscore(X132); 
% Perform Lasso regression on standardized features 
[CONC123, FitInfo123] = lassoglm(X123_standardized, Y, 'normal', 'Alpha', alpha, 'Lambda', lambda, 'RelTol', reltol); 
[CONC132, FitInfo132] = lassoglm(X132_standardized, Y, 'normal', 'Alpha', alpha, 'Lambda', lambda, 'RelTol', reltol); 
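Coefficients fitted on z-scored columns are expressed per standard deviation of each predictor, so they are not directly comparable with the values in the question. A minimal sketch of the back-transformation follows, using the mean and standard deviation returned by “zscore”. The data and the names Bz, Borig and b0orig are hypothetical and only serve the illustration.

```matlab
% Synthetic, hypothetical data with predictors on very different scales
rng(1);
n = 100;
X = [10*randn(n,1) + 5, 0.1*randn(n,1), randn(n,1) - 2];
Y = 0.3*X(:,1) + 4*X(:,2) + 0.2*randn(n,1);

% zscore also returns the column means and standard deviations (row vectors)
[Z, mu, sigma] = zscore(X);
[Bz, FitInfo]  = lassoglm(Z, Y, 'normal', 'Lambda', 0.0005);

% Since Z = (X - mu)./sigma, the model b0 + Z*Bz equals b0orig + X*Borig with:
Borig  = Bz ./ sigma(:);                  % slope per original unit
b0orig = FitInfo.Intercept - mu * Borig;  % intercept shifted by the means

% Both parameterizations should give the same fitted values
predZ   = FitInfo.Intercept + Z*Bz;
predX   = b0orig + X*Borig;
maxDiff = max(abs(predZ - predX));
fprintf('Max difference between the two predictions: %.3g\n', maxDiff);
```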

2. Checking deviance: If the deviance for one solution is smaller, that solution is generally preferable. However, it’s essential to confirm that the smaller deviance is not due to overfitting.

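deviance123 = FitInfo123.Deviance; 
deviance132 = FitInfo132.Deviance; 
% Report which column order gave the smaller deviance
if deviance123 < deviance132 
    fprintf('Order 123 has the smaller deviance (%g vs %g)\n', deviance123, deviance132); 
else 
    fprintf('Order 132 has the smaller or equal deviance (%g vs %g)\n', deviance132, deviance123); 
end 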
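Because both fits minimize a penalized criterion rather than the deviance alone, it can also be informative to compare the penalized objective itself. The sketch below assumes the objective has the form RSS/N + lambda*sum(abs(beta)) for the 'normal' distribution with Alpha = 1, which is my reading of the “lassoglm” documentation, so please verify the scaling there. It turns off 'Standardize' so that the penalty applies to the coefficients as reported. The data and the helper penObj are hypothetical.

```matlab
% Synthetic, hypothetical data with two correlated predictors
rng(3);
n  = 200;
x1 = randn(n,1);
X  = [x1, x1 + 0.1*randn(n,1), randn(n,1)];
Y  = 2*X(:,1) + 0.5*X(:,3) + 0.1*randn(n,1);

lambda = 0.0005;
perm   = [1 3 2];                         % second column order

% 'Standardize', false: the L1 penalty then acts on the returned coefficients
[B123, F123] = lassoglm(X,         Y, 'normal', 'Lambda', lambda, 'Standardize', false);
[B132, F132] = lassoglm(X(:,perm), Y, 'normal', 'Lambda', lambda, 'Standardize', false);

% Assumed objective (see lead-in): residual sum of squares / N + L1 penalty
penObj = @(Xm, b0, b) sum((Y - b0 - Xm*b).^2)/n + lambda*sum(abs(b));

obj123 = penObj(X,         F123.Intercept, B123);
obj132 = penObj(X(:,perm), F132.Intercept, B132);
fprintf('Penalized objective: order 123 = %.8g, order 132 = %.8g\n', obj123, obj132);
```

Under that assumption, the fit with the smaller objective value is closer to the Lasso minimizer, independent of which one happens to have the smaller deviance.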

Note: One more technique I generally use is cross-validation. It helps to ensure that the chosen model generalizes well to unseen data, and it can sometimes reduce the practical impact of the sensitivity to feature order.
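For reference, a minimal sketch of cross-validated “lassoglm” is shown below. It uses synthetic, hypothetical data and the 'CV' option with 10 folds, and picks the lambda with the smallest cross-validated deviance. The variable names lambdaBest and coefBest are only for illustration.

```matlab
% Synthetic, hypothetical data
rng(2);                                   % CV partitions are random, so fix the seed
n = 150;
X = randn(n,3);
Y = 1.5*X(:,1) - 0.8*X(:,2) + 0.3*randn(n,1);

% Fit over lassoglm's default lambda sequence with 10-fold cross-validation
[B, FitInfo] = lassoglm(X, Y, 'normal', 'CV', 10);

% Column of B with the minimum cross-validated deviance
idx        = FitInfo.IndexMinDeviance;
lambdaBest = FitInfo.LambdaMinDeviance;
coefBest   = B(:, idx);

fprintf('Lambda with minimum CV deviance: %g\n', lambdaBest);
fprintf('Nonzero coefficients at that lambda: %d\n', nnz(coefBest));
```

FitInfo.Index1SE gives a sparser alternative (the largest lambda within one standard error of the minimum), which may be a better choice if a simpler model is preferred.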

So, by standardizing your features, comparing deviance, and using cross-validation, you can reduce the sensitivity of your Lasso regression solutions to the order of the columns in “X”. The solution with the smaller deviance is generally the better fit, but it's crucial to ensure that this is not due to overfitting.

For standardization, I’ve used the “zscore” function. You can read more about it here:

zscore: https://in.mathworks.com/help/releases/R2024a/stats/zscore.html

Also, to read more about Lasso regularization, you may refer to:

lassoglm : https://in.mathworks.com/help/releases/R2024a/stats/lassoglm.html

Hope it helps!

Comment: Ken Johnson, 19 Aug 2024

Super, thank you.
