Negative D2 score on training data after lassoglm fit

Question

T0m07 on 17 Nov 2024


https://kr.mathworks.com/matlabcentral/answers/2166873-negative-d2-score-on-training-data-after-lassoglm-fit

Answered: Jaimin on 25 Nov 2024

How can the deviance from a null model (i.e. betas all equal zero) be lower than the deviance from the full model? Surely lassoglm should choose betas all zero in this case?

From the code below, my d2Train is -0.0808.

[B, FitInfo] = lassoglm(table2array(indat.params.trainDataX), ...
    indat.params.trainDataY(:, minInd), 'poisson', ...
    'Lambda', indat.combTable.bestLambdas(minInd), ...
    'Alpha', indat.combTable.bestAlphas(minInd));
predCountsTrain = calculateRates(table2array(indat.params.trainDataX), B, FitInfo.Intercept) + eps;
predDevianceTrain = calculateDeviance(indat.params.trainDataY(:, minInd), predCountsTrain);
nullCountsTrain = calculateRates(table2array(indat.params.trainDataX), zeros(size(B)), FitInfo.Intercept) + eps;
nullDevianceTrain = calculateDeviance(indat.params.trainDataY(:, minInd), nullCountsTrain);
d2Train = 1 - (predDevianceTrain ./ nullDevianceTrain);

function rates = calculateRates(x, y, int)
    rates = exp((x * y) + int);
end

function dev = calculateDeviance(observed, predicted)
    scaledLogRatio = log(observed ./ predicted) .* observed;
    rawDifference = observed - predicted;
    diffOfTerms = scaledLogRatio - rawDifference;
    dev = nansum(diffOfTerms) * 2;
end


Answer 1

Jaimin on 25 Nov 2024


https://kr.mathworks.com/matlabcentral/answers/2166873-negative-d2-score-on-training-data-after-lassoglm-fit#answer_1549278


Hi @T0m07

A negative D2 value means that the deviance computed for the full model is higher than the deviance computed for the null model. Please work through the quick checks and fixes below:
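For reference, the quantities computed by the posted code can be written out as follows. The symbols are notation introduced here, not taken from the question: y_i are the observed counts, \hat\mu_i the fitted rates, and \mu_{0,i} the null-model rates.

```latex
D(y,\mu) = 2\sum_i \left[\, y_i \log\frac{y_i}{\mu_i} - (y_i - \mu_i) \right],
\qquad
D^2 = 1 - \frac{D(y,\hat\mu)}{D(y,\mu_0)}
```

The term y_i \log(y_i/\mu_i) is taken to be 0 when y_i = 0. With a positive null deviance, D^2 is negative exactly when D(y,\hat\mu) > D(y,\mu_0).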

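Verify Calculations: Ensure “calculateRates” and “calculateDeviance” are correctly implemented. In the posted code, an observed count of zero gives 0 .* log(0) = NaN, and nansum then silently drops that observation's entire term, including its predicted-rate part. Use a small constant (eps) to avoid division by zero and the log of zero.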

function rates = calculateRates(x, y, int)
    % Poisson mean under a log link: exp(X * coefficients + intercept)
    rates = exp((x * y) + int);
end
function dev = calculateDeviance(observed, predicted)
    % Avoid division by zero or log of zero by adding a small constant
    observed = observed + eps;
    predicted = predicted + eps;

    % Calculate scaled log ratio and raw difference
    scaledLogRatio = log(observed ./ predicted) .* observed;
    rawDifference = observed - predicted;

    % Deviance calculation; sum(..., 'omitnan') replaces the legacy nansum
    % and still skips observations that are genuinely missing (NaN)
    diffOfTerms = scaledLogRatio - rawDifference;
    dev = sum(diffOfTerms, 'omitnan') * 2;
end
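As an alternative to adding eps, the zero-count terms can be handled explicitly. The following is a minimal sketch under stated assumptions, not the asker's code: the name poissonDeviance, the toy vectors y and mu, and the choice of an intercept-only null model (rate equal to the mean count, which is the usual reference for a deviance-based D2) are all introduced here for illustration.

```matlab
% Hypothetical toy data: observed counts (including zeros) and fitted rates.
y  = [0; 2; 1; 0; 5];
mu = [0.5; 1.8; 1.2; 0.3; 4.1];

% Poisson deviance with the convention y*log(y/mu) = 0 when y == 0.
% (y == 0) adds 1 inside the log only for zero counts, so those rows give
% 0*log(1/mu) = 0 and still contribute 2*mu through the -(y - mu) part.
poissonDeviance = @(y, mu) 2 * sum(y .* log((y + (y == 0)) ./ mu) - (y - mu));

% Assumed null model: intercept only, whose Poisson MLE is the mean count.
mu0 = repmat(mean(y), size(y));

devFit  = poissonDeviance(y, mu);
devNull = poissonDeviance(y, mu0);
d2 = 1 - devFit / devNull;
```

If d2 computed this way is positive while the original calculation is negative, the dropped zero-count rows are a likely cause of the discrepancy.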

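Model Overfitting: Check if the model is overfitting, and adjust the lambda and alpha values in “lassoglm” (https://www.mathworks.com/help/stats/lassoglm.html) if so. Note that overfitting lowers the training deviance, so on its own it would not explain a negative D2 on training data.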

Data Issues: Inspect the data for anomalies or outliers that might affect predictions.
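A few of these data checks can be sketched as below. The matrices X and y are hypothetical stand-ins for the training predictors and counts, and the particular checks chosen are an assumption about what is worth looking at first.

```matlab
% Hypothetical stand-ins for the training predictors and response.
X = [1.0 0.2; 0.5 NaN; 2.0 1.1; 0.1 0.4];
y = [0; 3; 2.5; -1];

rowsWithMissingX = any(isnan(X), 2);   % rows with at least one missing predictor
negativeCounts   = y < 0;              % invalid for a Poisson response
nonIntegerCounts = y ~= round(y);      % counts are expected to be whole numbers
zeroCounts       = y == 0;             % rows that hit 0*log(0) in the deviance

fprintf('missing X: %d, negative y: %d, non-integer y: %d, zero y: %d\n', ...
    nnz(rowsWithMissingX), nnz(negativeCounts), ...
    nnz(nonIntegerCounts), nnz(zeroCounts));
```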

Poisson Assumptions: Ensure the Poisson model assumptions hold (mean ≈ variance). If not, consider alternatives like the negative binomial model.
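One rough way to check the mean-variance assumption is the Pearson dispersion statistic. This is a sketch on hypothetical values; numCoefs is an assumed count of nonzero coefficients including the intercept, and for a lasso fit the residual degrees of freedom are only approximate.

```matlab
% Hypothetical observed counts and fitted Poisson rates.
y  = [0; 2; 1; 0; 5; 3];
mu = [0.5; 1.8; 1.2; 0.3; 4.1; 2.9];
numCoefs = 2;   % assumed: nonzero coefficients plus intercept

% Pearson chi-square statistic divided by residual degrees of freedom.
pearsonChi2 = sum((y - mu).^2 ./ mu);
dispersion  = pearsonChi2 / (numel(y) - numCoefs);
```

A value near 1 is consistent with the Poisson assumption, while a value well above 1 points to overdispersion.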

Cross-validation: Use cross-validation to validate model performance and prevent overfitting.
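A minimal sketch of letting lassoglm choose lambda by cross-validation is shown below. The simulated X and y, the 5 folds, and Alpha = 0.5 are assumptions made for illustration; the 'CV' option and the IndexMinDeviance field are part of lassoglm's documented interface.

```matlab
rng(0);                                   % reproducible toy data
X = randn(200, 5);
y = poissrnd(exp(0.3 + X(:, 1) - 0.5 * X(:, 2)));

% 5-fold cross-validation over lassoglm's default lambda sequence.
[B, FitInfo] = lassoglm(X, y, 'poisson', 'CV', 5, 'Alpha', 0.5);

idx  = FitInfo.IndexMinDeviance;          % lambda with the lowest CV deviance
beta = B(:, idx);                         % coefficients at that lambda
b0   = FitInfo.Intercept(idx);            % matching intercept
muTrain = exp(X * beta + b0);             % fitted rates on the training data
```

The fitted rates muTrain can then be passed to the deviance calculation in place of rates obtained from a fixed, pre-selected lambda.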

Working through these areas should help you locate the cause of the negative D2 and resolve the deviance issue.

I hope this will be helpful.
