Negative D2 score on training data after lassoglm fit
조회 수: 5 (최근 30일)
이전 댓글 표시
How can the deviance from a null model (i.e. betas all equal zero) be lower than the deviance from the full model? Surely lassoglm should choose betas all zero in this case?
From the code below, my d2Train is -0.0808.
[B, FitInfo] = lassoglm(table2array(indat.params.trainDataX), indat.params.trainDataY(:, minInd), 'poisson', 'Lambda', indat.combTable.bestLambdas(minInd), 'Alpha', indat.combTable.bestAlphas(minInd));
predCountsTrain = calculateRates(table2array(indat.params.trainDataX),B,FitInfo.Intercept)+eps;
predDevianceTrain = calculateDeviance(indat.params.trainDataY(:, minInd),predCountsTrain);
nullCountsTrain = calculateRates(table2array(indat.params.trainDataX),zeros(size(B)),FitInfo.Intercept)+eps;
nullDevianceTrain = calculateDeviance(indat.params.trainDataY(:, minInd),nullCountsTrain);
d2Train = 1 - (predDevianceTrain ./ nullDevianceTrain);
function rates = calculateRates(x,y,int)
rates = exp((x * y) + int);
end
function dev = calculateDeviance(observed,predicted)
scaledLogRatio = log(observed./predicted).*observed;
rawDifference = observed-predicted;
diffOfTerms = scaledLogRatio - rawDifference;
dev = nansum(diffOfTerms)*2;
end
댓글 수: 0
답변 (1개)
Jaimin
2024년 11월 25일
Hi @T0m07
The negative (d^2) value indicates that the full model's deviance is unexpectedly higher than the null model's. Kindly refer to the quick checks and fixes mentioned below:
Verify Calculations: Ensure “calculateRates” and “calculateDeviance’ are correctly implemented. Use a small constant (eps) to avoid division by zero.
function rates = calculateRates(x, y, int)
rates = exp((x * y) + int);
end
function dev = calculateDeviance(observed, predicted)
% Avoid division by zero or log of zero by adding a small constant
observed = observed + eps;
predicted = predicted + eps;
% Calculate scaled log ratio and raw difference
scaledLogRatio = log(observed ./ predicted) .* observed;
rawDifference = observed - predicted;
% Deviance calculation
diffOfTerms = scaledLogRatio - rawDifference;
dev = nansum(diffOfTerms) * 2;
end
Model Overfitting: Check if the model is overfitting. Adjust lambda and alpha values in “lassoglm” (https://www.mathworks.com/help/stats/lassoglm.html).
Data Issues: Inspect the data for anomalies or outliers that might affect predictions.
Poisson Assumptions: Ensure the Poisson model assumptions hold (mean ≈ variance). If not, consider alternatives like the negative binomial model.
Cross-validation: Use cross-validation to validate model performance and prevent overfitting.
By addressing these areas, you should improve model performance and resolve the deviance issue.
I hope this will be helpful.
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Gaussian Process Regression에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!