Importance of cross-validation

Version 1.0.0 (100 KB) by Valentina Unakafova
This example illustrates that omitting cross-validation can result in a misleadingly high goodness-of-fit due to overfitting.
Downloads: 68
Updated: 2018/8/30

randomCrossValidation.m illustrates that omitting cross-validation can produce a misleadingly high goodness-of-fit because of overfitting.

DESCRIPTION

A random Poisson-distributed matrix x and vector y are fitted with a Poisson generalised linear model (GLM) [1], and the goodness-of-fit is estimated in two cases:
1 Without cross-validation, which results in a high pseudo-R2 (pR2) value
2 With cross-validation, which gives the correct, low pR2 value

The misleadingly high pR2 value and apparently good fit without cross-validation are due to overfitting. The model adapts to the noise in the training data rather than to any essential property of the data, so it cannot explain values that were not in the training set.

Note that pR2 is a common goodness-of-fit measure for Poisson-distributed data; see [2,3] for its definition and [4] for a MATLAB implementation.
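
As a rough illustration of how such a measure behaves, the sketch below computes a deviance-based pseudo-R2 in base MATLAB. It assumes the unadjusted deviance form (one minus the ratio of the model deviance to the deviance of a constant prediction at mean(y)); the variants defined in [2,3] and implemented in [4] may differ, for example by adjusting for the number of covariates. The names devianceOf and pseudoR2 are hypothetical and do not come from the original scripts.

```matlab
% Poisson deviance between observed counts y and predictions mu (mu > 0).
% max(y, realmin) makes the y*log(y/mu) term evaluate to 0 when y == 0.
devianceOf = @(y, mu) 2 * sum(y .* log(max(y, realmin) ./ mu) - (y - mu));

% Assumed pseudo-R2: 1 - D(model) / D(null), with the null model
% predicting the mean of y everywhere.
pseudoR2 = @(y, yHat) 1 - devianceOf(y, yHat) / devianceOf(y, mean(y) * ones(size(y)));

y = [1 2 3 4 5];                                  % toy observed counts

pR2perfect  = pseudoR2(y, y);                     % predictions equal the data
pR2null     = pseudoR2(y, mean(y) * ones(size(y))); % predictions equal the mean
pR2reversed = pseudoR2(y, fliplr(y));             % predictions worse than the mean

fprintf('perfect: %.3f, null: %.3f, reversed: %.3f\n', ...
        pR2perfect, pR2null, pR2reversed);
```

Under this definition a perfect prediction gives 1, predicting the mean gives 0, and predictions worse than the mean give negative values, which is why a cross-validated pR2 can legitimately drop to zero or below.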

INPUT

nPoints - number of predicted points ( 1 x 1 )
nCovariates - number of predicting covariates ( 1 x 1 )
x - matrix of covariates ( nCovariates x nPoints )
y - response variable ( 1 x nPoints )

OUTPUT

pR2 - pR2 value of fit when not using cross-validation ( 1 x 1 )
pR2crossValidated - pR2 value of fit when using cross-validation ( 1 x 1 )
yEstimated - estimated values of y from x when not using cross-validation ( 1 x nPoints )
yEstimatedCrossValidated - estimated values of y from x when using cross-validation ( 1 x nPoints )
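
The inputs and outputs above can be tied together in a minimal sketch of the experiment. This is not the code of randomCrossValidation.m; it is an illustration under stated assumptions: a Poisson rate of 1, 10 folds (nFolds is a hypothetical parameter), the unadjusted deviance-based pR2, and a null model given by the training-set mean of y. It requires the Statistics and Machine Learning Toolbox (poissrnd, glmfit, glmval, cvpartition).

```matlab
rng(0);                          % reproducible random numbers
nPoints     = 100;               % number of predicted points
nCovariates = 40;                % number of predicting covariates
nFolds      = 10;                % assumed number of folds (hypothetical)

x = poissrnd(1, nCovariates, nPoints);   % covariates, generated independently of y
y = poissrnd(1, 1, nPoints);             % response variable

% Assumed deviance-based pseudo-R2 relative to a null prediction yNull
devianceOf = @(y, mu) 2 * sum(y .* log(max(y, realmin) ./ mu) - (y - mu));
pseudoR2   = @(y, yHat, yNull) 1 - devianceOf(y, yHat) / devianceOf(y, yNull);

% Case 1: fit and evaluate on the same data (no cross-validation)
b          = glmfit(x', y', 'poisson');  % glmfit expects observations in rows
yEstimated = glmval(b, x', 'log')';
pR2        = pseudoR2(y, yEstimated, mean(y) * ones(size(y)));

% Case 2: k-fold cross-validation, each point is predicted by a model
% that was fitted without it
folds = cvpartition(nPoints, 'KFold', nFolds);
yEstimatedCrossValidated = zeros(1, nPoints);
yNullCrossValidated      = zeros(1, nPoints);
for k = 1:nFolds
    trainIdx = training(folds, k);
    testIdx  = test(folds, k);
    bFold = glmfit(x(:, trainIdx)', y(trainIdx)', 'poisson');
    yEstimatedCrossValidated(testIdx) = glmval(bFold, x(:, testIdx)', 'log')';
    yNullCrossValidated(testIdx)      = mean(y(trainIdx));  % null model from training data only
end
pR2crossValidated = pseudoR2(y, yEstimatedCrossValidated, yNullCrossValidated);

fprintf('pR2 without cross-validation: %.3f\n', pR2);
fprintf('pR2 with cross-validation:    %.3f\n', pR2crossValidated);
```

Since x carries no information about y here, the in-sample pR2 reflects only how well the covariates can memorise noise, while the cross-validated value should fall near or below zero. glmfit may print convergence or conditioning warnings when nCovariates is large relative to nPoints.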

EXAMPLE OF USE

1 Download the scripts using the Download button
2 Run randomCrossValidation.m (this takes a few minutes; if you do not want to wait, uncomment line 107 and run the plotting section)
3 Look at the plotted figures for different numbers of observations (points in the response variable)

REFERENCES

[1] Nelder, J.A. and Baker, R.J. (1972). Generalized linear models. Wiley Online Library.
[2] Heinzl, H. and Mittlböck, M. (2003). Pseudo R-squared measures for Poisson regression models with over- or underdispersion. Computational Statistics & Data Analysis, 44(1-2), 253-271.
[3] Mittlböck, M. (2002). Calculating adjusted R2 measures for Poisson regression models. Computer Methods and Programs in Biomedicine, 68(3), 205-214.
[4] Unakafova, V.A. (2018). Pseudo-R squared measure for Poisson regression models. https://de.mathworks.com/matlabcentral/fileexchange/67041-pseudo-r-squared-measurefor-poisson-regression-models

Cite As

Valentina Unakafova (2024). Importance of cross-validation (https://www.mathworks.com/matlabcentral/fileexchange/68666-importance-of-cross-validation), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2018a
Compatible with any release
Platform Compatibility
Windows macOS Linux