How can I use repeated, k-fold cross-validation results with rocmetrics?

Thomas Kirsh on 6 Oct 2023
Commented: the cyclist on 11 Oct 2023
I have 10-repeat 5-fold cross-validation scores and labels for a model, and I'm trying to plot ROC curves for them efficiently using rocmetrics. When I run the line
robj = rocmetrics(target, prediction, 1);
I get the error
Error using rocmetrics>validateScoresLabelsAndWeights
The cell array of cross-validated scores must be a vector.
Each cell in target and prediction is a double array of size 54x1 or 55x1, and the sizes match cell for cell between the two. I'm confused by this error because each cell of my scores (predictions) is clearly a vector. I suspect the problem is the repeated cross-validation. How should I format target and prediction so that rocmetrics accepts my results?

Answers (1)

the cyclist on 6 Oct 2023
Note the following line from the rocmetrics documentation:
"For cross-validated data, you must specify Labels, Scores, and Weights as cell arrays with the same number of elements. rocmetrics treats an element in the cell arrays as data from one cross-validation fold and computes pointwise confidence intervals for the performance metrics. The length of Labels{i} and the number of rows in Scores{i} must be equal."
You need to supply the fold weights.
Alternatively, you could loop over the folds to see ROC metrics on each fold, or decide prior to calling rocmetrics how you want to combine the folds into a single prediction.
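The error message itself points at the shape of the cell array: rocmetrics wants a vector of cells, one cell per fold. Below is a minimal sketch, assuming target and prediction are 10-by-5 (repeats-by-folds) cell arrays. It flattens them so that each of the 50 repeat/fold pairs is treated as one fold. The random labels and scores are hypothetical stand-in data that only reproduce the layout. Replace them with your own cell arrays.

```matlab
% Hypothetical stand-in data: 10 repeats x 5 folds, each cell a column vector.
% Replace these two cell arrays with the ones from your own CV loop.
rng(0);
nRepeats = 10;
nFolds = 5;
target = cell(nRepeats, nFolds);
prediction = cell(nRepeats, nFolds);
for r = 1:nRepeats
    for k = 1:nFolds
        n = 54 + mod(k, 2);                        % folds of 54 or 55 observations
        target{r, k} = double(rand(n, 1) > 0.5);   % 0/1 class labels
        prediction{r, k} = rand(n, 1);             % scores for class 1
    end
end

% rocmetrics needs a *vector* of cells for cross-validated input.
% A 10-by-5 cell matrix is not a vector, so flatten it to 50-by-1.
targetVec = target(:);
predictionVec = prediction(:);

% Each of the 50 cells is now treated as one cross-validation fold.
robj = rocmetrics(targetVec, predictionVec, 1);
plot(robj)
```

With this layout, the pointwise confidence intervals are computed across all M*k = 50 repeat/fold results.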
  Comments: 7
Thomas Kirsh on 11 Oct 2023
Thank you, that's a good workaround! My only concern is whether this gives an accurate mean ROC for my experiment. Wouldn't it make more sense to concatenate the folds first and then reshape?
the cyclist on 11 Oct 2023
I have to admit that I don't really have experience specifically with repeated k-fold cross-validation, so I don't know what is conventional in terms of combining information from repeats and folds. My impression is that one treats it as M*k results, which is what my code is doing. I don't think concatenating folds is typically done, because that would look like you had a dataset that was k times larger.
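To compare the two ideas in this exchange, here is a sketch that loops over every repeat/fold cell and computes one AUC per fold with perfcurve, which treats the results as M*k values. For comparison, it also pools the five test folds of a single repeat into one prediction set. The 10-by-5 layout and the random stand-in data are assumptions, so the AUC values here hover around 0.5. In ordinary k-fold CV, the k test folds of one repeat cover each observation exactly once. Pooling across all repeats as well would therefore include every observation M times.

```matlab
% Hypothetical stand-in data: 10 repeats x 5 folds (replace with your own).
rng(1);
nRepeats = 10;
nFolds = 5;
target = cell(nRepeats, nFolds);
prediction = cell(nRepeats, nFolds);
for r = 1:nRepeats
    for k = 1:nFolds
        n = 54 + mod(k, 2);
        target{r, k} = double(rand(n, 1) > 0.5);
        prediction{r, k} = rand(n, 1);
    end
end

% Option A: one AUC per (repeat, fold) pair, summarized over M*k results.
foldAUC = zeros(nRepeats * nFolds, 1);
for i = 1:numel(target)
    [~, ~, ~, foldAUC(i)] = perfcurve(target{i}, prediction{i}, 1);
end
fprintf('Per-fold AUC: mean %.3f, std %.3f over %d folds\n', ...
    mean(foldAUC), std(foldAUC), numel(foldAUC));

% Option B: pool the k test folds of one repeat, so each observation
% contributes exactly one prediction, then compute a single ROC/AUC.
pooledTarget = vertcat(target{1, :});
pooledScore = vertcat(prediction{1, :});
[~, ~, ~, pooledAUC] = perfcurve(pooledTarget, pooledScore, 1);
fprintf('Pooled AUC (repeat 1 only): %.3f\n', pooledAUC);
```

Option A matches what rocmetrics does with a 50-element cell vector. Option B repeated for each of the 10 repeats would give 10 pooled AUCs, which is another common way to summarize repeated CV.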
Also ... I hope the data you posted isn't your real data. The model performance is no better than random.
