Why NaN values are found in score from kfoldPredict

Yean Lim on 17 Nov 2020
Answered: Shashank Gupta on 20 Nov 2020
Names = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'};
isCategoricalPredictor = [false, false, true, false, true, false, false, false];
% Use tree learner
template = templateTree('NumVariablesToSample', 'all',... % to analyse predictor importance
'Reproducible',true, 'Surrogate','on', 'MaxNumSplits', maxNumSplits, 'MinLeafSize', minLeafSize); % Surrogate on to obtain measure of association
% optimizable variable does not accept
BestEnsembleMdl = fitcensemble(X_train,y_train,...
'Learners',template, ...
'Method', method, ...
'NumLearningCycles', numLearningCycles, ...
'Holdout', 0.2, ...
'LearnRate', learnRate, ...
'ScoreTransform','logit',... % transform scores to probabilistic estimates
'CategoricalPredictors', isCategoricalPredictor,...
'PredictorNames', Names);
[~, score] = kfoldPredict(BestEnsembleMdl);
Hi, I ran kfoldPredict on the ClassificationPartitionedEnsemble model returned by fitcensemble.
The score variable returned by kfoldPredict contains many NaN values. Please refer to the score variable in the attached MAT-file.
I expected every entry of score to be a real value.
In the example above, I use the following values:
learnRate = 0.9702
maxNumSplits = 16826
method = 'LogitBoost'
numLearningCycles = 2
minLeafSize = 1
I have saved the X_train and y_train variables in the attached MAT-file. For demonstration purposes, I have reduced X_train and y_train to 10 rows.
1) Why are there NaN values in the score?
2) What should I do to ensure that there are no NaN values in the score?
Thank you

Answers (1)

Shashank Gupta on 20 Nov 2020
Hey Yean,
Yes, NaNs in the output score are expected here, and they come from the 'Holdout' option. Based on the 'Holdout' fraction, a subset of the training sample is set aside as validation data. kfoldPredict computes scores only for those held-out observations; every other row, i.e. the observations that were used for training, gets NaN. You can check this by changing the 'Holdout' value: the number and position of the NaN rows will change with it. One more suggestion: make sure the classes are well represented in both the training and the validation portions.
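To make the point above concrete, here is a minimal sketch rather than a definitive recipe. It uses synthetic stand-in data (X, y and the 100-row size are assumptions, not the poster's X_train and y_train) and a smaller set of options than the original call. It reads the holdout partition from the cross-validated model's Partition property to see which rows were held out, then trains a second model with 'KFold' instead of 'Holdout', where every observation falls into a validation fold once and so should receive a score.

```matlab
% Synthetic stand-in data (hypothetical; replace with X_train / y_train)
rng(1);                                  % fix the random seed for repeatability
X = randn(100, 3);
y = X(:,1) + 0.5*randn(100,1) > 0;       % binary labels, as LogitBoost requires

% Holdout validation: 20% of the rows form the single validation fold
CVMdl = fitcensemble(X, y, 'Method', 'LogitBoost', ...
    'NumLearningCycles', 10, 'Holdout', 0.2);
[~, score] = kfoldPredict(CVMdl);

% Logical mask of the held-out (validation) rows
isHeldOut = test(CVMdl.Partition);

% Rows that were used for training are the ones carrying NaN scores
nanRows = all(isnan(score), 2);
fprintf('Held-out rows: %d, NaN rows: %d\n', nnz(isHeldOut), nnz(nanRows));

% Keep only the rows that actually have scores
holdoutScore = score(isHeldOut, :);

% Alternative: k-fold validation scores every observation exactly once
CVMdlK = fitcensemble(X, y, 'Method', 'LogitBoost', ...
    'NumLearningCycles', 10, 'KFold', 5);
[~, scoreK] = kfoldPredict(CVMdlK);
fprintf('Any NaN with KFold: %d\n', any(isnan(scoreK(:))));
```

If you want to keep 'Holdout', indexing score with the isHeldOut mask is enough to drop the NaN rows. If you need a score for every observation, switching to 'KFold' is the usual route, at the cost of training one ensemble per fold.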
I hope this clears up some of the confusion and gives you enough to explore further.
Cheers

Release

R2020b
