Why NaN values are found in score from kfoldPredict

Question

Yean Lim, 17 Nov 2020

https://kr.mathworks.com/matlabcentral/answers/651488-why-nan-values-are-found-in-score-from-kfoldpredict

Answered: Shashank Gupta, 20 Nov 2020

simple.mat

Names = {'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'};
isCategoricalPredictor = [false, false, true, false, true, false, false, false];
        
% Use tree learner
template = templateTree('NumVariablesToSample', 'all', ... % to analyse predictor importance
    'Reproducible', true, ...
    'Surrogate', 'on', ... % Surrogate on to obtain measure of association
    'MaxNumSplits', maxNumSplits, ...
    'MinLeafSize', minLeafSize);
   
% optimizable variable does not accept
BestEnsembleMdl = fitcensemble(X_train,y_train,...
    'Learners',template, ...
    'Method', method, ...
    'NumLearningCycles', numLearningCycles, ...
    'Holdout', 0.2, ...
    'LearnRate', learnRate, ...
    'ScoreTransform','logit',... % transform scores to probabilistic estimates
    'CategoricalPredictors', isCategoricalPredictor,...
    'PredictorNames', Names);
[~, score] = kfoldPredict(BestEnsembleMdl);

Hi, I tried to run kfoldPredict on the ClassificationPartitionedEnsemble returned by fitcensemble.

When I run kfoldPredict, the returned score variable contains many NaN values. Refer to the score variable in the attached mat file.

I expected the score to contain only real values.

In the example above, I used the following values:

learnRate = 0.9702

maxNumSplits = 16826

method = 'LogitBoost'

numLearningCycles = 2

minLeafSize = 1

I have saved X_train & y_train variables in the attached mat file. I have reduced the number of rows in X_train & y_train to 10 rows as a demonstration.

1) Why are there NaN values in the score?

2) What should I do to ensure that there are no NaN values in the score?

Thank you

Answer 1

Shashank Gupta, 20 Nov 2020

https://kr.mathworks.com/matlabcentral/answers/651488-why-nan-values-are-found-in-score-from-kfoldpredict#answer_550858

Hey Yean,

Yes, you get NaNs in the output score. Because the ensemble was trained with 'Holdout', 0.2, the NaN rows mark the observations that were not in the holdout validation set. Based on the 'Holdout' value, a subset of the training sample is selected for validation; kfoldPredict returns scores only for those observations, and every other row is NaN. You can check this by changing the 'Holdout' value and watching the NaN rows change. One more suggestion: make sure the classes are well distributed across the training and validation sets.
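To make the explanation above concrete, here is a minimal sketch of two ways to end up with scores that contain no NaN: keep only the held-out rows, or cross-validate with 'KFold' so that every row is scored once. It is not the poster's setup. The ionosphere sample data set, the variable names (cvHoldout, isHeldOut, validScores, cvKFold) and all parameter values are assumptions chosen for illustration.

```matlab
% Sketch only: the ionosphere sample data and every parameter value
% below are illustrative assumptions, not the data from simple.mat.
load ionosphere                      % X: 351x34 predictors, Y: 'b'/'g' labels
rng(1);                              % make the random partition repeatable

template = templateTree('MaxNumSplits', 5);

% Holdout validation: a single fold, with 20% of the rows held out.
cvHoldout = fitcensemble(X, Y, ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 10, ...
    'Learners', template, ...
    'Holdout', 0.2);
[~, scoreHoldout] = kfoldPredict(cvHoldout);

% Logical mask of the held-out (validation) rows of the partition.
isHeldOut = test(cvHoldout.Partition);

% Option 1: keep only the rows that actually received a score.
validScores = scoreHoldout(isHeldOut, :);
fprintf('Rows with NaN scores: %d of %d\n', ...
    sum(any(isnan(scoreHoldout), 2)), size(scoreHoldout, 1));

% Option 2: K-fold validation, where each row is in a validation
% fold exactly once and therefore gets a score.
cvKFold = fitcensemble(X, Y, ...
    'Method', 'LogitBoost', ...
    'NumLearningCycles', 10, ...
    'Learners', template, ...
    'KFold', 5);
[~, scoreKFold] = kfoldPredict(cvKFold);
```

With the mask, validScores lines up with Y(isHeldOut), so the two can be compared directly. The K-fold variant trains one ensemble per fold, so it costs roughly five times as much here, but it gives an out-of-fold score for every observation.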

I hope this clears up some of the confusion and is enough for you to explore further.

Cheers
