10-fold cross validation with crossval or without

Views: 1 (last 30 days)
Elena Casiraghi, 4 April 2020
Dear all, I'm trying to perform 10-fold HOLDOUT cross validation to train a k-NN classifier.
For each holdout I also want to use an internal holdout to optimize the k-NN hyperparameters through Bayesian optimization (which evaluates the accuracy on the internal holdouts).
Therefore, I would like to use a procedure like this one:
nfoldExtHoldout = 10;
dataMat = randi(100, 5001, 11); % training set: 5001 points, each an 11-dimensional (11-feature) point
labels = rand(5001,1) > 0.7;    % unbalanced labels
numPts = size(dataMat,1);
knnLoss = zeros(1, nfoldExtHoldout);
for nF = 1:nfoldExtHoldout
    % external holdout: 70% of the samples for training, 30% for testing
    % (crossvalind requires the Bioinformatics Toolbox)
    [trainIdx, testIdx] = crossvalind('Holdout', numPts, 0.3);
    trainSet    = dataMat(trainIdx,:); % training samples
    labelsTrain = labels(trainIdx);    % corresponding labels for training points
    testSet     = dataMat(testIdx,:);  % test samples
    labelsTest  = labels(testIdx);     % corresponding labels for test points
    % internal holdout for the Bayesian optimization; the partition must be
    % defined over the TRAINING set, not over all numPts, and the function
    % name is lowercase: cvpartition, not cvPartition
    cv = cvpartition(size(trainSet,1), 'Holdout', 0.3);
    % train a k-NN classifier, using Bayesian optimization to choose the best hyperparameters
    mdlknn = fitcknn(trainSet, labelsTrain, 'OptimizeHyperparameters', 'all', ...
        'HyperparameterOptimizationOptions', ...
        struct('Verbose', 0, 'UseParallel', true, 'CVPartition', cv));
    % predict the test data and compute the misclassification rate over the
    % TEST points only (dividing by numPts would underestimate the loss)
    testPred = predict(mdlknn, testSet);
    conf = confusionmat(labelsTest, testPred);
    knnLoss(nF) = 1 - sum(diag(conf))/numel(labelsTest);
end
Is this correct?
What's the difference between the code above and this one?
nfoldExtHoldout = 10;
rHoldout = 0.3;                 % holdout fraction used by crossval (was undefined in my first draft)
dataMat = randi(100, 5001, 11); % training set: 5001 points, each an 11-dimensional (11-feature) point
labels = rand(5001,1) > 0.7;    % unbalanced labels
numPts = size(dataMat,1);
% internal holdout partition for the Bayesian optimization
cv = cvpartition(numPts, 'Holdout', 0.3);
% train a k-NN classifier, using Bayesian optimization to choose the best hyperparameters
mdlknn = fitcknn(dataMat, labels, 'OptimizeHyperparameters', 'all', ...
    'HyperparameterOptimizationOptions', ...
    struct('Verbose', 0, 'UseParallel', true, 'CVPartition', cv));
knnLoss = zeros(1, nfoldExtHoldout);
for nf = 1:nfoldExtHoldout
    crossValKnn = crossval(mdlknn, 'Holdout', rHoldout);
    knnLoss(nf) = kfoldLoss(crossValKnn);
end
In other words, I don't understand what crossval is doing.
I suppose that in the second snippet the hyperparameter optimization is run just once, so the optimized hyperparameters are the same for all the external folds, right?
Then does crossval train a separate k-NN for each holdout?
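To make my question concrete, this is roughly what I understand crossval(mdlknn, 'Holdout', r) to do: refit a NEW classifier with the same hyperparameters on a random (1-r) fraction of the data and report the loss on the held-out fraction. This is just a sketch of my understanding, reusing the variables from the snippet above (mdlknn, dataMat, labels, rHoldout), not the actual implementation:

% sketch: one holdout repetition, as I believe crossval performs it internally
cvp = cvpartition(numel(labels), 'Holdout', rHoldout);
mdlFold = fitcknn(dataMat(training(cvp),:), labels(training(cvp)), ...
    'NumNeighbors', mdlknn.NumNeighbors, ... % hyperparameters copied from the optimized model
    'Distance',     mdlknn.Distance);
foldLoss = loss(mdlFold, dataMat(test(cvp),:), labels(test(cvp)));

If that reading is right, the optimized hyperparameters are fixed once and only the training data changes across the external repetitions, unlike my first snippet, which re-runs the optimization in every fold.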

Answers (0)
