testckfold
Compare accuracies of two classification models by repeated cross-validation
Syntax
Description
testckfold statistically assesses the accuracies of two
            classification models by repeatedly cross-validating the two models, determining the
            differences in the classification loss, and then formulating the test statistic by
            combining the classification loss differences. This type of test is particularly
            appropriate when sample size is limited.
You can assess whether the accuracies of the classification models are different, or
            whether one classification model performs better than another. Available tests include a
            5-by-2 paired t test, a 5-by-2 paired F test, and
            a 10-by-10 repeated cross-validation t test. For more details, see
                Repeated Cross-Validation Tests. To speed up computations,
                testckfold supports parallel computing (requires a Parallel Computing Toolbox™ license).
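For orientation, the 5-by-2 paired F test of Alpaydin [1], for example, combines the loss differences p_i^{(j)} between the two models (cross-validation replicate i = 1,...,5, fold j = 1,2) into the statistic

    \hat{F} = \frac{\sum_{i=1}^{5}\sum_{j=1}^{2}\bigl(p_i^{(j)}\bigr)^2}{2\sum_{i=1}^{5} s_i^2},
    \qquad s_i^2 = \bigl(p_i^{(1)} - \bar{p}_i\bigr)^2 + \bigl(p_i^{(2)} - \bar{p}_i\bigr)^2,
    \qquad \bar{p}_i = \tfrac{1}{2}\bigl(p_i^{(1)} + p_i^{(2)}\bigr),

which is compared against an F distribution with 10 and 5 degrees of freedom. This sketch follows [1]; see Repeated Cross-Validation Tests for the exact statistics that testckfold uses.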
h = testckfold(C1,C2,X1,X2) returns the test decision that results from
conducting a 5-by-2 paired F cross-validation test. The null hypothesis
is that the classification models C1 and C2 have equal accuracy in
predicting the true class labels using the predictor and response data
in the tables X1 and X2. h = 1 indicates rejection of the null
hypothesis at the 5% significance level.
testckfold conducts the cross-validation
test by applying C1 and C2 to
all predictor variables in X1 and X2,
respectively. The true class labels in X1 and X2 must
be the same. The response variable names in X1, X2, C1.ResponseName,
and C2.ResponseName must be the same.
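For instance, a minimal sketch of this syntax using the built-in Fisher iris data; the table Tbl, the variable names, and the choice of learners are illustrative assumptions, not part of the function:

    % Both tables carry the same response variable, and C1 and C2 are full
    % classification models whose ResponseName matches that variable.
    load fisheriris
    Tbl = array2table(meas,'VariableNames',{'SL','SW','PL','PW'});
    Tbl.Species = species;             % common response variable

    C1 = fitctree(Tbl,'Species');      % first classification model
    C2 = fitcknn(Tbl,'Species');       % second classification model

    rng(1);                            % reproducible cross-validation folds
    h = testckfold(C1,C2,Tbl,Tbl)      % Y omitted: the tables supply it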
For examples of ways to compare models, see Tips.
h = testckfold(___,Name,Value) returns the test decision with additional
options specified by one or more Name,Value pair arguments, using any of
the input arguments in the previous syntaxes. For example, you can
specify the type of alternative hypothesis, the type of test, or the use
of parallel computing.
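For example, a hedged sketch reusing C1, C2, and Tbl from the previous sketch, requesting a 5-by-2 paired t test computed in parallel (the parallel option assumes a Parallel Computing Toolbox license):

    % Name-value options: choose the test type and enable parallel folds.
    h = testckfold(C1,C2,Tbl,Tbl,'Test','5x2t', ...
        'Options',statset('UseParallel',true))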
Examples
Input Arguments
Name-Value Arguments
Output Arguments
More About
Tips
- Examples of ways to compare models include:
  - Compare the accuracies of a simple classification model and a more complex model by passing the same set of predictor data.
  - Compare the accuracies of two different models using two different sets of predictors.
  - Perform various types of Feature Selection. For example, you can compare the accuracy of a model trained using a set of predictors to the accuracy of one trained on a subset or different set of predictors. You can arbitrarily choose the set of predictors, or use a feature selection technique like PCA or sequential feature selection (see pca and sequentialfs).
- If X1 and X2 are tables containing the response variable, and C1 and C2 are full classification models (so C1.ResponseName and C2.ResponseName identify that variable), then you can omit supplying Y. Consequently, testckfold uses the common response variable in the tables.
- One way to perform cost-insensitive feature selection is (see the sketch at the end of these tips):
  1. Create a classification model template that characterizes the first classification model (C1).
  2. Create a classification model template that characterizes the second classification model (C2).
  3. Specify two predictor data sets. For example, specify X1 as the full predictor set and X2 as a reduced set.
  4. Enter testckfold(C1,C2,X1,X2,Y,'Alternative','less'). If testckfold returns 1, then there is enough evidence to suggest that the classification model that uses fewer predictors performs better than the model that uses the full predictor set.

  Alternatively, you can assess whether there is a significant difference between the accuracies of the two models. To perform this assessment, remove the 'Alternative','less' specification in step 4. Then testckfold conducts a two-sided test, and h = 0 indicates that there is not enough evidence to suggest a difference in the accuracy of the two models.
- The tests are appropriate for the misclassification rate as the classification loss, but you can specify other loss functions (see LossFun). The key assumptions are that the estimated classification losses are independent and normally distributed with mean 0 and finite common variance under the two-sided null hypothesis. Classification losses other than the misclassification rate can violate this assumption.
- Highly discrete data, imbalanced classes, and highly imbalanced cost matrices can violate the normality assumption of classification loss differences. 
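The following is a minimal sketch of the feature-selection workflow from the tips above, again on the Fisher iris data; the choice of templateTree and of the petal measurements as the reduced set is an illustrative assumption:

    load fisheriris
    C1 = templateTree;                 % characterizes the first model
    C2 = templateTree;                 % characterizes the second model

    X1 = meas;                         % full predictor set
    X2 = meas(:,3:4);                  % reduced set: petal measurements only

    rng(1);                            % reproducible cross-validation folds
    h = testckfold(C1,C2,X1,X2,species,'Alternative','less')
    % h = 1 suggests the model using the reduced set (C2 with X2) performs
    % better; h = 0 means there is not enough evidence to conclude that.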
Algorithms
If you specify the 10-by-10 repeated cross-validation t test by using
'Test','10x10t', then testckfold uses 10 degrees of freedom for the
t distribution to find the critical region and estimate the
p-value.
For more details, see [2] and [3].
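For instance, a hedged sketch that requests this test, reusing C1, C2, X1, X2, and species from the feature-selection sketch above:

    % Two-sided 10-by-10 repeated cross-validation t test.
    h = testckfold(C1,C2,X1,X2,species,'Test','10x10t')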
Alternatives
Use testcholdout (see the sketch after this list):
- For test sets with larger sample sizes 
- To implement variants of the McNemar test to compare two classification model accuracies 
- For cost-sensitive testing using a chi-square or likelihood ratio test. The chi-square test uses - quadprog(Optimization Toolbox), which requires an Optimization Toolbox™ license.
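As a hedged sketch of this holdout alternative (the split ratio and the learners are illustrative assumptions):

    % Compare two trained models' predicted labels on a common test set.
    load fisheriris
    rng(1);
    cv  = cvpartition(species,'HoldOut',0.3);  % one train/test split
    M1  = fitctree(meas(training(cv),:),species(training(cv)));
    M2  = fitcknn(meas(training(cv),:),species(training(cv)));
    Yte = species(test(cv));
    h = testcholdout(predict(M1,meas(test(cv),:)), ...
                     predict(M2,meas(test(cv),:)),Yte)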
References
[1] Alpaydin, E. “Combined 5 × 2 cv F Test for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 11, No. 8, 1999, pp. 1885–1892.
[2] Bouckaert, R. “Choosing Between Two Learning Algorithms Based on Calibrated Tests.” International Conference on Machine Learning, 2003, pp. 51–58.
[3] Bouckaert, R., and E. Frank. “Evaluating the Replicability of Significance Tests for Comparing Learning Algorithms.” Advances in Knowledge Discovery and Data Mining, 8th Pacific-Asia Conference, 2004, pp. 3–12.
[4] Dietterich, T. “Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms.” Neural Computation, Vol. 10, No. 7, 1998, pp. 1895–1923.
[5] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd Ed. New York: Springer, 2008.
Extended Capabilities
Version History
Introduced in R2015a
See Also
testcholdout | templateECOC | templateEnsemble | templateDiscriminant | templateTree | templateSVM | templateNaiveBayes | templateKNN
