Tutorial for adaboost Package (AdaBoost_samme with two_level_decision_tree)
By Jarek Tuszynski
The adaboost package consists of two multi-class AdaBoost classifiers:
- AdaBoost_samme - a multi-class extension of the classic AdaBoost.M1 algorithm (which handles only two-class problems), first described by Ji Zhu, Saharon Rosset, Hui Zou and Trevor Hastie, “Multi-class AdaBoost”, January 12, 2006. https://web.stanford.edu/~hastie/Papers/samme.pdf
- AdaBoost_mult - solves the same problem using a bank of two-class AdaBoost classifiers. A three-class problem uses three two-class classifiers solving the class 1 vs. 2 & 3, class 2 vs. 1 & 3, and class 3 vs. 1 & 2 problems; each sample is then tested with all three classifiers and assigned the class with the maximum score.
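To make the one-vs-rest idea concrete, here is a minimal sketch of how such a bank of two-class classifiers could be built and combined by maximum score, assuming a feature matrix X and a label vector y. It is an illustration only: the line that reads a score from predict is an assumption, not the actual AdaBoost_mult interface.

nClass = 3;                                 % number of classes in y
nTree  = 50;                                % boosting rounds per binary classifier
score  = zeros(size(X,1), nClass);          % one score column per binary classifier
for k = 1:nClass
    yk = 1 + (y ~= k);                      % class k becomes label 1, all other classes label 2
    Ck = AdaBoost_samme(decision_stump);    % one two-class boosted classifier per class
    Ck = Ck.train(X, yk, [], nTree);
    [~, s] = Ck.predict(X);                 % ASSUMPTION: predict returns a score as a second output
    score(:,k) = s;
end
[~, Y] = max(score, [], 2);                 % assign each sample to the highest-scoring class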
Boosting classifiers work by combining multiple "weak-learner" classifiers. This package provides two weak learners:
- decision_stump - a single-node decision "tree". For two-class problems the decision_stump class calls the "train_stump_2" function, which performs an exhaustive search over all possible thresholds for all features and picks the optimal one (see the sketch after this list). For multi-class problems the "train_stump_N" function is called, which runs "train_stump_2" on all pairs of classes and picks the optimal one.
- two_level_decision_tree - a decision "tree" class with three nodes arranged in two levels.
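As a rough illustration of the exhaustive threshold search mentioned above, the sketch below scans every feature and every observed value as a candidate threshold and keeps the split with the lowest weighted error. It only conveys the idea behind train_stump_2, assuming two-class labels y (values 1 and 2) and boosting sample weights w; the real function also handles both split polarities.

bestErr = inf;
for f = 1:size(X,2)                         % try every feature
    for t = unique(X(:,f))'                 % try every observed value as a threshold
        pred = 1 + (X(:,f) > t);            % samples above the threshold get class 2, the rest class 1
        err  = sum(w(pred ~= y));           % weighted misclassification error
        if err < bestErr
            bestErr  = err;
            bestFeat = f;                   % feature index of the best stump
            bestThr  = t;                   % threshold of the best stump
        end
    end
end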
This demo concentrates on the AdaBoost_samme class with the two_level_decision_tree weak learner.
Contents
- Change History
- Licence
- Read in test file
- Track performance per iteration using calc_accuracy method
- Check which features are being used
- Test export_model and import_model functions
- Test save_adaboost_model and load_adaboost_model functions
- Use cross validation to prevent training and testing on the same data
- Test classifier on 2-class "donut" problem
- Test classifier on 3-class "donut" problem
- Test classifier on 2-class "diagonals" problem
- Test classifier on 3-class "diagonals" problem
Change History
Licence
The package is distributed under the BSD License.
format compact;    % viewing preference
clear variables;
close all
type('license.txt')
Copyright (C) 2011 by Kenneth Morton and contributors

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Read in test file
The Cardiotocography (CTG) Data Set contains measurements of fetal heart rate (FHR) and uterine contraction (UC) features on cardiotocograms, classified by expert obstetricians as normal, suspect, or pathologic.
Paper: Ayres de Campos et al. (2000) SisPorto 2.0: A Program for Automated Analysis of Cardiotocograms. J Matern Fetal Med 5:311-318.
URL: https://archive.ics.uci.edu/ml/datasets/Cardiotocography
[~, ~, ctg] = xlsread('CTG-1.xlsx', 'Raw Data');
X = cell2mat(ctg(3:end, 4:end-2));
y = cell2mat(ctg(3:end, end));
colLabel = ctg(1, 4:end-2);
verbose = true;
classifier = AdaBoost_samme(two_level_decision_tree, verbose);  % blank classifier
nTree = 20;
C = classifier.train(X, y, [], nTree);
 1) X( 35)> 0.500 ? (X( 10)> 68.500 ? 2 : 3) : (X( 33)> 0.500 ? 1 : 3): weight( 1)=3.155; stump accuracy( 1)= 92.1%; overall accuracy= 92.1% 167 misclassified
 2) X(  9)> 0.550 ? (X( 30)> 0.500 ? 1 : 2) : (X( 34)> 0.500 ? 2 : 3): weight( 2)=2.352; stump accuracy( 2)= 87.5%; overall accuracy= 92.1% 167 misclassified
 3) X( 34)> 0.500 ? (X(  2)>359.000 ? 2 : 3) : (X( 33)> 0.500 ? 2 : 3): weight( 3)=2.206; stump accuracy( 3)= 22.1%; overall accuracy= 93.6% 137 misclassified
 4) X( 32)> 0.500 ? (X( 18)>172.500 ? 1 : 2) : (X( 34)> 0.500 ? 1 : 3): weight( 4)=2.498; stump accuracy( 4)= 80.1%; overall accuracy= 95.2% 101 misclassified
 5) X( 33)> 0.500 ? (X(  1)> 0.000 ? 3 : 3) : (X( 10)> 68.500 ? 2 : 3): weight( 5)=1.485; stump accuracy( 5)= 20.9%; overall accuracy= 92.9% 152 misclassified
 6) X( 33)> 0.500 ? (X(  1)> 0.000 ? 3 : 3) : (X( 34)> 0.500 ? 1 : 3): weight( 6)=1.757; stump accuracy( 6)= 86.1%; overall accuracy= 95.3% 99 misclassified
 7) X( 10)> 68.500 ? (X( 26)> 0.500 ? 3 : 1) : (X( 33)> 0.500 ? 2 : 3): weight( 7)=1.564; stump accuracy( 7)= 21.1%; overall accuracy= 92.9% 152 misclassified
 8) X( 34)> 0.500 ? (X( 24)> 45.500 ? 3 : 2) : (X( 33)> 0.500 ? 1 : 3): weight( 8)=1.559; stump accuracy( 8)= 86.1%; overall accuracy= 95.4% 98 misclassified
 9) X( 10)> 68.500 ? (X( 26)> 0.500 ? 3 : 1) : (X( 33)> 0.500 ? 2 : 3): weight( 9)=1.467; stump accuracy( 9)= 21.1%; overall accuracy= 92.9% 151 misclassified
10) X( 35)> 0.500 ? (X( 10)> 68.500 ? 2 : 3) : (X( 34)> 0.500 ? 1 : 3): weight( 10)=1.587; stump accuracy( 10)= 90.3%; overall accuracy= 95.4% 98 misclassified
11) X(  8)> 79.500 ? (X( 34)> 0.500 ? 1 : 3) : (X( 33)> 0.500 ? 2 : 3): weight( 11)=1.500; stump accuracy( 11)= 20.4%; overall accuracy= 93.0% 148 misclassified
12) X( 14)> 1.500 ? (X( 33)> 0.500 ? 2 : 3) : (X( 35)> 0.500 ? 1 : 3): weight( 12)=1.516; stump accuracy( 12)= 81.1%; overall accuracy= 95.5% 96 misclassified
13) X(  8)> 79.500 ? (X( 34)> 0.500 ? 1 : 3) : (X( 10)> 68.500 ? 2 : 3): weight( 13)=1.429; stump accuracy( 13)= 17.1%; overall accuracy= 94.4% 119 misclassified
14) X( 30)> 0.500 ? (X(  2)>294.000 ? 1 : 2) : (X( 34)> 0.500 ? 1 : 3): weight( 14)=2.195; stump accuracy( 14)= 84.1%; overall accuracy= 98.6% 30 misclassified
15) X( 33)> 0.500 ? (X(  1)> 0.000 ? 3 : 3) : (X( 10)> 68.500 ? 2 : 3): weight( 15)=1.584; stump accuracy( 15)= 20.9%; overall accuracy= 98.6% 30 misclassified
16) X( 22)>107.500 ? (X( 34)> 0.500 ? 1 : 3) : (X(  8)> 24.500 ? 2 : 3): weight( 16)=1.636; stump accuracy( 16)= 85.0%; overall accuracy= 98.6% 29 misclassified
17) X( 10)> 68.500 ? (X( 26)> 0.500 ? 3 : 1) : (X( 33)> 0.500 ? 2 : 3): weight( 17)=1.443; stump accuracy( 17)= 21.1%; overall accuracy= 98.5% 31 misclassified
18) X( 35)> 0.500 ? (X( 10)> 68.500 ? 2 : 3) : (X( 34)> 0.500 ? 1 : 3): weight( 18)=1.481; stump accuracy( 18)= 90.3%; overall accuracy= 98.6% 29 misclassified
19) X( 22)>105.500 ? (X( 33)> 0.500 ? 2 : 3) : (X( 33)> 0.500 ? 1 : 3): weight( 19)=1.481; stump accuracy( 19)= 19.2%; overall accuracy= 98.6% 30 misclassified
20) X( 34)> 0.500 ? (X(  2)>359.000 ? 2 : 3) : (X( 33)> 0.500 ? 1 : 3): weight( 20)=1.459; stump accuracy( 20)= 86.1%; overall accuracy= 98.6% 30 misclassified
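Each numbered line above is one two-level tree together with its weight, its own accuracy, and the running accuracy of the ensemble. The final prediction is a weighted vote: the class that collects the largest sum of tree weights wins. The sketch below only illustrates that rule; C.predict performs it internally, and the per-tree prediction call and the weight vector alpha are illustrative names, not the package API.

nClass = 3;
votes  = zeros(size(X,1), nClass);
for t = 1:nTree
    Yt = predict_one_tree(C, t, X);         % hypothetical per-tree prediction (not a package function)
    for k = 1:nClass
        votes(:,k) = votes(:,k) + alpha(t)*(Yt == k);   % alpha(t) = weight(t) printed above
    end
end
[~, Y] = max(votes, [], 2);                 % class with the largest weighted vote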
Track performance per iteration using calc_accuracy method
[accuracy, tpr] = C.calc_accuracy( X, y, nTree);
hold on
plot(tpr)
plot(accuracy, 'LineWidth',2)
xlabel('Number of iterations')
legend({'true positive rate of normal patients', 'true positive rate of suspect patients', ...
        'true positive rate of Pathologic patients', 'overall accuracy'}, 'Location','southeast')

Check which features are being used
[hx, hy] = C.feature_hist();
hy = 100*hy/sum(hy);    % normalize and convert to percent
hx(hy==0) = [];         % delete unused features
hy(hy==0) = [];
[hy, idx] = sort(hy);   % sort
hx = hx(idx);
clf
barh(hy);
axis tight
xlabel('Percent used')
ylabel('Feature name')
ax = gca;
ax.YTick = 1:length(hx);
ax.YTickLabel = colLabel(hx);

Test export_model and import_model functions
[strList, labels, header] = C.export_model();
CC = classifier.import_model(strList, labels);  % initialize new model
Y  = C.predict(X);
YY = CC.predict(X);
fprintf('Number of mismatches between models: %i\n', nnz(Y~=YY));
Number of mismatches between models: 0
Test save_adaboost_model and load_adaboost_model functions
save_adaboost_model(C, 'classifier.csv');
CC = load_adaboost_model(classifier, 'classifier.csv');
Y  = C.predict(X);
YY = CC.predict(X);
fprintf('Number of mismatches between models: %i\n', nnz(Y~=YY));
type('classifier.csv')
Number of mismatches between models: 0
Classifier#,nIter,label #1,label #2,label #3
1,20,1,2,3
dimension1, dimension2, dimension3, threshold1, threshold2, threshold3, label 1, label 2, label 3, label 4, weights
35,10,33,5.000000e-01,6.850000e+01,5.000000e-01,2,3,1,3,3.155343e+00
9,30,34,5.500000e-01,5.000000e-01,5.000000e-01,1,2,2,3,2.351993e+00
34,2,33,5.000000e-01,3.590000e+02,5.000000e-01,2,3,2,3,2.206240e+00
32,18,34,5.000000e-01,1.725000e+02,5.000000e-01,1,2,1,3,2.497649e+00
33,1,10,5.000000e-01,0.000000e+00,6.850000e+01,3,3,2,3,1.484926e+00
33,1,34,5.000000e-01,0.000000e+00,5.000000e-01,3,3,1,3,1.757134e+00
10,26,33,6.850000e+01,5.000000e-01,5.000000e-01,3,1,2,3,1.564343e+00
34,24,33,5.000000e-01,4.550000e+01,5.000000e-01,3,2,1,3,1.558576e+00
10,26,33,6.850000e+01,5.000000e-01,5.000000e-01,3,1,2,3,1.466524e+00
35,10,34,5.000000e-01,6.850000e+01,5.000000e-01,2,3,1,3,1.587135e+00
8,34,33,7.950000e+01,5.000000e-01,5.000000e-01,1,3,2,3,1.500161e+00
14,33,35,1.500000e+00,5.000000e-01,5.000000e-01,2,3,1,3,1.516331e+00
8,34,10,7.950000e+01,5.000000e-01,6.850000e+01,1,3,2,3,1.429478e+00
30,2,34,5.000000e-01,2.940000e+02,5.000000e-01,1,2,1,3,2.195269e+00
33,1,10,5.000000e-01,0.000000e+00,6.850000e+01,3,3,2,3,1.583904e+00
22,34,8,1.075000e+02,5.000000e-01,2.450000e+01,1,3,2,3,1.635855e+00
10,26,33,6.850000e+01,5.000000e-01,5.000000e-01,3,1,2,3,1.443462e+00
35,10,34,5.000000e-01,6.850000e+01,5.000000e-01,2,3,1,3,1.481294e+00
22,33,33,1.055000e+02,5.000000e-01,5.000000e-01,2,3,1,3,1.480922e+00
34,2,33,5.000000e-01,3.590000e+02,5.000000e-01,2,3,1,3,1.459293e+00
Use cross validation to prevent training and testing on the same data
fprintf('Classification is %i%% accurate when training and testing on the same data.\n', ...
  round(100*mean(y==Y)));
classifier = AdaBoost_samme(two_level_decision_tree);
nFold = 10;   % ten-fold validation
Y = cross_validation( classifier, X, y, nTree, nFold);
fprintf('Classification is %i%% accurate when using 10-fold cross validation\n', ...
  round(100*mean(y==Y)));
Classification is 99% accurate when training and testing on the same data.
Classification is 97% accurate when using 10-fold cross validation
Test classifier on 2-class "donut" problem
nSamp = 1000;
[Xb,Yb] = pol2cart((1:nSamp)*2*pi/nSamp,3);
X = 2*[randn(nSamp,2); randn(nSamp,2)+ [Xb' ,Yb'] ];
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1)];
nTree = 10; % number of trees
C = classifier.train(X, y, [], nTree);
AdaBoost_demo_plot(C, X, y);

Test classifier on 3-class "donut" problem
nSamp = 1000;
[Xb,Yb] = pol2cart((1:nSamp)*2*pi/nSamp,3);
[Xc,Yc] = pol2cart((1:nSamp)*2*pi/nSamp,6);
X= 1.25*[randn(nSamp,2); randn(nSamp,2)+[Xb',Yb']; randn(nSamp,2)+[Xc',Yc'] ];
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1); 3+zeros(nSamp,1)];
nTree = 20; % number of trees
C = classifier.train(X, y, [], nTree);
Y = C.predict(X);
AdaBoost_demo_plot(C, X, y);

Test classifier on 2-class "diagonals" problem
nSamp = 1000;
Xa = (1:nSamp)*10/nSamp;
d = 6;
s = sign(randn(nSamp,1));
X = [randn(nSamp,2)+[Xa',Xa']; randn(nSamp,2)+[Xa',Xa'+s.*d]]-5;
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1)];
nTree = 20; % number of trees
C = classifier.train(X, y, [], nTree);
AdaBoost_demo_plot(C, X, y);

Test classifier on 3-class "diagonals" problem
nSamp = 1000;
Xa = (1:nSamp)*10/nSamp;
d = 4;
X = [randn(nSamp,2)+[Xa',Xa']; randn(nSamp,2)+[Xa',Xa'+d]-1; randn(nSamp,2)+[Xa',Xa'-d]+1]-5;
y = [1+zeros(nSamp,1); 2+zeros(nSamp,1); 3+zeros(nSamp,1)];
nTree = 20; % number of trees
C = classifier.train(X, y, [], nTree);
AdaBoost_demo_plot(C, X, y);
