# resubEdge

Resubstitution classification edge for naive Bayes classifier

## Syntax

``e = resubEdge(Mdl)``

## Description


`e = resubEdge(Mdl)` returns the resubstitution classification edge (`e`) for the naive Bayes classifier `Mdl`, using the training data stored in `Mdl.X` and the corresponding class labels stored in `Mdl.Y`. The classification edge is a scalar value that represents the weighted mean of the classification margins.

## Examples


Estimate the resubstitution edge (the average in-sample classification margin) of a naive Bayes classifier.

Load the `fisheriris` data set. Create `X` as a numeric matrix that contains four petal measurements for 150 irises. Create `Y` as a cell array of character vectors that contains the corresponding iris species.

```
load fisheriris
X = meas;
Y = species;
rng('default') % for reproducibility
```

Train a naive Bayes classifier using the predictors `X` and class labels `Y`. A recommended practice is to specify the class names. `fitcnb` assumes that each predictor is conditionally normally distributed, given its class.

`Mdl = fitcnb(X,Y,'ClassNames',{'setosa','versicolor','virginica'})`
```
Mdl = 
  ClassificationNaiveBayes
              ResponseName: 'Y'
     CategoricalPredictors: []
                ClassNames: {'setosa'  'versicolor'  'virginica'}
            ScoreTransform: 'none'
           NumObservations: 150
         DistributionNames: {'normal'  'normal'  'normal'  'normal'}
    DistributionParameters: {3x4 cell}

  Properties, Methods
```

`Mdl` is a trained `ClassificationNaiveBayes` classifier.

Estimate the resubstitution edge.

`e = resubEdge(Mdl)`
```
e = 0.8944
```

The average of the training sample margins is approximately `0.89`. This result indicates that the classifier labels the in-sample observations with high confidence.

The classifier edge measures the average of the classifier margins. One way to perform feature selection is to compare training sample edges from multiple models. Based solely on this criterion, the classifier with the highest edge is the best classifier.

Load the `ionosphere` data set. Remove the first two predictors for stability.

```
load ionosphere
X = X(:,3:end);
```

Define these two data sets:

• `fullX` contains all predictors.

• `partX` contains the 10 most important predictors.

```
fullX = X;
idx = fscmrmr(X,Y);
partX = X(:,idx(1:10));
```

Train a naive Bayes classifier for each predictor set.

```
FullMdl = fitcnb(fullX,Y);
PartMdl = fitcnb(partX,Y);
```

`FullMdl` and `PartMdl` are trained `ClassificationNaiveBayes` classifiers.

Estimate the training sample edge for each classifier.

`fullEdge = resubEdge(FullMdl)`
```
fullEdge = 0.6554
```
`partEdge = resubEdge(PartMdl)`
```
partEdge = 0.7796
```

The edge of the classifier trained on the 10 most important predictors is larger. This result suggests that the classifier trained using only those predictors has a better in-sample fit.

## Input Arguments


`Mdl` — Full, trained naive Bayes classifier, specified as a `ClassificationNaiveBayes` model trained by `fitcnb`.

## More About

### Classification Edge

The classification edge is the weighted mean of the classification margins.

If you supply weights, then the software normalizes them to sum to the prior probability of their respective class. The software uses the normalized weights to compute the weighted mean.
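This relationship can be sketched with `resubMargin`, a method of the trained classifier; the sketch below assumes the default equal observation weights, in which case the weighted mean reduces to an ordinary mean:

```
% Sketch (assumes default equal observation weights): the resubstitution
% edge is then just the unweighted mean of the resubstitution margins.
load fisheriris
Mdl = fitcnb(meas,species);
m = resubMargin(Mdl);   % one margin per training observation
e = resubEdge(Mdl);     % weighted mean of the margins
% With equal weights, e and mean(m) agree to within round-off.
```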

When choosing among multiple classifiers to perform a task such as feature selection, choose the classifier that yields the highest edge.

### Classification Margins

The classification margin for each observation is the difference between the score for the true class and the maximal score for the false classes. Margins provide a classification confidence measure; among multiple classifiers, those that yield larger margins (on the same scale) are better.
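As a sketch of this definition, the margins can be recomputed directly from the resubstitution scores. `resubPredict` and `resubMargin` are methods of the trained classifier; the index bookkeeping below is illustrative:

```
% Sketch: recompute each margin as score(true class) minus the
% largest score among the other (false) classes.
load fisheriris
Mdl = fitcnb(meas,species);
[~,score] = resubPredict(Mdl);              % one score column per class
[~,col] = ismember(Mdl.Y,Mdl.ClassNames);   % column index of each true class
n = numel(Mdl.Y);
idx = sub2ind(size(score),(1:n)',col);
sTrue = score(idx);                         % score for the true class
other = score; other(idx) = -Inf;           % mask out the true class
m = sTrue - max(other,[],2);                % classification margins
% m should match resubMargin(Mdl)
```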

### Posterior Probability

The posterior probability is the probability that an observation belongs in a particular class, given the data.

For naive Bayes, the posterior probability that an observation with predictor values $(x_1,\ldots,x_P)$ belongs to class k is

$$\hat{P}\left(Y=k \mid x_1,\ldots,x_P\right)=\frac{P\left(X_1,\ldots,X_P \mid y=k\right)\,\pi\left(Y=k\right)}{P\left(X_1,\ldots,X_P\right)},$$

where:

• $P\left(X_1,\ldots,X_P \mid y=k\right)$ is the conditional joint density of the predictors given that the observation is in class k. `Mdl.DistributionNames` stores the distribution names of the predictors.

• $\pi\left(Y=k\right)$ is the class prior probability distribution. `Mdl.Prior` stores the prior distribution.

• $P\left(X_1,\ldots,X_P\right)$ is the joint density of the predictors. Because the classes are discrete, $P\left(X_1,\ldots,X_P\right)=\sum_{k=1}^{K}P\left(X_1,\ldots,X_P \mid y=k\right)\,\pi\left(Y=k\right)$.
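The formula above can be evaluated by hand for a Gaussian naive Bayes model. This sketch assumes all predictors use the default `'normal'` distribution, in which case each cell of `Mdl.DistributionParameters` holds the class-conditional mean and standard deviation:

```
% Sketch (normal distributions only): compute the posterior for one
% observation from the stored class-conditional parameters and priors.
load fisheriris
Mdl = fitcnb(meas,species);
x = meas(1,:);
K = numel(Mdl.ClassNames);
lik = ones(K,1);
for k = 1:K
    for j = 1:numel(x)
        mu    = Mdl.DistributionParameters{k,j}(1);  % class-conditional mean
        sigma = Mdl.DistributionParameters{k,j}(2);  % class-conditional std
        lik(k) = lik(k) * normpdf(x(j),mu,sigma);    % conditional joint density
    end
end
post = lik .* Mdl.Prior(:);
post = post / sum(post);    % normalize by the joint density P(X1,...,XP)
% post should match the first row of the posterior output of resubPredict
```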

### Prior Probability

The prior probability of a class is the assumed relative frequency with which observations from that class occur in a population.
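By default, `fitcnb` estimates the prior from the relative class frequencies in the training data; the `'Prior'` name-value argument overrides this, for example:

```
% Sketch: replace the empirical class priors with uniform priors.
load fisheriris
Mdl = fitcnb(meas,species,'Prior','uniform');
Mdl.Prior   % each of the three classes now has prior probability 1/3
```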

### Classification Score

The naive Bayes score is the class posterior probability given the observation.