fitNaiveBayes

Train naive Bayes classifier

`fitNaiveBayes` will be removed in a future release. Use `fitcnb` instead.
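The sketch below shows one way to move an existing call to `fitcnb`. It assumes these name-value arguments correspond:

- `'Distribution'` to `'DistributionNames'`
- `'KSType'` to `'Kernel'`
- `'KSSupport'` to `'Support'`
- `'KSWidth'` to `'Width'`

This mapping is based on the `fitcnb` interface and is not stated on this page, so check the `fitcnb` reference page before relying on it.

```matlab
load fisheriris                  % Loads meas (150-by-4) and species (150-by-1 cell array)
X = meas;
Y = species;

% Deprecated call documented on this page:
%   NBModel = fitNaiveBayes(X,Y,'Distribution','kernel','KSType','box');
% Assumed fitcnb equivalent:
Mdl = fitcnb(X,Y,'DistributionNames','kernel','Kernel','box');

labels = predict(Mdl,X);         % predict works the same way on the new model object
```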

Syntax

• `NBModel = fitNaiveBayes(X,Y)`
• `NBModel = fitNaiveBayes(X,Y,Name,Value)`

Description


`NBModel = fitNaiveBayes(X,Y)` returns a naive Bayes classifier `NBModel`, trained on predictors `X` and class labels `Y` for K-level classification. Predict labels for new data by passing the data and `NBModel` to `predict`.


`NBModel = fitNaiveBayes(X,Y,Name,Value)` returns a naive Bayes classifier with additional options specified by one or more `Name,Value` pair arguments. For example, you can specify a distribution to model the data, prior probabilities for the classes, or the kernel smoothing window bandwidth.

Examples


Train a Naive Bayes Classifier

Load Fisher's iris data set.

```
load fisheriris
X = meas(:,3:4);
Y = species;
tabulate(Y)
```
```
       Value    Count   Percent
      setosa       50     33.33%
  versicolor       50     33.33%
   virginica       50     33.33%
```

The software can classify data with more than two classes using naive Bayes methods.

Train a naive Bayes classifier.

```
NBModel = fitNaiveBayes(X,Y)
```
```
NBModel = 
Naive Bayes classifier with 3 classes for 2 dimensions.
Feature Distribution(s):normal
Classes:setosa, versicolor, virginica
```

`NBModel` is a trained `NaiveBayes` classifier.

By default, the software models the predictor distribution within each class using a Gaussian distribution with a class-specific mean and standard deviation. Use dot notation to display the parameters of a particular Gaussian fit. For example, display the fit for the first feature (petal length) within the `setosa` class.

```
setosaIndex = strcmp(NBModel.ClassLevels,'setosa');
estimates = NBModel.Params{setosaIndex,1}
```
```
estimates =

    1.4620
    0.1737
```

The mean is `1.4620` and the standard deviation is `0.1737`.
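As a quick check, these estimates appear to be the ordinary sample mean and standard deviation of petal length within the setosa class. The sketch below computes both directly from the raw data. It assumes that `fitNaiveBayes` uses exactly these estimators, including the n-1 normalization of `std`; the values shown above are consistent with that assumption.

```matlab
load fisheriris                                      % meas (150-by-4), species (150-by-1 cell array)
setosaPetalLength = meas(strcmp(species,'setosa'),3);

mu = mean(setosaPetalLength)                         % Sample mean
sigma = std(setosaPetalLength)                       % Sample standard deviation (n-1 normalization)
```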

Plot the Gaussian contours.

```
figure
gscatter(X(:,1),X(:,2),Y);
h = gca;
xylim = [h.XLim h.YLim];               % Current axis limits
hold on
Params = cell2mat(NBModel.Params);     % 6-by-2: rows alternate mean, standard deviation per class
Mu = Params(2*(1:3)-1,1:2);            % Extract the means
Sigma = zeros(2,2,3);
for j = 1:3
    % mvnpdf expects a covariance matrix, so square the standard deviations
    Sigma(:,:,j) = diag(Params(2*j,:).^2);
    % Draw contours of the class-conditional (independent) bivariate normal density
    ezcontour(@(x1,x2)mvnpdf([x1,x2],Mu(j,:),Sigma(:,:,j)),...
        xylim+0.5*[-1,1,-1,1])
end
title('Naive Bayes Classifier -- Fisher''s Iris Data')
xlabel('Petal Length (cm)')
ylabel('Petal Width (cm)')
hold off
```

You can change the default distribution using the name-value pair argument `'Distribution'`. For example, if the predictors are count-based, then you can specify that they are multinomial random variables using `'Distribution','mn'`.

Specify Predictor Distributions for Naive Bayes Classifiers

Load Fisher's iris data set.

```
load fisheriris
X = meas;
Y = species;
```

Train a naive Bayes classifier using every predictor.

```
NBModel1 = fitNaiveBayes(X,Y);
NBModel1.ClassLevels     % Display the class order
NBModel1.Params
NBModel1.Params{1,2}
```
```
ans = 

    'setosa'
    'versicolor'
    'virginica'

ans = 

    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]
    [2x1 double]    [2x1 double]    [2x1 double]    [2x1 double]

ans =

    3.4280
    0.3791
```

By default, the software models the predictor distribution within each class as a Gaussian with some mean and standard deviation. There are four predictors and three class levels, so `NBModel1.Params` is a 3-by-4 cell array. Cell (k,d) contains the mean and standard deviation of predictor d within class k, with classes in the order of `NBModel1.ClassLevels`. For example, the mean and standard deviation of setosa sepal widths are `3.4280` and `0.3791`, respectively.

Estimate the confusion matrix for `NBModel1`.

```
predictLabels1 = predict(NBModel1,X);
[ConfusionMat1,labels] = confusionmat(Y,predictLabels1)
```
```
ConfusionMat1 =

    50     0     0
     0    47     3
     0     3    47

labels = 

    'setosa'
    'versicolor'
    'virginica'
```

Element (j,k) of `ConfusionMat1` is the number of observations whose true class is j and that the software classifies as class k.
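The diagonal of the confusion matrix counts correct classifications, so you can summarize the matrix as an in-sample accuracy and a per-class recall. The sketch below uses the matrix reported above. The names `accuracy` and `perClassRecall` are illustrative only.

```matlab
% Confusion matrix reported above for NBModel1 (rows: true class, columns: predicted class)
ConfusionMat1 = [50 0 0; 0 47 3; 0 3 47];

nCorrect = trace(ConfusionMat1);        % Diagonal = correctly classified observations
nTotal = sum(ConfusionMat1(:));         % All observations
accuracy = nCorrect/nTotal              % Fraction classified correctly

% Fraction of each true class that the classifier recovers
perClassRecall = diag(ConfusionMat1)'./sum(ConfusionMat1,2)'
```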

Retrain the classifier using the Gaussian distribution for predictors 1 and 3 (the sepal and petal lengths), and the default normal kernel density for predictors 2 and 4 (the sepal and petal widths).

```
NBModel2 = fitNaiveBayes(X,Y,...
    'Distribution',{'normal','kernel','normal','kernel'});
NBModel2.Params{1,2}
```
```
ans = 

  KernelDistribution

    Kernel = normal
    Bandwidth = 0.179536
    Support = unbounded
```

The software does not estimate distribution parameters for a kernel density. Instead, it automatically chooses a bandwidth; by default, it uses a value that is optimal for a Gaussian distribution. You can specify the bandwidth yourself using the `'KSWidth'` name-value pair argument.
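For example, the call below retrains the same model with a fixed bandwidth. The scalar form of `'KSWidth'` applies one bandwidth to every kernel-modeled predictor in every class. The value `0.2` is arbitrary and chosen only for illustration; it is not a recommended setting.

```matlab
load fisheriris
X = meas;
Y = species;

% Same distributions as NBModel2, but with a fixed bandwidth of 0.2 (illustrative value)
NBModel3 = fitNaiveBayes(X,Y,...
    'Distribution',{'normal','kernel','normal','kernel'},'KSWidth',0.2);
NBModel3.Params{1,2}     % Kernel fit for setosa sepal widths
```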

Estimate the confusion matrix for `NBModel2`.

```
predictLabels2 = predict(NBModel2,X);
ConfusionMat2 = confusionmat(Y,predictLabels2)
```
```
ConfusionMat2 =

    50     0     0
     0    47     3
     0     3    47
```

Based on the confusion matrices, the two classifiers perform identically in the training sample: each misclassifies 6 of the 150 observations.
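Training-sample confusion matrices tend to be optimistic. A less biased comparison cross-validates each model and compares the estimated misclassification rates. The sketch below assumes the replacement function `fitcnb` is available, and that its `'DistributionNames'` and `'CrossVal'` options express the same model as `NBModel2`.

```matlab
load fisheriris
X = meas;
Y = species;

rng(1);   % For reproducibility of the random fold assignment
% 10-fold cross-validated model with the same distribution choices as NBModel2
CVMdl = fitcnb(X,Y,...
    'DistributionNames',{'normal','kernel','normal','kernel'},'CrossVal','on');
cvError = kfoldLoss(CVMdl)   % Estimated out-of-sample misclassification rate
```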

Train Naive Bayes Classifiers Using Multinomial Predictors

Some spam filters classify an incoming email as spam based on how many times a word or punctuation mark (called a token) occurs in the email. The predictors are the frequencies of particular words or punctuation marks in an email. Therefore, the predictors compose multinomial random variables.

This example illustrates classification using naive Bayes and multinomial predictors.

Suppose you observed 1000 emails and classified them as spam or not spam. Simulate this by randomly assigning -1 or 1 to `y` for each email.

```
n = 1000;                          % Sample size
rng(1);                            % For reproducibility
y = randsample([-1 1],n,true);     % Random labels
```

To build the predictor data, suppose that there are five tokens in the vocabulary, and 20 observed tokens per email. Generate predictor data from the five tokens by drawing multinomial deviates. The relative frequencies of tokens in spam emails should differ from those in emails that are not spam.

```
tokenProbs = [0.2 0.3 0.1 0.15 0.25;...
    0.4 0.1 0.3 0.05 0.15];        % Token relative frequencies
tokensPerEmail = 20;
X = zeros(n,5);
X(y == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),sum(y == 1));
X(y == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),sum(y == -1));
```

Train a naive Bayes classifier. Specify that the predictors are multinomial.

```NBModel = fitNaiveBayes(X,y,'Distribution','mn'); ```

`NBModel` is a trained `NaiveBayes` classifier.

Assess the in-sample performance of `NBModel` by estimating the misclassification rate.

```
predSpam = predict(NBModel,X);
misclass = sum(y'~=predSpam)/n
```
```
misclass =

    0.0200
```

The in-sample misclassification rate is 2%.

Randomly generate deviates that represent a new batch of emails.

```
nOut = 500;
yOut = randsample([-1 1],nOut,true);
XOut = zeros(nOut,5);
XOut(yOut == 1,:) = mnrnd(tokensPerEmail,tokenProbs(1,:),...
    sum(yOut == 1));
XOut(yOut == -1,:) = mnrnd(tokensPerEmail,tokenProbs(2,:),...
    sum(yOut == -1));
```

Classify the new emails using the trained naive Bayes classifier `NBModel`, and determine whether the algorithm generalizes.

```
predSpamOut = predict(NBModel,XOut);
genRate = sum(yOut'~=predSpamOut)/nOut
```
```
genRate =

    0.0260
```

The out-of-sample misclassification rate is 2.6%, indicating that the classifier generalizes fairly well.

Input Arguments


`X` — Predictor data: matrix of numeric values

Predictor data on which the naive Bayes classifier is trained, specified as a matrix of numeric values.

Each row of `X` corresponds to one observation (also known as an instance or example), and each column corresponds to one variable (also known as a feature).

The length of `Y` and the number of rows of `X` must be equal.

Data Types: `double`

`Y` — Class labels: categorical array | character array | logical vector | vector of numeric values | cell array of strings

Class labels on which the naive Bayes classifier is trained, specified as a categorical or character array, logical or numeric vector, or cell array of strings. Each element of `Y` defines the class membership of the corresponding row of `X`. `Y` supports K class levels.

If `Y` is a character array, then each row must correspond to one class label.

The length of `Y` and the number of rows of `X` must be equal.

Data Types: `cell` | `char` | `double` | `logical`

 Note:   The software treats `NaN`, empty string (`''`), and undefined categorical elements as missing values.

• If `Y` contains missing values, then the software removes them and the corresponding rows of `X`.

• If `X` contains any rows composed entirely of missing values, then the software removes those rows and the corresponding elements of `Y`.

• If `X` contains missing values and you set `'Distribution','mn'`, then the software removes the rows of `X` that contain missing values and the corresponding elements of `Y`.

• If a predictor is not represented in a class, that is, if all of its values are `NaN` within a class, then the software returns an error.

Removing rows of `X` and corresponding elements of `Y` decreases the effective training or cross-validation sample size.
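The sketch below applies the row-removal rules in this note to toy data with numeric labels. It shows which rows each rule would drop. It is an illustration of the documented behavior, not the function's internal implementation, and the variable names are hypothetical.

```matlab
% Toy data: 5 observations, 2 predictors, numeric class labels
X = [1 2; NaN 3; NaN NaN; 4 5; 6 7];
Y = [1; 2; 1; NaN; 2];

missingY = isnan(Y);              % Missing labels (numeric Y only, for simplicity)
allMissingX = all(isnan(X),2);    % Rows of X that are entirely missing
anyMissingX = any(isnan(X),2);    % Rows of X with at least one missing value

keepDefault = ~missingY & ~allMissingX   % Rows kept for 'normal' or 'kernel'
keepMn = ~missingY & ~anyMissingX        % Rows kept for 'Distribution','mn'
```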

Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'Distribution','mn','Prior','uniform','KSWidth',0.5` specifies the following: the data distribution is multinomial, the prior probabilities for all classes are equal, and the kernel smoothing window bandwidth for all classes is `0.5` units.

`'Distribution'` — Data distributions: `'normal'` (default) | `'kernel'` | `'mn'` | `'mvmn'` | cell array of strings

Data distributions `fitNaiveBayes` uses to model the data, specified as the comma-separated pair consisting of `'Distribution'` and a string or cell array of strings.

This table summarizes the available distributions.

| Value | Description |
|---|---|
| `'kernel'` | Kernel smoothing density estimate. |
| `'mn'` | Multinomial distribution. If you specify `'mn'`, then all features are components of a multinomial distribution. Therefore, you cannot include `'mn'` as an element of a cell array of strings. For details, see Algorithms. |
| `'mvmn'` | Multivariate multinomial distribution. For details, see Algorithms. |
| `'normal'` | Normal (Gaussian) distribution. |

If you specify a string, then the software models all the features using that distribution. If you specify a 1-by-D cell array of strings, then the software models feature j using the distribution in element j of the cell array.

Example: `'Distribution',{'kernel','normal'}`

Data Types: `cell` | `char`

`'KSSupport'` — Kernel smoothing density support: `'unbounded'` (default) | `'positive'` | cell array | numeric row vector

Kernel smoothing density support, specified as the comma-separated pair consisting of `'KSSupport'` and a numeric row vector, a string, or a cell array. The software applies the kernel smoothing density to this region.

If you do not specify `'Distribution','kernel'`, then the software ignores the values of `'KSSupport'`, `'KSType'`, and `'KSWidth'`.

This table summarizes the available options for setting the kernel smoothing density region.

| Value | Description |
|---|---|
| 1-by-2 numeric row vector | For example, `[L,U]`, where `L` and `U` are the finite lower and upper bounds, respectively, for the density support. |
| `'positive'` | The density support is all positive real values. |
| `'unbounded'` | The density support is all real values. |

If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel support in cell j for feature j in `X`.

Example: `'KSSupport',{[-10,20],'unbounded'}`

Data Types: `cell` | `char` | `double`
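For example, all iris measurements are positive lengths. One reasonable choice is therefore to restrict every kernel density to positive support, as in the sketch below. Whether this improves accuracy for a given data set is not claimed here.

```matlab
load fisheriris
X = meas;          % All iris measurements are positive lengths in cm
Y = species;

% Kernel densities for all predictors, restricted to positive support
NBModelPos = fitNaiveBayes(X,Y,'Distribution','kernel','KSSupport','positive');
predictLabelsPos = predict(NBModelPos,X);
```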

`'KSType'` — Kernel smoother type: `'normal'` (default) | `'box'` | `'epanechnikov'` | `'triangle'` | cell array of strings

Kernel smoother type, specified as the comma-separated pair consisting of `'KSType'` and a string or cell array of strings.

If you do not specify `'Distribution','kernel'`, then the software ignores the values of `'KSSupport'`, `'KSType'`, and `'KSWidth'`.

This table summarizes the available kernel smoother types. Let $I\{u\}$ denote the indicator function.

| Value | Kernel | Formula |
|---|---|---|
| `'box'` | Box (uniform) | $f(x) = 0.5\,I\{\lvert x \rvert \le 1\}$ |
| `'epanechnikov'` | Epanechnikov | $f(x) = 0.75\,(1 - x^2)\,I\{\lvert x \rvert \le 1\}$ |
| `'normal'` | Gaussian | $f(x) = \frac{1}{\sqrt{2\pi}} \exp(-0.5\,x^2)$ |
| `'triangle'` | Triangular | $f(x) = (1 - \lvert x \rvert)\,I\{\lvert x \rvert \le 1\}$ |

If you specify a 1-by-D cell array, with each cell containing any value in the table, then the software trains the classifier using the kernel smoother type in cell j for feature j in `X`.

Example: `'KSType',{'epanechnikov','normal'}`

Data Types: `cell` | `char`
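To make the kernel formulas concrete, the sketch below writes each one as a vectorized anonymous function. It then checks numerically that each kernel integrates to approximately 1, as a density kernel should. The function names are illustrative only.

```matlab
% Kernel functions from the KSType table, as vectorized anonymous functions
boxK  = @(x) 0.5*(abs(x) <= 1);
epanK = @(x) 0.75*(1 - x.^2).*(abs(x) <= 1);
normK = @(x) exp(-0.5*x.^2)/sqrt(2*pi);
triK  = @(x) (1 - abs(x)).*(abs(x) <= 1);

% Trapezoidal integration over a wide, fine grid
x = linspace(-6,6,120001);
areas = [trapz(x,boxK(x)), trapz(x,epanK(x)), trapz(x,normK(x)), trapz(x,triK(x))]
```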

`'KSWidth'` — Kernel smoothing window bandwidth: matrix of numeric values (default) | numeric column vector | numeric row vector | scalar | structure array

Kernel smoothing window bandwidth, specified as the comma-separated pair consisting of `'KSWidth'` and a matrix of numeric values, numeric row vector, numeric column vector, scalar, or structure array.

If you do not specify `'Distribution','kernel'`, then the software ignores the values of `'KSSupport'`, `'KSType'`, and `'KSWidth'`.

Suppose there are K class levels and D predictors. This table summarizes the available options for setting the kernel smoothing window bandwidth.

| Value | Description |
|---|---|
| K-by-D matrix of numeric values | Element (k,d) specifies the bandwidth for predictor d in class k. |
| K-by-1 numeric column vector | Element k specifies the bandwidth for all predictors in class k. |
| 1-by-D numeric row vector | Element d specifies the bandwidth in all class levels for predictor d. |
| scalar | Specifies the bandwidth for all features in all classes. |
| structure array | A structure array `S` containing class levels and their bandwidths. `S` must have two fields: `S.width`, a numeric row vector of bandwidths or a matrix of numeric values with D columns; and `S.group`, a vector of the same type as `Y` containing unique class levels that indicate the class for the corresponding element of `S.width`. |

By default, the software automatically selects a bandwidth for each combination of feature and class, using a value that is optimal for a Gaussian distribution.

Example: `'KSWidth',struct('width',[0.5,0.25],'group',{{'b';'g'}})`

Data Types: `double` | `struct`
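The sketch below shows a normal-reference ("rule of thumb") bandwidth for one predictor within one class. It assumes the default matches the `ksdensity` rule h = σ̂(4/(3n))^(1/5), with the robust estimate σ̂ = MAD/0.6745. That assumption is not stated on this page, and the toolbox may differ in details.

```matlab
% Normal-reference bandwidth for one predictor within one class.
% Assumption: mirrors the ksdensity default; not confirmed for fitNaiveBayes.
x = [4.9 5.1 5.0 5.4 4.6 5.0 4.4 4.9 5.4 4.8];   % Illustrative sample
n = numel(x);
sigmaHat = median(abs(x - median(x)))/0.6745;     % Robust estimate of the standard deviation
h = sigmaHat*(4/(3*n))^(1/5)
```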

`'Prior'` — Class prior probabilities: `'empirical'` (default) | `'uniform'` | numeric vector | structure array

Class prior probabilities, specified as the comma-separated pair consisting of `'Prior'` and a numeric vector, structure array, or string.

This table summarizes the available options for setting prior probabilities.

| Value | Description |
|---|---|
| `'empirical'` | The software uses the class relative frequencies in `Y` as the prior probabilities. |
| numeric vector | A numeric vector of length K specifying the prior probability of each class. The order of the elements of `Prior` must correspond to the order of the class levels; for details on the class order, see Tips. The software normalizes prior probabilities to sum to `1`. |
| structure array | A structure array `S` containing class levels and their prior probabilities. `S` must have two fields: `S.prob`, a numeric vector of prior probabilities, which the software normalizes to sum to `1`; and `S.group`, a vector of the same type as `Y` containing unique class levels that indicate the class for the corresponding element of `S.prob`. `S.group` must contain all the K levels in `Y`. It can also contain classes that do not appear in `Y`, which can be useful if `X` is a subset of a larger training set. The software ignores any classes that appear in `S.group` but not in `Y`. |
| `'uniform'` | The prior probabilities are equal for all classes. |

Example: `'Prior',struct('prob',[1,2],'group',{{'b';'g'}})`

Data Types: `char` | `double` | `struct`
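The arithmetic behind the prior options is simple, as the sketch below shows. A numeric prior such as `[1,2]` is normalized to sum to 1. The `'empirical'` prior is the vector of class relative frequencies. The labels are illustrative, and the code mirrors the described behavior rather than the function's internals.

```matlab
% Numeric prior [1 2] (as in the struct example above), normalized to sum to 1
rawPrior = [1 2];
prior = rawPrior/sum(rawPrior)

% 'empirical' prior: class relative frequencies in the training labels
Y = {'b';'g';'g';'b';'g'};              % Illustrative labels
[classes,~,idx] = unique(Y,'stable');   % Classes in order of first appearance
empiricalPrior = accumarray(idx,1)'/numel(Y)
```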

Output Arguments


`NBModel` — Trained naive Bayes classifier: `NaiveBayes` classifier

Trained naive Bayes classifier, returned as a `NaiveBayes` classifier.


Bag-of-Tokens Model

In the bag-of-tokens model, the value of predictor j is the nonnegative number of occurrences of token j in this observation. The number of categories (bins) in this multinomial model is the number of distinct tokens, that is, the number of predictors.
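The sketch below turns one observation's sequence of token IDs into the bag-of-tokens predictor row described above. The token IDs and vocabulary size are illustrative.

```matlab
% Token IDs observed in one email (vocabulary of 5 tokens, IDs 1..5); illustrative data
tokenIDs = [2 5 1 2 2 4 5 1];
vocabSize = 5;

% Predictor j = number of occurrences of token j in this observation
counts = accumarray(tokenIDs(:),1,[vocabSize 1])'   % 1-by-5 row of counts
```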

Tips

• For classifying count-based data, such as the bag-of-tokens model, use the multinomial distribution (e.g., set `'Distribution','mn'`).

• The following rules define the order of the classes. The order matters when you specify prior probabilities by setting `'Prior',prior`, where `prior` is a numeric vector.

  • If `Y` is a categorical array, then the order of the class levels matches the output of `categories(Y)`.

  • If `Y` is a numeric or logical vector, then the order of the class levels matches the output of `sort(unique(Y))`.

  • For cell arrays of strings and character arrays, the order of the class labels is the order in which each label first appears in `Y`.
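The sketch below reproduces each of these ordering rules with standard MATLAB functions. This is useful, for example, when you build a numeric `'Prior'` vector. It uses `unique(Y,'stable')` to reproduce first-appearance order; this illustrates the rules and is not the function's own code.

```matlab
% Numeric labels: order is sort(unique(Y))
Ynum = [3; 1; 2; 1];
orderNum = sort(unique(Ynum))'

% Cell array of strings: order of first appearance in Y
Ycell = {'virginica';'setosa';'virginica';'versicolor'};
orderCell = unique(Ycell,'stable')'

% Categorical: order of categories(Y) (categorical() sorts categories by default)
Ycat = categorical(Ycell);
orderCat = categories(Ycat)'
```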

Algorithms

• If you specify `'Distribution','mn'`, then the software considers each observation as multiple trials of a multinomial distribution, and considers each occurrence of a token as one trial (see Bag-of-Tokens Model).

• If you specify `'Distribution','mvmn'`, then the software assumes each individual predictor follows a multinomial model within a class. The parameters for a predictor include the probabilities of all possible values that the corresponding feature can take.
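The sketch below shows how a multinomial naive Bayes model can score token counts. Each class gets a log prior plus the sum of count-weighted log token probabilities. The multinomial coefficient is the same for every class, so it cancels in the comparison. The sketch makes two assumptions:

- It plugs in the known generating probabilities from the spam example rather than estimated ones.
- It uses equal priors.

How `fitNaiveBayes` estimates the probabilities, including any smoothing, is not documented here.

```matlab
% Class-conditional token probabilities (rows: classes) from the spam example
classLabels = [1 -1];
tokenProbs = [0.2 0.3 0.1 0.15 0.25;...
              0.4 0.1 0.3 0.05 0.15];
prior = [0.5 0.5];                  % Assumed equal priors for illustration

counts = [2 8 1 3 6];               % Token counts for one email (20 tokens)

% Log posterior up to a constant (the multinomial coefficient cancels across classes)
logScore = log(prior) + counts*log(tokenProbs)';    % 1-by-2
posterior = exp(logScore - max(logScore));          % Subtract max for numerical stability
posterior = posterior/sum(posterior)                % Normalized class posteriors
[~,k] = max(posterior);
predictedLabel = classLabels(k)
```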