## Prediction Using Discriminant Analysis Models

`predict` uses three quantities to classify observations: posterior probability, prior probability, and cost.

`predict` classifies so as to minimize the expected classification cost:

`$\hat{y}=\underset{y=1,\dots,K}{\arg\min}\sum_{k=1}^{K}\hat{P}(k|x)\,C(y|k),$`

where

• $\hat{y}$ is the predicted classification.

• K is the number of classes.

• $\hat{P}(k|x)$ is the posterior probability of class k for observation x.

• $C\left(y|k\right)$ is the cost of classifying an observation as y when its true class is k.
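As a rough numeric illustration of this rule, the following sketch uses made-up posterior probabilities and a made-up cost matrix (both are assumptions, not values from any trained classifier). The matrix `C` follows the layout of the `Cost` property, where row `i` is the true class and column `j` is the predicted class.

```matlab
% Hypothetical posterior probabilities for one observation, K = 3 classes.
posterior = [0.5 0.3 0.2];       % P(k|x) for k = 1, 2, 3

% Hypothetical cost matrix: C(i,j) is the cost of predicting class j
% when the true class is i. Missing a true class 3 is expensive here.
C = [ 0  1  1;
      1  0  1;
     10 10  0];

% Expected cost of predicting each class y: sum over k of P(k|x)*C(k,y).
expectedCost = posterior * C;    % 1-by-K row vector

% The predicted class is the one with the smallest expected cost.
[~, yhat] = min(expectedCost);
```

With these numbers the expected costs are 2.3, 2.5, and 0.8, so `yhat` is 3 even though class 3 has the smallest posterior probability. Under the default 0-1 cost the same rule would pick class 1, the class with the largest posterior.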

The space of `X` values divides into regions, and within each region `predict` assigns the same class `Y`. The regions are separated by straight lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or parabolas) for quadratic discriminant analysis. For a visualization of these regions, see Create and Visualize Discriminant Analysis Classifier.

### Posterior Probability

The posterior probability that a point x belongs to class k is proportional to the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-d mean $\mu_k$ and d-by-d covariance $\Sigma_k$ at a 1-by-d point x is

`$P(x|k)=\frac{1}{\left((2\pi)^{d}|\Sigma_{k}|\right)^{1/2}}\exp\left(-\frac{1}{2}(x-\mu_{k})\Sigma_{k}^{-1}(x-\mu_{k})^{T}\right),$`

where $|\Sigma_{k}|$ is the determinant of $\Sigma_k$, and $\Sigma_{k}^{-1}$ is its inverse.

Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is

`$\hat{P}(k|x)=\frac{P(x|k)P(k)}{P(x)},$`

where P(x) is a normalization constant, namely, the sum over k of P(x|k)P(k).
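The two formulas above can be evaluated by hand, which may help when checking a classifier's scores. The sketch below is only an illustration: the class means, covariances, priors, and query point are all invented, and it uses plain matrix operations rather than any toolbox function.

```matlab
% Hypothetical two-class problem with d = 2 predictors.
mu    = {[0 0], [2 1]};              % 1-by-d class means
Sigma = {eye(2), [2 0.5; 0.5 1]};    % d-by-d class covariances
prior = [0.6 0.4];                   % P(k)
x     = [1 0.5];                     % 1-by-d query point
d     = numel(x);

K = numel(prior);
joint = zeros(1, K);
for k = 1:K
    r = x - mu{k};
    % Multivariate normal density P(x|k).
    % r / Sigma{k} computes r * inv(Sigma{k}) without forming the inverse.
    density = exp(-0.5 * (r / Sigma{k}) * r') / sqrt((2*pi)^d * det(Sigma{k}));
    joint(k) = density * prior(k);   % P(x|k) * P(k)
end

% Divide by P(x), the sum over k of P(x|k)*P(k).
posterior = joint / sum(joint);
```

Because the two classes have different covariance matrices, this corresponds to the quadratic case. For the linear case, every entry of `Sigma` would hold the same matrix.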

### Prior Probability

The prior probability is one of three choices:

• `'uniform'` — The prior probability of class `k` is 1 over the total number of classes.

• `'empirical'` — The prior probability of class `k` is the number of training samples of class `k` divided by the total number of training samples.

• A numeric vector — The prior probability of class `k` is the `k`th element of the `Prior` vector. See `fitcdiscr`.

After creating a classifier `obj`, you can set the prior using dot notation:

`obj.Prior = v;`

where `v` is a vector of positive elements representing the frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
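A minimal sketch of changing the prior after training, assuming Statistics and Machine Learning Toolbox and its `fisheriris` sample data set are available. The new prior values are arbitrary and chosen only for illustration.

```matlab
load fisheriris                        % sample data: meas (predictors), species (labels)
obj = fitcdiscr(meas, species);        % the default prior is 'empirical'

disp(obj.ClassNames)                   % class order used by the Prior vector
disp(obj.Prior)                        % current prior, one entry per class

% Hypothetical new prior: the first class is taken to be twice as
% frequent as each of the other two.
obj.Prior = [2 1 1];

% Later predictions use the new prior; the classifier is not refit.
labels = predict(obj, meas(1:5,:));
```

The entries of the vector follow the order of `obj.ClassNames`, so it is worth displaying that property before assigning a new prior.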

### Cost

There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.

#### True Misclassification Cost per Class

`Cost(i,j)` is the cost of classifying an observation into class `j` if its true class is `i`. By default, `Cost(i,j)=1` if `i~=j`, and `Cost(i,j)=0` if `i=j`. In other words, the cost is `0` for correct classification, and `1` for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost matrix in the `Cost` name-value pair in `fitcdiscr`.

After you create a classifier `obj`, you can set a custom cost using dot notation:

`obj.Cost = B;`

`B` is a square matrix of size `K`-by-`K` when there are `K` classes. You do not need to retrain the classifier when you set a new cost.
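Both ways of supplying a cost matrix might look like the following sketch. It assumes Statistics and Machine Learning Toolbox and the `fisheriris` sample data set; the penalty of 5 is an invented value.

```matlab
load fisheriris
K = numel(unique(species));            % number of classes (3 for this data set)

% Start from the default 0-1 cost and raise one entry.
B = ones(K) - eye(K);
B(3,2) = 5;                            % hypothetical: true class 3 predicted as class 2

% Option 1: pass the cost matrix when creating the classifier.
obj1 = fitcdiscr(meas, species, 'Cost', B);

% Option 2: create the classifier first, then set the cost with dot notation.
obj2 = fitcdiscr(meas, species);
obj2.Cost = B;

labels = predict(obj2, meas);          % predictions use the custom cost
```

Rows and columns of `B` follow the class order in the classifier's `ClassNames` property.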

#### Expected Misclassification Cost per Observation

Suppose you have `Nobs` observations that you want to classify with a trained discriminant analysis classifier `obj`. Suppose you have `K` classes. You place the observations into a matrix `Xnew` with one observation per row. The command

`[label,score,cost] = predict(obj,Xnew)`

returns, among other outputs, a cost matrix of size `Nobs`-by-`K`. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the `K` classes. `cost(n,k)` is

`$\sum_{i=1}^{K}\hat{P}\left(i|Xnew(n)\right)C(k|i),$`

where

• K is the number of classes.

• $\hat{P}\left(i|Xnew(n)\right)$ is the posterior probability of class i for observation Xnew(n), the nth row of `Xnew`.

• $C\left(k|i\right)$ is the cost of classifying an observation as k when its true class is i.
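Since `Cost(i,k)` holds the cost of classifying into class `k` when the true class is `i`, the sum above is a matrix product of the posterior scores and the cost matrix. The sketch below checks this numerically, assuming Statistics and Machine Learning Toolbox and the `fisheriris` sample data set; the variable names `costByHand` and `maxDiff` are introduced here for illustration.

```matlab
load fisheriris
obj  = fitcdiscr(meas, species);
Xnew = meas(1:10,:);                   % Nobs = 10 observations, one per row

[label, score, cost] = predict(obj, Xnew);

% cost(n,k) = sum over i of score(n,i) * Cost(i,k)
costByHand = score * obj.Cost;         % Nobs-by-K
maxDiff = max(abs(cost(:) - costByHand(:)));

% The column with the smallest expected cost in each row gives the label.
[~, idx] = min(cost, [], 2);
predictedByHand = obj.ClassNames(idx);
```

If the relationship holds as described, `maxDiff` should be at round-off level and `predictedByHand` should match `label`.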