Prediction Using Discriminant Analysis Models

predict uses three quantities to classify observations: posterior probability, prior probability, and cost.

predict classifies so as to minimize the expected classification cost:

$\hat{y} = \underset{y = 1, ..., K}{\arg \min} \sum_{k = 1}^{K} \hat{P} (k | x) C (y | k),$

where

$\hat{y}$ is the predicted classification.
K is the number of classes.
$\hat{P} (k | x)$ is the posterior probability of class k for observation x.
$C (y | k)$ is the cost of classifying an observation as y when its true class is k.
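
As a small worked illustration of this rule, the following sketch uses made-up posterior probabilities and a made-up cost matrix; both are assumptions for illustration, not outputs of a trained classifier. The matrix follows the Cost(i,j) convention (rows index the true class, columns the predicted class), so C(y|k) corresponds to Cost(k,y).

```matlab
% Hypothetical posterior probabilities for one observation, K = 3 classes.
posterior = [0.5 0.3 0.2];

% Hypothetical cost matrix: Cost(k,y) is the cost of predicting class y
% when the true class is k. Misclassifying a true class 3 is expensive.
Cost = [0  1  1
        1  0  1
        10 10 0];

% Expected cost of each candidate prediction y:
% sum over k of posterior(k)*Cost(k,y).
expectedCost = posterior*Cost;

% The predicted class is the one with the smallest expected cost.
[~,yhat] = min(expectedCost);
```

With these numbers the expected costs are 2.3, 2.5, and 0.8, so the predicted class is 3 even though class 1 has the largest posterior probability.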

The space of X values divides into regions, each of which is assigned a single predicted class Y. With two predictors, the regions are separated by straight lines for linear discriminant analysis, and by conic sections (ellipses, hyperbolas, or parabolas) for quadratic discriminant analysis. With more predictors, the boundaries are hyperplanes and quadric surfaces, respectively. For a visualization of these regions, see Create and Visualize Discriminant Analysis Classifier.

Posterior Probability

The posterior probability that a point x belongs to class k is proportional to the product of the prior probability and the multivariate normal density. The density function of the multivariate normal with 1-by-d mean μ_k and d-by-d covariance Σ_k at a 1-by-d point x is

$P (x | k) = \frac{1}{{({(2 π)}^{d} | Σ_{k} |)}^{1 / 2}} \exp (- \frac{1}{2} (x - μ_{k}) Σ_{k}^{- 1} {(x - μ_{k})}^{T}),$

where $| Σ_{k} |$ is the determinant of Σ_k, and $Σ_{k}^{- 1}$ is the inverse matrix.

Let P(k) represent the prior probability of class k. Then the posterior probability that an observation x is of class k is

$\hat{P} (k | x) = \frac{P (x | k) P (k)}{P (x)},$

where P(x) is a normalization constant, namely, the sum over k of P(x|k)P(k).
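
The two formulas above can be sketched in a few lines. This is a minimal illustration that assumes a hypothetical two-class model; the means, covariances, priors, and query point are invented, and the variable names are not part of any toolbox interface.

```matlab
% Hypothetical two-class model in d = 2 dimensions (all values assumed).
mu    = [0 0; 2 2];              % K-by-d class means, one row per class
Sigma = cat(3,eye(2),eye(2));    % d-by-d-by-K class covariances
prior = [0.5 0.5];               % P(k)
x     = [0.5 0.5];               % 1-by-d query point

K = size(mu,1);
d = size(mu,2);
density = zeros(1,K);
for k = 1:K
    r = x - mu(k,:);             % 1-by-d deviation from the class mean
    S = Sigma(:,:,k);
    % Multivariate normal density P(x|k); r/S computes r*inv(S).
    density(k) = exp(-0.5*(r/S)*r')/sqrt((2*pi)^d*det(S));
end

% Posterior: P(x|k)P(k) normalized by P(x) = sum over k of P(x|k)P(k).
joint = density.*prior;
posterior = joint/sum(joint);
```

Because the query point lies closer to the first class mean and the priors are equal, the first class receives the larger posterior probability. The loop writes the density out explicitly to mirror the formula; the toolbox function mvnpdf evaluates the same density.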

Prior Probability

The prior probability is one of three choices:

'uniform' — The prior probability of class k is 1 over the total number of classes.
'empirical' — The prior probability of class k is the number of training samples of class k divided by the total number of training samples.
A numeric vector — The prior probability of class k is the kth element of the Prior vector. See fitcdiscr.

After creating a classifier obj, you can set the prior using dot notation:

obj.Prior = v;

where v is a vector of positive elements representing the relative frequency with which each class occurs. You do not need to retrain the classifier when you set a new prior.
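
The three choices can be computed by hand as follows. This is a sketch under stated assumptions: the labels are invented and coded as integers 1 through K, and the rescaling of v is done explicitly here so that the result can be read as probabilities.

```matlab
% Hypothetical training labels for K = 3 classes, coded as integers 1..K.
y = [1 1 1 1 2 2 2 3 3 3]';
K = 3;

% 'uniform': every class gets 1/K.
priorUniform = ones(1,K)/K;

% 'empirical': class counts divided by the number of training samples.
counts = accumarray(y,1,[K 1])';
priorEmpirical = counts/numel(y);

% A numeric vector of class frequencies, rescaled here so it sums to 1.
v = [2 1 1];
priorCustom = v/sum(v);
```

For these labels the empirical prior is [0.4 0.3 0.3], and the custom vector rescales to [0.5 0.25 0.25].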

Cost

There are two costs associated with discriminant analysis classification: the true misclassification cost per class, and the expected misclassification cost per observation.

True Misclassification Cost per Class

Cost(i,j) is the cost of classifying an observation into class j if its true class is i. In the notation of the classification rule, C(j|i) = Cost(i,j). By default, Cost(i,j)=1 if i~=j, and Cost(i,j)=0 if i=j. In other words, the cost is 0 for correct classification, and 1 for incorrect classification.

You can set any cost matrix you like when creating a classifier. Pass the cost matrix in the Cost name-value pair in fitcdiscr.

After you create a classifier obj, you can set a custom cost using dot notation:

obj.Cost = B;

B is a square matrix of size K-by-K when there are K classes. You do not need to retrain the classifier when you set a new cost.
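
For illustration, the default cost matrix and a modified one can be built like this. The number of classes and the choice of which error to penalize are assumptions made for the example.

```matlab
K = 3;                       % number of classes (assumed for illustration)

% Default cost: 0 on the diagonal (correct), 1 elsewhere (incorrect).
defaultCost = ones(K) - eye(K);

% Hypothetical custom cost: classifying a true class 1 observation
% as class 3 is five times as costly as any other error.
B = defaultCost;
B(1,3) = 5;
```

A matrix such as B could then be assigned with obj.Cost = B, provided its row and column order matches the class order of the classifier.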

Expected Misclassification Cost per Observation

Suppose you have Nobs observations that you want to classify with a trained discriminant analysis classifier obj. Suppose you have K classes. You place the observations into a matrix Xnew with one observation per row. The command

[label,score,cost] = predict(obj,Xnew)

returns, among other outputs, a cost matrix of size Nobs-by-K. Each row of the cost matrix contains the expected (average) cost of classifying the observation into each of the K classes. cost(n,k) is

$\sum_{i = 1}^{K} \hat{P} (i | X (n)) C (k | i),$

where

K is the number of classes.
$\hat{P} (i | X (n))$ is the posterior probability of class i for observation X(n), the nth row of Xnew.
$C (k | i)$ is the cost of classifying an observation as k when its true class is i.
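
The relationship between the outputs can be checked numerically. The sketch below assumes the Statistics and Machine Learning Toolbox and its bundled fisheriris sample data; the choice of rows for Xnew is arbitrary. It relies on score holding the posterior probabilities, so the sum above becomes the matrix product score*obj.Cost.

```matlab
load fisheriris                      % provides meas (150-by-4) and species
obj = fitcdiscr(meas,species);       % default prior and cost

Xnew = meas([1 51 101],:);           % three observations, one per row
[label,score,cost] = predict(obj,Xnew);

% cost(n,k) = sum over i of score(n,i)*obj.Cost(i,k).
expected = score*obj.Cost;
maxDiff = max(abs(cost(:) - expected(:)));

% The predicted label is the class with the smallest expected cost.
[~,idx] = min(cost,[],2);
labelFromCost = obj.ClassNames(idx);
```

Under these assumptions maxDiff should be zero up to floating-point rounding, and labelFromCost should agree with label.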