Create a OneClassSVM object for uncontaminated training observations by using the ocsvm function. Then detect novelties (anomalies in new data) by passing the object and the new data to the object function isanomaly.
Load the 1994 census data stored in census1994.mat. The data set consists of demographic data from the US Census Bureau and is used to predict whether an individual makes over $50,000 per year. census1994 contains the training data set adultdata and the test data set adulttest.
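A minimal sketch of the loading step, assuming census1994.mat is on the MATLAB path (it ships with Statistics and Machine Learning Toolbox):

```matlab
% Load the census data; this is expected to create the tables
% adultdata (training set) and adulttest (test set) in the workspace.
load census1994

% Optional sanity check: show the first few rows of the training table.
head(adultdata)
```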
ocsvm does not use observations with missing values. Remove missing values in the data sets to reduce memory consumption and speed up training.
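One way to drop the incomplete rows is rmmissing, which removes every table row that contains a missing value. The sketch below assumes adultdata and adulttest are already in the workspace:

```matlab
% Assumes adultdata and adulttest were loaded from census1994.mat.
% rmmissing removes each row that has at least one missing entry.
adultdata = rmmissing(adultdata);
adulttest = rmmissing(adulttest);
```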
Train a one-class SVM for adultdata. Assume that adultdata does not contain outliers. Specify StandardizeData as true to standardize the input data, and set KernelScale to "auto" to let the function select an appropriate kernel scale parameter using a heuristic procedure.
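The training call might look like the following sketch. The rng call is an assumption added here for reproducibility, on the understanding that the kernel scale heuristic can involve random subsampling; the output name s for the training scores matches the name used later in this example.

```matlab
% Assumes adultdata is the cleaned training table from the previous steps.
rng("default")  % assumption: fix the random seed so repeated runs agree

% Second output (training anomaly indicators) is not needed here, so skip it.
% s holds the anomaly scores of the training observations.
[Mdl,~,s] = ocsvm(adultdata,StandardizeData=true,KernelScale="auto");
```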
Mdl is a OneClassSVM object. If you do not specify the ContaminationFraction name-value argument as a value greater than 0, then ocsvm treats all training observations as normal observations. The function sets the score threshold to the maximum score value. Display the threshold value.
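The threshold is stored in the ScoreThreshold property of the model, so displaying it can be as simple as the sketch below (assuming Mdl is the trained model from the previous step):

```matlab
% Omitting the semicolon prints the property value in the Command Window.
Mdl.ScoreThreshold
```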
Find anomalies in adulttest by using the trained one-class SVM model. Because you specified StandardizeData=true when you trained the model, the isanomaly function standardizes the input data by using the predictor means and standard deviations of the training data stored in the Mu and Sigma properties, respectively.
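A sketch of the detection call, assuming Mdl is the trained model and adulttest is the cleaned test table:

```matlab
% tf_test is a logical vector (true marks an anomaly);
% s_test holds the anomaly score of each test observation.
[tf_test,s_test] = isanomaly(Mdl,adulttest);
```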
The isanomaly function returns the anomaly indicators tf_test and scores s_test for adulttest. By default, isanomaly identifies observations with scores above the threshold (Mdl.ScoreThreshold) as anomalies.
Create histograms for the anomaly scores s (training data) and s_test (test data). Create a vertical line at the threshold of the anomaly scores.
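The plot could be produced along these lines. The probability normalization, the line style, and the legend text are illustrative choices, not requirements of the example; s is assumed to hold the training scores returned by ocsvm.

```matlab
% Assumes s (training scores), s_test (test scores), and Mdl exist.
% Normalizing to probability makes the two sets comparable even though
% the training and test sets differ in size (illustrative choice).
histogram(s,Normalization="probability")
hold on
histogram(s_test,Normalization="probability")

% Mark the score threshold with a labeled vertical line.
xline(Mdl.ScoreThreshold,"r-",join(["Threshold" Mdl.ScoreThreshold]))
legend("Training data","Test data",Location="southeast")
hold off
```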
Display the observation index of the anomalies in the test data.
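Because tf_test is a logical vector, find returns the indices of its true entries. A sketch, assuming tf_test comes from the isanomaly call:

```matlab
% No semicolon, so the result is displayed as ans.
find(tf_test)
```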
ans =
0x1 empty double column vector
The anomaly score distribution of the test data is similar to that of the training data, so isanomaly does not detect any anomalies in the test data with the default threshold value. You can specify a different threshold value by using the ScoreThreshold name-value argument. For an example, see Specify Anomaly Score Threshold.