Code Generation for Anomaly Detection
This example shows how to generate single-precision code that detects anomalies in data using a trained isolation forest model or one-class support vector machine (OCSVM).
The isanomaly
function of isolationForest
and the predict
function for ClassificationSVM
support code generation. These object functions require a trained model object, but the -args
option of codegen
(MATLAB Coder) does not accept these objects. Work around this limitation by using saveLearnerForCoder
and loadLearnerForCoder
as described in this example.
This flow chart shows the code generation workflow for anomaly detection.
After you train a model, save the trained model by using saveLearnerForCoder
. Define an entry-point function that loads the saved model by using loadLearnerForCoder
and calls the object function. Then generate code for the entry-point function by using codegen
, and verify the generated code. For a more detailed code generation workflow example, see Code Generation for Prediction of Machine Learning Model at Command Line.
Load Data
Load the lidar scan data set, which contains the coordinates of objects surrounding a vehicle, stored as a collection of 3-D points.
load("lidar_subset.mat")
loc = lidar_subset;
To highlight the environment around the vehicle, set the region of interest to span 20 meters to the left and right of the vehicle, 20 meters in front and back of the vehicle, and the area above the surface of the road.
xBound = 20; % in meters yBound = 20; % in meters zLowerBound = 0; % in meters
Crop the data to contain only points within the specified region.
indices = loc(:,1) <= xBound & loc(:,1) >= -xBound ... & loc(:,2) <= yBound & loc(:,2) >= -yBound ... & loc(:,3) > zLowerBound; loc = loc(indices,:); whos loc
Name Size Bytes Class Attributes loc 19070x3 228840 single
loc
is a single-precision matrix containing 19,070 samples of 3-D points.
Visualize the data as a 2-D scatter plot. Annotate the plot to highlight the vehicle.
scatter(loc(:,1),loc(:,2),"."); annotation("ellipse",[0.48 0.48 .1 .1],Color="red")
The center of the set of points (circled in red) contains the roof and hood of the vehicle. All other points are obstacles.
Assume that the fraction of outliers in the data is 0.05.
contaminationFraction = single(0.05);
Code Generation for Isolation Forest
Train Isolation Forest Model
Train an isolation forest model by using the iforest
function. Specify the fraction of outliers (ContaminationFraction
) as 0.05.
rng("default") % For reproducibility [forest,tf_forest,s_forest] = iforest(loc,ContaminationFraction=contaminationFraction);
forest
is an IsolationForest
object. iforest
also returns the anomaly indicators (tf_forest
) and anomaly scores (s_forest
) for the data (loc
). iforest
determines the score threshold value (forest.ScoreThreshold
) so that the function detects the specified fraction of observations as outliers.
Save Model Using saveLearnerForCoder
Save the model object to the file IsolationForestModel.mat
by using saveLearnerForCoder
.
ForestMdlFileName = "IsolationForestModel";
saveLearnerForCoder(forest,ForestMdlFileName)
saveLearnerForCoder
saves the object to the MATLAB® binary file IsolationForestModel.mat
as a structure array in the current folder.
Define Entry-Point Function
Define an entry-point function that returns anomaly indicators and anomaly scores for the input data. Within the function, load a single-precision model by using loadLearnerForCoder
, and then pass the loaded model to isanomaly
.
type myIsanomaly.m % Display contents of myIsanomaly.m file
function varargout = myIsanomaly(MdlFileName,x,varargin) %#codegen %MYISANOMALY Entry-point function for anomaly detection % This function supports only the example Code Generation for Anomaly % Detection and might change in a future release. % This function detects anomalies in new observations x using the saved % anomaly detection model in the MdlFileName file. Mdl = loadLearnerForCoder(MdlFileName,DataType="single"); [varargout{1:nargout}] = isanomaly(Mdl,x,varargin{:}); end
Generate Code
Specify the input argument types of myIsanomaly
using a 4-by-1 cell array. Assign each input argument type of the entry-point function to each cell. Specify the data type and exact input array size by using an example value that represents the set of values with a certain data type and array size.
ARGS = cell(4,1);
p = numel(forest.PredictorNames);
ARGS{1} = coder.Constant(ForestMdlFileName);
ARGS{2} = coder.typeof(single(0),[Inf,p],[1,0]);
ARGS{3} = coder.Constant("ScoreThreshold");
ARGS{4} = single(0.5);
The second input of myIsanomaly
is a variable-size input. For more details on variable-size arguments, see Specify Variable-Size Arguments for Code Generation.
Generate a MEX function from the entry-point function myIsanomaly
. Specify the input argument types using the -args
option and the cell array ARGS
. Specify the number of output arguments in the generated entry-point function using the -nargout
option.
codegen myIsanomaly -args ARGS -nargout 2
Code generation successful.
codegen
generates the MEX function myIsanomaly_mex
with a platform-dependent extension in the current folder.
Verify Generated Code
Detect anomalies in the training data using the generated MEX function. Compare the anomaly indicators and scores from the MEX function with those returned by iforest
.
[tf_forest_MEX,s_forest_MEX] = myIsanomaly_mex(ForestMdlFileName,loc,"ScoreThreshold",single(forest.ScoreThreshold));
isequal(tf_forest,tf_forest_MEX)
ans = logical
1
max(abs(s_forest-s_forest_MEX))
ans = single
5.9605e-08
isequal
returns logical 1 (true
), which means all the anomaly indicators are equal. The difference in the anomaly scores is insignificant.
Code Generation for OCSVM
Train OCSVM Model
Train a support vector machine model for one-class learning by using the fitcsvm
function. The function trains a model for one-class learning if the class label variable is a vector of ones. Specify the fraction of outliers (OutlierFraction
) as 0.05.
MdlOCSVM = fitcsvm(loc,single(ones(size(loc,1),1)),OutlierFraction=contaminationFraction, ...
Standardize=true);
MdlOCSVM
is a ClassificationSVM
object. Compute the outlier scores for loc
by using the resubPredict
function.
[~,s_OCSVM] = resubPredict(MdlOCSVM);
Negative score values indicate that the corresponding observations are outliers. Obtain the anomaly indicators.
tf_OCSVM = s_OCSVM < 0;
Save Model Using saveLearnerForCoder
Save the model object to the file SVMModel.mat
by using saveLearnerForCoder
.
SVMMdlFileName = "SVMModel";
saveLearnerForCoder(MdlOCSVM,SVMMdlFileName)
Define Entry-Point Function
Define an entry-point function that returns anomaly indicators and anomaly scores for the input data. Within the function, load a single-precision model by using loadLearnerForCoder
, and then pass the loaded model to predict
to compute anomaly scores. Use the scores to find anomaly indicators.
type myIsanomalySVM.m % Display contents of myIsanomalySVM.m file
function [tf,scores] = myIsanomalySVM(MdlFileName,x,scoreThreshold) %#codegen %MYISANOMALY Entry-point function for anomaly detection % This function supports only the example Code Generation for Anomaly % Detection and might change in a future release. % This function detects anomalies in new observations x using the saved % one-class support vector machine model in the MdlFileName file. Mdl = loadLearnerForCoder(MdlFileName,DataType="single"); [~,scores] = predict(Mdl,x); tf = scores < scoreThreshold; end
Generate Code
Specify the input argument types of myIsanomalySVM
using a 3-by-1 cell array.
ARGS = cell(3,1); p = numel(MdlOCSVM.PredictorNames); ARGS{1} = coder.Constant(SVMMdlFileName); ARGS{2} = coder.typeof(single(0),[Inf,p],[1,0]); ARGS{3} = single(0);
Generate a MEX function from the entry-point function myIsanomalySVM
.
codegen myIsanomalySVM -args ARGS -nargout 2
Code generation successful.
Verify Generated Code
Detect anomalies in the training data using the generated MEX function. Compare the anomaly indicators and scores from the MEX function with those returned by resubPredict
.
[tf_OCSVM_MEX,s_OCSVM_MEX] = myIsanomalySVM_mex(SVMMdlFileName,loc,single(0));
isequal(tf_OCSVM,tf_OCSVM_MEX)
ans = logical
1
max(abs(s_OCSVM-s_OCSVM_MEX))
ans = single
0.0133
isequal
returns logical 1 (true
), which means all the anomaly indicators are equal. The difference in the anomaly scores is acceptable because the average score (mean(s_OCSVM)
) is around 700. You see some differences in the scores when you use the Gaussian kernel, which is the default for one-class learning.
Compare Detected Outliers
Plot the normal points and outliers detected in the isolation forest model and one-class SVM model.
tiledlayout(2,1) nexttile gscatter(loc(:,1),loc(:,2),tf_forest) legend("Normal Points","Outliers") title("Isolation Forest") nexttile gscatter(loc(:,1),loc(:,2),tf_OCSVM) legend("Normal Points","Outliers") title("One-Class SVM")
The outliers identified by the two methods are similar to each other. Compute the fraction of the same identifiers in the outputs for both methods.
mean(tf_forest == tf_OCSVM)
ans = 0.9732
See Also
codegen
(MATLAB Coder) | iforest
| isanomaly
| fitcsvm
| predict