Train SVM Classifier with Categorical Predictors and Generate C/C++ Code

This example shows how to generate code for classifying data using a support vector machine (SVM) model. Train the model using numeric and categorical predictors.

Because code generation does not support categorical predictors, use dummyvar to convert categorical predictors to numeric dummy variables before fitting an SVM classifier. When passing new data to your trained model, you must preprocess the data in the same way, using the same categories in the same order.

Preprocess Data and Train SVM Classifier

Load the patients data set. Create a table using the Diastolic and Systolic numeric variables. Each row of the table corresponds to a different patient.

load patients
tbl = table(Diastolic,Systolic);
head(tbl)
ans=8×2 table
    Diastolic    Systolic
    _________    ________

       93          124   
       77          109   
       83          125   
       75          117   
       80          122   
       70          121   
       88          130   
       82          115   

Convert the Gender variable to a categorical variable. The order of the categories in categoricalGender is important because it determines the order of the columns in the predictor data. Use dummyvar to convert the categorical variable to a matrix of zeros and ones, where a 1 value in the (i,j)th entry indicates that the ith patient belongs to the jth category.

categoricalGender = categorical(Gender);
orderGender = categories(categoricalGender)
orderGender = 2x1 cell array
    {'Female'}
    {'Male'  }

dummyGender = dummyvar(categoricalGender);
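As a small illustration of the encoding described above, the following sketch applies dummyvar to three made-up values. The names toyGender, toyOrder, and toyDummy are hypothetical and are not part of the example workspace.

```matlab
% Toy data (hypothetical values, not taken from the patients data set)
toyGender = categorical({'Male';'Female';'Female'});
toyOrder  = categories(toyGender);   % categories are sorted: Female, then Male
toyDummy  = dummyvar(toyGender);     % one column per category, in toyOrder

% Row 1 is Male, so its 1 falls in column 2.
% Rows 2 and 3 are Female, so their 1 falls in column 1.
disp(toyDummy)
```

Each row contains exactly one 1, and the column position of that 1 follows the category order, not the order in which the values appear in the data.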

Note: Each row of dummyGender sums to 1, so its columns are linearly dependent on a constant (intercept) term, and a design matrix that includes both is rank deficient. Depending on the type of model you train, this rank deficiency can be problematic. For example, when training linear models, remove the first column of the dummy variables.
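To see the redundancy that the note refers to, the following sketch appends a column of ones (standing in for an intercept term) to a small dummy matrix and compares the rank with the number of columns. The toy values and the names toyD, withIntercept, and reduced are assumptions made for illustration.

```matlab
% Toy dummy matrix for four hypothetical observations
toyD = dummyvar(categorical({'Female';'Male';'Female';'Male'}));

% Add a column of ones, as a linear model with an intercept would
withIntercept = [ones(4,1) toyD];
rankFull = rank(withIntercept);      % less than the 3 columns present

% Drop the first dummy column to remove the redundancy
reduced = [ones(4,1) toyD(:,2:end)];
rankReduced = rank(reduced);         % equals the 2 columns present

fprintf('Columns: %d, rank: %d\n',size(withIntercept,2),rankFull)
fprintf('Columns: %d, rank: %d\n',size(reduced,2),rankReduced)
```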

Create a table that contains the dummy variable dummyGender, using the category names in orderGender as the variable names. Combine this new table with tbl.

tblGender = array2table(dummyGender,'VariableNames',orderGender);
tbl = [tbl tblGender];
head(tbl)
ans=8×4 table
    Diastolic    Systolic    Female    Male
    _________    ________    ______    ____

       93          124         0        1  
       77          109         0        1  
       83          125         1        0  
       75          117         1        0  
       80          122         1        0  
       70          121         1        0  
       88          130         1        0  
       82          115         0        1  
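The table-building step above can be sketched in isolation as follows. The three rows and the names toyBase, toyNames, toyDummyTbl, and toyCombined are hypothetical; the point is only that array2table labels each dummy column with its category name and that horizontal concatenation appends those columns to the existing table.

```matlab
% Small numeric table standing in for tbl (values chosen for illustration)
toyBase = table([93;77;83],[124;109;125], ...
    'VariableNames',{'Diastolic','Systolic'});

% Dummy matrix and matching category names, in the same column order
toyDummy = [0 1; 0 1; 1 0];
toyNames = {'Female';'Male'};

% Label the dummy columns, then append them to the numeric table
toyDummyTbl = array2table(toyDummy,'VariableNames',toyNames);
toyCombined = [toyBase toyDummyTbl];

disp(toyCombined.Properties.VariableNames)
```

Because the variable names come from the category list, keeping that list in the same order as the dummy columns is what keeps the labels correct.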

Convert the SelfAssessedHealthStatus variable to a categorical variable. Note the order of the categories in categoricalHealth, and convert the variable to a numeric matrix using dummyvar.

categoricalHealth = categorical(SelfAssessedHealthStatus);
orderHealth = categories(categoricalHealth)
orderHealth = 4x1 cell array
    {'Excellent'}
    {'Fair'     }
    {'Good'     }
    {'Poor'     }

dummyHealth = dummyvar(categoricalHealth);

Create a table that contains dummyHealth, using the category names in orderHealth as the variable names. Combine this new table with tbl.

tblHealth = array2table(dummyHealth,'VariableNames',orderHealth);
tbl = [tbl tblHealth];
head(tbl)
ans=8×8 table
    Diastolic    Systolic    Female    Male    Excellent    Fair    Good    Poor
    _________    ________    ______    ____    _________    ____    ____    ____

       93          124         0        1          1         0       0       0  
       77          109         0        1          0         1       0       0  
       83          125         1        0          0         0       1       0  
       75          117         1        0          0         1       0       0  
       80          122         1        0          0         0       1       0  
       70          121         1        0          0         0       1       0  
       88          130         1        0          0         0       1       0  
       82          115         0        1          0         0       1       0  

The third row of tbl, for example, corresponds to a patient with these characteristics: diastolic blood pressure of 83, systolic blood pressure of 125, female, and good self-assessed health status.
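The reading of that row can be reproduced programmatically. This is a minimal sketch that assumes the column layout shown above (two numeric columns, two gender columns, four health columns); the variable names rowValues, rowGender, and rowHealth are hypothetical.

```matlab
% Third row of the table, typed in by hand from the output above
rowValues = [83 125 1 0 0 0 1 0];

% Category names in the same order as the dummy columns
genderNames = {'Female';'Male'};
healthNames = {'Excellent';'Fair';'Good';'Poor'};

% The position of the 1 within each group of dummy columns selects the category
rowGender = genderNames{logical(rowValues(3:4))};
rowHealth = healthNames{logical(rowValues(5:8))};

fprintf('Diastolic %d, systolic %d, %s, %s\n', ...
    rowValues(1),rowValues(2),rowGender,rowHealth)
```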

Because all the values in tbl are numeric, you can convert the table to a matrix X.

X = table2array(tbl);
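If you are unsure whether every table variable is numeric, one possible check before converting is sketched below on a toy table. The names toyTbl, allNumeric, and toyX are assumptions made for illustration.

```matlab
% Toy table with two numeric variables (hypothetical values)
toyTbl = table([93;77],[1;0],'VariableNames',{'Diastolic','Female'});

% Test each variable; 'uniform' output returns one logical value per variable
allNumeric = all(varfun(@isnumeric,toyTbl,'OutputFormat','uniform'));

% Convert only after confirming that the table is entirely numeric
if allNumeric
    toyX = table2array(toyTbl);
    disp(size(toyX))
end
```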

Train an SVM classifier using X. Specify the Smoker variable as the response.

Y = Smoker;
Mdl = fitcsvm(X,Y);
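After training, it can help to confirm what the model stores before generating code from it. The sketch below does this on synthetic data rather than on the patients data; the data, the shift of 3 between clusters, and the names toyX, toyY, and toyMdl are all assumptions made for illustration.

```matlab
rng(0)                                     % make the synthetic data reproducible
toyX = [randn(20,2)+3; randn(20,2)-3];     % two clusters of 20 points each
toyY = [true(20,1); false(20,1)];          % logical response

toyMdl = fitcsvm(toyX,toyY);

toyClasses  = toyMdl.ClassNames;           % class labels as stored by the model
toyNumPreds = numel(toyMdl.PredictorNames);% number of predictor columns expected
toyTrainErr = resubLoss(toyMdl);           % misclassification rate on training data

fprintf('Predictors: %d, training error: %.3f\n',toyNumPreds,toyTrainErr)
```

The number of predictors reported here is the number of columns that any new data passed to predict must have.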

Generate C/C++ Code

Generate code that loads the SVM classifier, takes new predictor data as an input argument, and then classifies the new data.

Save the SVM classifier to a file using saveCompactModel.

saveCompactModel(Mdl,'SVMClassifier')

saveCompactModel saves the classifier as a structure array in the MATLAB® binary file SVMClassifier.mat in the current folder.

Define the entry-point function mySVMPredict, which takes new predictor data as an input argument. Within the function, load the SVM classifier by using loadCompactModel, and then pass the loaded classifier to predict.

type mySVMPredict.m % Display contents of mySVMPredict.m file
function label = mySVMPredict(X) %#codegen
Mdl = loadCompactModel('SVMClassifier');
label = predict(Mdl,X);
end

Note: If you open this example in MATLAB, the example folder includes the entry-point function file mySVMPredict.m. Otherwise, create mySVMPredict.m on the MATLAB path with the contents shown above.

Generate code for mySVMPredict by using codegen. Specify the data type and dimensions of the new predictor data by using coder.typeof so that the generated code accepts a variable-size array.

codegen mySVMPredict -args {coder.typeof(X,[Inf 8],[1 0])}
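The three arguments to coder.typeof in the command above are an example value (which supplies the data type), the upper bound for each dimension, and a flag per dimension indicating whether its size can vary. The following sketch, which assumes MATLAB Coder is installed, builds an equivalent type from a scalar double and inspects it; the name toyType is hypothetical.

```matlab
% Same size specification as in the codegen command, using a double scalar
% as the example value instead of the training matrix
toyType = coder.typeof(0,[Inf 8],[1 0]);

disp(toyType.ClassName)      % data type of the input
disp(toyType.SizeVector)     % upper bound for each dimension
disp(toyType.VariableDims)   % which dimensions are variable size
```

With this specification, the generated code accepts any number of rows but requires exactly 8 columns, matching the 8 predictor columns used for training.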

Verify that predict, mySVMPredict, and the MEX file return the same labels for the training data.

label = predict(Mdl,X);
mylabel = mySVMPredict(X);
mylabel_mex = mySVMPredict_mex(X);
verifyMEX = isequal(label,mylabel,mylabel_mex)
verifyMEX = logical
   1

Predict Labels for New Data

To predict labels for new data, you must first preprocess the new data. If you run the generated code in the MATLAB environment, you can follow the preprocessing steps described in this section. If you deploy the generated code outside the MATLAB environment, the preprocessing steps can differ. In either case, you must ensure that the new data has the same columns, in the same order, as the training data X.

In this example, take the third, fourth, and fifth patients in the patients data set. Preprocess the data for these patients so that the resulting numeric matrix matches the form of the training data.

Convert the categorical variables to dummy variables. Because the new observations might not include values from all categories, you need to specify the same categories as the ones used during training and maintain the same category order. In MATLAB, pass the ordered cell array of category names associated with the corresponding training data variable (in this example, orderGender for gender values and orderHealth for self-assessed health status values).

newcategoricalGender = categorical(Gender(3:5),orderGender);
newdummyGender = dummyvar(newcategoricalGender);

newcategoricalHealth = categorical(SelfAssessedHealthStatus(3:5),orderHealth);
newdummyHealth = dummyvar(newcategoricalHealth);
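The effect of passing the training category list can be seen by comparing the two conversions side by side. This sketch uses three made-up health values that cover only two of the four categories; the names toyVals, toyOrder, dummyInferred, and dummyFixed are hypothetical.

```matlab
% New observations that happen to include only two of the four categories
toyVals  = {'Good';'Fair';'Good'};
toyOrder = {'Excellent';'Fair';'Good';'Poor'};   % order used during training

% Without the category list, categories are inferred from the data alone
dummyInferred = dummyvar(categorical(toyVals));

% With the category list, every training category gets a column
dummyFixed = dummyvar(categorical(toyVals,toyOrder));

fprintf('Inferred categories: %d columns\n',size(dummyInferred,2))
fprintf('Training categories: %d columns\n',size(dummyFixed,2))
```

Only the second form produces a matrix whose columns line up with the training data, which is why the steps above pass orderGender and orderHealth explicitly.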

Combine all the new data into a numeric matrix.

newX = [Diastolic(3:5) Systolic(3:5) newdummyGender newdummyHealth]
newX = 3×8

    83   125     1     0     0     0     1     0
    75   117     1     0     0     1     0     0
    80   122     1     0     0     0     1     0

Note that newX corresponds exactly to the third, fourth, and fifth rows of the matrix X.
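If you preprocess new data repeatedly, the steps in this section could be wrapped in small helper functions. The following is one possible sketch, not part of the original example; encodeCats and buildPredictors are hypothetical names, and the input values are typed in by hand to match the three patients used above.

```matlab
% Hypothetical helpers: encode one categorical variable with a fixed category
% order, then assemble the predictor matrix in the training column order
encodeCats = @(vals,order) dummyvar(categorical(vals,order));
buildPredictors = @(dia,sys,gen,hea,orderG,orderH) ...
    [dia sys encodeCats(gen,orderG) encodeCats(hea,orderH)];

% Category orders recorded during training
toyOrderGender = {'Female';'Male'};
toyOrderHealth = {'Excellent';'Fair';'Good';'Poor'};

% Values for three patients, entered manually for illustration
toyNewX = buildPredictors([83;75;80],[125;117;122], ...
    {'Female';'Female';'Female'},{'Good';'Fair';'Good'}, ...
    toyOrderGender,toyOrderHealth);

% Guard against a column mismatch before calling the generated code
assert(size(toyNewX,2) == 8,'Expected 8 predictor columns.')
disp(toyNewX)
```

Keeping the category orders as explicit inputs makes the dependence on the training-time order visible at every call site.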

Verify that predict, mySVMPredict, and the MEX file return the same labels for the new data.

newlabel = predict(Mdl,newX);
newmylabel = mySVMPredict(newX);
newmylabel_mex = mySVMPredict_mex(newX);
newverifyMEX = isequal(newlabel,newmylabel,newmylabel_mex)
newverifyMEX = logical
   1
