fitrsvm
Fit a support vector machine regression model
Syntax
Description
fitrsvm trains or cross-validates a support vector machine (SVM) regression model on a low- through moderate-dimensional predictor data set. fitrsvm supports mapping the predictor data using kernel functions, and supports SMO, ISDA, or L1 soft-margin minimization via quadratic programming for objective-function minimization.
To train a linear SVM regression model on a high-dimensional data set, that is, a data set that includes many predictor variables, use fitrlinear instead.
To train an SVM model for binary classification, see fitcsvm for low- through moderate-dimensional predictor data sets, or fitclinear for high-dimensional data sets.
Mdl = fitrsvm(Tbl,ResponseVarName) returns a full, trained support vector machine (SVM) regression model Mdl trained using the predictor values in the table Tbl and the response values in Tbl.ResponseVarName.
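For instance, a minimal sketch using the carsmall sample data set that ships with Statistics and Machine Learning Toolbox (the choice of predictors here is illustrative):

    % Predict a car's fuel economy (MPG) from horsepower and weight.
    load carsmall
    Tbl = table(Horsepower,Weight,MPG);
    Mdl = fitrsvm(Tbl,'MPG')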
Mdl = fitrsvm(___,Name,Value) returns an SVM regression model with additional options specified by one or more name-value pair arguments, using any of the previous syntaxes. For example, you can specify the kernel function or train a cross-validated model.
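As a sketch of this syntax (continuing with the carsmall table; all name-value choices here are illustrative):

    % Fit with a Gaussian kernel, an automatically selected kernel scale,
    % and standardized predictors.
    load carsmall
    Tbl = table(Horsepower,Weight,MPG);
    MdlGau = fitrsvm(Tbl,'MPG','KernelFunction','gaussian', ...
        'KernelScale','auto','Standardize',true)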
[Mdl,AggregateOptimizationResults] = fitrsvm(___) also returns AggregateOptimizationResults, which contains hyperparameter optimization results when you specify the OptimizeHyperparameters and HyperparameterOptimizationOptions name-value arguments. You must also specify the ConstraintType and ConstraintBounds options of HyperparameterOptimizationOptions. You can use this syntax to optimize on compact model size instead of cross-validation loss, and to perform a set of multiple optimization problems that have the same options but different constraint bounds.
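A hedged sketch of this calling pattern follows; the 'size' constraint type and the bound values are assumptions chosen for illustration, so check the HyperparameterOptimizationOptions documentation for your release:

    % Run two optimization problems that differ only in their upper bound
    % on compact model size (bound values are illustrative assumptions).
    load carsmall
    Tbl = table(Horsepower,Weight,MPG);
    [Mdl,AggregateOptimizationResults] = fitrsvm(Tbl,'MPG', ...
        'OptimizeHyperparameters','auto', ...
        'HyperparameterOptimizationOptions', ...
        struct('ConstraintType','size','ConstraintBounds',[1 2]));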
Examples
Input Arguments
Name-Value Arguments
Output Arguments
Limitations
fitrsvm supports low- through moderate-dimensional data sets. For high-dimensional data sets, use fitrlinear instead.
Tips
- Unless your data set is large, always try to standardize the predictors (see Standardize). Standardization makes predictors insensitive to the scales on which they are measured.
- It is good practice to cross-validate using the KFold name-value pair argument. The cross-validation results determine how well the SVM model generalizes. (A sketch combining several of these tips appears after this list.)
- Sparsity in support vectors is a desirable property of an SVM model. To decrease the number of support vectors, set the BoxConstraint name-value pair argument to a large value. This action also increases the training time.
- For optimal training time, set CacheSize as high as the memory limit on your computer allows.
- If you expect many fewer support vectors than observations in the training set, then you can significantly speed up convergence by shrinking the active set using the name-value pair argument 'ShrinkagePeriod'. It is good practice to use 'ShrinkagePeriod',1000.
- Duplicate observations that are far from the regression line do not affect convergence. However, just a few duplicate observations that occur near the regression line can slow down convergence considerably. To speed up convergence, specify 'RemoveDuplicates',true if:
  - Your data set contains many duplicate observations.
  - You suspect that a few duplicate observations can fall near the regression line.
  However, to maintain the original data set during training, fitrsvm must temporarily store separate data sets: the original and one without the duplicate observations. Therefore, if you specify true for data sets containing few duplicates, then fitrsvm consumes close to double the memory of the original data.
- After training a model, you can generate C/C++ code that predicts responses for new data. Generating C/C++ code requires MATLAB Coder™. For details, see Introduction to Code Generation.
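As referenced above, a short sketch applying several of these tips (carsmall data; parameter choices are illustrative):

    % Standardize, cross-validate with 5 folds, and shrink the active set
    % every 1000 iterations.
    load carsmall
    Tbl = table(Horsepower,Weight,MPG);
    CVMdl = fitrsvm(Tbl,'MPG','Standardize',true,'KFold',5, ...
        'ShrinkagePeriod',1000);
    cvMSE = kfoldLoss(CVMdl)   % cross-validated mean squared error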
Algorithms
For the mathematical formulation of linear and nonlinear SVM regression problems and the solver algorithms, see Understanding Support Vector Machine Regression.
- NaN, <undefined>, empty character vector (''), empty string (""), and <missing> values indicate missing data values. fitrsvm removes entire rows of data corresponding to a missing response. When normalizing weights, fitrsvm ignores any weight corresponding to an observation with at least one missing predictor. Consequently, observation box constraints might not equal BoxConstraint.
- fitrsvm removes observations that have zero weight.
- If you set 'Standardize',true and 'Weights', then fitrsvm standardizes the predictors using their corresponding weighted means and weighted standard deviations. That is, fitrsvm standardizes predictor j (x_j) using

      x_j* = (x_j − μ_j*) / σ_j*,

  where x_jk is observation k (row) of predictor j (column), μ_j* is the weighted mean of predictor j, and σ_j* is the corresponding weighted standard deviation.
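An illustrative sketch of this weighted standardization for one predictor column; the simple weighted-variance convention below is an assumption, and fitrsvm's internal normalization may differ:

    x = [1; 2; 3; 4];          % hypothetical predictor column
    w = [0.1; 0.2; 0.3; 0.4];  % hypothetical observation weights
    mu_w    = sum(w.*x)/sum(w);                    % weighted mean
    sigma_w = sqrt(sum(w.*(x - mu_w).^2)/sum(w));  % weighted standard deviation
    x_std   = (x - mu_w)/sigma_w                   % standardized column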
If your predictor data contains categorical variables, then the software generally uses full dummy encoding for these variables. The software creates one dummy variable for each level of each categorical variable.
- The PredictorNames property stores one element for each of the original predictor variable names. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then PredictorNames is a 1-by-3 cell array of character vectors containing the original names of the predictor variables.
- The ExpandedPredictorNames property stores one element for each of the predictor variables, including the dummy variables. For example, assume that there are three predictors, one of which is a categorical variable with three levels. Then ExpandedPredictorNames is a 1-by-5 cell array of character vectors containing the names of the predictor variables and the new dummy variables.
- Similarly, the Beta property stores one beta coefficient for each predictor, including the dummy variables.
- The SupportVectors property stores the predictor values for the support vectors, including the dummy variables. For example, assume that there are m support vectors and three predictors, one of which is a categorical variable with three levels. Then SupportVectors is an m-by-5 matrix.
- The X property stores the training data as originally input. It does not include the dummy variables. When the input is a table, X contains only the columns used as predictors.
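To see this expansion, a sketch with a categorical predictor (carsmall's Origin is stored as a character matrix, so it is converted first):

    load carsmall
    Origin = categorical(cellstr(Origin));      % convert to categorical
    Tbl2 = table(Horsepower,Weight,Origin,MPG);
    Mdl2 = fitrsvm(Tbl2,'MPG');
    Mdl2.PredictorNames          % original predictor names
    Mdl2.ExpandedPredictorNames  % adds one dummy name per Origin level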
For predictors specified in a table, if any of the variables contain ordered (ordinal) categories, the software uses ordinal encoding for these variables. For a variable having k ordered levels, the software creates k – 1 dummy variables. The jth dummy variable is -1 for levels up to j, and +1 for levels j + 1 through k. The names of the dummy variables stored in the ExpandedPredictorNames property indicate the first level with the value +1. The software stores k – 1 additional predictor names for the dummy variables, including the names of levels 2, 3, ..., k.
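A hand-built sketch of this encoding rule for k = 3 ordered levels (the matrix D below mirrors the documented rule; it is not produced by a library call):

    levels = {'low','med','high'};
    x = categorical({'low';'high';'med'},levels,'Ordinal',true);
    codes = double(x);                        % numeric level indices: 1, 3, 2
    D = 2*((1:numel(levels)-1) < codes) - 1   % jth dummy: -1 up to level j, +1 after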
All solvers implement L1 soft-margin minimization.
Let p be the proportion of outliers that you expect in the training data. If you set 'OutlierFraction',p, then the software implements robust learning. In other words, the software attempts to remove 100p% of the observations when the optimization algorithm converges. The removed observations correspond to gradients that are large in magnitude.
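For example, a sketch of robust learning that expects roughly 5% outliers (the fraction is illustrative):

    load carsmall
    Tbl = table(Horsepower,Weight,MPG);
    MdlRobust = fitrsvm(Tbl,'MPG','OutlierFraction',0.05);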