# sequentialfs

Sequential feature selection using custom criterion

## Description


tf = sequentialfs(fun,X,y) selects a subset of features in X that are important for predicting y. The function defines a random nonstratified partition for 10-fold cross-validation using X and y, and then sequentially selects features based on the cross-validated criterion values computed by the fun function. The initial feature set includes no features. At each iteration, sequentialfs adds one feature to the set, until adding a feature does not decrease the criterion value by more than the termination tolerance. The output tf is a logical vector that indicates the selected features. For more details, see Algorithms.


tf = sequentialfs(fun,X1,...,XN) selects a subset of features in X1 by cross-validating the criterion value on the partition defined for X1,...,XN.


tf = sequentialfs(___,Name,Value) specifies options using one or more name-value arguments in addition to any of the input argument combinations in the previous syntaxes. For example, specify "Direction","backward" to perform backward sequential feature selection, also called recursive feature elimination (RFE). In this case, the initial feature set includes all features, and sequentialfs removes one feature from the set at each iteration, until removing a feature no longer decreases the criterion value.


[tf,history] = sequentialfs(___) also returns information about the feature selection process.
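The following is a minimal sketch of the basic syntax. It uses the hald data set from the examples below. The criterion is our own choice for illustration: the test-set sum of squared errors of an ordinary least-squares fit with an intercept, computed with the backslash operator. The 5-fold partition is also an assumption, not the default. The code then uses tf to subset the data and reads the final criterion value from history.

```matlab
load hald                      % ingredients (13x4) and heat (13x1)

% Hypothetical criterion: test-set sum of squared errors of a least-squares
% fit with an intercept. sequentialfs divides the sum over folds by the total
% number of test observations, which gives a cross-validated MSE.
sseFun = @(XTrain,yTrain,XTest,yTest) ...
    sum((yTest - [ones(size(XTest,1),1) XTest] * ...
        ([ones(size(XTrain,1),1) XTrain] \ yTrain)).^2);

rng("default")                 % For reproducibility of the random partition
[tf,history] = sequentialfs(sseFun,ingredients,heat,"CV",5);

selectedColumns = find(tf)     % Column indices of the selected features
XSelected = ingredients(:,tf); % Keep only the selected features
finalCriterion = history.Crit(end)  % Criterion value at the last iteration
```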

## Examples


Find important features by performing forward sequential feature selection using the wrapper type.

Load the fisheriris data set, and display the variables in the data set.

```matlab
load fisheriris
whos
```

```
  Name           Size            Bytes  Class     Attributes

  meas         150x4              4800  double
  species      150x1             18100  cell
```

The matrix meas contains four measurements for each of 150 iris flowers from three species. The variable species lists the species of each flower.

Specify the predictor data X and the response data y. Define X to include the four measurements and six random variables. Place the measurement variables in columns 1, 3, 5, and 7.

```matlab
rng("default") % For reproducibility
X = randn(150,10);
X(:,[1 3 5 7]) = meas;
y = species;
```

Define the function handle myfun for an anonymous function that takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The anonymous function trains a classification model by using the training data, and returns a loss value on the test data for the trained model.

```matlab
myfun = @(XTrain,yTrain,XTest,yTest) ...
    size(XTest,1)*loss(fitcecoc(XTrain,yTrain),XTest,yTest);
```

The loss function of a classification model object returns an average loss value, but sequentialfs also divides the sum of the criterion values returned by myfun by the total number of test observations. Therefore, the anonymous function must return the loss value multiplied by the number of test observations.

Create a random partition for stratified 10-fold cross-validation.

```matlab
cv = cvpartition(y,"KFold",10);
```

Use the sequentialfs function to sequentially select important features in X based on the criterion value returned by myfun. Specify the stratified partition cv, and set the Display option to "iter" to display information about the feature selection process at each iteration.

```matlab
opts = statset("Display","iter");
tf = sequentialfs(myfun,X,y,"CV",cv,"Options",opts);
```

```
Start forward sequential feature selection:
Initial columns included:  none
Columns that can not be included:  none
Step 1, added column 7, criterion value 0.04
Step 2, added column 5, criterion value 0.0333333
Step 3, added column 1, criterion value 0.0266667
Step 4, added column 3, criterion value 0.0133333
Final columns included:  1 3 5 7
```

sequentialfs correctly finds the important predictors in columns 1, 3, 5, and 7.

Find important features by performing backward sequential feature selection, or recursive feature elimination (RFE), using the wrapper type.

Load the hald data set, which measures the effect of cement composition on its hardening heat.

This data set includes the variables ingredients and heat. The matrix ingredients contains the percent composition of four chemicals present in the cement. The vector heat contains the values for the heat hardening after 180 days for each cement sample.

Use the sequentialfs function to perform backward sequential feature selection based on the criterion value returned by myfun. The code for the helper function myfun appears at the end of this example. Specify the Direction name-value argument as "backward" to include all features in the initial feature set and then sequentially exclude one feature at each iteration. Set the Display option to "iter" to display information about the feature selection process at each iteration.

```matlab
load hald % Loads ingredients (13x4) and heat (13x1)
rng("default") % For reproducibility
opts = statset("Display","iter");
tf = sequentialfs(@myfun,ingredients,heat, ...
    "Direction","backward","Options",opts);
```

```
Start backward sequential feature selection:
Initial columns included:  all
Columns that must be included:  none
Step 1, used initial columns, criterion value 12.4989
Step 2, removed column 3, criterion value 6.25866
Final columns included:  1 2 4
```

sequentialfs excludes the third variable from the features in ingredients.

Helper Function

The myfun function takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The function trains a regression model by using the training data, and returns the sum of squared errors on the test data for the trained model.

```matlab
function criterion = myfun(XTrain,yTrain,XTest,yTest)
    mdl = fitrlinear(XTrain,yTrain);
    predictedYTest = predict(mdl,XTest);
    e = yTest - predictedYTest;
    criterion = e'*e;  % Sum of squared errors on the test data
end
```

Perform filter type feature selection based on the correlation coefficients for the features.

Load the carbig data set, and create the feature matrix X containing six of its variables.

```matlab
load carbig
X = [Acceleration Cylinders Displacement ...
    Horsepower Model_Year Weight];
```

Compute the matrix of the pairwise linear correlation coefficients between each pair of features in X by using the corr function. Specify the Rows name-value argument as "pairwise" to omit any rows containing NaN on a pairwise basis for each two-column correlation coefficient calculation.

```matlab
corr(X,"Rows","pairwise")
```

```
ans = 6×6

    1.0000   -0.6473   -0.6947   -0.6968    0.4843   -0.4879
   -0.6473    1.0000    0.9512    0.8622   -0.6053    0.8844
   -0.6947    0.9512    1.0000    0.9134   -0.5779    0.8895
   -0.6968    0.8622    0.9134    1.0000   -0.6082    0.8733
    0.4843   -0.6053   -0.5779   -0.6082    1.0000   -0.4964
   -0.4879    0.8844    0.8895    0.8733   -0.4964    1.0000
```

X contains highly correlated features. For example, the correlation between the second and third features (Cylinders and Displacement) is 0.9512.

Use the sequentialfs function to rank the features in X based on the correlation values. Specify these options when you call the sequentialfs function:

• Use the helper function mycorr, which returns the maximum absolute value of the off-diagonal elements in the matrix of correlation coefficients. The code for this helper function appears at the end of this example.

• Specify "Direction","backward" and "NullModel",true so that sequentialfs starts from the initial feature set containing all features and then excludes all features from the set, one feature at a time.

• Specify "CV","none" to perform feature selection without cross-validation.

• Set the Display option to "iter" to display information about the feature selection process at each iteration.

```matlab
opts = statset("Display","iter");
[~,history] = sequentialfs(@mycorr,X, ...
    "Direction","backward","NullModel",true, ...
    "CV","none","Options",opts);
```

```
Start backward sequential feature selection:
Initial columns included:  all
Columns that must be included:  none
Step 1, used initial columns, criterion value 0.951167
Step 2, removed column 3, criterion value 0.884401
Step 3, removed column 6, criterion value 0.862164
Step 4, removed column 4, criterion value 0.647346
Step 5, removed column 2, criterion value 0.484253
Step 6, removed column 1, criterion value 0
Step 7, removed column 5, criterion value 0
Final columns included:  none
```

sequentialfs returns the structure array history with two fields (In and Crit) containing information about the feature selection process. The In field contains a logical matrix where row i indicates the features selected at iteration i. A true (logical 1) entry in a row indicates that the corresponding feature is in the feature set after the iteration.

```matlab
history.In
```

```
ans = 7x6 logical array

   1   1   1   1   1   1
   1   1   0   1   1   1
   1   1   0   1   1   0
   1   1   0   0   1   0
   1   0   0   0   1   0
   0   0   0   0   1   0
   0   0   0   0   0   0
```

The Crit field contains the criterion values computed at each iteration.

```matlab
history.Crit
```

```
ans = 1×7

    0.9512    0.8844    0.8622    0.6473    0.4843         0         0
```

The last two criterion values are zero because the mycorr function returns 0 if the input contains fewer than two features.

Extract the indices of the excluded features from the matrix in the In field.

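```matlab
p = size(X,2);
idx = NaN(1,p);
for i = 1:p
    % The feature removed at step i+1 is the column that differs between rows i and i+1
    idx(i) = find(history.In(i,:)~=history.In(i+1,:));
end
idx
```

```
idx = 1×6

     3     6     4     2     1     5
```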
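Each backward step removes exactly one feature, so consecutive rows of history.In differ in exactly one column. The loop can therefore be replaced by a vectorized expression. The following sketch hard-codes the history.In matrix displayed above so that it runs on its own. With live results, use history.In in place of the variable In.

```matlab
% history.In as displayed above (7 iterations, 6 features)
In = logical([1 1 1 1 1 1
              1 1 0 1 1 1
              1 1 0 1 1 0
              1 1 0 0 1 0
              1 0 0 0 1 0
              0 0 0 0 1 0
              0 0 0 0 0 0]);

% abs(diff(...)) has a single 1 per row, in the column that changed between
% consecutive iterations. max returns the position of that 1 in each row.
[~,idxVec] = max(abs(diff(double(In))),[],2);
idxVec = idxVec'   % 3 6 4 2 1 5
```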

Find the features that remain in the set at the first iteration whose criterion value is less than 0.8.

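```matlab
threshold = 0.8;
% history.Crit(k+1) is the criterion value after removing feature idx(k)
iter_last_exclude = find(history.Crit(2:end)<threshold,1);
% Features that have not been removed yet at that iteration
idx_selected = idx(iter_last_exclude+1:end)
```

```
idx_selected = 1×3

     2     1     5
```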

Compute the correlation coefficient matrix for the selected features.

```matlab
corr(X(:,idx_selected),"Rows","pairwise")
```

```
ans = 3×3

    1.0000   -0.6473   -0.6053
   -0.6473    1.0000    0.4843
   -0.6053    0.4843    1.0000
```

The absolute values of the off-diagonal elements are less than the threshold value 0.8.

Helper Function

The mycorr function takes a matrix that contains features in columns, and returns the maximum absolute value of the off-diagonal elements in the matrix of correlation coefficients. The off-diagonal elements are the correlations between two distinct features in the input data. Therefore, mycorr returns zero if the input data does not have at least two distinct features.

```matlab
function criterion = mycorr(X)
    if size(X,2) < 2
        criterion = 0;
    else
        p = size(X,2);
        R = corr(X,"Rows","pairwise");
        R(logical(eye(p))) = NaN;          % Ignore the diagonal (self-correlation)
        criterion = max(abs(R),[],"all");  % max ignores the NaN entries
    end
end
```

Convert a table that contains both numeric and categorical variables to an array by using the onehotencode and table2array functions. Then, select important features in the array by using the sequentialfs function.

Load the carbig data set. This data set contains variables that describe several aspects of cars, such as miles per gallon (MPG), country of origin (Origin), and number of cylinders (Cylinders). You can create a regression model of MPG using the other variables.

Specify the predictor data tblX in a table, and specify the response data y.

```matlab
load carbig
tblX = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin);
y = MPG;
```

All variables in tblX are numeric except the Origin variable.

One-hot encode the Origin variable by using the onehotencode function.

```matlab
tblOrigin = table(categorical(string(Origin)));
tblOrigin = onehotencode(tblOrigin);
```

Remove the Origin variable from tblX, and add the encoded values to tblX.

```matlab
tblX.Origin = [];
tblX = [tblX tblOrigin];
```

Convert the table tblX to an array.

```matlab
X = table2array(tblX);
```

Define the function handle myfun for an anonymous function that takes four inputs: training data (XTrain and yTrain) and test data (XTest and yTest). The anonymous function trains a regression model by using the training data, and returns a loss value on the test data for the trained model.

```matlab
myfun = @(XTrain,yTrain,XTest,yTest) ...
    size(XTest,1)*loss(fitrtree(XTrain,yTrain),XTest,yTest);
```

The loss function of a regression model object returns the mean squared error (MSE), but sequentialfs also divides the sum of the criterion values returned by myfun by the total number of test observations. Therefore, the anonymous function must return the loss value multiplied by the number of test observations.

Use the sequentialfs function to sequentially select important features in X based on the criterion value returned by myfun.

```matlab
rng("default") % For reproducibility
tf = sequentialfs(myfun,X,y);
```

Display the variable names of the selected features.

```matlab
tblX.Properties.VariableNames(tf)'
```

```
ans = 6x1 cell
    {'Cylinders'   }
    {'Displacement'}
    {'Model_Year'  }
    {'Weight'      }
    {'Germany'     }
    {'Italy'       }
```

## Input Arguments


### fun

Function to compute the feature selection criterion, specified as a function handle.

For each candidate feature set, sequentialfs computes the cross-validated criterion value by repeatedly calling the fun function as follows:

1. For each fold (a group of training and test data sets) defined by the CV name-value argument, sequentialfs calls the fun function to get the criterion value for the fold.

2. sequentialfs divides the sum of the criterion values by the total number of test observations.

If you specify X and y, then the fun function must have this form:

```matlab
criterion = fun(XTrain,yTrain,XTest,yTest)
```

• The fun function accepts the training data (XTrain and yTrain) and test data (XTest and yTest).

• XTrain and XTest contain a subset of the columns of X that corresponds to the current candidate feature set.

• The fun function returns a scalar value criterion.

• Typically, fun trains a model by using the training data (XTrain, yTrain), predicts response values for XTest, and returns a loss of the predicted values compared to yTest. Common loss measures include the sum of squared errors for regression models and the number of misclassified observations for classification models.

For example, you can define the myFun function as follows, and then specify fun as @myFun.

```matlab
function criterion = myFun(XTrain,yTrain,XTest,yTest)
    mdl = fitcsvm(XTrain,yTrain);
    predictedYTest = predict(mdl,XTest);
    criterion = sum(~strcmp(yTest,predictedYTest));  % Number of misclassified test observations
end
```

Alternatively, you can define the function handle myFunHandle for an anonymous function as follows, and then specify fun as myFunHandle.

```matlab
myFunHandle = @(XTrain,yTrain,XTest,yTest) ...
    loss(fitcsvm(XTrain,yTrain),XTest,yTest)*size(XTest,1);
```

sequentialfs divides the sum of the criterion values returned by fun by the total number of test observations. So, fun must not divide the loss value by the number of test observations. The loss function of a classification or regression object returns an averaged loss value. Therefore, fun must return the loss value multiplied by the number of test observations. If you define the fun function to return the sum of squared errors or the number of misclassified observations, then the cross-validated criterion value is the mean squared error or the misclassification rate, respectively.
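Here is a small worked example of this scaling. The fold sizes and error counts are made up and do not come from any data set. The example compares a fun that returns the error count (loss times test size) with one that returns the averaged loss directly.

```matlab
% Hypothetical 3-fold example: test observations and misclassifications per fold
nTest  = [50 50 50];
nWrong = [2 1 3];

meanLoss = nWrong ./ nTest;   % averaged loss per fold, as loss(mdl,XTest,yTest) returns

% Correct: fun returns loss*size(XTest,1), that is, the error count per fold
critCorrect = sum(meanLoss .* nTest) / sum(nTest)   % 0.04, the overall error rate

% Incorrect: fun returns the averaged loss, so the result is averaged twice
critWrong = sum(meanLoss) / sum(nTest)              % 0.0008, too small by a factor of 50
```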

If you specify X1,...,XN, sequentialfs selects features from X1 only, but otherwise imposes no interpretation on X1,...,XN. The function fun still must have this form:

```matlab
criterion = fun(X1Train,⋯,XNTrain,X1Test,⋯,XNTest)
```

• The fun function accepts the training data (X1Train,…,XNTrain) and test data (X1Test,…,XNTest).

• X1Train and X1Test contain a subset of the columns of X1 that corresponds to the current candidate feature set.

• The fun function returns a scalar value criterion.

Data Types: function_handle
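The X1,...,XN form is useful when fun needs extra per-observation data that must be split into the same folds as X. The following sketch passes observation weights as a third input. The synthetic data, the weights w, and the weighted least-squares criterion based on fitlm are assumptions for illustration. sequentialfs selects columns of X only, and it partitions y and w alongside X.

```matlab
% Synthetic data: y depends on columns 1 and 3 of X only
rng("default") % For reproducibility
n = 200;
X = randn(n,5);
y = 2*X(:,1) - 3*X(:,3) + 0.5*randn(n,1);
w = rand(n,1) + 0.5;               % hypothetical observation weights

% With three data inputs (X1 = X, X2 = y, X3 = w), fun receives the three
% training subsets followed by the three test subsets.
wsseFun = @(XTrain,yTrain,wTrain,XTest,yTest,wTest) ...
    sum(wTest .* (yTest - predict(fitlm(XTrain,yTrain,"Weights",wTrain),XTest)).^2);

tf = sequentialfs(wsseFun,X,y,w)
```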

### X

Feature data, specified as a numeric matrix. The rows of X correspond to observations, and the columns of X correspond to features. X and y must have the same number of rows.

The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting X. For details, see the fun argument and CV name-value argument.

Data Types: single | double

### y

Responses (labels), specified as a column vector. X and y must have the same number of rows.

The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting y. For details, see the fun argument and CV name-value argument.

Data Types: single | double | logical | char | string | cell | categorical

### X1,...,XN

Input data, specified as matrices. The matrices must have the same number of rows.

sequentialfs selects features from X1 only, but otherwise imposes no interpretation on X1,...,XN.

The custom function defined by the fun argument must accept a group of training and test data sets defined by splitting X1,...,XN. For details, see the fun argument and CV name-value argument.

Data Types: single | double | logical | char | string | cell | categorical

### Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: KeepIn=[1 0 0 0],KeepOut=[0 0 0 1] always includes the first feature and excludes the last feature.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: "KeepIn",[1 0 0 0],"KeepOut",[0 0 0 1]

#### CV

Cross-validation option to compute the criterion for each candidate feature subset, specified as a positive integer, cvpartition object, "resubstitution", or "none".

For each candidate feature subset, sequentialfs uses the partition specified by this argument to cross-validate the criterion value returned by the fun function.

• Positive integer k — sequentialfs uses a random nonstratified partition for k-fold cross-validation.

• cvpartition object — sequentialfs uses a partition specified in the cvpartition object. You can specify a stratified partition, a partition for holdout validation, or a partition for leave-one-out cross-validation. For details, see cvpartition.

• "resubstitution" — sequentialfs does not partition the input data. Both the training set and the test set contain all of the original observations. For example, if you specify X and y, then sequentialfs calls fun as criterion = fun(X,y,X,y).

• "none" — sequentialfs does not validate the criterion value and calls fun as criterion = fun(X,y), without separating the training and test sets.

Example: "CV","none"

#### MCReps

Number of Monte Carlo repetitions for cross-validation, specified as a positive integer.

If you specify a positive integer greater than 1, sequentialfs repeats the cross-validation computation for the specified number of repetitions for each candidate feature subset.

If CV is "none", "resubstitution", a cvpartition object of type "resubstitution", a cvpartition object of type "leaveout", or a custom cvpartition object (with the IsCustom property set to 1), then the software sets the MCReps value to 1.

Example: "MCReps",10

Data Types: single | double
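The following sketch shows the CV forms described above, together with MCReps. It uses the fisheriris data set from the first example. The criterion, a misclassification count from a linear discriminant model (fitcdiscr), is our choice for illustration. For the holdout partition, the sketch assumes that each Monte Carlo repetition draws a new random split. That assumption is consistent with MCReps being reset to 1 for the deterministic partitions listed above.

```matlab
load fisheriris                     % meas (150x4) and species (150x1 cell)

% Criterion for a training/test split: number of misclassified test observations
errFun = @(XTrain,yTrain,XTest,yTest) ...
    sum(~strcmp(yTest,predict(fitcdiscr(XTrain,yTrain),XTest)));

rng("default") % For reproducibility
tfKFold = sequentialfs(errFun,meas,species,"CV",5);    % 5-fold, nonstratified
tfStrat = sequentialfs(errFun,meas,species, ...
    "CV",cvpartition(species,"KFold",5));              % 5-fold, stratified
tfResub = sequentialfs(errFun,meas,species, ...
    "CV","resubstitution");                            % calls errFun(X,y,X,y)

% Holdout validation, repeated 10 times for each candidate feature set
tfHoldout = sequentialfs(errFun,meas,species, ...
    "CV",cvpartition(species,"HoldOut",0.3),"MCReps",10);

% With "CV","none", fun receives only the data, with no training/test split
errFunNone = @(X,y) sum(~strcmp(y,resubPredict(fitcdiscr(X,y))));
tfNone = sequentialfs(errFunNone,meas,species,"CV","none");
```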

#### Direction

Direction of the sequential search, specified as "forward" or "backward".

• "forward" — The initial feature set includes no features, and the sequentialfs function sequentially adds features to the set.

• "backward" — The initial feature set includes all features, and the sequentialfs function sequentially removes features from the set. That is, the sequentialfs function performs recursive feature elimination (RFE).

Example: "Direction","backward"

Data Types: char | string

#### KeepIn

Features to include, specified as [], a logical vector, or a vector of positive integers.

By default, sequentialfs examines all features for the feature selection process. If you specify features to include using this argument, sequentialfs always includes the features in the candidate feature sets. A true entry in a logical vector or an index value in a vector of positive integers indicates that the output argument tf must include the corresponding feature.

Example: "KeepIn",[1 0 0 0]

Data Types: logical

#### KeepOut

Features to exclude, specified as [], a logical vector, or a vector of positive integers.

By default, sequentialfs examines all features for the feature selection process. If you specify features to exclude using this argument, sequentialfs excludes the features from the candidate feature sets. A true entry in a logical vector or an index value in a vector of positive integers indicates that the output argument tf must exclude the corresponding feature.

Example: "KeepOut",[0 0 0 1]

Data Types: logical

#### NFeatures

Number of features to select, specified as [] or a positive integer.

By default, sequentialfs stops iterations when the function satisfies one of the stopping criteria (MaxIter or TolFun) specified by the Options name-value argument. If you specify the NFeatures name-value argument as a positive integer, sequentialfs stops iterations after selecting the specified number of features. This argument overrides other iteration options.

Example: "NFeatures",2

Data Types: single | double
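The following sketch combines KeepIn, KeepOut, and NFeatures, using the same illustrative fisheriris criterion as the CV example. It specifies the same constraints once as indices and once as logical vectors. Both forms are described above. Resetting the random number generator before each call gives both calls the same random partition.

```matlab
load fisheriris
errFun = @(XTrain,yTrain,XTest,yTest) ...
    sum(~strcmp(yTest,predict(fitcdiscr(XTrain,yTrain),XTest)));

% Always keep feature 1, never consider feature 4, and request two features.
% Here, KeepIn and KeepOut are specified as feature indices.
rng("default")
tfIdx = sequentialfs(errFun,meas,species, ...
    "KeepIn",1,"KeepOut",4,"NFeatures",2);

% The same constraints specified as logical vectors
rng("default")
tfLog = sequentialfs(errFun,meas,species, ...
    "KeepIn",logical([1 0 0 0]),"KeepOut",logical([0 0 0 1]),"NFeatures",2);
```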

#### NullModel

Flag to include the null model (model containing no features), specified as a logical 1 (true) or 0 (false).

If you specify true, the sequentialfs function includes the null model as a valid option for the output tf and computes the criterion value for the empty input data. Therefore, the fun function must be able to accept empty matrices as input argument values.

Example: "NullModel",true

Data Types: logical
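When NullModel is true, fun must handle training and test matrices with no columns. The following sketch uses a hypothetical helper, sseWithNull. When there are no features, the helper predicts the training mean. Otherwise, it fits least squares with an intercept. The hald data set and the 5-fold partition are illustrative choices.

```matlab
load hald                          % ingredients (13x4) and heat (13x1)
rng("default") % For reproducibility
tf = sequentialfs(@sseWithNull,ingredients,heat,"CV",5,"NullModel",true)

function criterion = sseWithNull(XTrain,yTrain,XTest,yTest)
% Test-set sum of squared errors. With NullModel set to true, XTrain and
% XTest can have no columns. In that case, predict the training mean.
if isempty(XTrain)
    predictedYTest = repmat(mean(yTrain),numel(yTest),1);
else
    b = [ones(size(XTrain,1),1) XTrain] \ yTrain;      % least squares with intercept
    predictedYTest = [ones(size(XTest,1),1) XTest] * b;
end
criterion = sum((yTest - predictedYTest).^2);
end
```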

#### Options

Options for the iterations and parallel computation, specified as a structure returned by statset.

This table lists the option fields and their values.

| Field Name | Field Value | Default Value |
| --- | --- | --- |
| Display | Level of display, specified as "off" (display no information), "final" (display the final information), or "iter" (display information at each iteration) | "off" |
| MaxIter | Maximum number of iterations allowed, specified as a positive integer | Inf |
| TolFun | Termination tolerance on the criterion value, specified as a positive scalar | 1e-6 if Direction is "forward"; 0 if Direction is "backward" |
| TolTypeFun | Type of the termination tolerance for the criterion value, specified as "abs" (absolute tolerance) or "rel" (relative tolerance) | "rel" |
| UseParallel | Flag to run in parallel, specified as logical 1 (true) or 0 (false) | false |
| UseSubstreams | Flag to run computations in a reproducible manner, specified as logical 1 (true) or 0 (false). To compute reproducibly, set Streams to a type that allows substreams: "mlfg6331_64" or "mrg32k3a". | false |
| Streams | Random number streams, specified as a RandStream object or cell array of such objects. Use a single object except when the UseParallel value is true and the UseSubstreams value is false. In that case, use a cell array that has the same size as the parallel pool. | MATLAB® default random number stream |

To compute in parallel, you need Parallel Computing Toolbox™.

Example: "Options",statset("Display","iter")

Data Types: struct
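The following sketch builds an Options structure with an absolute termination tolerance and final-only display. It uses the illustrative fisheriris criterion from the CV example. The parallel, reproducible variant is commented out because it requires Parallel Computing Toolbox. It uses a stream type that allows substreams, as the table requires.

```matlab
load fisheriris
errFun = @(XTrain,yTrain,XTest,yTest) ...
    sum(~strcmp(yTest,predict(fitcdiscr(XTrain,yTrain),XTest)));

% Stop when adding a feature does not decrease the cross-validated
% misclassification rate by more than 0.01 (absolute). Display only the final result.
opts = statset("Display","final","TolFun",0.01,"TolTypeFun","abs");
rng("default") % For reproducibility
tf = sequentialfs(errFun,meas,species,"Options",opts);

% Reproducible parallel run (requires Parallel Computing Toolbox):
% s = RandStream("mlfg6331_64");
% optsPar = statset("UseParallel",true,"UseSubstreams",true,"Streams",s);
% tfPar = sequentialfs(errFun,meas,species,"Options",optsPar);
```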

## Output Arguments


### tf

Selected features, returned as a logical vector. A true (logical 1) entry indicates that the corresponding feature is selected.

### history

History of the feature selection process, returned as a structure array including the In and Crit fields.

• In is a logical matrix in which row i indicates the features selected at iteration i.

• Crit is a vector containing the criterion values computed at each iteration.

## More About

### Feature Selection

Feature selection reduces the dimensionality of data by selecting only a subset of measured features (predictor variables) to create a model. Feature selection algorithms search for a subset of predictors that optimally models measured responses, subject to constraints such as required or excluded features and the size of the subset.

You can categorize feature selection algorithms into three types:

• Filter type — The filter type feature selection algorithm measures feature importance based on the characteristics of the features, such as feature variance and feature relevance to the response. You select important features as part of a data preprocessing step and then train a model using the selected features. Therefore, filter type feature selection is independent of the training algorithm.

• Wrapper type — The wrapper type feature selection algorithm starts training using a subset of features and then adds or removes a feature using a selection criterion. The selection criterion directly measures the change in model performance that results from adding or removing a feature. The algorithm repeats training and improving a model until its stopping criteria are satisfied.

• Embedded type — The embedded type feature selection algorithm learns feature importance as part of the model learning process. Once you train a model, you obtain the importance of the features in the trained model. This type of algorithm selects features that work well with a particular learning process.

For more details, see Introduction to Feature Selection.

## Algorithms

sequentialfs sequentially selects features in X by performing these steps:

1. Define a random nonstratified partition for 10-fold cross-validation on n observations, where n is the number of observations in X.

2. Initialize the selected feature set S as an empty set.

3. For each feature $x_i$ in X, compute the cross-validated criterion value using the fun function.

4. Add the feature with the smallest criterion value to S.

5. For each feature $x_i$ in $X \setminus S$, define a candidate feature set $C_i = S \cup \{x_i\}$. Compute the cross-validated criterion value for $C_i$ using fun.

6. Among the candidate sets $C_i$, select the set that reduces the criterion value the most, compared to the criterion value for S. Add the feature corresponding to the selected candidate set to S.

7. Repeat steps 5 and 6 until adding a feature does not decrease the criterion value by more than the termination tolerance (the TolFun field of the Options name-value argument).
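The following sketch reimplements steps 1 to 7 in plain MATLAB to make the loop concrete. It is not the sequentialfs source. forwardSelectSketch and cvCriterion are hypothetical names. The tolerance is absolute, whereas sequentialfs uses a relative tolerance by default. The synthetic data and the least-squares criterion are assumptions for illustration.

```matlab
% Synthetic data: y depends on columns 2 and 4 of X only
rng("default") % For reproducibility
n = 100;
X = randn(n,6);
y = 3*X(:,2) - 2*X(:,4) + 0.3*randn(n,1);

% Criterion: test-set sum of squared errors of a least-squares fit with intercept
sseFun = @(XTrain,yTrain,XTest,yTest) ...
    sum((yTest - [ones(size(XTest,1),1) XTest] * ...
        ([ones(size(XTrain,1),1) XTrain] \ yTrain)).^2);

tf = forwardSelectSketch(sseFun,X,y,10,1e-6)

function tf = forwardSelectSketch(fun,X,y,k,tol)
% Forward selection following steps 1-7, with an absolute tolerance
[n,p] = size(X);
fold = mod(randperm(n)-1,k) + 1;       % Step 1: random nonstratified k folds
tf = false(1,p);                       % Step 2: S is empty
bestCrit = Inf;
while any(~tf)
    candidates = find(~tf);            % Steps 3 and 5: features not in S
    crit = zeros(size(candidates));
    for c = 1:numel(candidates)
        inSet = tf;
        inSet(candidates(c)) = true;   % Candidate set C_i = S plus feature x_i
        crit(c) = cvCriterion(fun,X(:,inSet),y,fold,k);
    end
    [newCrit,c] = min(crit);           % Steps 4 and 6: best candidate set
    if ~(newCrit < bestCrit - tol)     % Step 7: stop without a sufficient decrease
        break
    end
    tf(candidates(c)) = true;
    bestCrit = newCrit;
end
end

function crit = cvCriterion(fun,Xs,y,fold,k)
% Sum fun over the k folds, then divide by the total number of test observations
total = 0;
for f = 1:k
    test = (fold == f);
    total = total + fun(Xs(~test,:),y(~test),Xs(test,:),y(test));
end
crit = total / numel(y);
end
```

To mirror backward selection instead, start from tf = true(1,p), and at each step evaluate removing each selected feature.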

To customize the feature selection process, use the name-value arguments of sequentialfs.

• You can specify cross-validation options by using the CV and MCReps name-value arguments.

• For wrapper type feature selection, use the CV and MCReps name-value arguments to cross-validate the criterion value for each candidate feature set, and define the fun function to train a model and return a criterion value for the trained model. For an example, see Forward Feature Selection.

• For filter type feature selection, which does not involve cross-validation, specify CV as "none" and use the fun function to measure characteristics of the input data, such as correlation. For an example, see Filter Type Feature Selection.

• To perform backward feature selection, or recursive feature elimination (RFE), specify the Direction name-value argument as "backward". sequentialfs initializes the selected feature set S as a set with all features, and then removes one feature at a time from the set.

• You can specify which features to always include or exclude, the number of features in the final selected feature set, and whether to consider a model with no features as a valid option. For details, see the KeepIn, KeepOut, NFeatures, and NullModel name-value arguments.

• Use the Options name-value argument to specify options for the iterations and parallel computation. For example, "Options",statset("TolFun",1e-2) sets the termination tolerance on the criterion value to 1e-2.

## Version History

Introduced in R2008a