This example shows how to tune hyperparameters of a regression ensemble by using hyperparameter optimization in the Regression Learner app. Compare the test set performance of the trained optimizable ensemble to that of the best-performing preset ensemble model.
In the MATLAB® Command Window, load the carbig data set, and create a table containing most of the variables. Separate the table into training and test sets.
load carbig
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);

rng('default') % For reproducibility of the data split
n = length(MPG);
partition = cvpartition(n,'HoldOut',0.15);
idxTrain = training(partition); % Indices for the training set
cartableTrain = cartable(idxTrain,:);
cartableTest = cartable(~idxTrain,:);
Open Regression Learner. Click the Apps tab, and then click the arrow at the right of the Apps section to open the apps gallery. In the Machine Learning and Deep Learning group, click Regression Learner.
On the Regression Learner tab, in the File section, select New Session > From Workspace.
In the New Session dialog box, select the cartableTrain table from the Data Set Variable list.
As shown in the dialog box, the app selects the response and predictor variables. The default response variable is MPG. The default validation option is 5-fold cross-validation, to protect against overfitting. For this example, do not change the default settings.
To accept the default options and continue, click Start Session.
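As a rough illustration of what 5-fold cross-validation does with the training data, the following sketch partitions a set of observations into five folds at the command line. The observation count `n = 100` and the variable names are hypothetical and chosen only for this illustration; the app performs the equivalent partitioning internally.

```matlab
n = 100;                          % hypothetical number of training observations
rng('default')                    % for a repeatable fold assignment
cvp = cvpartition(n,'KFold',5);   % 5-fold partition, matching the app default

timesValidated = zeros(n,1);      % how often each row is held out
for k = 1:cvp.NumTestSets
    idxFit = training(cvp,k);     % rows used to fit the model in fold k
    idxVal = test(cvp,k);         % rows held out for validation in fold k
    timesValidated = timesValidated + idxVal;
    fprintf('Fold %d: fit on %d rows, validate on %d rows\n', ...
        k,sum(idxFit),sum(idxVal));
end
```

Each observation is held out exactly once, so the validation error is always computed on rows that the corresponding fold's model did not see during fitting.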
Train all preset ensemble models. On the Regression Learner tab, in the Model Type section, click the arrow to open the gallery. In the Ensembles of Trees group, click All Ensembles. In the Training section, click Train. The app trains one of each ensemble model type and displays the models in the History list.
If you have Parallel Computing Toolbox™, the Opening Pool dialog box opens the first time you click Train (or when you click Train again after an extended period of time). The dialog box remains open while the app opens a parallel pool of workers. During this time, you cannot interact with the software. After the pool opens, you can train multiple models simultaneously and continue working.
Validation introduces some randomness into the results. Your model validation results can vary from the results shown in this example.
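For readers who prefer the command line, the following is a minimal sketch of roughly comparable boosted and bagged tree ensembles trained with `fitrensemble` and 5-fold cross-validation. The hyperparameter values (30 learners, minimum leaf size of 8, learning rate of 0.1) are assumptions made for this sketch and are not stated in this example, so the results need not match the models in the app. Converting `Origin` to a categorical variable is also a convenience chosen here.

```matlab
% Rebuild the training table so that this sketch is self-contained.
load carbig
Origin = categorical(cellstr(Origin));   % convenience conversion (assumption)
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
rng('default')
partition = cvpartition(length(MPG),'HoldOut',0.15);
cartableTrain = cartable(training(partition),:);

% Assumed settings for the tree learners and ensembles (not from the app).
tree = templateTree('MinLeafSize',8);

% Passing 'KFold' returns cross-validated (partitioned) ensembles.
cvBoosted = fitrensemble(cartableTrain,'MPG','Method','LSBoost', ...
    'NumLearningCycles',30,'LearnRate',0.1,'Learners',tree,'KFold',5);
cvBagged = fitrensemble(cartableTrain,'MPG','Method','Bag', ...
    'NumLearningCycles',30,'Learners',tree,'KFold',5);

% kfoldLoss returns the cross-validated MSE; take the square root for RMSE.
boostedRMSE = sqrt(kfoldLoss(cvBoosted))
baggedRMSE = sqrt(kfoldLoss(cvBagged))
```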
Select an optimizable ensemble model to train. On the Regression Learner tab, in the Model Type section, click the arrow to open the gallery. In the Ensembles of Trees group, click Optimizable Ensemble. The app disables the Use Parallel button when you select an optimizable model.
Select the model hyperparameters to optimize. In the Model Type section, select Advanced > Advanced. The app opens a dialog box in which you can select Optimize check boxes for the hyperparameters that you want to optimize. By default, all the check boxes are selected. For this example, accept the default selections, and close the dialog box.
In the Training section, click Train.
The app displays a Minimum MSE Plot as it runs the optimization process. At each iteration, the app tries a different combination of hyperparameter values and updates the plot with the minimum validation mean squared error (MSE) observed up to that iteration, indicated in dark blue. When the app completes the optimization process, it selects the set of optimized hyperparameters, indicated by a red square. For more information, see Minimum MSE Plot.
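The "minimum MSE observed up to that iteration" is a running minimum of the per-iteration validation MSE values. The following sketch computes it for a made-up sequence of MSE values; the numbers are hypothetical and serve only to show the calculation. The app's final choice of optimized hyperparameters may rely on the optimizer's internal model rather than on the raw minimum, so the sketch illustrates only the plotted curve.

```matlab
% Hypothetical validation MSE values observed at eight iterations
observedMSE = [12.4 10.9 11.7 9.8 10.2 9.1 9.6 9.3];

% Running minimum: the best MSE seen up to and including each iteration
minMSE = cummin(observedMSE);

% Iteration at which the overall minimum was observed
[bestMSE,bestIter] = min(observedMSE);

disp(table((1:numel(observedMSE))',observedMSE',minMSE', ...
    'VariableNames',{'Iteration','ObservedMSE','MinimumMSE'}))
```

The running minimum never increases, which is why the curve in the plot is flat between iterations that find a better combination of hyperparameter values.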
The app lists the optimized hyperparameters in both the upper right of the plot and the Optimized Hyperparameters section of the Current Model pane.
In general, the optimization results are not reproducible.
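If reproducible results matter, one option is to run a comparable optimization at the command line, where you can fix the random seed. The following is a sketch under stated assumptions, not a description of what the app does internally: it uses `fitrensemble` with Bayesian optimization, the list of hyperparameters is chosen to resemble the ones the app exposes, and `Origin` is converted to a categorical variable for convenience.

```matlab
% Rebuild the training table so that this sketch is self-contained.
load carbig
Origin = categorical(cellstr(Origin));   % convenience conversion (assumption)
cartable = table(Acceleration,Cylinders,Displacement, ...
    Horsepower,Model_Year,Weight,Origin,MPG);
rng('default')                           % fix the seed for reproducibility
partition = cvpartition(length(MPG),'HoldOut',0.15);
cartableTrain = cartable(training(partition),:);

% Optimization settings: 30 evaluations, 5-fold cross-validation, and an
% acquisition function whose behavior does not depend on run time.
optOptions = struct( ...
    'MaxObjectiveEvaluations',30, ...    % raise this value to search longer
    'KFold',5, ...
    'AcquisitionFunctionName','expected-improvement-plus', ...
    'ShowPlots',false, ...
    'Verbose',0);

% Hyperparameters to optimize (an assumed list for this sketch)
params = {'Method','NumLearningCycles','LearnRate', ...
    'MinLeafSize','NumVariablesToSample'};

Mdl = fitrensemble(cartableTrain,'MPG', ...
    'OptimizeHyperparameters',params, ...
    'HyperparameterOptimizationOptions',optOptions);

% The objective that fitrensemble minimizes is log(1 + cross-validation loss).
results = Mdl.HyperparameterOptimizationResults;
bestHyperparameters = results.XAtMinObjective
```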
Compare the trained preset ensemble models to the trained optimizable model. In the History list, the app highlights the lowest validation RMSE (root mean square error) by outlining it in a box. In this example, the trained optimizable ensemble outperforms the two preset models.
A trained optimizable model does not always have a lower RMSE than the trained preset models. If a trained optimizable model does not perform well, you can try to get better results by running the optimization for longer. In the Model Type section, select Advanced > Optimizer Options. In the dialog box, increase the Iterations value. For example, you can double-click the default value of 30 and enter a larger value.
Because hyperparameter tuning often leads to overfitted models, check the test set performance of the ensemble model with the optimized hyperparameters and compare it to the performance of the best preset ensemble model. Begin by exporting the two models to the MATLAB workspace.
In the History list, select the Boosted Trees model. On the Regression Learner tab, in the Export section, select Export Model > Export Model. In the dialog box, name the model treeEnsemble.
In the History list, select the Optimizable Ensemble model. On the Regression Learner tab, in the Export section, select Export Model > Export Model. In the dialog box, name the model optimizableEnsemble.
Compute the RMSE of the two models on the test set data. In the MATLAB Command Window, use the predictFcn function in each exported model structure to predict the response values of the test set data. Then, compute the RMSE for each model on the test set data, omitting any NaN values. Compare the two RMSE values.
y = treeEnsemble.predictFcn(cartableTest);
presetRMSE = sqrt((1/length(y))*sum((cartableTest.MPG - y).^2,'omitnan'))

z = optimizableEnsemble.predictFcn(cartableTest);
optRMSE = sqrt((1/length(z))*sum((cartableTest.MPG - z).^2,'omitnan'))

presetRMSE = 3.4591
optRMSE = 3.1884
In this example, the trained optimizable ensemble still outperforms the trained preset model on the test set data.
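One detail of the RMSE expression is worth noting. The sum omits NaN terms, but the divisor `length(y)` still counts every row. If the test set contains rows with a missing MPG value, the result is therefore slightly smaller than the RMSE over the valid rows alone. The following sketch uses small made-up vectors (not the carbig data) to show the difference; the variable names are hypothetical.

```matlab
% Made-up true responses (one missing) and predictions
yTrue = [10; 20; NaN; 40];
yPred = [12; 18; 30; 44];

sqErr = (yTrue - yPred).^2;      % the third element is NaN

% Same form as the expression in this example: divide by the total row count
rmseAllRows = sqrt((1/length(yPred))*sum(sqErr,'omitnan'))

% Alternative: average over only the rows with a valid squared error
rmseValidRows = sqrt(mean(sqErr,'omitnan'))
```

Both models in this example are evaluated on the same test rows with the same expression, so the comparison between them is unaffected. The alternative form matters mainly when you want to report the RMSE value itself.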