Feature Engineering

What Is Feature Engineering?

Feature engineering is the process of turning raw data into features that machine learning models can use. It is difficult because extracting features from signals and images requires deep domain knowledge, and finding the best features remains an iterative process even when you apply automated methods.

Feature engineering encompasses one or more of the following steps:

Feature extraction to generate candidate features
Feature transformation, which maps features to make them more suitable for downstream modeling
Feature selection, which identifies the subset of features that provides better predictive power in modeling the data while reducing model size and simplifying prediction

For example, sports statistics include numeric data like games played, average time per game, and points scored, all broken down by player. Feature extraction in this context includes compressing these statistics into derived numbers, like points per game or average time to score. Then feature selection becomes a question of whether you build a model using just these ratios, or whether the original statistics still help the model make more accurate predictions.
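The sports example above can be sketched in a few lines of Python. This is only an illustration: the player records, the field names, and the extract_features helper are hypothetical and do not come from any particular data set or library.

```python
# Hypothetical per-player season totals (illustrative numbers only).
players = [
    {"name": "A", "games": 10, "minutes": 300.0, "points": 150},
    {"name": "B", "games": 8, "minutes": 160.0, "points": 40},
]


def extract_features(row):
    """Compress raw totals into derived ratios for one player."""
    games, minutes, points = row["games"], row["minutes"], row["points"]
    return {
        "name": row["name"],
        # Guard against players with no games played.
        "points_per_game": points / games if games else 0.0,
        "minutes_per_game": minutes / games if games else 0.0,
        # Average playing time per point; infinite if the player never scored.
        "minutes_per_point": minutes / points if points else float("inf"),
    }


features = [extract_features(p) for p in players]
for f in features:
    print(f)
```

Whether a model should see only these ratios, or the raw totals as well, is then the feature selection question described above.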

Manual feature extraction for signal and image data requires signal and image processing knowledge, though automated techniques such as wavelet transforms have proven very effective. These techniques are useful even if you apply deep learning to signal data, since deep neural nets have trouble uncovering structure in raw signal data. The traditional approach for extracting features from text data is to model text as a bag of words. Modern approaches apply deep learning to encode the context of words, such as the popular word embedding technique word2vec.
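As a rough illustration of the bag-of-words idea, the sketch below counts word occurrences per document over a shared vocabulary. The tokenizer (lowercasing plus a simple regular expression) and the two sample sentences are assumptions made for this example; real pipelines usually add steps such as stop-word removal or stemming.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(docs):
    """Return a sorted vocabulary and one word-count vector per document."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        # Counter returns 0 for words that do not occur in this document.
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors


docs = ["The pump failed", "The pump restarted and the pump failed again"]
vocab, vectors = bag_of_words(docs)
print(vocab)
for v in vectors:
    print(v)
```

Each document becomes a fixed-length numeric vector, but word order and context are lost, which is the gap that embedding techniques aim to close.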

Feature transformation includes popular data preparation techniques, such as normalization to address large differences in the scale of features, but also aggregation to summarize data, filtering to remove noise, and dimensionality reduction techniques such as PCA and factor analysis.
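Two of these transformations, normalization and noise filtering, are simple enough to sketch directly. The example below uses z-score normalization with the population standard deviation and a plain moving-average filter; both are choices made for this illustration, and the function names and sample values are hypothetical.

```python
import statistics


def zscore(column):
    """Rescale a feature column to zero mean and unit (population) variance."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no scale information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


def moving_average(signal, window):
    """Smooth a signal; the output has len(signal) - window + 1 samples."""
    if window < 1 or window > len(signal):
        raise ValueError("window must be between 1 and len(signal)")
    return [
        sum(signal[i:i + window]) / window
        for i in range(len(signal) - window + 1)
    ]


heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
normalized = zscore(heights_cm)
smoothed = moving_average([1.0, 2.0, 3.0, 4.0, 5.0], window=3)
print(normalized)
print(smoothed)
```

After normalization, features measured in very different units contribute on a comparable scale to distance-based or gradient-based models.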

Many methods for feature selection are supported by MATLAB®. Some are based on ranking features by importance, which could be as basic as correlation with the response. Some machine learning models estimate feature importance during the learning algorithm (“embedded” feature selection), while so-called filter-based methods infer a separate model of feature importance. Wrapper selection methods iteratively add and remove candidate features using a selection criterion. The figure below provides an overview of the various aspects of feature engineering to guide practitioners in finding performant features for their machine learning models.
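The most basic ranking mentioned above, correlation with the response, might look like the following sketch. It is written in plain Python rather than MATLAB and does not reproduce any MATLAB function; the feature names and values are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A constant sequence has no defined correlation; treat it as 0.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_by_correlation(features, response):
    """Sort feature names by absolute correlation with the response."""
    scores = {
        name: abs(pearson(column, response))
        for name, column in features.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidate features and response.
features = {
    "f_constant": [7, 7, 7, 7, 7, 7],
    "f_noisy": [2, 1, 4, 3, 6, 5],
    "f_linear": [1, 2, 3, 4, 5, 6],
}
response = [2, 4, 6, 8, 10, 12]

ranking = rank_by_correlation(features, response)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A ranking like this only captures linear, one-feature-at-a-time relationships, which is one reason embedded and wrapper methods are also used.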

Deep learning has become known for taking raw image and signal data as input, thus eliminating the feature engineering step. While that works well for large image and video data sets, feature engineering is still critical for good performance when applying deep learning to smaller data sets and signal-based problems.

Key Points

Feature engineering is essential for applying machine learning, and also relevant for applications of deep learning to signals.
Wavelet scattering delivers good features from signal and image data without manual feature extraction.
Additional steps such as feature transformation and selection can yield more accurate yet smaller sets of features suitable for deployment to hardware-constrained environments.

Example

Ranking features by applying the minimum redundancy maximum relevance (MRMR) algorithm implemented in the fscmrmr function in MATLAB yields good features for classification without long run times, as demonstrated in this example. Large drops in importance scores imply that you can confidently determine the threshold on which features to use for your model, while small drops indicate that you may have to include lots of additional features to avoid a significant loss in accuracy for the resulting model.
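To make the idea behind MRMR concrete, here is a minimal Python sketch of a greedy ranking for discrete features. It is not the fscmrmr implementation: it assumes the difference form of the criterion (relevance minus mean redundancy, both measured by mutual information), and the toy data, the function names, and the largest_drop_cutoff heuristic are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(x)
    count_x, count_y, count_xy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log(c * n / (count_x[a] * count_y[b]))
        for (a, b), c in count_xy.items()
    )


def mrmr_rank(features, labels):
    """Greedily rank features by relevance minus mean redundancy."""
    relevance = {
        name: mutual_information(col, labels) for name, col in features.items()
    }
    remaining = list(features)
    ranking = []  # list of (name, score) in selection order
    while remaining:
        best_name, best_score = None, -math.inf
        for name in remaining:
            selected = [s for s, _ in ranking]
            redundancy = (
                sum(mutual_information(features[name], features[s])
                    for s in selected) / len(selected)
                if selected else 0.0
            )
            score = relevance[name] - redundancy
            if score > best_score:
                best_name, best_score = name, score
        ranking.append((best_name, best_score))
        remaining.remove(best_name)
    return ranking


def largest_drop_cutoff(scores):
    """Number of features to keep if you cut at the largest score drop."""
    if len(scores) < 2:
        return len(scores)
    drops = [scores[i] - scores[i + 1] for i in range(len(scores) - 1)]
    return drops.index(max(drops)) + 1


# Toy data: "a_noisy" is nearly a copy of "a"; "b" adds new information.
labels = [0, 0, 0, 1, 1, 1, 2, 2]
features = {
    "a": [0, 0, 0, 1, 1, 1, 1, 1],
    "a_noisy": [0, 0, 1, 1, 1, 1, 1, 1],
    "b": [0, 0, 0, 0, 0, 0, 1, 1],
}

ranking = mrmr_rank(features, labels)
keep = largest_drop_cutoff([score for _, score in ranking])
print(ranking)
print("features to keep:", [name for name, _ in ranking[:keep]])
```

In this toy data the near-duplicate feature is pushed to the end of the ranking because most of what it says about the labels is already covered by the first selected feature, and the largest drop in score falls just before it.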

MRMR applies to classification problems only. For regression, neighborhood component analysis is a good option, available in MATLAB as fsrnca.

Examples and How To

Feature Engineering | Applied Machine Learning, Part 1 (4:35) - Video
Data Processing and Feature Engineering with MATLAB - Coursera Course
Traditional Feature Extraction from Image Data - Example
Apply Sequential Feature Selection to High-Dimensional Data - Example
Apply NCA with Regularization - Example
Machine Learning and Deep Learning with Wavelet Scattering (4:03) - Video

Software Reference

Overview of Dimensionality Reduction and Feature Extraction functions - Documentation
Introduction to Feature Selection in MATLAB - Documentation
Feature Ranking Using Minimum Redundancy Maximum Relevance (MRMR) - Function
fsulaplacian: Unsupervised feature ranking - Function

Feature Engineering FAQs

Feature engineering is the process of turning raw data into features to be used by machine learning, encompassing feature extraction, transformation, and selection to create inputs suitable for modeling.

Feature engineering is difficult because extracting features from signals and images requires deep domain knowledge, and finding the best features fundamentally remains an iterative process even with automated methods.

The main steps include feature extraction to generate candidate features, feature transformation to map features for downstream modeling, and feature selection to identify subsets that provide better predictive power while reducing model size.

Feature engineering encompasses extraction, transformation, and selection of features from raw data, while feature selection specifically identifies subsets of features that provide better predictive power and simplify prediction.

Feature engineering remains critical for good performance when applying deep learning to smaller data sets and signal-based problems, although deep learning can work well on raw image and video data when the data sets are large.

MATLAB provides functions for dimensionality reduction, feature extraction, feature selection methods including MRMR and neighborhood component analysis, and automated techniques like wavelet transforms for signal and image data.

MRMR (minimum redundancy maximum relevance) is an algorithm for ranking features in classification problems that yields good features without long run times, helping determine which features to use for your model.

Feature transformation includes normalization to address scale differences, aggregation to summarize data, filtering to remove noise, and dimensionality reduction techniques such as PCA and factor analysis.
