Cheat Sheets

Which Machine Learning Algorithm Is Right for You?

Dataset: What is the ideal dataset size for each algorithm?
Training Speed: How quickly will the algorithm train without acceleration hardware?
Interpretability: How hard is it to see how the algorithm arrived at a decision?
Tuning: How much tuning does the algorithm allow?

| Algorithm | Dataset | Training Speed | Interpretability | Tuning | Comments |
|---|---|---|---|---|---|
| Linear models | Small | Very fast | Easy | Minimal | Widely used basic algorithm; linear SVM handles high-dimensional data well |
| Decision trees | Small | Very fast | Easy | Some | Good generalist algorithm, check for overfitting |
| (Nonlinear) Support vector machine | Medium sized | Moderately slow | Difficult | Some | Good accuracy |
| Nearest neighbor | Medium sized | Moderately fast | Moderately easy | Minimal | Lower accuracy, but easy to use and interpret |
| Naïve Bayes | Medium sized | Very fast | Moderately easy | Some | Widely used for text analytics (e.g., spam filtering); kernel Bayes will run slower |
| Ensembles | Large | Moderately fast | Difficult | Some | Higher accuracy with a tradeoff of lower interpretability |
| Neural network (shallow) | Medium sized | Moderately fast | Moderately easy | Some | Still used for signal classification, compression, and forecasting |
| Deep nets | Large | Very slow | Difficult | A lot | A standard algorithm for image, video, signals, and text |
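
The comparison above can also be treated as a small lookup table. The following is a minimal Python sketch, not part of the original cheat sheet: the names `Algo`, `CHEAT_SHEET`, and `shortlist` are hypothetical, and the lowercase labels (for example `"medium"` for "Medium sized") are a normalization chosen for illustration. It only filters on the ratings listed above; it does not measure anything.

```python
from typing import List, NamedTuple, Optional


class Algo(NamedTuple):
    """One row of the cheat sheet (ratings copied from the table, lowercased)."""
    name: str
    dataset: str           # "small", "medium", or "large"
    speed: str             # e.g. "very fast", "moderately slow"
    interpretability: str  # "easy", "moderately easy", or "difficult"
    tuning: str            # "minimal", "some", or "a lot"


# Hypothetical encoding of the table; the row order matches the cheat sheet.
CHEAT_SHEET = [
    Algo("Linear models", "small", "very fast", "easy", "minimal"),
    Algo("Decision trees", "small", "very fast", "easy", "some"),
    Algo("Nonlinear SVM", "medium", "moderately slow", "difficult", "some"),
    Algo("Nearest neighbor", "medium", "moderately fast", "moderately easy", "minimal"),
    Algo("Naive Bayes", "medium", "very fast", "moderately easy", "some"),
    Algo("Ensembles", "large", "moderately fast", "difficult", "some"),
    Algo("Shallow neural network", "medium", "moderately fast", "moderately easy", "some"),
    Algo("Deep nets", "large", "very slow", "difficult", "a lot"),
]


def shortlist(
    dataset: Optional[str] = None,
    speed: Optional[str] = None,
    interpretability: Optional[str] = None,
    tuning: Optional[str] = None,
) -> List[str]:
    """Return the names of algorithms whose ratings match every given criterion.

    A criterion left as None is ignored.
    """
    wanted = {
        "dataset": dataset,
        "speed": speed,
        "interpretability": interpretability,
        "tuning": tuning,
    }
    return [
        algo.name
        for algo in CHEAT_SHEET
        if all(value is None or getattr(algo, field) == value
               for field, value in wanted.items())
    ]


if __name__ == "__main__":
    print(shortlist(dataset="small"))
    print(shortlist(dataset="medium", speed="very fast"))
```

Calling `shortlist(dataset="small")` returns the two rows rated for small datasets, linear models and decision trees. The ratings are coarse guidelines, so a shortlist like this is a starting point for experiments rather than a final choice.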