Comparison of Supervised Machine Learning Algorithms

The following two grids compare the main supervised machine learning algorithms available in XLSTAT. One grid covers classification tasks (qualitative Y); the other covers regression tasks (quantitative Y). For a short introduction to the principles of supervised machine learning, check out this article.

The algorithms are compared with regard to several criteria:

  • Can they work with more variables than observations?

  • Do they easily adapt to non-linear relationships between the predictors and the outcome?

  • Can they be used for explanatory purposes, that is, to describe the relative impact of each predictor on the outcome?

  • Can they automatically detect and learn interactions among predictors?

  • What are the main hyperparameters to tune?
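
To make the first criterion concrete, here is a minimal Python sketch (plain Python, not XLSTAT code; the toy data and the names `predict`, `squared_norm`, `coef_a` and `coef_b` are made up for illustration). With more variables than observations, several different coefficient vectors can reproduce the outcome exactly, so an unpenalized linear model has no unique solution.

```python
# Toy data: 2 observations, 3 variables (more variables than observations)
X = [[1, 0, 1],
     [0, 1, 1]]
y = [1, 2]

def predict(coef):
    """Linear prediction (no intercept) for every row of X."""
    return [sum(c * v for c, v in zip(coef, row)) for row in X]

def squared_norm(coef):
    """Sum of squared coefficients, the quantity a ridge-type penalty shrinks."""
    return sum(c * c for c in coef)

# Two different coefficient vectors, chosen by hand
coef_a = [1, 2, 0]
coef_b = [0, 1, 1]

fits_a = predict(coef_a) == y   # both vectors reproduce y exactly
fits_b = predict(coef_b) == y

print(fits_a, fits_b, squared_norm(coef_a), squared_norm(coef_b))
```

Both vectors fit the data perfectly, so the data alone cannot choose between them. A penalty on coefficient size gives a reason to prefer `coef_b`, whose squared norm is smaller, which is one intuition for why penalized methods can cope with this situation.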

Classification algorithms

| Algorithm | Works with more variables than observations? | Adapts to non-linear situations? | Explanatory intelligibility | Automatically learns relevant interactions among predictors? | Main hyperparameters | XLSTAT menu | Remarks |
| Logistic Regression | No | - | +++ | No | none | Modeling data | Good option for explanatory intelligibility (provides log-odds coefficients and p-values) |
| Penalized regression (Ridge, Lasso, Elastic Net) | Yes | - | ++ | No | lambda, alpha | XLSTAT-R, glmnet | Select Binomial or Multinomial family |
| Linear Discriminant Analysis | No | - | + | No | none | Analyzing data / Discriminant Analysis | Activate Equality of Covariance Matrices in the Options tab |
| Quadratic Discriminant Analysis | No | + | + | No | none | Analyzing data / Discriminant Analysis | Deactivate Equality of Covariance Matrices in the Options tab |
| Partial Least Squares Discriminant Analysis (PLS-DA) | Yes | - | + | No | number of components | Modeling data | Typically used with few observations & many variables (chemometrics) |
| Generalized Additive Models | No | ++ | + | No | Method, add extra penalty | XLSTAT-R, gam | |
| Naive Bayes | Yes | - | - | No | Smoothing parameter | Machine Learning | Fast computations on large data sets |
| Support Vector Machines (SVM) | Yes | ++ (RBF kernel recommended for non-linear situations) | - | No | C, kernel and kernel-specific hyperparameters | Machine Learning | Computationally intensive on large data sets |
| K Nearest Neighbors (KNN) | Yes | ++ | - | No | Number of neighbors | Machine Learning | |
| Classification trees (C&RT) | Yes | ++ | ++ | Yes | CP | Machine Learning | Binary splits at each node |
| Classification trees (CHAID) | Yes | ++ | ++ | Yes | CP | Machine Learning | Multiple splits at each node |
| Classification Random Forests | Yes | ++ | + | Yes | CP, mtry | Machine Learning | Better predictive performance compared to classification trees |
| Neural networks | Yes | ++ | - | Yes | Network architecture, error function, activation functions | XLSTAT-R, neuralnet | Requires advanced expertise |
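
The "interactions" column can be illustrated with a small Python sketch (plain Python, not XLSTAT code). It assumes XOR-style toy data in which the class depends only on the combination of two predictors. The grid of weights and the function `tree_rule` are hypothetical: `tree_rule` is a hand-written set of nested splits, not a fitted tree.

```python
from itertools import product

# XOR-style data: the class depends on the combination of x1 and x2
points = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

def accuracy(predict):
    """Share of the four points classified correctly by `predict`."""
    hits = sum(predict(x1, x2) == y for (x1, x2), y in zip(points, labels))
    return hits / len(points)

# Linear rule without interaction term: predict 1 when w1*x1 + w2*x2 + b > 0.
# Try every combination on a coarse grid of weights (-3.0, -2.5, ..., 3.0).
grid = [v / 2 for v in range(-6, 7)]
best_linear = max(
    accuracy(lambda x1, x2, w1=w1, w2=w2, b=b: int(w1 * x1 + w2 * x2 + b > 0))
    for w1, w2, b in product(grid, repeat=3)
)

# Nested splits: split on x1 first, then on x2 inside each branch
def tree_rule(x1, x2):
    if x1 == 0:
        return 1 if x2 == 1 else 0
    return 0 if x2 == 1 else 1

tree_accuracy = accuracy(tree_rule)
print(best_linear, tree_accuracy)
```

No linear rule on the grid classifies more than three of the four points correctly, whereas the nested splits classify all four. A linear model would need an explicit `x1 * x2` term supplied by the analyst, while a tree structure can express the combination through successive splits.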

Regression algorithms

| Algorithm | Works with more variables than observations? | Adapts to non-linear situations? | Explanatory intelligibility | Automatically learns relevant interactions among predictors? | Main hyperparameters in XLSTAT | XLSTAT menu | Remarks |
| Linear regression | No | - | +++ | No | none | Modeling data | Good option for explanatory intelligibility (slope coefficients and p-values) |
| Penalized regression (Ridge, Lasso, Elastic Net) | Yes | - | ++ | No | lambda, alpha | XLSTAT-R, glmnet | Select Gaussian family |
| Quantile Regression | Yes | - | + | No | none | Modeling data | |
| Generalized Additive Models | No | ++ | + | No | Method, add extra penalty | XLSTAT-R, gam | |
| Partial Least Squares (PLS) | Yes | - | + | No | number of components | Modeling data | Typically used with few observations & many variables (chemometrics) |
| Principal Component Regression (PCR) | Yes | - | + | No | Standardize variables | Modeling data | |
| K Nearest Neighbors (KNN) | Yes | ++ | - | No | number of neighbors | Machine Learning | |
| Regression trees (C&RT) | Yes | ++ | ++ | Yes | Minimum parent size, minimum son size, maximum depth, CP | Machine Learning | Binary splits at each node |
| Regression trees (CHAID) | Yes | ++ | ++ | Yes | Minimum parent size, minimum son size, maximum depth, CP | Machine Learning | Multiple splits at each node |
| Random Forests | Yes | ++ | + | Yes | CP, mtry | Machine Learning | Better predictive performance compared to regression trees |
| Neural Network | Yes | ++ | - | Yes | Network architecture, error function, activation functions | XLSTAT-R, neuralnet | Requires advanced expertise |
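
The "main hyperparameters" column lists the settings that usually need tuning. As an illustration only, the following Python sketch compares several values of the number of neighbors for a one-predictor KNN regression using leave-one-out error. It is plain Python rather than XLSTAT code; the toy data, the candidate values and the helper names `knn_predict` and `loo_error` are assumptions made for the example.

```python
def knn_predict(train, x, k):
    """Average the outcomes of the k training points closest to x."""
    nearest = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def loo_error(data, k):
    """Leave-one-out mean squared error for a given number of neighbors."""
    total = 0.0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]        # hold out observation i
        total += (knn_predict(rest, x, k) - y) ** 2
    return total / len(data)

# Toy non-linear data: y = x^2 on a grid from -3 to 3 (illustrative values only)
data = [(x / 2, (x / 2) ** 2) for x in range(-6, 7)]

# Evaluate a few candidate values and keep the one with the lowest error
errors = {k: loo_error(data, k) for k in (1, 2, 3, 5, 8)}
best_k = min(errors, key=errors.get)
print(errors, best_k)
```

The same pattern (evaluate each candidate value on held-out observations, then keep the best) applies to the other hyperparameters in the grids, such as lambda for penalized regression or CP for trees.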

