In a recent blog post, our CEO Sebastian Heinz wrote about Google’s newest stroke of genius: AutoML Vision, a cloud service “that is able to build deep learning models for image recognition completely fully automated and from scratch”. AutoML Vision is part of the current trend towards automating machine learning tasks. This trend started with the automation of hyperparameter optimization for single models (including services like SigOpt, Hyperopt, and SMAC), continued with automated feature engineering and selection (see our blog post about our bounceR package), and is now moving towards the full automation of complete data pipelines, including automated model stacking (a common model ensembling technique).
One company at the frontier of this development is certainly h2o.ai. They developed both a free Python/R library (H2O AutoML) and an enterprise-ready software solution called Driverless AI. But H2O is far from the only player in the field. This blog post provides a short comparison of two freely available AutoML solutions in terms of predictive performance and general usability.
H2O AutoML
H2O AutoML is an extension to H2O’s popular Java-based open source machine learning framework, with APIs for Python and R. It automatically trains, tunes, and cross-validates models, including Generalized Linear Models (GLM), Gradient Boosting Machines (GBM), Random Forest (RF), Extremely Randomized Forest (XRF), and Neural Networks. Hyperparameter optimization is done using a random search over a list of reasonable parameters (RF and XRF are currently not tuned). In the end, H2O produces a leaderboard of models and builds two types of stacked ensembles from the base models: one including all base models, the other including only the best base model of each family.
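To make the stacking idea more concrete, here is a minimal base-R sketch of a stacked ensemble: out-of-fold (cross-validated) predictions of several base models become the inputs of a metalearner. It is not H2O's implementation. The three `lm()` formulas standing in for model families, the 5-fold split, and the plain `lm()` metalearner are all assumptions made for illustration.

```r
# Minimal stacking sketch in base R. Assumptions: three lm() formulas stand in
# for H2O's base model families and a plain lm() acts as the metalearner.
set.seed(42)

# Simulated data with a linear and a quadratic effect
n   <- 500
x1  <- runif(n, -2, 2)
x2  <- runif(n, -2, 2)
dat <- data.frame(y = 3 * x1 + 2 * x2^2 + rnorm(n), x1 = x1, x2 = x2)

# Hypothetical base learners, each defined by an lm() formula
base_formulas <- list(
  linear    = y ~ x1 + x2,
  quadratic = y ~ x1 + I(x2^2),
  only_x1   = y ~ x1
)

# Step 1: out-of-fold predictions from 5-fold cross-validation
k     <- 5
folds <- sample(rep(seq_len(k), length.out = n))
oof   <- matrix(NA_real_, nrow = n, ncol = length(base_formulas),
                dimnames = list(NULL, names(base_formulas)))
for (f in seq_len(k)) {
  in_fold <- folds == f
  for (m in names(base_formulas)) {
    fit <- lm(base_formulas[[m]], data = dat[!in_fold, ])
    oof[in_fold, m] <- predict(fit, newdata = dat[in_fold, ])
  }
}

# Step 2: the metalearner regresses the target on the out-of-fold predictions
metalearner <- lm(y ~ ., data = data.frame(y = dat$y, oof))

# Step 3: refit the base models on all rows and chain them with the metalearner
base_fits <- lapply(base_formulas, lm, data = dat)
predict_stack <- function(newdata) {
  level_one <- as.data.frame(lapply(base_fits, predict, newdata = newdata))
  unname(predict(metalearner, newdata = level_one))
}

rmse     <- function(obs, pred) sqrt(mean((obs - pred)^2))
rmse_oof <- apply(oof, 2, rmse, obs = dat$y)   # CV error of each base model
print(round(rmse_oof, 3))
```

Fitting the metalearner on out-of-fold rather than in-sample predictions is the key design choice: it keeps the metalearner from simply rewarding the base model that overfits the training data most.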
Model training can be controlled either by the number of models to be trained or by the total training time. The latter in particular makes the duration of model training quite transparent. One of the big advantages of H2O is that all models are parallelized out of the box.
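The two training budgets can be illustrated with a small random search that stops after a maximum number of models or a maximum runtime and returns a leaderboard. This is a sketch under stated assumptions, not H2O code: `loess()` with a randomly drawn `span` and `degree` stands in for H2O's model families, and stopping at whichever limit is hit first is the rule assumed here.

```r
# Random search under two budgets: stop after max_models models or after
# max_runtime_secs seconds, whichever comes first. Assumption: loess() with a
# random span/degree is a stand-in for H2O's model families.
set.seed(1)
n   <- 400
x   <- runif(n, 0, 10)
dat <- data.frame(x = x, y = sin(x) + rnorm(n, sd = 0.3))
valid_idx <- sample(n, 100)
train <- dat[-valid_idx, ]
valid <- dat[valid_idx, ]

random_search <- function(max_models = 10, max_runtime_secs = 5) {
  start <- Sys.time()
  board <- data.frame(model_id = character(0), span = numeric(0),
                      degree = integer(0), rmse = numeric(0),
                      stringsAsFactors = FALSE)
  elapsed <- function() as.numeric(difftime(Sys.time(), start, units = "secs"))
  while (nrow(board) < max_models && elapsed() < max_runtime_secs) {
    # Draw one random hyperparameter configuration and score it on the holdout
    span   <- runif(1, 0.2, 1)
    degree <- sample(1:2, 1)
    fit    <- loess(y ~ x, data = train, span = span, degree = degree)
    pred   <- predict(fit, newdata = valid)
    board  <- rbind(board, data.frame(
      model_id = paste0("loess_", nrow(board) + 1), span = span,
      degree = degree, rmse = sqrt(mean((valid$y - pred)^2, na.rm = TRUE)),
      stringsAsFactors = FALSE))
  }
  board[order(board$rmse), ]   # leaderboard: best model first
}

leaderboard <- random_search(max_models = 8, max_runtime_secs = 10)
print(leaderboard)
```

Checking the clock before each new model means a run can slightly exceed the time budget by the duration of the last model fitted.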
auto-sklearn
auto-sklearn is an automated machine learning toolkit based on Python’s scikit-learn library. A detailed explanation of auto-sklearn can be found in Feurer et al. (2015). Whereas H2O AutoML tunes each model independently and adds it to a leaderboard, auto-sklearn combines model selection and hyperparameter optimization in what the authors call “Combined Algorithm Selection and Hyperparameter optimization” (CASH). This joint optimization problem is then solved using a tree-based Bayesian optimization method called “Sequential Model-based Algorithm Configuration” (SMAC) (see Bergstra et al. 2011 for background on sequential model-based hyperparameter optimization).
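The key point of CASH is that the search space is conditional: the choice of algorithm determines which hyperparameters exist. The sketch below shows that structure in base R. It deliberately swaps SMAC for plain random search to stay short, and the three toy "algorithms" (polynomial `lm()`, `loess()`, `smooth.spline()`) and their ranges are assumptions, not auto-sklearn's search space.

```r
# CASH sketch: pick an algorithm, then sample only that algorithm's
# hyperparameters. Assumption: random search replaces SMAC, and three base-R
# learners form a toy search space.
set.seed(7)
n   <- 400
x   <- runif(n, 0, 10)
dat <- data.frame(x = x, y = sin(x) + 0.1 * x + rnorm(n, sd = 0.3))
valid_idx <- sample(n, 100)
train <- dat[-valid_idx, ]
valid <- dat[valid_idx, ]

# Each entry: a sampler for its own hyperparameters and a fit-and-predict function
search_space <- list(
  poly_lm = list(
    sample      = function() list(degree = sample(1:8, 1)),
    fit_predict = function(p, tr, va)
      predict(lm(y ~ poly(x, p$degree), data = tr), newdata = va)),
  loess = list(
    sample      = function() list(span = runif(1, 0.2, 1)),
    fit_predict = function(p, tr, va)
      predict(loess(y ~ x, data = tr, span = p$span), newdata = va)),
  smoothing_spline = list(
    sample      = function() list(df = runif(1, 2, 20)),
    fit_predict = function(p, tr, va)
      predict(smooth.spline(tr$x, tr$y, df = p$df), x = va$x)$y)
)

rmse <- function(obs, pred) sqrt(mean((obs - pred)^2, na.rm = TRUE))

best <- list(loss = Inf)
for (iter in 1:30) {
  algo   <- sample(names(search_space), 1)   # algorithm choice ...
  params <- search_space[[algo]]$sample()    # ... then its conditional hyperparameters
  loss   <- rmse(valid$y, search_space[[algo]]$fit_predict(params, train, valid))
  if (loss < best$loss) best <- list(algo = algo, params = params, loss = loss)
}
str(best)
```

SMAC would replace the two `sample` calls with proposals from a random-forest surrogate fitted to the configurations evaluated so far; the conditional structure of the space stays the same.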
So, contrary to H2O AutoML, auto-sklearn optimizes a complete modeling pipeline, including various data and feature preprocessing steps as well as model selection and hyperparameter optimization. Data preprocessing includes one-hot encoding, scaling, imputation, and balancing. Feature preprocessing includes, among others, feature agglomeration, ICA, and PCA. The algorithms included in auto-sklearn are similar to those in H2O AutoML, but auto-sklearn additionally includes more traditional methods like k-Nearest-Neighbors (kNN), Naive Bayes, and Support Vector Machines (SVM).
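Treating preprocessing as part of the pipeline means every step is fitted on the training data and replayed on new data, and that choices such as "use PCA or not" become searchable options. A minimal base-R sketch of that idea follows; mean imputation, standardization, optional PCA, and `lm()` are stand-ins for auto-sklearn's components, and the configuration fields `use_pca` and `n_components` are hypothetical names.

```r
# Preprocessing + model pipeline whose steps are configuration choices.
# Assumptions: mean imputation, standardization, optional PCA and lm() are
# base-R stand-ins; use_pca / n_components are hypothetical config fields.
fit_pipeline <- function(config, X, y) {
  # Every statistic is learned on the training data only
  means  <- colMeans(X, na.rm = TRUE)
  impute <- function(Z) {
    for (j in seq_len(ncol(Z))) Z[is.na(Z[, j]), j] <- means[j]
    Z
  }
  Xi      <- impute(X)
  centers <- colMeans(Xi)
  scales  <- apply(Xi, 2, sd)
  standardize <- function(Z) sweep(sweep(Z, 2, centers), 2, scales, "/")
  Xs  <- standardize(Xi)
  pca <- if (isTRUE(config$use_pca)) prcomp(Xs, center = FALSE) else NULL
  project <- function(Z) {
    if (is.null(pca)) return(Z)
    Z %*% pca$rotation[, seq_len(config$n_components), drop = FALSE]
  }
  model <- lm(y ~ ., data = data.frame(y = y, project(Xs)))
  # Return a predictor that replays the same fitted steps on new data
  function(Xnew) {
    unname(predict(model, newdata = data.frame(project(standardize(impute(Xnew))))))
  }
}

set.seed(3)
X <- matrix(rnorm(300 * 4), ncol = 4, dimnames = list(NULL, paste0("x", 1:4)))
y <- drop(X %*% c(1, -2, 0.5, 0)) + rnorm(300, sd = 0.1)
X[sample(length(X), 30)] <- NA   # inject some missing values
train <- 1:200
valid <- 201:300

configs <- list(no_pca = list(use_pca = FALSE),
                pca_2  = list(use_pca = TRUE, n_components = 2))
val_rmse <- sapply(configs, function(cfg) {
  predictor <- fit_pipeline(cfg, X[train, ], y[train])
  sqrt(mean((y[valid] - predictor(X[valid, ]))^2))
})
print(val_rmse)
```

In an auto-sklearn-style search, `configs` would be sampled jointly with the model and its hyperparameters rather than listed by hand.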
Similar to H2O AutoML, auto-sklearn includes a final model ensembling step. Whereas H2O AutoML uses simple but efficient model stacking, auto-sklearn uses ensemble selection: a greedy method that iteratively adds individual models to the ensemble if and only if they increase the validation performance. Like H2O, auto-sklearn allows model training to be controlled by the total training time.
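The greedy procedure described above can be sketched in a few lines of base R. The sketch assumes RMSE as the validation metric, an unweighted average of the selected models' predictions, and selection with replacement (a model may be picked more than once, which acts as a weight); it stops as soon as no candidate improves the ensemble. These are choices made for illustration, not auto-sklearn's exact settings.

```r
# Greedy ensemble selection (with replacement) on validation predictions.
# Assumptions: RMSE as metric, simple averaging, stop when nothing improves.
ensemble_selection <- function(preds, y, max_iter = 50) {
  rmse <- function(p) sqrt(mean((y - p)^2))
  chosen    <- integer(0)
  best_loss <- Inf
  for (it in seq_len(max_iter)) {
    # Validation loss of the ensemble if each candidate were added next
    cand_loss <- sapply(seq_len(ncol(preds)), function(j)
      rmse(rowMeans(preds[, c(chosen, j), drop = FALSE])))
    j <- which.min(cand_loss)
    if (cand_loss[j] >= best_loss) break   # no improvement: stop
    chosen    <- c(chosen, j)
    best_loss <- cand_loss[j]
  }
  # Selection counts become ensemble weights
  weights <- tabulate(chosen, nbins = ncol(preds)) / length(chosen)
  list(weights = setNames(weights, colnames(preds)), loss = best_loss)
}

set.seed(11)
y <- rnorm(200)
preds <- cbind(
  model_a = y + rnorm(200, sd = 0.5),
  model_b = y + rnorm(200, sd = 0.5),
  model_c = y + 1 + rnorm(200, sd = 0.1)   # precise but biased model
)
res <- ensemble_selection(preds, y)
print(res)
```

Unlike stacking, no metalearner is trained here; the ensemble is just an average, so its weights are non-negative and sum to one by construction.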
Benchmark
In order to compare the predictive performance of H2O’s AutoML with auto-sklearn, one can conduct a small simulation study. My colleague André’s R package Xy offers a straightforward way to simulate regression datasets with linear, non-linear, and noisy relationships. Using multiple simulation runs (ten in total) makes the comparison more robust. The following R code was used to simulate the data:
```r
library(Xy)
library(caret)
library(dplyr)
library(data.table)

# Number of datasets
n_data_set <- 10

for (i in seq(n_data_set)) {

  # Sim settings
  n <- floor(runif(1, 1000, 5000))
  n_num_vars <- c(sample(2:10, 1), sample(2:10, 1))
  n_cat_vars <- c(0, 0)
  n_noise_vars <- sample(1:5, 1)
  inter_degree <- sample(2:3, 1)

  # Simulate data
  sim <- Xy(n = n,
            numvars = n_num_vars,
            catvars = n_cat_vars,
            noisevars = n_noise_vars,
            task = Xy_task(),
            nlfun = function(x) {x^2},
            interactions = 1,
            sig = c(1, 4),
            cor = c(0),
            weights = c(-10, 10),
            intercept = TRUE,
            stn = 4)

  # Get data and DGP
  df <- sim$dgp

  # Remove Intercept
  df[, "(Intercept)"] <- NULL

  # Rename columns (strip zero padding from the numeric suffixes)
  names(df) <- gsub("(?<![0-9])0+", "", names(df), perl = TRUE)

  # Create test/train split
  df <- dplyr::rename(df, label = y)
  in_train <- createDataPartition(y = df$label, …
```

1.04%1%23.4%24.6%$ better.
The sheer closeness of the results can be further illustrated by looking at the predicted values. Figure 2 shows, as an example, the predicted values for one particular dataset against all feature values (linear, non-linear, and noise features). As one can see, the estimated effects of both frameworks are almost identical and close to the actual relationship.
Summary
Automatic machine learning frameworks can provide promising results for standard machine learning tasks while keeping manual effort to a minimum. This blog post compared two popular frameworks, namely H2O's AutoML and auto-sklearn. Both reached comparable results on ten simulated datasets while significantly outperforming vanilla models. Besides predictive performance, H2O's AutoML offers some additional features, such as native parallelization, an R API, and support for XGBoost and GPU training, which make it even more attractive.
References
- Feurer, Matthias, Aaron Klein, Katharina Eggensperger, Jost Tobias Springenberg, Manuel Blum, and Frank Hutter. 2015. "Efficient and Robust Automated Machine Learning." NIPS 2015. https://ml.informatik.uni-freiburg.de/papers/15-NIPS-auto-sklearn-preprint.pdf
- Balaji, Adithya and Alexander Allen. 2018. "Benchmarking Automatic Machine Learning Frameworks." https://arxiv.org/pdf/1808.06492.pdf
- Bergstra, James, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. "Algorithms for Hyper-Parameter Optimization." NIPS 2011. https://papers.nips.cc/paper/4443-algorithms-for-hyper-parameter-optimization.pdf