Cross-validation is a widely used technique to assess the generalization performance of a machine learning model. In this blog post I will introduce the basics of cross-validation, provide guidelines to tweak its parameters, and illustrate how to build it from scratch in an efficient way.
What the MAPE is FALSELY blamed for, its TRUE weaknesses and BETTER alternatives!
In the time series context, one of the most commonly used measures is the MAPE. In this blog post, I evaluate critical arguments and weaknesses concerning the MAPE and demonstrate alternative measures.
Monotonicity Constraints in Machine Learning Models with R
Monotonicity constraints can help models better represent the underlying relationships in the data. This post explains how to implement such monotonicity constraints in R.
Coding Random Forests in 100 lines of code*
In our series of explaining methods in 100 lines of code, we tackle random forests this time! We build the algorithm from scratch and explore its functions.
How to Speed Up Gradient Boosting by a Factor of Two
Our latest tool development at STATWORX: random boost, an algorithm twice as fast as gradient boosting, with comparable prediction performance.
Coding Regression trees in 150 lines of R code
Motivation There are dozens of machine learning algorithms out there. It is impossible to learn all their mechanics; however, many algorithms sprout from the most established ones, e.g. ordinary least squares, gradient boosting, support vector machines, tree-based algorithms, and neural networks. At STATWORX we discuss algorithms daily to evaluate their usefulness for a specific project. In any case, understanding these …
Coding Gradient boosted machines in 100 lines of R code
Motivation There are dozens of machine learning algorithms out there. It is impossible to learn all their mechanics; however, many algorithms sprout from the most established ones, e.g. ordinary least squares, gradient boosting, support vector machines, tree-based algorithms, and neural networks. At STATWORX we discuss algorithms daily to evaluate their usefulness for a specific project or problem. In any case, …
Using Machine Learning for Causal Inference
Machine Learning (ML) is still an underdog in the field of economics. However, it has been gaining more and more recognition in recent years. One reason for its underdog status is that in economics and other social sciences one is not only interested in prediction but also in making causal inferences. Thus, many "off-the-shelf" ML algorithms solve a fundamentally different …
Regularized Greedy Forest – The Scottish Play (Act II)
In part one of this blog post, the Regularized Greedy Forest (RGF) was introduced as a contender to the more frequently used technique of Gradient Boosting Decision Trees (GBDT). Now it is time to turn words into actions and find out whether it actually is one. Among all GBDT implementations, XGBoost is probably the most commonly used implementation in the field …
Regularized Greedy Forest – The Scottish Play (Act I)
Macbeth shall never vanquish'd be until Great Birnam Wood to high Dunsinane Hill Shall come against him. (Act 4, Scene 1) In Shakespeare's The Tragedy of Macbeth, the prophecy of Birnam Wood is one of three misleading prophecies foreshadowing the defeat of the protagonist of the same name. While highly unlikely, the event of a nearby forest moving towards his …