generalized random forest

Using Machine Learning for Causal Inference

Markus Berroth Blog, Data Science

Machine Learning (ML) is still an underdog in the field of economics. However, it gets more and more recognition in the recent years. One reason for being an underdog is, that in economics and other social sciences one is not only interested in predicting but also in making causal inference. Thus many "off-the-shelf" ML algorithms are solving a fundamentally different …

salad pricing

Food for Regression: Using Sales Data to Identify Price Elasticity

Daniel Lüttgau Blog, Data Science

A few hundred meters from our office, there is a little lunch place. It is part of a small chain that specializes in assemble-yourself, ready-to-eat salads. When we moved into our new office a few years ago, this salad vendor quickly became a daily fixture. However, overtime, this changed. We still eat there regularly, but I am certain, if one …

airflow title

A framework to automate your work: How to set up Airflow!

Marvin Taschenberger Blog, Data Science

In the first part of this blog post, we talked about what a DAG is, how to apply this mathematical concept in project planning and programming and why we at STATWORX decided to use Airflow compared to other workflow managers. In this part, however, we will get more technical and investigate a quite informative hello-world programming and how to set …

XY Titel

Pushing Ordinary Least Squares to the limit with Xy()

André Bleier Blog, Data Science

Introduction to Xy() Simulation is mostly about answering particular research questions. Whenever the word simulation appears somewhere in a discussion, everyone knows that this means additional effort. At STATWORX we are using simulations as a first step to proof concepts we are developing. Sometimes such a simulation is simple, in other cases a simulation is plenty of work. Though, research …

Data Science in Python – Matplotlib (Teil 4)

Moritz Gnisia Blog, Data Science

Nachdem wir in dem vorherigen Artikel eine Einführung in Pandas gegeben haben und somit nun Daten auswählen sowie manipulieren können, soll sich in diesem Artikel alles um die Visualisierung von Daten drehen. Bekanntlicherweise lassen sich mit der passenden Grafik Daten häufig noch besser verstehen und ermöglichen eine andere Art der Interpretation, unabhängig von Mittelwerten und anderen Kennzahlen. Welche Bibliothek zu …

greedy forest

Regularized Greedy Forest – The Scottish Play (Act II)

Fabian Müller Blog, Data Science

In part one of the blog post, the Regularized Greedy Forest (RGF) was introduced as a contender to the more frequently used technique of Gradient Boosting Decision Trees (GBDT). Now it is time to turn words into actions and find out whether it actually is. Among all GBDT implementations, XGBoost is probably the most commonly used implementation in the field …