Back to all Blog Posts

Machine Learning Goes Causal II: Meet the Random Forest’s Causal Brother

  • Machine Learning
05. February 2020
·

Team statworx

At  we are excited that a new promising field of Machine Learning has evolved in recent years: Causal Machine Learning. In short, Causal Machine Learning is the scientific study of Machine Learning algorithms that allow estimating causal effects. Over the last few years, different Causal Machine Learning algorithms have been developed, combining the advances from Machine Learning with the theory of causal inference to estimate different types of causal effects. My colleague Markus has already introduced some of these algorithms in an earlier blog post.

As Causal Machine Learning is a rather complex topic, I will write a series of blog posts to slowly dive into this new fascinating world of data science. In my first blog post, I gave an introduction to the topic, focusing on what Causal Machine Learning is and why it is important in practice and for the future of data science. In this second blog post, I will introduce the so-called Causal Forest, one of the most popular Causal Machine Learning algorithms to estimate heterogeneous treatment effects.

Why Heterogeneous Treatment Effects?

In Causal Forests, the goal is to estimate heterogeneity in treatment effects. As explained in my previous blog post, a treatment effect refers to a causal effect of a treatment or intervention on an outcome variable of scientific or political interest. For example the causal effect of a subsidized training program on earnings. As individual treatment effects are unobservable, the practice focuses on estimating unbiased and consistent averages of the individual treatment effect. The most common parameter thereof is the average treatment effect, which is the mean of all individual treatment effects in the entire population of interest. However, sometimes treatment effects may vary widely between different subgroups in the population, bet it larger or smaller than the average treatment effect. In some cases, it might therefore be more interesting to estimate these different, i.e. heterogeneous treatment effects.

In most applications it is also interesting to look beyond the average effects in order to understand how the causal effects vary with observable characteristics.(Knaus, Lechner & Strittmatter, 2018)

The estimation of heterogeneous treatment effects can assist in answering questions like: For whom are there big or small treatment effects? For which subgroup does a treatment generate beneficial or adverse effects? In the field of marketing, for example, the estimation of heterogeneous treatment effects can help to optimize resource allocation by answering the question of which customers respond the most to a certain marketing campaign or for which customers is the causal effect of intervention strategies on their churn behavior the highest. Or when it comes to pricing, it might be interesting to quantify how a change in price has a varying impact on sales among different age or income groups.

Where Old Estimation Methods Fail

Estimating heterogeneous treatment effects is nothing new. Econometrics and other social sciences have long been studying which variables predict a smaller or larger than average treatment effect, which in statistical terms is also known as Moderation. One of the most traditional ways to find heterogeneous treatment effects is to use a Multiple Linear Regression with interaction terms between the variables of interest (i.e. the ones which might lead to treatment heterogeneity) and the treatment indicator. In this blog post, I will always assume that the data is from a randomized experiment, such that the assumptions to identify treatment effects are valid without further complications. We then conclude that the treatment effect depends on the variables whose interaction term is statistically significant. For example, if we have only one variable, the regression model would look as follows:

 

\[Y = beta_0 + beta_1 w + beta_2 x_1 + beta_3 (w * x_1),\]

where

w

is the treatment indicator and

x_1

is the variable of interest. In that case, if

beta_3

is significant, we know that the treatment effect depends on variable

x_1

. The treatment effect for each observation can then be calculated as

 

\[beta_1 + beta_3 * x_1,\]

which is dependent on the value of

x_1

and therefore heterogeneous among the different observations.

So why is there a need for more advanced methods to estimate heterogeneous treatment effects? The example above was very simple, it only included one variable. However, usually, we have more than one variable which might influence the treatment effect. To see which variables predict heterogeneous treatment effects, we have to include many interaction terms, not only between each variable and the treatment indicator but also for all possible combinations of variables with and without the treatment indicator. If we have

p

variables and one treatment, this gives a total number of parameters of:

 

\[displaystylesum_{k = 0}^{p + 1} {p + 1 choose k}.\]

So, for example, if we had 5 variables, we would have to include a total number of 64 parameters into our Linear Regression Model. This approach suffers from a lack of statistical power and could also cause computational issues. The use of a Multiple Linear Regression also imposes linear relationships unless more interactions with polynomials are included. Because Machine Learning algorithms can handle enormous numbers of variables and combine them in nonlinear and highly interactive ways, researchers have found ways to better estimate heterogeneous treatment effects by combining the field of Machine Learning with the study of Causal Inference.

Generalized Random Forests

Over recent years, different Machine Learning algorithms have been developed to estimate heterogeneous treatment effects. Most of them are based on the idea of Decision Trees or Random Forests, just like the one I focus on in this blog post: Generalised Random Forests by Athey, Tibshirani, and Wager (2018).

Generalized Random Forests follow the idea of Random Forests and apart from heterogeneous treatment effect estimation, this algorithm can also be used for non-parametric quantile regression and instrumental variable regression. It keeps the main structure of Random Forests such as the recursive partitioning, subsampling, and random split selection. However, instead of averaging over the trees Generalised Random Forests estimate a weighting function and use the resulting weights to solve a local GMM model. To estimate heterogeneous treatment effects, this algorithm has two important additional features, which distinguish it from standard Random Forests.

1. Splitting Criterion

The first important difference to Random Forests is the splitting criterion. In Random Forests, where we want to predict an outcome variable

Y

, the split at each tree node is performed by minimizing the mean squared error of the outcome variable

Y

. In other words, the variable and value to split at each tree node are chosen such that the greatest reduction in the mean squared error with regard to the outcomes

Y

is achieved. After each tree partition has been completed, the tree’s prediction for a new observation

x

is obtained by letting it trickle through all the way from tree’s root into a terminal node, and then taking the average of outcomes

Y

of all the observations

x

that fell into the same node during training. The Random Forest prediction is then calculated as the average of the predicted tree values.

In Causal Forests, we want to estimate treatment effects. As stated by the Fundamental Problem of Causal Inference, however, we can never observe a treatment effect on an individual level. Therefore, the prediction of a treatment effect is given by the difference in the average outcomes

Y

between the treated and the untreated observations in a terminal node. Without going into too much detail, to find the most heterogeneous but also accurate treatment effects, the splitting criterion is adapted such that it searches for a partitioning where the treatment effects differ the most including a correction that accounts for how the splits affect the variance of the parameter estimates.

tree branches

2. Honesty

Random Forests are usually evaluated by applying them to a test set and measuring the accuracy of the predictions of Y using an error measure such as the mean squared error. Because we can never observe treatment effects, this form of performance measure is not possible in Causal Forests. When estimating causal effects, one, therefore, evaluates their accuracy by examining the bias, standard error and the related confidence interval of the estimates. To ensure that an estimate is as accurate as possible, the bias should asymptotically disappear and the standard error and, thus, the confidence interval, should be as small as possible. To enable this statistical inference in their Generalised Random Forest, Athey, Tibshirani and Wager introduce so-called honest trees.

In order to make a tree honest, the training data is split into two subsamples: a splitting subsample and an estimating subsample. The splitting subsample is used to perform the splits and thus grow the tree. The estimating subsample is then used to make the predictions. That is, all observations in the estimating subsample are dropped down the previously-grown tree until it falls into a terminal node. The prediction of the treatment effects is then given by the difference in the average outcomes between the treated and the untreated observations of the estimating subsample in the terminal nodes. With such honest trees, the estimates of a Causal Forest are consistent (i.e. the bias vanishes asymptotically) and asymptotically Gaussian which together with the estimator for the asymptotic variance allow valid confidence intervals.

Causal Forest in Action

To show the advantages of Causal Forests compared to old estimation methods, in the following I will compare the Generalised Random Forest to a Regression with interaction terms in a small simulation study. I use simulated data to be able to compare the estimated treatment effects with the actual treatment effects, which, as we know, would not be observable in real data. To compare the two algorithms with respect to the estimation of heterogeneous treatment effects, I test them on two different data sets, one with and one wihtout heterogeneity in the treatment effect:

Data SetHeterogeneityHeterogeneity VariablesVariablesObservations1No Heterogeneity

x_1

x_{10}

200002Heterogeneity

x_1

and

x_2
x_1

x_{10}

20000

This means that in the first data set, all observations have the same treatment effect. In this case, the average treatment effect and the heterogeneous treatment effects are the same. In the second data set, the treatment effect varies with the variables

x_1

and

x_2

. Without going into too much detail here (I will probably write a separate blog post only about causal data generating processes), the relationship between those heterogeneity variables (

x_1

and

x_2

) and the treatment effect is not linear. Both simulated data sets have 20’000 observations containing an outcome variable

Y

and 10 covariates with values between zero and one.

To evaluate the two algorithms, the data sets are split in a train (75%) and a test set (25%). For the Causal Forest, I use the causal_forest() from the grf-package with tune.parameters = "all". I compare this to an lm() model, which includes all variables, the treatment indicator and the necessary interaction terms of the heterogeneity variables and the treatment indicator:

Linear Regression Model for data set with heterogeneity:

 

\[Y = beta_0 + beta_1 x_1 + beta_2 x_2 + dots + beta_{10} x_{10} + beta_{11} w +\]

 

\[beta_{12} (w * x_1) + beta_{13} (w * x_2) + beta_{14} (x_1 * x_2) + beta_{15} (w * x_1 * x_2)\]

Linear Regression Model for data set with no heterogeneity:

 

\[Y = beta_0 + beta_1 x_1 + beta_2 x_2 + … + beta_{10} x_{10} + beta_{11} w\]

where

x_1

x_{10}

are the heterogeneity variables and

w

is the treatment indicator (i.e.

w = 1

if treated and

w = 0

if not treated). As already explained above, we usually do not know which variables affect the treatment effect and have therefore to include all possible interaction terms into the Linear Regression Model to see which variables lead to treatment effect heterogeneity. In the case of 10 variables, as we have it here, this means we would have to include a total of 2048 parameters in our Linear Regression Model. However, since the heterogeneity variables are known in the simulated data, here, I only include the interaction terms for those variables.

Data SetMetricgrflmNo HeterogeneityRMSE0.010.00HeterogeneityRMSE0.080.45

Looking at the results, we can see that without heterogeneity, the treatment effect is equally well predicted by the Causal Forest (RMSE of 0.01) and the Linear Regression (RMSE of 0.00). However, as the heterogeneity level increases, the Causal Forest is far more accurate (RMSE of 0.08) than the Linear Regression (RMSE of 0.45). As expected, the Causal Forest seems to be better at detecting the underlying non-linear relationship between the heterogeneity variables and the treatment effect than the Linear Regression Model, which can also be seen in the plots below. Thus, even if we already know which variables influence the treatment effect and only need to include the necessary interaction terms, the Linear Regression Model is still less accurate than the Causal Forest due to its lack of modeling flexibility.

treatment effect hexplot

Outlook

I hope that this blog post has helped you to understand what Causal Forests are and what advantages they bring in estimating heterogeneous treatment effects compared to old estimation methods. In my upcoming blog posts on Causal Machine Learning, I will explore this new field of data science further. I will, for example, take a look at the problems of using classical Machine Learning algorithms to estimate causal effects in more detail or introduce different data generating processes to evaluate Causal Machine Learning methods in simulation studies.

References

  • Athey, S., Tibshirani, J., & Wager, S. (2019). Generalised random forests. The Annals of Statistics, 47(2), 1148-1178.
  • Knaus, M. C., Lechner, M., & Strittmatter, A. (2018). Machine learning estimation of heterogeneous causal effects: Empirical monte carlo evidence. arXiv:1810.13237v2.

Linkedin Logo
Marcel Plaschke
Head of Strategy, Sales & Marketing
schedule a consultation
Zugehörige Leistungen
No items found.

More Blog Posts

  • Artificial Intelligence
AI Trends Report 2025: All 16 Trends at a Glance
Tarik Ashry
05. February 2025
Read more
  • Artificial Intelligence
  • Data Science
  • Human-centered AI
Explainable AI in practice: Finding the right method to open the Black Box
Jonas Wacker
15. November 2024
Read more
  • Artificial Intelligence
  • Data Science
  • GenAI
How a CustomGPT Enhances Efficiency and Creativity at hagebau
Tarik Ashry
06. November 2024
Read more
  • Artificial Intelligence
  • Data Culture
  • Data Science
  • Deep Learning
  • GenAI
  • Machine Learning
AI Trends Report 2024: statworx COO Fabian Müller Takes Stock
Tarik Ashry
05. September 2024
Read more
  • Artificial Intelligence
  • Human-centered AI
  • Strategy
The AI Act is here – These are the risk classes you should know
Fabian Müller
05. August 2024
Read more
  • Artificial Intelligence
  • GenAI
  • statworx
Back to the Future: The Story of Generative AI (Episode 4)
Tarik Ashry
31. July 2024
Read more
  • Artificial Intelligence
  • GenAI
  • statworx
Back to the Future: The Story of Generative AI (Episode 3)
Tarik Ashry
24. July 2024
Read more
  • Artificial Intelligence
  • GenAI
  • statworx
Back to the Future: The Story of Generative AI (Episode 2)
Tarik Ashry
04. July 2024
Read more
  • Artificial Intelligence
  • GenAI
  • statworx
Back to the Future: The Story of Generative AI (Episode 1)
Tarik Ashry
10. July 2024
Read more
  • Artificial Intelligence
  • GenAI
  • statworx
Generative AI as a Thinking Machine? A Media Theory Perspective
Tarik Ashry
13. June 2024
Read more
  • Artificial Intelligence
  • GenAI
  • statworx
Custom AI Chatbots: Combining Strong Performance and Rapid Integration
Tarik Ashry
10. April 2024
Read more
  • Artificial Intelligence
  • Data Culture
  • Human-centered AI
How managers can strengthen the data culture in the company
Tarik Ashry
21. February 2024
Read more
  • Artificial Intelligence
  • Data Culture
  • Human-centered AI
AI in the Workplace: How We Turn Skepticism into Confidence
Tarik Ashry
08. February 2024
Read more
  • Artificial Intelligence
  • Data Science
  • GenAI
The Future of Customer Service: Generative AI as a Success Factor
Tarik Ashry
25. October 2023
Read more
  • Artificial Intelligence
  • Data Science
How we developed a chatbot with real knowledge for Microsoft
Isabel Hermes
27. September 2023
Read more
  • Data Science
  • Data Visualization
  • Frontend Solution
Why Frontend Development is Useful in Data Science Applications
Jakob Gepp
30. August 2023
Read more
  • Artificial Intelligence
  • Human-centered AI
  • statworx
the byte - How We Built an AI-Powered Pop-Up Restaurant
Sebastian Heinz
14. June 2023
Read more
  • Artificial Intelligence
  • Recap
  • statworx
Big Data & AI World 2023 Recap
Team statworx
24. May 2023
Read more
  • Data Science
  • Human-centered AI
  • Statistics & Methods
Unlocking the Black Box – 3 Explainable AI Methods to Prepare for the AI Act
Team statworx
17. May 2023
Read more
  • Artificial Intelligence
  • Human-centered AI
  • Strategy
How the AI Act will change the AI industry: Everything you need to know about it now
Team statworx
11. May 2023
Read more
  • Artificial Intelligence
  • Human-centered AI
  • Machine Learning
Gender Representation in AI – Part 2: Automating the Generation of Gender-Neutral Versions of Face Images
Team statworx
03. May 2023
Read more
  • Artificial Intelligence
  • Data Science
  • Statistics & Methods
A first look into our Forecasting Recommender Tool
Team statworx
26. April 2023
Read more
  • Artificial Intelligence
  • Data Science
On Can, Do, and Want – Why Data Culture and Death Metal have a lot in common
David Schlepps
19. April 2023
Read more
  • Artificial Intelligence
  • Human-centered AI
  • Machine Learning
GPT-4 - A categorisation of the most important innovations
Mareike Flögel
17. March 2023
Read more
  • Artificial Intelligence
  • Data Science
  • Strategy
Decoding the secret of Data Culture: These factors truly influence the culture and success of businesses
Team statworx
16. March 2023
Read more
  • Artificial Intelligence
  • Deep Learning
  • Machine Learning
How to create AI-generated avatars using Stable Diffusion and Textual Inversion
Team statworx
08. March 2023
Read more
  • Artificial Intelligence
  • Human-centered AI
  • Strategy
Knowledge Management with NLP: How to easily process emails with AI
Team statworx
02. March 2023
Read more
  • Artificial Intelligence
  • Deep Learning
  • Machine Learning
3 specific use cases of how ChatGPT will revolutionize communication in companies
Ingo Marquart
16. February 2023
Read more
  • Recap
  • statworx
Ho ho ho – Christmas Kitchen Party
Julius Heinz
22. December 2022
Read more
  • Artificial Intelligence
  • Deep Learning
  • Machine Learning
Real-Time Computer Vision: Face Recognition with a Robot
Sarah Sester
30. November 2022
Read more
  • Data Engineering
  • Tutorial
Data Engineering – From Zero to Hero
Thomas Alcock
23. November 2022
Read more
  • Recap
  • statworx
statworx @ UXDX Conf 2022
Markus Berroth
18. November 2022
Read more
  • Artificial Intelligence
  • Machine Learning
  • Tutorial
Paradigm Shift in NLP: 5 Approaches to Write Better Prompts
Team statworx
26. October 2022
Read more
  • Recap
  • statworx
statworx @ vuejs.de Conf 2022
Jakob Gepp
14. October 2022
Read more
  • Data Engineering
  • Data Science
Application and Infrastructure Monitoring and Logging: metrics and (event) logs
Team statworx
29. September 2022
Read more
  • Coding
  • Data Science
  • Machine Learning
Zero-Shot Text Classification
Fabian Müller
29. September 2022
Read more
  • Cloud Technology
  • Data Engineering
  • Data Science
How to Get Your Data Science Project Ready for the Cloud
Alexander Broska
14. September 2022
Read more
  • Artificial Intelligence
  • Human-centered AI
  • Machine Learning
Gender Repre­sentation in AI – Part 1: Utilizing StyleGAN to Explore Gender Directions in Face Image Editing
Isabel Hermes
18. August 2022
Read more
  • Artificial Intelligence
  • Human-centered AI
statworx AI Principles: Why We Started Developing Our Own AI Guidelines
Team statworx
04. August 2022
Read more
  • Data Engineering
  • Data Science
  • Python
How to Scan Your Code and Dependencies in Python
Thomas Alcock
21. July 2022
Read more
  • Data Engineering
  • Data Science
  • Machine Learning
Data-Centric AI: From Model-First to Data-First AI Processes
Team statworx
13. July 2022
Read more
  • Artificial Intelligence
  • Deep Learning
  • Human-centered AI
  • Machine Learning
DALL-E 2: Why Discrimination in AI Development Cannot Be Ignored
Team statworx
28. June 2022
Read more
  • R
The helfRlein package – A collection of useful functions
Jakob Gepp
23. June 2022
Read more
  • Recap
  • statworx
Unfold 2022 in Bern – by Cleverclip
Team statworx
11. May 2022
Read more
  • Artificial Intelligence
  • Data Science
  • Human-centered AI
  • Machine Learning
Break the Bias in AI
Team statworx
08. March 2022
Read more
  • Artificial Intelligence
  • Cloud Technology
  • Data Science
  • Sustainable AI
How to Reduce the AI Carbon Footprint as a Data Scientist
Team statworx
02. February 2022
Read more
  • Recap
  • statworx
2022 and the rise of statworx next
Sebastian Heinz
06. January 2022
Read more
  • Recap
  • statworx
5 highlights from the Zurich Digital Festival 2021
Team statworx
25. November 2021
Read more
  • Data Science
  • Human-centered AI
  • Machine Learning
  • Strategy
Why Data Science and AI Initiatives Fail – A Reflection on Non-Technical Factors
Team statworx
22. September 2021
Read more
  • Artificial Intelligence
  • Data Science
  • Human-centered AI
  • Machine Learning
  • statworx
Column: Human and machine side by side
Sebastian Heinz
03. September 2021
Read more
  • Coding
  • Data Science
  • Python
How to Automatically Create Project Graphs With Call Graph
Team statworx
25. August 2021
Read more
  • Coding
  • Python
  • Tutorial
statworx Cheatsheets – Python Basics Cheatsheet for Data Science
Team statworx
13. August 2021
Read more
  • Data Science
  • statworx
  • Strategy
STATWORX meets DHBW – Data Science Real-World Use Cases
Team statworx
04. August 2021
Read more
  • Data Engineering
  • Data Science
  • Machine Learning
Deploy and Scale Machine Learning Models with Kubernetes
Team statworx
29. July 2021
Read more
  • Cloud Technology
  • Data Engineering
  • Machine Learning
3 Scenarios for Deploying Machine Learning Workflows Using MLflow
Team statworx
30. June 2021
Read more
  • Artificial Intelligence
  • Deep Learning
  • Machine Learning
Car Model Classification III: Explainability of Deep Learning Models With Grad-CAM
Team statworx
19. May 2021
Read more
  • Artificial Intelligence
  • Coding
  • Deep Learning
Car Model Classification II: Deploying TensorFlow Models in Docker Using TensorFlow Serving
No items found.
12. May 2021
Read more
  • Coding
  • Deep Learning
Car Model Classification I: Transfer Learning with ResNet
Team statworx
05. May 2021
Read more
  • Artificial Intelligence
  • Deep Learning
  • Machine Learning
Car Model Classification IV: Integrating Deep Learning Models With Dash
Dominique Lade
05. May 2021
Read more
  • AI Act
Potential Not Yet Fully Tapped – A Commentary on the EU’s Proposed AI Regulation
Team statworx
28. April 2021
Read more
  • Artificial Intelligence
  • Deep Learning
  • statworx
Creaition – revolutionizing the design process with machine learning
Team statworx
31. March 2021
Read more
  • Artificial Intelligence
  • Data Science
  • Machine Learning
5 Types of Machine Learning Algorithms With Use Cases
Team statworx
24. March 2021
Read more
  • Recaps
  • statworx
2020 – A Year in Review for Me and GPT-3
Sebastian Heinz
23. Dezember 2020
Read more
  • Artificial Intelligence
  • Deep Learning
  • Machine Learning
5 Practical Examples of NLP Use Cases
Team statworx
12. November 2020
Read more
  • Data Science
  • Deep Learning
The 5 Most Important Use Cases for Computer Vision
Team statworx
11. November 2020
Read more
  • Data Science
  • Deep Learning
New Trends in Natural Language Processing – How NLP Becomes Suitable for the Mass-Market
Dominique Lade
29. October 2020
Read more
  • Data Engineering
5 Technologies That Every Data Engineer Should Know
Team statworx
22. October 2020
Read more
  • Artificial Intelligence
  • Data Science
  • Machine Learning

Generative Adversarial Networks: How Data Can Be Generated With Neural Networks
Team statworx
10. October 2020
Read more
  • Coding
  • Data Science
  • Deep Learning
Fine-tuning Tesseract OCR for German Invoices
Team statworx
08. October 2020
Read more
  • Artificial Intelligence
  • Machine Learning
Whitepaper: A Maturity Model for Artificial Intelligence
Team statworx
06. October 2020
Read more
  • Data Engineering
  • Data Science
  • Machine Learning
How to Provide Machine Learning Models With the Help Of Docker Containers
Thomas Alcock
01. October 2020
Read more
  • Recap
  • statworx
STATWORX 2.0 – Opening of the New Headquarters in Frankfurt
Julius Heinz
24. September 2020
Read more
  • Machine Learning
  • Python
  • Tutorial
How to Build a Machine Learning API with Python and Flask
Team statworx
29. July 2020
Read more
  • Data Science
  • Statistics & Methods
Model Regularization – The Bayesian Way
Thomas Alcock
15. July 2020
Read more
  • Recap
  • statworx
Off To New Adventures: STATWORX Office Soft Opening
Team statworx
14. July 2020
Read more
  • Data Engineering
  • R
  • Tutorial
How To Dockerize ShinyApps
Team statworx
15. May 2020
Read more
  • Coding
  • Python
Making Of: A Free API For COVID-19 Data
Sebastian Heinz
01. April 2020
Read more
  • Frontend
  • Python
  • Tutorial
How To Build A Dashboard In Python – Plotly Dash Step-by-Step Tutorial
Alexander Blaufuss
26. March 2020
Read more
  • Coding
  • R
Why Is It Called That Way?! – Origin and Meaning of R Package Names
Team statworx
19. March 2020
Read more
  • Data Visualization
  • R
Community Detection with Louvain and Infomap
Team statworx
04. March 2020
Read more
  • Coding
  • Data Engineering
  • Data Science
Testing REST APIs With Newman
Team statworx
26. February 2020
Read more
  • Coding
  • Frontend
  • R
Dynamic UI Elements in Shiny – Part 2
Team statworx
19. Febuary 2020
Read more
  • Coding
  • Data Visualization
  • R
Animated Plots using ggplot and gganimate
Team statworx
14. Febuary 2020
Read more
  • Artificial Intelligence
  • Machine Learning
  • Statistics & Methods
Machine Learning Goes Causal I: Why Causality Matters
Team statworx
29.01.2020
Read more
  • Data Engineering
  • R
  • Tutorial
How To Create REST APIs With R Plumber
Stephan Emmer
23. January 2020
Read more
  • Recaps
  • statworx
statworx 2019 – A Year in Review
Sebastian Heinz
20. Dezember 2019
Read more
  • Artificial Intelligence
  • Deep Learning
Deep Learning Overview and Getting Started
Team statworx
04. December 2019
Read more
  • Coding
  • Machine Learning
  • R
Tuning Random Forest on Time Series Data
Team statworx
21. November 2019
Read more
  • Data Science
  • R
Combining Price Elasticities and Sales Forecastings for Sales Improvement
Team statworx
06. November 2019
Read more
  • Data Engineering
  • Python
Access your Spark Cluster from Everywhere with Apache Livy
Team statworx
30. October 2019
Read more
  • Recap
  • statworx
STATWORX on Tour: Wine, Castles & Hiking!
Team statworx
18. October 2019
Read more
  • Data Science
  • R
  • Statistics & Methods
Evaluating Model Performance by Building Cross-Validation from Scratch
Team statworx
02. October 2019
Read more
  • Data Science
  • Machine Learning
  • R
Time Series Forecasting With Random Forest
Team statworx
25. September 2019
Read more
  • Coding
  • Frontend
  • R
Dynamic UI Elements in Shiny – Part 1
Team statworx
11. September 2019
Read more
  • Machine Learning
  • R
  • Statistics & Methods
What the Mape Is FALSELY Blamed For, Its TRUE Weaknesses and BETTER Alternatives!
Team statworx
16. August 2019
Read more
  • Coding
  • Python
Web Scraping 101 in Python with Requests & BeautifulSoup
Team statworx
31. July 2019
Read more
  • Coding
  • Frontend
  • R
Getting Started With Flexdashboards in R
Thomas Alcock
19. July 2019
Read more
  • Recap
  • statworx
statworx summer barbecue 2019
Team statworx
21. June 2019
Read more
  • Data Visualization
  • R
Interactive Network Visualization with R
Team statworx
12. June 2019
Read more
  • Deep Learning
  • Python
  • Tutorial
Using Reinforcement Learning to play Super Mario Bros on NES using TensorFlow
Sebastian Heinz
29. May 2019
Read more
This is some text inside of a div block.
This is some text inside of a div block.