Flowcharts of functions

Jakob Gepp Blog, Data Science

When you work on bigger R projects, there comes a point when you may lose track of how your functions are connected. Or even worse: you are handed a large project and have to figure out what is actually happening! Flowcharts are a possible remedy to this problem. If you started your project with a flowchart: good for you – …


A data geek, an AI guy, and a fintech dude go into a bar…

Lukas Strömsdörfer Blog, Data Science, Statistik

… some water under the bridge later, we are having a Co-Meetup in Frankfurt – kudos to the organizers. Those guys are just awesome. For the past few years, they have been working hard to build a Data Science community in Frankfurt – you should check out their Twitter feed. Whenever there is a Meetup – which you should totally check …

pandas vs. data.table – A study of data-frames

Christian Moreau Blog, Data Science

Overview and Setting: Python and R have become the most important languages in analytics and data science. Usually, a data scientist can navigate at least one language with relative ease, and at STATWORX we are lucky to have expertise in both. While any coding project can be completed in either language with enough will and effort, the two may differ in some performance …

Rats! Where are my R-Packages?

David Schlepps Blog, Data Science

It has happened to many of us. Somehow, we managed to get our hands on a colleague's neatly prepared script. However, instead of simply getting away with scrounging beautiful code off our fellow human beings, we realize that something is missing to seamlessly steal their awesomeness: their packages. That is something we have to get our heads around when …

Simulating Regression Data with Xy

André Bleier Blog, Data Science, Statistik

In a recent project, I developed a gradient boosting algorithm to estimate price elasticities. Naturally, it is necessary to validate whether the algorithm works as intended. I started by using nonlinear time series data from another blog post about lag selection as a basis for validation. Unfortunately, at that time I did not wrap the simulation code …



Jakob Gepp Blog, Data Science, Statistik

Life is an ongoing process of learning new things. But how can you stay up to date on a constantly moving and evolving topic? One way is to read blogs and follow forums like Stack Overflow, where you can learn from the problems and solutions of the community. Another way is to meet people on a regular basis …


How NOT to overplot

Lea Waniek Blog, Data Science, Statistik

Overplotting can be a serious problem that complicates data visualization and, in turn, data exploration. Overplotting describes situations in which multiple data points overlap within a plot, making the individual observations indistinguishable. In such cases, plots only indicate the general extent of the data, while existing relationships might be heavily obscured. Overplotting occurs especially when dealing with …

XGBoost Tree vs. Linear

Fabian Müller Blog, Data Science

Introduction: One of the highlights of this year's H2O World was a Kaggle Grandmaster Panel. The panelists, Gilberto Titericz (Airbnb), Mathias Müller (H2O.ai), Dmitry Larko (H2O.ai), Marios Michailidis (H2O.ai), and Mark Landry (H2O.ai), answered various questions about Kaggle and data science in general. One of the questions from the audience was which tools and algorithms the Grandmasters frequently use. As …

Compiling R Code in Sublime Text

Lukas Strömsdörfer Blog, Data Science, Statistik

What is Sublime Text? Nearly every coder has, at one point, googled for the best code editor. If you did, you already know Sublime Text. If you didn't: best code editors. Since its initial release in 2007, Sublime Text has certainly made its way into the ranks of the most popular editors. Here at STATWORX, most of …