Data Science in Python – Preview and Tools

Marvin Taschenberger Blog, Data Science

Part 0 – Preview and Tools When it comes to data preparation, data formatting, and statistical analysis, or data science for short, R was (and here in Germany still is) the language of choice. Globally, Python has gained considerable popularity in this area and is by now even the predominant language in the field (see the KDnuggets study). This series is therefore meant to give a first insight into “Why …

pandas vs. data.table – A study of data-frames

Christian Moreau Blog, Data Science

Overview and Setting Python and R have become the most important languages in analytics and data science. Usually a data scientist can navigate at least one of them with relative ease, and at STATWORX we are lucky to have expertise in both. While, with enough will and effort, any coding project can be completed in either language, perhaps they differ in some performance …

Rats! Where are my R-Packages?

David Schlepps Blog, Data Science

It has happened to many of us. Somehow, we managed to get our hands on a colleague's neatly prepared script. However, instead of getting away with just scrounging beautiful code off our fellow human beings, we realize that something is missing to seamlessly steal their awesomeness: their packages. A thing that we have to get our heads around when …

Simulating Regression Data with Xy

André Bleier Blog, Data Science, Statistics

In a recent project, I developed a gradient boosting algorithm to estimate price elasticities. Naturally, it is necessary to validate whether the algorithm works as intended. I started using nonlinear time series data from another blog post about lag selection as a validation basis. Unfortunately, at that time I did not wrap the simulation code …



Jakob Gepp Blog, Data Science, Statistics

Life is an ongoing process of learning new things. But how can you stay up to date in a constantly moving and evolving field? One way to do this is to read blogs and follow forums like Stack Overflow, where you can learn from the problems and solutions of the community. Another way is to meet people on a regular basis …


How NOT to overplot

Lea Waniek Blog, Data Science, Statistics

Overplotting can be a serious problem that complicates data visualization and thus also data exploration. Overplotting describes situations in which multiple data points overlay each other within a plot, making the individual observations indistinguishable. In such cases, plots only indicate the general extent of the data, while existing relationships might be heavily obscured. Overplotting especially occurs when dealing with …

XGBoost Tree vs. Linear

Fabian Müller Blog, Data Science

Introduction One of the highlights of this year's H2O World was a Kaggle Grandmaster Panel. The attendees, Gilberto Titericz (Airbnb), Mathias Müller, Dmitry Larko, Marios Michailidis, and Mark Landry, answered various questions about Kaggle and data science in general. One of the questions from the audience was which tools and algorithms the Grandmasters frequently use. As …


Introduction to Reinforcement Learning – When Machines Learn Like Humans

Sebastian Heinz Blog, Data Science

Most machine learning algorithms used in practice today belong to the class of supervised learning. In supervised learning, the machine learning model is presented ex post with an already known target variable, which a function is meant to predict as accurately as possible on the basis of various influencing factors in the data. The function abstractly represents the respective machine learning …

Compiling R Code in Sublime Text

Lukas Strömsdörfer Blog, Data Science, Statistics

What is Sublime Text? Nearly every coder has at one point googled for the best code editor. To those who did, you already know Sublime Text. To those who didn't: best code editors. Since its initial release in 2007, Sublime Text has certainly made its way into the ranks of the most popular editors. Here at STATWORX, most of …

In Good Shape! Reshapes in R, Stata, and SPSS

Jessica Aust Blog, Statistics

In this blog post from the “In Good Shape” series, we show how data reshapes can be implemented in R, Stata, and SPSS. These reshapes serve to transform the data at hand into the most suitable layout when several pieces of information about an event are available per unit. What may sound a little complicated will be illustrated with an example: A data set is to be …