Coordinate systems in ggplot2: easily overlooked and rather underrated

Lea Waniek Blog, Data Science, Statistics

All plots have coordinate systems. Perhaps because they are such an integral element of plots, they are easily overlooked. However, in ggplot2, there are several very useful options to customize the coordinate systems of plots, which we will not overlook but explore in this blog post. Since it is spring, we will use a random subset of the famous iris …

About Risks and Side-Effects… Consult your Purrr-Macist

David Schlepps Blog, Data Science, Statistics

Capture errors, warnings and messages, but keep your list operations going. In a recent post about text mining, I discussed some solutions to webscraping the contents of our STATWORX blog using the purrr package. However, while preparing the next episode of my series on text mining, I remembered a little gimmick that I found quite helpful along the way. Thus, …

burglr – stealing code from the web

André Bleier Blog, Data Science

Introduction: All we do at STATWORX all day long is steal code from the web. That is why I thought it would only be fair to code a function which does that conveniently. With burglr you have all functions and kickass machine learning models at your fingertips. This would have been a more exciting description of the function I will …

pandas vs. data.table – A study of data-frames – Part 2

Tobias Krabel Blog, Data Science

The story continues: As Christian and I have already mentioned in part 1 of this simulation study series, pandas and data.table have become the most widely used packages for data manipulation in Python and R, respectively (in R, of course, one must not forget to mention the dplyr package). Furthermore, at STATWORX we have experts in both domains, and besides having …

Diamonds and Faceting are a Data Scientist’s best Friends

Lea Waniek Blog, Data Science, Statistics

In the last post of this series, we took a first look at strategies for the effective visualization and exploration of data patterns within large data sets. Namely, we examined ways to overcome overplotting, with a focus on a two-dimensional feature space defined by two continuous features. However, oftentimes we want to visualize the distribution of data across several subgroups. …