Binaries and Colors

Learning Images with Keras

Lukas Strömsdörfer Blog, Data Science

Introduction: Teaching machines to handle image data is probably one of the most exciting tasks in our daily routine at STATWORX. Computer vision in general opens up many possibilities that some would consider intriguing. Besides learning images, computer vision algorithms also enable machines to learn any kind of video-sequenced data. With autonomous driving on the line, learning images …

CodeR: an LSTM that writes R Code

Tobias Krabel Blog, Data Science

Everybody talks about them, many people know how to use them, few people understand them: Long Short-Term Memory Neural Networks (LSTM). At STATWORX, with the beginning of the hype around AI and projects with large amounts of data, we also started using this powerful tool to solve business problems. In short, an LSTM is a special type of recurrent neural …

Data Science in Python: Getting Started with Machine Learning with Scikit-Learn

Moritz Gnisia Blog, Data Science

In our previous articles on Data Science in Python, we covered basic syntax, data structures, arrays, data visualization, and manipulation/selection. What is still missing for getting started is the ability to apply models to the data, in order to recognize patterns in them on the one hand and to derive predictions on the other. The variety of models implemented in Python …

Using themes in ggplot2

Lea Waniek Blog

As noted elsewhere, sometimes beauty matters. A plot that’s pleasing to the eye will be looked at more gladly, and thus might be understood more thoroughly. Also, since we at STATWORX often need to summarize and communicate our results, we have come to appreciate how a nice plot can upgrade any presentation. So how to make a plot look good? How to make …

Benchmarking Feature Selection Algorithms with Xy()

André Bleier Blog, Data Science

Feature Selection: Feature selection is, in my opinion, one of the most interesting fields in machine learning. It sits at the boundary between two different perspectives on machine learning: performance and inference. From a performance point of view, feature selection is typically used to increase the model performance or to reduce the complexity of the problem in order to …

generalized random forest

Using Machine Learning for Causal Inference

Markus Berroth Blog, Data Science

Machine Learning (ML) is still an underdog in the field of economics. However, it has been gaining more and more recognition in recent years. One reason for its underdog status is that in economics and other social sciences, one is interested not only in prediction but also in causal inference. Thus many "off-the-shelf" ML algorithms are solving a fundamentally different …

A framework to automate your work: How to set up Airflow!

Marvin Taschenberger Blog, Data Science

In the first part of this blog post, we talked about what a DAG is, how to apply this mathematical concept in project planning and programming, and why we at STATWORX decided to use Airflow rather than other workflow managers. In this part, however, we will get more technical and investigate a quite informative hello-world program and how to set …

Pushing Ordinary Least Squares to the limit with Xy()

André Bleier Blog, Data Science

Introduction to Xy(): Simulation is mostly about answering particular research questions. Whenever the word simulation appears somewhere in a discussion, everyone knows that this means additional effort. At STATWORX we use simulations as a first step to prove concepts we are developing. Sometimes such a simulation is simple; in other cases it is plenty of work. Though, research …