Insights From Our Latest Internal Hackathon

Marlon Schumacher Blog, Data Science

A few weeks ago, several of my colleagues here at STATWORX and I participated in an exciting time series competition hosted by a large Germany-based company. We took this as an opportunity to organize an internal hackathon in our office in Frankfurt. The purpose of such a hackathon is to pool our knowledge and produce an excellent solution for a competition in a short time.

In this blog article, I will give you an insight into how we plan and run such hackathons at STATWORX. First, I’ll describe the initial situation, the specific problem we had to solve, and who participated. Then I’ll turn to the organizational process: which tasks came up and how we distributed them. The methodological approach is also covered in that part. Finally, I’ll briefly discuss the results and outline possible next steps.

The team gathered at our headquarters in Frankfurt for the hackathon.

Our preparation and first steps

Quite a few of my teammates offered to work together on this competition. Our team consisted of Sebastian (Founder & CEO of STATWORX), Fabian (Head of Data Science), and my Data Science colleagues Jakob, Fran, Alexander, and Andreas. We gathered in the office on a weekend so that we could really concentrate on the hackathon. Everyone had been roughly briefed on what the competition was about: forecasting the daily sales quantity for the year 2019. For this, we received data on 10 different products (A-J). In addition, we received information about promotions up to 2019. The data looked like this:

Even at first glance, you can see that the time series are very volatile and that a clear trend cannot be identified in every series. There are also some outliers, with products I, J, and E standing out in particular.

FYI: The data has been edited a little for this article and does not correspond exactly to the original data that we received.

How did we distribute the tasks?

Discussion of data and emerging tasks

First of all, we sat down and discussed the provided data. For this, Fabian had prepared the first descriptive analyses. We talked about possible seasonalities, trends, effects of the given promotion information, potential problems, and external features. From this discussion, we identified and distributed the corresponding tasks. Three main topics emerged: the data preparation had to be completed (e.g., outliers and external features), simple forecasts had to be built as a benchmark for our models, and, of course, our final forecasts had to be modeled.

Data preparation and outliers

Since the exploratory analysis and the discussion had pointed to potential features, Sebastian and Jakob prepared external features. For this purpose, temperature data were downloaded via an API and prepared for our dataset. Event data such as holidays were prepared in the same way, which also involved some feature engineering. I examined the time series for outliers and tried different ways of imputing them. The R package anomalize was also used here, which is very helpful for quick preliminary outlier detection. The imputed values then followed the seasonality and trend of the series. Product I in particular had many extreme outliers, but other products (e.g., J and E) also showed some strong outliers.
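
As a rough illustration of the outlier step, here is a small sketch in Python rather than R, and without anomalize. It flags points that deviate strongly from a centered rolling median, using the median absolute deviation as a robust scale, and replaces them with that local median. The function name, the window size, and the threshold are assumptions chosen for illustration, not the settings used in the hackathon.

```python
from statistics import median


def impute_outliers(values, window=7, threshold=5.0):
    """Replace points far from the local median with that median.

    Hypothetical helper: window and threshold are illustrative choices.
    """
    half = window // 2
    n = len(values)
    # Centered rolling median as a simple, robust estimate of the local level
    local = [median(values[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]
    # Median absolute deviation (MAD) of the residuals as a robust scale
    residuals = [v - m for v, m in zip(values, local)]
    mad = median(abs(r) for r in residuals)
    scale = mad if mad > 0 else 1.0  # avoid a zero scale on very smooth series
    cleaned, flagged = [], []
    for i, (v, m, r) in enumerate(zip(values, local, residuals)):
        if abs(r) / scale > threshold:
            cleaned.append(m)   # impute with the local median
            flagged.append(i)   # remember which positions were changed
        else:
            cleaned.append(v)
    return cleaned, flagged


# Toy daily sales with one obvious spike
sales = [100, 104, 98, 101, 1200, 99, 103, 97, 102, 100]
cleaned, flagged = impute_outliers(sales)
print(flagged, cleaned)
```

A rolling median only captures the local level. For series with strong weekly patterns, a decomposition into trend and seasonality before flagging outliers would be the more faithful equivalent of what a package like anomalize does.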

Modeling and evaluation

Since the task was a daily forecast for the full year 2019, we used the year 2018 for evaluation. However, to be able to judge how well our models perform, we needed corresponding benchmarks, as mentioned before. Fran and Andreas created these benchmark forecasting models. Among other things, a simple auto ARIMA and naive forecasts were used, and the results differed considerably across the products. As an evaluation metric, we used the MAPE (mean absolute percentage error), which has some specific weaknesses. If you are interested in the MAPE, its weaknesses, and possible alternatives, you can read the blog post on exactly this topic written by my colleague Jan.
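
To make the benchmark idea concrete, here is a minimal Python sketch of a naive forecast and the MAPE. The article does not say which naive variant was used, so the seasonal naive forecast below (repeating the last observed week) is an assumption, as are the function names and the toy numbers.

```python
def mape(actual, forecast):
    """Mean absolute percentage error in percent; undefined for zero actuals."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)


def seasonal_naive(history, horizon, season=7):
    """Repeat the last observed season (here: one week) over the horizon."""
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


train = [100, 110, 120, 130, 140, 90, 80] * 4   # four identical toy weeks
test = [110, 110, 132, 130, 140, 100, 80]       # the following week
forecast = seasonal_naive(train, horizon=len(test))
print(round(mape(test, forecast), 2))
```

Any more elaborate model should beat a benchmark like this on the held-out year before it is worth its extra complexity.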

Product | MAPE (auto ARIMA) | MAPE (naive forecast)

Some products have a large MAPE, exceeding 370% in the extreme case. For product E in particular, this can largely be explained by the time series itself, which showed a sharp decline lasting more than a month in 2018. Sales averaged around 560 from the beginning of August to the middle of September, while the average for 2018 was around 12000. Other products also suffered such slumps. This naturally leads to high errors. Unfortunately, we did not know whether this was a data error, a production problem, or something similar. Moreover, product I had huge outliers, which on some days were more than 10 times higher than the average.
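
A rough back-of-the-envelope calculation shows why such a drop inflates the MAPE. Assume, purely for illustration, a forecast that stays at the 2018 average of about 12000 while actual sales are about 560 for roughly 45 days, and assume the forecast is perfect on all other days:

```latex
\[
\text{APE}_{\text{drop}} = \frac{|560 - 12000|}{560} \approx 20.4 = 2040\,\%,
\qquad
\text{MAPE}_{2018} \approx \frac{45 \cdot 20.4 + 320 \cdot 0}{365} \approx 2.5 = 250\,\%
\]
```

Because the MAPE divides by the actual value, a few weeks of very low sales can dominate the yearly score even if the rest of the year is forecast well.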

What was the outcome?

For our forecasting models, we used different approaches. We mainly used XGBoost, LightGBM, and Deep Learning, the latter in the form of a 3-layer MLP with dropout and batch normalization. If you want to get started with Deep Learning on your own, you should read this blog post from Sebastian (“Forecasting ‘Last Christmas’ Search Volume on Google Trends using Deep Learning”). Of course, there are other models that we could have tried; even a Random Forest can be used for time series forecasts. But we had relatively little information about the products and a limited amount of time. Additionally, every model ultimately had to produce a 365-day-ahead forecast, and such a long horizon generally leads to worse results. With that in mind, let’s take a look at the results.
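
Because every model had to forecast a full year ahead, only inputs that are known in advance for the whole horizon can serve as features. The Python sketch below builds a few such calendar features per day. The feature names, the holiday set, and the promotion day are hypothetical examples, not necessarily the features used in the hackathon.

```python
from datetime import date, timedelta


def calendar_features(start, end, holidays=frozenset(), promo_days=frozenset()):
    """Build one feature dict per day from information known in advance."""
    rows = []
    day = start
    while day <= end:
        rows.append({
            "date": day,
            "weekday": day.weekday(),                # 0 = Monday, 6 = Sunday
            "month": day.month,
            "day_of_year": day.timetuple().tm_yday,
            "is_weekend": int(day.weekday() >= 5),
            "is_holiday": int(day in holidays),
            "is_promo": int(day in promo_days),
        })
        day += timedelta(days=1)
    return rows


# Illustrative inputs only: one public holiday and one promotion day
features_2019 = calendar_features(
    date(2019, 1, 1), date(2019, 12, 31),
    holidays={date(2019, 1, 1)},
    promo_days={date(2019, 3, 15)},
)
print(len(features_2019), features_2019[0])
```

A table like this, one row per forecast day, is the kind of input a gradient boosting model or an MLP can be trained on when no recent sales values are available at prediction time.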

Product | LightGBM (MAPE) | LightGBM vs. benchmark

Although the MAPE is still high for some products, there is a significant improvement compared to the benchmarks. The MAPE for product E was reduced from over 370% to less than 50%. For products I and J, there was also a significant improvement of 20 to 40 percentage points. For the other products, however, the improvement amounted to only a few percentage points.

That’s it? Of course not!

As you have probably guessed, there is still a lot of room for improvement. Knowledge of the exact data-generating process, which is so far unknown to us, would give hints about aspects that could be considered in the modeling. Likewise, the forecasting period itself offers considerable potential for improvement. A 365-day-ahead forecast cannot be expected to provide high accuracy, whereas rolling quarterly or monthly forecasts can lead to a significant improvement. Similarly, lags could be used as features for shorter horizons, which, unfortunately, was not possible with our models. The list of further possible improvements, such as information about competitors, regional data, and so forth, is exceptionally long.
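
The idea of rolling forecasts with lags can be sketched as a rolling-origin evaluation. In the Python sketch below, the forecast origin moves forward step by step and each forecast covers only a short horizon, so recent observations are available as inputs. The function name, the 7-day horizon, and the lag-7 placeholder model are assumptions for illustration.

```python
def rolling_origin_splits(n_obs, initial_train, horizon=7, step=7):
    """Yield (train_indices, test_indices) with an expanding training window."""
    origin = initial_train
    while origin + horizon <= n_obs:
        yield range(0, origin), range(origin, origin + horizon)
        origin += step


# Eight toy weeks with a perfectly regular weekly pattern
series = [100 + (i % 7) * 10 for i in range(56)]

errors = []
for train_idx, test_idx in rolling_origin_splits(len(series), initial_train=28):
    train = [series[i] for i in train_idx]
    # Placeholder model: reuse the values observed 7 days earlier (lag-7)
    forecast = train[-7:]
    actual = [series[i] for i in test_idx]
    # Mean absolute percentage error of this fold, as a fraction
    errors.append(sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual))

print(len(errors), errors)
```

In a real setup, the placeholder would be replaced by a model that is refitted or updated at each origin, and the per-fold errors would be averaged to compare horizons.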

It should be noted that the company didn’t actually want a 365-day forecast, but a rolling 7-day forecast. Unfortunately, this was not clear from the task as it was communicated. Our results would, of course, have been much better in that setting, but they were still very well received. Possibly, our efforts will lead to a project in 2020, which we are really looking forward to.

It was a fun experience, and I hope that I could provide you with an entertaining insight into our hackathon here at STATWORX.

If you’re interested in staying updated with everything that happens at our company, join our mailing list and we’ll keep you posted! If you want to join our team and possibly take part in events like an internal hackathon, check our job offerings on our website. We’re looking forward to your application!

About the author

Marlon Schumacher

I am a data scientist at STATWORX and like working with data. Whether it's visualization or the development of machine learning models, it's always interesting.

