
Revisited: Forecasting Last Christmas Search Volume

  • Expert: Sebastian Heinz
  • Date: 28 June 2019
  • Topics: Coding, R
  • Format: Blog
  • Category: Technology

Introduction

It is June and nearly half of the year is over, marking the midpoint between Christmas 2018 and Christmas 2019. Last autumn, I published a blog post about predicting the search volume for Wham’s “Last Christmas” from Google Trends data with different types of neural network architectures. Of course, I now want to know how good those predictions were compared to the actual search volumes.
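For anyone who wants to rebuild the underlying series, the monthly interest scores can be pulled straight from Google Trends. Here is a minimal sketch using the gtrendsR package (the package choice and the rescaling to [0, 1] are my assumptions; the original post may have prepared the data differently):

library(gtrendsR)
library(dplyr)

# Monthly worldwide search interest for "Last Christmas" since 2004
trend <- gtrends(keyword = "Last Christmas", time = "all")

# Keep the interest-over-time series; "<1" entries become NA here
df_trend <- trend$interest_over_time %>%
  mutate(hits = suppressWarnings(as.numeric(hits))) %>%
  mutate(hits = hits / max(hits, na.rm = TRUE)) %>%  # rescale to [0, 1]
  select(date, hits)

head(df_trend)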

The following table shows the values predicted by the different network architectures, the actual search volumes in the relevant period from November 2018 to January 2019, and the relative prediction error in brackets:

Month     MLP            CNN            LSTM           Actual
2018-11   0.166 (0.21)   0.194 (0.078)  0.215 (0.023)  0.21
2018-12   0.858 (0.057)  0.882 (0.031)  0.817 (0.102)  0.91
2019-01   0.035 (0.153)  0.034 (0.149)  0.035 (0.153)  0.03
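
The bracketed numbers are relative errors, i.e. |actual − prediction| / actual. As a quick sanity check, the November column can be recomputed in a couple of lines of R (a minimal sketch; the function name is mine, and third-decimal deviations from the table stem from the predictions being rounded):

# Relative prediction error: |actual - prediction| / actual
rel_error <- function(actual, prediction) {
  abs(actual - prediction) / actual
}

# November 2018: the actual normalized search volume was 0.21
preds_nov <- c(MLP = 0.166, CNN = 0.194, LSTM = 0.215)
round(rel_error(0.21, preds_nov), 3)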

Which approach won?

There’s no clear winner in this game. For November, the LSTM performs best with a relative error of only 2.3%. In the “main” month of December, however, the LSTM falls behind both the 1-dimensional CNN, with 3.1% error, and the MLP, with 5.7% error. Compared to November and December, January exhibits markedly higher prediction errors of around 15%, regardless of the architecture.

To bring a little more data science flavor into this post, I’ve created a short R script that presents the results in a cool “heatmap” style.

library(dplyr)
library(ggplot2)

# Predictions and actuals for all three architectures
df_plot <- data.frame(MONTH = rep(c("2018-11", "2018-12", "2019-01"), 3),
                      MODEL = c(rep("MLP", 3), rep("CNN", 3), rep("LSTM", 3)),
                      PREDICTION = c(0.166, 0.858, 0.035,
                                     0.194, 0.882, 0.034,
                                     0.215, 0.817, 0.035),
                      ACTUAL = rep(c(0.21, 0.91, 0.03), 3))

# Compute the relative error per cell and plot it as a heat map
df_plot %>%
  mutate(MAPE = round(abs(ACTUAL - PREDICTION) / ACTUAL, 3)) %>%
  ggplot(aes(x = MONTH, y = MODEL)) +
  geom_tile(aes(fill = MAPE)) +
  scale_fill_gradientn(colors = c("navyblue", "darkmagenta", "darkorange1")) +
  geom_text(aes(label = MAPE), color = "white") +
  theme_minimal()
[Figure: heat map of relative prediction errors per model and month]

This year, I will (of course) redo the experiment using the newly acquired data. I am curious to find out whether the predictions improve. In the meantime, you can sign up for our mailing list, which brings the best data science, machine learning, and AI reads and treats directly to your mailbox!

Sebastian Heinz

Learn more!

As one of the leading companies in the field of data science, machine learning, and AI, we guide you towards a data-driven future. Learn more about statworx and our motivation.
About us