Last Christmas is one of the most popular Christmas tunes that were, are and will be out there. The song is written by the brilliant musician George Michael and was released in 1984, when at that time, Epic Records quickly wanted to release a Christmas tune. According to Wikipedia, there are rumours going around that George Michael just changed the lyrics of an already composed tune named "last easter" in a more "winterly" manner. However, this has officially never been confirmed by the record company. Nonetheless, "Last Christmas" remains at the top of all christmas pop songs – also at STATWORX, where "Last Christmas" is on heavy rotation during the holiday season!

In a recent meme on the web I saw, that the Google Trends search volume for "last cristmas" beginning to kick in (first week of october), indicating that Christmas (and the voice of George Michael, backed by christmas bells) is knocking at the door. In order to get ready for the "most wonderful time of the year", I decided to build a small neural network in Keras that is able to perform a multistep forecast of the expected "last christmas" search volume this year (that maybe correlates with the number of plays on TV, radio etc.).

The screenshot above shows the normalized search volume (range between 0 and 100) for last christmas for Germany from 2004 to October 2018. In winter 2017 there was the alltime high in search traffic for "Last Christmas", maybe due to the tragic death of Michael on December 25th in 2016.

## Data Preparation

I've downloaded the search volume data from Google Trends as a CSV file and manually formatted the file header (it included a description of the data as well as some blank lines) as well as string values of "<1", which I've replace with numeric zeros. After preparing the file, I imported it into Python using `pandas.read_csv()`

.

Since neural networks work best with scaled data, i.e. data that ranges between a specific lower and upper bound, e.g. 0 and 1 or -1 and 1, I divided the normalized search volume by 100, .

There are many neural network architectures that can be used in order to perform time series forecasting. Since the data is modeled in a simple input-output-style, of course, MLPs can be used. Furthermore, 1-dimensional convolutional networks can be employed. Last but not least, LSTMs are also applicable.

Multistep forecasting can be done in several ways: (1) building a separate forecasting model for each forecast timestep, (2) building a recursive model, that predicts the next value based on previous predictions and (3) building a model that is able to predict multiple values into the future at the same time. Since neural networks can easily handle multiple outputs, I decided to go with neural networks.

Preparing data for multistep forecasting can be a bit cumbersome, especially, when the input data consists on multiple timesteps and variables. In order to prepare my `X`

and `y`

data, I used the following snippet:

```
def prepare_data(target, window_X, window_y):
""" Data preprocessing for multistep forecast """
# Placeholders
X, y = [], []
n = len(target)
# Iterators
start_X = 0
end_X = start_X + window_X
start_y = end_X
end_y = start_y + window_y
# Build tensors
for _ in range(n):
if end_y < n:
X.append(target[start_X:end_X])
y.append(target[start_y:end_y])
start_X += 1
end_X = start_X + window_X
start_y += 1
end_y = start_y + window_y
# Convert to array
X = np.array(X)
y = np.array(y)
# Return
return np.array(X), np.array(y)
```

The function transforms a single vector of values, `target`

, into two 2-dimensional tensors `X`

and `y`

. The function arguments `window_X`

and `window_y`

define the number of input lags per observation and the number of output values (timesteps to be predicted), respectively. Let's take a look at the tensors. `X[:3]`

yields:

```
array([[ 2, 1, 0, 0, 1, 1, 1, 1, 2, 4, 21, 69],
[ 1, 0, 0, 1, 1, 1, 1, 2, 4, 21, 69, 1],
[ 0, 0, 1, 1, 1, 1, 2, 4, 21, 69, 1, 1]])
```

whereas `y[:3]`

yields:

```
array([[1, 1, 1, 1, 0, 0],
[1, 1, 1, 0, 0, 1],
[1, 1, 0, 0, 1, 1]])
```

One can see, that the tensors contain iterating data over the vector `target`

of a fixed length. Specifically, I used `window_x = 12`

and `window_y = 6`

which means, that each row in our input tensor `X`

consists of `window_X = 12`

elements that are used to forecast the next `window_y = 6`

timesteps. Note, that the original time series contains months of data, whereas `X`

is of shape `(160, 12)`

. The reason for this is that values only get appended to the `X`

and `y`

tensors, if the current `end_y`

iterator is smaller than the number of total obervations in the dataset. Otherwise, `y`

would contain `NaN`

values at the end of the series.

```
# Training and test
train = 100
X_train = X[:train]
y_train = y[:train]
X_valid = X[train:]
y_valid = y[train:]
```

For model training, I used the first 100 observations for training purposes and the remaining 60 observations for validation (early stopping). Furthermore, I built a tensor for prediction, `X_test`

that contains the last 12 observations from the time series (November, 1st. 2017 – October, 1st. 2018) in order to compute a prediction for the following 6 months.

## Model Building

The following Python snippet shows the function for fitting three neural network models: a simple MLP, a 1-dimensional CNN and a LSTM (I will not go into detail about how the models exactly work – there are tons of great tutorials on the web). I wrote a single function for all three architectures. Thereby, I exploited a litte trick I recently discovered: if you're experimenting with different network archirectures you always have to make sure, that the input tensors are of the appropriate shape. Since MLPs, CNNs and LSTMs require different dimensions of input tensors, it can be helpful to provide a single input layer that takes the original input tensor and then add a `Reshape()`

layer in order to reshape the input into the required dimensionality of the respective `Dense()`

, `Conv()`

or `LSTM()`

layer. I know, that from a computational point of view, this is not very efficient, however, I do not care in this case 😉

```
def fit_model(type, X_train, y_train, X_test, y_test, batch_size, epochs):
""" Training function for network """
# Model input
model = Sequential()
model.add(InputLayer(input_shape=(X_train.shape[1], )))
if type == 'mlp':
model.add(Reshape(target_shape=(X_train.shape[1], )))
model.add(Dense(units=64, activation='relu'))
if type == 'cnn':
model.add(Reshape(target_shape=(X_train.shape[1], 1)))
model.add(Conv1D(filters=64, kernel_size=4, activation='relu'))
model.add(MaxPool1D())
model.add(Flatten())
if type == 'lstm':
model.add(Reshape(target_shape=(X_train.shape[1], 1)))
model.add(LSTM(units=64, return_sequences=False))
# Output layer
model.add(Dense(units=64, activation='relu'))
model.add(Dense(units=y_train.shape[1], activation='sigmoid'))
# Compile
model.compile(optimizer='adam', loss='mse')
# Callbacks
early_stopping = EarlyStopping(monitor='val_loss', patience=10)
model_checkpoint = ModelCheckpoint(filepath='model.h5', save_best_only=True)
callbacks = [early_stopping, model_checkpoint]
# Fit model
model.fit(x=X_train, y=y_train, validation_data=(X_test, y_test),
batch_size=batch_size, epochs=epochs,
callbacks=callbacks, verbose=2)
# Load best model
model.load_weights('model.h5')
# Return
return model
```

The `InputLayer()`

consumes the prepared input tensor `X_train`

. Depending on the type of model, distinct reshape operations are carried out in order to provide proper tensor dimensions for the model. Thereby, the MLP requires the input tensor to be 2-dimensional whereas the CNN and LSTM models require 3-dimensional input tensors. After the respective model architecture layers, another `Dense()`

layer and the output layer are added. Note, that I've used a sigmoidal activation for the output. This makes sure, that the predicted values of the network range between 0 and 1, which corresponds to the normalized search volume data (another cool trick for predicting normalized outcomes or percentages). Dense activations are set to ReLU, optimization is performed using the ADAM optimizer. The model uses early stopping to stop model training when the validation error does not improve for 10 consecutive epochs. Then, thanks to model checkpointing, the best model is reloaded and returned.

## Prediction

After model training, the predictions for the following 6 months are generated by feeding `X_test`

into the respective model architecture. The following table shows the predicted values:

month | MLP | CNN | LSTM |
---|---|---|---|

2018-11 | 0.165803 | 0.193593 | 0.214891 |

2018-12 | 0.857727 | 0.881791 | 0.817105 |

2019-01 | 0.034604 | 0.034484 | 0.034604 |

2019-02 | 0.007150 | 0.002432 | 0.007150 |

2019-03 | 0.012865 | 0.000508 | 0.012865 |

2019-04 | 0.013644 | 0.000502 | 0.013644 |

Overall, the predictions look quite similar between the models. However, there are certain systematic differences:

- the CNN model predicts the highest search volume in december
- the MLP seems to overestimate the search volume after the holiday season
- the LSTM model shows the highest prediction in November and January but the lowest prediction in December

The following plot shows the search volume as well as the predictions for all three models from April 2016 to April 2019:

## Conclusion and Outlook

Based on the predictions, there is an expected minor decline in search interest. This might be a predictor for lower "Last Christmas" song penetration. However, I did not find any data on radio plays etc. Of course, the model can be improved by incorporating further information into the network. Anyways, I had great fun building the model and working with the search volume data. In the next step, I will check in November, which model performed best when new actual data comes in (beginning of November). Besides this dataset, Google Trends is a great ressource to include external information into your models. For example, in a recent use case at STATWORX, we used Google Trends data to incorporate search interest of specific products into our sales forecasting models. You should give it a try!

If you want to play around with the data or the model, you can find everything on our GitHub repository.

If you have any comments or questions on my blog post, contact me, I will try to answer them. Also, feel free to use my code or share this story with your peers on social platforms of your choice. Follow me on LinkedIn or Twitter, if you want to stay in touch.

Make sure, you frequently check the awesome STATWORX Blog for more interesting data science, ML and AI content straight from the our office in Frankfurt, Germany!

If you’re interested in more quality content like this, join my mailing list, constantly bringing you new data science, machine learning and AI reads and treats from me and my team right into your inbox!

# ABOUT US

STATWORX

is a consulting company for data science, statistics, machine learning and artificial intelligence located in Frankfurt, Zurich and Vienna. Sign up for our NEWSLETTER and receive reads and treats from the world of data science and AI. If you have questions or suggestions, please write us an e-mail addressed to blog(at)statworx.com.