Binaries and Colors

Learning Images with Keras

Lukas Strömsdörfer Blog, Data Science


Teaching machines to handle image data is probably one of the most exciting tasks in our daily routine at STATWORX. Computer vision in general is a path to many possibilities that some would consider intriguing. Besides learning images, computer vision algorithms also enable machines to learn any kind of video-sequenced data. With autonomous driving on the line, learning images and videos is probably one of the hottest topics right now.

learning images - so hot right now

In this post, I will show you how to get started with learning image data. I will make use of Keras, a high-level API for TensorFlow, CNTK, and Theano. Keras is available in Python and in R. For this post, I will work through the Python implementation.


I am using Python 3.6.5, and Keras is running with a TensorFlow backend. The dataset I will be using is the Airbus Ship Detection dataset from their recent Kaggle competition. To get started, we will build a very simple network, a so-called autoencoder. Autoencoders are simple networks in the sense that they do not aim to predict any target. Rather, they aim to learn and reconstruct images, or data in general. In his blog post, Venelin Valkov shows the following figure, which I think is quite cool:

mushroom encoder

The figure perfectly describes the intention of an autoencoder. The algorithm takes an input, compresses it, and then tries to reconstruct it. Why would we do this? Well, autoencoders have numerous interesting applications. First, they are reasonably good at detecting outliers. The idea is that you teach a machine to reconstruct non-outliers. Thus, when confronted with an outlier, the algorithm will probably have a hard time reconstructing that very observation. Second, autoencoders are worth a look when you want to reduce the dimensionality of your data. For images, you can think of this as reducing the complexity of each image. An algorithm is unlikely to reconstruct nuances of the image that are rather irrelevant to the content. Image recognition or classification algorithms are prone to overreact to certain nuances of images, so denoising them might ease the learning procedure. Thus, autoencoders can serve as a powerful preprocessing tool for denoising your data.
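To make the outlier idea a bit more concrete, here is a minimal sketch of how reconstruction errors could be turned into outlier flags. It is not part of the model built in this post: the helper names reconstruction_errors and flag_outliers, the 95% quantile threshold, and the synthetic arrays standing in for real images and their reconstructions are all assumptions chosen for illustration.

```python
import numpy as np

def reconstruction_errors(originals, reconstructions):
    """Mean squared error per image, averaged over all pixel axes."""
    diff = originals - reconstructions
    return np.mean(diff ** 2, axis=tuple(range(1, diff.ndim)))

def flag_outliers(errors, quantile=0.95):
    """Flag every image whose error exceeds the given quantile (hypothetical rule)."""
    threshold = np.quantile(errors, quantile)
    return errors > threshold

# synthetic stand-ins for 20 small images and their reconstructions
rng = np.random.default_rng(0)
originals = rng.random((20, 8, 8, 1))
reconstructions = originals + rng.normal(0, 0.01, originals.shape)

# pretend that image 3 could not be reconstructed at all
reconstructions[3] = rng.random((8, 8, 1))

errors = reconstruction_errors(originals, reconstructions)
outliers = flag_outliers(errors)
print(np.flatnonzero(outliers))
```

Note that a quantile threshold computed on the same data always flags a fixed share of the images. In practice, one would more likely derive the threshold from errors on data that is known to be normal.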

Data Preparation

Preparing your data is one of the most important tasks when training algorithms, even more so when you are handling image data. Images typically require large amounts of storage, especially since computer vision algorithms usually need to be fed with a considerable amount of data. To overcome this issue, my colleagues and I typically make use of either large on-premise servers or cloud instances.

For this blog post, however, I am choosing to run everything on my local machine. Why? Well, if you are reading this and you are interested in taking your first steps in developing your own code to handle image data, I would probably bother you with the details of setting up cloud instances. If you are reading this and you are already experienced in working with these kinds of problems, you will most likely work with cloud instances already, and my description would bother you as well. So, for this little experiment, I am running everything on my local machine, and I organized the data as follows:

    | train
        | train_image_01
        | train_image_02
        | ...
    | test
        | test_image_01
        | ...

To read in the data, I am simply looping over the images. I am using the OpenCV implementation cv2 and the Keras preprocessing tools. I know, I know, Keras has this ingenious ImageDataGenerator module. However, I think it is important to understand the required input, so for this post I will make use of the OpenCV tools. The preprocessing is a little different than with other data. While we see something similar to this:

training images

A machine, however, does not see images, but rather data. Each image is represented by a matrix of pixel values. Thus, each picture is a data matrix. Unlike other problems, where all data fits into one matrix, here we have to handle a whole collection of matrices. To deal with this, we can use the ndarray data type. Implemented in the NumPy ecosystem, ndarrays provide a handy data type for multidimensional data. Thus, we convert all our images to NumPy arrays and pack them together in a single ndarray.
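As a toy illustration of this packing step, the following sketch stacks a few synthetic gray-scale matrices into one four-dimensional ndarray. The random arrays merely stand in for real images, and the variable names are made up for this example.

```python
import numpy as np

# three synthetic gray-scale "images" of 128 x 128 pixels with values 0..255
rng = np.random.default_rng(42)
images = [rng.integers(0, 256, size=(128, 128)).astype("uint8") for _ in range(3)]

# a single image is a plain 2D matrix of pixel values
print(images[0].shape)

# add a channel axis to each image and stack them along a new first axis
batch = np.stack([img[..., np.newaxis] for img in images])
print(batch.shape)

# rescale the pixel values to the range [0, 1]
scaled = batch.astype("float32") / batch.max()
print(scaled.min(), scaled.max())
```

The resulting layout is (number of images, height, width, channels), which is the input shape a Keras model with a TensorFlow backend expects by default.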

# import libs
import os
import pandas as pd
import numpy as np
import cv2 
import random
from keras.preprocessing.image import ImageDataGenerator, img_to_array

# set the path to the images
train_path = "00_data/train"
test_path = "00_data/test"

# load train images
train_images = os.listdir(f'{train_path}')

# load test images
test_images = os.listdir(f'{test_path}')

# use only the first 10,000 training and 1,000 test images for a first run
train_images_first_run = train_images[0:10000]
test_images_first_run = test_images[0:1000]

# open up container for training_data
train_data = []
test_data = []

# loop over training images
for imgs in train_images_first_run:
    # load the image, convert it to gray-scale, and resize it
    img = cv2.imread(f'{train_path}/{imgs}')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, (128, 128))
    img = img_to_array(img)
    # collect the processed image
    train_data.append(img)

# loop over testing images
for imgs in test_images_first_run:
    # load the image, convert it to gray-scale, and resize it
    img = cv2.imread(f'{test_path}/{imgs}')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, (128, 128))
    img = img_to_array(img)
    # collect the processed image
    test_data.append(img)

# convert data to np array
train_data = np.array(train_data, dtype = "float")
test_data = np.array(test_data, dtype = "float")
# rescale the data to [0, 1] and reshape it
x_train = train_data.astype('float32') / train_data.max()
x_test = test_data.astype('float32') / test_data.max()
x_train = np.reshape(x_train, (len(x_train), 128, 128, 1)) 
x_test = np.reshape(x_test, (len(x_test), 128, 128, 1)) 

We use the cv2 function cvtColor to convert each image to gray-scale, which is easier to interpret. Next, we resize the input to 128 x 128. In addition, we convert each image to a NumPy array. Afterwards, we stack all the arrays together. Finally, we rescale the input data to values between 0 and 1. So let's check out what the data looks like right now.

preprep images

Algorithm Design

The architecture of my autoencoder is somewhat arbitrary, I have to confess. To equip my network with some computer vision features, I am adding convolutional layers. Convolutional layers are the essence of Convolutional Neural Networks (CNNs). I won't go into detail, because I could probably bore you with 20 pages about CNNs and still barely cover the basics. Thus, I am just assuming you kind of know what's going on.

As I said, we are setting up a convolutional autoencoder. It sounds quite fancy, though Keras makes it ridiculously simple. A little disclaimer: I am quite aware that there are many other ways to set up the code, and so the code below might offend you. I did check the Keras documentation, though, and tried to align my code with it. So if you are offended by my coding, don't judge me… or at least not too much.

# import libraries
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
import matplotlib.pyplot as plt
from keras.models import load_model

# define input shape
input_img = Input(shape=(128, 128, 1))

# encoding dimension
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)

# decoding dimension
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((4, 4))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)

# build model
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')

As I said before, the design is somewhat arbitrary. However, those of you who work with these kinds of networks probably know that the preliminary architecture quite often is. Let us go through the code above, and again, I know there are a million ways to set up my model. First, of course, I am importing all the modules I need. Then, I am defining the input shape. We resized the images to 128 x 128 and gray-scaled all of them, thus the third dimension has a value of 1. Next, I am defining the encoding layers, that is, the first part of the autoencoder network. I am using three convolutional layers to compress the input. The decoding part is built using three convolutional layers as well. I am using relu as the activation function and sigmoid for the last layer. Once I have set up the layers, I am just stacking them all together with the Keras Model function. I am using adadelta as the optimizer and binary crossentropy as the loss function. So let's have a look at our model's architecture the Keras way:


Layer (type)                 Output Shape              Param #
input_1 (InputLayer)         (None, 128, 128, 1)       0
conv2d_1 (Conv2D)            (None, 128, 128, 16)      160
max_pooling2d_1 (MaxPooling2 (None, 64, 64, 16)        0
conv2d_2 (Conv2D)            (None, 64, 64, 8)         1160
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 8)         0
conv2d_3 (Conv2D)            (None, 32, 32, 8)         584
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 8)         0
conv2d_4 (Conv2D)            (None, 16, 16, 8)         584
up_sampling2d_1 (UpSampling2 (None, 32, 32, 8)         0
conv2d_5 (Conv2D)            (None, 32, 32, 8)         584
up_sampling2d_2 (UpSampling2 (None, 128, 128, 8)       0
conv2d_6 (Conv2D)            (None, 128, 128, 1)       73
Total params: 3,145
Trainable params: 3,145
Non-trainable params: 0
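The numbers in this summary can be checked by hand. The following sketch recomputes the parameter counts and the spatial sizes with plain Python. The helper conv2d_params is a made-up name for this illustration and not a Keras function.

```python
import math

def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """One kernel weight per input channel and filter, plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters

# (input channels, filters) of the six 3 x 3 convolutional layers
conv_layers = [(1, 16), (16, 8), (8, 8), (8, 8), (8, 8), (8, 1)]
params = [conv2d_params(3, 3, c_in, f) for c_in, f in conv_layers]
print(params)
print(sum(params))

# spatial size: pooling with padding='same' divides and rounds up
size = 128
for pool in (2, 2, 2):
    size = math.ceil(size / pool)
bottleneck = size
print(bottleneck)

# upsampling multiplies the size again
for up in (2, 4):
    size *= up
print(size)
```

This also shows why the second upsampling layer uses a factor of 4: the encoder halves the image three times, while the decoder only upsamples twice, so the factors 2 and 4 are needed to get from 16 back to 128.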


To run the model, we make use of the model's fit() method. To fit the model, we just need to specify the batch size and the number of epochs. Since I am running this on my local machine, I am choosing way too large a batch size and way too small a number of epochs.

autoencoder.fit(x_train, x_train,
                # ... (epochs and batch size settings)
                validation_data=(x_test, x_test))
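To get a feeling for what these two settings mean, here is a back-of-the-envelope sketch that counts how often the weights are updated during training. The batch sizes and epoch numbers are hypothetical values chosen only for illustration. They are not the settings used in this post, and weight_updates is a made-up helper.

```python
import math

def weight_updates(n_samples, batch_size, epochs):
    """Number of gradient steps: one per batch, repeated for every epoch."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return steps_per_epoch * epochs

n_train = 10000  # number of training images loaded above

# hypothetical settings, for illustration only
large_batch_few_epochs = weight_updates(n_train, batch_size=1000, epochs=5)
small_batch_more_epochs = weight_updates(n_train, batch_size=32, epochs=50)

print(large_batch_few_epochs)
print(small_batch_more_epochs)
```

A large batch size combined with only a few epochs leaves the optimizer with very few steps to learn anything, which is the price of keeping the run time on a local machine short.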

Our autoencoder is now trained and evaluated on the testing data. By default, Keras provides extremely nice progress bars for each epoch. To evaluate the results, I am not going to bother you with a lot of metrics. Instead, let's check the input images and the reconstructed ones. To do so, we can quickly loop over some test images and their reconstructions. First, we need to predict the reconstructed images. Once again, Keras is incredibly handy.

decoded_imgs = autoencoder.predict(x_test)

The prediction is stored in a numpy ndarray and has the exact same structure as our prepped data. Now, let's take a look at our reconstructed images:

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i + 100].reshape(128, 128))

    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i + 100].reshape(128, 128))

# render the figure when running this as a script
plt.show()

result images

Well, well, well... isn't that impressive. Of course, this is not quite the result we are looking for. Thus, we need to keep on improving the model. The first steps to take are quite obvious: a smaller batch size, more epochs, more images, and of course a lot of iterations to adjust the architecture of the model.

This is quite a nice finding though. You see, the code above is incredibly simple. Thanks to implementations such as Keras, it is becoming increasingly simple to build even the most complex algorithms. However, the design is still very complicated and requires a lot of time and experience.

About the Author

Lukas Strömsdörfer

Lukas is a member of the Data Science team and is currently pursuing an external doctorate at the University of Göttingen. In his free time, he is a passionate cyclist and enjoys watching TV series.