Introduction
Teaching machines to handle image data is one of the most exciting tasks in our daily routine at STATWORX. Computer vision opens a path to many intriguing possibilities. Besides still images, computer vision algorithms also enable machines to learn from video sequences. With autonomous driving on the horizon, learning from images and videos is probably one of the hottest topics right now.
In this post, I will show you how to get started with learning from image data. I will make use of Keras, a high-level API for TensorFlow, CNTK, and Theano. Keras is implemented in both Python and R; for this post, I will work through the Python implementation.
Setup
I am using Python 3.6.5, and Keras is running with a TensorFlow backend. The dataset I will be using is the Airbus Ship Detection dataset from their recent Kaggle competition. To get started, we will be building a very simple network, a so-called autoencoder. Autoencoders are simple networks in the sense that they do not aim to predict any target; instead, they aim to learn and reconstruct images, or data in general. In his blog post, Venelin Valkov shows the following figure, which I think is quite instructive:
The figure perfectly describes the intention of an autoencoder: the algorithm takes an input, compresses it, and then tries to reconstruct it. Why would we do this? Well, autoencoders have numerous interesting applications. First, they are reasonably good at detecting outliers. The idea is that you teach a machine to reconstruct non-outliers, so when it is confronted with an outlier, the algorithm will probably have a hard time reconstructing that very observation. Second, autoencoders are well worth a look when you want to reduce the dimensionality of your data. Speaking about images, you can think of this as a complexity reduction: the algorithm is unlikely to reconstruct nuances of an image that are irrelevant to its content. Image recognition or classification algorithms are prone to overreacting to such nuances, so denoising the images beforehand might ease the learning procedure. Thus, autoencoders can serve as a powerful preprocessing tool for denoising your data.
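To make the outlier idea concrete, here is a minimal NumPy sketch (not part of the pipeline below, and with simulated data): we score each sample by its mean squared reconstruction error and flag samples whose error is far above the typical level. In practice, the reconstructions would come from the trained autoencoder rather than being simulated.

```python
import numpy as np

# simulated inputs: 5 "normal" samples plus 1 outlier
rng = np.random.default_rng(0)
inputs = rng.normal(0.5, 0.05, size=(6, 16))
inputs[5] = rng.uniform(0.0, 1.0, size=16)  # the outlier

# simulated reconstructions: normal samples are reproduced well,
# while the outlier collapses to an uninformative "average" value
reconstructions = inputs.copy()
reconstructions[:5] += rng.normal(0.0, 0.01, size=(5, 16))
reconstructions[5] = 0.5

# mean squared reconstruction error per sample
errors = np.mean((inputs - reconstructions) ** 2, axis=1)

# flag samples whose error exceeds a simple threshold
threshold = errors[:5].mean() + 3 * errors[:5].std()
outliers = np.where(errors > threshold)[0]
print(outliers)  # the sixth sample (index 5) stands out
```

The threshold here is a crude mean-plus-three-standard-deviations rule; in a real application you would calibrate it on held-out non-outlier data.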
Data Preparation
Preparing your data is one of the most important tasks when training algorithms, even more so when you are handling image data. Images typically require large amounts of storage, especially since computer vision algorithms usually need to be fed with a considerable amount of data. To address this issue, my colleagues and I typically make use of either large on-premise servers or cloud instances.
For this blog post, however, I am choosing to run everything on my local machine. Why? Well, if you are reading this and you are interested in taking your first steps in developing your own code to handle image data, the details of setting up cloud instances would probably just bother you. If you are already experienced in working with this kind of problem, you will most likely use cloud instances anyway and would be bothered by my description as well. So, for this little experiment, I am running everything on my local machine, and I organized the data as follows:
00_data
|
|-- train
|   |-- train_image_01
|   |-- train_image_02
|   |-- ...
|-- test
|   |-- test_image_01
|   |-- ...
To read in the data, I am simply looping over the images, using the OpenCV implementation cv2 and the Keras preprocessing tools. I know, I know, Keras has this genius ImageDataGenerator module; however, I think it is quite important to understand the required input, so for this post I will make use of the OpenCV tools. The preprocessing is a little different than with other data. While we see an actual picture, a machine does not see images, but rather data: each image is represented by a matrix of pixel values, so each picture is a data matrix. Unlike with other problems, where all data is compressed into one matrix, we need to deal with this more complex setup. To do so, we can use the ndarray data type. Implemented in the numpy ecosystem, ndarrays provide a handy data type for multidimensional data. Thus, we convert all our images to numpy arrays and pack them together in an ndarray.
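As a quick illustration (with made-up pixel values, not the actual ship images): a grayscale image is just a 2-D matrix of intensities, and several equally sized images stack naturally into a single ndarray.

```python
import numpy as np

# a tiny 3 x 3 "grayscale image": one intensity value (0-255) per pixel
image = np.array([[  0, 128, 255],
                  [ 64, 192,  32],
                  [255,   0, 128]], dtype="uint8")
print(image.shape)  # (3, 3)

# several images of the same size stack into one multidimensional ndarray
images = np.stack([image, image, image])
print(images.shape)  # (3, 3, 3): 3 images of 3 x 3 pixels each
```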
# import libs
import os
import pandas as pd
import numpy as np
import cv2
import random
from keras.preprocessing.image import ImageDataGenerator, img_to_array
# set the path to the images
train_path = "00_data/train"
test_path = "00_data/test"
# load train images
train_images = os.listdir(f'{train_path}')
# load test images
test_images = os.listdir(f'{test_path}')
# load a couple of random pics
train_images_first_run = train_images[0:10000]
test_images_first_run = test_images[0:1000]
# open up container for training_data
train_data = []
test_data = []
# loop over training images
for imgs in train_images_first_run:
    # load the image, convert it to grayscale, and resize it
    img = cv2.imread(f'{train_path}/{imgs}')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, (128, 128))
    img = img_to_array(img)
    train_data.append(img)
# loop over testing images
for imgs in test_images_first_run:
    # load the image, convert it to grayscale, and resize it
    img = cv2.imread(f'{test_path}/{imgs}')
    img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(img, (128, 128))
    img = img_to_array(img)
    test_data.append(img)
# convert data to np array
train_data = np.array(train_data, dtype = "float")
test_data = np.array(test_data, dtype = "float")
# reshape the data
x_train = train_data.astype('float32') / train_data.max()
x_test = test_data.astype('float32') / test_data.max()
x_train = np.reshape(x_train, (len(x_train), 128, 128, 1))
x_test = np.reshape(x_test, (len(x_test), 128, 128, 1))
We use the cv2 function cvtColor to change the color palette to an easy-to-interpret grayscale. Next, we resize the input to 128 x 128 pixels and convert each image to a numpy array. Afterward, we stack all the arrays together. At last, we rescale the input data to values between 0 and 1. So let's check out what the data looks like right now.
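A quick sanity check on the prepped data, sketched here with randomly generated stand-in images (the Kaggle data itself is not bundled with this post): the training tensor should have shape (n_images, 128, 128, 1) and values between 0 and 1.

```python
import numpy as np

# stand-in for the loaded images: 10 random 128 x 128 grayscale pictures
rng = np.random.default_rng(0)
train_data = rng.integers(0, 256, size=(10, 128, 128, 1)).astype("float")

# rescale to [0, 1] exactly as in the preprocessing above
x_train = train_data.astype("float32") / train_data.max()

print(x_train.shape)  # (10, 128, 128, 1)
print(x_train.min() >= 0.0, x_train.max() == 1.0)
```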
Algorithm Design
The architecture of my autoencoder is somewhat arbitrary, I have to confess. To equip my network with some computer vision capabilities, I am adding convolutional layers, which are the essence of Convolutional Neural Networks (CNNs). I won't be going into detail, because I could probably bore you with 20 pages about CNNs and still barely cover the basics. Thus, I am just assuming you roughly know what's going on.
As I said, we are setting up a convolutional autoencoder. It sounds quite fancy, though Keras makes it ridiculously simple. A little disclaimer: I am quite aware that there are many other ways to set up the code, and the code below might offend you. However, I checked the Keras documentation and tried to align my code with it. So if you are offended by my coding, don't judge me... or at least not too much.
# import libraries
from keras.layers import Input, Dense, Conv2D, MaxPooling2D, UpSampling2D
from keras.models import Model
import matplotlib.pyplot as plt
from keras.models import load_model
# define input shape
input_img = Input(shape=(128, 128, 1))
# encoding dimension
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = MaxPooling2D((2, 2), padding='same')(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
encoded = MaxPooling2D((2, 2), padding='same')(x)
# decoding dimension
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)
x = UpSampling2D((2, 2))(x)
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)
x = UpSampling2D((4, 4))(x)
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)
# build model
autoencoder = Model(input_img, decoded)
autoencoder.compile(optimizer='adadelta', loss='binary_crossentropy')
As I said before, the design is somewhat arbitrary; however, those of you who work with these kinds of networks probably know that the preliminary architecture quite often is. Let us go through the code above, and again, I know there are a million ways to set up this model. At first, of course, I am importing all the modules I need. Then I am defining the input shape: we resized the images to 128 x 128 and gray-scaled them, thus the third dimension has size 1. Second, I am defining the encoding layers, the first part of the autoencoder network; I am using three convolutional layers, each followed by max pooling, to compress the input. The decoding part is built from three convolutional layers as well; note that its last upsampling layer scales by a factor of 4 to undo the remaining compression in one step. I am using relu as the activation function and sigmoid for the last layer. Once I have set up the layers, I am stacking them all together with the Keras Model function. I am using adadelta as the optimizer and binary cross-entropy as the loss function. So let's have a look at our model's architecture the Keras way:
>>>autoencoder.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 128, 128, 1) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 128, 128, 16) 160
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 64, 64, 16) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 64, 64, 8) 1160
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 32, 32, 8) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 32, 32, 8) 584
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 16, 16, 8) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 16, 16, 8) 584
_________________________________________________________________
up_sampling2d_1 (UpSampling2 (None, 32, 32, 8) 0
_________________________________________________________________
conv2d_5 (Conv2D) (None, 32, 32, 8) 584
_________________________________________________________________
up_sampling2d_2 (UpSampling2 (None, 128, 128, 8) 0
_________________________________________________________________
conv2d_6 (Conv2D) (None, 128, 128, 1) 73
=================================================================
Total params: 3,145
Trainable params: 3,145
Non-trainable params: 0
_________________________________________________________________
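The parameter counts in the summary can be verified by hand: a Conv2D layer with a k x k kernel, c_in input channels, and c_out filters has (k * k * c_in + 1) * c_out parameters, the +1 being one bias per filter. Pooling and upsampling layers have no trainable weights. A quick sketch:

```python
def conv2d_params(kernel, channels_in, filters):
    """Trainable weights of a Conv2D layer: kernel weights plus one bias per filter."""
    return (kernel * kernel * channels_in + 1) * filters

# the six Conv2D layers from the summary above: (kernel, channels_in, filters)
layers = [(3, 1, 16), (3, 16, 8), (3, 8, 8), (3, 8, 8), (3, 8, 8), (3, 8, 1)]
counts = [conv2d_params(*layer) for layer in layers]
print(counts)       # [160, 1160, 584, 584, 584, 73]
print(sum(counts))  # 3145, matching "Total params" in the summary
```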
Results
To run the model, we make use of the fit() method for keras.engine.training.Model objects. To fit the model, we just need to specify the batch size and the number of epochs. Since I am running this on my local machine, I am choosing way too large a batch size and way too small a number of epochs.
autoencoder.fit(x_train, x_train,
epochs=100,
batch_size=256,
shuffle=True,
validation_data=(x_test, x_test))
Our autoencoder is now trained and evaluated on the testing data. By default, Keras provides extremely nice progress bars for each epoch. To evaluate the results, I am not going to bother you with a lot of metrics; instead, let's compare the input images with the reconstructed ones. To do so, we can quickly loop over some test images and their reconstructions. First, we need to predict the reconstructions; once again, Keras is incredibly handy.
decoded_imgs = autoencoder.predict(x_test)
The prediction is stored in a numpy ndarray and has the exact same structure as our prepped data. Now, let's take a look at our reconstructed images:
n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # display original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(x_test[i + 100].reshape(128, 128))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
    # display reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(decoded_imgs[i + 100].reshape(128, 128))
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
plt.show()
Well, well, well... isn't that impressive. Of course, this is not quite the result we are looking for, so we need to keep improving the model. The first steps to take are quite obvious: a smaller batch size, more epochs, more images, and of course a lot of iterations to adjust the architecture of the model.
This is quite a nice finding, though. You see, the code above is incredibly simple. Thanks to implementations such as Keras, it is becoming increasingly simple to build even the most complex algorithms. However, designing them well is still complicated and requires a lot of time and experience.