Image classification is one of the main applications of convolutional neural networks. If you are interested in the topic, like me, you may have consulted many theoretical resources and tutorials, and this can be cumbersome because there's a lot of information and many concepts to digest. As a curious developer, you may want to get your hands into code as soon as possible; if that's your case, you're in the right place.
In the following post, I'm going to train a convolutional neural network to classify brain tumor images, with a short brief of some key concepts along the way.
TL;DR If you’re eager to check the code this is the notebook.
Prerequisites
Before starting, be sure you have trained machine learning models before and that you are proficient in Python. We are going to use the following tools (the ones that appear throughout the post):
- TensorFlow and Keras
- The Kaggle API (to download the dataset)
- Matplotlib and Pillow (to inspect images)
Dataset preprocessing
You may already know that one of the most important steps in any classification task, no matter the technique (classical statistical approaches, machine learning, neural networks), is to gather enough quality data. Luckily, there are a lot of open source datasets, and Kaggle hosts one with classified brain tumor images.
The dataset has four classes distributed as:
- Glioma tumor: 901 samples.
- Pituitary tumor: 844 samples.
- Meningioma tumor: 913 samples.
- Normal (no tumor): 438 samples.
You can download the dataset into your work environment using the following Kaggle API command, but first you need to obtain a Kaggle key.
!kaggle datasets download -d thomasdubail/brain-tumors-256x256
In a previous post, I stated the necessity of having three separate datasets: training, validation, and testing, so the first step will be to build them. I divided the whole dataset with these percentages:
- Training: 66%
- Validation: 14%
- Testing: 20%
I separated the images into folders following the previous distribution. Then, you have to transform the images into a representation TensorFlow can understand: tensors. We can easily do this with some Keras magic. The function image_dataset_from_directory reads images from a directory and returns a dataset; the name of each folder is interpreted as the class of the samples it contains. So, after moving the images to the proper folders, my directory structure looks like this:
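If you want to automate the split, here's a minimal sketch. The source folder names come from the dataset's classes; `new_base_dir` and the exact paths are illustrative, so adapt them to your environment:

```python
import random
import shutil
from pathlib import Path

def split_class(src_dir, dest_root, class_name,
                fractions=(0.66, 0.14, 0.20), seed=42):
    """Copy one class's images into train/validation/test subfolders."""
    files = sorted(Path(src_dir).glob("*"))
    random.Random(seed).shuffle(files)
    n_train = int(len(files) * fractions[0])
    n_val = int(len(files) * fractions[1])
    splits = {
        "train": files[:n_train],
        "validation": files[n_train:n_train + n_val],
        "test": files[n_train + n_val:],
    }
    for split_name, split_files in splits.items():
        target = Path(dest_root) / split_name / class_name
        target.mkdir(parents=True, exist_ok=True)
        for f in split_files:
            shutil.copy(f, target / f.name)
    return {k: len(v) for k, v in splits.items()}

# e.g. split_class("data/glioma_tumor", "new_base_dir", "glioma_tumor")
```

Running it once per class (glioma_tumor, pituitary_tumor, meningioma_tumor, normal) produces the 66/14/20 layout described above.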
With the following code I pass the root folder of each dataset to the function. In this case, train_dataset will be composed of 66% of the images, and each sample will be labeled with the class name of the directory that contains it (meningioma_tumor, pituitary_tumor, glioma_tumor, normal).
# Use image_dataset_from_directory to create the 3 datasets
from tensorflow.keras.utils import image_dataset_from_directory

train_dataset = image_dataset_from_directory(
    new_base_dir / "train",
    image_size=(180, 180),
    batch_size=32)
validation_dataset = image_dataset_from_directory(
    new_base_dir / "validation",
    image_size=(180, 180),
    batch_size=32)
test_dataset = image_dataset_from_directory(
    new_base_dir / "test",
    image_size=(180, 180),
    batch_size=32)
Building a baseline model
A basic step when training a model is to first build a baseline: something not too fancy but capable of completing the job, that is, a model that learns how to classify the samples, no matter if it overfits. This first step lets you know whether the problem is solvable.
# Create an initial model
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(180, 180, 3))
x = layers.Rescaling(1./255)(inputs)
x = layers.Conv2D(filters=32, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=64, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=128, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.MaxPooling2D(pool_size=2)(x)
x = layers.Conv2D(filters=256, kernel_size=3, activation="relu")(x)
x = layers.Flatten()(x)
outputs = layers.Dense(4, activation="softmax")(x)
base_model = keras.Model(inputs=inputs, outputs=outputs)
# Compile before training; the labels are integers, hence the sparse loss
base_model.compile(loss="sparse_categorical_crossentropy",
                   optimizer="rmsprop",
                   metrics=["accuracy"])
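As a quick sanity check, you can trace the spatial size of the feature maps by hand: a kernel-3 convolution without padding shrinks each side by 2, and each pooling halves it (rounding down).

```python
# Trace the feature-map side length through the stack above
size = 180
for _ in range(4):           # four Conv2D + MaxPooling2D pairs
    size = size - 2          # Conv2D, kernel_size=3, "valid" padding
    size = size // 2         # MaxPooling2D, pool_size=2
size = size - 2              # the final Conv2D
print(size)  # 7, so Flatten() yields 7 * 7 * 256 = 12544 features
```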
A convolutional neural network (informally described) is a stack of convolutional and max pooling layers. The convolutional layers learn patterns over the images in an incremental way: the first layers learn simple patterns like edges or colors, and deeper layers learn higher-level patterns such as tires, ears, or eyes. The max pooling layers downsample the feature maps, keeping the strongest activations, so the network focuses on the most salient patterns and discards noise. As a rule of thumb, in a multi-class problem, the activation function of the last layer must be softmax. Now let's train the baseline model.
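Softmax itself is easy to sketch in plain Python: it turns the four raw scores of the last layer into probabilities that sum to 1 (a toy version, not the TensorFlow implementation):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1, -1.0])  # raw scores for the 4 classes
print(sum(probs))  # 1.0 (up to floating point)
```

The class with the highest score gets the highest probability, which is why we read the prediction with an argmax over the output.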
# Configure the callbacks to:
# - Stop training when validation accuracy stops improving
# - Save the best model by validation loss
callbacks = [
    keras.callbacks.EarlyStopping(
        monitor="val_accuracy",
        patience=10,
    ),
    keras.callbacks.ModelCheckpoint(
        filepath="base_model",
        save_best_only=True,
        monitor="val_loss")
]
history = base_model.fit(
    train_dataset,
    epochs=100,
    validation_data=validation_dataset,
    callbacks=callbacks)
I'm using keras.callbacks.ModelCheckpoint to save the best model by monitoring the validation loss, which measures how far the predictions are from the correct values on the validation dataset. I'm also using the keras.callbacks.EarlyStopping callback to stop the training once the model has gone 10 epochs (check the patience parameter) without improving the validation accuracy, that is, once the model has stopped learning.
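The patience logic behind EarlyStopping can be sketched in plain Python (a simplification of what the callback does; the real one also supports min_delta and weight restoration):

```python
def stopping_epoch(accuracy_history, patience):
    """Return the epoch where training would stop, or None if it never does."""
    best = float("-inf")
    wait = 0
    for epoch, value in enumerate(accuracy_history):
        if value > best:
            best, wait = value, 0   # improvement: reset the counter
        else:
            wait += 1               # no improvement this epoch
            if wait >= patience:
                return epoch
    return None

print(stopping_epoch([0.50, 0.62, 0.61, 0.62, 0.61], patience=2))  # 3
```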
It's a good idea to plot the accuracy and the loss on the training and validation datasets, since they give us information about the model's overfitting behavior and the capacity of the CNN to solve the problem. Let's check the graphics:
We can see the training accuracy went up to almost 100%, which means our model is capable of solving the problem. On the other hand, the validation accuracy didn't surpass 70%; also, the training and validation loss curves gradually separate: while the training loss decreases, the validation loss increases. This is the classic symptom of overfitting. Our model will not generalize, and in a real scenario it will not be accurate when classifying new tumor images. Let's check the accuracy of our model on the test dataset.
We achieved 75%. But hey, cheer up, this is not bad news: we completed the first task. Now that we know the problem is solvable, we need to improve our model; we need to beat the baseline.
Beating the baseline model
A great approach to improving a model is to obtain more data; take this approach whenever it's feasible. Nevertheless, this is a limitation in some cases, and in this problem we are not able to easily get more brain tumor images. Fortunately, we have some other tools to improve our model: data augmentation and transfer learning.
Adding data augmentation
We can't get new brain tumor images, but we can synthetically create new ones from the existing samples. That's what data augmentation means: taking the existing samples and modifying them before feeding them to our model. Again we'll use the tools provided by Keras: preprocessing layers that make slight changes to the images through transformations such as zooming, rotation, contrast changes, or cropping, among others. These "modified" images will be new samples for our model. You can check all the data augmentation possibilities here.
The following code shows a sample of images preprocessing using the Keras layers. First let’s download an image.
import tensorflow as tf
from tensorflow.keras import layers
import urllib.request
import PIL.Image

img_url = 'https://upload.wikimedia.org/wikipedia/commons/1/11/Iron_Maiden_in_Bercy_4.jpg'
urllib.request.urlretrieve(img_url, "sample.png")
img = PIL.Image.open("sample.png")
Now, to process the image, we need to convert it into a tensor. Let's do this step first and display the result:
import matplotlib.pyplot as plt

# Convert the PIL image into a tensor
input_arr = tf.keras.utils.img_to_array(img)
img_tensor = tf.convert_to_tensor(input_arr)
_ = plt.imshow(img_tensor.numpy().astype('uint8'))
Now we are going to create a stack of layers to preprocess the image. This can easily be done using the Sequential API; then we pass the image through the layers and display the result.
from tensorflow import keras

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.4),
        layers.RandomZoom(0.2),
    ]
)
result = data_augmentation(img_tensor)
_ = plt.imshow(result.numpy().astype('uint8'))
Transfer Learning
Another great technique to improve your model is transfer learning. A cool feature of deep learning is the ability to generalize and to learn high-level patterns from larger datasets that can be reused in another problem (with a different dataset!). That is, in an image classification problem, a model trained on a big enough dataset may have learned visual patterns that are generic and therefore portable. Fortunately, Keras has a set of models pretrained on huge image datasets; you can check the catalog here.
I used the VGG16 pretrained model from Keras. This model has learned patterns about images, but it's not trained to classify tumor images; that's why we reuse only a subset of its layers, specifically the ones before the classification layers, called the convolutional base (a.k.a. conv base). We then need to train the final layers, that is, the classifier. Let's see the code:
# Re-import the conv base for experimenting with different frozen layers
conv_base = keras.applications.vgg16.VGG16(
    weights="imagenet",
    include_top=False)
conv_base.trainable = False
include_top=False tells Keras to load the model without the final classification layers (only the conv base); conv_base.trainable = False tells Keras not to update the weights of the conv base. If we skip this step, we'll end up losing the patterns learned by VGG16, because it would be retrained on the new set of tumor images.
Putting it all together
Now we can configure a new model to beat the baseline using what we have explored until now: a pretrained conv base with visual patterns learned over a larger dataset, a data augmentation approach to extend the tumor images dataset, and a final classifier layer:
from tensorflow.keras import layers

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),
        layers.RandomRotation(0.1),
        layers.RandomZoom(0.2),
        layers.RandomContrast(0.5)
    ]
)

inputs = keras.Input(shape=(180, 180, 3))
x = data_augmentation(inputs)
x = keras.applications.vgg16.preprocess_input(x)
x = conv_base(x)
x = layers.Flatten()(x)
x = layers.Dense(256)(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(4, activation="softmax")(x)
model_with_conv_base = keras.Model(inputs, outputs)
model_with_conv_base.compile(loss="sparse_categorical_crossentropy",
                             optimizer="rmsprop",
                             metrics=["accuracy"])
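To get a feel for what we're actually training, we can count the parameters of the new head by hand (assuming VGG16's five pooling stages reduce the 180x180 input to a 5x5x512 feature map):

```python
# Parameters of the classifier head added on top of the frozen conv base
flatten_units = 5 * 5 * 512                 # 12800 features out of Flatten
dense_params = flatten_units * 256 + 256    # Dense(256): weights + biases
output_params = 256 * 4 + 4                 # Dense(4): weights + biases
print(dense_params + output_params)  # 3278084 trainable parameters
```

Only these ~3.3M parameters get updated during training; the ~14.7M weights of the frozen conv base stay untouched.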
The preprocess_input call is necessary to format the inputs in the way expected by VGG16. There's also a Dropout layer, which is a regularization technique: it randomly inhibits the output of some neurons to prevent "conspiracies" inside the network. By "dropping out" the output of some neurons, the next layers must adapt their weights to handle the representation by themselves (without the help of the dropped neurons). This regularization technique is used to reduce overfitting. Let's check the result of the full approach.
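The training-time behavior of dropout can be sketched as a toy in plain Python (frameworks use "inverted" dropout, scaling the surviving activations so the expected sum is unchanged):

```python
import random

def dropout(values, rate, rng):
    # Zero each unit with probability `rate`; scale survivors by 1/(1-rate)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)
activations = [1.0] * 1000
dropped = dropout(activations, rate=0.5, rng=rng)
print(sum(1 for v in dropped if v == 0.0))  # roughly half are zeroed
print(sum(dropped))                         # close to the original sum, 1000
```

At inference time the layer does nothing; the scaling during training is what keeps the two regimes consistent.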
With this new model, the validation accuracy surpassed 70% and the loss curves decreased together. This means we're controlling the overfitting. Finally, let's see the accuracy on the test dataset:
We improved on the base model, going from 75% to 83% accuracy on the test dataset.
Further steps for your experimentation
Finally, I encourage you to improve the model by exploring other approaches, such as:
- Create your own network topology combining Convolutional and MaxPooling layers, change the number of filters in the convolutional layers and track the results.
- Add and test other combinations of augmentation layers for the data augmentation approach.
- Experiment with the transfer learning using other pretrained models from Keras.
Another technique you may try is fine-tuning. In the transfer learning section I stated you should "freeze" the conv base layers to avoid updating the weights and losing the previous training effort, but it's not necessary to freeze all of them. Fine-tuning proposes unfreezing some of the final layers of the conv base, so it learns new patterns without losing the generalization gained beforehand. Here is the documentation for the approach. I recommend reading the 8th chapter of Deep Learning with Python by François Chollet, where I learned all these approaches. Happy coding!