worksheet08_solutions
COMP90051 Workshop 8¶
Convolutional Neural Networks¶
In this worksheet, we’ll implement a convolutional neural network (CNN) in Keras—a high-level API for deep learning.
Since this is our first time using Keras, we’ll start by implementing logistic regression—a familiar model from workshop 4.
We’ll then extend logistic regression to build a CNN by adding 2D convolutional and max-pooling layers.
By the end of this worksheet, you should be able to:
define and fit models in Keras
explain the architecture of a basic CNN
Let’s begin by importing the required packages.
In [3]:
from datetime import datetime
from packaging import version
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import numpy as np
import matplotlib.pyplot as plt
print(“TensorFlow version: “, tf.__version__)
assert version.parse(tf.__version__).release[0] >= 2, \
“This notebook requires TensorFlow 2.0 or above.”
TensorFlow version: 2.6.0
1. MNIST dataset¶
MNIST is a multi-class classification data set where:
the features are images of handwritten digits (28×28 pixels with a single 8-bit channel)
the target is a label in the set $\{0, 1, \ldots, 9\}$
The data is already split into training and test sets. The training set contains 60,000 instances and the test set contains 10,000 instances.
Below we load the data into NumPy arrays using a built-in function from Keras.
Question: How are the arrays structured? Which index is used to access individual instances? What is the type of the arrays?
Hint: use array.dtype to check.
Answer: The 0th axis indexes instances, the 1st axis indexes rows of pixels and the 2nd axis indexes columns of pixels.
In [4]:
(train_images, train_labels), (test_images, test_labels) = keras.datasets.mnist.load_data()
print(“train_images shape:”, train_images.shape)
print(“test_images shape:”, test_images.shape)
train_images shape: (60000, 28, 28)
test_images shape: (10000, 28, 28)
Before using the data for logistic regression, we need to do some basic pre-processing:
We rescale the images so that each pixel is represented as a float between 0 and 1 (note that the images are already normalised)
Ensure that the final array axis indexes the channel. For colour images, there are typically three channels corresponding to R, G, B. However, in this example we only have a single greyscale channel.
In [5]:
# Rescale
train_images = train_images.astype(float) / 255
test_images = test_images.astype(float) / 255
train_images = np.expand_dims(train_images, -1)
test_images = np.expand_dims(test_images, -1)
print(“train_images shape:”, train_images.shape)
print(“test_images shape:”, test_images.shape)
train_images shape: (60000, 28, 28, 1)
test_images shape: (10000, 28, 28, 1)
The code block below visualises random examples from the training set.
In [6]:
num_images = 10
fig, axes = plt.subplots(figsize=(1.5*num_images, 1.5), ncols=num_images)
sample_ids = np.random.choice(train_images.shape[0], size=num_images, replace=False)
for i, s in enumerate(sample_ids):
axes[i].imshow(train_images[s,:,:,0], cmap=’binary’)
axes[i].set_title(“$y = {}$”.format(train_labels[s]))
axes[i].axis(‘off’)
plt.show()
Finally, we note that the training set is relatively balanced—there are roughly 6000 examples for each digit.
In [7]:
plt.hist(train_labels, bins=range(11), align=’left’)
plt.xticks(ticks=range(11))
plt.title(‘Distribution of classes in training data’)
plt.ylabel(‘Frequency’)
plt.xlabel(‘Digit’)
plt.show()
2. Multi-class logistic regression¶
The handwritten digit recognition task is an example of a multi-class classification problem.
There are 10 classes—one for each digit $0, 1,\ldots, 9$.
We’ll first tackle the problem by generalising binary logistic regression (from workshop 4), to handle multiple classes.
Let the classes be denoted by integers $\{1, \ldots, C\}$ (in our example $C = 10$).
Treating class $C$ as a “reference”, we assume a linear relationship between the feature vector $\mathbf{x} \in \mathbb{R}^d$ and the log-odds of each class:
$$
\log \frac{p(y = c|\mathbf{x})}{p(y = C|\mathbf{x})} = \mathbf{x}^\top \mathbf{w}_c + b_c
$$where $\mathbf{w}_c \in \mathbb{R}^d$ and $b_c \in \mathbb{R}$ are the weights and bias for class $c$.
We can write this more compactly as
$$
\begin{gather}
p(y = c|\mathbf{x}) = \operatorname{softmax}(\mathbf{W} \mathbf{x} + \mathbf{b})_c = \frac{\exp \left\{-(\mathbf{W} \mathbf{x} + \mathbf{b})_c \right\}}{\sum_{c’ = 1}^{C} \exp \left\{-(\mathbf{W} \mathbf{x} + \mathbf{b})_{c’} \right\}}
\end{gather}
$$where $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_C]^\top$ and $\mathbf{b} =[b_1, \ldots, b_C]^\top$.
This model can be expressed in Keras as follows.
In [8]:
lr = keras.Sequential(
[
layers.Input((28,28,1)), # Tell Keras the shape of the input array (a single-channel 28×28 image)
layers.Flatten(), # Unravel/flatten the input array
layers.Dense(10, activation=’softmax’) # Add a fully-connected layer with a softmax activation function
]
)
Next, we have to specify a loss function.
As for binary logistic regression, we use the cross-entropy loss or log-loss:
$$
\ell_\mathrm{log}(y, \boldsymbol{\pi}) = – \sum_{c = 1}^{C} \mathbb{1}[y = c] \log \pi_c
$$where $\mathbb{1}[\cdot]$ is the indicator function and $\pi_c = \operatorname{softmax}(\mathbf{W} \mathbf{x} + \mathbf{b})$ is the estimate for $p(y = c|\mathbf{x})$.
The following code block prepares the lr model for training under the cross-entropy loss.
It sets Adam (Adaptive Moment Estimation) as the optimisation algorithm and directs Keras to keep track of accuracy during training.
In [9]:
lr.compile(optimizer=’adam’,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=[‘accuracy’])
We’re now ready to fit the lr model using the training data.
By setting batch_size = 100, each gradient descent step is computed w.r.t. a random batch of 100 training instances.
By setting epochs = 20, we loop over the complete training data 20 times.
In [13]:
history_lr = lr.fit(train_images, train_labels, epochs=20, batch_size=100,
validation_data=(test_images, test_labels))
Epoch 1/20
600/600 [==============================] – 1s 2ms/step – loss: 0.6267 – accuracy: 0.8436 – val_loss: 0.3625 – val_accuracy: 0.9057
Epoch 2/20
600/600 [==============================] – 1s 2ms/step – loss: 0.3465 – accuracy: 0.9063 – val_loss: 0.3091 – val_accuracy: 0.9148
Epoch 3/20
600/600 [==============================] – 1s 1ms/step – loss: 0.3098 – accuracy: 0.9143 – val_loss: 0.2901 – val_accuracy: 0.9193
Epoch 4/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2921 – accuracy: 0.9190 – val_loss: 0.2802 – val_accuracy: 0.9224
Epoch 5/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2819 – accuracy: 0.9221 – val_loss: 0.2763 – val_accuracy: 0.9247
Epoch 6/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2751 – accuracy: 0.9234 – val_loss: 0.2706 – val_accuracy: 0.9241
Epoch 7/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2698 – accuracy: 0.9243 – val_loss: 0.2706 – val_accuracy: 0.9260
Epoch 8/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2654 – accuracy: 0.9258 – val_loss: 0.2659 – val_accuracy: 0.9261
Epoch 9/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2621 – accuracy: 0.9273 – val_loss: 0.2655 – val_accuracy: 0.9252
Epoch 10/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2592 – accuracy: 0.9279 – val_loss: 0.2647 – val_accuracy: 0.9259
Epoch 11/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2567 – accuracy: 0.9289 – val_loss: 0.2625 – val_accuracy: 0.9272
Epoch 12/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2545 – accuracy: 0.9301 – val_loss: 0.2675 – val_accuracy: 0.9249
Epoch 13/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2530 – accuracy: 0.9298 – val_loss: 0.2641 – val_accuracy: 0.9256
Epoch 14/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2513 – accuracy: 0.9304 – val_loss: 0.2625 – val_accuracy: 0.9274
Epoch 15/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2497 – accuracy: 0.9310 – val_loss: 0.2636 – val_accuracy: 0.9272
Epoch 16/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2484 – accuracy: 0.9311 – val_loss: 0.2636 – val_accuracy: 0.9286
Epoch 17/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2473 – accuracy: 0.9323 – val_loss: 0.2624 – val_accuracy: 0.9277
Epoch 18/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2462 – accuracy: 0.9323 – val_loss: 0.2623 – val_accuracy: 0.9272
Epoch 19/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2452 – accuracy: 0.9325 – val_loss: 0.2634 – val_accuracy: 0.9267
Epoch 20/20
600/600 [==============================] – 1s 1ms/step – loss: 0.2443 – accuracy: 0.9326 – val_loss: 0.2635 – val_accuracy: 0.9278
The plots below show that the model fit is unlikely to improve significantly with further training.
Both the test loss and accuracy have flattened out.
In [14]:
plt.plot(history_lr.history[‘accuracy’], label=’Train’)
plt.plot(history_lr.history[‘val_accuracy’], label=’Test’)
plt.ylabel(‘Accuracy’)
plt.xlabel(‘Epoch’)
plt.title(‘Training and testing accuracy’)
plt.legend()
plt.show()
In [15]:
plt.plot(history_lr.history[‘loss’], label=’Train’)
plt.plot(history_lr.history[‘val_loss’], label=’Test’)
plt.ylabel(‘Loss’)
plt.xlabel(‘Epoch’)
plt.title(‘Training and testing loss’)
plt.legend()
plt.show()
3. Convolutional neural network (CNN)¶
Let’s see if we can improve on logistic regression using a CNN.
For the convolutional layers, we follow the “convolutional pyramid” design principle—i.e. successive layers have decreasing spatial dimensions, but increasing depth. The reduction in the spatial dimensions is achieved via max pooling.
After the convolutional layers, we add a densely-connected layer (effectively a logistic regression layer) which combines the higher-level features to make a classification.
We also make use of dropout (a regularization method whereby random units are removed from the network) to prevent overfitting.
Architecture overview¶
We describe the architecture in further detail below (figure generated here).
Number Layer type Specification Keras function
1 Convolutional 8 5×5 filters with a stride of 1 and a ReLU activation function Conv2D
2 Pooling Max pooling with a 2×2 filter and a stride of 2 (implies pooled regions do not overlap) MaxPooling2D
3 Convolutional 16 5×5 filters with a stride of 1 and a ReLU activation function Conv2D
4 Pooling Same specs as pooling layer #1 MaxPooling2D
5 Flatten Nil Flatten
6 Dropout Randomly drops a fraction $r$ of the input units. We set $r=0.5$. Dropout
7 Dense 10 units (one for each target class) with a softmax activation function. Dense
Exercise: Complete the code block below to instantiate the model in Keras.
Hint: you can check the documentation for a layer type (e.g. Conv2D) by entering ?layers.Conv2D.
In [17]:
cnn = keras.Sequential(
[
layers.Input((28, 28, 1)),
layers.Conv2D(8, (5, 5), activation=’relu’), # fill in
layers.MaxPooling2D((2, 2)), # fill in
layers.Conv2D(16, (5, 5), activation=’relu’), #fill in
layers.MaxPooling2D((2, 2)), # fill in
layers.Flatten(),
layers.Dropout(0.5),
layers.Dense(10, activation=’softmax’)
],
)
We can get a useful summary of the model architecture using the summary method, as shown below.
In [18]:
cnn.summary()
Model: “sequential_1”
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d (Conv2D) (None, 24, 24, 8) 208
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 12, 12, 8) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 8, 8, 16) 3216
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 4, 4, 16) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 256) 0
_________________________________________________________________
dropout (Dropout) (None, 256) 0
_________________________________________________________________
dense_1 (Dense) (None, 10) 2570
=================================================================
Total params: 5,994
Trainable params: 5,994
Non-trainable params: 0
_________________________________________________________________
Question: How many trainable parameters are there in the cnn model? How does this compare with the earlier lr model?
Answer: The cnn model has 5,994 parameters, while the lr model has 7,850 parameters. However, the lr model is more limited in the functions it can represent as it is a shallow (single-layer) network.
We prepare the cnn model for training using the same settings as for logistic regression.
In [19]:
cnn.compile(optimizer=’adam’,
loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False),
metrics=[‘accuracy’])
Training the cnn model takes roughly 20 times longer than training the lr model on a CPU.
You may like to set the number of epochs to a smaller number (e.g. epochs=10) if you don’t have much time to spare.
In [20]:
history_cnn = cnn.fit(train_images, train_labels, epochs=20, batch_size=100,
validation_data=(test_images, test_labels))
Epoch 1/20
600/600 [==============================] – 9s 14ms/step – loss: 0.5926 – accuracy: 0.8099 – val_loss: 0.1304 – val_accuracy: 0.9638
Epoch 2/20
600/600 [==============================] – 8s 14ms/step – loss: 0.1988 – accuracy: 0.9403 – val_loss: 0.0814 – val_accuracy: 0.9750
Epoch 3/20
600/600 [==============================] – 8s 14ms/step – loss: 0.1474 – accuracy: 0.9546 – val_loss: 0.0617 – val_accuracy: 0.9800
Epoch 4/20
600/600 [==============================] – 8s 13ms/step – loss: 0.1260 – accuracy: 0.9615 – val_loss: 0.0526 – val_accuracy: 0.9834
Epoch 5/20
600/600 [==============================] – 8s 13ms/step – loss: 0.1086 – accuracy: 0.9660 – val_loss: 0.0452 – val_accuracy: 0.9862
Epoch 6/20
600/600 [==============================] – 8s 13ms/step – loss: 0.0998 – accuracy: 0.9698 – val_loss: 0.0418 – val_accuracy: 0.9858
Epoch 7/20
600/600 [==============================] – 8s 14ms/step – loss: 0.0959 – accuracy: 0.9703 – val_loss: 0.0380 – val_accuracy: 0.9879
Epoch 8/20
600/600 [==============================] – 8s 13ms/step – loss: 0.0914 – accuracy: 0.9715 – val_loss: 0.0374 – val_accuracy: 0.9874
Epoch 9/20
600/600 [==============================] – 8s 14ms/step – loss: 0.0863 – accuracy: 0.9736 – val_loss: 0.0339 – val_accuracy: 0.9892
Epoch 10/20
600/600 [==============================] – 8s 14ms/step – loss: 0.0799 – accuracy: 0.9759 – val_loss: 0.0348 – val_accuracy: 0.9880
Epoch 11/20
600/600 [==============================] – 8s 13ms/step – loss: 0.0819 – accuracy: 0.9748 – val_loss: 0.0319 – val_accuracy: 0.9894
Epoch 12/20
600/600 [==============================] – 9s 16ms/step – loss: 0.0772 – accuracy: 0.9760 – val_loss: 0.0308 – val_accuracy: 0.9901
Epoch 13/20
600/600 [==============================] – 10s 16ms/step – loss: 0.0752 – accuracy: 0.9769 – val_loss: 0.0319 – val_accuracy: 0.9896
Epoch 14/20
600/600 [==============================] – 8s 13ms/step – loss: 0.0745 – accuracy: 0.9770 – val_loss: 0.0304 – val_accuracy: 0.9899
Epoch 15/20
600/600 [==============================] – 8s 13ms/step – loss: 0.0713 – accuracy: 0.9775 – val_loss: 0.0284 – val_accuracy: 0.9916
Epoch 16/20
600/600 [==============================] – 8s 13ms/step – loss: 0.0682 – accuracy: 0.9786 – val_loss: 0.0270 – val_accuracy: 0.9912
Epoch 17/20
600/600 [==============================] – 8s 14ms/step – loss: 0.0680 – accuracy: 0.9783 – val_loss: 0.0292 – val_accuracy: 0.9914
Epoch 18/20
600/600 [==============================] – 8s 14ms/step – loss: 0.0667 – accuracy: 0.9794 – val_loss: 0.0281 – val_accuracy: 0.9912
Epoch 19/20
600/600 [==============================] – 8s 14ms/step – loss: 0.0646 – accuracy: 0.9800 – val_loss: 0.0278 – val_accuracy: 0.9912
Epoch 20/20
600/600 [==============================] – 8s 13ms/step – loss: 0.0647 – accuracy: 0.9802 – val_loss: 0.0281 – val_accuracy: 0.9912
Let’s plot the accuracy and loss for each epoch, like we did for logistic regression.
In [21]:
plt.plot(history_cnn.history[‘accuracy’], label=’Train’)
plt.plot(history_cnn.history[‘val_accuracy’], label=’Test’)
plt.ylabel(‘Accuracy’)
plt.xlabel(‘Epoch’)
plt.title(‘Training and testing accuracy’)
plt.legend()
plt.show()
In [178]:
plt.plot(history_cnn.history[‘loss’], label=’Train’)
plt.plot(history_cnn.history[‘val_loss’], label=’Test’)
plt.ylabel(‘Loss’)
plt.xlabel(‘Epoch’)
plt.title(‘Training and testing loss’)
plt.legend()
plt.show()
Both the training accuracy and loss appear to have stabilised, so it’s unlikely the model will benefit from further epochs.
Question: Which model performs best for MNIST digit recognition?
Answer: The CNN achieves a test accuracy of around 0.99.
This is a significant improvement over logistic regression, which achieved a test accuracy of around 0.93.
This is to be expected as the CNN architecture naturally suited to vision applications—it takes advantage of local spatial coherence in images.
Bonus 1: Visualising filters (optional)¶
CNNs learn filters that extract salient features from images, as they pass through the network.
To understand how a CNN operates, it can be interesting to visualise the filters and their activations on input images.
First, let’s examine the layers in the cnn model.
In [23]:
cnn.layers
Out[23]:
[
We’ll take a look at the filters in the first convolutional layer, which is at index 0 in the list of layers.
The get_weights method extracts the weights as numpy arrays.
In [24]:
filters, biases = cnn.layers[0].get_weights()
print(“filters.shape:”, filters.shape)
filters.shape: (5, 5, 1, 8)
Below we visualise the 8 single-channel filters in the first convolutional layer.
It’s often possible to recognise filters that detect different types of strokes, e.g. diagonal, vertical etc.
The specific filters learned may vary due to stochasticity in the weight initialisation and optimisation algorithm.
In [25]:
fig, axes = plt.subplots(figsize=(8,4), nrows=2,ncols=4)
for i, ax in enumerate(axes.flatten()):
ax.imshow(filters[:,:,0,i], cmap=’binary’)
ax.set_title(“Filter {}”.format(i+1))
ax.set_xticks([])
ax.set_yticks([])
plt.suptitle(‘Filters in first CNN layer’)
plt.show()
We can also improve our understanding by examining the intermediate outputs of internal layers of the CNN.
In the code block below, we demonstrate how to access the output of the first convolutional layer (with index 0). [Aside: you may like to visualise the output of deeper layers by changing the index.]
Specifically, we define a new model that reuses the input of the cnn model, but truncates the output to the 0th layer.
In [26]:
cnn_internal = keras.Model(inputs=cnn.input, outputs=cnn.layers[0].output)
We then pass a single training example through the truncated model.
In [27]:
# Get the first example from the training set
training_example = np.expand_dims(train_images[0], 0)
plt.imshow(training_example[0,:,:,0], cmap=’binary’)
plt.axis(‘off’)
# Pass through the `cnn_internal` model
activations = cnn_internal.predict(training_example)
After passing an image through the first layer, we end up with an array of 8 24×24 single-channel images (one image for each filter).
We can see that each filter is active on different parts of the image.
For example in this particular run, the first filter is activating on diagonal strokes at the bottom-left edge.
In [28]:
fig, axes = plt.subplots(figsize=(8,4), nrows=2,ncols=4)
for i, ax in enumerate(axes.flatten()):
ax.imshow(activations[0,:,:,i], cmap=’binary’)
ax.set_title(“Filter {}”.format(i+1))
ax.set_xticks([])
ax.set_yticks([])
plt.suptitle(‘Activations in first CNN layer’)
plt.show()
Bonus 2: PyTorch (optional)¶
We have provided here the PyTorch code similar to Keras above. If you prefer pytorch, you can choose to complete this part.
In [107]:
import torch
from torch.utils import data
import numpy as np
import time, os
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torchvision.datasets as datasets
from torchvision.transforms import ToTensor
PyTorch accepts tensor as input, and the input shape of images for PyTorch is different from Keras. We can simply download the dataset from torchvision so we do not need to convert the dataset. You can try to print the shape of a single image and compare it to Keras one to see the difference.
In [ ]:
mnist_trainset = datasets.MNIST(root=’./data’, train=True, download=True, transform=ToTensor())
mnist_testset = datasets.MNIST(root=’./data’, train=False, download=True, transform=ToTensor())
train_loader = torch.utils.data.DataLoader(mnist_trainset, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(mnist_testset, batch_size=100, shuffle=False)
To implement same CNN model in PyTorch, we start with inheriting from torch.nn.Module, which allows us to access common NN-specific functionality, then:
Implement the constructor __init__(self, …). Here we define all network parameters.
Override the forward method forward(self, x). This accepts the input tensor x and returns our desired model output.
In [164]:
OUT_C1 = 8
OUT_C2 = 16
class BasicConvNet(nn.Module):
def __init__(self, out_c1, out_c2, dense_units, n_classes=10):
super(BasicConvNet, self).__init__()
self.conv1 = nn.Conv2d(in_channels=1, out_channels=out_c1, kernel_size=5)
self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
self.conv2 = nn.Conv2d(in_channels=out_c1, out_channels=out_c2, kernel_size=5)
self.dropout = nn.Dropout(p=0.5)
self.logits = nn.Linear(16 * 4 * 4, n_classes)
def forward(self, x):
x = self.pool(F.relu(self.conv1(x)))
x = self.pool(F.relu(self.conv2(x)))
x = x.view(-1, 16 * 4 * 4)
out = self.logits(self.dropout(x))
return out
cnn_torch = BasicConvNet(OUT_C1, OUT_C2, DENSE_UNITS)
We’ll write a convenient train and test function that allows us to seamlessly substitute different models – this is essential for fast iteration during development in The Real World. The basic structure is identical to what you encountered last week.
In [169]:
def test(model, criterion, test_loader):
test_loss = 0.
test_preds, test_labels = list(), list()
for i, data in enumerate(test_loader):
x, labels = data
with torch.no_grad():
logits = model(x) # Compute scores
predictions = torch.argmax(logits, dim=1)
test_loss += criterion(input=logits, target=labels).item()
test_preds.append(predictions)
test_labels.append(labels)
test_preds = torch.cat(test_preds)
test_labels = torch.cat(test_labels)
test_accuracy = torch.eq(test_preds, test_labels).float().mean().item()
print(‘[TEST] Mean loss {:.4f} | Accuracy {:.4f}’.format(test_loss/len(test_loader), test_accuracy))
return test_loss, test_accuracy
def train(model, train_loader, test_loader, optimizer, n_epochs=10):
“””
Generic training loop for supervised multiclass learning
“””
LOG_INTERVAL = 250
training_loss_history, training_accuracy_history = list(), list()
test_loss_history, test_accuracy_history = list(), list()
start_time = time.time()
criterion = torch.nn.CrossEntropyLoss()
for epoch in range(n_epochs): # Loop over training dataset `n_epochs` times
epoch_loss = 0.
for i, data in enumerate(train_loader): # Loop over elements in training set
x, labels = data
logits = model(x)
predictions = torch.argmax(logits, dim=1)
train_acc = torch.mean(torch.eq(predictions, labels).float()).item()
loss = criterion(input=logits, target=labels)
loss.backward() # Backward pass (compute parameter gradients)
optimizer.step() # Update weight parameter using SGD
optimizer.zero_grad() # Reset gradients to zero for next iteration
# ============================================================================
# You can safely ignore the boilerplate code below – just reports metrics over
# training and test sets
training_loss_history.append(loss.item())
training_accuracy_history.append(train_acc)
epoch_loss += loss.item()
if i % LOG_INTERVAL == 0: # Log training stats
deltaT = time.time() – start_time
mean_loss = epoch_loss / (i+1)
print(‘[TRAIN] Epoch {} [{}/{}]| Mean loss {:.4f} | Train accuracy {:.5f} | Time {:.2f} s’.format(epoch,
i, len(train_loader), mean_loss, train_acc, deltaT))
print(‘Epoch complete! Mean loss: {:.4f}’.format(epoch_loss/len(train_loader)))
test_loss, test_accuracy = test(model, criterion, test_loader)
test_loss_history.append(test_loss)
test_accuracy_history.append(test_accuracy)
return training_loss_history, training_accuracy_history, test_loss_history, test_accuracy_history
Load the model parameters into our selected optimizer and we’re good to go. We’ll include a momentum term in the standard SGD update rule to accelerate convergence. Intiutively, this helps the optimizer ignore parameter updates in suboptimal directions, possibly due to noise in the model.
In [170]:
optimizer = torch.optim.Adam(cnn_torch.parameters())
training_loss_history, training_accuracy_history,test_loss_history,test_accuracy_history = \
train(cnn_torch, train_loader, test_loader, optimizer)
[TRAIN] Epoch 0 [0/600]| Mean loss 0.2743 | Train accuracy 0.95000 | Time 0.03 s
[TRAIN] Epoch 0 [250/600]| Mean loss 0.1726 | Train accuracy 0.89000 | Time 5.10 s
[TRAIN] Epoch 0 [500/600]| Mean loss 0.1728 | Train accuracy 0.97000 | Time 10.18 s
Epoch complete! Mean loss: 0.1693
[TEST] Mean loss 0.1416 | Accuracy 0.9592
[TRAIN] Epoch 1 [0/600]| Mean loss 0.1278 | Train accuracy 0.96000 | Time 13.29 s
[TRAIN] Epoch 1 [250/600]| Mean loss 0.1533 | Train accuracy 0.98000 | Time 18.43 s
[TRAIN] Epoch 1 [500/600]| Mean loss 0.1521 | Train accuracy 0.95000 | Time 23.81 s
Epoch complete! Mean loss: 0.1500
[TEST] Mean loss 0.1335 | Accuracy 0.9597
[TRAIN] Epoch 2 [0/600]| Mean loss 0.1631 | Train accuracy 0.96000 | Time 27.16 s
[TRAIN] Epoch 2 [250/600]| Mean loss 0.1445 | Train accuracy 0.97000 | Time 32.68 s
[TRAIN] Epoch 2 [500/600]| Mean loss 0.1400 | Train accuracy 0.92000 | Time 38.30 s
Epoch complete! Mean loss: 0.1384
[TEST] Mean loss 0.1148 | Accuracy 0.9645
[TRAIN] Epoch 3 [0/600]| Mean loss 0.0832 | Train accuracy 0.97000 | Time 41.77 s
[TRAIN] Epoch 3 [250/600]| Mean loss 0.1287 | Train accuracy 0.96000 | Time 47.40 s
[TRAIN] Epoch 3 [500/600]| Mean loss 0.1303 | Train accuracy 0.91000 | Time 53.04 s
Epoch complete! Mean loss: 0.1294
[TEST] Mean loss 0.1100 | Accuracy 0.9662
[TRAIN] Epoch 4 [0/600]| Mean loss 0.0691 | Train accuracy 0.97000 | Time 56.57 s
[TRAIN] Epoch 4 [250/600]| Mean loss 0.1211 | Train accuracy 0.96000 | Time 62.27 s
[TRAIN] Epoch 4 [500/600]| Mean loss 0.1199 | Train accuracy 0.98000 | Time 67.88 s
Epoch complete! Mean loss: 0.1206
[TEST] Mean loss 0.1117 | Accuracy 0.9666
[TRAIN] Epoch 5 [0/600]| Mean loss 0.1101 | Train accuracy 0.96000 | Time 71.41 s
[TRAIN] Epoch 5 [250/600]| Mean loss 0.1109 | Train accuracy 0.98000 | Time 76.97 s
[TRAIN] Epoch 5 [500/600]| Mean loss 0.1137 | Train accuracy 0.98000 | Time 82.57 s
Epoch complete! Mean loss: 0.1148
[TEST] Mean loss 0.1051 | Accuracy 0.9674
[TRAIN] Epoch 6 [0/600]| Mean loss 0.0925 | Train accuracy 0.98000 | Time 85.99 s
[TRAIN] Epoch 6 [250/600]| Mean loss 0.1125 | Train accuracy 0.99000 | Time 91.41 s
[TRAIN] Epoch 6 [500/600]| Mean loss 0.1092 | Train accuracy 0.95000 | Time 96.93 s
Epoch complete! Mean loss: 0.1080
[TEST] Mean loss 0.1069 | Accuracy 0.9688
[TRAIN] Epoch 7 [0/600]| Mean loss 0.0798 | Train accuracy 0.95000 | Time 100.26 s
[TRAIN] Epoch 7 [250/600]| Mean loss 0.1016 | Train accuracy 0.99000 | Time 105.63 s
[TRAIN] Epoch 7 [500/600]| Mean loss 0.1040 | Train accuracy 0.97000 | Time 111.08 s
Epoch complete! Mean loss: 0.1041
[TEST] Mean loss 0.0956 | Accuracy 0.9707
[TRAIN] Epoch 8 [0/600]| Mean loss 0.0698 | Train accuracy 0.96000 | Time 114.41 s
[TRAIN] Epoch 8 [250/600]| Mean loss 0.1023 | Train accuracy 0.97000 | Time 119.74 s
[TRAIN] Epoch 8 [500/600]| Mean loss 0.0986 | Train accuracy 0.99000 | Time 125.12 s
Epoch complete! Mean loss: 0.1005
[TEST] Mean loss 0.0901 | Accuracy 0.9721
[TRAIN] Epoch 9 [0/600]| Mean loss 0.0729 | Train accuracy 0.99000 | Time 128.44 s
[TRAIN] Epoch 9 [250/600]| Mean loss 0.0972 | Train accuracy 0.95000 | Time 133.74 s
[TRAIN] Epoch 9 [500/600]| Mean loss 0.0994 | Train accuracy 0.96000 | Time 139.19 s
Epoch complete! Mean loss: 0.0987
[TEST] Mean loss 0.0889 | Accuracy 0.9727
image/svg+xml
Convolution 1
Max pooling 1
Convolution 2
Max pooling 1
Flatten
Dense