COSC2779LabExercises_W6_1
COSC 2779 | Deep Learning
Week 6 Lab Exercises: **Practical methodology**
Introduction
This lab is aimed at understanding how to develop a CNN from scratch, based on solving the CIFAR-10 classification task. During this lab you will:
Learn how to establish a base model.
Debug the base model and setup instrumentation.
Do manual hyperparameter tuning.
This notebook is designed to run on Google Colab. If you would like to run it on your local machine, make sure that you have TensorFlow 2.x installed.
In [ ]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
AUTOTUNE = tf.data.experimental.AUTOTUNE
import numpy as np
import pandas as pd
import tensorflow_datasets as tfds
import pathlib
import shutil
import tempfile
from IPython import display
from matplotlib import pyplot as plt
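If you are running locally, it may help to confirm the installed TensorFlow version before continuing (a minimal check; the notebook assumes TF 2.x).
In [ ]:
# Quick sanity check of the installed TensorFlow version
print("TensorFlow version:", tf.__version__)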
Setting up Instrumentation
We can use TensorBoard to view the learning curves and the activation and weight histograms. Let's first set it up.
In [ ]:
logdir = pathlib.Path(tempfile.mkdtemp())/"tensorboard_logs"
shutil.rmtree(logdir, ignore_errors=True)

# Load the TensorBoard notebook extension
%load_ext tensorboard

# Open an embedded TensorBoard viewer
%tensorboard --logdir {logdir}/models
We can also write our own function to plot a model's training history once training has completed.
In [ ]:
from itertools import cycle

def plotter(history_hold, metric='binary_crossentropy', ylim=[0.0, 1.0], figsize=(6,6)):
    plt.figure(figsize=figsize)
    cycol = cycle('bgrcmk')
    for name, item in history_hold.items():
        y_train = item.history[metric]
        y_val = item.history['val_' + metric]
        x_train = np.arange(0, len(y_val))
        c = next(cycol)
        plt.plot(x_train, y_train, c + '-', label=name + '_train')
        plt.plot(x_train, y_val, c + '--', label=name + '_val')
    plt.legend()
    plt.xlim([1, max(plt.xlim())])
    plt.ylim(ylim)
    plt.xlabel('Epoch')
    plt.ylabel(metric)
    plt.grid(True)
Dataset
CIFAR is an acronym that stands for the Canadian Institute For Advanced Research and the CIFAR-10 dataset was developed along with the CIFAR-100 dataset by researchers at the CIFAR institute.
The dataset is comprised of 60,000 32×32 pixel color photographs of objects from 10 classes, such as frogs, birds, cats, ships, etc. The class labels and their standard associated integer values are listed below.
0: airplane
1: automobile
2: bird
3: cat
4: deer
5: dog
6: frog
7: horse
8: ship
9: truck
These are very small images, much smaller than a typical photograph, and the dataset was intended for computer vision research.
CIFAR-10 is a well-understood dataset and widely used for benchmarking computer vision algorithms in the field of machine learning. The problem is “solved.” It is relatively straightforward to achieve 80% classification accuracy. Top performance on the problem is achieved by deep learning convolutional neural networks with a classification accuracy above 90% on the test dataset.
Performance metric: Accuracy
Target performance: 90% Accuracy
Discuss why the above two selections are appropriate for the problem
Loading the dataset and Data Exploration
First load the CIFAR-10 dataset using the dataset API.
In [ ]:
import tensorflow_datasets as tfds
import tensorflow as tf
dataset, info = tfds.load('cifar10', as_supervised=True, with_info=True)
test_dataset, train_dataset = dataset['test'], dataset['train']

num_train_examples = info.splits['train'].num_examples
num_test_examples = info.splits['test'].num_examples
print(num_train_examples, num_test_examples)
Plot some images
In [ ]:
class_names = {0: 'airplane', 1: 'automobile', 2: 'bird', 3: 'cat', 4: 'deer', 5: 'dog', 6: 'frog', 7: 'horse', 8: 'ship', 9: 'truck'}

plt.figure(figsize=(10,5))
i = 1
for image, label in test_dataset.shuffle(100).take(10):
    plt.subplot(2, 5, i)
    plt.imshow(image)
    plt.title(class_names[label.numpy()])
    i = i + 1
Plot the histogram of labels to check for class imbalance.
In [ ]:
lab_hold = list()
for img, lab in train_dataset.take(num_train_examples):  # iterate the training split to check class balance
    lab_hold.append(lab.numpy())
plt.hist(lab_hold, bins=np.arange(0, 11), rwidth=.5, align='left')
plt.ylabel('Number of Images')
plt.xlabel('Class Label')
plt.show()
Discuss what other forms of analysis are required to inform your model design. You may need to revisit this section once you have started development.
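As one example of further exploration, here is a minimal sketch (using the train_dataset loaded above; the 1,000-image sample size is arbitrary) that computes per-channel pixel statistics, which could later inform input normalisation choices.
In [ ]:
# Sketch: per-channel mean/std over a sample of training images (raw pixel values in [0, 255])
sample = np.stack([img.numpy() for img, _ in train_dataset.take(1000)])
print('Per-channel mean:', sample.mean(axis=(0, 1, 2)))
print('Per-channel std :', sample.std(axis=(0, 1, 2)))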
Setup Data Loaders
We will set up a data loader using the Dataset API. You can also use the Keras data generator for this stage. Use the knowledge from last week and pick the data loading mechanism that best suits your style of coding.
We are going to write our own augmentation functions. Unfortunately, TensorFlow does not provide a random rotation function under tf.image, so we are going to use TensorFlow Addons for this.
In [ ]:
!pip install tfa-nightly
Data augmentation functions
In [ ]:
import tensorflow_addons as tfa
import numpy as np

@tf.function
def rotate_tf(image, ang_deg=15):
    random_angles = tf.random.uniform(shape=(), minval=-np.deg2rad(ang_deg), maxval=np.deg2rad(ang_deg))
    return tfa.image.rotate(image, random_angles)

@tf.function
def convert(image, label):
    image = tf.image.convert_image_dtype(image, tf.float32)   # Cast and normalize the image to [0,1]
    return image, label

@tf.function
def augment(image, label):
    image, label = convert(image, label)
    image = tf.image.resize_with_crop_or_pad(image, 40, 40)   # Pad to 40x40 (4 pixels on each side)
    image = rotate_tf(image, 10)                              # Random rotation in [-10, 10] degrees
    image = tf.image.random_crop(image, size=[32, 32, 3])     # Random crop back to 32x32
    image = tf.image.random_flip_left_right(image)            # Random horizontal flip
    image = tf.image.random_brightness(image, max_delta=0.1)  # Random brightness
    return image, label
What are the other types of data augmentation you can use?
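As a hint, tf.image provides several other random photometric transforms. A minimal sketch (not wired into the pipelines below; the parameter values are illustrative only) could look like this.
In [ ]:
# Sketch of additional augmentations (not used in the rest of the lab)
@tf.function
def extra_augment(image, label):
    image, label = convert(image, label)
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)    # random contrast
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)  # random saturation
    image = tf.image.random_hue(image, max_delta=0.05)               # small random hue shift
    return image, label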
Generate training (with and without augmentation) and validation batches.
In [ ]:
BATCH_SIZE = 128

# Set aside some train data as a validation set to do hyperparameter tuning
num_val_examples = 10000
num_train_examples = num_train_examples - num_val_examples

# Train set, no augmentation
train_batches = (
    train_dataset
    .skip(num_val_examples)
    .take(num_train_examples)
    .cache()
    .shuffle(num_train_examples)
    .repeat()
    .map(convert, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

# Validation set, no augmentation
val_batches = (
    train_dataset
    .take(num_val_examples)
    .cache()
    .map(convert, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
)

# Train set with augmentation
augmented_train_batches = (
    train_dataset
    .skip(num_val_examples)
    .take(num_train_examples)
    .cache()
    .shuffle(num_train_examples)
    .repeat()
    .map(augment, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

# Full train set (train + validation) with augmentation
full_augmented_train_batches = (
    train_dataset
    .take(num_train_examples + num_val_examples)
    .cache()
    .shuffle(num_train_examples)
    .repeat()
    .map(augment, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)

# Test set, no augmentation: final evaluation only
test_batches = (
    test_dataset
    .take(num_test_examples)
    .cache()
    .map(convert, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
)

# Tiny train set (128 images) for debugging only
train_batches_tiny = (
    train_dataset
    .skip(num_val_examples)
    .take(128)
    .cache()
    .shuffle(num_train_examples)
    .repeat()
    .map(convert, num_parallel_calls=AUTOTUNE)
    .batch(BATCH_SIZE)
    .prefetch(AUTOTUNE)
)
Discuss the following:
What is the purpose of the function cache()?
What will happen if repeat() was not used?
What is the function of prefetch()?
Let's take a few images from the augmented data stream and check that the output is as expected.
In [ ]:
plt.figure(figsize=(10,5))
i = 1
for image, label in augmented_train_batches.shuffle(100).take(10):
    plt.subplot(2, 5, i)
    plt.imshow(image[0, :])
    plt.title(class_names[label[0].numpy()])
    i = i + 1
(Optional) Writing a custom CNN layer
This stage is not strictly necessary; you can build the model using just the Keras sequential API. The code below is more advanced and will make the final code more readable.
**TODO:** If you are not using this section, you need to write the function `get_resnet_model` yourself using the knowledge from the week 2-5 labs.
When building networks like ResNet, GoogLeNet and VGG, we tend to repeat a block with the same structure over and over again to build a deep model.
While we can use the sequential/functional API with the existing blocks in tensorflow/keras to build such a large model, the code becomes unreadable as the network size increases.
A solution is to create a custom layer for the local structure, which can then be repeated.
Below we create such a local block: the residual block from ResNet.
In [ ]:
class ResidualBlock(tf.keras.layers.Layer):
    # Initialize components of the block
    def __init__(self, filter_num, stride=1, reg_lambda=0.0):
        super(ResidualBlock, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filters=filter_num,
                                            kernel_size=(3, 3),
                                            strides=stride,
                                            kernel_initializer="he_normal",
                                            kernel_regularizer=tf.keras.regularizers.l2(reg_lambda),
                                            padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization(momentum=.4)
        self.conv2 = tf.keras.layers.Conv2D(filters=filter_num,
                                            kernel_size=(3, 3),
                                            strides=1,
                                            kernel_initializer="he_normal",
                                            kernel_regularizer=tf.keras.regularizers.l2(reg_lambda),
                                            padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization(momentum=.4)
        if stride != 1:
            self.downsample = tf.keras.Sequential()
            self.downsample.add(tf.keras.layers.Conv2D(filters=filter_num,
                                                       kernel_size=(1, 1),
                                                       kernel_initializer="he_normal",
                                                       kernel_regularizer=tf.keras.regularizers.l2(reg_lambda),
                                                       strides=stride))
            self.downsample.add(tf.keras.layers.BatchNormalization(momentum=.4))
        else:
            self.downsample = lambda x: x

    # Define the forward function
    def call(self, inputs, training=None, **kwargs):
        residual = self.downsample(inputs)
        x = self.conv1(inputs)
        x = self.bn1(x, training=training)
        x = tf.nn.relu(x)
        x = self.conv2(x)
        x = self.bn2(x, training=training)
        output = tf.nn.relu(tf.keras.layers.add([residual, x]))
        return output

    def get_config(self):
        config = super().get_config().copy()
        config.update({
            'conv1': self.conv1,
            'bn1': self.bn1,
            'conv2': self.conv2,
            'bn2': self.bn2,
            'downsample': self.downsample,
        })
        return config
**TODO:** Try to develop the bottleneck residual block yourself later.
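If you want a starting point for this TODO, one possible sketch of a bottleneck residual block (1x1 reduce, 3x3, 1x1 expand), written in the same style as ResidualBlock above, is shown below. Treat it as a starting point rather than a reference solution; the expansion factor and use of a projection shortcut are design choices.
In [ ]:
# Sketch of a bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand
class BottleneckBlock(tf.keras.layers.Layer):
    def __init__(self, filter_num, stride=1, expansion=4):
        super(BottleneckBlock, self).__init__()
        self.conv1 = tf.keras.layers.Conv2D(filter_num, (1, 1), strides=1, padding="same")
        self.bn1 = tf.keras.layers.BatchNormalization()
        self.conv2 = tf.keras.layers.Conv2D(filter_num, (3, 3), strides=stride, padding="same")
        self.bn2 = tf.keras.layers.BatchNormalization()
        self.conv3 = tf.keras.layers.Conv2D(filter_num * expansion, (1, 1), strides=1, padding="same")
        self.bn3 = tf.keras.layers.BatchNormalization()
        # Projection shortcut so the residual matches the output shape
        self.downsample = tf.keras.Sequential([
            tf.keras.layers.Conv2D(filter_num * expansion, (1, 1), strides=stride),
            tf.keras.layers.BatchNormalization()])

    def call(self, inputs, training=None, **kwargs):
        residual = self.downsample(inputs, training=training)
        x = tf.nn.relu(self.bn1(self.conv1(inputs), training=training))
        x = tf.nn.relu(self.bn2(self.conv2(x), training=training))
        x = self.bn3(self.conv3(x), training=training)
        return tf.nn.relu(x + residual)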
Setting up the Models & attaching instrumentation
Write a function to generate a model with the desired number of residual blocks.
In [ ]:
def get_resnet_model(filters, block_size, reg_lambda=0.0, fdropout=False):
    model = tf.keras.Sequential()
    # Initial segment
    model.add(tf.keras.layers.Conv2D(filters=64,
                                     kernel_size=(3, 3),
                                     strides=1,
                                     kernel_initializer="he_normal",
                                     kernel_regularizer=tf.keras.regularizers.l2(reg_lambda),
                                     padding="same", input_shape=(32, 32, 3)))
    model.add(tf.keras.layers.BatchNormalization(momentum=.4))
    # Stack of residual blocks
    for nFilters, nBlocks in zip(filters, block_size):
        model.add(ResidualBlock(nFilters, stride=2, reg_lambda=reg_lambda))
        for _ in range(1, nBlocks):
            model.add(ResidualBlock(nFilters, stride=1, reg_lambda=reg_lambda))
    # Final part
    model.add(tf.keras.layers.GlobalAveragePooling2D())
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(10,
                                    activation=tf.nn.softmax,
                                    kernel_regularizer=tf.keras.regularizers.l2(reg_lambda),
                                    kernel_initializer="he_normal"))
    return model
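Note that the fdropout argument above is not used in the function body. If you want to experiment with dropout as an additional regulariser, a hypothetical sketch of how the classifier head could be built with an optional Dropout layer is shown below (the helper name and drop rate are illustrative, not part of the lab).
In [ ]:
# Sketch: optional dropout before the classifier head (hypothetical helper, not used below)
def add_classifier_head(model, reg_lambda=0.0, fdropout=False, drop_rate=0.3):
    model.add(tf.keras.layers.GlobalAveragePooling2D())
    model.add(tf.keras.layers.Flatten())
    if fdropout:
        model.add(tf.keras.layers.Dropout(drop_rate))  # drop rate is a tunable hyperparameter
    model.add(tf.keras.layers.Dense(10, activation=tf.nn.softmax,
                                    kernel_regularizer=tf.keras.regularizers.l2(reg_lambda),
                                    kernel_initializer="he_normal"))
    return model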
Define the callbacks and optimizer.
In [ ]:
from tensorflow.keras.optimizers import Adam, SGD

epochs = 100
STEPS_PER_EPOCH = num_train_examples//BATCH_SIZE

lr = 0.001
lr_schedule = tf.keras.optimizers.schedules.InverseTimeDecay(
    lr,
    decay_steps=STEPS_PER_EPOCH*1000,
    decay_rate=10,
    staircase=False)

optimizer = Adam(learning_rate=lr_schedule)
# lr = 0.01
# optimizer = SGD(learning_rate=lr, momentum=0.9)

m_histories = {}

def get_callbacks(name, early_stop=True):
    if early_stop:
        return [
            tf.keras.callbacks.EarlyStopping(monitor='val_SparseCategoricalCrossentropy', patience=25),
            tf.keras.callbacks.TensorBoard(logdir/name, histogram_freq=60, embeddings_freq=60),
        ]
    else:
        return [tf.keras.callbacks.TensorBoard(logdir/name)]
Plot the learning rate policy.
In [ ]:
step = np.linspace(0, 100000)
lr = lr_schedule(step)
plt.figure(figsize=(8, 6))
plt.plot(step/STEPS_PER_EPOCH, lr)
plt.ylim([0, max(plt.ylim())])
plt.xlabel('Epoch')
_ = plt.ylabel('Learning Rate')
Developing the Models
Now let's start developing the model.
Tiny model
Let's start with a very small model with just one residual block. It is highly probable that this model will underfit. However, it is a good sanity check to see whether the code we have developed so far is working.
Get a model, compile and train it.
In [ ]:
tiny_res_net = get_resnet_model([64,], [1,])

tiny_res_net.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                     metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])

m_histories['resnet_tiny'] = tiny_res_net.fit(train_batches,
                                              epochs=epochs,
                                              steps_per_epoch=num_train_examples//BATCH_SIZE,
                                              validation_data=val_batches,
                                              validation_steps=num_val_examples//BATCH_SIZE,
                                              verbose=0,
                                              callbacks=get_callbacks('models/resnet_tiny'))

plotter(m_histories, ylim=[0.0, 2.5], metric='SparseCategoricalCrossentropy')
Tiny Dataset
Check whether the model can overfit a tiny dataset, to see if there are any coding errors.
In [ ]:
tiny_res_net2 = get_resnet_model([64,], [1,])

tiny_res_net2.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                      metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])

m_histories['resnet_tiny_dataset'] = tiny_res_net2.fit(train_batches_tiny,
                                                       epochs=epochs,
                                                       steps_per_epoch=num_train_examples//BATCH_SIZE,
                                                       validation_data=val_batches,
                                                       validation_steps=num_val_examples//BATCH_SIZE,
                                                       verbose=0,
                                                       callbacks=get_callbacks('models/resnet_tiny_dataset'))

plotter(m_histories, ylim=[0.0, 5.0], metric='SparseCategoricalCrossentropy')
What do you observe: high bias or high variance?
What solutions can be tried if your model is showing high bias (underfitting)?
Large Model (Base Model)
A solution for high bias is to increase the capacity of the model. This can be done by adding more depth (more residual blocks).
Let's now build a larger model with three groups of residual blocks, each containing 3 residual blocks.
Get a model, compile and train it.
In [ ]:
b_histories = {}
In [ ]:
large_res_net = get_resnet_model([64, 128, 256], [3, 3, 3])

large_res_net.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                      metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])

b_histories['resnet_large'] = large_res_net.fit(train_batches,
                                                epochs=epochs,
                                                steps_per_epoch=num_train_examples//BATCH_SIZE,
                                                validation_data=val_batches,
                                                validation_steps=num_val_examples//BATCH_SIZE,
                                                verbose=0,
                                                callbacks=get_callbacks('models/resnet_large'))

plotter(b_histories, ylim=[0.0, 2.5], metric='SparseCategoricalCrossentropy')
What do you observe now: high bias or high variance?
What are the options available when the model is showing high variance?
Data Augmentation
There are several options when the model is showing high variance:
Get more data (not feasible here).
Data augmentation.
Regularization: weight penalty.
Reduce model capacity by changing the structure.
Let's start with data augmentation. Here we will train the previous model with data augmentation.
In [ ]:
large_res_net_aug = get_resnet_model([64, 128, 256], [3, 3, 3])

large_res_net_aug.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                          metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])

b_histories['resnet_large_aug'] = large_res_net_aug.fit(augmented_train_batches,
                                                        epochs=epochs,
                                                        steps_per_epoch=num_train_examples//BATCH_SIZE,
                                                        validation_data=val_batches,
                                                        validation_steps=num_val_examples//BATCH_SIZE,
                                                        verbose=0,
                                                        callbacks=get_callbacks('models/resnet_large_aug'))

plotter(b_histories, ylim=[0.0, 2.5], metric='SparseCategoricalCrossentropy', figsize=(10,10))
What do you observe: high bias or high variance?
Regularization
Let's apply some more techniques to reduce high variance.
In [ ]:
large_res_net_reg = get_resnet_model([64, 128, 256], [3, 3, 3], reg_lambda=0.001)

large_res_net_reg.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                          metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])

b_histories['resnet_large_reg'] = large_res_net_reg.fit(augmented_train_batches,
                                                        epochs=epochs,
                                                        steps_per_epoch=num_train_examples//BATCH_SIZE,
                                                        validation_data=val_batches,
                                                        validation_steps=num_val_examples//BATCH_SIZE,
                                                        verbose=0,
                                                        callbacks=get_callbacks('models/resnet_large_reg'))

plotter(b_histories, ylim=[0.0, 2.5], metric='SparseCategoricalCrossentropy', figsize=(10,10))
**TODO:** Select the best value for the hyperparameter lambda.
In [ ]:
h_histories = {}

lambda_vals = [0.05, 0.01, 0.005, 0.001]
# lambda_vals = [0.005, 0.002, 0.001, 0.0005, 0.0002, 0.0001]

for reg_lambda in lambda_vals:
    large_res_net_reg_ = get_resnet_model([64, 128, 256], [3, 3, 3], reg_lambda=reg_lambda)
    large_res_net_reg_.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                               metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])
    h_histories['resnet_large_reg' + '_h' + str(reg_lambda)] = large_res_net_reg_.fit(augmented_train_batches,
                                                                                      epochs=epochs,
                                                                                      steps_per_epoch=num_train_examples//BATCH_SIZE,
                                                                                      validation_data=val_batches,
                                                                                      validation_steps=num_val_examples//BATCH_SIZE,
                                                                                      verbose=0,
                                                                                      callbacks=get_callbacks('models/resnet_large_reg' + '_h1' + str(reg_lambda), early_stop=True))
In [ ]:
plt.figure(figsize=(10,5))
metric = 'SparseCategoricalCrossentropy'
l_train = list()
l_val = list()
for reg_lambda in lambda_vals:
    l_train.append(h_histories['resnet_large_reg' + '_h' + str(reg_lambda)].history[metric][-1])
    l_val.append(h_histories['resnet_large_reg' + '_h' + str(reg_lambda)].history['val_' + metric][-1])
plt.plot(lambda_vals, l_train, 'ro', label='Train')
plt.plot(lambda_vals, l_val, 'bs', label='Validation')
plt.xlabel('Lambda', fontsize=14)
plt.ylabel('Categorical Crossentropy', fontsize=14)
plt.legend()
plt.show()
Have you achieved the target metric value? What options would you like to try?
Huge Model
In [ ]:
huge_res_net_reg = get_resnet_model([64, 128, 256, 512], [6, 6, 6, 6], reg_lambda=0.005)
# huge_res_net_reg.summary()

huge_res_net_reg.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                         metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])

b_histories['resnet_huge_reg'] = huge_res_net_reg.fit(augmented_train_batches,
                                                      epochs=epochs,
                                                      steps_per_epoch=num_train_examples//BATCH_SIZE,
                                                      validation_data=val_batches,
                                                      validation_steps=num_val_examples//BATCH_SIZE,
                                                      verbose=0,
                                                      callbacks=get_callbacks('models/resnet_huge_reg'))

plotter(b_histories, ylim=[0.0, 2.5], metric='SparseCategoricalCrossentropy', figsize=(10,10))
**TODO:** Select the best value for the hyperparameter lambda.
**TODO:** Train the best model you obtained (with the selected hyperparameters) on the entire training dataset.
In [ ]:
final_model = get_resnet_model([64, 128, 256], [3, 3, 3], reg_lambda=0.005)

final_model.compile(optimizer=optimizer, loss="sparse_categorical_crossentropy",
                    metrics=[tf.losses.SparseCategoricalCrossentropy(from_logits=False, name='SparseCategoricalCrossentropy'), 'accuracy'])

b_histories['final_model'] = final_model.fit(full_augmented_train_batches,
                                             epochs=epochs,
                                             steps_per_epoch=(num_train_examples+num_val_examples)//BATCH_SIZE,
                                             validation_data=test_batches,
                                             validation_steps=num_test_examples//BATCH_SIZE,
                                             verbose=0,
                                             callbacks=get_callbacks('models/final_model'))

plotter(b_histories, ylim=[0.0, 2.5], metric='SparseCategoricalCrossentropy', figsize=(10,10))
Testing models
In [ ]:
print(tiny_res_net.evaluate(test_batches))
print(large_res_net.evaluate(test_batches))
print(large_res_net_aug.evaluate(test_batches))
print(large_res_net_reg.evaluate(test_batches))
print(huge_res_net_reg.evaluate(test_batches))
print(final_model.evaluate(test_batches))
Have you achieved the desired performance?
If not, what options would you want to consider?
Plotting results
We would like to plot some results on the test (validation) set to see if the model is performing reasonably.
This analysis can be used to identify components of the model that need to be improved.
In [ ]:
plt.figure(figsize=(20,7))
for image, label in test_batches.shuffle(100).take(1):
    pred_y = final_model.predict(image)
    pred_y_ = np.argmax(pred_y, axis=1)
    for i in range(8):
        plt.subplot(1, 8, i+1)
        plt.imshow(image[i, :])
        plt.title(class_names[label[i].numpy()] + ' -> ' + class_names[pred_y_[i]])

plt.figure(figsize=(20,3))
for i in range(8):
    plt.subplot(1, 8, i+1)
    plt.bar(np.arange(0, 10), np.squeeze(pred_y[i, :]))
    plt.xticks(np.arange(0, 10), ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck'], rotation='vertical')
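Looking at a handful of images only gives anecdotal evidence. A confusion matrix over the whole test set makes it easier to see which classes are being confused with each other. Below is a minimal sketch, assuming final_model, test_batches and class_names from above.
In [ ]:
# Sketch: confusion matrix over the full test set
y_true, y_pred = [], []
for image, label in test_batches:
    y_true.extend(label.numpy())
    y_pred.extend(np.argmax(final_model.predict(image), axis=1))
cm = tf.math.confusion_matrix(y_true, y_pred, num_classes=10).numpy()

plt.figure(figsize=(8, 8))
plt.imshow(cm)
plt.xticks(np.arange(10), list(class_names.values()), rotation='vertical')
plt.yticks(np.arange(10), list(class_names.values()))
plt.xlabel('Predicted')
plt.ylabel('True')
plt.colorbar()
plt.show()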