COSC 2779 | Deep Learning
Week 1-2 Lab Exercises: **Introduction to TensorFlow**
Introduction
This lab is aimed at introducing the fundamentals of TensorFlow 2.0. The lab is organized into six sub-modules:
Minimal example: 2D linear regression.
Exercise: Classifying handwritten digits – MNIST.
Advanced TensorFlow example: Understanding gradient tape and writing your own training loop.
Saving models and checkpointing.
Working with TensorBoard.
Exercise: Putting everything together – FashionMNIST.
The lab assumes that you are familiar with Google Colab. Please complete Week 01 self-study lab: Introduction to Google Colab before attempting this lab.
This notebook is designed to run on Google Colab. If you would like to run it on your local machine, make sure that you have TensorFlow version 2.0 installed.
(Most online tutorials still contain TensorFlow 1.x code, so be careful when using online resources – if you see any reference to sessions, it is most probably TensorFlow 1.x code.)
To start using TensorFlow, let's load it and check the version.
In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# print the TensorFlow version
print("TensorFlow version is: ", tf.__version__)

# raise an error if the major version is not 2
assert tf.__version__[0] == '2'
Minimal example: 2D linear regression
We will start with a simple example. In this example the task is to fit a linear model to a set of synthetic data. We assume 2D input attributes $ \mathbf{x} \in \mathbb{R}^2$. The synthetic data is generated using the following equation.
$y = 2 + 1.5 x_1 + 3.5 x_2 + \text{noise}$
where the noise is Gaussian with mean 0 and standard deviation 0.3 (the test data generated below uses a larger noise standard deviation of 1.0).
In [ ]:
N = 1000  # Number of data points in each set
w = [2.0, 1.5, 3.5]  # True weight vector (this defines our unknown target function)

# Let's generate some training data
x1 = np.random.uniform(low=-10, high=10, size=(N, 1))
x2 = np.random.uniform(low=-10, high=10, size=(N, 1))
noise = np.random.normal(loc=0.0, scale=0.3, size=(N, 1))  # Generate some Gaussian noise

y = w[0] + w[1]*x1 + w[2]*x2 + noise  # Unknown target function

train_data_attributes = np.hstack((x1, x2))
train_data_target = y

# Now let's generate some i.i.d. test data
x1 = np.random.uniform(low=-10, high=10, size=(N, 1))
x2 = np.random.uniform(low=-10, high=10, size=(N, 1))
noise = np.random.normal(loc=0.0, scale=1.0, size=(N, 1))  # Gaussian noise (note the larger standard deviation)

y = w[0] + w[1]*x1 + w[2]*x2 + noise  # Unknown target function

test_data_attributes = np.hstack((x1, x2))
test_data_target = y
Let's visualise the data. Since we have a very simple dataset, we can plot it directly using the scatter function.
In [ ]:
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(train_data_attributes[:, 0], train_data_attributes[:, 1], train_data_target, c='r', marker='o')
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
plt.show()
As we discussed in the first lecture, most machine learning algorithms can be described with a fairly simple recipe of four ingredients. Let's define the four ingredients for this problem.
Dataset: Generated in the previous section
Cost function: Mean squared error
Model: Linear regression
Optimisation procedure: Stochastic gradient descent
Defining these components in TensorFlow (with some help from Keras) is quite straightforward.
Let's see how this is done.
As you will discover, there are many ways to build and train models in TensorFlow. TensorFlow has many APIs, and trying to use them all can be confusing.
Let's build a simple one-unit linear perceptron using the Keras Sequential API. In the Keras Sequential API you provide a list of NN layers in the order they appear in the network, from input to output. In this case we have only one layer.
In [ ]:
output_dim = 1
mse = tf.keras.losses.MeanSquaredError() # Cost function
model = tf.keras.Sequential([tf.keras.layers.Dense(output_dim, use_bias=True, input_shape=(2,))])  # model: a single linear perceptron
custom_optimizer = tf.keras.optimizers.SGD(learning_rate=0.001) # optimizer
We now have all the ingredients. Let's compile the model (configure it for training) and train it.
The compile command generates the computation graph. model.fit() runs the optimisation to find the optimal weights. Note that 30% of the data is used as the validation set.
The training is run for 100 epochs. One epoch is when the ENTIRE dataset is passed forward and backward through the neural network ONCE.
In [ ]:
model.compile(optimizer=custom_optimizer, loss=mse)  # attach the cost function and optimizer to the model
H = model.fit(train_data_attributes, train_data_target, validation_split=0.3, epochs=100, verbose=0)
Try changing the verbose value to 1 or 2 and observe the output.
Now we can extract the optimal weights learned by our model and compare them to those of the original function (from which the synthetic data was generated).
In [ ]:
weights = model.layers[0].get_weights()[0]
bias = model.layers[0].get_weights()[1]
print("Weights of the unknown target function: ", w)
print("Weights estimated: ", bias.round(1).tolist() + weights.round(1).reshape((-1,)).tolist())
The history of the trained model object (H) can be used to plot the learning curve.
In [ ]:
plt.style.use("ggplot")
plt.figure(figsize=(10,5))

Nepoch = 100
plt.subplot(1,2,1)
plt.plot(np.arange(0, Nepoch), H.history["loss"], label="train_loss")
plt.plot(np.arange(0, Nepoch), H.history["val_loss"], label="validation_loss")
plt.title("Training Loss on Dataset")
plt.xlabel("Epoch #")
plt.ylabel("Loss")
plt.legend(loc="upper right")
plt.show()
We can now predict the target variable for the test set and visualise the results.
In [ ]:
y_hat = model.predict(test_data_attributes).round(1)
In [ ]:
plt.scatter(test_data_target, y_hat)
plt.xlabel('Targets')
plt.ylabel('Outputs')
plt.show()
What can you say about the learned model?
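One way to back up your answer is to compute the mean squared error on the test set. Below is a minimal sketch using the model and test arrays defined above; since the test noise has standard deviation 1.0, a well-fit linear model should achieve an MSE close to the irreducible noise variance of about 1.0.
In [ ]:
# Evaluate the mean squared error on the held-out test set.
# A well-fit model's MSE should approach the noise variance (~1.0 here).
test_mse = model.evaluate(test_data_attributes, test_data_target, verbose=0)
print("Test MSE:", round(test_mse, 3))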
Exercise: Classifying handwritten digits – MNIST
In this exercise you will use the famous MNIST dataset to develop a simple MLP model to recognise handwritten digits. The MNIST dataset is included in TensorFlow Datasets, a collection of datasets ready to use with TensorFlow or other Python ML frameworks. More information is available in the TensorFlow Datasets documentation.
(Optional step) Before we start this section, let's restart the kernel to remove any variables that we have created. This gives us a clean slate, so we don't have to worry about getting confused with the variables and code from the previous section.
You can do this either by clicking Runtime -> Restart runtime in the notebook menu, or by running the following code.
In [ ]:
# import os
# os._exit(00)
Loading the MNIST dataset and normalising the pixel values
In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
Visualise a few instances of the dataset.
In [ ]:
plt.figure(figsize=(12,4))
for i, image in enumerate(x_train[:3]):
    plt.subplot(1,3,i+1)
    plt.imshow(image)
plt.show()
In this exercise you have to build a simple one-hidden-layer MLP using the Keras Sequential API.
The model should consist of:
A layer to convert the $[28,28]$ matrix input to a vector of length $28\times28$
A hidden layer with 128 neurons and ReLU activation
Dropout regularisation
An output layer with 10 units and softmax activation
Use the Adam optimiser and sparse categorical cross-entropy loss.
**TODO:** Build your model in the cell below
In [ ]:
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
**TODO:** Train the model for 5 epochs and evaluate the performance on the test set.
In [ ]:
model.fit(x_train, y_train, epochs=5)

test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
How can you add a ridge regularisation penalty to the hidden layer?
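One possible answer, as a sketch: Keras Dense layers accept a kernel_regularizer argument, so a ridge (L2) penalty can be attached to the hidden layer's weights. The penalty strength 0.001 below is an arbitrary placeholder, not a recommended value.
In [ ]:
# A sketch: the same MLP with an L2 (ridge) penalty on the hidden layer.
model_l2 = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(0.001)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])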
What is the generalisation gap for the model you trained?
What are the hyperparameters of the above model?
How would you select the values of the hyperparameters? (One possible approach is sketched below.)
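For the last question, one common approach is to hold out a validation split and compare a few candidate values. Below is a minimal sketch that searches over the dropout rate only; the candidate values, epoch count, and validation split are arbitrary placeholders, and the variable names (candidate, best_rate) are ours.
In [ ]:
# A sketch of hyperparameter selection: try a few dropout rates and keep
# the one with the best validation accuracy.
best_rate, best_val_acc = None, 0.0
for rate in [0.1, 0.2, 0.5]:
    candidate = tf.keras.models.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dropout(rate),
        tf.keras.layers.Dense(10, activation='softmax')
    ])
    candidate.compile(optimizer='adam',
                      loss='sparse_categorical_crossentropy',
                      metrics=['accuracy'])
    H = candidate.fit(x_train, y_train, validation_split=0.2, epochs=5, verbose=0)
    val_acc = H.history['val_accuracy'][-1]
    if val_acc > best_val_acc:
        best_rate, best_val_acc = rate, val_acc
print('Best dropout rate:', best_rate)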
Advanced TensorFlow example: Understanding gradient tape and writing your own training loop
Now we will explore a more advanced usage of TensorFlow. The following API provides more flexibility and can be used when you need to customise your models.
Note that you can specify more unique architectures within the forward pass. We will use the same MNIST dataset.
Before we start with the ML examples, let's explore what automatic differentiation and gradient tape are. More information is available in the TensorFlow tutorials.
Automatic differentiation is useful for implementing machine learning algorithms such as backpropagation for training neural networks. To differentiate automatically, TensorFlow needs to remember what operations happen in what order during the forward pass. Then, during the backward pass, TensorFlow traverses this list of operations in reverse order to compute gradients.
TensorFlow provides the tf.GradientTape API for automatic differentiation; that is, computing the gradient of a computation with respect to some inputs, usually tf.Variables. TensorFlow “records” relevant operations executed inside the context of a tf.GradientTape onto a “tape”. TensorFlow then uses that tape to compute the gradients of a “recorded” computation using reverse mode differentiation.
Here is a simple example that calculates the gradient at $x=2$:
$ \frac{\partial \left( 3x^2 + 2x\right )}{\partial x} $
In [ ]:
x = tf.Variable(2.0)

with tf.GradientTape() as tape:
    y = 3 * x * x + 2 * x

dy_dx = tape.gradient(y, x)  # Will compute the gradient
print(dy_dx)
Was the answer as expected? (The derivative is $6x + 2$, which evaluates to $14$ at $x=2$.)
Let's now load the MNIST dataset (this time using a different API).
In [ ]:
import tensorflow_datasets as tfds

# Function to preprocess the dataset
def convert_types(image, label):
    image = tf.cast(image, tf.float32)
    image /= 255
    return image, label

dataset, info = tfds.load('mnist', data_dir='gs://tfds-data/datasets', with_info=True, as_supervised=True)
mnist_train, mnist_test = dataset['train'], dataset['test']

# Do normalisation and prepare batches
batch_size = 32
mnist_train = mnist_train.map(convert_types).shuffle(10000).batch(batch_size)
mnist_test = mnist_test.map(convert_types).batch(batch_size)
What does shuffle do in the above code block? Briefly: shuffle(10000) fills a buffer with 10,000 examples and draws each element at random from that buffer, so the order of training examples differs between epochs. More information at tf.data.Dataset.
In [ ]:
# Print the shapes of the first 3 batches
for image, label in mnist_train.take(3):  # each element is an (image, label) pair
    print(image.shape, label)
Now let's instantiate a model by subclassing the Keras Model class. In that case, you should define your layers in __init__ and implement the model's forward pass in call.
The model is the same as in the last exercise:
A layer to convert the $[28,28]$ matrix input to a vector of length $28\times28$
A hidden layer with 128 neurons and ReLU activation
Dropout regularisation
An output layer with 10 units and softmax activation
In [ ]:
from tensorflow.keras.layers import Dense, Flatten, Conv2D, Dropout
from tensorflow import keras
from tensorflow.keras import Model

class MNISTModel(Model):
    def __init__(self):
        super(MNISTModel, self).__init__()
        self.flatten = Flatten(input_shape=(28, 28))
        self.d1 = Dense(128, activation='relu')
        self.d2 = Dense(10, activation='softmax')
        self.drop = Dropout(0.2)

    def call(self, x):
        x = self.flatten(x)
        x = self.d1(x)
        x = self.drop(x)
        return self.d2(x)

model = MNISTModel()
We can now define the optimiser, cost function and the evaluation measures.
In [ ]:
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()
optimizer = tf.keras.optimizers.Adam()

train_loss = tf.keras.metrics.Mean(name='train_loss')
train_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='train_accuracy')

test_loss = tf.keras.metrics.Mean(name='test_loss')
test_accuracy = tf.keras.metrics.SparseCategoricalAccuracy(name='test_accuracy')
Next, let's write a function to train the model. Note the tf.function annotation on the function below. This means that the function will be compiled into a graph in the backend, allowing it to run efficiently, as TensorFlow can optimise the function for you. This automatic conversion of Python code to its graph representation is called AutoGraph, and it creates callable graphs from Python functions.
In [ ]:
@tf.function
def train_step(image, label):
    with tf.GradientTape() as tape:
        predictions = model(image)
        loss = loss_object(label, predictions)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    train_loss(loss)
    train_accuracy(label, predictions)
Next, define the function for testing. Note that we do not need to compute gradients while testing.
In [ ]:
@tf.function
def test_step(image, label):
    predictions = model(image)
    t_loss = loss_object(label, predictions)

    test_loss(t_loss)
    test_accuracy(label, predictions)
We now have all the components. Let's train the model for 5 epochs.
In [ ]:
EPOCHS = 5

for epoch in range(EPOCHS):
    print("starting Epoch: ", epoch)

    # iterate over all batches in the dataset
    for image, label in mnist_train:
        train_step(image, label)

    for test_image, test_label in mnist_test:
        test_step(test_image, test_label)

    print('Epoch {}, Loss: {}, Accuracy: {}, Test Loss: {}, Test Accuracy: {}'.format(epoch+1,
                                                                                      train_loss.result(),
                                                                                      train_accuracy.result()*100,
                                                                                      test_loss.result(),
                                                                                      test_accuracy.result()*100))

    train_loss.reset_states()
    test_loss.reset_states()
    train_accuracy.reset_states()
    test_accuracy.reset_states()
**TODO:** Change the model to have a 3×3 convolution layer with 64 channels as the first layer. Add L2 regularisation to the convolution layer. (One possible solution is sketched below.)
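A possible solution sketch follows. Two assumptions to note: the regularisation strength 0.001 is a placeholder, and because we are using a custom training loop, the L2 penalty collected in model.losses must be added to the loss manually (Keras only does this automatically in model.fit()).
In [ ]:
# A sketch: the model with a 3x3, 64-channel convolution (with an L2
# penalty on its kernel) as the first layer.
class MNISTConvModel(Model):
    def __init__(self):
        super(MNISTConvModel, self).__init__()
        self.conv = Conv2D(64, (3, 3), activation='relu',
                           kernel_regularizer=tf.keras.regularizers.l2(0.001))
        self.flatten = Flatten()
        self.d1 = Dense(128, activation='relu')
        self.drop = Dropout(0.2)
        self.d2 = Dense(10, activation='softmax')

    def call(self, x):
        x = self.conv(x)
        x = self.flatten(x)
        x = self.d1(x)
        x = self.drop(x)
        return self.d2(x)

model = MNISTConvModel()
Inside train_step, the loss line would then become something like loss = loss_object(label, predictions) + tf.add_n(model.losses), so that the penalty actually influences the gradients.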
Saving models and checkpointing
The following description of checkpointing is from the TensorFlow documentation:
The phrase “Saving a TensorFlow model” typically means one of two things:
Checkpoints
SavedModel
Checkpoints capture the exact value of all parameters (tf.Variable objects) used by a model. Checkpoints do not contain any description of the computation defined by the model and thus are typically only useful when source code that will use the saved parameter values is available.
The SavedModel format on the other hand includes a serialized description of the computation defined by the model in addition to the parameter values (checkpoint). Models in this format are independent of the source code that created the model. They are thus suitable for deployment via TensorFlow Serving, TensorFlow Lite, TensorFlow.js, or programs in other programming languages (the C, C++, Java, Go, Rust, C# etc. TensorFlow APIs).
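As a quick illustration of the SavedModel side, here is a minimal sketch, assuming a compiled Keras model named model (the directory name my_saved_model is an arbitrary choice):
In [ ]:
# Export in the SavedModel format (a directory, not a single file).
model.save('my_saved_model', save_format='tf')

# Load it back; no model-building source code is required.
restored = tf.keras.models.load_model('my_saved_model')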
(Optional step) Before we start this section, let's restart the kernel to remove any variables that we have created. This gives us a clean slate, so we don't have to worry about getting confused with the variables and code from the previous section.
You can do this either by clicking Runtime -> Restart runtime in the notebook menu, or by running the following code.
In [ ]:
# import os
# os._exit(00)
First, let's check how saving a whole model works. We need to define a model first – the same code we used in the previous sections.
In [ ]:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# load data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Create model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile and train
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)

# Print test accuracy
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
Now we have a trained model. Let's save the model to a file.
In [ ]:
model.save('my_model.h5')
Check the local directory to verify that the model was saved – you should see the file my_model.h5.
Next, load the saved model and evaluate it. Note that this could happen in a different script on a different machine.
NOTE: Saving a whole model in this way does not work for models defined by subclassing tf.keras.Model (the advanced model above); for those you need to use save_weights. For a detailed example of how to save such models, see the TensorFlow documentation.
The summary() function prints out the model structure and is a good tool for debugging neural networks.
In [ ]:
new_model = tf.keras.models.load_model('my_model.h5')
new_model.summary()
print('\n\n')

test_loss, test_acc = new_model.evaluate(x_test, y_test)
print('\n\nTest accuracy:', test_acc)
Checkpoints allow you to save intermediate states of your model while training.
It is highly recommended that you use checkpointing when doing the assignments. Checkpoints allow you to resume from the last saved state if something happens to your job mid-training, and can save time.
The checkpoint files will disappear if the Colab session is destroyed. Write code to move the checkpoints to your Google Drive – use last week's lab material (one possible approach is sketched below).
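A minimal sketch (Colab only): mount Google Drive and copy the checkpoint directory across, after the training cell below has produced checkpoints. The destination path is an arbitrary choice.
In [ ]:
# Mount Google Drive and copy the checkpoint directory into it.
from google.colab import drive
import shutil

drive.mount('/content/drive')
shutil.copytree('training_1', '/content/drive/My Drive/training_1')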
In [ ]:
import os

checkpoint_path = "training_1/cp.ckpt"
checkpoint_dir = os.path.dirname(checkpoint_path)

# Create checkpoint callback
cp_callback = tf.keras.callbacks.ModelCheckpoint(checkpoint_path,
                                                 save_weights_only=True,
                                                 verbose=1)

model.fit(x_train, y_train, epochs=5, verbose=2, callbacks=[cp_callback])  # pass callback to training
More information on callbacks.ModelCheckpoint is available in the TensorFlow documentation.
Go through the documentation and find out how you can change the code to (a possible configuration is sketched after the list below):
Only keep the model that has achieved the “best performance” so far
Save the model at the end of every epoch regardless of performance.
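One possible configuration covering both variations, as a sketch; verify the argument names against the documentation for your TensorFlow version, and note that monitoring val_loss requires passing validation data to model.fit(). The paths are arbitrary choices.
In [ ]:
# Sketch 1: keep only the best weights seen so far (judged on val_loss).
cp_best = tf.keras.callbacks.ModelCheckpoint('training_best/cp-best.ckpt',
                                             save_weights_only=True,
                                             save_best_only=True,
                                             monitor='val_loss',
                                             verbose=1)

# Sketch 2: save at the end of every epoch, regardless of performance.
# The {epoch} placeholder gives each checkpoint a unique file name.
cp_every_epoch = tf.keras.callbacks.ModelCheckpoint('training_2/cp-{epoch:04d}.ckpt',
                                                    save_weights_only=True,
                                                    save_freq='epoch',
                                                    verbose=1)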
Note that the epoch 1 accuracy is already high for this model. Why? (Hint: think about what state the model was in before this fit call.)
Check the latest checkpoint:
In [ ]:
latest = tf.train.latest_checkpoint(checkpoint_dir)
latest
Load the weights from the latest checkpoint and evaluate the model.
In [ ]:
model.load_weights(latest)

# Print test accuracy
test_loss, test_acc = model.evaluate(x_test, y_test)
print('Test accuracy:', test_acc)
Working with TensorBoard
TensorBoard is a tool for providing the measurements and visualizations needed during the machine learning workflow. It enables tracking experiment metrics like loss and accuracy, visualizing the model graph, and much more.
This section will show how to quickly get started with TensorBoard and use it in notebooks.
(Optional step) Before we start this section, let's restart the kernel to remove any variables that we have created. This gives us a clean slate, so we don't have to worry about getting confused with the variables and code from the previous section.
You can do this either by clicking Runtime -> Restart runtime in the notebook menu, or by running the following code.
In [ ]:
# import os
# os._exit(00)
In [ ]:
# Load the TensorBoard notebook extension
%load_ext tensorboard
Create a model and compile it.
In [ ]:
import tensorflow as tf
import datetime, os
import numpy as np
import matplotlib.pyplot as plt

# load data
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# Create model
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])

# Compile and train
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
When training with Keras Model.fit(), adding the tf.keras.callbacks.TensorBoard callback ensures that logs are created and stored. Place the logs in a timestamped subdirectory to allow easy selection of different training runs.
In [ ]:
logdir = os.path.join("logs", datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
tensorboard_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)
Start TensorBoard in the notebook.
In [ ]:
%tensorboard --logdir logs
Now start training the model. You will see the TensorBoard in the block above get updated while the training is running.
In [ ]:
model.fit(x=x_train,
          y=y_train,
          epochs=50,
          verbose=0,
          validation_data=(x_test, y_test),
          callbacks=[tensorboard_callback])
Exercise: Putting everything together – FashionMNIST
Let's practice what we have learned so far.
For this exercise you will use the FashionMNIST dataset.
Develop a classification model to classify images in the dataset using TensorFlow. The code should include checkpointing and TensorBoard logging.
Download the FashionMNIST dataset:
In [ ]:
import tensorflow_datasets as tfds

dataset, info = tfds.load('fashion_mnist', with_info=True, as_supervised=True)
fmnist_train, fmnist_test = dataset['train'], dataset['test']
Visualise a few images.
In [ ]:
import matplotlib.pyplot as plt
import numpy as np

plt.figure(figsize=(16,4))
i = 1
for image, label in fmnist_train.take(4):  # each element is an (image, label) pair
    plt.subplot(1,4,i)
    plt.imshow(np.squeeze(image))
    i = i + 1
plt.show()
**TODO:** Complete the code. (One possible solution is sketched below.)
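One possible solution, as a sketch: a simple MLP trained with both a checkpoint callback and a TensorBoard callback. The preprocessing mirrors the MNIST example earlier in the lab; all layer sizes, paths, and the epoch count are placeholder choices.
In [ ]:
import tensorflow as tf
import datetime, os

# Preprocess: scale pixels to [0, 1] and prepare shuffled batches.
def convert_types(image, label):
    image = tf.cast(image, tf.float32) / 255.0
    return image, label

train_ds = fmnist_train.map(convert_types).shuffle(10000).batch(32)
test_ds = fmnist_test.map(convert_types).batch(32)

# A simple MLP; FashionMNIST images arrive as (28, 28, 1) tensors.
model = tf.keras.models.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28, 1)),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(10, activation='softmax')
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Checkpointing callback (path is an arbitrary choice)
cp_callback = tf.keras.callbacks.ModelCheckpoint('fmnist_training/cp.ckpt',
                                                 save_weights_only=True,
                                                 verbose=1)

# TensorBoard callback with a timestamped log directory
logdir = os.path.join('logs', datetime.datetime.now().strftime('%Y%m%d-%H%M%S'))
tb_callback = tf.keras.callbacks.TensorBoard(logdir, histogram_freq=1)

model.fit(train_ds,
          epochs=5,
          validation_data=test_ds,
          callbacks=[cp_callback, tb_callback])

test_loss, test_acc = model.evaluate(test_ds)
print('Test accuracy:', test_acc)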