COSC2779LabExercises_W7_solution
Lab Exercises – Week: 7¶
COSC2779 – Deep Learning – 2020
This lab is aimed at understanding how to develop a simple RNN for sentiment classification. During this lab you will:
Encode/Decode text data
Develop a many-to-one RNN model for sentiment classification
This notebook is designed to run on Google Colab. If you would like to run it on your local machine, make sure that you have TensorFlow version 2.0 installed.
Setting up the Notebook¶
Let’s first load the packages we need.
In [ ]:
import tensorflow as tf
AUTOTUNE = tf.data.experimental.AUTOTUNE
import numpy as np
import pandas as pd
import tensorflow_datasets as tfds
import pathlib
import shutil
import tempfile
from IPython import display
from matplotlib import pyplot as plt
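Since the notebook assumes TensorFlow 2.x, it can be worth confirming which version is installed (especially if you are running locally):
In [ ]:
print('TensorFlow version:', tf.__version__)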
We can use TensorBoard to view the learning curves. Let's first set it up.
In [ ]:
logdir = pathlib.Path(tempfile.mkdtemp())/"tensorboard_logs"
shutil.rmtree(logdir, ignore_errors=True)
# Load the TensorBoard notebook extension
%load_ext tensorboard
# Open an embedded TensorBoard viewer
%tensorboard --logdir {logdir}/models
We can also write our own function to plot the model's training history once training has completed.
In [ ]:
from itertools import cycle
def plotter(history_hold, metric='binary_crossentropy', ylim=[0.0, 1.0]):
    cycol = cycle('bgrcmk')
    for name, item in history_hold.items():
        y_train = item.history[metric]
        y_val = item.history['val_' + metric]
        x_train = np.arange(0, len(y_val))
        c = next(cycol)
        plt.plot(x_train, y_train, c + '-', label=name + '_train')
        plt.plot(x_train, y_val, c + '--', label=name + '_val')
    plt.legend()
    plt.xlim([1, max(plt.xlim())])
    plt.ylim(ylim)
    plt.xlabel('Epoch')
    plt.ylabel(metric)
    plt.grid(True)
Loading Dataset¶
In this lab we will be using the IMDB Movie review dataset available in TensorFlow Datasets. IMDB is a dataset for binary sentiment classification (all the reviews have either a positive or negative sentiment) containing substantially more data than previous benchmark datasets. It provides a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabelled data for use as well.
Let's download the dataset. The version we are downloading below contains a vocabulary of 8K subwords. There is another version that contains a 32K vocabulary.
In [ ]:
dataset, info = tfds.load('imdb_reviews/subwords8k', with_info=True,
                          as_supervised=True)
train_dataset, test_dataset = dataset['train'], dataset['test']
Exploring the dataset¶
Let’s explore the dataset to get a better understanding. First, let’s print out a random data point in the train set.
In [ ]:
for x, y in train_dataset.shuffle(100).take(1):
    print('The string: "{}"'.format(x))
    print('The label: {}'.format(y))
You can see the text in the dataset is represented by numbers. This is done using an encoder with a vocabulary of 8K. The dataset's info object provides the encoder that was used for this dataset.
Let's use the encoder to decode a random datapoint from the train set and print it.
In [ ]:
encoder = info.features['text'].encoder
print('Vocabulary size: {}'.format(encoder.vocab_size))
In [ ]:
for x, y in train_dataset.shuffle(100).take(1):
    decoded_data = encoder.decode(x)
    print('The string: "{}"'.format(decoded_data))
    print('The label: {}'.format(y))
We can also use this encoder to encode any novel sentence.
In [ ]:
novel_text = "This is the sixth lab in deep learning course"
encoded_string = encoder.encode(novel_text)
print('Encoded string is {}'.format(encoded_string))
decoded_string = encoder.decode(encoded_string)
print('Decoded string is {}'.format(decoded_string))
Note that the encoding happens at the subword level. This means that the length of the resulting sequence of numbers may not equal the number of words in the sentence.
In [ ]:
for index in encoded_string:
    print('{} ----> {}'.format(index, encoder.decode([index])))
Setting up data loader¶
Next, create batches of these encoded strings. Use the padded_batch method to zero-pad the sequences to the length of the longest sequence in the batch:
In [ ]:
BUFFER_SIZE = 10000
BATCH_SIZE = 64
train_dataset = train_dataset.shuffle(BUFFER_SIZE).padded_batch(BATCH_SIZE)
test_dataset = test_dataset.padded_batch(BATCH_SIZE)
Note that all sequences within a batch are now padded to the same length, but that length can differ between batches.
In [ ]:
for x, y in train_dataset.take(2):
    print(x.shape, y.shape)
Create a simple model¶
Build a tf.keras.Sequential model and start with an embedding layer. An embedding layer stores one vector per word. When called, it converts the sequences of word indices to sequences of vectors. These vectors are trainable. After training (on enough data), words with similar meanings often have similar vectors. We will cover the embedding layer in more detail in Lecture 7.
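To make the embedding layer's behaviour concrete, here is a small illustrative sketch (the layer and index values below are made up purely for demonstration): each integer index in a batch of sequences is mapped to a trainable dense vector.
In [ ]:
# Illustrative only: a tiny embedding with a vocabulary of 10 and 4-dimensional vectors
demo_embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=4)
demo_indices = tf.constant([[1, 2, 3], [4, 5, 0]])  # a batch of 2 sequences, each of length 3
print(demo_embedding(demo_indices).shape)  # (2, 3, 4): batch, timesteps, embedding dimension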
A recurrent neural network (RNN) processes sequence input by iterating through the elements, passing the output from one timestep into the input of the next. As per our discussion in the Week 6 lecture, let's start with a network that uses LSTM cells. LSTMs allow the network to capture long-range dependencies more easily.
The return_sequences argument determines whether we want an output at each timestep or only at the final one. Since our task is many-to-one, we only need the output at the final timestep.
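To see the effect of return_sequences, here is a quick sketch on a dummy input (the shapes are made up for illustration): with return_sequences=True the LSTM returns an output at every timestep, while with return_sequences=False it returns only the final output.
In [ ]:
# Illustrative only: compare output shapes for the two settings of return_sequences
dummy = tf.random.normal((2, 5, 8))  # batch of 2, 5 timesteps, 8 features per timestep
print(tf.keras.layers.LSTM(16, return_sequences=True)(dummy).shape)   # (2, 5, 16): one output per timestep
print(tf.keras.layers.LSTM(16, return_sequences=False)(dummy).shape)  # (2, 16): final timestep only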
In [ ]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.LSTM(64, return_sequences=False),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.summary()
tf.keras.utils.plot_model(model, show_shapes=True, show_layer_names=True)
Compile the Keras model to configure the training process:
In [ ]:
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy', tf.losses.BinaryCrossentropy(from_logits=False, name='BinaryCrossentropy')])
In [ ]:
m_histories = {}
def get_callbacks(name):
    return [
        tf.keras.callbacks.TensorBoard(logdir/name, histogram_freq=1),
    ]
Training the model¶
In [ ]:
m_histories['simple_model'] = model.fit(train_dataset, epochs=10,
                                        validation_data=test_dataset,
                                        validation_steps=30,
                                        verbose=0,
                                        callbacks=get_callbacks('models/simple_model'))
In [ ]:
plotter(m_histories, ylim=[0.6, 0.8], metric='BinaryCrossentropy')
Create a base model¶
Let's make a few modifications to the simple model, based on our intuition, in order to get a base model:
Our task can benefit from knowledge about the words in the future as well as the words in the history. The tf.keras.layers.Bidirectional wrapper can be used with an RNN layer. This propagates the input forwards and backwards through the RNN layer and then concatenates the outputs, which helps the RNN learn long-range dependencies (see the short shape check after this list).
We can use a deeper model (with two LSTM layers) to increase the capacity.
We can use an MLP at the output rather than a single dense layer.
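As a quick sanity check (the shapes below are made up for illustration), the Bidirectional wrapper runs the LSTM forwards and backwards over the sequence and concatenates the two outputs, so the feature dimension doubles:
In [ ]:
# Illustrative only: Bidirectional concatenates the forward and backward outputs
dummy = tf.random.normal((2, 5, 8))  # batch of 2, 5 timesteps, 8 features per timestep
print(tf.keras.layers.LSTM(32)(dummy).shape)                                  # (2, 32)
print(tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32))(dummy).shape)   # (2, 64)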
In [ ]:
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32, return_sequences=False)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
              optimizer=tf.keras.optimizers.Adam(1e-4),
              metrics=['accuracy', tf.losses.BinaryCrossentropy(from_logits=False, name='BinaryCrossentropy')])
Training the model¶
In [ ]:
m_histories['base_model'] = model.fit(train_dataset, epochs=10,
                                      validation_data=test_dataset,
                                      validation_steps=30,
                                      verbose=1,
                                      callbacks=get_callbacks('models/base_model'))
In [ ]:
plotter(m_histories, ylim=[0.0, 1.1], metric='BinaryCrossentropy')
Exercises¶
Use the knowledge you obtained in Weeks 1-7 to improve the model.
Things you may try include (one possible starting point is sketched after the list):
Regularisation to balance the bias vs. variance tradeoff
GRU vs LSTM
Network depth vs feature width
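As one possible starting point (not a prescribed solution, and the hyperparameters below are arbitrary), the sketch below swaps the LSTM cells for GRUs and adds L2 regularisation alongside dropout:
In [ ]:
# One possible variant to experiment with (illustrative hyperparameters only):
# GRU cells instead of LSTM, plus L2 regularisation and dropout to control variance.
exercise_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(32)),
    tf.keras.layers.Dense(64, activation='relu',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')
])
exercise_model.compile(loss=tf.keras.losses.BinaryCrossentropy(from_logits=False),
                       optimizer=tf.keras.optimizers.Adam(1e-4),
                       metrics=['accuracy'])
exercise_model.summary()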