Practical Week 04
Workshop Week 4
Deep Learning for Name Gender Classification
We have already seen the following code for partitioning the name gender classification data and extracting features. The code has been changed slightly so that the labels are numerical (0 for male, 1 for female), which is the format required by Keras:
In [1]:
import nltk
nltk.download('names')
from nltk.corpus import names
m = names.words('male.txt')
f = names.words('female.txt')
[nltk_data] Downloading package names to /home/diego/nltk_data...
[nltk_data] Package names is already up-to-date!
In [2]:
import random
random.seed(1234) # Set the random seed to allow replicability
names = ([(name, 0) for name in m] +
         [(name, 1) for name in f])
random.shuffle(names)
train_names = names[1000:]       # training set
devtest_names = names[500:1000]  # development (dev-test) set
test_names = names[:500]         # final test set
In [3]:
def one_hot_character(c):
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    result = [0] * (len(alphabet) + 1)
    i = alphabet.find(c.lower())
    if i >= 0:
        result[i] = 1
    else:
        result[len(alphabet)] = 1  # character is out of the alphabet
    return result

def gender_features_n(word, n=2):
    "Return the one-hot encodings of the last n characters"
    features = []
    for i in range(n):
        if i < len(word):
            features = one_hot_character(word[-i-1]) + features
        else:
            features = one_hot_character(' ') + features
    return features
In [4]:
gender_features_n("Mary", n=2)
Out[4]:
[0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
0,
1,
0,
0]
In [5]:
# Let's determine the number of features so that we can use this information when we design the neural network
len(gender_features_n("Mary", n=2))
Out[5]:
54
Exercise: Simple Neural Network
Design a simple neural network that has 54 input cells (that's the number of gender features for $n=2$, as we have seen above) and one output cell, with no hidden layer. The output cell will be used to classify the name as male (output=0) or female (output=1). This is therefore an instance of binary classification. Pay attention to choosing the right activation function! This simple model, without hidden layers, is equivalent to a logistic regression classifier. The model summary should look like this (the 55 parameters are the 54 input weights plus one bias):
Layer (type) Output Shape Param #
=================================================================
dense_2 (Dense) (None, 1) 55
=================================================================
Total params: 55
Trainable params: 55
Non-trainable params: 0
Compile the model and provide the right loss function. Use 'rmsprop' as the optimiser, and include 'accuracy' as an evaluation metric.
Run the network for 100 epochs with a batch size of 100, and observe the results.
Answer the following questions:
What is the best result on the validation set?
At the epoch with best result on the validation set, what is the result on the training set?
Is the system overfitting? Justify your answer.
Do we really need 100 epochs? Do we need more than 100 epochs? Would the system perform better with fewer epochs?
In [6]:
import tensorflow as tf
tf.config.experimental.list_physical_devices()
Out[6]:
[PhysicalDevice(name='/physical_device:CPU:0', device_type='CPU'),
PhysicalDevice(name='/physical_device:XLA_CPU:0', device_type='XLA_CPU'),
PhysicalDevice(name='/physical_device:XLA_GPU:0', device_type='XLA_GPU')]
In [7]:
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

# Let TensorFlow allocate GPU memory on demand instead of reserving it all upfront
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
In [8]:
import numpy as np
from tensorflow.keras import models
from tensorflow.keras import layers
# Write your model here
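# A minimal sketch of one possible answer (assumptions: 54 input features as
# computed above; a single sigmoid output unit for binary classification,
# which is what makes this model equivalent to logistic regression):
model = models.Sequential()
model.add(layers.Dense(1, activation='sigmoid', input_shape=(54,)))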
In [9]:
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
dense (Dense) (None, 1) 55
=================================================================
Total params: 55
Trainable params: 55
Non-trainable params: 0
_________________________________________________________________
(Write additional code to prepare the data, run your experiments, and carry out your analysis. Then write the answers to the questions.)
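A hedged sketch of the data preparation and training follows. It assumes the partitions and gender_features_n defined earlier; the variable names (X_train, history, and so on) are illustrative, not prescribed:
# A possible sketch for preparing the data and training (not the only way)
X_train = np.array([gender_features_n(name) for name, label in train_names])
y_train = np.array([label for name, label in train_names])
X_devtest = np.array([gender_features_n(name) for name, label in devtest_names])
y_devtest = np.array([label for name, label in devtest_names])

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
history = model.fit(X_train, y_train,
                    epochs=100, batch_size=100,
                    validation_data=(X_devtest, y_devtest))
# The validation accuracy per epoch is in history.history
# (under the key 'val_accuracy' in recent TensorFlow versions).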
In [ ]:
Exercise: A Deeper Network
Experiment with a network that has one hidden dense layer with a 'relu' activation. The resulting system is no longer a logistic regression classifier; it is something more complex. Try the following sizes for the hidden layer (one possible network is sketched after the list):
5, 10, 20
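A minimal sketch of one such network follows, assuming the same 54 input features and a hidden layer of 10 units (the name model2 is illustrative; swap in 5 or 20 for the other runs):
# Hypothetical name model2; a hidden relu layer followed by a sigmoid output
model2 = models.Sequential()
model2.add(layers.Dense(10, activation='relu', input_shape=(54,)))
model2.add(layers.Dense(1, activation='sigmoid'))
model2.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])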
Answer the following questions:
Which system performed best on the dev-test set?
Would you add more or less cells in the hidden layer? Justify your answer.
Is this system better than the simpler system of the previous exercise? Justify your answer.
In [14]:
Optional: Deep Learning with the Movie Review Corpus
The notebook W04L1-2-MovieReviews.ipynb has several questions at the end, repeated below. Try to answer these, and indeed try other variants!
We were using 2 hidden layers. Try to use 1 or 3 hidden layers and see how it affects validation and test accuracy.
Try to use layers with more or fewer hidden units: 32 units, 64 units...
Try to use the mse loss function instead of binary_crossentropy.
Try to use the tanh activation (an activation that was popular in the early days of neural networks) instead of relu.
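As a hedged illustration, the sketch below combines two of these variants (one hidden layer, tanh activation, mse loss). It assumes the 10,000-dimensional vectorised reviews used in that notebook; adjust input_shape if your representation differs:
# Assumption: inputs are 10,000-dimensional vectorised reviews
variant = models.Sequential()
variant.add(layers.Dense(16, activation='tanh', input_shape=(10000,)))
variant.add(layers.Dense(1, activation='sigmoid'))
variant.compile(optimizer='rmsprop', loss='mse', metrics=['accuracy'])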
In [ ]: