
W06L1-2-Transformers

Huggingface’s transformers library¶
Parts of this code based on https://huggingface.co/transformers/quickstart.html and https://github.com/strongio/keras-bert/blob/master/keras-bert.ipynb

Huggingface's transformers library is a very popular library that contains implementations of the latest architectures based on the Transformer. This library is used by an increasing number of developers and researchers to produce state-of-the-art results in multiple tasks. In this notebook, we will use what is arguably the most popular of these architectures, BERT, for the task of classifying movie reviews (yes, this task again).

In [1]:

import tensorflow as tf
tf.config.experimental.list_physical_devices()

Out[1]:

[PhysicalDevice(name=’/physical_device:CPU:0′, device_type=’CPU’),
PhysicalDevice(name=’/physical_device:GPU:0′, device_type=’GPU’)]

In [2]:

from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)

In [3]:

# Run this cell if you need to install transformers (e.g. on Google Colab)
!pip install transformers

Collecting transformers
  Downloading https://files.pythonhosted.org/packages/ed/d5/f4157a376b8a79489a76ce6cfe147f4f3be1e029b7144fa7b8432e8acb26/transformers-4.4.2-py3-none-any.whl (2.0MB)
Requirement already satisfied: importlib-metadata; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from transformers) (3.7.2)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (1.19.5)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.7/dist-packages (from transformers) (2019.12.20)
Collecting sacremoses
  Downloading https://files.pythonhosted.org/packages/7d/34/09d19aff26edcc8eb2a01bed8e98f13a1537005d31e95233fd48216eed10/sacremoses-0.0.43.tar.gz (883kB)
Requirement already satisfied: filelock in /usr/local/lib/python3.7/dist-packages (from transformers) (3.0.12)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.7/dist-packages (from transformers) (4.41.1)
Requirement already satisfied: packaging in /usr/local/lib/python3.7/dist-packages (from transformers) (20.9)
Collecting tokenizers<0.11,>=0.10.1
  Downloading https://files.pythonhosted.org/packages/71/23/2ddc317b2121117bf34dd00f5b0de194158f2a44ee2bf5e47c7166878a97/tokenizers-0.10.1-cp37-cp37m-manylinux2010_x86_64.whl (3.2MB)
Requirement already satisfied: requests in /usr/local/lib/python3.7/dist-packages (from transformers) (2.23.0)
Requirement already satisfied: zipp>=0.5 in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.4.1)
Requirement already satisfied: typing-extensions>=3.6.4; python_version < "3.8" in /usr/local/lib/python3.7/dist-packages (from importlib-metadata; python_version < "3.8"->transformers) (3.7.4.3)
Requirement already satisfied: six in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.15.0)
Requirement already satisfied: click in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (7.1.2)
Requirement already satisfied: joblib in /usr/local/lib/python3.7/dist-packages (from sacremoses->transformers) (1.0.1)
Requirement already satisfied: pyparsing>=2.0.2 in /usr/local/lib/python3.7/dist-packages (from packaging->transformers) (2.4.7)
Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (1.24.3)
Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2.10)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests->transformers) (2020.12.5)
Building wheels for collected packages: sacremoses
  Building wheel for sacremoses (setup.py) ... done
  Created wheel for sacremoses: filename=sacremoses-0.0.43-cp37-none-any.whl size=893262 sha256=106fb5e0d3bd18a72fe847f857b3c3813622ea854094358216d185a6cb370d1f
  Stored in directory: /root/.cache/pip/wheels/29/3c/fd/7ce5c3f0666dab31a50123635e6fb5e19ceb42ce38d4e58f45
Successfully built sacremoses
Installing collected packages: sacremoses, tokenizers, transformers
Successfully installed sacremoses-0.0.43 tokenizers-0.10.1 transformers-4.4.2

If we are going to use the pre-trained weights provided by transformers, we need to make sure that we use the same tokenizer. The following code illustrates how to use the tokenizer bundled with the BERT model. The first thing we need to do is load the tokenizer from the pre-trained model.

In [4]:

from transformers import BertTokenizer, TFBertModel

In [5]:

# Load pre-trained model tokenizer (vocabulary)
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

The encode method returns the word indices. There are several special indices:

101 is used to encode the special token [CLS]. This token indicates the beginning of the string.
102 is used to encode the sentence separator [SEP].

You will observe that the number of tokens does not correspond to the number of words. This is because BERT's tokeniser splits long or infrequent words into shorter word pieces. This is BERT's approach to the problem of unknown words: by splitting such words into multiple tokens, we are far less likely to encounter a word that is not in the vocabulary.
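
As a quick sanity check, the tokenizer object also exposes the special tokens and their indices directly. A minimal sketch (cls_token_id, sep_token_id and pad_token_id are standard attributes of the Huggingface tokenizers):

# Ask the tokenizer which vocabulary indices its special tokens map to
print(tokenizer.cls_token, tokenizer.cls_token_id)   # [CLS] 101
print(tokenizer.sep_token, tokenizer.sep_token_id)   # [SEP] 102
print(tokenizer.pad_token, tokenizer.pad_token_id)   # [PAD] 0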

In [6]:

text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokens = tokenizer.encode(text, add_special_tokens=False)
tokens

Out[6]:

[101,
2040,
2001,
3958,
27227,
1029,
102,
3958,
27227,
2001,
1037,
13997,
11510,
102]

In [7]:

tokenizer.decode(tokens)

Out[7]:

‘[CLS] who was jim henson? [SEP] jim henson was a puppeteer [SEP]’

The decoded version has converted all uppercase characters to lowercase. This is because we have used the pre-trained model "bert-base-uncased".
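
If case information mattered for the task, a cased checkpoint could be used instead. A minimal sketch, assuming the 'bert-base-cased' checkpoint (it has its own, different vocabulary and therefore produces different indices):

# Alternative (not used in the rest of this notebook): a tokenizer that preserves case
cased_tokenizer = BertTokenizer.from_pretrained('bert-base-cased')
print(cased_tokenizer.tokenize("Jim Henson was a puppeteer"))  # word pieces keep their capitalisation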

The process of tokenising can also be done in two steps: first split the text into tokens, then map the tokens to vocabulary indices. The code below shows that the word "puppeteer" has been split into two tokens: "puppet" and "##eer".

In [8]:

# Tokenize input
text = "[CLS] Who was Jim Henson ? [SEP] Jim Henson was a puppeteer [SEP]"
tokenized_text = tokenizer.tokenize(text)
tokenized_text

Out[8]:

[‘[CLS]’,
‘who’,
‘was’,
‘jim’,
‘henson’,
‘?’,
‘[SEP]’,
‘jim’,
‘henson’,
‘was’,
‘a’,
‘puppet’,
‘##eer’,
‘[SEP]’]

In [9]:

# Convert token to vocabulary indices
indexed_tokens = tokenizer.convert_tokens_to_ids(tokenized_text)
indexed_tokens

Out[9]:

[101,
2040,
2001,
3958,
27227,
1029,
102,
3958,
27227,
2001,
1037,
13997,
11510,
102]
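
The mapping also works in the opposite direction: convert_ids_to_tokens maps vocabulary indices back to word pieces. A small sketch using the indices computed above:

# Map the vocabulary indices back to word pieces
tokenizer.convert_ids_to_tokens(indexed_tokens)
# ['[CLS]', 'who', 'was', 'jim', 'henson', '?', '[SEP]', 'jim', 'henson', 'was', 'a', 'puppet', '##eer', '[SEP]']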

IMDB data preparation¶
From https://github.com/strongio/keras-bert/blob/master/keras-bert.ipynb

We need to tokenise the IMDB data ourselves (we cannot use the word indices given by Keras' built-in IMDB dataset, because BERT expects indices from its own vocabulary). The following code downloads the IMDB data and loads it as a pandas data frame.

In [10]:

import os
import re
import pandas as pd
# Load all files from a directory in a DataFrame.
def load_directory_data(directory):
    data = {}
    data["sentence"] = []
    data["sentiment"] = []
    for file_path in os.listdir(directory):
        with tf.io.gfile.GFile(os.path.join(directory, file_path), "r") as f:
            data["sentence"].append(f.read())
            data["sentiment"].append(re.match(r"\d+_(\d+)\.txt", file_path).group(1))
    return pd.DataFrame.from_dict(data)

# Merge positive and negative examples, add a polarity column and shuffle.
def load_dataset(directory):
    pos_df = load_directory_data(os.path.join(directory, "pos"))
    neg_df = load_directory_data(os.path.join(directory, "neg"))
    pos_df["polarity"] = 1
    neg_df["polarity"] = 0
    return pd.concat([pos_df, neg_df]).sample(frac=1).reset_index(drop=True)

# Download and process the dataset files.
def download_and_load_datasets(force_download=False):
    dataset = tf.keras.utils.get_file(
        fname="aclImdb.tar.gz",
        origin="http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz",
        extract=True)

    train_df = load_dataset(os.path.join(os.path.dirname(dataset),
                                         "aclImdb", "train"))
    test_df = load_dataset(os.path.join(os.path.dirname(dataset),
                                        "aclImdb", "test"))

    return train_df, test_df

In [11]:

train_df, test_df = download_and_load_datasets()
train_df.head()

Downloading data from http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
84131840/84125825 [==============================] – 4s 0us/step

Out[11]:

sentence sentiment polarity
0 Unless you are between the ages of 10 and 14 (… 1 0
1 Wow You guys are way too nice!!!Corny,Corny,Co… 4 0
2 OK. I think the TV show is kind of cute and it… 2 0
3 If you’re in the mood for some dopey light ent… 4 0
4 Before this made for TV movie began, I had rel… 4 0

We now load the text and the labels. We use the “polarity” column for the labels.

In [12]:

import numpy as np
# Create datasets (only take up to max_seq_length words, for memory)
max_seq_length = 500
train_text = train_df['sentence'].tolist()
train_text = [' '.join(t.split()[0:max_seq_length]) for t in train_text]
#train_text = np.array(train_text, dtype=object)[:, np.newaxis]
train_label = train_df['polarity'].tolist()

test_text = test_df['sentence'].tolist()
test_text = [' '.join(t.split()[0:max_seq_length]) for t in test_text]
#test_text = np.array(test_text, dtype=object)[:, np.newaxis]
test_label = test_df['polarity'].tolist()

In [13]:

train_text[:3]

Out[13]:

[‘Unless you are between the ages of 10 and 14 (except for the R rating), there are very few things to like here. One or two lines from Kenan Thompson, David Koechner (we really should see him more) and Sam Jackson are humorous and Julianna Margulies is as good as she can be considering her surroundings, but sadly, that\’s it. Poor plot. Poor acting. Worse writing and delivery. The special effects are dismal. As much as the entire situation is an odd and awful joke, the significant individual embedded situations are all equally terrible. If we consider the action portions, well there are unbelievable action sequences in some films that make you giddy and there are some that make you groan. This movie only contains the latter kind. This leaves little left. I\’m so glad I did not pay for this.

Despite any hype, I can read and think, so as I sat down to watch, I did not expect anything good. I had no expectations, but was somewhat worried going in. Yet, like a train wreck, one cannot merely look away. And even with no expectations, I was let down. Bad. Not even \’so bad, it\’s good\’ material. I\’m _very_ tolerant of bad movies, but this makes “Six String Samurai” (which I liked) Oscar worthy.

No, this piece of over CGI\’d rubbish is in the same company as Battlefield Earth, Little Man and Gigli. How this is currently rated a 7.2 completely mystifies me. Brainwashing or somehow stacking the voting system is all that I can think of as answers.

I could go on and on but suffice to say that tonight, I witnessed a train wreck. I need to go wash my eyes. 1 of 10′,
‘Wow You guys are way too nice!!!Corny,Corny,Corny That is how I feel about that film.It started well with a good idea , A guy (Edward Asner) escape from Jail dressed as Santa,a bunch of kids find him and believes his the real Santa so the Fake Santa enlist the children to help him find a bag of stolen money.the film is like a Christmas version of “Whistle down the wind”. The movie start well but gradually it becomes Cheesier and Cheesier to the point that at the end it becomes ridiculous and you just cant take this film seriously. For example you get the Scrooge type character called Sumner (Rene Auberjonois) who\’s a total Douchebag who treat his young son like a pile a rubbish ,he treat his son so bad that he don\’t even buy him decent clothes,the poor kid wears Jeans with Holes in it! but a 45 second scene with Fake Santa visiting Sumner and by the end of the film you get the guy all happy singing Christmas Carol and giving his neglected son a hug…yep that is how Corny it is… I\’m all for feel good movie especially during Christmas and I am a big fan of seasonal TV movie but this one is way too over the top for me,it is a shame because it started well but the second half of the movie is trowing a supernatural element to the film that just don\’t match with the rest of it. It\’s not totally bad,there are some solid acting , especially from the children but there are plenty of better Christmas film around.’,
“OK. I think the TV show is kind of cute and it always has some kind of lesson involved. So, when my kids decided they wanted to see this movie, I decided to tag along. I wish I’d stayed home and watched the TV show instead.

The fact that the humor is silly and unoriginal is the least of the problems with this movie. The plot is next to non-existant, the characters seem to exist in a vacuum, and, worst of all, Gadget does not carry any lesson whatsoever. It appears that Disney took all of the things that make Inspector Gadget work on TV and tossed them all. To be fair, my younger child (8 years old) liked the movie but the older one (10 years old) came away thinking it silly (he was too old for the youth humor but too young for any of the adult humor).

Generally, I like Disney films but this one misses by a mile. It is OK for a very narrow age band (say 7 to 9) but a must miss for everybody else.”]

In [14]:

train_label[:3]

Out[14]:

[0, 0, 0]

In [15]:

train_indices = [tokenizer.encode(t, add_special_tokens=True, max_length=500, padding='max_length', truncation=True) for t in train_text]

In [16]:

test_indices = [tokenizer.encode(t, add_special_tokens=True, max_length=500, padding='max_length', truncation=True) for t in test_text]

If executing the above two cells is too slow and your computer has multiple CPUs, you can use the following parallel version, which uses Python's multiprocessing library instead. Uncomment and execute the following cells.

In [17]:

#from multiprocessing import Pool
#from functools import partial
#def text_to_indices(data, limit=500):
#    "Convert the text to indices and pad-truncate to the maximum number of words"
#    with Pool() as pool:
#        return pool.map(partial(tokenizer.encode,
#                                add_special_tokens=True, max_length=limit,
#                                padding='max_length', truncation=True),
#                        data)

In [18]:

#text_to_indices(["this is sentence 1", "this is sentence 2"], limit=10)

In [19]:

#print("Obtaining indices of train set")
#train_indices = text_to_indices(train_text)
#print("Obtaining indices of test set")
#test_indices = text_to_indices(test_text)
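
Another option worth knowing about (a sketch only, left commented out like the cells above since train_indices has already been computed): recent versions of the tokenizer can be called on a whole list of texts at once and return the padded index matrix directly, which is usually faster than encoding review by review.

#encoded = tokenizer(train_text, add_special_tokens=True, max_length=500,
#                    padding='max_length', truncation=True)
#train_indices = encoded['input_ids']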

Results with BERT¶
Let’s now design a simple architecture that uses BERT for binary classification. Our approach will not modify the BERT weights, hence we use the option trainable=False.

In [20]:

# Load pre-trained model (weights)
bert = TFBertModel.from_pretrained('bert-base-uncased', trainable=False)

Some layers from the model checkpoint at bert-base-uncased were not used when initializing TFBertModel: [‘mlm___cls’, ‘nsp___cls’]
– This IS expected if you are initializing TFBertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
– This IS NOT expected if you are initializing TFBertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
All the layers of TFBertModel were initialized from the model checkpoint at bert-base-uncased.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions without further training.
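
Before wiring BERT into a Keras model, we can verify that its weights are indeed frozen. A minimal sketch relying on standard Keras layer attributes (the corresponding parameter counts also appear in the model summary further below):

# A frozen BERT layer should expose no trainable weights
print(len(bert.trainable_weights))       # expected: 0
print(len(bert.non_trainable_weights))   # all of BERT's weight tensors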

In [21]:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Input, GlobalAveragePooling1D

Below is the code that implements the model. Given that the BERT layer returns multiple outputs, we use Keras' functional API, where each layer is called as a function on the output of the previous layer in the model.

Look, in particular, at the line last_hidden_states = bert(inputs)[0]. This line applies the BERT layer loaded above to the inputs. The layer returns several outputs, and we keep only the first one (the element with index 0), which contains the information we want: the sequence of contextual BERT embeddings, one per token.

Then, the final model is created by specifying its inputs and outputs. In the code below, this is done in the line bert_model = Model(inputs, outputs).

In [22]:

# From https://huggingface.co/transformers/_modules/transformers/modeling_tf_bert.html#TFBertModel
#inputs = tf.constant(indexed_tokens)[None, :] # Batch size 1
inputs = Input(shape=(500,), dtype=tf.int32)
last_hidden_states = bert(inputs)[0] # The last hidden-state is the first element of the output tuple
average_pooling = GlobalAveragePooling1D()(last_hidden_states) # Average pooling of last hidden states as recommended
outputs = Dense(1, activation=tf.nn.sigmoid)(average_pooling)
bert_model = Model(inputs, outputs)
bert_model.summary()

WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained(‘name’, output_attentions=True)`).
WARNING:tensorflow:AutoGraph could not transform > and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform > and will run it as-is.
Please report this to the TensorFlow team. When filing the bug, set the verbosity to 10 (on Linux, `export AUTOGRAPH_VERBOSITY=10`) and attach the full output.
Cause: module, class, method, function, traceback, frame, or code object was expected, got cython_function_or_method
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:AutoGraph could not transform and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING: AutoGraph could not transform and will run it as-is.
Cause: while/else statement not yet supported
To silence this warning, decorate the function with @tf.autograph.experimental.do_not_convert
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
Model: “model”
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 500)] 0
_________________________________________________________________
tf_bert_model (TFBertModel) TFBaseModelOutputWithPool 109482240
_________________________________________________________________
global_average_pooling1d (Gl (None, 768) 0
_________________________________________________________________
dense (Dense) (None, 1) 769
=================================================================
Total params: 109,483,009
Trainable params: 769
Non-trainable params: 109,482,240
_________________________________________________________________
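
A side note on the indexing with [0]: in transformers 4.x the BERT layer actually returns a model-output object (the TFBaseModelOutputWithPooling shown in the summary above), which supports both tuple-style indexing and access by attribute name. An equivalent, arguably more readable formulation of the line discussed above would therefore be (sketch only, not used to build the model):

#last_hidden_states = bert(inputs).last_hidden_state   # same tensor as bert(inputs)[0]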

In [23]:

bert_model.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['acc'])

In [24]:

history = bert_model.fit(np.array(train_indices), np.array(train_label),
                         epochs=20,
                         batch_size=128,
                         validation_split=0.2)

Epoch 1/20
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained(‘name’, output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained(‘name’, output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
157/157 [==============================] – ETA: 0s – loss: 0.6764 – acc: 0.5882WARNING:tensorflow:The parameters `output_attentions`, `output_hidden_states` and `use_cache` cannot be updated when calling a model.They have to be set to True/False in the config object (i.e.: `config=XConfig.from_pretrained(‘name’, output_attentions=True)`).
WARNING:tensorflow:The parameter `return_dict` cannot be set in graph mode and will always be set to `True`.
157/157 [==============================] – 1170s 7s/step – loss: 0.6763 – acc: 0.5886 – val_loss: 0.6119 – val_acc: 0.7254
Epoch 2/20
157/157 [==============================] – 1156s 7s/step – loss: 0.6121 – acc: 0.7167 – val_loss: 0.5607 – val_acc: 0.7998
Epoch 3/20
157/157 [==============================] – 1154s 7s/step – loss: 0.5657 – acc: 0.7729 – val_loss: 0.5338 – val_acc: 0.7642
Epoch 4/20
157/157 [==============================] – 1155s 7s/step – loss: 0.5370 – acc: 0.7821 – val_loss: 0.5017 – val_acc: 0.8222
Epoch 5/20
157/157 [==============================] – 1155s 7s/step – loss: 0.5165 – acc: 0.7920 – val_loss: 0.4834 – val_acc: 0.8154
Epoch 6/20
157/157 [==============================] – 1157s 7s/step – loss: 0.4973 – acc: 0.8038 – val_loss: 0.4681 – val_acc: 0.8306
Epoch 7/20
157/157 [==============================] – 1156s 7s/step – loss: 0.4847 – acc: 0.8024 – val_loss: 0.4601 – val_acc: 0.8090
Epoch 8/20
157/157 [==============================] – 1151s 7s/step – loss: 0.4723 – acc: 0.8083 – val_loss: 0.4471 – val_acc: 0.8304
Epoch 9/20
157/157 [==============================] – 1153s 7s/step – loss: 0.4652 – acc: 0.8115 – val_loss: 0.4361 – val_acc: 0.8424
Epoch 10/20
157/157 [==============================] – 1154s 7s/step – loss: 0.4510 – acc: 0.8207 – val_loss: 0.4284 – val_acc: 0.8440
Epoch 11/20
157/157 [==============================] – 1154s 7s/step – loss: 0.4461 – acc: 0.8221 – val_loss: 0.4231 – val_acc: 0.8424
Epoch 12/20
157/157 [==============================] – 1155s 7s/step – loss: 0.4405 – acc: 0.8249 – val_loss: 0.4168 – val_acc: 0.8474
Epoch 13/20
157/157 [==============================] – 1155s 7s/step – loss: 0.4339 – acc: 0.8249 – val_loss: 0.4246 – val_acc: 0.8156
Epoch 14/20
157/157 [==============================] – 1158s 7s/step – loss: 0.4292 – acc: 0.8253 – val_loss: 0.4068 – val_acc: 0.8520
Epoch 15/20
157/157 [==============================] – 1157s 7s/step – loss: 0.4236 – acc: 0.8281 – val_loss: 0.4022 – val_acc: 0.8442
Epoch 16/20
157/157 [==============================] – 1155s 7s/step – loss: 0.4219 – acc: 0.8258 – val_loss: 0.3996 – val_acc: 0.8532
Epoch 17/20
157/157 [==============================] – 1155s 7s/step – loss: 0.4181 – acc: 0.8273 – val_loss: 0.3987 – val_acc: 0.8468
Epoch 18/20
157/157 [==============================] – 1156s 7s/step – loss: 0.4189 – acc: 0.8278 – val_loss: 0.3936 – val_acc: 0.8494
Epoch 19/20
157/157 [==============================] – 1154s 7s/step – loss: 0.4157 – acc: 0.8257 – val_loss: 0.3922 – val_acc: 0.8544
Epoch 20/20
157/157 [==============================] – 1154s 7s/step – loss: 0.4098 – acc: 0.8282 – val_loss: 0.3873 – val_acc: 0.8550

In [25]:

%matplotlib inline
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.subplot(121)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(122)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

Results with Average baseline¶
For comparison, below is a simpler architecture that uses a simple embedding layer instead of BERT, as we have seen in previous notebooks. The system will process exactly the same data.

In [26]:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, GlobalAveragePooling1D

In [27]:

max_features = tokenizer.vocab_size
embeddings_size = 32
inputs = Input(shape=(500,), dtype=tf.int32)
embedding = Embedding(max_features, embeddings_size)(inputs)
avg = GlobalAveragePooling1D()(embedding)
outputs = Dense(1, activation=tf.nn.sigmoid)(avg)
avg_model = Model(inputs, outputs)
avg_model.summary()

Model: “model_1”
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 500)] 0
_________________________________________________________________
embedding (Embedding) (None, 500, 32) 976704
_________________________________________________________________
global_average_pooling1d_1 ( (None, 32) 0
_________________________________________________________________
dense_1 (Dense) (None, 1) 33
=================================================================
Total params: 976,737
Trainable params: 976,737
Non-trainable params: 0
_________________________________________________________________

In [28]:

avg_model.compile(optimizer='adam',
                  loss='binary_crossentropy',
                  metrics=['acc'])

In [29]:

history = avg_model.fit(np.array(train_indices), np.array(train_label),
                        epochs=50,
                        batch_size=128,
                        validation_split=0.2)

Epoch 1/50
157/157 [==============================] – 3s 16ms/step – loss: 0.6905 – acc: 0.5830 – val_loss: 0.6752 – val_acc: 0.6336
Epoch 2/50
157/157 [==============================] – 2s 15ms/step – loss: 0.6658 – acc: 0.7195 – val_loss: 0.6274 – val_acc: 0.7752
Epoch 3/50
157/157 [==============================] – 2s 15ms/step – loss: 0.6145 – acc: 0.7765 – val_loss: 0.5692 – val_acc: 0.8028
Epoch 4/50
157/157 [==============================] – 2s 16ms/step – loss: 0.5575 – acc: 0.8109 – val_loss: 0.5160 – val_acc: 0.8288
Epoch 5/50
157/157 [==============================] – 2s 15ms/step – loss: 0.5047 – acc: 0.8355 – val_loss: 0.4708 – val_acc: 0.8466
Epoch 6/50
157/157 [==============================] – 2s 15ms/step – loss: 0.4575 – acc: 0.8515 – val_loss: 0.4338 – val_acc: 0.8586
Epoch 7/50
157/157 [==============================] – 2s 15ms/step – loss: 0.4162 – acc: 0.8687 – val_loss: 0.4042 – val_acc: 0.8690
Epoch 8/50
157/157 [==============================] – 2s 15ms/step – loss: 0.3830 – acc: 0.8779 – val_loss: 0.3802 – val_acc: 0.8748
Epoch 9/50
157/157 [==============================] – 2s 16ms/step – loss: 0.3552 – acc: 0.8878 – val_loss: 0.3613 – val_acc: 0.8784
Epoch 10/50
157/157 [==============================] – 2s 15ms/step – loss: 0.3308 – acc: 0.8983 – val_loss: 0.3496 – val_acc: 0.8770
Epoch 11/50
157/157 [==============================] – 2s 16ms/step – loss: 0.3125 – acc: 0.8960 – val_loss: 0.3333 – val_acc: 0.8854
Epoch 12/50
157/157 [==============================] – 3s 16ms/step – loss: 0.2935 – acc: 0.9060 – val_loss: 0.3224 – val_acc: 0.8864
Epoch 13/50
157/157 [==============================] – 2s 15ms/step – loss: 0.2788 – acc: 0.9139 – val_loss: 0.3138 – val_acc: 0.8886
Epoch 14/50
157/157 [==============================] – 2s 15ms/step – loss: 0.2626 – acc: 0.9156 – val_loss: 0.3058 – val_acc: 0.8904
Epoch 15/50
157/157 [==============================] – 3s 17ms/step – loss: 0.2539 – acc: 0.9171 – val_loss: 0.2998 – val_acc: 0.8914
Epoch 16/50
157/157 [==============================] – 3s 16ms/step – loss: 0.2374 – acc: 0.9266 – val_loss: 0.2947 – val_acc: 0.8942
Epoch 17/50
157/157 [==============================] – 3s 17ms/step – loss: 0.2308 – acc: 0.9283 – val_loss: 0.2905 – val_acc: 0.8950
Epoch 18/50
157/157 [==============================] – 3s 16ms/step – loss: 0.2231 – acc: 0.9298 – val_loss: 0.2873 – val_acc: 0.8940
Epoch 19/50
157/157 [==============================] – 2s 16ms/step – loss: 0.2128 – acc: 0.9334 – val_loss: 0.2822 – val_acc: 0.8968
Epoch 20/50
157/157 [==============================] – 2s 16ms/step – loss: 0.2015 – acc: 0.9365 – val_loss: 0.2796 – val_acc: 0.8966
Epoch 21/50
157/157 [==============================] – 2s 16ms/step – loss: 0.1932 – acc: 0.9417 – val_loss: 0.2771 – val_acc: 0.8976
Epoch 22/50
157/157 [==============================] – 2s 15ms/step – loss: 0.1872 – acc: 0.9432 – val_loss: 0.2749 – val_acc: 0.8996
Epoch 23/50
157/157 [==============================] – 3s 17ms/step – loss: 0.1780 – acc: 0.9441 – val_loss: 0.2737 – val_acc: 0.8994
Epoch 24/50
157/157 [==============================] – 2s 15ms/step – loss: 0.1678 – acc: 0.9492 – val_loss: 0.2727 – val_acc: 0.8990
Epoch 25/50
157/157 [==============================] – 2s 16ms/step – loss: 0.1687 – acc: 0.9461 – val_loss: 0.2709 – val_acc: 0.8994
Epoch 26/50
157/157 [==============================] – 3s 16ms/step – loss: 0.1646 – acc: 0.9491 – val_loss: 0.2701 – val_acc: 0.9004
Epoch 27/50
157/157 [==============================] – 2s 16ms/step – loss: 0.1504 – acc: 0.9561 – val_loss: 0.2702 – val_acc: 0.9002
Epoch 28/50
157/157 [==============================] – 2s 16ms/step – loss: 0.1489 – acc: 0.9557 – val_loss: 0.2697 – val_acc: 0.9012
Epoch 29/50
157/157 [==============================] – 3s 16ms/step – loss: 0.1407 – acc: 0.9589 – val_loss: 0.2708 – val_acc: 0.8996
Epoch 30/50
157/157 [==============================] – 2s 15ms/step – loss: 0.1384 – acc: 0.9606 – val_loss: 0.2702 – val_acc: 0.9026
Epoch 31/50
157/157 [==============================] – 2s 16ms/step – loss: 0.1326 – acc: 0.9614 – val_loss: 0.2707 – val_acc: 0.9010
Epoch 32/50
157/157 [==============================] – 2s 16ms/step – loss: 0.1277 – acc: 0.9616 – val_loss: 0.2705 – val_acc: 0.9014
Epoch 33/50
157/157 [==============================] – 3s 16ms/step – loss: 0.1222 – acc: 0.9634 – val_loss: 0.2713 – val_acc: 0.9024
Epoch 34/50
157/157 [==============================] – 3s 16ms/step – loss: 0.1205 – acc: 0.9642 – val_loss: 0.2731 – val_acc: 0.9002
Epoch 35/50
157/157 [==============================] – 3s 16ms/step – loss: 0.1178 – acc: 0.9664 – val_loss: 0.2742 – val_acc: 0.9008
Epoch 36/50
157/157 [==============================] – 2s 15ms/step – loss: 0.1102 – acc: 0.9682 – val_loss: 0.2748 – val_acc: 0.9016
Epoch 37/50
157/157 [==============================] – 2s 16ms/step – loss: 0.1056 – acc: 0.9714 – val_loss: 0.2773 – val_acc: 0.9010
Epoch 38/50
157/157 [==============================] – 2s 15ms/step – loss: 0.1055 – acc: 0.9715 – val_loss: 0.2776 – val_acc: 0.9012
Epoch 39/50
157/157 [==============================] – 2s 15ms/step – loss: 0.1007 – acc: 0.9731 – val_loss: 0.2792 – val_acc: 0.9004
Epoch 40/50
157/157 [==============================] – 2s 16ms/step – loss: 0.0943 – acc: 0.9766 – val_loss: 0.2818 – val_acc: 0.8992
Epoch 41/50
157/157 [==============================] – 2s 16ms/step – loss: 0.0947 – acc: 0.9757 – val_loss: 0.2832 – val_acc: 0.9008
Epoch 42/50
157/157 [==============================] – 2s 15ms/step – loss: 0.0890 – acc: 0.9765 – val_loss: 0.2858 – val_acc: 0.8992
Epoch 43/50
157/157 [==============================] – 3s 17ms/step – loss: 0.0853 – acc: 0.9768 – val_loss: 0.2875 – val_acc: 0.8994
Epoch 44/50
157/157 [==============================] – 2s 15ms/step – loss: 0.0816 – acc: 0.9794 – val_loss: 0.2908 – val_acc: 0.8996
Epoch 45/50
157/157 [==============================] – 3s 16ms/step – loss: 0.0825 – acc: 0.9785 – val_loss: 0.2923 – val_acc: 0.8988
Epoch 46/50
157/157 [==============================] – 2s 15ms/step – loss: 0.0764 – acc: 0.9819 – val_loss: 0.2946 – val_acc: 0.8992
Epoch 47/50
157/157 [==============================] – 3s 17ms/step – loss: 0.0741 – acc: 0.9834 – val_loss: 0.2973 – val_acc: 0.8990
Epoch 48/50
157/157 [==============================] – 3s 16ms/step – loss: 0.0745 – acc: 0.9829 – val_loss: 0.3010 – val_acc: 0.8990
Epoch 49/50
157/157 [==============================] – 2s 16ms/step – loss: 0.0705 – acc: 0.9829 – val_loss: 0.3054 – val_acc: 0.8966
Epoch 50/50
157/157 [==============================] – 2s 15ms/step – loss: 0.0658 – acc: 0.9850 – val_loss: 0.3071 – val_acc: 0.8984

In [30]:

%matplotlib inline
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.subplot(121)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(122)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()

The system overfits much more than with BERT, but the accuracy on the validation data is higher (better) than with BERT. This shows that it always makes sense to try a simple architecture before moving on to more complex and advanced ones.
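
The validation loss of this baseline stops improving in the later epochs while the training loss keeps falling. A common remedy, sketched below but not run here, is Keras' EarlyStopping callback, which halts training once the validation loss stops improving and restores the best weights:

from tensorflow.keras.callbacks import EarlyStopping

# Stop when val_loss has not improved for 3 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=3,
                               restore_best_weights=True)
#history = avg_model.fit(np.array(train_indices), np.array(train_label),
#                        epochs=50, batch_size=128, validation_split=0.2,
#                        callbacks=[early_stopping])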

Results with LSTM baseline¶
Another baseline, this time using an LSTM layer.

In [31]:

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Embedding, LSTM

In [32]:

max_features = tokenizer.vocab_size
embeddings_size = 32
inputs = Input(shape=(500,), dtype=tf.int32)
embedding = Embedding(max_features, embeddings_size)(inputs)
lstm = LSTM(embeddings_size, dropout=0.2)(embedding)
outputs = Dense(1, activation=tf.nn.sigmoid)(lstm)
lstm_model = Model(inputs, outputs)
lstm_model.summary()

Model: “model_2”
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_3 (InputLayer) [(None, 500)] 0
_________________________________________________________________
embedding_1 (Embedding) (None, 500, 32) 976704
_________________________________________________________________
lstm (LSTM) (None, 32) 8320
_________________________________________________________________
dense_2 (Dense) (None, 1) 33
=================================================================
Total params: 985,057
Trainable params: 985,057
Non-trainable params: 0
_________________________________________________________________

In [33]:

lstm_model.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['acc'])

In [34]:

history = lstm_model.fit(np.array(train_indices), np.array(train_label),
                         epochs=30,
                         batch_size=128,
                         validation_split=0.2)

Epoch 1/30
157/157 [==============================] – 35s 36ms/step – loss: 0.6933 – acc: 0.5042 – val_loss: 0.6929 – val_acc: 0.5112
Epoch 2/30
157/157 [==============================] – 5s 32ms/step – loss: 0.6925 – acc: 0.5138 – val_loss: 0.6926 – val_acc: 0.4970
Epoch 3/30
157/157 [==============================] – 5s 32ms/step – loss: 0.6800 – acc: 0.5296 – val_loss: 0.6883 – val_acc: 0.5340
Epoch 4/30
157/157 [==============================] – 5s 32ms/step – loss: 0.6288 – acc: 0.5640 – val_loss: 0.7061 – val_acc: 0.5202
Epoch 5/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5972 – acc: 0.5841 – val_loss: 0.7361 – val_acc: 0.5348
Epoch 6/30
157/157 [==============================] – 5s 33ms/step – loss: 0.6266 – acc: 0.5842 – val_loss: 0.7155 – val_acc: 0.5438
Epoch 7/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5868 – acc: 0.5903 – val_loss: 0.7122 – val_acc: 0.5674
Epoch 8/30
157/157 [==============================] – 5s 33ms/step – loss: 0.6008 – acc: 0.5913 – val_loss: 0.7363 – val_acc: 0.5364
Epoch 9/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5901 – acc: 0.5850 – val_loss: 0.7483 – val_acc: 0.5184
Epoch 10/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5940 – acc: 0.5823 – val_loss: 0.7699 – val_acc: 0.5046
Epoch 11/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5852 – acc: 0.5811 – val_loss: 0.7942 – val_acc: 0.5206
Epoch 12/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5820 – acc: 0.5902 – val_loss: 0.8184 – val_acc: 0.5184
Epoch 13/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5813 – acc: 0.5840 – val_loss: 0.8320 – val_acc: 0.5184
Epoch 14/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5795 – acc: 0.5922 – val_loss: 0.8496 – val_acc: 0.5164
Epoch 15/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5814 – acc: 0.5855 – val_loss: 0.8713 – val_acc: 0.5168
Epoch 16/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5789 – acc: 0.5885 – val_loss: 0.8970 – val_acc: 0.5006
Epoch 17/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5745 – acc: 0.5928 – val_loss: 0.9119 – val_acc: 0.5012
Epoch 18/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5698 – acc: 0.5985 – val_loss: 0.9133 – val_acc: 0.5096
Epoch 19/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5774 – acc: 0.5934 – val_loss: 0.9430 – val_acc: 0.5126
Epoch 20/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5658 – acc: 0.6041 – val_loss: 0.9351 – val_acc: 0.5008
Epoch 21/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5569 – acc: 0.6104 – val_loss: 0.8964 – val_acc: 0.5136
Epoch 22/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5614 – acc: 0.6152 – val_loss: 1.0206 – val_acc: 0.5080
Epoch 23/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5589 – acc: 0.6199 – val_loss: 1.0098 – val_acc: 0.5102
Epoch 24/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5470 – acc: 0.6284 – val_loss: 0.9741 – val_acc: 0.4988
Epoch 25/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5425 – acc: 0.6372 – val_loss: 0.9974 – val_acc: 0.5036
Epoch 26/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5288 – acc: 0.6591 – val_loss: 1.0346 – val_acc: 0.5078
Epoch 27/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5314 – acc: 0.6573 – val_loss: 1.0573 – val_acc: 0.5066
Epoch 28/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5312 – acc: 0.6633 – val_loss: 0.8332 – val_acc: 0.5208
Epoch 29/30
157/157 [==============================] – 5s 32ms/step – loss: 0.5807 – acc: 0.5944 – val_loss: 0.8904 – val_acc: 0.5226
Epoch 30/30
157/157 [==============================] – 5s 33ms/step – loss: 0.5758 – acc: 0.5971 – val_loss: 0.9658 – val_acc: 0.5028

In [35]:

%matplotlib inline
import matplotlib.pyplot as plt

acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.subplot(121)
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.subplot(122)

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
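
Note that the test split prepared earlier (test_indices and test_label) has not been used above. A minimal sketch of how the three trained models could be compared on it (left commented out; evaluating the BERT model in particular is slow):

#for name, model in [('BERT', bert_model), ('Average', avg_model), ('LSTM', lstm_model)]:
#    loss, acc = model.evaluate(np.array(test_indices), np.array(test_label),
#                               batch_size=128, verbose=0)
#    print(f'{name}: test loss {loss:.4f}, test accuracy {acc:.4f}')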
