Neural Machine Translation¶
In this workshop, we are going to build a seq2seq machine translation model and train it on a parallel corpus of English and French. We will frame the translation problem in a slightly different way: instead of translating the sentence word by word, we are going to work at the character level. This means tokens in the source and target sentences are characters instead of words.
We’ll be using the Keras framework. This notebook adapts code from this blog post.
It might take hours to train this model on a CPU, so we encourage you to run this experiment with GPU support on Colab. After you have uploaded this notebook to Colab, don’t forget to enable GPU acceleration by going to “Runtime > Change runtime type”, selecting “GPU” as the hardware accelerator, and clicking Save.
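To confirm the GPU is actually visible, you can run this optional check (TensorFlow comes preinstalled on Colab):

import tensorflow as tf
print(tf.test.gpu_device_name())  # prints e.g. '/device:GPU:0', or an empty string if no GPU is visible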
In [0]:
!pip install keras
Requirement already satisfied: keras in /usr/local/lib/python3.6/dist-packages (2.3.1)
Requirement already satisfied: keras-preprocessing>=1.0.5 in /usr/local/lib/python3.6/dist-packages (from keras) (1.1.0)
Requirement already satisfied: h5py in /usr/local/lib/python3.6/dist-packages (from keras) (2.10.0)
Requirement already satisfied: numpy>=1.9.1 in /usr/local/lib/python3.6/dist-packages (from keras) (1.18.4)
Requirement already satisfied: keras-applications>=1.0.6 in /usr/local/lib/python3.6/dist-packages (from keras) (1.0.8)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python3.6/dist-packages (from keras) (1.12.0)
Requirement already satisfied: scipy>=0.14 in /usr/local/lib/python3.6/dist-packages (from keras) (1.4.1)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.6/dist-packages (from keras) (3.13)
Before we start, let’s download the data set and unzip it.
In [0]:
import os
import urllib.request
from zipfile import ZipFile

if not os.path.exists('fra-eng.zip'):
    url = 'http://www.manythings.org/anki/fra-eng.zip'
    opener = urllib.request.URLopener()
    opener.addheader('User-Agent', ' ')
    filename, headers = opener.retrieve(url, 'fra-eng.zip')
    compressed = ZipFile(filename, 'r')
    compressed.extractall()
    compressed.close()
Each row in the file contains an English sentence and its French translation, separated by \t (plus an attribution field, which we discard). The first step of our pre-processing is to read the sentence pairs from the data file. Meanwhile, we will also build the character vocabulary of the input and output languages.
Note that the goal of machine translation is to generate sequences in the target language. Therefore, for target sequences, we will need start-of-sequence and end-of-sequence symbols to denote the start and end of generation, respectively. You will see why this is essential in the inference process.
In [0]:
data_path = 'fra.txt'
num_samples = 10000

input_texts = []
target_texts = []
input_characters = set()
target_characters = set()

with open(data_path, 'r', encoding='utf-8') as f:
    lines = f.read().split('\n')
for line in lines[: min(num_samples, len(lines) - 1)]:
    input_text, target_text, _ = line.split('\t')
    # We use "tab" as the "start sequence" character
    # for the targets, and "\n" as "end sequence" character.
    target_text = '\t' + target_text + '\n'
    input_texts.append(input_text)
    target_texts.append(target_text)
    # build input character vocab
    for char in input_text:
        if char not in input_characters:
            input_characters.add(char)
    # build target character vocab
    for char in target_text:
        if char not in target_characters:
            target_characters.add(char)

input_characters = sorted(list(input_characters))
target_characters = sorted(list(target_characters))
num_encoder_tokens = len(input_characters)
num_decoder_tokens = len(target_characters)
max_encoder_seq_length = max([len(txt) for txt in input_texts])
max_decoder_seq_length = max([len(txt) for txt in target_texts])

print('Number of samples:', len(input_texts))
print('Number of unique input tokens:', num_encoder_tokens)
print('Number of unique output tokens:', num_decoder_tokens)
print('Max sequence length for inputs:', max_encoder_seq_length)
print('Max sequence length for outputs:', max_decoder_seq_length)
Number of samples: 10000
Number of unique input tokens: 71
Number of unique output tokens: 93
Max sequence length for inputs: 16
Max sequence length for outputs: 59
In [0]:
input_texts[1], target_texts[1]
Out[0]:
('Hi.', '\tSalut !\n')
After we have the data in textual format, we need to convert it into vectors that can be fed into our model. We turn the sentences into three NumPy arrays, encoder_input_data, decoder_input_data, and decoder_target_data:
1. encoder_input_data is a 3D array of shape (num_pairs, max_english_sentence_length, num_english_characters) containing a one-hot vectorization of the English sentences.
2. decoder_input_data is a 3D array of shape (num_pairs, max_french_sentence_length, num_french_characters) containing a one-hot vectorization of the French sentences.
3. decoder_target_data is the same as decoder_input_data but offset by one timestep. decoder_target_data[:, t, :] will be the same as decoder_input_data[:, t + 1, :].
In [0]:
import numpy as np

# build character-to-index lookup for source language
input_token_index = dict(
    [(char, i) for i, char in enumerate(input_characters)])
# build character-to-index lookup for target language
target_token_index = dict(
    [(char, i) for i, char in enumerate(target_characters)])

# initialize the 3D arrays with zeros
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, num_encoder_tokens),
    dtype='float32')
decoder_input_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')
decoder_target_data = np.zeros(
    (len(input_texts), max_decoder_seq_length, num_decoder_tokens),
    dtype='float32')

for i, (input_text, target_text) in enumerate(zip(input_texts, target_texts)):
    for t, char in enumerate(input_text):
        encoder_input_data[i, t, input_token_index[char]] = 1.
    encoder_input_data[i, t + 1:, input_token_index[' ']] = 1.  # pad with spaces from t+1 to max_encoder_seq_length
    for t, char in enumerate(target_text):
        # decoder_target_data is ahead of decoder_input_data by one timestep
        decoder_input_data[i, t, target_token_index[char]] = 1.
        if t > 0:
            # decoder_target_data will be ahead by one timestep
            # and will not include the start character because we are not interested in generating it.
            decoder_target_data[i, t - 1, target_token_index[char]] = 1.
    decoder_input_data[i, t + 1:, target_token_index[' ']] = 1.  # pad with spaces from t+1 to max_decoder_seq_length
    decoder_target_data[i, t:, target_token_index[' ']] = 1.  # pad with spaces from t to max_decoder_seq_length
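As a quick sanity check (our own addition, not part of the original pre-processing), we can invert the one-hot encoding of the first sample and verify that we recover the original sentence, right-padded with spaces:

recovered = ''.join(input_characters[np.argmax(row)] for row in encoder_input_data[0])
print(repr(recovered))  # the original sentence plus space padding
assert recovered.rstrip(' ') == input_texts[0]  # assumes the sentence itself does not end in a space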
What is sequence-to-sequence learning?¶
Sequence-to-sequence learning (Seq2Seq) is about training models to convert sequences from one domain (e.g. sentences in English) to sequences in another domain (e.g. the same sentences translated to French).
“the cat sat on the mat” -> [Seq2Seq model] -> “le chat etait assis sur le tapis”
This can be used for machine translation or for free-form question answering (generating a natural language answer given a natural language question). In general, it is applicable any time you need to generate text.
There are multiple ways to handle this task, either using RNNs or using 1D convnets. Here we will focus on RNNs.
The trivial case: when input and output sequences have the same length¶
When both input sequences and output sequences have the same length, you can implement such models simply with a Keras LSTM or GRU layer (or a stack thereof). This is the case in this example script, which shows how to teach an RNN to learn to add numbers, encoded as character strings:

One caveat of this approach is that it assumes that it is possible to generate target[…t] given input[…t]. That works in some cases (e.g. adding strings of digits) but does not work for most use cases. In the general case, information about the entire input sequence is necessary in order to start generating the target sequence.
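For concreteness, here is a minimal sketch of that trivial case (a standalone illustration reusing this notebook's num_encoder_tokens; it is not used anywhere else in this workshop):

from keras.models import Sequential
from keras.layers import LSTM, Dense, TimeDistributed

same_length_model = Sequential([
    # One output is emitted per input timestep, so input and output lengths match.
    LSTM(128, return_sequences=True, input_shape=(None, num_encoder_tokens)),
    # Classify each timestep's output independently.
    TimeDistributed(Dense(num_encoder_tokens, activation='softmax')),
])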
The general case: canonical sequence-to-sequence¶
In the general case, input sequences and output sequences have different lengths (e.g. machine translation) and the entire input sequence is required in order to start predicting the target. This requires a more advanced setup, which is what people commonly refer to when mentioning “sequence to sequence models” with no further context. Here’s how it works:
• An RNN layer (or stack thereof) acts as the “encoder”: it processes the input sequence and returns its own internal state. Note that we discard the outputs of the encoder RNN, only recovering the state. This state will serve as the “context”, or “conditioning”, of the decoder in the next step.
• Another RNN layer (or stack thereof) acts as the “decoder”: it is trained to predict the next characters of the target sequence, given previous characters of the target sequence. Specifically, it is trained to turn the target sequences into the same sequences but offset by one timestep in the future, a training process called “teacher forcing” in this context. Importantly, the decoder uses as its initial state the state vectors from the encoder, which is how the decoder obtains information about what it is supposed to generate. Effectively, the decoder learns to generate targets[t+1…] given targets[…t], conditioned on the input sequence.

This is our training model. It leverages three key features of Keras RNNs:
• The return_state constructor argument, configuring an RNN layer to return a list where the first entry is the outputs and the next entries are the internal RNN states. This is used to recover the states of the encoder.
• The initial_state call argument, specifying the initial state(s) of an RNN. This is used to pass the encoder states to the decoder as initial states.
• The return_sequences constructor argument, configuring an RNN to return its full sequence of outputs (instead of just the last output, which is the default behavior). This is used in the decoder.
In [0]:
from keras.models import Model
from keras.layers import Input, LSTM, Dense

latent_dim = 256  # size of the states

# Define an input sequence and process it.
encoder_inputs = Input(shape=(None, num_encoder_tokens))
encoder = LSTM(latent_dim, return_state=True)
encoder_outputs, state_h, state_c = encoder(encoder_inputs)
# We discard `encoder_outputs` and only keep the states.
encoder_states = [state_h, state_c]

# Set up the decoder, using `encoder_states` as initial state.
decoder_inputs = Input(shape=(None, num_decoder_tokens))
# We set up our decoder to return full output sequences,
# and to return internal states as well. We don't use the
# return states in the training model, but we will use them in inference.
decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                     initial_state=encoder_states)
decoder_dense = Dense(num_decoder_tokens, activation='softmax')
decoder_outputs = decoder_dense(decoder_outputs)

# Define the model that will turn
# `encoder_input_data` & `decoder_input_data` into `decoder_target_data`
model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
print(model.summary())
Using TensorFlow backend.
Model: "model_1"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, None, 71) 0
__________________________________________________________________________________________________
input_2 (InputLayer) (None, None, 93) 0
__________________________________________________________________________________________________
lstm_1 (LSTM) [(None, 256), (None, 335872 input_1[0][0]
__________________________________________________________________________________________________
lstm_2 (LSTM) [(None, None, 256), 358400 input_2[0][0]
lstm_1[0][1]
lstm_1[0][2]
__________________________________________________________________________________________________
dense_1 (Dense) (None, None, 93) 23901 lstm_2[0][0]
==================================================================================================
Total params: 718,173
Trainable params: 718,173
Non-trainable params: 0
__________________________________________________________________________________________________
None
You might have noticed that the decoder LSTM has slightly more trainable parameters than the encoder. This is because the target vocabulary (93 characters) is slightly larger than the input vocabulary (71 characters).
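You can verify the counts in the summary by hand. An LSTM has four gates, each with an input weight matrix, a recurrent weight matrix, and a bias vector, for a total of 4 * (units * (units + input_dim) + units) parameters:

def lstm_params(input_dim, units):
    # 4 gates, each with input weights, recurrent weights, and a bias
    return 4 * (units * (units + input_dim) + units)

print(lstm_params(71, 256))  # encoder LSTM: 335872
print(lstm_params(93, 256))  # decoder LSTM: 358400
print(256 * 93 + 93)         # output projection: 23901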
We train our model in two lines, while monitoring the loss on a held-out set of 20% of the samples.
In [0]:
batch_size = 256  # Batch size for training.
epochs = 100  # Number of epochs to train for.

# Run training
model.compile(optimizer='rmsprop', loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=batch_size,
          epochs=epochs,
          validation_split=0.2)
# Save model
model.save('s2s.h5')
Train on 8000 samples, validate on 2000 samples
Epoch 1/100
8000/8000 [==============================] - 18s 2ms/step - loss: 1.1716 - accuracy: 0.7257 - val_loss: 1.0481 - val_accuracy: 0.7091
Epoch 2/100
8000/8000 [==============================] - 15s 2ms/step - loss: 0.8537 - accuracy: 0.7698 - val_loss: 0.8523 - val_accuracy: 0.7635
Epoch 3/100
8000/8000 [==============================] - 15s 2ms/step - loss: 0.6759 - accuracy: 0.8087 - val_loss: 0.7188 - val_accuracy: 0.7911
Epoch 4/100
8000/8000 [==============================] - 15s 2ms/step - loss: 0.5909 - accuracy: 0.8286 - val_loss: 0.6473 - val_accuracy: 0.8109
Epoch 5/100
8000/8000 [==============================] - 15s 2ms/step - loss: 0.5428 - accuracy: 0.8415 - val_loss: 0.6010 - val_accuracy: 0.8239
...
Epoch 98/100
8000/8000 [==============================] - 15s 2ms/step - loss: 0.0504 - accuracy: 0.9824 - val_loss: 0.7126 - val_accuracy: 0.8720
Epoch 99/100
8000/8000 [==============================] - 15s 2ms/step - loss: 0.0498 - accuracy: 0.9828 - val_loss: 0.7098 - val_accuracy: 0.8728
Epoch 100/100
8000/8000 [==============================] - 15s 2ms/step - loss: 0.0490 - accuracy: 0.9830 - val_loss: 0.7176 - val_accuracy: 0.8722
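Because we saved the model above, a later session can restore the trained weights with Keras's load_model instead of retraining (assuming s2s.h5 is in the working directory):

from keras.models import load_model
model = load_model('s2s.h5')  # restores architecture, weights, and optimizer state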
Inference in encoder-decoder architecture¶

In inference mode, i.e. when we want to decode unknown input sequences, we go through a slightly different process:
1. Encode the input sequence into state vectors.
2. Start with a target sequence of size 1 (just the start-of-sequence character).
3. Feed the state vectors and 1-char target sequence to the decoder to produce predictions for the next character.
4. Sample the next character using these predictions (we simply use argmax; a sampling variant is sketched just after this list).
5. Append the sampled character to the target sequence.
6. Repeat until we generate the end-of-sequence character or we hit the character limit.
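A minimal sketch of that sampling variant, assuming probs is one softmax row produced by the decoder (this notebook itself sticks with greedy argmax):

def sample_char_index(probs, temperature=1.0):
    # Rescale the distribution: temperature < 1.0 sharpens it, > 1.0 flattens it.
    logits = np.log(probs + 1e-10) / temperature
    probs = np.exp(logits) / np.sum(np.exp(logits))
    # Draw one character index according to the rescaled probabilities.
    return np.random.choice(len(probs), p=probs)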
We define our inference model in this cell. Here’s the drill:
1. Encode the input and retrieve the initial decoder state.
2. Run one step of the decoder with this initial state and a start-of-sequence token as target. The output will be the next target token.
3. Repeat with the current target token and current states.
The decoder needs to run iteratively, while the encoder only needs to run once. This means the encoder model can stay the same while we slightly tweak the decoder model. So we define the encoder and decoder separately.
In [0]:
# Define sampling models
encoder_model = Model(encoder_inputs, encoder_states)

decoder_state_input_h = Input(shape=(latent_dim,))  # decoder hidden state
decoder_state_input_c = Input(shape=(latent_dim,))  # decoder cell state
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]
# Decoder takes characters from `decoder_inputs` and states in `decoder_states_inputs` as inputs.
decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_states = [state_h, state_c]  # we wrap hidden and cell states in a list
decoder_outputs = decoder_dense(decoder_outputs)  # project decoder output to logits and then apply softmax
# Define the model that will turn
# `decoder_inputs` & `decoder_states_inputs` into `decoder_outputs` & `decoder_states`
decoder_model = Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)

# Reverse-lookup token index to decode sequences back to
# something readable.
reverse_input_char_index = dict(
    (i, char) for char, i in input_token_index.items())
reverse_target_char_index = dict(
    (i, char) for char, i in target_token_index.items())
To decode a test sentence, we will repeatedly:
1. Encode the input sentence and retrieve the initial decoder state
2. Run one step of the decoder with this initial state and a start of sequence token as target. The output will be the next target character.
3. Append the target character predicted and repeat.
In [0]:
def decode_sequence(input_seq):
    # Encode the input as state vectors.
    states_value = encoder_model.predict(input_seq)

    # Generate empty target sequence of length 1.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    # Populate the first character of target sequence with the start character.
    target_seq[0, 0, target_token_index['\t']] = 1.

    # Sampling loop for a batch of sequences
    # (to simplify, here we assume a batch of size 1).
    stop_condition = False
    decoded_sentence = ''
    while not stop_condition:
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value)

        # Sample a token
        sampled_token_index = np.argmax(output_tokens[0, -1, :])
        sampled_char = reverse_target_char_index[sampled_token_index]  # map token index back to character
        decoded_sentence += sampled_char  # append the newly generated character

        # Exit condition: either hit max length
        # or find stop character.
        if (sampled_char == '\n' or
                len(decoded_sentence) > max_decoder_seq_length):
            stop_condition = True

        # Update the target sequence (of length 1).
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_token_index] = 1.

        # Update states
        states_value = [h, c]

    return decoded_sentence
Now we use some examples from the training set to test our inference model.
In [0]:
for seq_index in range(100):
    # Take one sequence (part of the training set)
    # for trying out decoding.
    input_seq = encoder_input_data[seq_index: seq_index + 1]
    decoded_sentence = decode_sequence(input_seq)
    print('-')
    print('Input sentence:', input_texts[seq_index])
    print('Decoded sentence:', decoded_sentence)
-
Input sentence: Go.
Decoded sentence: Va !
-
Input sentence: Hi.
Decoded sentence: Salut.
-
Input sentence: Hi.
Decoded sentence: Salut.
-
Input sentence: Run!
Decoded sentence: Courez !
-
Input sentence: Run!
Decoded sentence: Courez !
-
Input sentence: Who?
Decoded sentence: Qui ?
-
Input sentence: Wow!
Decoded sentence: Ça alors !
-
Input sentence: Fire!
Decoded sentence: Au feu !
-
Input sentence: Help!
Decoded sentence: À l'aide !
-
Input sentence: Jump.
Decoded sentence: Saute.
-
Input sentence: Stop!
Decoded sentence: Stop !
-
Input sentence: Stop!
Decoded sentence: Stop !
-
Input sentence: Stop!
Decoded sentence: Stop !
-
Input sentence: Wait!
Decoded sentence: Attendez !
-
Input sentence: Wait!
Decoded sentence: Attendez !
-
Input sentence: Go on.
Decoded sentence: Poursuis.
-
Input sentence: Go on.
Decoded sentence: Poursuis.
-
Input sentence: Go on.
Decoded sentence: Poursuis.
-
Input sentence: Hello!
Decoded sentence: Salut !
-
Input sentence: Hello!
Decoded sentence: Salut !
-
Input sentence: I see.
Decoded sentence: Je vois une lispotie.
-
Input sentence: I try.
Decoded sentence: J'essaye.
-
Input sentence: I won!
Decoded sentence: Je l'ai emporté !
-
Input sentence: I won!
Decoded sentence: Je l'ai emporté !
-
Input sentence: I won.
Decoded sentence: J'ai gagné.
...
-
Input sentence: Come in.
Decoded sentence: Entrez !
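To translate a sentence that does not appear in the training set, we only need to vectorize it the same way as the training inputs before calling decode_sequence. Here is a small helper of our own (translate is not defined in the original notebook; characters outside the training vocabulary are simply skipped):

def translate(sentence):
    # One-hot encode an arbitrary string exactly like the training inputs.
    seq = np.zeros((1, max_encoder_seq_length, num_encoder_tokens), dtype='float32')
    for t, char in enumerate(sentence[:max_encoder_seq_length]):
        if char in input_token_index:  # skip characters the model has never seen
            seq[0, t, input_token_index[char]] = 1.
    seq[0, len(sentence):, input_token_index[' ']] = 1.  # space padding, as in training
    return decode_sequence(seq)

print(translate('Be nice!'))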