
Text Generation, Encoder-Decoder Architecture, Pre-training and Fine-tuning
COMP3220 — Document Processing and the Semantic Web
Week 06 L1: Advanced Topics in Deep Learning
Diego Mollá
Department of Computer Science, Macquarie University
COMP3220 2021H1

Programme
1 Text Generation
2 Encoder-Decoder Architecture
3 Pre-training and Fine-tuning
Reading
Deep Learning book, section 8.1.
Additional Reading
Jurafsky & Martin, Chapter 9.
The Illustrated BERT, ELMo, and co.:
http://jalammar.github.io/illustrated-bert/

1 Text Generation

Generating Text Sequences
One of the advantages of deep learning over shallower approaches to machine learning is its ability to process complex contexts.
This has allowed significant advances in image and text processing.
We have seen how to process text sequences for text classification.
Text generation as a particular case of text classification
Given a piece of text, predict the next character.

Text Generation as Character Prediction
Our training data is a set of samples of the form:
Input: a text fragment.
Label: the next character to predict.
We do not need to manually annotate the training data: the data are self-annotated.
This means that we can easily gather training data for text generation.
This is the idea for training language models (next slide).
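For example, a minimal sketch of how such self-annotated samples could be gathered from a raw text string; the names text, maxlen and step are assumptions used only for this illustration:

# Slide a window over the corpus: every maxlen-character fragment is an input,
# and the character that immediately follows it is the label.
maxlen, step = 60, 3
fragments, next_chars = [], []
for i in range(0, len(text) - maxlen, step):
    fragments.append(text[i:i + maxlen])
    next_chars.append(text[i + maxlen])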

Language Models
Given a collection of text, we can train a language model that can be used to generate text in the same style.
Figure 8.1 of Chollet (2018).

Implementing Character-level LSTM Text Generation
The architecture of the model is of the kind we have seen for text classification.
The input is a sequence of characters.
The “class” to predict is the next character to generate.
If we add an embedding layer after the input, this layer will learn character embeddings.

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.models.Sequential()
model.add(layers.Embedding(len(chars), 20, input_length=maxlen))
model.add(layers.LSTM(128))
model.add(layers.Dense(len(chars), activation='softmax'))
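A minimal sketch of how this model could then be compiled and trained, assuming x holds the integer-encoded fragments (one row per sample, maxlen characters each) and y the index of each next character; both names are assumptions, not the notebook's code:

# Integer labels with a softmax output: use sparse categorical cross-entropy.
model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy')
model.fit(x, y, batch_size=128, epochs=10)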

Generating Text
Remember that the output of a prediction is a probability distribution.
To generate the next character, we can sample from the probability distribution.
We can determine how deterministic the sampling is:
We can always return the character with the highest probability...
Or we can select a character randomly...
Or we can do something in between, according to a “temperature” parameter.
import numpy as np

def reweight_distribution(original_distribution, temperature=0.5):
    distribution = np.log(original_distribution) / temperature
    distribution = np.exp(distribution)
    return distribution / np.sum(distribution)
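A minimal usage sketch for generating the next character; preds, chars and x_fragment (the encoded current fragment) are assumptions carried over from the earlier snippets:

preds = model.predict(x_fragment)[0].astype('float64')   # probability of each next character
reweighted = reweight_distribution(preds, temperature=0.5)
next_index = np.random.choice(len(chars), p=reweighted)
next_char = chars[next_index]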

Figure: Different Reweightings
Figure 8.2 of Chollet (2018)

Example
See notebook...

2 Encoder-Decoder Architecture

The Encoder-Decoder Architecture
Composed of an encoder and a decoder.
The encoder can be an RNN chain that takes the input.
The decoder can be an RNN that takes the output of the previous RNN as input.
Revolutionised machine translation and many other text processing applications.
The encoder stage can be something non-textual, e.g. images for caption generation.
[Diagram: an encoder chain of RNN cells reads the input sequence x0, x1, x2; a decoder chain of RNN cells then generates the output sequence y0, y1, y2. A Keras sketch of this architecture follows.]
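A minimal sketch of this architecture in Keras (not the notebook's code); src_vocab and tgt_vocab (source and target vocabulary sizes) are assumptions:

import tensorflow as tf
from tensorflow.keras import layers

# Encoder: reads the input sequence and summarises it in its final LSTM state.
encoder_inputs = layers.Input(shape=(None,))
enc_emb = layers.Embedding(src_vocab, 64)(encoder_inputs)
_, state_h, state_c = layers.LSTM(128, return_state=True)(enc_emb)

# Decoder: generates the output sequence, starting from the encoder's final state.
decoder_inputs = layers.Input(shape=(None,))
dec_emb = layers.Embedding(tgt_vocab, 64)(decoder_inputs)
dec_seq = layers.LSTM(128, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])
outputs = layers.Dense(tgt_vocab, activation='softmax')(dec_seq)

model = tf.keras.Model([encoder_inputs, decoder_inputs], outputs)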

Training the Encoder-Decoder Architecture
A common approach to train the encoder-decoder architecture is to apply teacher forcing:
Use the target sequence to guide the training of the decoder.
For example, in an English to French machine translation system, we feed the target French translation to the decoder.
“The weather is fine” → “Il fait bon”
[Diagram: the encoder reads the source characters (“t”, “h”, ...); during training, the decoder is fed the target characters (“i”, “l”, ...) as its inputs while predicting the outputs y0, y1, y2. A training sketch follows below.]
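A minimal training sketch with teacher forcing, building on the encoder-decoder model sketched earlier; source_seqs and target_seqs (padded arrays of integer-encoded source and target sequences, the target starting with a start-of-sequence token) are assumptions:

# Teacher forcing: the decoder's input at each step is the *true* previous target
# token, and its label is the true next target token.
decoder_input_data = target_seqs[:, :-1]    # e.g. <s>, "Il", "fait"
decoder_target_data = target_seqs[:, 1:]    # e.g. "Il", "fait", "bon"

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.fit([source_seqs, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=10)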

Attention: An Improvement to the Encoder-Decoder Architecture
Attention is an enhancement of the seq2seq architecture that allows the decoder to focus on different parts of the input during the generation stage.
https://github.com/tensorflow/tensorflow/blob/r1.13/tensorflow/contrib/eager/python/examples/ nmt_with_attention/nmt_with_attention.ipynb
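The linked notebook implements Bahdanau (additive) attention manually; as a rough illustration only, Keras' built-in dot-product Attention layer can replace the encoder and decoder LSTM calls of the earlier sketch, letting every decoder step attend over all encoder states (enc_emb, dec_emb and tgt_vocab are carried over from that sketch as assumptions):

# Encoder now exposes its full sequence of hidden states, not just the final one.
enc_seq, state_h, state_c = layers.LSTM(128, return_sequences=True,
                                        return_state=True)(enc_emb)
dec_seq = layers.LSTM(128, return_sequences=True)(dec_emb,
                                                  initial_state=[state_h, state_c])

# Each decoder state (query) attends over all encoder states (values).
context = layers.Attention()([dec_seq, enc_seq])
outputs = layers.Dense(tgt_vocab, activation='softmax')(
    layers.Concatenate()([dec_seq, context]))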

Attention for MT
Attention weights are also very useful for starting to understand the decision processes of the model.

Attention in Caption Generation
Xu et al. (2015) arXiv:1502.03044

3 Pre-training and Fine-tuning

Problems with Supervised Learning
Annotated data
Supervised learning requires (a lot of) annotated data.
Annotated data can be costly.
Human-annotated data can contain annotation errors.
Training size
Supervised learning requires a lot of (annotated) data.
Large companies can afford the resources for processing large volumes of data, others can’t.
Some domains do not have much text anyway.


Training a single AI model can emit as much carbon as five cars in their lifetimes:
https://www.technologyreview.com/2019/06/06/239031/training-a-single-ai-model-can-emit-as-much-carbon-as-five-cars-in-their-lifetimes/

Idea: Pre-Train and Fine-Tune
Pre-training
Develop a system that can be trained with large volumes of data.
Make the system as general as possible, so that it can be used for multiple tasks.
Fine-tuning
Design a Deep Learning model that contains:
a layer pre-trained for a general task, and
additional layers that adapt the general task to our specific task.
Fine-tune the system using the (smaller) training data of our specific task.

Example: Word Embeddings
As we have seen in a previous lecture, word embeddings can be learnt using large, unlabelled data.
These pre-trained word embeddings can be used to initialise an embeddings layer in our Deep Learning model.
We can choose whether or not to update these word embeddings during training.
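A minimal sketch of initialising an embedding layer with pre-trained vectors; embedding_matrix (a NumPy array with one row of pre-trained values per vocabulary word), vocab_size and embedding_dim are assumptions:

import tensorflow as tf
from tensorflow.keras import layers

embedding_layer = layers.Embedding(
    vocab_size,                 # number of words in our vocabulary
    embedding_dim,              # dimensionality of the pre-trained vectors
    embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
    trainable=False)            # freeze the embeddings; set True to fine-tune them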

Huggingface’s transformers library
https://github.com/huggingface/transformers
Huggingface’s transformers library contains a large repository of pre-trained models.
These models are contributions from many researchers and developers.
These models are being used to obtain state-of-the-art results.
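For instance (a generic illustration, not taken from the lecture notebook), a pre-trained model can be downloaded and applied in a few lines with the library's pipeline API:

from transformers import pipeline

# Downloads a default pre-trained (and fine-tuned) sentiment model on first use.
classifier = pipeline("sentiment-analysis")
print(classifier("Deep learning makes document processing much easier."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99}]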

Example: Using BERT in Keras
BERT is one of the most popular architectures for pre-training and fine-tuning.
Look at the lecture notebook for an example of its use in Keras.
BERT is easy to use, but fine-tuning can take a long time.
http://jalammar.github.io/illustrated-bert/
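A rough sketch of fine-tuning BERT for text classification with the transformers library and Keras (not the lecture notebook's code); train_texts (a list of strings) and train_labels (integer class labels) are assumptions:

import tensorflow as tf
from transformers import BertTokenizer, TFBertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = TFBertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Tokenise the raw texts into BERT's input format (token ids + attention masks).
encodings = tokenizer(train_texts, truncation=True, padding=True, return_tensors="tf")

# Fine-tune the whole pre-trained model on our (smaller) labelled data.
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=["accuracy"])
model.fit(dict(encodings), tf.constant(train_labels), epochs=2, batch_size=16)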

Take-home Messages
1 Text generation as a task of character (or word) prediction.
2 Describe the encoder-decoder architecture. What is this architecture good for?
3 What is teacher forcing and what is it good for?
4 Transfer learning and fine-tuning.

What’s Next
Weeks 7-12
Semantic Web (Rolf Schwitter).
Assignment 2 submission deadline on Friday 23 April 2021.