COMP90042
Workshop
Week 7
4 May
Contextual Representation
Discourse
Table of Contents
Motivation
Language is complex: context can completely change the meaning of individual words in a sentence. For example:
He kicked the bucket.
I have yet to cross-off all the items on my bucket list.
The bucket was filled with water.
Contextual representation
What is contextual representation?
Representations based on a particular usage: they capture the different senses or nuances a word takes on depending on its context.
This differs from static word embeddings (e.g. Word2Vec), which assign each word a single vector regardless of context.
Contextual representations pre-trained on large corpora can be seen as models that have acquired fairly comprehensive knowledge of the language.
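To make the static-vs-contextual distinction concrete, here is a toy sketch (the two-dimensional vectors are made up, and the "contextual" function is just a window average, not a real model like BERT): a static lookup returns the same vector for "bucket" in every sentence, while even this crude context-dependent scheme gives it a different representation per usage.

```python
# Hypothetical static embedding table (toy 2-d vectors, not real embeddings).
STATIC = {
    "he": [0.2, 0.2], "kicked": [1.0, 0.0], "the": [0.0, 0.1],
    "bucket": [0.5, 0.5], "was": [0.1, 0.1], "filled": [0.0, 1.0],
    "with": [0.1, 0.0], "water": [0.0, 0.9],
}

def contextual(tokens, i, window=2):
    """Crude 'contextual' vector: average the static vectors of all
    words within `window` positions of token i (including i itself)."""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    neighbours = [STATIC[t] for t in tokens[lo:hi]]
    dim = len(neighbours[0])
    return [sum(v[d] for v in neighbours) / len(neighbours) for d in range(dim)]

s1 = "he kicked the bucket".split()
s2 = "the bucket was filled with water".split()

# A static lookup would return STATIC["bucket"] in both sentences,
# but the context-dependent vectors differ:
v1 = contextual(s1, s1.index("bucket"))
v2 = contextual(s2, s2.index("bucket"))
print(v1 != v2)  # True
```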
Contextual representation
What is a transformer?
Encoder-Decoder structure: the encoder is on the left and the decoder is on the right.
Both the encoder and the decoder are composed of modules that can be stacked on top of each other multiple times.
Positional encoding: because a sequence's meaning depends on the order of its elements, the model needs to give every word/part of the sequence information about its (relative) position.
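One concrete scheme is the sinusoidal positional encoding from Vaswani et al., PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), sketched here in plain Python:

```python
import math

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings (Vaswani et al., 2017).
    Returns a max_len x d_model table; row `pos` is added to the
    embedding of the token at position `pos`."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dimensions: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions: cosine
    return pe

pe = positional_encoding(max_len=4, d_model=8)
print(pe[0][:4])  # position 0: [0.0, 1.0, 0.0, 1.0] (sin 0 = 0, cos 0 = 1)
```

Because each position gets a distinct, deterministic pattern, the model can recover order information even though attention itself is order-agnostic.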
Transformer
Figure from ‘Attention Is All You Need’ by Vaswani et al.
The transformer uses attention to capture dependencies between words.
For each target word in a sentence, the transformer attends to every other word in the sentence to create the target word's contextual embedding.
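The mechanism behind this is scaled dot-product attention, softmax(QKᵀ/√d_k)·V, from Vaswani et al. Below is a minimal dependency-free sketch; the query/key/value vectors are hypothetical toy values, and real implementations use learned projections and batched matrix operations:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V.
    Q, K, V are lists of vectors, one row per token."""
    d_k = len(K[0])
    out = []
    for q in Q:  # each target word attends to every word in the sentence
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        weights = softmax(scores)          # attention distribution over tokens
        out.append([sum(w * v[d] for w, v in zip(weights, V))
                    for d in range(len(V[0]))])
    return out

# Toy example: 3 tokens, dimension 2 (hypothetical vectors).
Q = K = V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ctx = attention(Q, K, V)
print(len(ctx), len(ctx[0]))  # 3 2 — one contextual vector per token
```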
Transformer
How does a transformer capture dependencies between words?
Attention
Figure from ‘Attention Is All You Need’ by Vaswani et al.
What advantages does the transformer have compared to an RNN?
The transformer allows for significantly more parallelization, whereas an RNN relies on sequential processing: each step depends on the previous hidden state. This lets transformer-based models scale to amounts of data that are difficult for RNN-based models.
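A toy sketch of the contrast (scalar "hidden states", not a real RNN or attention layer): the RNN output at step t cannot be computed before step t-1, forcing a sequential loop, while each attention-style output depends only on the full input, so all positions could be computed in parallel.

```python
def rnn_states(xs, w=0.5):
    """Inherently sequential: h_t = w*h_{t-1} + x_t needs h_{t-1} first."""
    h, states = 0.0, []
    for x in xs:
        h = w * h + x
        states.append(h)
    return states

def self_attention_like(xs):
    """Each output depends only on the whole input, never on other
    outputs, so positions can be computed in any order (in parallel)."""
    mean = sum(xs) / len(xs)
    return [x + mean for x in xs]

xs = [1.0, 2.0, 3.0]
print(rnn_states(xs))           # [1.0, 2.5, 4.25]
print(self_attention_like(xs))  # [3.0, 4.0, 5.0]
```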
Transformer vs. RNN
Figure from ‘A Comparative Study on Transformer vs RNN in Speech Applications’
Figure from ‘A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation’
What is discourse segmentation? What do the segments consist of, and what are some methods we can use to find them?
In Discourse Segmentation, we try to divide up a text into discrete, cohesive units based on sentences.
By interpreting the task as a boundary-finding problem, we can use rule-based or unsupervised methods to find adjacent sentences with little lexical overlap (suggesting a discourse boundary). We can also use supervised methods, e.g. training a classifier on paragraph boundaries.
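The unsupervised lexical-overlap idea can be sketched as follows (a TextTiling-style toy; the sentences and threshold are hypothetical, and real systems use smoothed similarity over blocks of sentences rather than raw adjacent-pair overlap):

```python
def overlap(s1, s2):
    """Fraction of shared word types between two sentences."""
    a, b = set(s1.lower().split()), set(s2.lower().split())
    return len(a & b) / max(1, min(len(a), len(b)))

def segment(sentences, threshold=0.2):
    """Place a discourse boundary wherever adjacent sentences share
    too little vocabulary (low lexical cohesion)."""
    boundaries = []
    for i in range(len(sentences) - 1):
        if overlap(sentences[i], sentences[i + 1]) < threshold:
            boundaries.append(i + 1)  # boundary before sentence i+1
    return boundaries

doc = [
    "the cat sat on the mat",
    "the cat then chased a mouse",
    "transformers use attention over all words",
    "attention lets transformers model long dependencies",
]
print(segment(doc))  # [2] — topic shift between sentences 1 and 2
```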
Discourse
What is an anaphor?
From the lectures: an anaphor is a linguistic expression that refers back to one or more elements in the text (generally preceding the anaphor).
These tend to be pronouns (he, she) but can also be determiners (which, the, etc.).
Anaphor
What is anaphora resolution and why is it difficult?
This is the problem of working out which element (generally a noun or noun phrase, but sometimes a whole clause) a given anaphor is actually referring to.
For example:
Mary gave John a cat for his birthday. (i) She is generous. (ii) He was surprised. (iii) He is fluffy.
his [birthday] obviously refers to John; (i) (presumably) refers to Mary; (ii) (presumably) refers to John; and (iii) (presumably) refers to [the] cat.
Anaphor
What are some useful heuristics (or features) to help resolve anaphora?
Recency heuristic: given multiple possible referents, the intended one is most likely the one mentioned most recently in the text.
The most likely referent (consistent in meaning with the anaphor) is the focus of the discourse (the “center”).
We can also build a supervised machine learning model, usually based around the semantic properties of the anaphor/nearby words and the sentence/discourse structure.
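The recency heuristic combined with a crude agreement filter might be sketched like this (the candidate list, gender tags, and pronoun table are hypothetical; a real resolver would also handle number, animacy, and discourse salience):

```python
# Hypothetical pronoun -> gender-feature table for the toy example.
PRONOUN_FEATURES = {"she": "fem", "he": "masc", "it": "neut"}

def resolve(pronoun, candidates):
    """Recency heuristic with agreement: among antecedents whose gender
    feature matches the pronoun, prefer the most recently mentioned.
    `candidates` is a list of (mention, gender) in order of appearance."""
    feat = PRONOUN_FEATURES[pronoun.lower()]
    compatible = [mention for mention, gender in candidates if gender == feat]
    return compatible[-1] if compatible else None  # most recent match

# "Mary gave John a cat for his birthday."
candidates = [("Mary", "fem"), ("John", "masc"), ("cat", "neut")]
print(resolve("he", candidates))   # John
print(resolve("she", candidates))  # Mary
print(resolve("it", candidates))   # cat
```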
Anaphor
Steps:
Go to https://colab.research.google.com/
Sign up for or log in to a Google account.
File > Upload Notebook
Programming — BERT