
COMP90042
Workshop
Week 7

4 May


Table of Contents

Contextual Representation
Discourse

Contextual representation

Motivation
Language is complex: context can completely change the meaning of individual words in a sentence. For example, consider the word "bucket":
He kicked the bucket.
I have yet to cross off all the items on my bucket list.
The bucket was filled with water.

What is a contextual representation?
A representation of a word based on a particular usage: it captures the different senses or nuances of the word depending on its context.
This differs from static word embeddings (e.g. Word2Vec), which assign a single vector to each word regardless of context.
Contextual representations pre-trained on large data can be seen as a model that has obtained fairly comprehensive knowledge about the language.

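To make this concrete, here is a minimal sketch (not from the slides; it assumes the HuggingFace transformers and torch libraries are installed, and bert-base-uncased is just one common model choice) that compares BERT's contextual embeddings for the word "bucket" in two of the example sentences above:

# A minimal sketch: the same word "bucket" gets different contextual
# vectors in different sentences under BERT.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bucket_embedding(sentence):
    """Return BERT's contextual embedding for the token 'bucket'."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return outputs.last_hidden_state[0, tokens.index("bucket")]

e1 = bucket_embedding("He kicked the bucket.")
e2 = bucket_embedding("The bucket was filled with water.")
# Less than 1.0: the two usages receive different representations.
print(torch.cosine_similarity(e1, e2, dim=0).item())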

Transformer

What is a transformer?
Encoder-decoder structure: the encoder is on the left and the decoder is on the right (figure from 'Attention Is All You Need' by Vaswani et al.).
Both the encoder and the decoder are composed of modules that can be stacked on top of each other multiple times.
Positional encoding: the model needs to give every word/part of the sequence information about its position, since the meaning of a sequence depends on the order of its elements. A sketch of the sinusoidal scheme from the paper follows below.
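The sinusoidal positional encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)), can be sketched in a few lines of NumPy (a minimal illustration assuming an even d_model, not the full model):

import numpy as np

def positional_encoding(max_len, d_model):
    """Sinusoidal positional encodings, shape (max_len, d_model)."""
    pos = np.arange(max_len)[:, None]        # (max_len, 1)
    i = np.arange(d_model // 2)[None, :]     # (1, d_model / 2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)             # even dimensions
    pe[:, 1::2] = np.cos(angles)             # odd dimensions
    return pe

print(positional_encoding(max_len=50, d_model=512).shape)  # (50, 512)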

The transformer uses attention to capture dependencies between words: for each target word in a sentence, it attends to every other word in the sentence to create that word's contextual embedding.

Attention

How does a transformer capture dependencies between words?
Figure from 'Attention Is All You Need' by Vaswani et al. A sketch of scaled dot-product attention, the core operation shown there, follows below.
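Below is a minimal NumPy sketch (not from the slides) of scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Each row of the output is a weighted sum of the value vectors."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V

# Toy example: 4 words, 8-dimensional representations.
rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one contextual vector per word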

Transformer vs. RNN

What advantages does the transformer have compared to an RNN?
The transformer allows for significantly more parallelization, while an RNN relies on sequential processing. This allows transformer-based models to scale to very large data that is difficult for RNN-based models. A sketch contrasting the two follows after the figure references below.


Figure from ‘A Comparative Study on Transformer vs RNN in Speech Applications’


Figure from ‘A Comparison of Transformer and Recurrent Neural Networks on Multilingual Neural Machine Translation’

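As a toy illustration of this point (a sketch, not from the slides), note that an RNN must loop over time steps one at a time, whereas self-attention touches all positions in one batched matrix operation:

import numpy as np

T, d = 100, 64                     # sequence length, hidden size
X = np.random.randn(T, d)          # input word representations
W = np.random.randn(d, d) / np.sqrt(d)

# RNN: each hidden state depends on the previous one -> inherently sequential.
h = np.zeros(d)
for t in range(T):                 # this loop cannot be parallelized over t
    h = np.tanh(X[t] + W @ h)

# Self-attention: every position attends to every other position at once.
scores = X @ X.T / np.sqrt(d)      # (T, T) computed in one shot
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
contextual = weights @ X           # (T, d): all positions in parallel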

Discourse

What is discourse segmentation? What do the segments consist of, and what are some methods we can use to find them?
In discourse segmentation, we try to divide a text into discrete, cohesive units based on sentences.
By interpreting the task as a boundary-finding problem, we can use rule-based or unsupervised methods to find adjacent sentences with little lexical overlap (suggesting a discourse boundary). We can also use supervised methods, e.g. training a classifier around paragraph boundaries. A TextTiling-style sketch follows below.
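Here is a toy sketch (not from the slides) of the unsupervised, lexical-overlap approach in the spirit of TextTiling; the threshold value and sentences are arbitrary illustration data:

def lexical_overlap(s1, s2):
    """Jaccard overlap between the word sets of two sentences."""
    w1, w2 = set(s1.lower().split()), set(s2.lower().split())
    return len(w1 & w2) / len(w1 | w2)

def segment(sentences, threshold=0.05):
    """Return indices i such that a boundary falls after sentence i."""
    return [i for i in range(len(sentences) - 1)
            if lexical_overlap(sentences[i], sentences[i + 1]) < threshold]

sents = ["The cat sat on the mat.",
         "The cat then chased the mouse.",
         "Stock markets fell sharply today.",
         "The markets had been falling all week."]
print(segment(sents))  # [1]: a boundary after sentence 1, the topic shift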

Anaphor

What is an anaphor?
From the lectures: an anaphor is a linguistic expression that refers back to one or more elements in the text (generally preceding the anaphor).
These tend to be pronouns (he, she), but can also be determiners (e.g. the) or relative pronouns (e.g. which).

What is anaphora resolution and why is it difficult?
This is the problem of working out which element (generally a noun or noun phrase, but sometimes a whole clause) a given anaphor is actually referring to.
For example:
Mary gave John a cat for his birthday. (i) She is generous. (ii) He was surprised. (iii) He is fluffy.
Here, his [birthday] obviously refers to John; she in (i) presumably refers to Mary; he in (ii) presumably refers to John; and he in (iii) presumably refers to [the] cat.

What are some useful heuristics (or features) to help resolve anaphora?
Recency heuristic: given multiple possible referents, the intended one is most likely the one mentioned most recently in the text.
The most likely referent (consistent in meaning with the anaphor) is the focus of the discourse (the "center").
We can also build a supervised machine learning model, usually based on the semantic properties of the anaphor/nearby words and the sentence/discourse structure. A toy recency-based resolver is sketched below.
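As a toy sketch (not from the slides), the recency heuristic combined with a simple gender-agreement check might look like this; the candidate list and feature values are hypothetical illustration data for the Mary/John example above:

# Mentions in textual order, with an illustrative gender feature.
CANDIDATES = [
    ("Mary", "f"),
    ("John", "m"),
    ("the cat", "m"),   # the cat is referred to as 'he' in (iii)
]

PRONOUN_GENDER = {"she": "f", "he": "m"}

def resolve(pronoun):
    """Return the most recently mentioned candidate that agrees in gender."""
    wanted = PRONOUN_GENDER[pronoun]
    for mention, gender in reversed(CANDIDATES):  # recency: scan backwards
        if gender == wanted:
            return mention
    return None

print(resolve("she"))  # Mary
# Pure recency resolves 'he' to 'the cat': right for (iii) 'He is fluffy',
# wrong for (ii) 'He was surprised' -- one reason the task is hard.
print(resolve("he"))   # the cat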

Programming — BERT

Steps:
Go to https://colab.research.google.com/
Sign up or log in to a Google account.
File > Upload Notebook
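Once the notebook is open, a minimal warm-up like the following shows pre-trained BERT making contextual predictions. This is a sketch, not the workshop notebook itself; it assumes the HuggingFace transformers library, installable in Colab with !pip install transformers:

# Use pre-trained BERT as a masked language model via a pipeline.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
# BERT predicts the masked word from its surrounding context.
for pred in fill("The bucket was filled with [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))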
