Discourse
COMP90042
Natural Language Processing Lecture 12
COPYRIGHT 2020, THE UNIVERSITY OF MELBOURNE

Discourse
• Most tasks/models we have learned operate at the word or sentence level:
‣ POS tagging
‣ Language models
‣ Lexical/distributional semantics
• But NLP often deals with documents
• Discourse: understanding how sentences relate to each other in a document

Three Key Discourse Tasks
• Discourse segmentation
• Discourse parsing
• Anaphora resolution
Discourse Segmentation
• A document can be viewed as a sequence of segments
• A segment: a span of cohesive text
• Cohesion:
‣ organised around a particular topic or function
– Wikipedia biographies: early years, major events, impact on others
– Scientific articles: introduction, related work, experiments

Unsupervised Approaches
• TextTiling algorithm: looking for points of low lexical cohesion between sentences
• For each sentence gap:
‣ Create two BOW vectors consisting of words from k sentences on either side of the gap
‣ Use cosine to get a similarity score (sim) for the two vectors
‣ For gap i, calculate a depth score; insert a boundary when the depth is greater than some threshold t
$\mathrm{depth}(\mathrm{gap}_i) = (\mathrm{sim}_{i-1} - \mathrm{sim}_i) + (\mathrm{sim}_{i+1} - \mathrm{sim}_i)$
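
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation of TextTiling: the regex tokeniser, the function names, and the treatment of the first and last gap (which use only their single neighbour) are assumptions made for the sketch.

```python
import math
import re
from collections import Counter

def bow(sentences):
    """Bag-of-words vector (word -> count) over a list of sentences."""
    return Counter(w for s in sentences for w in re.findall(r"[a-z']+", s.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def gap_similarities(sentences, k=1):
    """sim for the gap after sentence i, using k sentences on either side."""
    return [
        cosine(bow(sentences[max(0, i - k + 1): i + 1]), bow(sentences[i + 1: i + 1 + k]))
        for i in range(len(sentences) - 1)
    ]

def depth_scores(sims):
    """depth(gap_i) = (sim_{i-1} - sim_i) + (sim_{i+1} - sim_i).
    Assumption: an edge gap only contributes the term for its one neighbour."""
    depths = []
    for i, s in enumerate(sims):
        left = sims[i - 1] - s if i > 0 else 0.0
        right = sims[i + 1] - s if i + 1 < len(sims) else 0.0
        depths.append(left + right)
    return depths

def text_tiling(sentences, k=1, t=0.9):
    """Return indices i such that a boundary is placed after sentence i."""
    depths = depth_scores(gap_similarities(sentences, k))
    return [i for i, d in enumerate(depths) if d > t]

# Toy input (invented for illustration): two topics, two sentences each.
toy = ["the cat sat", "the cat ran", "stocks fell sharply", "stocks rose sharply"]
print(text_tiling(toy, k=1, t=0.9))
```

A real system would typically add stop-word removal, stemming and smoothing of the similarity curve; none of that is shown here.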

TextTiling Example (k=1, t=0.9)
He walked 15 minutes to the tram stop.
   gap 1: sim = 0.9, d = 0.7 - 0.9 = -0.2
Then he waited for another 20 minutes, but the tram didn’t come.
   gap 2: sim = 0.7, d = (0.9 - 0.7) + (0.1 - 0.7) = -0.4
The tram drivers were on strike that morning.
   gap 3: sim = 0.1, d = (0.7 - 0.1) + (0.5 - 0.1) = 1.0
So he walked home and got his bike out of the garage.
   gap 4: sim = 0.5, d = (0.1 - 0.5) + (0.8 - 0.5) = -0.1
He started riding but quickly discovered he had a flat tire.
   gap 5: sim = 0.8, d = (0.5 - 0.8) + (0.5 - 0.8) = -0.6
He walked his bike back home.
   gap 6: sim = 0.5, d = 0.8 - 0.5 = 0.3
He looked around but his wife had cleaned the garage and he couldn’t find the bike pump.
Only gap 3 has a depth above t = 0.9, so a single boundary is inserted after the third sentence.
$\mathrm{depth}(\mathrm{gap}_i) = (\mathrm{sim}_{i-1} - \mathrm{sim}_i) + (\mathrm{sim}_{i+1} - \mathrm{sim}_i)$
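
The arithmetic of this example can be checked with a short script. It takes the six sim values from the slide as given, applies the depth formula (with the first and last gap using their single neighbour, as the example does), and rounds to one decimal place to hide floating-point noise. The function name is an assumption for the sketch.

```python
def depth_scores(sims):
    """Apply depth(gap_i) to every gap; edge gaps only have one neighbour."""
    depths = []
    for i, sim in enumerate(sims):
        neighbours = sims[max(0, i - 1):i] + sims[i + 1:i + 2]
        depths.append(sum(n - sim for n in neighbours))
    return depths

sims = [0.9, 0.7, 0.1, 0.5, 0.8, 0.5]   # sim for gaps 1..6 in the example
depths = [round(d, 1) for d in depth_scores(sims)]
boundaries = [gap for gap, d in enumerate(depths, start=1) if d > 0.9]

print(depths)      # [-0.2, -0.4, 1.0, -0.1, -0.6, 0.3]
print(boundaries)  # [3]
```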

Supervised Approaches
• Get labelled data from easy sources
‣ Scientific publications
‣ Wikipedia articles

Supervised Discourse Segmenter
• Apply a binary classifier to identify boundaries
• Or use sequential classifiers
• Potentially include classification of section types (introduction, conclusion, etc.)
• Integrate a wider range of features, including
‣ distributional semantics
‣ discourse markers (therefore, and, etc.)
Discourse Parsing
• Identify discourse units, and the relations that hold between them
• Rhetorical Structure Theory (RST) is a framework for hierarchical analysis of discourse structure in documents

RST
• Basic element: elementary discourse units (EDUs)
‣ Typically clauses of a sentence
‣ EDUs do not cross sentence boundaries
‣ [It does have beautiful scenery,] [some of the best since Lord of the Rings.]
• RST relations between discourse units:
‣ conjunction, justify, concession, elaboration, etc.
‣ [It does have beautiful scenery,]
↑(elaboration)
[some of the best since Lord of the Rings.]

Nucleus vs. Satellite
• Within a discourse relation, one argument is the nucleus (the primary argument)
• The supporting argument is the satellite
‣ [It does have beautiful scenery,] (nucleus)
↑(elaboration)
[some of the best since Lord of the Rings.] (satellite)
• Some relations are equal (e.g. conjunction), and so both arguments are nuclei
‣ [He was a likable chap,] (nucleus)
↑(conjunction)
[and I hated to see him die.] (nucleus)

RST Tree
• An RST relation combines two or more DUs into composite DUs
• The process of combining DUs is repeated to create an RST tree
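
One way to picture this recursion is as a small tree data structure. The sketch below is an assumption about representation, not a format used by any RST toolkit: the class names `EDU` and `Relation`, the `nuclei` index list, and the third EDU in the example are all invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class EDU:
    """Elementary discourse unit: a leaf of the RST tree."""
    text: str

@dataclass
class Relation:
    """Composite DU: one relation holding over two or more child DUs."""
    label: str
    children: List[Union["EDU", "Relation"]]
    nuclei: List[int]  # indices of the children that are nuclei

def leaves(du):
    """Flatten a DU into the text of its EDUs, left to right."""
    if isinstance(du, EDU):
        return [du.text]
    return [text for child in du.children for text in leaves(child)]

def height(du):
    """Number of relation levels above the deepest EDU."""
    return 0 if isinstance(du, EDU) else 1 + max(height(c) for c in du.children)

# First composite DU: the elaboration example, nucleus first.
scenery = Relation(
    "elaboration",
    [EDU("It does have beautiful scenery,"),
     EDU("some of the best since Lord of the Rings.")],
    nuclei=[0],
)
# Second level: the composite DU combines with a further EDU.
# The extra EDU is hypothetical, made up to show the repetition step.
tree = Relation("conjunction", [scenery, EDU("and the soundtrack is good.")], nuclei=[0, 1])

print(leaves(tree))
print(height(tree))
```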

Parsing Using Discourse Markers
• Some discourse markers (cue phrases) explicitly indicate relations
‣ Some examples: although, but, for example, in other words, so, because, in conclusion, …
• Can be used to build a simple rule-based parser
• However
‣ Many relations are not marked by a discourse marker at all
‣ Many important discourse markers (e.g. and) are ambiguous
– Sometimes not a discourse marker
– Can signal multiple relations
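
A rule-based approach of this kind might start from a lookup table keyed on cue phrases. The sketch below is only illustrative: the marker-to-relation mapping is an assumption made up for this example (it is not an official RST label inventory), and the function only inspects the start of an EDU.

```python
import re

# Hypothetical marker -> relation table; the labels are illustrative guesses.
# "and" is deliberately left out because it is too ambiguous for a fixed rule.
MARKER_RELATIONS = {
    "although": "concession",
    "but": "contrast",
    "for example": "elaboration",
    "in other words": "restatement",
    "so": "result",
    "because": "cause",
    "in conclusion": "summary",
}

def relation_from_marker(edu):
    """Return (marker, relation) if the EDU starts with a known marker, else None."""
    text = edu.lower().lstrip(" [(\"'")
    # Try longer markers first so multi-word cue phrases are matched whole.
    for marker in sorted(MARKER_RELATIONS, key=len, reverse=True):
        # \b stops "so" from matching inside a longer word such as "sofia".
        if re.match(rf"{re.escape(marker)}\b", text):
            return marker, MARKER_RELATIONS[marker]
    return None

print(relation_from_marker("because the tram drivers were on strike"))
print(relation_from_marker("and I hated to see him die."))
```

The second call returns `None`, which shows both limitations listed above: relations without a marker in the table go undetected, and an ambiguous marker cannot be handled by a one-to-one lookup.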

Parsing Using Machine Learning
• RST Discourse Treebank
‣ 300+ documents annotated with RST trees
• Basic idea:
‣ Segment document into EDUs
‣ Combine adjacent DUs into composite DUs iteratively to create the full RST tree

Parsing Using Machine Learning
• Transition-based parsing (lecture 16):
‣ Bottom-up
‣ Greedy, uses shift-reduce algorithm
• CYK/chart parsing algorithm (lecture 14)
‣ Bottom-up
‣ Global, but some constraints prevent CYK from finding the globally optimal tree for discourse parsing
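
The greedy shift-reduce idea can be sketched as a loop over a stack and a queue of EDUs. Everything here is an assumption for illustration: `choose_action` stands in for a trained classifier, the tuple encoding of composite DUs is invented, and nuclearity is ignored. The toy policy below carries no linguistic knowledge; it only shows the control flow.

```python
def shift_reduce_parse(edus, choose_action):
    """Greedy bottom-up parse over a list of EDUs.

    `choose_action(stack, queue)` is a hypothetical policy that returns
    ("shift",) or ("reduce", relation_label). In a learned parser this
    would be a classifier over features of the stack and queue.
    """
    stack, queue = [], list(edus)
    while queue or len(stack) > 1:
        action = choose_action(stack, queue)
        if action[0] == "shift" and queue:
            # Move the next EDU from the queue onto the stack.
            stack.append(queue.pop(0))
        elif action[0] == "reduce" and len(stack) >= 2:
            # Combine the top two DUs into one composite DU.
            right, left = stack.pop(), stack.pop()
            stack.append((action[1], left, right))
        else:
            raise ValueError(f"invalid action {action!r}")
    return stack[0] if stack else None

def toy_policy(stack, queue):
    """Placeholder policy: reduce whenever possible, always with the same label."""
    return ("reduce", "elaboration") if len(stack) >= 2 else ("shift",)

print(shift_reduce_parse(["EDU1", "EDU2", "EDU3"], toy_policy))
# ('elaboration', ('elaboration', 'EDU1', 'EDU2'), 'EDU3')
```

Because each decision is made once and never revisited, the parse is fast but not guaranteed to be the best-scoring tree, which is the trade-off against chart parsing noted above.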

Parsing Using Machine Learning
• Top-down parsing
‣ Sequence labelling problem
‣ BERT

Discourse Parsing Features
• Bag of words
• Discourse markers
• Starting/ending n-grams
• Location in the text
• Syntax features
• Lexical and distributional similarities

Why Discourse Parsing?
• Summarisation
• Sentiment analysis
• Argumentation
• Authorship attribution
• Essay scoring
Anaphora Resolution

Anaphors
• Anaphor: a linguistic expression that refers back to an earlier element in the text
• Anaphors have an antecedent in the discourse, often but not always a noun phrase
‣ Yesterday, Ted was late for work. It all started when his car wouldn’t start.
• Pronouns are the most common anaphors
• But there are various others
‣ Demonstratives (that problem)

Antecedent Restrictions
• Pronouns must agree in number with their antecedents
‣ His coworkers were leaving for lunch when Ted arrived. They invited him, but he said no.
• Pronouns must agree in gender with their antecedents
‣ Sue was leaving for lunch when Ted arrived. She invited him, but he said no.
• Pronouns whose antecedents are the subject of the same syntactic clause must be reflexive (…self)
‣ Ted was angry at him. [him ≠ Ted]
‣ Ted was angry at himself. [himself = Ted]
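
The number and gender restrictions can be read as a hard filter over candidate antecedents. The sketch below assumes a tiny hand-written pronoun lexicon and candidate mentions that already carry `number` and `gender` annotations; both are hypothetical inputs invented for the example. The reflexive constraint is not modelled, since it needs syntactic clause information, and singular "they" is ignored for simplicity.

```python
# Hypothetical mini-lexicon: pronoun -> (number, gender); None means "any gender".
PRONOUNS = {
    "he": ("sg", "m"), "him": ("sg", "m"), "his": ("sg", "m"),
    "she": ("sg", "f"), "her": ("sg", "f"),
    "it": ("sg", "n"),
    "they": ("pl", None), "them": ("pl", None),
}

def compatible(pronoun, candidate):
    """True if the candidate agrees with the pronoun in number and gender."""
    number, gender = PRONOUNS[pronoun.lower()]
    if candidate["number"] != number:
        return False
    return gender is None or candidate["gender"] == gender

def filter_antecedents(pronoun, candidates):
    """Keep only the candidates that pass the agreement restrictions."""
    return [c["text"] for c in candidates if compatible(pronoun, c)]

# Candidate mentions drawn from the example sentences; annotations are assumed.
candidates = [
    {"text": "His coworkers", "number": "pl", "gender": None},
    {"text": "Ted", "number": "sg", "gender": "m"},
    {"text": "Sue", "number": "sg", "gender": "f"},
]

print(filter_antecedents("They", candidates))  # ['His coworkers']
print(filter_antecedents("She", candidates))   # ['Sue']
print(filter_antecedents("he", candidates))    # ['Ted']
```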

Antecedent Preferences
• The antecedents of pronouns should be recent
‣ He waited for another 20 minutes, but the tram didn’t come. So he walked home and got his bike out of the garage. He started riding it to work.
• The antecedent should be salient, as determined by grammatical position
‣ Subject > object > argument of preposition
‣ Ted usually rode to work with Bill. He was never late.

Entities and Reference
• Discourse 16.1 (left) is more coherent
• The pronouns all refer consistently to John, the protagonist

Centering Theory
• A unified account of the relationship between discourse structure and entity reference
• Every utterance in the discourse is characterised by a set of entities, known as centers
• Explains the preference for certain entities when resolving ambiguous pronouns

For an Utterance U_n
• Forward-looking centers:
‣ All entities in U_n: Cf(U_n) = [e1, e2, …]
‣ Cf(16.1a) = [John, music store, piano]
‣ Ordered by syntactic prominence: subjects > objects
• Backward-looking center:
‣ Highest ranked entity in the previous utterance’s forward-looking centers (Cf(U_{n-1})) that is also in the current utterance (U_n)
‣ Cb(16.1b) = [John]

Centering Algorithm
• When resolving an entity for anaphora resolution, choose the entity such that the top forward-looking center matches the backward-looking center

The Centering Algorithm
1. John saw a Ford in the dealership
   Cf(U1) = [John, Ford, dealership]
   Cb(U1) = None
2. He showed it to Bob
   Cf(U2) = [John, Ford, Bob]
   Cb(U2) = John
3. He bought it
   If he = John:
      Cf(U3) = [John, Ford]
      Cb(U3) = John
      top forward-looking center = backward-looking center, so this reading is preferred
   If he = Bob:
      Cf(U3) = [Bob, Ford]
      Cb(U3) = Ford
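
The selection rule can be sketched directly from the definitions of Cf and Cb. This is a minimal illustration that assumes the forward-looking centers have already been extracted and ranked; the function names and the dictionary of candidate readings are assumptions made for the example.

```python
def backward_center(prev_cf, current_entities):
    """Cb(U_n): highest-ranked entity of Cf(U_{n-1}) that also appears in U_n."""
    for entity in prev_cf:  # prev_cf is already ordered by prominence
        if entity in current_entities:
            return entity
    return None

def resolve_pronoun(prev_cf, candidate_cfs):
    """Return the candidate referents whose top forward-looking center equals Cb(U_n).

    `candidate_cfs` maps each candidate referent to the Cf(U_n) that
    reading would produce.
    """
    return [
        referent
        for referent, cf in candidate_cfs.items()
        if cf and cf[0] == backward_center(prev_cf, cf)
    ]

cf_u2 = ["John", "Ford", "Bob"]          # Cf of "He showed it to Bob"
readings = {
    "John": ["John", "Ford"],            # Cf(U3) if he = John
    "Bob": ["Bob", "Ford"],              # Cf(U3) if he = Bob
}

print(backward_center(cf_u2, readings["John"]))  # John
print(backward_center(cf_u2, readings["Bob"]))   # Ford
print(resolve_pronoun(cf_u2, readings))          # ['John']
```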

Supervised Anaphor Resolution
• Build a binary classifier for anaphor/antecedent pairs
• Convert restrictions and preferences into features
‣ Binary features for number/gender compatibility
‣ Position of antecedent in text
‣ Include features about type of antecedent
• With enough data, can approximate the centering algorithm
• But also easy to include features which indicate tendencies, rather than rules
‣ Like repetition, parallelism
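
The feature conversion might look like the sketch below, which builds one feature dictionary per anaphor/antecedent pair. The mention representation (dictionaries with `number`, `gender`, `sent`, `role` and `type` keys) and the feature names are assumptions for illustration; training the binary classifier on these features is not shown.

```python
def pair_features(anaphor, candidate):
    """Features for one anaphor/antecedent pair (hypothetical mention format)."""
    return {
        # Restrictions as binary compatibility features.
        "number_match": anaphor["number"] == candidate["number"],
        "gender_match": anaphor["gender"] == candidate["gender"],
        # Preferences: recency and salience of the candidate.
        "sentence_distance": anaphor["sent"] - candidate["sent"],
        "candidate_is_subject": candidate["role"] == "subject",
        # Type of antecedent, e.g. "proper", "common", "pronoun".
        "candidate_type": candidate["type"],
        # A tendency rather than a rule: same grammatical role (parallelism).
        "same_role": anaphor["role"] == candidate["role"],
    }

# "Ted usually rode to work with Bill. He was never late."
he = {"text": "He", "number": "sg", "gender": "m", "sent": 1, "role": "subject", "type": "pronoun"}
ted = {"text": "Ted", "number": "sg", "gender": "m", "sent": 0, "role": "subject", "type": "proper"}
bill = {"text": "Bill", "number": "sg", "gender": "m", "sent": 0, "role": "prep_arg", "type": "proper"}

for candidate in (ted, bill):
    print(candidate["text"], pair_features(he, candidate))
```

Both candidates pass the agreement features here, so only the salience and parallelism features separate Ted from Bill. A classifier is free to weight such features softly instead of treating them as hard rules.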

Anaphora Resolution Tools
• Stanford CoreNLP includes pronoun coreference models
‣ rule-based (deterministic) system does very well
‣ its coreference step is considerably faster than the learned models

SYSTEM         LANGUAGE  PREPROCESSING TIME  COREF TIME  TOTAL TIME  F1 SCORE
Deterministic  English   3.87s               0.11s       3.98s       49.5
Statistical    English   0.48s               1.23s       1.71s       56.2
Neural         English   3.22s               4.96s       8.18s       60.0
Deterministic  Chinese   0.39s               0.16s       0.55s       47.5
Neural         Chinese   0.42s               7.02s       7.44s       53.9

Source: https://stanfordnlp.github.io/CoreNLP/coref.html
Evaluated on CoNLL 2012 task.

Motivation for Anaphor Resolution
• Essential for deep semantic analysis
‣ Very useful for QA, e.g., reading comprehension
Ted’s car broke down. So he went over to Bill’s house to borrow his car. Bill said that was fine.
Whose car is borrowed?

A Final Word
• For many tasks, it is important to consider context larger than sentences
• Traditionally, many popular NLP applications have been sentence-focused (e.g. machine translation), but that is beginning to change…

Further Reading
• E18, Ch 16