Discourse
COMP90042
Natural Language Processing
Lecture 12
Semester 1, 2021 (Week 6). Jey Han Lau
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
Discourse

• Most tasks/models we have learned operate at the word or sentence level:
‣ POS tagging
‣ Language models
‣ Lexical/distributional semantics
• But NLP often deals with whole documents
• Discourse: understanding how sentences relate to each other in a document
Outline

• Discourse segmentation
• Discourse parsing
• Anaphora resolution
Discourse Segmentation
• A document can be viewed as a sequence of segments
• A segment: a span of cohesive text
• Cohesion: organised around a topic or function
• Wikipedia biographies: early years, major events, impact on others
• Scientific articles: introduction, related work, experiments
Unsupervised Approaches

• TextTiling algorithm: look for points of low lexical cohesion between sentences
• For each sentence gap:
‣ Create two BOW vectors consisting of words from the k sentences on either side of the gap
‣ Use cosine similarity to get a similarity score (sim) for the two vectors
‣ For gap i, calculate a depth score; insert a boundary when the depth is greater than some threshold t

depth(gap_i) = (sim_{i-1} − sim_i) + (sim_{i+1} − sim_i)
TextTiling Example (k = 1, t = 0.9)

1. He walked 15 minutes to the tram stop.
2. Then he waited for another 20 minutes, but the tram didn’t come.
3. The tram drivers were on strike that morning.
4. So he walked home and got his bike out of the garage.
5. He started riding but quickly discovered he had a flat tire.
6. He walked his bike back home.
7. He looked around but his wife had cleaned the garage and he couldn’t find the bike pump.

Gap   sim   depth(gap_i) = (sim_{i-1} − sim_i) + (sim_{i+1} − sim_i)
1-2   0.9   d = 0.7 − 0.9 = −0.2
2-3   0.7   d = (0.9 − 0.7) + (0.1 − 0.7) = −0.4
3-4   0.1   d = (0.7 − 0.1) + (0.5 − 0.1) = 1.0   ← boundary (1.0 > t)
4-5   0.5   d = (0.1 − 0.5) + (0.8 − 0.5) = −0.1
5-6   0.8   d = (0.5 − 0.8) + (0.5 − 0.8) = −0.6
6-7   0.5   d = 0.8 − 0.5 = 0.3

Edge gaps have only one neighbouring gap, so their depth uses a single term. A code sketch of the algorithm follows.
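Below is a minimal Python sketch of TextTiling as described above. The tokenisation (lowercased \w+ matches) and the use of raw counts for the BOW vectors are simplifying assumptions; Hearst’s original algorithm also smooths the similarity scores.

```python
import math
import re
from collections import Counter

def cosine(v1, v2):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    n1 = math.sqrt(sum(c * c for c in v1.values()))
    n2 = math.sqrt(sum(c * c for c in v2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def texttiling(sentences, k=1, t=0.9):
    """Return indices i such that a segment boundary is placed before sentence i."""
    bows = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    # Similarity at each gap: BOWs built from k sentences on either side
    sims = [cosine(sum(bows[max(0, i - k):i], Counter()),
                   sum(bows[i:i + k], Counter()))
            for i in range(1, len(sentences))]
    boundaries = []
    for i, sim in enumerate(sims):
        # Edge gaps have only one neighbour, so depth uses a single term there
        left = sims[i - 1] - sim if i > 0 else 0.0
        right = sims[i + 1] - sim if i + 1 < len(sims) else 0.0
        if left + right > t:
            boundaries.append(i + 1)
    return boundaries
```

Given sim values like those in the table, the depth loop fires only at the third gap (depth 1.0 > 0.9); the sim values the function actually computes will of course depend on the tokenisation.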
Supervised Approaches

• Get labelled data from easy sources:
‣ Scientific publications
‣ Wikipedia articles
Supervised Discourse Segmenter

• Apply a binary classifier to identify boundaries
• Or use sequential classifiers
• Potentially include classification of section types (introduction, conclusion, etc.)
• Integrate a wider range of features (a sketch follows), including:
‣ distributional semantics
‣ discourse markers (therefore, and, etc.)
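A minimal sketch of the binary-classifier variant, assuming scikit-learn; the feature set and the marker list are illustrative stand-ins, not the lecture’s specification. sims can be the gap similarities from the TextTiling sketch above.

```python
from sklearn.linear_model import LogisticRegression

MARKERS = {"therefore", "however", "so", "because", "finally", "moreover"}

def gap_features(sentences, i, sims):
    """Illustrative features for the gap before sentences[i]."""
    first_word = sentences[i].split()[0].lower().strip(",.")
    return [
        sims[i - 1],                    # lexical cohesion across the gap
        float(first_word in MARKERS),   # does a discourse marker open the next sentence?
        i / len(sentences),             # relative position in the document
    ]

# X = [gap_features(sents, i, sims) for each labelled gap i]
# y = [1 if the annotation places a section boundary at gap i, else 0]
# clf = LogisticRegression().fit(X, y)
```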
Discourse Parsing
Discourse Analysis

• Identify discourse units, and the relations that hold between them
• Rhetorical Structure Theory (RST) is a framework for hierarchical analysis of discourse structure in documents
Discourse Units

• Typically clauses of a sentence
• DUs do not cross sentence boundaries
‣ [It does have beautiful scenery,] [some of the best since Lord of the Rings.]
• 2 merged DUs = another composite DU
Discourse Relations

• Relations between discourse units:
‣ conjunction, justify, concession, elaboration, etc.
‣ [It does have beautiful scenery,]
   ↑ (elaboration)
  [some of the best since Lord of the Rings.]
Nucleus vs. Satellite

• Within a discourse relation, one argument is the nucleus (the primary argument)
• The supporting argument is the satellite
‣ [It does have beautiful scenery,]nucleus
   ↑ (elaboration)
  [some of the best since Lord of the Rings.]satellite
• Some relations are equal (e.g. conjunction), and so both arguments are nuclei
‣ [He was a likable chap,]nucleus
   ↑ (conjunction)
  [and I hated to see him die.]nucleus
RST Tree

• An RST relation combines two or more DUs into a composite DU
• The process of combining DUs is repeated, creating an RST tree

1A: [It could have been a great movie.]
1B: [It does have beautiful scenery,]
1C: [some of the best since Lord of the Rings.]
1D: [The acting is well done,]
1E: [and I really liked the son of the leader of the Samurai.]
1F: [He was a likable chap,]
1G: [and I hated to see him die.]
1H: [But, other than all that, this movie is nothing more than hidden rip-offs.]
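One way to make the tree concrete is a small recursive data structure. This representation (the dataclass names, a nucleus index list to allow multi-nuclear relations) is an illustrative assumption, not a standard from the lecture.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class EDU:
    """A leaf discourse unit: a span of text."""
    text: str

@dataclass
class RSTNode:
    """A composite DU: a relation over two or more child DUs."""
    relation: str                          # e.g. "elaboration", "conjunction"
    children: List[Union[EDU, "RSTNode"]]
    nucleus: List[int]                     # indices of the nucleus child(ren)

# The 1B/1C fragment of the tree: 1C elaborates on the nucleus 1B
scenery = RSTNode(
    relation="elaboration",
    children=[EDU("It does have beautiful scenery,"),
              EDU("some of the best since Lord of the Rings.")],
    nucleus=[0],
)
```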
RST Parsing

• Task: given a document, recover the RST tree
‣ Rule-based parsing
‣ Bottom-up approach
‣ Top-down approach

(Example 1A-1H as above.)
Parsing Using Discourse Markers

• Some discourse markers (cue phrases) explicitly indicate relations
‣ although, but, for example, in other words, so, because, in conclusion, …
• These can be used to build a simple rule-based parser (sketched below)
• However:
‣ Many relations are not marked by any discourse marker
‣ Many discourse markers are ambiguous (e.g. and)
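A toy sketch of cue-phrase matching; the cue-to-relation mapping is illustrative and far from exhaustive, which is exactly the weakness the slide points out.

```python
# Illustrative cue-phrase table; real inventories are larger and still incomplete.
CUE_RELATIONS = {
    "because": "justify",
    "although": "concession",
    "for example": "elaboration",
    "and": "conjunction",   # highly ambiguous in practice
}

def guess_relation(second_du):
    """Guess the relation if the second DU opens with a known cue phrase."""
    lowered = second_du.lower()
    for cue, relation in CUE_RELATIONS.items():
        if lowered.startswith(cue + " ") or lowered.startswith(cue + ","):
            return relation
    return None   # unmarked relation: the rule-based parser is stuck

print(guess_relation("and I hated to see him die."))   # conjunction
```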
Parsing Using Machine Learning

• RST Discourse Treebank
‣ 300+ documents annotated with RST trees
• Basic idea:
‣ Segment document into DUs
‣ Combine adjacent DUs into composite DUs iteratively to create the full RST tree (bottom-up parsing)
Bottom-Up Parsing

• Transition-based parsing (lecture 16):
‣ Greedy; uses the shift-reduce algorithm (sketched below)
• CYK/chart parsing algorithm (lecture 14):
‣ Global, but some constraints prevent CYK from finding the globally optimal tree for discourse parsing
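A greedy shift-reduce skeleton, reusing RSTNode from the sketch above; should_reduce and label stand in for trained classifiers and are assumptions of this sketch.

```python
def shift_reduce_parse(edus, should_reduce, label):
    """Greedy bottom-up parsing: shift EDUs onto a stack; when the classifier
    says so (or nothing is left to shift), reduce the top two DUs into one."""
    stack, buffer = [], list(edus)
    while buffer or len(stack) > 1:
        must_reduce = not buffer   # buffer exhausted: only reductions remain
        if len(stack) >= 2 and (must_reduce or should_reduce(stack, buffer)):
            right, left = stack.pop(), stack.pop()
            # nucleus choice would also come from a classifier in practice
            stack.append(RSTNode(relation=label(left, right),
                                 children=[left, right], nucleus=[0]))
        else:
            stack.append(buffer.pop(0))   # shift the next EDU
    return stack[0]   # the root of the RST tree
```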
Top-Down Parsing

1. Segment document into DUs
2. Decide a boundary to split into 2 segments
3. For each segment, repeat step 2 (a recursive sketch follows the example)

(Figure: the example below annotated with numbered split points, showing the order in which boundaries are chosen.)
(Example 1A-1H as above.)
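The same recursion in code form, again reusing RSTNode; choose_split and label are placeholders for trained scoring models.

```python
def topdown_parse(dus, choose_split, label):
    """Recursively pick the best boundary, split, and parse each half."""
    if len(dus) == 1:
        return dus[0]
    k = choose_split(dus)   # boundary position, 1 <= k <= len(dus) - 1
    left = topdown_parse(dus[:k], choose_split, label)
    right = topdown_parse(dus[k:], choose_split, label)
    return RSTNode(relation=label(left, right),
                   children=[left, right], nucleus=[0])
```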
Discourse Parsing Features

• Bag of words
• Discourse markers
• Starting/ending n-grams
• Location in the text
• Syntax features
• Lexical and distributional similarities
Applications of Discourse Parsing?

• Summarisation
• Sentiment analysis
• Argumentation
• Authorship attribution
• Essay scoring

PollEv.com/jeyhanlau569
Anaphora Resolution
Anaphors

• Anaphor: a linguistic expression that refers back to an earlier element in the text
• Anaphors have an antecedent in the discourse, often but not always a noun phrase
‣ Yesterday, Ted was late for work. It all started when his car wouldn’t start.
• Pronouns are the most common anaphors
• But there are various others
‣ Demonstratives (that problem)
Motivation

• Essential for deep semantic analysis
‣ Very useful for QA, e.g. reading comprehension:
  Ted’s car broke down. So he went over to Bill’s house to borrow his car. Bill said that was fine.
  Whose car is borrowed?
Antecedent Restrictions

• Pronouns must agree in number with their antecedents
‣ His coworkers were leaving for lunch when Ted arrived. They invited him, but he said no.
• Pronouns must agree in gender with their antecedents
‣ Sue was leaving for lunch when Ted arrived. She invited him, but he said no.
• Pronouns whose antecedents are the subject of the same syntactic clause must be reflexive (…self)
‣ Ted was angry at him. [him ≠ Ted]
‣ Ted was angry at himself. [himself = Ted]
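These hard constraints translate directly into a candidate filter. A minimal sketch; the tiny pronoun lexicon and the dict-based mention representation are illustrative assumptions.

```python
# Illustrative pronoun lexicon; a real system would use a much richer resource.
PRONOUNS = {
    "he":   {"number": "sg", "gender": "m"},
    "she":  {"number": "sg", "gender": "f"},
    "they": {"number": "pl", "gender": None},
}

def compatible(pronoun, antecedent):
    """True if the candidate antecedent agrees in number and gender."""
    p = PRONOUNS[pronoun.lower()]
    if p["number"] is not None and antecedent.get("number") != p["number"]:
        return False
    if p["gender"] is not None and antecedent.get("gender") != p["gender"]:
        return False
    return True

candidates = [{"text": "Sue", "number": "sg", "gender": "f"},
              {"text": "Ted", "number": "sg", "gender": "m"}]
print([c["text"] for c in candidates if compatible("She", c)])   # ['Sue']
```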
Antecedent Preferences

• The antecedents of pronouns should be recent
‣ He waited for another 20 minutes, but the tram didn’t come. So he walked home and got his bike out of the garage. He started riding it to work.
• The antecedent should be salient, as determined by grammatical position
‣ Subject > object > argument of preposition
‣ Ted usually rode to work with Bill. He was never late.
Entities and Reference

(Figure: two example discourses, 16.1 and 16.2, shown side by side.)

• Discourse 16.1 (left) is more coherent
• Its pronouns all refer consistently to John, the protagonist
Centering Theory

• A unified account of the relationship between discourse structure and entity reference
• Every utterance in the discourse is characterised by a set of entities, known as centers
• Explains the preference for certain entities as the antecedents of ambiguous pronouns
For an Utterance Un

• Forward-looking centers:
‣ All entities in Un: Cf(Un) = [e1, e2, …]
‣ Cf(16.1a) = [John, music store, piano]
‣ Ordered by syntactic prominence: subjects > objects
• Backward-looking center Cb(Un):
‣ The highest-ranked forward-looking center of the previous utterance (Cf(Un−1)) that is also in the current utterance (Un)
‣ Candidate entities in 16.1b = [John, music store]
‣ Cb(16.1b) = [John]
‣ Not music store, because John has a higher rank in the previous utterance’s forward-looking centers Cf(Un−1)
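A minimal sketch of the Cb computation, assuming each utterance arrives as a list of entity names already ordered by syntactic prominence (the Cf ranking).

```python
def backward_center(cf_prev, entities_current):
    """Cb(Un): the highest-ranked member of Cf(Un-1) also mentioned in Un."""
    for entity in cf_prev:              # cf_prev is ordered subjects > objects
        if entity in entities_current:
            return entity
    return None

cf_16_1a = ["John", "music store", "piano"]   # Cf(16.1a), from the slide
entities_16_1b = ["John", "music store"]      # candidate entities in 16.1b
print(backward_center(cf_16_1a, entities_16_1b))   # John, not music store
```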
Centering Algorithm

• When resolving an entity for anaphora resolution, choose the entity such that the top forward-looking center matches the backward-looking center
• Why? Because the text reads more fluently when this condition is satisfied
• Discourse 16.1 is coherent because the top forward-looking center matches the backward-looking center for each utterance:
‣ top forward-looking center = John; backward-looking center = John
• That is not quite the case for discourse 16.2:
‣ Cf(16.2b) = [music store, John], but Cb(16.2b) = [John]
‣ Cf(16.2d) = [music store, John], but Cb(16.2d) = [John]
Supervised Anaphor Resolution

• Build a binary classifier for anaphor/antecedent pairs
• Convert restrictions and preferences into features:
‣ Binary features for number/gender compatibility
‣ Position of the antecedent in the text
‣ Features about the type of antecedent
• With enough data, this can approximate the centering algorithm
• It is also easy to include other potentially helpful features:
‣ words around the anaphor/antecedent
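A sketch of the pairwise setup, assuming scikit-learn and reusing compatible() from the antecedent-restrictions sketch; the features are illustrative.

```python
from sklearn.linear_model import LogisticRegression

def pair_features(anaphor, antecedent):
    """Features for one (anaphor, candidate antecedent) pair."""
    return [
        float(compatible(anaphor["text"], antecedent)),   # number/gender agreement
        anaphor["position"] - antecedent["position"],     # distance, i.e. recency
        float(antecedent.get("is_subject", False)),       # grammatical salience
    ]

# X = [pair_features(a, c) for every anaphor a and candidate antecedent c]
# y = [1 if c is the annotated antecedent of a, else 0]
# clf = LogisticRegression().fit(X, y)
# At prediction time, pick the candidate with the highest positive probability.
```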
Anaphora Resolution Tools

• Stanford CoreNLP includes pronoun coreference models
‣ the rule-based system isn’t too bad
‣ and is considerably faster than the neural models

System         Language  Preprocessing time  Coref time  Total time  F1 score
Deterministic  English   3.87s               0.11s       3.98s       49.5
Statistical    English   0.48s               1.23s       1.71s       56.2
Neural         English   3.22s               4.96s       8.18s       60.0
Deterministic  Chinese   0.39s               0.16s       0.55s       47.5
Neural         Chinese   0.42s               7.02s       7.44s       53.9

Source: https://stanfordnlp.github.io/CoreNLP/coref.html
Evaluated on the CoNLL 2012 task.
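A hedged usage sketch: it assumes a CoreNLP server is already running locally on port 9000 (started separately from the Java distribution) and relies on the JSON output format described at the URL above; verify annotator names and fields against the current docs.

```python
import json
import requests

text = ("Ted's car broke down. So he went over to Bill's house "
        "to borrow his car. Bill said that was fine.")

props = {"annotators": "tokenize,ssplit,pos,lemma,ner,depparse,coref",
         "outputFormat": "json"}
resp = requests.post("http://localhost:9000",
                     params={"properties": json.dumps(props)},
                     data=text.encode("utf-8"))
# Each coref chain is a list of mentions of the same entity
for chain in resp.json()["corefs"].values():
    print([mention["text"] for mention in chain])
```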
A Final Word

• For many tasks, it is important to consider context larger than a single sentence
• Traditionally, many popular NLP applications have been sentence-focused (e.g. machine translation), but that is beginning to change…
Further Reading

• E18, Ch 16