COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
COMP90042
Natural Language Processing
Lecture 12
Semester 1 2021 Week 6
Jey Han Lau
Discourse
Discourse
• Most tasks/models we learned operate at word or
sentence level:
‣ POS tagging
‣ Language models
‣ Lexical/distributional semantics
• But NLP often deals with documents
• Discourse: understanding how sentences relate
to each other in a document
Outline
• Discourse segmentation
• Discourse parsing
• Anaphora resolution
Discourse Segmentation
Discourse Segmentation
• A document can be viewed as a sequence of
segments
• A segment: a span of cohesive text
• Cohesion: organised around a topic or function
• Wikipedia biographies: early years, major events,
impact on others
• Scientific articles: introduction, related work,
experiments
Unsupervised Approaches
• TextTiling algorithm: look for points of low
lexical cohesion between sentences
• For each sentence gap:
‣ Create two BOW vectors consisting of words from k
sentences on either side of the gap
‣ Use cosine similarity to get a similarity score (sim) for the two
vectors
‣ For gap i, calculate a depth score; insert a boundary
when the depth exceeds some threshold t
depth(gap_i) = (sim_{i-1} − sim_i) + (sim_{i+1} − sim_i)
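The gap-scoring step can be sketched in a few lines of Python. This is a minimal illustration, assuming whitespace tokenisation and simple BOW pooling of k sentences per side; the function names are my own, not from the lecture:

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def gap_similarities(sentences, k=1):
    # For each sentence gap, compare the pooled BOWs of the
    # k sentences on either side of the gap
    bows = [Counter(s.lower().split()) for s in sentences]
    sims = []
    for i in range(len(sentences) - 1):
        left = sum(bows[max(0, i + 1 - k):i + 1], Counter())
        right = sum(bows[i + 1:i + 1 + k], Counter())
        sims.append(cosine(left, right))
    return sims
```

Identical adjacent sentences give sim = 1.0; sentences with no shared words give sim = 0.0.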
Text Tiling Example (k=1, t=0.9)
1. He walked 15 minutes to the tram stop.
   gap 1: sim = 0.9, depth = 0.7 − 0.9 = −0.2
2. Then he waited for another 20 minutes, but the tram didn’t come.
   gap 2: sim = 0.7, depth = (0.9 − 0.7) + (0.1 − 0.7) = −0.4
3. The tram drivers were on strike that morning.
   gap 3: sim = 0.1, depth = (0.7 − 0.1) + (0.5 − 0.1) = 1.0 → boundary (1.0 > t)
4. So he walked home and got his bike out of the garage.
   gap 4: sim = 0.5, depth = (0.1 − 0.5) + (0.8 − 0.5) = −0.1
5. He started riding but quickly discovered he had a flat tire.
   gap 5: sim = 0.8, depth = (0.5 − 0.8) + (0.5 − 0.8) = −0.6
6. He walked his bike back home.
   gap 6: sim = 0.5, depth = 0.8 − 0.5 = 0.3
7. He looked around but his wife had cleaned the garage and he couldn’t find the bike pump.
depth(gap_i) = (sim_{i-1} − sim_i) + (sim_{i+1} − sim_i)
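The depth scores in this example can be reproduced directly from the sim values. A small sketch, where the missing term is simply dropped at the first and last gap (matching the one-sided numbers on the slide):

```python
def depth_scores(sims):
    # depth(gap_i) = (sim_{i-1} - sim_i) + (sim_{i+1} - sim_i),
    # dropping the missing term at the first and last gap
    depths = []
    for i, s in enumerate(sims):
        left = sims[i - 1] - s if i > 0 else 0.0
        right = sims[i + 1] - s if i < len(sims) - 1 else 0.0
        depths.append(left + right)
    return depths

sims = [0.9, 0.7, 0.1, 0.5, 0.8, 0.5]
depths = depth_scores(sims)
# Only gap 3 (index 2) exceeds t = 0.9, so the segment boundary goes
# after "The tram drivers were on strike that morning."
boundaries = [i for i, d in enumerate(depths) if d > 0.9]
```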
Supervised Approaches
• Get labelled data
from easy sources
‣ Scientific
publications
‣ Wikipedia
articles
Supervised Discourse Segmenter
• Apply a binary classifier to identify boundaries
• Or use sequential classifiers
• Potentially include classification of section types
(introduction, conclusion, etc.)
• Integrate a wider range of features, including
‣ distributional semantics
‣ discourse markers (therefore, and, etc)
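A minimal sketch of the binary-classifier view. The single feature here (gap cosine similarity) and the tiny hand-rolled logistic regression are illustrative stand-ins, not a prescribed model or feature set:

```python
import math

def train_logreg(X, y, lr=0.5, epochs=2000):
    # Plain stochastic gradient descent on logistic loss
    # (standing in for any off-the-shelf classifier)
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi                      # gradient of the loss wrt z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return int(z > 0)

# Toy training data: one row per sentence gap.
X = [[0.9], [0.8], [0.7], [0.1], [0.2], [0.15]]  # gap cosine similarity
y = [0, 0, 0, 1, 1, 1]                           # 1 = segment boundary
w, b = train_logreg(X, y)
```

The classifier learns that low cross-gap similarity signals a boundary; richer feature sets (markers, section-type cues) slot in as extra columns of X.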
Discourse Parsing
Discourse Analysis
• Identify discourse units, and the relations that
hold between them
• Rhetorical Structure Theory (RST) is a
framework to do hierarchical analysis of discourse
structure in documents
Discourse Units
• Typically clauses of a sentence
• DUs do not cross sentence boundary
• [It does have beautiful scenery,] [some of the best
since Lord of the Rings.]
• 2 merged DUs = another composite DU
Discourse Relations
• Relations between discourse units:
‣ conjunction, justify, concession, elaboration, etc
‣ [It does have beautiful scenery,]
↑(elaboration)
[some of the best since Lord of the Rings.]
Nucleus vs. Satellite
• Within a discourse relation, one argument is the nucleus
(the primary argument)
• The supporting argument is the satellite
‣ [It does have beautiful scenery,]nucleus
↑(elaboration)
[some of the best since Lord of the Rings.]satellite
• Some relations are equal (e.g. conjunction), and so both
arguments are nuclei
‣ [He was a likable chap,]nucleus
↑(conjunction)
[and I hated to see him die.]nucleus
1A: [It could have been a great movie.]
1B: [It does have beautiful scenery,]
1C: [some of the best since Lord of the Rings.]
1D: [The acting is well done,]
1E: [and I really liked the son of the leader of the Samurai.]
1F: [He was a likable chap,]
1G: [and I hated to see him die.]
1H: [But, other than all that, this movie is nothing more than hidden rip-offs.]
RST Tree
• An RST relation
combines two or
more DUs into
composite DUs
• Process of
combining DUs is
repeated, creating
an RST tree
RST Parsing
• Task: given a document, recover the RST tree
‣ Rule-based parsing
‣ Bottom-up approach
‣ Top-down approach
1A: [It could have been a great movie.]
1B: [It does have beautiful scenery,]
1C: [some of the best since Lord of the Rings.]
1D: [The acting is well done,]
1E: [and I really liked the son of the leader of the Samurai.]
1F: [He was a likable chap,]
1G: [and I hated to see him die.]
1H: [But, other than all that, this movie is nothing more than
hidden rip-offs.]
Parsing Using Discourse Markers
• Some discourse markers (cue phrases) explicitly
indicate relations
‣ although, but, for example, in other words, so,
because, in conclusion,…
• Can be used to build a simple rule-based parser
• However:
‣ Many relations are not marked by any discourse marker
‣ Many discourse markers are ambiguous (e.g. and)
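A simple rule-based parser can be sketched as a lookup on cue phrases. The marker-to-relation table below is a hypothetical toy inventory (real cue-phrase lists are far larger, and the noted ambiguity of markers like "and" is exactly why this approach is limited):

```python
# Hypothetical marker-to-relation lookup (toy inventory)
MARKERS = {
    "although": "concession",
    "because": "justify",
    "for example": "elaboration",
    "in other words": "elaboration",
    "and": "conjunction",   # ambiguous in practice
}

def guess_relation(second_du):
    # Check whether the second DU opens with a known discourse marker;
    # return None if no marker is found (the common, harder case)
    text = second_du.lower().lstrip("[ ")
    for marker, relation in MARKERS.items():
        if text.startswith(marker):
            return relation
    return None
```

On the running example, "[and I hated to see him die.]" is tagged conjunction, while "[It does have beautiful scenery,]" carries no marker at all, so the rule-based parser has nothing to go on.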
Parsing Using Machine Learning
• RST Discourse Treebank
‣ 300+ documents annotated with RST trees
• Basic idea:
‣ Segment document into DUs
‣ Combine adjacent DUs into composite DUs
iteratively to create the full RST tree (bottom-up
parsing)
Bottom-Up Parsing
• Transition-based parsing (lecture 16):
‣ Greedy, uses shift-reduce algorithm
• CYK/chart parsing algorithm (lecture 14)
‣ Global, but some constraints prevent CYK from
finding globally optimal tree for discourse
parsing
Top-Down Parsing
1. Segment document into DUs
2. Decide a boundary to split
into 2 segments
3. For each segment, repeat
step 2
1A: [It could have been a great movie.]
1B: [It does have beautiful scenery,]
1C: [some of the best since Lord of the Rings.]
1D: [The acting is well done,]
1E: [and I really liked the son of the leader of the Samurai.]
1F: [He was a likable chap,]
1G: [and I hated to see him die.]
1H: [But, other than all that, this movie is nothing more than
hidden rip-offs.]
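The three steps above amount to a recursive split. A minimal sketch, assuming some boundary-scoring function `score_split` (a hypothetical placeholder; in practice a trained, e.g. neural, model rates each candidate boundary):

```python
def parse_top_down(dus, score_split):
    """Recursively split dus[lo:hi] at the best-scoring boundary k.
    score_split(lo, hi, k) is a hypothetical scorer supplied by the
    caller; a real parser would use a trained model here."""
    def split(lo, hi):
        if hi - lo <= 1:
            return dus[lo]                  # a single DU is a leaf
        k = max(range(lo + 1, hi),
                key=lambda k: score_split(lo, hi, k))
        return (split(lo, k), split(k, hi))
    return split(0, len(dus))

# With a scorer that always prefers the midpoint, four DUs yield a
# balanced binary tree
tree = parse_top_down(["1A", "1B", "1C", "1D"],
                      lambda lo, hi, k: -abs(k - (lo + hi) / 2))
```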
Discourse Parsing Features
• Bag of words
• Discourse markers
• Starting/ending n-grams
• Location in the text
• Syntax features
• Lexical and distributional similarities
Applications of Discourse Parsing?
PollEv.com/jeyhanlau569
• Summarisation
• Sentiment analysis
• Argumentation
• Authorship attribution
• Essay scoring
Anaphora Resolution
Anaphors
• Anaphor: linguistic expressions that refer back to
earlier elements in the text
• Anaphors have an antecedent in the discourse, often
but not always a noun phrase
‣ Yesterday, Ted was late for work. It all started when
his car wouldn’t start.
• Pronouns are the most common anaphor
• But there are various others
‣ Demonstratives (that problem)
Motivation
• Essential for deep semantic analysis
‣ Very useful for QA, e.g., reading
comprehension
Ted’s car broke down. So he went over to Bill’s
house to borrow his car. Bill said that was fine.
Whose car is borrowed?
Antecedent Restrictions
• Pronouns must agree in number with their antecedents
‣ His coworkers were leaving for lunch when Ted
arrived. They invited him, but he said no.
• Pronouns must agree in gender with their antecedents
‣ Sue was leaving for lunch when Ted arrived. She
invited him, but he said no.
• Pronouns whose antecedents are the subject of the
same syntactic clause must be reflexive (…self)
‣ Ted was angry at him. [him ≠ Ted]
‣ Ted was angry at himself. [himself = Ted]
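The number and gender restrictions translate directly into a candidate filter. A toy sketch, where the small attribute lexicons are illustrative stand-ins for real morphological analysis:

```python
# Toy attribute lexicons (illustrative only)
ATTRS = {
    "Ted": {"number": "sg", "gender": "masc"},
    "Sue": {"number": "sg", "gender": "fem"},
    "coworkers": {"number": "pl", "gender": None},
}
PRONOUNS = {
    "he": {"number": "sg", "gender": "masc"},
    "she": {"number": "sg", "gender": "fem"},
    "they": {"number": "pl", "gender": None},
}

def compatible(pronoun, antecedent):
    # Enforce number agreement; gender agreement only when both
    # sides specify a gender
    p, a = PRONOUNS[pronoun], ATTRS[antecedent]
    num_ok = p["number"] == a["number"]
    gen_ok = (p["gender"] is None or a["gender"] is None
              or p["gender"] == a["gender"])
    return num_ok and gen_ok

def candidates(pronoun, mentions):
    # Keep only antecedents that pass the agreement restrictions
    return [m for m in mentions if compatible(pronoun, m)]
```

Mirroring the slide: "she" is compatible with Sue but not Ted, and "they" only with the (plural) coworkers.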
Antecedent Preferences
• The antecedents of pronouns should be recent
‣ He waited for another 20 minutes, but the tram
didn’t come. So he walked home and got his
bike out of the garage. He started riding it to
work.
• The antecedent should be salient, as determined
by grammatical position
‣ Subject > object > argument of preposition
‣ Ted usually rode to work with Bill. He was never
late.
Entities and Reference
• Discourse 16.1 (left) is more coherent
• Its pronouns all refer consistently to John, the
protagonist
Centering Theory
• A unified account of relationship between
discourse structure and entity reference
• Every utterance in the discourse is characterised
by a set of entities, known as centers
• Explains preference of certain entities for
ambiguous pronouns
For an Utterance Un
• Forward-looking centers:
‣ All entities in Un:
Cf(Un) = [e1, e2, …]
‣ Cf(16.1a) = [John, music store, piano]
‣ Ordered by syntactic prominence: subjects > objects
• Backward-looking center:
‣ Highest ranked forward-looking center in previous utterance (Cf(Un-1))
that is also in current utterance (Un)
‣ Candidate entities in 16.1b = [John, music store]
‣ Cb(16.1b) = [John]
‣ Not music store because John has a higher rank in previous
utterance’s forward-looking centers Cf(Un-1)
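The backward-looking center computation is a simple scan down the ranked forward-looking centers. A minimal sketch of exactly this rule:

```python
def backward_center(cf_prev, current_entities):
    # Cb(U_n): the highest-ranked entity in Cf(U_{n-1}) that also
    # appears in the current utterance. cf_prev is ordered by
    # syntactic prominence (subjects > objects > ...)
    for entity in cf_prev:
        if entity in current_entities:
            return entity
    return None

# Slide example: Cf(16.1a) = [John, music store, piano]; utterance
# 16.1b mentions John and the music store, so Cb(16.1b) = John
cb = backward_center(["John", "music store", "piano"],
                     {"John", "music store"})
```

The music store loses out because John is ranked higher in the previous utterance's forward-looking centers.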
Centering Algorithm
• When resolving an entity for anaphora resolution,
choose the entity such that the top forward-
looking center matches the backward-
looking center
• Why? Because the text reads more fluently when
this condition is satisfied
Text is coherent because the top
forward-looking center matches the
backward-looking center for each
utterance:
top forward-looking center = John
backward-looking center = John
Not quite the case here.
Cf(16.2b) = [music store, John]
Cb(16.2b) = [John]
Cf(16.2d) = [music store, John]
Cb(16.2d) = [John]
Supervised Anaphor Resolution
• Build a binary classifier for anaphor/antecedent pairs
• Convert restrictions and preferences into features
‣ Binary features for number/gender compatibility
‣ Position of antecedent in text
‣ Include features about type of antecedent
• With enough data, can approximate the centering
algorithm
• But also easy to include features that are potentially
helpful
‣ words around anaphor/antecedent
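The pairwise feature extraction can be sketched as follows. The feature names and attribute dictionaries are illustrative choices; a real system would add many more features, including the surrounding words mentioned above:

```python
def pair_features(anaphor, antecedent, sentence_distance):
    # anaphor/antecedent are dicts with hypothetical "number",
    # "gender" and "is_pronoun" attributes
    return {
        "number_match": int(anaphor["number"] == antecedent["number"]),
        "gender_match": int(anaphor["gender"] == antecedent["gender"]),
        "sentence_distance": sentence_distance,   # recency preference
        "antecedent_is_pronoun": int(antecedent["is_pronoun"]),
    }

feats = pair_features(
    {"number": "sg", "gender": "masc"},
    {"number": "sg", "gender": "masc", "is_pronoun": False},
    sentence_distance=1,
)
```

Each anaphor/antecedent pair becomes one training instance for the binary classifier, labelled with whether the pair actually corefers.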
Anaphora Resolution Tools
• Stanford CoreNLP includes
pronoun coreference models
‣ rule-based system isn’t too
bad
‣ considerably faster than neural models
SYSTEM         LANGUAGE  PREPROCESSING TIME  COREF TIME  TOTAL TIME  F1 SCORE
Deterministic  English   3.87s               0.11s       3.98s       49.5
Statistical    English   0.48s               1.23s       1.71s       56.2
Neural         English   3.22s               4.96s       8.18s       60.0
Deterministic  Chinese   0.39s               0.16s       0.55s       47.5
Neural         Chinese   0.42s               7.02s       7.44s       53.9
Source: https://stanfordnlp.github.io/CoreNLP/coref.html
Evaluated on CoNLL 2012 task.
A Final Word
• For many tasks, it is important to consider context
larger than sentences
• Traditionally many popular NLP applications have
been sentence-focused (e.g. machine translation),
but that is beginning to change…
Further Reading
• E18, Ch 16