COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
COMP90042
Natural Language Processing
Lecture 16
Semester 1 2021 Week 8
Jey Han Lau
Dependency Grammar
Context-Free Grammars (Recap)
• CFGs assume a constituency tree which identifies the phrases in a sentence
‣ based on the idea that these phrases are interchangeable (e.g., swap an NP for another NP) while maintaining grammaticality
Dependency Grammars
• Dependency grammar offers a simpler approach
‣ describes relations between pairs of words
‣ namely, between heads and dependents
‣ e.g. (prefer, dobj, flight); a sketch of extracting such triples follows
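A minimal sketch of reading such (head, relation, dependent) triples off an automatic parse, here using spaCy as an illustrative tool (an assumption of this example, not one prescribed by the lecture); it assumes the en_core_web_sm model has been downloaded, and exact relation labels vary by model:

```python
# Hedged sketch: extract (head, relation, dependent) triples with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("We prefer the morning flight")

for token in doc:
    # token.head is the word's syntactic head; token.dep_ is the relation
    print((token.head.text, token.dep_, token.text))
# one printed triple should resemble ('prefer', 'dobj', 'flight'),
# depending on the model's label inventory ('obj' in UD-style schemes)
```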
Why?
• Deals better with languages that are morphologically rich and have a relatively free word order
‣ CFGs need a separate rule for each possible place a phrase can occur
• Head-dependent relations are similar to semantic relations between words
‣ more useful for applications: coreference resolution, information extraction, etc.
Outline
• Basics of dependency grammar
• Transition-based parsing
• Graph-based parsing
Basics of Dependency Grammar
Dependency Relations
• Captures the grammatical relation between:
‣ Head = central word
‣ Dependent = supporting word
• Grammatical relations include subject, direct object, etc.
• Many dependency theories and taxonomies have been proposed for different languages
• Universal Dependencies: a framework to create a set of dependency relations that are computationally useful and cross-lingual (an example of its data format follows)
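Universal Dependencies treebanks are distributed in the CoNLL-U format: one word per line, with tab-separated columns ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC. A small sketch with a hand-made toy annotation (illustrative, not taken from a real treebank):

```python
# Read dependency arcs from CoNLL-U text. Column 7 is the head index
# (0 = root), column 8 is the dependency relation label.
conllu = """\
1\tWe\twe\tPRON\t_\t_\t2\tnsubj\t_\t_
2\tprefer\tprefer\tVERB\t_\t_\t0\troot\t_\t_
3\tthe\tthe\tDET\t_\t_\t5\tdet\t_\t_
4\tmorning\tmorning\tNOUN\t_\t_\t5\tcompound\t_\t_
5\tflight\tflight\tNOUN\t_\t_\t2\tobj\t_\t_
"""

for line in conllu.splitlines():
    cols = line.split("\t")
    form, head, deprel = cols[1], cols[6], cols[7]
    print(f"{deprel}(head={head}, {cols[0]}:{form})")  # e.g. obj(head=2, 5:flight)
```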
Universal Dependencies
More Examples
Question Answering
• A dependency tree more directly represents the core of the sentence: who did what to whom?
‣ captured by the links incident on verb nodes
Information Extraction
• “Brasilia, the Brazilian capital, was founded in 1960.”
→ capital(Brazil, Brasilia)
→ founded(Brasilia, 1960)
• Dependency tree captures relations succinctly
What About CFGs?
• Constituency trees can also provide similar information
• But extracting it requires some distilling, using head-finding rules
Dependency vs Constituency
• Dependency tree
‣ each node is a word token
‣ one node is chosen as the root
‣ directed edges link heads and their dependents
• Constituency tree
‣ forms a hierarchical tree
‣ word tokens are the leaves
‣ internal nodes are ‘constituent phrases’ e.g. NP
• Both use part-of-speech tags
Properties of a Dependency Tree
• Each word has a single head (parent)
• There is a single root node
• There is a unique path to each word from the root
• All arcs should be projective (a sketch checking these properties follows)
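A minimal sketch that checks these properties, assuming the tree is given as a list of head indices (this representation already enforces the single-head property):

```python
# Check the tree properties above. heads[i-1] is the head of word i,
# with 0 denoting the root; single head per word is built in.
def is_valid_tree(heads):
    if sum(1 for h in heads if h == 0) != 1:
        return False                      # must have exactly one root
    for i in range(1, len(heads) + 1):
        seen, node = set(), i
        while node != 0:                  # follow head links up to the root
            if node in seen:
                return False              # a cycle: no path from the root
            seen.add(node)
            node = heads[node - 1]
    return True

# "Book me the flight": root(Book), iobj(Book, me), det(flight, the), obj(Book, flight)
print(is_valid_tree([0, 1, 4, 1]))   # True
print(is_valid_tree([0, 3, 2, 1]))   # False: words 2 and 3 form a cycle
```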
Projectivity
• An arc is projective if there is a path from the head to every word that lies between the head and the dependent
[Figure: arcs over words A B C D; path from A to B ✓, path from A to C ✓, so the arc is projective ✓]
• A dependency tree is projective if all its arcs are projective
• In other words, a dependency tree is projective if it can be drawn with no crossing edges
• Most sentences are projective, but exceptions exist
‣ non-projective structures are common in languages with flexible word order (a projectivity check is sketched below)
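Projectivity can be tested with the “no crossing edges” formulation above. A minimal sketch using the same head-list representation (root arcs are ignored for simplicity):

```python
# Projectivity check: look for crossing arcs. heads[i-1] is the head
# of word i (0 = root); arcs from the root are skipped in this sketch.
def is_projective(heads):
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, 1) if h != 0]
    for a, b in arcs:
        for c, d in arcs:
            if a < c < b < d:   # exactly one endpoint inside the other span
                return False
    return True

print(is_projective([0, 1, 4, 1]))   # True: no crossing arcs
print(is_projective([0, 4, 1, 2]))   # False: arcs (1,3) and (2,4) cross
```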
Is this tree projective?
PollEv.com/jeyhanlau569
Treebank Conversion
• A few dependency treebanks (Czech, Arabic, Danish…)
• Many constituency treebanks
• Some can be converted into dependencies
• Dependency trees generated from constituency trees are always projective
• Main idea: identify head-dependent relations in the constituency structure and the corresponding dependency relations
‣ Use various heuristics, e.g., head-finding rules
‣ Often with manual correction
Examples From Treebanks
• Danish DDT includes additional ‘subject’ link for verbs
• METU-Sabancı Turkish treebank
‣ edges between morphological units, not just words
http://www.buch-kromann.dk/matthias/ddt1.0/
[Figure: sample sentences from the Danish Dependency Treebank, annotated with POS tags (NC, VA, AN, PP, …) and dependency labels such as subj, dobj, vobj, pobj, nobj, conj, coord, mod, pred, expl, pnct]
Oflazer, Kemal, et al. “Building a Turkish Treebank.” Treebanks. Springer, 2003. 261–277. Excerpt:
The syntactic relations that we have currently opted to encode in our syntactic representation are the following:
1. Subject
2. Object
3. Modifier (adv./adj.)
4. Possessor
5. Classifier
6. Determiner
7. Dative Adjunct
8. Locative Adjunct
9. Ablative Adjunct
10. Instrumental Adjunct
Some of the relations above perhaps require some more clarification. Object is used to mark objects of verbs and the nominal complements of postpositions. A classifier is a nominal modifier in nominative case (as in book cover), while a possessor is a genitive case-marked nominal modifier. For verbal adjuncts, we indicate the syntactic relation with a marker paralleling the case marking, though the semantic relation they encode is determined not only by the case marking but also by the lexical semantics of the head noun and the verb they are attached to. For instance, a dative adjunct can be a goal, a destination, a beneficiary or a value carrier in a transaction, or a theme, while an ablative adjunct may be a reason, a source or a theme. Although we do not envision the use of such detailed relation labels at the outset, such distinctions can certainly be useful in training case-frame based transfer modules in machine translation systems to select the appropriate prepositions in English, for instance.
[Figure 1.3: Dependency structure for the sample Turkish sentence “Bu eski bahçe-de+ki gül-ün böyle büyü+me-si herkes-i çok etkile-di”, with Det, Mod, Subj and Obj links; the last line shows the final POS for each word.]
2.3 Example of a Treebank Sentence
In this section we present the detailed representation of a Turkish sentence in the treebank. Each sentence is represented by a sequence of the attribute lists of the words involved, bracketed with tags […]. Figure 1.4 shows the treebank encoding for the sentence given earlier. Each word is bracketed by […] denotes the lemma of the word, as one would find in a dictionary. For verbs, this is typically an infinitive form, while for other word classes it is usually the root word itself. MORPH indicates the morphological structure of the word as a sequence of morphemes, essentially corresponding to the lexical form.
Transition-based Parsing
Dependency Parsing
• Find the best structure for a given input sentence
• Two main approaches:
‣ Transition-based: bottom-up, greedy method
‣ Graph-based: encodes the problem using nodes and edges, and uses graph-theoretic methods to find the optimal solution
Caveat
• Transition-based parsers can only handle projective dependency trees!
• Less applicable for languages where crossing (non-projective) dependencies are common
Transition-Based Parsing: Intuition
• Process words from left to right
• Maintain two data structures
‣ Buffer: input words yet to be processed
‣ Stack: stores words that are being processed
• At each step, perform one of 3 actions (sketched in code below):
‣ Shift: move a word from the buffer onto the stack
‣ Left-Arc: assign the current word as head of the previous word on the stack
‣ Right-Arc: assign the previous word as head of the current word on the stack
[Figure: the left-arc and right-arc actions illustrated on words A B C D]
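A compact sketch of these three actions, using the slide’s convention that arcs are added between the top two stack items; the example sentence and action sequence are illustrative:

```python
# The "current" word is the top of the stack (s1); the "previous" word
# is the one below it (s2). arcs collects (head, dependent) pairs.
def parse(words, actions):
    buffer = list(range(len(words)))     # indices of unprocessed words
    stack, arcs = [], []
    for action in actions:
        if action == "shift":
            stack.append(buffer.pop(0))  # move next word onto the stack
        elif action == "left-arc":       # s1 becomes the head of s2
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif action == "right-arc":      # s2 becomes the head of s1
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

# "Book the flight": det(flight, the), then dobj(Book, flight)
print(parse(["Book", "the", "flight"],
            ["shift", "shift", "shift", "left-arc", "right-arc"]))
# -> [(2, 1), (0, 2)]   (word indices: Book=0, the=1, flight=2)
```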
[Poll: given the stack and buffer shown, what is the next action?]
PollEv.com/jeyhanlau569
Dependency Labels
• For simplicity, we omit labels on the dependency relations
• In practice, we parameterise the left-arc and right-arc actions with dependency labels:
‣ e.g. left-arc-nsubj or right-arc-dobj
• This expands the list of actions to more than 3 types
The Right Action?
• We assume an oracle that tells us the right action at every step
• Given a dependency tree, the role of the oracle is to generate a sequence of ground-truth actions (sketched below)
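A sketch of a static oracle under the same conventions as the earlier parsing sketch: given gold head indices, it chooses left-arc or right-arc whenever that arc is in the gold tree (and, for right-arc, the current word has collected all of its own dependents), and shifts otherwise. It assumes a projective gold tree, matching the caveat above:

```python
# Static oracle: reproduce a gold tree's action sequence. gold_heads
# uses -1 as the head of the root word; assumes a projective tree.
def oracle_action(stack, buffer, gold_heads):
    if len(stack) >= 2:
        s1, s2 = stack[-1], stack[-2]      # current and previous word
        if gold_heads[s2] == s1:
            return "left-arc"
        if gold_heads[s1] == s2 and all(gold_heads[w] != s1 for w in buffer):
            return "right-arc"             # s1 has no pending dependents
    return "shift"

def oracle_actions(gold_heads):
    buffer = list(range(len(gold_heads)))
    stack, actions = [], []
    while buffer or len(stack) > 1:
        act = oracle_action(stack, buffer, gold_heads)
        actions.append(act)
        if act == "shift":
            stack.append(buffer.pop(0))
        elif act == "left-arc":
            stack.pop(-2)
        else:
            stack.pop()
    return actions

print(oracle_actions([-1, 2, 0]))   # "Book the flight" as before
# -> ['shift', 'shift', 'shift', 'left-arc', 'right-arc']
```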
Parsing Model
• We then train a supervised model to mimic the actions of the oracle
‣ to learn, at every step, the correct action to take (as given by the oracle)
‣ at test time, the trained model can be used to parse a sentence and create the dependency tree
Parsing As Classification
• Input:
‣ Stack (top-2 elements: s1 and s2)
‣ Buffer (first element: b1)
• Output:
‣ 3 classes: shift, left-arc, or right-arc
• Features
‣ word (w), part-of-speech (t)
• Input features:
‣ s1.w = flights
‣ s2.w = cancelled
‣ s1.t = NNS
‣ s2.t = VBD
‣ b1.w = to
‣ b1.t = TO
‣ s1.t ∘ s2.t = NNS_VBD
• Output label: shift (a feature-extraction sketch follows)
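A sketch of this feature template as a function; the sentence, tags, and parser state below are hypothetical values chosen to match the slide:

```python
# Extract the slide's features from a parser state: word (w) and POS (t)
# of the top two stack items and the first buffer item, plus one
# tag-pair conjunction feature.
def extract_features(stack, buffer, words, tags):
    feats = {}
    if len(stack) >= 1:
        s1 = stack[-1]
        feats["s1.w"], feats["s1.t"] = words[s1], tags[s1]
    if len(stack) >= 2:
        s2 = stack[-2]
        feats["s2.w"], feats["s2.t"] = words[s2], tags[s2]
        feats["s1.t*s2.t"] = tags[stack[-1]] + "_" + tags[s2]  # conjunction
    if buffer:
        b1 = buffer[0]
        feats["b1.w"], feats["b1.t"] = words[b1], tags[b1]
    return feats

words = ["United", "cancelled", "flights", "to", "Houston"]  # hypothetical
tags  = ["NNP",    "VBD",       "NNS",     "TO", "NNP"]
print(extract_features(stack=[0, 1, 2], buffer=[3, 4], words=words, tags=tags))
# {'s1.w': 'flights', 's1.t': 'NNS', 's2.w': 'cancelled', 's2.t': 'VBD',
#  's1.t*s2.t': 'NNS_VBD', 'b1.w': 'to', 'b1.t': 'TO'}
```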
Classifiers
• Traditionally, SVMs worked best
• Nowadays, deep learning models are state of the art
• Weakness: a local classifier based on greedy search
• Solutions:
‣ Beam search: keep track of the top-N best action sequences
‣ Dynamic oracle: during training, occasionally use predicted actions instead of gold ones
‣ Graph-based parsers
Graph-based Parsing
Graph-Based Parsing
• Given an input sentence, construct a fully-connected, weighted, directed graph
• Vertices: all words
• Edges: head-dependent arcs
• Weight: score based on training data (relations that are frequently observed receive a higher score)
• Objective: find the maximum spanning tree of this directed graph (via the Chu-Liu/Edmonds algorithm; Kruskal’s algorithm applies only to undirected graphs), as sketched below
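A sketch of the setup with made-up scores (a real parser learns scores from a treebank); it also shows why a spanning-tree algorithm is needed: picking each word’s best head independently can create cycles:

```python
# Graph construction for graph-based parsing; index 0 is an artificial
# ROOT node. Scores here are random stand-ins for learned arc scores.
import numpy as np

words = ["ROOT", "Book", "that", "flight"]
n = len(words)
scores = np.random.default_rng(0).random((n, n))  # scores[h, d]: arc h -> d
np.fill_diagonal(scores, -np.inf)                 # no self-loops
scores[:, 0] = -np.inf                            # nothing may head ROOT

# Greedily choosing the best head per word maximises the arc scores
# but can produce cycles, hence the spanning-tree objective.
best_heads = scores[:, 1:].argmax(axis=0)         # best head per real word
print({words[d + 1]: words[h] for d, h in enumerate(best_heads)})
```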
Advantage
• Can produce non-projective trees
‣ Not a big deal for English
‣ But important for many other languages
• Scores entire trees
‣ avoids making greedy local decisions, unlike transition-based parsers
‣ captures long dependencies better
Example
• Caveat: selecting the highest-scoring head for each word independently may produce a graph with cycles
• Solution: clean up to remove cycles (Chu-Liu/Edmonds algorithm; a sketch using an off-the-shelf implementation follows)
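The networkx library ships an implementation of Chu-Liu/Edmonds for maximum spanning arborescences. A minimal sketch, assuming networkx is installed and using made-up arc weights:

```python
# Find the maximum spanning arborescence (directed spanning tree)
# with networkx's Chu-Liu/Edmonds implementation. Weights are
# illustrative, not learned scores.
import networkx as nx

G = nx.DiGraph()
G.add_weighted_edges_from([
    ("ROOT", "Book", 12), ("ROOT", "flight", 4),
    ("Book", "flight", 10), ("flight", "Book", 5),
    ("flight", "that", 8), ("Book", "that", 3),
])

tree = nx.maximum_spanning_arborescence(G)   # Chu-Liu/Edmonds
for head, dep in tree.edges():
    print(head, "->", dep)
# prints (in some order): ROOT -> Book, Book -> flight, flight -> that
```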
A Final Word
• Dependency parsing is a compelling alternative formulation to constituency parsing
‣ edges encode word-word syntactic and semantic relations
• Transition-based parsing
• Graph-based parsing
Required Reading
• JM3 Ch. 14