Dependency Grammar
COMP90042
Natural Language Processing Lecture 16
COPYRIGHT 2020, THE UNIVERSITY OF MELBOURNE
Correction on Lecture 13, Page 8
Context-Free Grammars (Recap)
• CFGs assume a constituency tree which identifies the phrases in a sentence
  ‣ based on the idea that these phrases are interchangeable (e.g., swap an NP for another NP) and maintain grammaticality
Dependency Grammars
• Dependency grammar offers a simpler approach
  ‣ describes relations between pairs of words
  ‣ namely, between heads and dependents
  ‣ e.g., (prefer, dobj, flight)
Why?
• Deals better with languages that are morphologically rich and have a relatively free word order
  ‣ CFGs need a separate rule for each possible place a phrase can occur
• Head-dependent relations are similar to semantic relations between words
  ‣ more useful for applications: coreference resolution, information extraction, etc.
Dependency Relations
• Captures the grammatical relation between a head and a dependent
• Head = central word
• Dependent = supporting word
• Grammatical relation = subject, direct object, etc.
• Many dependency theories and taxonomies have been proposed for different languages
• Universal Dependency: a framework to create a set of dependency relations that are computationally useful and cross-lingual
Universal Dependency
More Examples
Question Answering
• Dependency tree more directly represents the core of the sentence: who did what to whom?
  ‣ captured by the links incident on verb nodes
  ‣ more minor details are buried deeper in the tree (e.g., adjectives, determiners, etc.)
Information Extraction
• “Brasilia, the Brazilian capital, was founded in 1960.”
  → capital(Brazil, Brasilia)
  → founded(Brasilia, 1960)
• Dependency tree captures these relations succinctly
What about CFGs?
• Constituency trees can also provide similar information
• But it requires some distilling using head-finding rules
Dependency vs Constituency
• Constituency tree
  ‣ forms a hierarchical tree
  ‣ word tokens are the leaves
  ‣ internal nodes are ‘constituent phrases’, e.g., NP, VP, etc.
• Dependency tree
  ‣ each node is a word token
  ‣ one node is chosen as the root
  ‣ directed edges link heads and their dependents
• Both use part-of-speech
Properties of a Dependency Tree
• Each word has a single head (parent)
• There is a single root node
• There is a unique path to each word from the root
• All arcs are projective
  ‣ Transition-based parsers can only produce projective trees
Projectivity
• An arc is projective if there is a path from the head to every word that lies between the head and the dependent
• A dependency tree is projective if all arcs are projective
• That is, a dependency tree is projective if it can be drawn with no crossing edges
• Most sentences are projective, but exceptions exist
  ‣ non-projective structures are common in languages with flexible word order
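The “no crossing edges” characterisation above is easy to check mechanically. A minimal sketch (the function name and the arc representation as index pairs are illustrative, not from the lecture):

```python
def is_projective(arcs):
    """arcs: list of (head, dependent) word-index pairs.
    A tree is projective iff no two arcs cross."""
    for (h1, d1) in arcs:
        a, b = sorted((h1, d1))
        for (h2, d2) in arcs:
            c, d = sorted((h2, d2))
            # crossing: one arc starts strictly inside the other's span
            # and ends strictly outside it
            if a < c < b < d:
                return False
    return True

# "book a flight" (0=book, 1=a, 2=flight): arcs are nested -> projective
print(is_projective([(0, 2), (2, 1)]))   # True
# crossing arcs, as in non-projective constructions
print(is_projective([(0, 2), (1, 3)]))   # False
```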
Dependency Treebanks
• A few dependency treebanks
  ‣ Czech, Arabic, Danish, Dutch, Greek, Turkish, …
• More recently, the Universal Dependency Treebank
  ‣ http://universaldependencies.org/
  ‣ collates >100 treebanks, >60 languages
  ‣ unified part-of-speech, morphology labels, relation types
  ‣ consistent handling of conjunctions and other tricky cases
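Universal Dependency treebanks are distributed in the CoNLL-U format: one token per line, ten tab-separated columns, with the head index and relation label in columns 7 and 8. A minimal reader sketch (the `read_conllu` helper and the hand-written example sentence are illustrative, not part of any treebank release):

```python
def read_conllu(text):
    """Return (form, head, deprel) triples for each ordinary token line."""
    rows = []
    for line in text.strip().split("\n"):
        if line.startswith("#"):
            continue                        # comment lines
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue                        # multi-word token / empty node lines
        rows.append((cols[1], int(cols[6]), cols[7]))
    return rows

# "Brasilia was founded in 1960" with UD-style heads and relations
sent = ("1\tBrasilia\t_\t_\t_\t_\t3\tnsubj:pass\t_\t_\n"
        "2\twas\t_\t_\t_\t_\t3\taux:pass\t_\t_\n"
        "3\tfounded\t_\t_\t_\t_\t0\troot\t_\t_\n"
        "4\tin\t_\t_\t_\t_\t5\tcase\t_\t_\n"
        "5\t1960\t_\t_\t_\t_\t3\tobl\t_\t_\n")
print(read_conllu(sent))
```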
Treebank Conversion
• Many constituency treebanks; some can be automatically converted into dependencies
• Main idea: identify head-dependent relations in the constituency structure and the corresponding dependency relations
  ‣ use various heuristics, e.g., head-finding rules
  ‣ often with manual correction
• Dependency trees generated from constituency trees are always projective
Examples From Treebanks
• Danish DDT
  ‣ includes an additional ‘subject’ link for verbs
  [Figure: Danish dependency tree example — http://www.buch-kromann.dk/matthias/ddt1.0/]
• METU-Sabancı Turkish treebank
  ‣ edges between morphological units, not just words
  [Figure: Turkish dependency tree example; the last line shows the final POS for each word]
  Oflazer, Kemal, et al. “Building a Turkish treebank.” Treebanks. Springer, 2003. 261-277.
Parsing
Dependency Parsing
• Parsing: the task of finding the best structure for a given input sentence
  ‣ argmax_t score(t | w)
• Two main approaches:
  ‣ transition-based: treats the problem as an incremental sequence of decisions over the next action in a state machine
  ‣ graph-based: encodes the problem using nodes/edges and uses graph theory methods to find optimal solutions
Transition-Based Parsing: Intuition
• Examine the words in a single pass from left to right
• At each step, perform one of the following actions:
  ‣ assign the current word as head of some previously seen word [left-arc]
  ‣ assign some previously seen word as head of the current word [right-arc]
  ‣ don’t do anything; store the word and move to the next step [shift]
Transition-Based Parsing
• Transition-based parsers can only produce projective trees
• Frames parsing as a sequence of transitions
  ‣ maintains two data structures
    – buffer: input words yet to be processed
    – stack: head words currently being processed
  ‣ two types of transitions
    – shift: move word from buffer to top of stack
    – arc: add arc (left/right) between top two items on stack, and remove the dependent from the stack
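The shift / left-arc / right-arc transitions above can be sketched as a small state machine. This is an illustrative arc-standard-style implementation, not the lecture's code; the hard-coded action sequence stands in for the oracle or classifier that decides each action:

```python
def parse(words, next_action):
    """Run shift / left-arc / right-arc transitions over a sentence.

    `next_action` maps a (stack, buffer) configuration to the next action.
    Returns a list of (head, dependent) arcs as indices into `words`.
    """
    stack, buffer, arcs = [], list(range(len(words))), []
    while buffer or len(stack) > 1:
        action = next_action(stack, buffer)
        if action == "shift":
            stack.append(buffer.pop(0))     # move next word onto the stack
        elif action == "left-arc":
            dep = stack.pop(-2)             # second-from-top is the dependent
            arcs.append((stack[-1], dep))   # top of stack is its head
        elif action == "right-arc":
            dep = stack.pop()               # top of stack is the dependent
            arcs.append((stack[-1], dep))   # new top of stack is its head
    return arcs

# "book me a flight": book -> me, flight -> a, book -> flight
actions = iter(["shift", "shift", "right-arc", "shift", "shift",
                "left-arc", "right-arc"])
arcs = parse(["book", "me", "a", "flight"], lambda s, b: next(actions))
print(arcs)   # [(0, 1), (3, 2), (0, 3)]
```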
Transition-Based Parsing: Worked Example
[Figure: step-by-step parsing example; J&M Fig 15.5]
Dependency Labels
• For simplicity, we omit labels on the dependency relations
• In practice, we parameterise the left-arc and right-arc actions with dependency labels:
  ‣ e.g., left-arc-nsubj or right-arc-dobj
• This expands the list of actions to more than just 3 types
The Right Action?
• We assume an oracle that tells us the right action at every step
• Given a dependency tree, the role of the oracle is to generate a sequence of ground-truth actions
Parsing Model
• We then train a supervised model to mimic the actions of the oracle
  ‣ to learn, at every step, the correct action to take (as given by the oracle)
  ‣ at test time, the trained model can be used to parse a sentence and create the dependency tree
Parsing As Classification
• Input:
  ‣ stack (top-2 elements: s1 and s2)
  ‣ buffer (first element: b1)
• Features:
  ‣ word (w), part-of-speech (t)
• Output:
  ‣ 3 classes: shift, left-arc, or right-arc
• Input features:
  ‣ s1.w = flights
  ‣ s2.w = cancelled
  ‣ s1.t = NNS
  ‣ s2.t = VBD
  ‣ b1.w = to
  ‣ b1.t = TO
  ‣ s1.t ◦ s2.t = NNS_VBD
• Output label: shift
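These features can be assembled from the parser configuration with a few lookups. A sketch (the dictionary-of-strings feature representation is an assumption for illustration; a real system would feed these into an SVM or neural classifier):

```python
def extract_features(stack, buffer):
    """Turn a parser configuration of (word, POS) pairs into string features,
    following the s1/s2/b1 notation on the slide."""
    feats = {}
    if len(stack) >= 1:
        feats["s1.w"], feats["s1.t"] = stack[-1]
    if len(stack) >= 2:
        feats["s2.w"], feats["s2.t"] = stack[-2]
        # conjoined POS feature: s1.t combined with s2.t
        feats["s1.t+s2.t"] = stack[-1][1] + "_" + stack[-2][1]
    if buffer:
        feats["b1.w"], feats["b1.t"] = buffer[0]
    return feats

# configuration from the slide: "cancelled flights" on the stack, "to" next
stack = [("cancelled", "VBD"), ("flights", "NNS")]
buffer = [("to", "TO")]
print(extract_features(stack, buffer))
```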
Classifiers
• Traditionally SVMs worked best
• Nowadays, deep learning models are the state-of-the-art
• Weakness: local classifier based on greedy search
• Solutions:
  ‣ beam search: keep track of top-N best actions (next lecture!)
  ‣ dynamic oracle: during training, occasionally use predicted actions
Graph-Based Parsing
• Given an input sentence, construct a fully-connected, weighted, directed graph
• Vertices: all words
• Edges: head-dependent arcs
• Weight: score based on training data (relations that are frequently observed receive a higher score)
• Objective: find the maximum spanning tree
Advantages
• Can produce non-projective trees
  ‣ not a big deal for English
  ‣ but a problem for many other languages
• Scores entire trees
  ‣ avoids making greedy local decisions, unlike transition-based parsers
  ‣ captures long-range dependencies better
Example
• Caveat: the tree may contain cycles
• Solution: need to do a clean-up to remove cycles (Chu-Liu-Edmonds algorithm)
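To see why cycles arise, consider the greedy first step of graph-based parsing: each word independently picks its highest-scoring head. A sketch with made-up scores (the full Chu-Liu-Edmonds algorithm would then contract and re-score the detected cycle, which is omitted here):

```python
def greedy_heads(score):
    """score[h][d] = weight of arc h -> d; node 0 is ROOT.
    Pick the best head for each non-root word independently."""
    n = len(score)
    return {d: max((h for h in range(n) if h != d), key=lambda h: score[h][d])
            for d in range(1, n)}

def find_cycle(heads):
    """Follow head pointers from each word; report a node on a cycle, if any."""
    for start in heads:
        seen, node = set(), start
        while node in heads:          # stops if we reach ROOT (node 0)
            if node in seen:
                return node           # revisited a node: it lies on a cycle
            seen.add(node)
            node = heads[node]
    return None

# ROOT, "book", "flights": "book" and "flights" prefer each other -> cycle
score = [[0, 5, 1],    # arcs from ROOT
         [0, 0, 9],    # arcs from "book"
         [0, 8, 0]]    # arcs from "flights"
heads = greedy_heads(score)
print(heads, find_cycle(heads))
```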
A Final Word
• Dependency parsing: a compelling alternative formulation to constituency parsing
  ‣ structures based on words as internal nodes
  ‣ edges encode word-word syntactic and semantic relations
  ‣ often this is the information we need for other tasks!
• Transition-based parsing
  ‣ a sequence of shift and left/right-arc actions
• Graph-based parsing
  ‣ frames it as a maximum spanning tree problem
Required Reading
• J&M3 Ch. 15