
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE

COMP90042
Natural Language Processing

Lecture 16
Semester 1 2021 Week 8

Jey Han Lau

Dependency Grammar


Context-Free Grammars (Recap)

• CFGs assume a constituency tree which identifies the
phrases in a sentence

‣ based on the idea that these phrases are interchangeable
(e.g., swap an NP for another NP) while maintaining
grammaticality


Dependency Grammars

• Dependency grammar offers a simpler approach
‣ describe relations between pairs of words
‣ namely, between heads and dependents
‣ e.g. (prefer, dobj, flight)
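As an illustrative sketch (not from the lecture), a dependency parse can be stored as (head, relation, dependent) triples, or equivalently as a head array; the sentence below is made up for the example:

```python
# Dependency parse of "United canceled the flight" as (head, relation, dependent)
# triples. Token indices are 1-based; index 0 is the artificial ROOT.
sentence = ["United", "canceled", "the", "flight"]
arcs = [
    (2, "nsubj", 1),   # canceled -> United
    (0, "root", 2),    # ROOT -> canceled
    (4, "det", 3),     # flight -> the
    (2, "dobj", 4),    # canceled -> flight
]

# Equivalent head array: heads[d] is the head of token d.
heads = [None] * (len(sentence) + 1)
for head, rel, dep in arcs:
    heads[dep] = head

print(heads[1:])   # → [2, 0, 4, 2]
```

Both representations are common; the head array works because every word has exactly one head.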


Why?

• Deal better with languages that are
morphologically rich and have a relatively free
word order

‣ CFGs need a separate rule for each possible
place a phrase can occur

• Head-dependent relations similar to semantic
relations between words

‣ More useful for applications: coreference
resolution, information extraction, etc


Outline

• Basics of dependency grammar

• Transition-based parsing

• Graph-based parsing


Basics of Dependency Grammar


Dependency Relations

• Captures the grammatical relation between:

‣ Head = central word

‣ Dependent = supporting word

• Grammatical relation = subject, direct object, etc

• Many dependency theories and taxonomies
proposed for different languages

• Universal Dependencies: a framework to create a set
of dependency relations that are computationally
useful and cross-lingual


Universal Dependency


More Examples


Question Answering

• Dependency tree more directly represents the
core of the sentence: who did what to whom?
‣ captured by the links incident on verb nodes


Information Extraction
• “Brasilia, the Brazilian capital, was founded in 1960.”

→ capital(Brazil, Brasilia)

→ founded(Brasilia, 1960)

• Dependency tree captures these relations succinctly


What about CFGs?

• Constituency trees can also provide similar
information

• But extracting it requires some distilling, using
head-finding rules


Dependency vs Constituency

• Dependency tree
‣ each node is a word token
‣ one node is chosen as the root
‣ directed edges link heads and their dependents

• Constituency tree
‣ forms a hierarchical tree
‣ word tokens are the leaves
‣ internal nodes are ‘constituent phrases’ e.g. NP

• Both use part-of-speech tags


Properties of a Dependency Tree

• Each word has a single head (parent)

• There is a single root node

• There is a unique path to each word from the root

• All arcs should be projective


Projectivity

• An arc is projective if there is a path from the head to
every word that lies between the head and the
dependent

• A dependency tree is projective if all its arcs are projective

• In other words, a dependency tree is projective if it can
be drawn with no crossing edges

• Most sentences are projective, but exceptions exist

‣ non-projectivity is common in languages with flexible word order

[Figure: an arc from A to D over the sequence A B C D; there is a
path from A to B and from A to C, so the arc is projective]
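The definition above can be turned directly into a check over a head array. A minimal sketch, assuming 1-based token indices with 0 as ROOT (illustrative, not the lecture's code):

```python
def is_projective(heads):
    """heads[d] = head of token d (1-based tokens, 0 = ROOT, heads[0] unused).
    An arc (h, d) is projective iff every word strictly between h and d can
    reach h by following head links, i.e. is a descendant of h."""
    n = len(heads) - 1
    for d in range(1, n + 1):
        h = heads[d]
        for k in range(min(h, d) + 1, max(h, d)):
            a = k
            while a != 0 and a != h:   # walk up the tree from k
                a = heads[a]
            if a != h:                 # k is not a descendant of h: arcs cross
                return False
    return True

# "United canceled the flight" is projective
print(is_projective([0, 2, 0, 4, 2]))   # → True
# arcs 1→3 and 4→2 cross, so this tree is non-projective
print(is_projective([0, 0, 4, 1, 1]))   # → False
```

This is the "no crossing edges" test phrased as a reachability check, which matches the path-based definition on the slide.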


Treebank Conversion

• A few dependency treebanks (Czech, Arabic, Danish…)

• Many constituency treebanks

• Some can be converted into dependencies

• Dependency trees generated from constituency trees
are always projective

• Main idea: identify head-dependent relations in
constituency structure and the corresponding
dependency relations
‣ Use various heuristics, e.g., head-finding rules
‣ Often with manual correction



Examples From Treebanks
• Danish DDT includes additional ‘subject’ link for verbs

• METU-Sabancı Turkish treebank
‣ edges between morphological units, not just words

http://www.buch-kromann.dk/matthias/ddt1.0/

[Figure: dependency graphs for several Danish sentences from DDT 1.0,
with word forms, POS tags, and labelled arcs such as subj, dobj, vobj,
nobj, pobj, conj, coord, mod, and pnct]

Oflazer, Kemal, et al. “Building a Turkish treebank.” Treebanks. Springer, 2003. 261-277.


The syntactic relations that we have currently opted to encode in our
syntactic representation are the following:

1. Subject
2. Object
3. Modifier (adv./adj.)
4. Possessor
5. Classifier
6. Determiner
7. Dative Adjunct
8. Locative Adjunct
9. Ablative Adjunct
10. Instrumental Adjunct

Some of the relations above perhaps require some more clarification. Object
is used to mark objects of verbs and the nominal complements of postpositions.
A classifier is a nominal modifier in nominative case (as in book cover) while
a possessor is a genitive case-marked nominal modifier. For verbal adjuncts,
we indicate the syntactic relation with a marker paralleling the case marking
though the semantic relation they encode is not only determined by the case
marking but also the lexical semantics of the head noun and the verb they
are attached to. For instance a dative adjunct can be a goal, a destination, a
beneficiary or a value carrier in a transaction, or a theme, while an ablative
adjunct may be a reason, a source or a theme. Although we do not envision the
use of such detailed relation labels at the outset, such distinctions can certainly
be useful in training case-frame based transfer modules in machine translation
systems to select the appropriate prepositions in English for instance.

[Figure 1.3: dependency structure for a sample Turkish sentence
(“Bu eski bahçe-de+ki gül-ün böyle büyü+me-si herkes-i çok etkile-di”),
with relations Det, Mod, Subj, and Obj; the last line shows the final
POS for each word]

2.3 Example of a Treebank Sentence
In this section we present the detailed representation of a Turkish sentence

in the treebank. Each sentence is represented by a sequence of the attribute lists
of the words involved, bracketed with start and end tags. Figure 1.4 shows
the treebank encoding for the sentence given earlier. Each word is likewise
bracketed by start and end tags. The IX denotes the number or index of the word. LEM
denotes the lemma of the word, as one would find in a dictionary. For verbs,
this is typically an infinitive form, while for other word classes it is usually
the root word itself. MORPH indicates the morphological structure of the word
as a sequence of morphemes, essentially corresponding to the lexical form.


Transition-based Parsing


Dependency Parsing

• Find the best structure for a given input sentence

• Two main approaches:

‣ Transition-based: bottom-up greedy method

‣ Graph-based: encodes the problem using nodes/
edges and uses graph-theoretic methods to find
the optimal solution


Caveat

• Transition-based parsers can only handle
projective dependency trees!

• Less applicable for languages where crossing
dependencies are common


Transition-Based Parsing: Intuition
• Process words from left to right

• Maintain two data structures

‣ Buffer: input words yet to be processed

‣ Stack: stores words that are being processed


Transition-Based Parsing: Intuition
• At each step, perform one of the 3 actions:

‣ Shift: move a word from the buffer onto the stack

‣ Left-Arc: make the current (top) word the head of the
previous word on the stack, removing the dependent

‣ Right-Arc: make the previous word the head of the
current (top) word on the stack, removing the dependent

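The three actions can be sketched as a tiny interpreter over a buffer and stack (unlabelled arcs, 1-based token indices; an illustrative implementation, not the lecture's code):

```python
def apply_actions(n_words, actions):
    buffer = list(range(1, n_words + 1))  # token indices, left to right
    stack, arcs = [], []                  # arcs are (head, dependent) pairs
    for act in actions:
        if act == "shift":                # move next word from buffer to stack
            stack.append(buffer.pop(0))
        elif act == "left-arc":           # top of stack heads the word below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "right-arc":          # word below heads the top of stack
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return sorted(arcs)

# "book me a flight": book→me, flight→a, book→flight
actions = ["shift", "shift", "right-arc",
           "shift", "shift", "left-arc", "right-arc"]
print(apply_actions(4, actions))          # → [(1, 2), (1, 4), (4, 3)]
```

Note that both arc actions remove the dependent, so a word can only be reduced once all of its own dependents have been attached.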



Dependency Labels

• For simplicity, we omit labels on the dependency
relations

• In practice, we parameterise the left-arc and right-
arc actions with dependency labels:

‣ E.g. left-arc-nsubj or right-arc-dobj

• Expands the list of actions to > 3 types


The Right Action?

• We assume an oracle that tells us the right action
at every step

• Given a dependency tree, the role of oracle is to
generate a sequence of ground truth actions
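A static oracle can be derived from the gold head array: take left-arc when the word below the stack top is headed by the top, right-arc when the top is headed by the word below it and all of the top's dependents are already attached, otherwise shift. A sketch assuming a projective gold tree (illustrative, not the lecture's code):

```python
def oracle_actions(heads):
    """heads[d] = gold head of token d (1-based, 0 = ROOT, heads[0] unused)."""
    n = len(heads) - 1
    n_deps = [0] * (n + 1)                 # gold number of dependents per word
    for d in range(1, n + 1):
        n_deps[heads[d]] += 1
    attached = [0] * (n + 1)               # dependents attached so far
    buffer, stack, actions = list(range(1, n + 1)), [], []
    while buffer or len(stack) > 1:
        if len(stack) >= 2 and heads[stack[-2]] == stack[-1]:
            actions.append("left-arc")
            attached[stack[-1]] += 1
            stack.pop(-2)
        elif (len(stack) >= 2 and heads[stack[-1]] == stack[-2]
              and attached[stack[-1]] == n_deps[stack[-1]]):
            actions.append("right-arc")    # safe only once top is a complete subtree
            attached[stack[-2]] += 1
            stack.pop()
        else:
            actions.append("shift")
            stack.append(buffer.pop(0))
    return actions

# "United canceled the flight": United→canceled, canceled→ROOT,
# the→flight, flight→canceled
print(oracle_actions([0, 2, 0, 4, 2]))
# → ['shift', 'shift', 'left-arc', 'shift', 'shift', 'left-arc', 'right-arc']
```

The right-arc guard is what prevents a word from being reduced before it has collected all its dependents.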


Parsing Model

• We then train a supervised model to mimic the
actions of the oracle

‣ To learn at every step the correct action to take
(as given by the oracle)

‣ At test time, the trained model can be used to
parse a sentence to create the dependency
tree


Parsing As Classification

• Input:

‣ Stack (top-2 elements: s1 and s2)

‣ Buffer (first element: b1)

• Output

‣ 3 classes: shift, left-arc, or right-arc

• Features

‣ word (w), part-of-speech (t)


• Input features:

‣ s1.w = flights

‣ s2.w = cancelled

‣ s1.t = NNS

‣ s2.t = VBD

‣ b1.w = to

‣ b1.t = TO

‣ s1.t ◦s2.t = NNS_VBD

• Output label: shift
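The feature template above can be sketched as a function from a parser configuration to a feature dictionary (the function name and representation are illustrative, not a real library API):

```python
def extract_features(stack, buffer):
    """stack/buffer hold (word, pos) pairs; returns sparse features as a dict."""
    feats = {}
    if stack:
        feats["s1.w"], feats["s1.t"] = stack[-1]
    if len(stack) >= 2:
        feats["s2.w"], feats["s2.t"] = stack[-2]
    if buffer:
        feats["b1.w"], feats["b1.t"] = buffer[0]
    if len(stack) >= 2:                       # conjoined POS feature s1.t ◦ s2.t
        feats["s1.t◦s2.t"] = feats["s1.t"] + "_" + feats["s2.t"]
    return feats

stack = [("cancelled", "VBD"), ("flights", "NNS")]
buffer = [("to", "TO"), ("Houston", "NNP")]
print(extract_features(stack, buffer))
```

This reproduces the slide's configuration: s1.w = flights, s2.w = cancelled, b1.w = to, and the conjoined tag feature NNS_VBD.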


Classifiers

• Traditionally, SVMs worked best

• Nowadays, deep learning models are state-of-the-art

• Weakness: local classifier based on greedy search

• Solutions:

‣ Beam search: keep track of top-N best actions

‣ Dynamic oracle: during training, use predicted
actions occasionally

‣ Graph-based parser


Graph-based Parsing


Graph-Based Parsing

• Given an input sentence, construct a fully-
connected, weighted, directed graph

• Vertices: all words

• Edges: head-dependent arcs

• Weight: score based on training data (relations that
are frequently observed receive a higher score)

• Objective: find the maximum spanning tree; since the
graph is directed, this is a maximum spanning
arborescence, so undirected MST algorithms such as
Kruskal's do not apply directly
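A sketch of the greedy first step with made-up scores (illustrative, not the lecture's code): pick the highest-scoring incoming edge for every word, then check whether the result is already a tree. When it is not, the Chu-Liu-Edmonds algorithm contracts each cycle and recurses (not shown here).

```python
def greedy_heads(scores, n):
    """scores[h][d] = score of arc h -> d (0 = ROOT); pick best head per word."""
    heads = [0] * (n + 1)
    for d in range(1, n + 1):
        heads[d] = max((h for h in range(n + 1) if h != d),
                       key=lambda h: scores[h][d])
    return heads

def has_cycle(heads):
    """Detect a cycle by walking head links up from every word."""
    for start in range(1, len(heads)):
        seen, a = set(), start
        while a != 0:
            if a in seen:
                return True
            seen.add(a)
            a = heads[a]
    return False

# scores for "United canceled flight" (rows = heads 0..3, cols = dependents)
scores = [
    [0, 1, 9, 1],   # ROOT
    [0, 0, 2, 1],   # United
    [0, 8, 0, 7],   # canceled
    [0, 2, 3, 0],   # flight
]
heads = greedy_heads(scores, 3)
print(heads[1:], has_cycle(heads))   # → [2, 0, 2] False
```

Here the greedy choice already yields a valid tree; when `has_cycle` returns True, the cycle-contraction step of Chu-Liu-Edmonds is needed.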


Advantage

• Can produce non-projective trees

‣ Not a big deal for English

‣ But important for many other languages

• Score entire trees

‣ Avoid making greedy local decisions like
transition-based parsers

‣ Captures long dependencies better


Example

• Caveat: the greedily selected graph may contain cycles

• Solution: need to do cleanup to remove cycles
(Chu-Liu-Edmonds algorithm)


A Final Word

• Dependency parsing: a compelling alternative
formulation to constituency parsing

‣ edges encode word-to-word syntactic and
semantic relations

• Transition-based parsing

• Graph-based parsing


Required Reading

• JM3 Ch. 14