
Dependency Grammar
COMP90042
Natural Language Processing
Lecture 16
Semester 1 2021 Week 8 Jey Han Lau
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE

Context-Free Grammars (Recap)
• CFGs assume a constituency tree which identifies the phrases in a sentence
‣ based on the idea that these phrases are interchangeable (e.g., swap an NP for another NP) while maintaining grammaticality

Dependency Grammars
• Dependency grammar offers a simpler approach:
‣ describes relations between pairs of words
‣ namely, between heads and dependents
‣ e.g. (prefer, dobj, flight)

Why?
• Deals better with languages that are morphologically rich and have a relatively free word order
‣ CFGs need a separate rule for each possible position a phrase can occur in
• Head-dependent relations are similar to semantic relations between words
‣ More useful for applications: coreference resolution, information extraction, etc.

Outline
• Basics of dependency grammar
• Transition-based parsing
• Graph-based parsing

Basics of Dependency Grammar

Dependency Relations
• Captures the grammatical relation between:
‣ Head = central word
‣ Dependent = supporting word
• Grammatical relation = subject, direct object, etc.
• Many dependency theories and taxonomies have been proposed for different languages
• Universal Dependencies: a framework to create a set of dependency relations that are computationally useful and cross-lingual

Universal Dependencies

More Examples

Question Answering
• Dependency trees more directly represent the core of the sentence: who did what to whom?
‣ captured by the links incident on verb nodes

Information Extraction
• “Brasilia, the Brazilian capital, was founded in 1960.”
→ capital(Brazil, Brasilia)
→ founded(Brasilia, 1960)
• Dependency trees capture these relations succinctly


What about CFGs?
• Constituency trees can also provide similar information
• But it requires some distilling using head-finding rules


Dependency vs Constituency
• Dependency tree:
‣ each node is a word token
‣ one node is chosen as the root
‣ directed edges link heads and their dependents
• Constituency tree:
‣ forms a hierarchical tree
‣ word tokens are the leaves
‣ internal nodes are ‘constituent phrases’, e.g. NP
• Both use part-of-speech tags

Properties of a Dependency Tree
• Each word has a single head (parent)
• There is a single root node
• There is a unique path to each word from the root
• All arcs should be projective

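The first three properties can be checked mechanically on the common head-array encoding, where position i stores the head of word i and 0 marks the root. A minimal sketch; the example head arrays are hypothetical:

```python
def is_valid_tree(heads):
    """Check single root and a unique, cycle-free path from the root
    to every word. heads[i] is the head of word i+1; 0 denotes ROOT.
    A single head per word is guaranteed by the encoding itself."""
    n = len(heads)
    # exactly one word may attach to the root
    if heads.count(0) != 1:
        return False
    # every word must reach the root without revisiting a node
    for i in range(1, n + 1):
        seen, j = set(), i
        while j != 0:
            if j in seen or not (1 <= j <= n):
                return False
            seen.add(j)
            j = heads[j - 1]
    return True

print(is_valid_tree([2, 0, 4, 2]))   # well-formed tree → True
print(is_valid_tree([2, 3, 1, 0]))   # cycle among words 1–3 → False
```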
Projectivity
• An arc is projective if there is a path from the head to every word that lies between the head and the dependent
‣ e.g., for words A B C D with an arc from A to D: path from A to B ✓, path from A to C ✓ → the arc is projective ✓
• A dependency tree is projective if all its arcs are projective
• In other words, a dependency tree is projective if it can be drawn with no crossing edges
• Most sentences are projective, but exceptions exist
‣ Common in languages with flexible word order

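The projectivity condition can be tested directly on the same 1-indexed head-array encoding (0 = root); a sketch with hypothetical example trees:

```python
def is_projective(heads):
    """A tree is projective if, for every arc (head, dep), every word
    strictly between them is a descendant of that head."""
    def ancestors(i):
        seen = []
        while i != 0:
            i = heads[i - 1]
            seen.append(i)
        return seen

    for dep in range(1, len(heads) + 1):
        head = heads[dep - 1]
        if head == 0:          # root arcs span the whole sentence
            continue
        lo, hi = min(head, dep), max(head, dep)
        for between in range(lo + 1, hi):
            if head not in ancestors(between):
                return False   # word under the arc escapes the head
    return True

print(is_projective([2, 0, 4, 2]))     # no crossing arcs → True
print(is_projective([3, 4, 0, 3, 4]))  # arcs (4,2) and (3,4) cross → False
```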
Projectivity
Is this tree projective?
PollEv.com/jeyhanlau569

Treebank Conversion
• A few dependency treebanks (Czech, Arabic, Danish, …)
• Many constituency treebanks
• Some can be converted into dependencies
• Dependency trees generated from constituency trees are always projective
• Main idea: identify head-dependent relations in the constituency structure and the corresponding dependency relations
‣ Use various heuristics, e.g., head-finding rules
‣ Often with manual correction

Examples From Treebanks
• Danish DDT (http://www.buch-kromann.dk/matthias/ddt1.0/)
‣ for verbal adjuncts, includes an additional ‘subject’ link for verbs
‣ [figure: dependency tree for “Dansk kultur skal skabes, forvaltes, plejes og forkæles af os danskere …”, with relations such as subj, vobj, conj, coord, mod, nobj]
• METU-Sabancı Turkish treebank
‣ edges between morphological units, not just words
‣ [figure: dependency tree for “Bu eski bahçe-de ki gül-ün böyle büyü-me-si herkes-i çok etkile-di”; the last line shows the final POS for each word]
Oflazer, Kemal, et al. “Building a Turkish treebank.” Treebanks. Springer, 2003. 261–277.

Transition-based Parsing

Dependency Parsing
• Find the best structure for a given input sentence
• Two main approaches:
‣ Transition-based: bottom-up greedy method
‣ Graph-based: encodes the problem using nodes/edges and uses graph theory methods to find optimal solutions

Caveat
• Transition-based parsers can only handle projective dependency trees!
• Less applicable for languages where crossing dependencies are common

Transition-Based Parsing: Intuition
• Processes words from left to right
• Maintains two data structures:
‣ Buffer: input words yet to be processed
‣ Stack: stores words that are being processed

Transition-Based Parsing: Intuition
• At each step, perform one of 3 actions:
‣ Shift: move a word from the buffer to the stack
‣ Left-Arc: assign the current word as head of the previous word in the stack
‣ Right-Arc: assign the previous word as head of the current word in the stack
[figure: arc diagrams over words A B C D illustrating the left-arc and right-arc actions]

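The three actions can be simulated directly. A sketch of the shift/left-arc/right-arc machine; the sentence and the hand-written action sequence are hypothetical, and dependency labels are omitted as in the slides:

```python
def parse(words, actions):
    buffer = list(words)          # words yet to be processed
    stack = ["ROOT"]              # partially processed words
    arcs = []                     # (head, dependent) pairs
    for act in actions:
        if act == "shift":
            stack.append(buffer.pop(0))
        elif act == "left-arc":   # top of stack heads the word below it
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "right-arc":  # word below heads the top of stack
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["book", "me", "a", "flight"]
actions = ["shift", "shift", "right-arc", "shift", "shift",
           "left-arc", "right-arc", "right-arc"]
print(parse(words, actions))
# → [('book', 'me'), ('flight', 'a'), ('book', 'flight'), ('ROOT', 'book')]
```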
[figure: transition-based parsing example showing stack and buffer states; ?? marks the action to choose]
PollEv.com/jeyhanlau569

[figure: continued stack and buffer states for the worked example]



Dependency Labels
• For simplicity, we omit labels on the dependency relations
• In practice, we parameterise the left-arc and right-arc actions with dependency labels:
‣ e.g. left-arc-nsubj or right-arc-dobj
• This expands the list of actions to > 3 types


The Right Action?
• We assume an oracle that tells us the right action at every step
• Given a dependency tree, the role of the oracle is to generate a sequence of ground-truth actions
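A sketch of how such an oracle can be derived from a gold tree under the arc-standard transitions above (head array with 0 = root; assumes a projective gold tree; the example sentence is hypothetical):

```python
def oracle_actions(gold_heads):
    n = len(gold_heads)
    buffer = list(range(1, n + 1))   # word indices, 1-based
    stack = [0]                      # 0 stands for ROOT
    attached = set()                 # dependents already given a head
    actions = []

    def all_deps_attached(w):
        return all(gold_heads[d - 1] != w or d in attached
                   for d in range(1, n + 1))

    while buffer or len(stack) > 1:
        if len(stack) >= 2:
            top, below = stack[-1], stack[-2]
            # left-arc: top is the gold head of the word below it
            if below != 0 and gold_heads[below - 1] == top:
                actions.append("left-arc")
                attached.add(below)
                stack.pop(-2)
                continue
            # right-arc: below is the gold head of top, and top has
            # already collected all of its own dependents
            if gold_heads[top - 1] == below and all_deps_attached(top):
                actions.append("right-arc")
                attached.add(top)
                stack.pop()
                continue
        actions.append("shift")
        stack.append(buffer.pop(0))
    return actions

# gold tree for "book me a flight":
# book <- ROOT, me <- book, a <- flight, flight <- book
print(oracle_actions([0, 1, 4, 1]))
# → ['shift', 'shift', 'right-arc', 'shift', 'shift',
#    'left-arc', 'right-arc', 'right-arc']
```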

Parsing Model
• We then train a supervised model to mimic the actions of the oracle
‣ To learn at every step the correct action to take (as given by the oracle)
‣ At test time, the trained model can be used to parse a sentence to create the dependency tree

Parsing As Classification
• Input:
‣ Stack (top-2 elements: s1 and s2)
‣ Buffer (first element: b1)
• Output:
‣ 3 classes: shift, left-arc, or right-arc
• Features:
‣ word (w), part-of-speech (t)

• Input features:
‣ s1.w = flights
‣ s2.w = cancelled
‣ s1.t = NNS
‣ s2.t = VBD
‣ b1.w = to
‣ b1.t = TO
‣ s1.t ◦ s2.t = NNS_VBD
• Output label: shift

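A sketch of extracting these features from a parser configuration. The stack and buffer contents mirror the slide's example; the remaining buffer word "Houston" is an assumption:

```python
def extract_features(stack, buffer):
    # stack and buffer hold (word, POS-tag) pairs; stack[-1] is s1
    feats = {}
    if len(stack) >= 1:
        feats["s1.w"], feats["s1.t"] = stack[-1]
    if len(stack) >= 2:
        feats["s2.w"], feats["s2.t"] = stack[-2]
    if buffer:
        feats["b1.w"], feats["b1.t"] = buffer[0]
    # conjoined tag feature, as in the slide's s1.t ◦ s2.t = NNS_VBD
    if "s1.t" in feats and "s2.t" in feats:
        feats["s1.t◦s2.t"] = feats["s1.t"] + "_" + feats["s2.t"]
    return feats

stack = [("cancelled", "VBD"), ("flights", "NNS")]   # s2, s1
buffer = [("to", "TO"), ("Houston", "NNP")]          # b1 first ("Houston" assumed)
print(extract_features(stack, buffer))
```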
Classifiers
• Traditionally, SVMs work best
• Nowadays, deep learning models are state-of-the-art
• Weakness: local classifier based on greedy search
• Solutions:
‣ Beam search: keep track of the top-N best actions
‣ Dynamic oracle: during training, occasionally use predicted actions
‣ Graph-based parsers

Graph-based Parsing





Graph-Based Parsing
• Given an input sentence, construct a fully-connected, weighted, directed graph
‣ Vertices: all words
‣ Edges: head-dependent arcs
‣ Weights: scores based on training data (relations that are frequently observed receive a higher score)
• Objective: find the maximum spanning tree (for this directed graph, with the Chu-Liu-Edmonds algorithm rather than Kruskal’s, which applies to undirected graphs)

Advantages
• Can produce non-projective trees
‣ Not a big deal for English
‣ But important for many other languages
• Scores entire trees
‣ Avoids making greedy local decisions, unlike transition-based parsers
‣ Captures long dependencies better

Example
• Caveat: the resulting structure may contain cycles
• Solution: need to do clean-up to remove cycles (Chu-Liu-Edmonds algorithm)


A Final Word
• Dependency parsing: a compelling alternative formulation to constituency parsing
‣ Edges encode word-word syntactic and semantic relations
• Transition-based parsing
• Graph-based parsing

Required Reading
• JM3 Ch. 14