Transition-based dependency parsing
Transition systems for dependency parsing
Transitions are between configurations that are represented as triples C ! ( , , A), where is the stack, is the input bu↵er, and A is the list of arcs that have been created.
Transition systems:
I Arc-standard I Arc-eager
The arc-standard system
The arc-standard system is closely related to the shift-reduce algorithm for phrase structure parsing, with the di↵erence being that the Reduce action is split into two actions, Arc-left and Arc-right, depending on whether the head is on the left or right.
I Shift ( , i| , A) ) ( |i, , A)
I Arc-left ( |i, j| , A) ) ( , j| , A j ! r i) I Arc-right ( |i, j| , A) ) ( , i| , A i ! r j)
Arc-standard derivation
11.3. TRANSITION-BASEDDEPENDENCYPARSING
271
arc added to A (they like)
(with lox) (bagels ! lox) (like ! bagels) (ROOT ! like)
they like bagels with lox like bagels with lox
like bagels with lox bagels with lox
with lox lox
lox bagels like
action
SHIFT ARC-LEFT SHIFT SHIFT SHIFT ARC-LEFT ARC-RIGHT ARC-RIGHT ARC-RIGHT DONE
1. [ROOT]
2. [ROOT, they]
3. [ROOT]
4. [ROOT,
5. [ROOT,
6. [ROOT,
7. [ROOT,
8. [ROOT,
9. [ROOT]
10. [ROOT]
like]
like, bagels]
like, bagels, with] like, bagels]
like]
Table 11.2: Arc-standard derivation of the unlabeled dependency parse for the input they like bagels with lox.
Arc-eager dependency parsing changes the ARC-RIGHT action so that right depen- dents can be attached before all of their dependents have been found. Rather than re- moving the modifier from both the buffer and stack, the ARC-RIGHT action pushes the modifier on to the stack, on top of the head. Because the stack can now contain elements that already have parents in the partial dependency graph, two additional changes are
necessary:
• A precondition is required to ensure that the ARC-LEFT action cannot be applied
when the top element on the stack already has a parent in A.
• A new REDUCE action is introduced, which can remove elements from the stack if
they already have a parent in A:
( |i, , A) ) ( , , A). [11.21]
As a result of these changes, it is now possible to create the arc like ! bagels before parsing
the prepositional phrase with lox. Furthermore, this action does not imply a decision about
Arc-eager transition system
Arc-eager dependency parsing changes the Arc-right action so that right dependents can be attached before all of their dependents have been found. Rather than removing the modifier from both the bu↵er and stack, the Arc-right action pushes the modifier on to the stack of the head. Two additional changes are necessary:
I A precondition is required to ensure that the Arc-left action cannot be applied when the top element on the stack already has a parent A.
I A new Reduce action is introduced, which can remove elements from the stack if they have a parent A: ( |i, ,A) ) ( , ,A)
Arc-eager derivation
272
1. [ROOT]
2. [ROOT, they]
3. [ROOT]
4. [ROOT, like]
5. [ROOT, like,
6. [ROOT, like,
7. [ROOT, like,
8. [ROOT, like,
9. [ROOT, like,
10. [ROOT, like]
11. [ROOT]
CHAPTER11. DEPENDENCYPARSING
bagels] bagels, with] bagels] bagels, lox] bagels]
they like bagels with lox like bagels with lox
like bagels with lox bagels with lox
with lox lox
lox
action
SHIFT ARC-LEFT ARC-RIGHT ARC-RIGHT SHIFT ARC-LEFT ARC-RIGHT REDUCE REDUCE REDUCE DONE
arc added to A (they like)
(ROOT ! like) (like ! bagels)
(with lox) (bagels ! lox)
Table 11.3: Arc-eager derivation of the unlabeled dependency parse for the input they like bagels with lox.
left-most edge of the buffer (Nivre, 2008). Non-projective transition systems can be con- structed by adding actions that create arcs with words that are second or third in the stack (Attardi, 2006), or by adopting an alternative configuration structure, which main- tains a list of all words that do not yet have heads (Covington, 2001). In pseudo-projective
dependency parsing, a projective dependency parse is generated first, and then a set of graph transformation techniques are applied, producing non-projective edges (Nivre and Nilsson, 2005).
Beam search
In “greedy” transition-based parsing, the parser tries to make the best decision at each configuration. This can lead to search errors, when an early decision locks the parser into a poor derivation. For example, in Table 11.2, if ARC-RIGHT were chosen at step 4, then the parser would later be forced to attach the prepositional phrase with lox to the verb
likes. Note that the likes ! bagels arc is indeed part of the correct dependency parse, but
Oracle-based training
An oracle is a function that maps a parse tree into an action sequence. Given such an oracle, a dependency treebank can be converted into a set of action sequences {A(i)}Ni=1. The parser can be trained by stepping through the oracle action sequences, and optimizing on an classification-based objective that rewards the oracle action. A commonly used objective is to maximize the conditional likelihood (or minimize the negative log conditional likelihood).
P(a|c,w) = P exp (a,c,w;✓) a02A(c) exp (a0,c,w;✓)
✓ i=1
N |A(i)|
✓ˆ = argmax X X log P(a(i)|c(i), w)
tt