程序代写CS代考 algorithm Computational

Computational
Linguistics
CSC 485/2501 Fall 2021
2B
2B. Graphical Dependency Parsing

Department of Computer Science, University of Toronto
Based on slides by , , , and
Copyright © 2020 . All rights reserved.

Predicting structured outputs
 Log-linear models great for n-way classification  Also good for predicting sequences
van
find preferred tags
CVEs, or, to allow fast dynamic programming, only use n-gram features
 Also good for dependency parsing
…find preferred links…
but to allow fast dynamic programming or MST parsing, only use single-edge features
5

Edge-Factored Parsers (McDonald et al. 2005)  Is this a good edge?
yes, lots of green …
Byl jasný studený dubnový den a hodiny odbíjely třináctou
“It was a bright cold day in April and the clocks were striking thirteen”
6

Edge-Factored Parsers (McDonald et al. 2005)  Is this a good edge?
jasný  den (“bright day”)
Byl jasný studený dubnový den a hodiny odbíjely třináctou
“It was a bright cold day in April and the clocks were striking thirteen”
7

Edge-Factored Parsers (McDonald et al. 2005)  Is this a good edge?
jasný  den (“bright day”)
jasný  N (“bright NOUN”)
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC
“It was a bright cold day in April and the clocks were striking thirteen”
8

Edge-Factored Parsers (McDonald et al. 2005)  Is this a good edge?
jasný  den (“bright day”)
jasný  N (“bright NOUN”)
AN
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC
“It was a bright cold day in April and the clocks were striking thirteen”
9

Edge-Factored Parsers (McDonald et al. 2005)  Is this a good edge?
jasný  N (“bright NOUN”)
AN
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC
“It was a bright cold day in April and the clocks were striking thirteen”
jasný  den (“bright day”)
AN preceding conjunction
10

Edge-Factored Parsers (McDonald et al. 2005)  How about this competing edge?
not as good, lots of red …
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC
“It was a bright cold day in April and the clocks were striking thirteen”
11

Edge-Factored Parsers (McDonald et al. 2005)  How about this competing edge?
jasný  hodiny (“bright clocks”)
… undertrained …
Byl jasný studený dubnový den a hodiny odbíjely třináctou VAAANJNVC
“It was a bright cold day in April and the clocks were striking thirteen”
12

Edge-Factored Parsers (McDonald et al. 2005)  How about this competing edge?
jasný  hodiny jasn  hodi (“bright clocks”) (“bright clock,”
… undertrained …
stems only)
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC
byl jasn stud dubn den a hodi odbí třin
“It was a bright cold day in April and the clocks were striking thirteen”
13

Edge-Factored Parsers (McDonald et al. 2005)  How about this competing edge?
jasný  hodiny jasn  hodi (“bright clocks”) (“bright clock,”
… undertrained …
stems only)
Aplural  jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC byl jasn stud dubn den a hodi odbí třin
“It was a bright cold day in April and the clocks were striking thirteen”
14

Edge-Factored Parsers (McDonald et al. 2005)  How about this competing edge?
jasný  hodiny (“bright clocks”)
jasn  hodi (“bright clock,” stems only)
AN where N follows
A N
plural singular
… undertrained …
a conjunction
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC byl jasn stud dubn den a hodi odbí třin
“It was a bright cold day in April and the clocks were striking thirteen”
15

Edge-Factored Parsers (McDonald et al. 2005)  Which edge is better?
 “bright day” or “bright clocks”?
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC
byl jasn stud dubn den a hodi odbí třin
“It was a bright cold day in April and the clocks were striking thirteen”
16

Edge-Factored Parsers (McDonald et al. 2005)
 Which edge is better?
 Score of an edge e =   features(e)
 Standard algosvalid parse with max total score
our current weight vector
Byl jasný studený dubnový den a hodiny odbíjely třináctou
VAAANJNVC
byl jasn stud dubn den a hodi odbí třin
“It was a bright cold day in April and the clocks were striking thirteen”
17

Edge-Factored Parsers (McDonald et al. 2005)
 Which edge is better?
 Score of an edge e =   features(e)
 Standard algosvalid parse with max total score
our current weight vector
can’t have both can‘t have both (one parent per word) (no crossing links)
Can’t have all three (no cycles)
Thus, an edge may lose (or win) because of a consensus of other edges.
18

Non-Projective Parses
ROOT I ‘ll give a talk tomorrow on bootstrapping subtree rooted at “talk”
is a discontiguous noun phrase
can‘t have both (no crossing links)
The “projectivity” restriction. Do we really want it?
25

Non-Projective Parses
ROOT I ‘ll give a talk tomorrow on bootstrapping
ROOT
occasional non-projectivity in English
ista meam norit thatNOM myACC may-know
may-know
(i.e., it shall last till I go gray)
frequent non-projectivity in Latin, etc.
gloria gloryNOM
canitiem going-grayACC
That glory
my going-gray
26

Parsing Methods
Non-Projective Parsing Algorithms
◮ Complexity considerations:
◮ Projective (Proj)
◮ Non-projective (NonP) Problem/Algorithm
Complete grammar parsing
[Gaifman 1965, Neuhaus and Bro ̈ker 1997]
Deterministic parsing
[Nivre 2003, Covington 2001]
First order spanning tree
[McDonald et al. 2005b]
Nth order spanning tree (N > 1) [McDonald and Pereira 2006]
Proj NonP P NP hard
O(n) O(n2) O(n3) O(n2)
P NP hard
Dependency Parsing 65(103)

McDonald’s Approach (non-projective)
 Consider the sentence “John saw Mary” (left).
 The Chu-Liu-Edmonds algorithm finds the maximum-
weight spanning tree (right) – may be non-projective.  Can be found in time O(n2).
root
9 20saw
John
11 3
root
9
10
10
saw

Every node selects best parent
30
300 30
30
cycles, contract them and repeat
slide thanks to
27

Chu-Liu-Edmonds – Contracting Stage
􏲓 For each non-ROOT node v, set bestInEdge[v] to be its highest scoring incoming edge.
􏲓 If a cycle C is formed:
􏲓 contract the nodes in C into a new node vC
􏲓 edges outgoing from any node in C now get source vC
􏲓 edges incoming to any node in C now get destination vC
􏲓 For each node u in C, and for each edge e incoming to u from
outside of C:
􏲓 add to e.kicksOut the edge bestInEdge[u], and
􏲓 set e.score to be e.score − e.kicksOut.score.
􏲓 Repeat until every non-ROOT node has an incoming edge and no cycles are formed

An Example – Contracting Stage
bestInEdge
V1 V2 V3
ROOT
a:5 b:1 c:1
V1 d : 11 V2 f : 5 V3 g : 10 i : 8
e:4
h:9
kicksOut
a b c d e f g h i

An Example – Contracting Stage
bestInEdge
V1 V2 V3
g
ROOT
a:5 b:1 c:1
V1 d : 11 V2 f : 5 V3 g : 10 i : 8
e:4
h:9
kicksOut
a b c d e f g h i

An Example – Contracting Stage
bestInEdge
V1 V2 V3
g
d
ROOT
a:5 b:1 c:1
V1 d : 11 V2 f : 5 V3 g : 10 i : 8
e:4
h:9
kicksOut
a b c d e f g h i

An Example – Contracting Stage
bestInEdge
V1 V2 V3
g d
kicksOut
a b c d e f g h i
g d
g d
V1
a : 5 − 10 V4
d : 11 g : 10
ROOT
b : 1 − 11
V2
e:4
h : 9 − 10
c : 1
f : 5 V3 i : 8 − 11

An Example – Contracting Stage
bestInEdge
ROOT
V1 V2 V3 V4
g d
a : −5
b : −10
c : 1
kicksOut
a b c d e f g h i
g d
V4 f:5 V3
i : −3
e:4 h : −1
g d

An Example – Contracting Stage
bestInEdge
ROOT
V1 V2 V3 V4
g d f
a : −5
b : −10
c : 1
kicksOut
a b c d e f g h i
g d
V4 f:5 V3
i : −3
e:4 h : −1
g d

An Example – Contracting Stage
bestInEdge
ROOT
V1 V2 V3 V4
g d f h
a : −5
b : −10
c : 1
kicksOut
a b c d e f g h i
g d
V4 f:5 V3
i : −3
e:4 h : −1
g d

An Example – Contracting Stage
bestInEdge
ROOT
V1 V2 V3 V4 V5
g d f h
a : −5−−1
b : −10 − −1
c : 1 − 5
kicksOut
V5
V4 f:5 V3
i : −3
e:4 h : −1
a b c d e f g h i
g, h d, h f
g d

An Example – Contracting Stage
bestInEdge
V1 V2 V3 V4 V5
g d f h
kicksOut
a : −4
ROOT
b : −9
V5
c : −4
a b c d e f g h i
g, h d, h f
f
g d

An Example – Contracting Stage
bestInEdge
V1 V2 V3 V4 V5
g d f h a
kicksOut
a : −4
ROOT
b : −9
V5
c : −4
a b c d e f g h i
g, h d, h f
f
g d

Chu-Liu-Edmonds – Expanding Stage
After the contracting stage, every contracted node will have exactly one bestInEdge. This edge will kick out one edge inside the contracted node, breaking the cycle.
􏲓 Go through each bestInEdge e in the reverse order that we added them
􏲓 lock down e, and remove every edge in kicksOut(e) from bestInEdge.

An Example – Expanding Stage
bestInEdge
V1 V2 V3 V4 V5
g d f h a
kicksOut
a : −4
ROOT
b : −9
V5
c : −4
a b c d e f g h i
g, h d, h f
f
g d

An Example – Expanding Stage
bestInEdge
V1 V2 V3 V4 V5
ag 􏰬 d
f
ah 􏰬
a
kicksOut
a : −4
ROOT
b : −9
V5
c : −4
a
b c d e f g h i
g, h
d, h f
f
g d

An Example – Expanding Stage
bestInEdge
V1 V2 V3 V4 V5
ag 􏰬 d
f
ah 􏰬
a
ROOT
a : −5
b : −10
c : 1
kicksOut
a
b c d e f g h i
g, h
d, h f
f
g d
V4 f:5 V3
i : −3
e:4 h : −1

An Example – Expanding Stage
bestInEdge
V1 V2 V3 V4 V5
ag 􏰬 d
f ah
􏰬
a
ROOT
a : −5
b : −10
c : 1
kicksOut
a
b c d e f g h i
g, h
d, h f
f
g d
V4 f:5 V3
i : −3
e:4 h : −1

An Example – Expanding Stage
bestInEdge
V1 V2 V3 V4 V5
ag 􏰬 d
f ah
􏰬
a
ROOT
a:5 b:1 c:1
V1 d : 11 V2 f : 5 V3 g : 10 i : 8
e:4
h:9
kicksOut
a
b c d e f g h i
g, h
d, h f
f
g d

An Example – Expanding Stage
bestInEdge
V1 V2 V3 V4 V5
ag 􏰬 d
f ah
􏰬
a
ROOT
a:5 b:1 c:1
V1 d : 11 V2 f : 5 V3 g : 10 i : 8
e:4
h:9
kicksOut
a
b c d e f g h i
g, h
d, h f
f
g d

Summing over all non-projective trees Finding highest-scoring non-projective tree
 Consider the sentence “John saw Mary (left)”.
 The Chu-Liu-Edmonds algorithm finds the maximum-
weight spanning tree – may be non-projective.
 Can be found in time O(n2).
 How about total weight Z of all trees?
 Can be found in time O(n3) by matrix determinants
and inverses (Smith & Smith, 2007).
slide thanks to
28

Graph Theory to the Rescue!
O(n3) time!
Tutte’s Matrix-Tree Theorem (1948)
The determinant of the Kirchoff (aka Laplacian) adjacency matrix of directed graph G without row and column r is equal to the sum of scores of all directed spanning trees of G rooted at node r.
Exactly the Z we need!
29




s(2,0) s(n,0) s(1,j) s(2,1)  s(n,1)
0



Negate edge scores
Building the Kirchoff (Laplacian) Matrix
s(1,0)
0 s(1,0) s(2,0) s(n,0)




s(1, j) s(2,1) 0 s(2,1)
s(n,1)

0
s(n,1)
• Sum columns (children)
• Strike root row/col.
• Take determinant
j1 

0
  
 j1   s(1,2) s(2, j)   s(n,2)
0 s(1,2)j2 s(2,j) s(n,2)   
  0 s(1,2) 0 s(n,2)
 
 j2   

 

 0  s ( 1 , n )  s ( 2 , n )

s(1,n)
s(2,n) s(2,n)
0   s(n, j)

s(1,n)

  s(n, j)
0
 
 jn 
jn
N.B.: This allows multiple children of root, but see Koo et al. 2007.
30

s(1, j) j1
s(1,2) s(1,n)
s(2,1)
s(n,1) s(n,2)
s(n, j) jn
Why Should This Work?
Clear for 1×1 matrix; use induction
Chu-Liu-Edmonds analogy:
Every node selects best parent If cycles, contract and recurse
 j2
s(2, j) s(2,n)
K K with contracted edge 1,2 K K with deleted edge 1,2
K  s(1,2)K K
Undirected case; special root cases for directed
31

Graph Deletion & Contraction
Important fact: κ(G) = κ(G-{e}) + κ(G\{e})
1