Computational
Linguistics
CSC 485 Summer 2020
14
14.Free Word-Order Languages
Gerald Penn
Department of Computer Science, University of Toronto
Copyright © 2008 Gerald Penn. All rights reserved.
Chart Parsing
2
• Simple tabulation method for CFG string recognition
• Can build bottom-up or top-down
• Table entries: edge(Cat,Left,Right)
Freer Word-Order Languages
Conjecture: Freer Word-Order (FWO) languages are the languages for which there is no known phrase structure grammar in which the rules:
• are context-independent,
• are semantically “transparent,” and
• tightly couple grammatical function assignment with linear constituent order.
We’d like to use standard chart parsers with FWO grammars, but then either:
• we must exponentially expand their rule systems into a classical context-independent rewriting system [Barton et al., 1987], or
• we must generalize the assumption that edges are characterizable by a left node, right node and category triple (contiguous subspans).
3
Freer Word-Order Languages
Conjecture: Freer Word-Order (FWO) languages are the languages for which there is no known phrase structure grammar in which the rules:
• are context-independent,
• are semantically “transparent,” and • tightly couple grammatical . . .
We’d like to use standard chart parsers with FWO grammars, but then either:
• we must exponentially expand . . .
• we must generalize the assumption that edges are characterizable by a left node, right node and category triple (contiguous subspans).
4
Generalizing Parsing State
Use input-length bit vectors [Johnson, 1985; Reape, 1991; et al.] plus categories in edges . . .
. . . but this isn’t even half of the story:
1. danger of na ̈ıvely searching through 2n states in a purely bottom-up approach:
• no goal-directedness,
• linear precedence constraints (<) can help, but almost without exception their scopes are:
– not purely existential, e.g.: ∀X.∀Y.Y < X
– not clearly bounded, e.g.: all Y in the
current NP, all Y in the current clause, etc.
– not linearly projective, e.g.: NP
3. N = N0.
• (Subsequent Prediction) • (Completion)
9
Edge Prediction
Using this distinction, the prediction steps are then (assuming no LP constraints, for simplicity):
• (Initial Prediction)
• (Subsequent Prediction) ⟨N, C, R⟩ =⇒ ⟨Nj+1, Cj, φ⟩
where:
1. N0 → N1 . . . Nk,
2. N = N0,
3. ⟨N1, C, φ⟩ succeeded with U1, .
⟨Nj, Cj−1, φ⟩ succeeded with Uj, 4. k > 1 and 1 ≤ j < k − 1, and 5.Cj =C∩U1∩...∩Uj.
• (Completion)
10
,
Edge Prediction
Using this distinction, the prediction steps are then (assuming no LP constraints, for simplicity):
• (Initial Prediction)
• (Subsequent Prediction)
• (Completion) ⟨N, C, R⟩ =⇒ ⟨Nk, Ck−1, Rk−1⟩, where:
1. N0 → N1 . . . Nk,
2. N = N0,
3. ⟨N1, C, φ⟩ succeeded with U1, .
⟨Nk−1, Ck−2, φ⟩ succeeded with Uk−1, 4. Ck−1 = C ∩ U 1 ∩ . . . ∩ U k−1, and
5. Rk−1 = R ∩ U 1 ∩ . . . ∩ U k−1.
11
Edge Prediction
Using this distinction, the prediction steps are then (assuming no LP constraints, for simplicity):
• (Initial Prediction)
• (Subsequent Prediction) • (Completion)
If successful, then ⟨N, C, R⟩ succeeds with U = U1 ∪ . . . ∪ Uk.
12
Edge Subsumption
CanBV and ReqBV span sublattices of the powerset lattice of all possible bit vectors.
Example: let C1 = {1,2,4,5,6}, N1 = N2, C2 = {1,2,3,4,5,6}, and R1 = R2 = {1,2}:
{1,2,3,4,5}
{1,2,3,4} {1,2,3,5}
{1,2,3}
{1,2,3,4,5,6}
{1,2,3,5,6}
{1,2,3,6}
{1,2,4}
{1,2,3,4,6}
{1,2,4,5}
{1,2,5}
{1,2}
{1,2,4,5,6}
{1,2,4,6}
{1,2,6}
{1,2,5,6}
13
Edge Subsumption
CanBV and ReqBV span sublattices of the powerset lattice of all possible bit vectors.
Example: let C1 = {1,2,4,5,6}, N1 = N2, C2 = {1,2,3,4,5,6}, and R1 = R2 = {1,2}:
{1,2,3,4,5}
{1,2,3,4} {1,2,3,5}
{1,2,3}
{1,2,3,4,5,6}
{1,2,3,5,6}
{1,2,3,6}
{1,2,4}
{1,2,3,4,6}
{1,2,4,5}
{1,2,5}
{1,2}
{1,2,4,5,6}
{1,2,4,6}
{1,2,6}
{1,2,5,6}
14
Edge Subsumption
Given existing active chart edge: ⟨N, C1, R1⟩ candidate active edge: ⟨N, C2, R2⟩
4 main cases: let Oi := Ci ∩ Ri (optional )
1. R1 ∩ C2 ̸= φ: try another active edge
2. R2 ∩ C1 ̸= φ: try another active edge
3. (Z :=)O2 ∩ C1 ̸= φ: then split active edge into:
(a) ⟨N, C2, R2 ∪ Z⟩ and try another
(b) ⟨N, C2∩Z, R2⟩ and try this active edge again.
4. (Z :=)R1 ∩ O2 ̸= φ: try another active edge on ⟨N, C2 ∩ Z, R2⟩,
If none of these, then fail: ⟨N, C2, R2⟩ is subsumed by an active edge.
15
Edge Subsumption C1
C2
R2 R1
16
Not 1. R1 ∩ C2 = φ Not 2. R2 ∩ C1 = φ Not 3. O2 ∩ C1 = φ Not 4. R1 ∩ O2 = φ
Edge Prediction Revisited We start with a goal:
{1,2,3,4}
17
Edge Prediction Revisited
Initial prediction expands it down to its principal ideal (and changes category):
18
{1,2,3}
{1,2} {1,3}
{1}
{1,2,3,4}
{1,3,4}
{1,4}
{2}
{1,2,4} {2,3,4}
{2,3} {2,4}
{3} {4}
{}
{3,4}
Edge Prediction Revisited
Success identifies a point within that ideal (the UsedBV of the corresponding passive edge):
19
{1,2,3}
{1,2} {1,3}
{1}
{1,2,3,4}
{1,3,4}
{1,4}
{2}
{1,2,4}
{2,3}
{3}
{}
{2,3,4}
{2,4}
{4}
{3,4}
Edge Prediction Revisited
Subsequent prediction computes a pseudo-complement and proceeds with its principal ideal:
20
{1,2,3}
{1,2} {1,3}
{1}
{1,2,3,4}
{1,3,4}
{1,4}
{2}
{1,2,4}
{2,3}
{3}
{}
{2,3,4}
{2,4}
{4}
{3,4}
Edge Prediction Revisited
LP constraints delete portions of these sublattices as a function of the annotating category:
21
{1,2,3}
{1,2} {1,3}
{1}
{1,2,3,4}
{1,3,4}
{1,4}
{2}
{1,2,4} {2,3,4}
{2,3} {2,4}
{3} {4}
{}
{3,4}
Category Graphs
Definition: the category graph of grammar G, with ruleset R over non-terminals N, is the smallest directed bipartite graph, C(G) = ⟨V, E⟩, such that:
•V =N∪R∪{Lex,Empty},
• (X, r) ∈ E if non-terminal X appears on the
RHS of rule r,
• (r,X) ∈ E if the LHS non-terminal of r is X,
• (Lex, r) ∈ E if there is a terminal on the RHS of rule r, and
• (Empty, r) ∈ E if r is an empty production rule.
We will call the vertices of C(G) either category nodes or rule nodes. Lex and Empty are considered category nodes.
22
Category Graphs
S S
23
S → VP NP NP1 → N’ S NP2 → N’ N’1 → N Det N’2 → N
VP1 →VNP VP2 → V
N → {boy, girl}
Det → {a, the, this} V → {sees, calls}
NP
N’
VP
NP1
NP2
VP1
VP2
N’2
N Det V
N Det V
Lex Empty
N’1
24
Yield Bounds
How do we calculate the minimum and maximum
yields for the non-terminals of G?
• Empty has min/max of 0.
• Lex has min/max of 1.
• Topologically sort C(G).
• Compute other min’s and max’s in topological order.
...unless C(G) has a cycle, in which case every node on a cycle and every node path-accessible from a cycle node has a maximum yield of +∞.
Category Bounds vs. Rule Bounds
This procedure actually assigns max and min yields to rule nodes as well as category nodes.
Bounds on rule nodes are approximations of true category bounds . . .
...but they are more useful to us, because the bounds they provide are tighter
. . . and they are only sound when the subtree labelled by their LHS category uses that rule.
How else can we contextualize yield bounds to obtain tighter, more useful numbers?
25
Height-Dependent Yield Bounds
As a function of tree height, yield bounds are never infinite!
Given non-terminal X, let:
Xmax(≤ h) = max Xmax(j) 0≤j≤h
Xmin(≤ h) = min Xmin(j) 0≤j≤h
Then, for h > 1:
Xmax(h) = max max Xmax(h − 1)
X→X1…Xk∈R 1≤i≤k i k max
+ Xj (≤h−1) j=1,j̸=i
Xmin(h) = min min Xmin(h − 1) X→X1…Xk∈R 1≤i≤k i
k min + Xj (≤h−1) .
j=1,j̸=i
26
27
Height-Dependent Yield Bounds Height-dependent yield bounds are useful because:
• although input in general is unbounded in length, every single input string is of a known and finite length;
• these equations can be iteratively inverted to translate known (bounds on) yield to bounds on height;
• in particular, minimum yield bounds induce maximum height bounds;
• and maximum height bounds restrict depth in a top-down search.
S
Height-Dependent Yield Bounds
h≥0
28
S NP
Height-Dependent Yield Bounds
h≥1 h≥0
29
S NP
N’
Height-Dependent Yield Bounds
h≥2 h≥1
h≥0
30
S NP
N’ N
Height-Dependent Yield Bounds
h≥3 h≥2
h≥1 h≥0
31
S NP
h≥4 h≥3
Height-Dependent Yield Bounds
32
h≥2 N h=1 man h=0
N’
RC h≥0
33
Cycle-Dependent Yield Bounds
It is also possible to bound yields relative to the
number of iterations through certain cycles in C(G). These are more flexible than heights because:
• they can attain exact values without finding a leaf node (lexical item), and
• it is easy to distinguish different cycles or count only some cycles . . .
. . . but there can also be problems with having too many cycle variables, or not knowing which cycles to count.
. . . and I’m running out of time.
Can CFGs benefit from Yield Bounds? We checked:
• Grammar induced from WSJ Section 0105 of the Penn Treebank II.
• 2 purely top-down parsers: baseline and with yield bounds.
• Both written in SICStus Prolog.
• Two kinds of active edges in experimental parser:
– active(Left,Cat,Right)
– active(Left,Cat,RightLowBound,RightUpBound).
• Category bounds and rule bounds used;
– no height variables, – no cycle variables.
34
Can CFGs benefit from Yield Bounds?
• Prediction and completion both constrained by bounds.
• Subsumption can result in edge splitting into
|
UR2
two intervals:
LR1 UR1 LEdge1 ||
L Edge2 | LR2
⇓
LR2
L Edge2a | |
35
LR1 − 1
LEdge2b || UR1 + 1 UR2
1e+06
100000
10000
Can CFGs benefit from Yield Bounds?
• Yes! (this one, at least)
• Median all-paths parse time reduction: 52.24% • Maximum: 65.93%
• Minimum: 38.71%
Sentence
36
Top-Down Baseline
Yield Bounds
Time (msec)
Conclusions
• FWO chart parsing works like CFG chart parsing except we consider all possible sublat- tices of a powerset lattice instead of all possible line subsegments within a line segment.
• LP constraints delete portions of those sublattices from consideration.
• Category Graphs can be used to compute yield bounds.
• Yield bounds can be used to bound search — even with CFGs.
37
Conclusions
• Yield bounds come in many flavours:
– category yields (min and max),
– rule yields,
– height-dependent yields,
– cycle-dependent yields (be careful!).
• Dependent yield equations can be (iteratively) inverted to provide sound and complete depth- bounds on (top-down) search.
38
Acknowledgements
• Nick Pendar (UT Linguistics, now H5 Inc.) • Stefan Banjevic (UT Math)
• Michael Demko (UT Computer Science)
Thanks!
39
Yield Bounds
For typed feature structures, these bounds are approximate:
• we do not know that actual paths exist, just because each edge does;
• unification can bind variables/nodes that propagate along paths, creating potential violations.
40