
A Zoo of Parsing Algorithms
ANLP: Week 5, Unit 4
Shay Cohen
Based on slides from ANLP 2019
Recap: Syntax
Two reasons to care about syntactic structure (parse tree):
- As a guide to the semantic interpretation of the sentence
- As a way to prove whether a sentence is grammatical or not
But having a grammar isn’t enough.
We also need a parsing algorithm to compute the parse tree for a given input string and grammar.
Global and local ambiguity
- We've already seen examples of global ambiguity: multiple analyses for a full sentence, like "I saw the man with the telescope".
- But local ambiguity is also a big problem: multiple analyses for parts of a sentence.
  - "the dog bit the child": the first three words could form an NP (but don't).
  - Building useless partial structures wastes time.
- Avoiding useless computation is a major issue in parsing.
- Syntactic ambiguity is rampant; humans usually don't even notice, because we are good at using context/semantics to disambiguate.
Parsing algorithms
Goal: compute the structure(s) for an input string given a grammar.
- As usual, ambiguity is a huge problem.
  - For correctness: need to find the right structure to get the right meaning.
  - For efficiency: searching all possible structures can be very slow; we want to use parsing for large-scale language tasks (e.g., used to create Google's "infoboxes").

Parser properties
All parsers have two fundamental properties:
- Directionality: the sequence in which the structures are constructed.
  - top-down: start with the root category (S), choose expansions, build down to words.
  - bottom-up: build subtrees over the words, build up to S.
  - Mixed strategies are also possible (e.g., left-corner parsers).
- Search strategy: the order in which the search space of possible analyses is explored.
Example: search space for top-down parser
- Start with the S node.
- Choose one of many possible expansions.
- Each of which has children with many possible expansions...
- etc.
[figure: the top of the search space, as partially expanded trees: S rewriting as NP VP, as Aux NP VP, and as other alternatives, each child in turn having many possible expansions]

Recursive Descent Parsing
- A recursive descent parser treats a grammar as a specification of how to break down a top-level goal (find S) into subgoals (find NP VP).
- It is a top-down, depth-first parser:
  - Blindly expand nonterminals until reaching a terminal (word).
  - If multiple options are available, choose one, but store the current state as a backtrack point (on a stack, to ensure depth-first order).
  - If the terminal matches the next input word, continue; else, backtrack.
Search strategies
- depth-first search: explore one branch of the search space at a time, as far as possible. If this branch is a dead end, the parser needs to backtrack.
- breadth-first search: expand all possible branches in parallel (or simulated parallel). Requires storing many incomplete parses in memory at once.
- best-first search: score each partial parse and pursue the highest-scoring options first. (We will come back to this when discussing statistical parsing.)

RD Parsing algorithm

Start with subgoal = S, then repeat until the input and the subgoal list are empty:

- If the first subgoal in the list is a non-terminal A, then pick an expansion A → B C from the grammar and replace A in the subgoal list with B C.
- If the first subgoal in the list is a terminal w:
  - If the input is empty, backtrack.
  - If the next input word is different from w, backtrack.
  - If the next input word is w, match! I.e., consume the input word w and the subgoal w, and move on to the next subgoal.

If we run out of backtrack points but not input, no parse is possible.
Recursive descent parsing pseudocode

In the background: a CFG G, a sentence x1 ... xn.

Function RecursiveDescent(t, v, i), where
- t is a partially constructed tree,
- v is a node in t,
- i is a sentence position:

- Let N be the nonterminal in v.
- For each rule with LHS N:
  - If the rule is a lexical rule N → w, check whether xi = w; if so, call RecursiveDescent(t, u, i + 1), where u is the lowest node above v that still has a nonterminal to expand.
  - If the rule is a grammatical rule N → A1 ... An, let t′ be t with v expanded using the rule. For each j ∈ {1, ..., n}, call RecursiveDescent(t′, uj, i), where uj is the node for nonterminal Aj in t′.

Start with: RecursiveDescent(S, topnode, 1)
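To make the control flow concrete, here is a minimal Python sketch of the procedure (my own illustration, not the course's reference code; the grammar encoding and the function name parse are assumptions). The subgoal list is processed left to right, and Python's recursion supplies the depth-first backtracking. The grammar is the toy one from the next slide:

    # Hypothetical encoding: nonterminal -> list of possible right-hand sides.
    GRAMMAR = {
        "S":  [["NP", "VP"]],
        "NP": [["DT", "NN"]],
        "VP": [["V"]],
        "DT": [["the"]],
        "NN": [["bit"], ["dog"]],
        "V":  [["bit"], ["dog"]],
    }

    def parse(subgoals, words):
        """True iff `words` can be derived from the sequence `subgoals`."""
        if not subgoals:
            return not words              # succeed only if all input is consumed
        first, rest = subgoals[0], subgoals[1:]
        if first in GRAMMAR:              # nonterminal: try each expansion in turn
            return any(parse(rhs + rest, words) for rhs in GRAMMAR[first])
        # terminal: must match the next input word
        return bool(words) and words[0] == first and parse(rest, words[1:])

    print(parse(["S"], "the dog bit".split()))   # True
    print(parse(["S"], "the the dog".split()))   # False

Failed expansions simply return False, which plays the role of the backtrack points: the enclosing any moves on to the next alternative.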
Recursive descent example

Consider a very simple example:

- Grammar contains only these rules:

      S → NP VP      NP → DT NN      DT → the
      VP → V         NN → bit        NN → dog
      V → bit        V → dog

- The input sequence is: the dog bit
Recursive descent parsing pseudocode

(The same pseudocode as on the previous slide, repeated.)

Quick quiz: this algorithm has a bug. Where? What do we need to add?

Recursive descent example

- Grammar contains only these rules:

      S → NP VP        NP → Det N       NP → Det N PP
      VP → V NP        VP → V NP PP     PP → P NP
      Det → a          Det → the        V → saw
      N → dog          N → man          N → park
      P → in

- The input sequence is: the dog saw a man in the park
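Under the same hypothetical encoding as before, this grammar can be handed to the parse sketch from earlier; listing NP → Det N PP before NP → Det N mirrors the order in which the trace below tries the expansions:

    GRAMMAR = {
        "S":   [["NP", "VP"]],
        "NP":  [["Det", "N", "PP"], ["Det", "N"]],
        "VP":  [["V", "NP", "PP"], ["V", "NP"]],
        "PP":  [["P", "NP"]],
        "Det": [["a"], ["the"]],
        "N":   [["dog"], ["man"], ["park"]],
        "V":   [["saw"]],
        "P":   [["in"]],
    }
    print(parse(["S"], "the dog saw a man in the park".split()))   # True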
Recursive Descent Parsing

[figure sequence: a step-by-step trace of the parser on "the dog saw a man in the park". It expands S → NP VP, first tries NP → Det N PP, matches "the dog" but cannot complete the PP, backtracks to NP → Det N, and then works through the VP expansions in the same way, finally succeeding with the parse
[S [NP [Det the] [N dog]] [VP [V saw] [NP [Det a] [N man]] [PP [P in] [NP [Det the] [N park]]]]]]

Left Recursion

Can recursive descent parsing handle left recursion? No: a top-down, depth-first parser can keep expanding a left-recursive nonterminal forever without consuming any input. But grammars for natural human languages should be revealing, and left-recursive rules are needed in English:

    NP → DET N
    NP → NPR
    DET → NP 's

These rules generate NPs with possessive modifiers such as:

    John's sister
    John's mother's sister
    John's mother's uncle's sister
    John's mother's uncle's sister's niece
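To see the loop concretely, here is what happens when the possessive rules are fed to the top-down sketch from earlier (same hypothetical encoding): expanding NP reintroduces NP via DET without consuming a word, so the recursion never bottoms out.

    GRAMMAR = {
        "NP":  [["DET", "N"], ["NPR"]],
        "DET": [["NP", "'s"]],
        "N":   [["sister"]],
        "NPR": [["John"]],
    }
    # parse(["NP"], "John 's sister".split())
    # The subgoal list grows forever without a word being matched:
    #   NP -> DET N -> NP 's N -> DET N 's N -> NP 's N 's N -> ...
    # In Python this eventually raises RecursionError.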
Shift-Reduce Parsing

A shift-reduce parser tries to find sequences of words and phrases that correspond to the right-hand side of a grammar production, and replaces them with the left-hand side:

- Directionality = bottom-up: starts with the words of the input and tries to build trees from the words up.
- Search strategy = breadth-first: starts with the words, then applies rules with matching right-hand sides, and so on, until the whole sentence is reduced to an S.
Algorithm Sketch: Shift-Reduce Parsing

Until the words in the sentence are substituted with S:

- Scan through the input until we recognise something that corresponds to the RHS of one of the production rules (shift)
- Apply a production rule in reverse; i.e., replace the RHS of the rule which appears in the sentential form with the LHS of the rule (reduce)

A shift-reduce parser is implemented using a stack:

1. start with an empty stack
2. a shift action pushes the current input symbol onto the stack
3. a reduce action replaces n items with a single item
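A minimal greedy variant in Python (my own toy illustration, not the course's code): it reduces whenever the top of the stack matches some RHS, and shifts otherwise. A real shift-reduce parser must additionally search over shift/reduce choices, which is omitted here:

    # Hypothetical encoding: (LHS, RHS) pairs; lexical rules have length-1 RHS.
    RULES = [
        ("S",  ("NP", "VP")),
        ("NP", ("Det", "N")),
        ("VP", ("V", "NP")),
        ("Det", ("my",)), ("Det", ("a",)),
        ("N", ("dog",)), ("N", ("man",)),
        ("V", ("saw",)),
    ]

    def shift_reduce(words):
        stack, remaining = [], list(words)
        while True:
            for lhs, rhs in RULES:                    # try to reduce first
                if tuple(stack[-len(rhs):]) == rhs:
                    stack[-len(rhs):] = [lhs]         # replace RHS with LHS
                    break
            else:
                if not remaining:                     # nothing to reduce or shift
                    break
                stack.append(remaining.pop(0))        # shift the next word
        return stack == ["S"]

    print(shift_reduce("my dog saw a man".split()))   # True

Because this version never reconsiders a reduction, it fails on sentences where an early reduction turns out to be globally wrong; that is exactly the useless-work problem discussed later.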
Shift-Reduce Parsing

[figure sequence: stack and remaining-input snapshots for "my dog saw a man in the park with a statue". The parser shifts "my" and "dog" and reduces them to Det and N, then to an NP; it then shifts and reduces its way through the verb, the object NP, and the PPs, until the whole input is reduced to S]

How many parses are there?

If our grammar is ambiguous (inherently, or by design), then how many possible parses are there?

In general: an infinite number, if we allow unary recursion.

More specifically: suppose that we have a grammar in Chomsky normal form. How many possible parses are there for a sentence of n words? Imagine that every nonterminal can rewrite as every pair of nonterminals (A → B C) and as every terminal (A → a).

1. n
2. n²
3. n log n
4. (2n)! / ((n+1)! n!)

How many parses are there?

[figures: all binary trees over sentences of two, three, and four words: 1, 2, and 5 distinct trees, respectively]

How many parses are there?

Intuition: let C(n) be the number of binary trees over a sentence of length n. The root of this tree has two subtrees: one over k words (1 ≤ k < n), and one over n − k words. Hence, for all values of k, we can combine any subtree over k words with any subtree over n − k words:

    C(n) = Σ_{k=1}^{n−1} C(k) × C(n − k)

These numbers are the Catalan numbers: C(n) = Cat(n−1), where Cat(m) = (2m)! / ((m+1)! m!). They're big numbers!

    n      1  2  3  4   5   6    7    8     9    10     11     12
    C(n)   1  1  2  5  14  42  132  429  1430  4862  16796  58786
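The recurrence is easy to check numerically; a few lines of Python (my own illustration) reproduce the table above and confirm that it matches the Catalan closed form:

    from functools import lru_cache
    from math import factorial

    @lru_cache(maxsize=None)
    def C(n):
        """Number of binary trees over a sentence of n words."""
        if n == 1:
            return 1
        return sum(C(k) * C(n - k) for k in range(1, n))

    def catalan(m):
        """Cat(m) = (2m)! / ((m+1)! m!)."""
        return factorial(2 * m) // (factorial(m + 1) * factorial(m))

    print([C(n) for n in range(1, 13)])
    # [1, 1, 2, 5, 14, 42, 132, 429, 1430, 4862, 16796, 58786]
    assert all(C(n) == catalan(n - 1) for n in range(1, 13))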
Problems with Parsing as Search

1. A recursive descent parser (top-down) will do badly if there are many different rules for the same LHS. Hopeless for rewriting parts of speech (preterminals) with words (terminals).
2. A shift-reduce parser (bottom-up) does a lot of useless work: many phrase structures will be locally possible, but globally impossible. Also inefficient when there is much lexical ambiguity.
3. Both strategies do repeated work by re-analyzing the same substring many times.

We will see how chart parsing solves the re-parsing problem, and also copes well with ambiguity.

Dynamic Programming

With a CFG, a parser should be able to avoid re-analyzing sub-strings, because the analysis of any sub-string is independent of the rest of the parse.

[figure: in "The dog saw a man in the park", the NP "a man in the park" is analysed once and shared across the larger parses]

The parser's exploration of its search space can exploit this independence if the parser uses dynamic programming. Dynamic programming is the basis for all chart parsing algorithms.

Parsing as Dynamic Programming

- Given a problem, systematically fill a table of solutions to sub-problems: this is called memoization.
- Once solutions to all sub-problems have been accumulated, solve the overall problem by composing them.
- For parsing, the sub-problems are analyses of sub-strings, and correspond to constituents that have been found.
- Sub-trees are stored in a chart (aka well-formed substring table), which is a record of all the substructures that have ever been built during the parse.

Solves the re-parsing problem: sub-trees are looked up, not re-parsed! Solves the ambiguity problem: the chart implicitly stores all parses!

Depicting a Chart

A chart can be depicted as a matrix:

- Rows and columns of the matrix correspond to the start and end positions of a span (i.e., starting right before the first word, ending right after the final one);
- A cell in the matrix corresponds to the sub-string that starts at the row index and ends at the column index.
- It can contain information about the type of constituent (or constituents) that span(s) the substring, pointers to its sub-constituents, and/or predictions about what constituents might follow the substring.

CYK Algorithm

CYK (Cocke, Younger, Kasami) is an algorithm for recognizing and recording constituents in the chart.

- Assumes that the grammar is in Chomsky Normal Form: rules all have the form A → B C or A → w.
- Conversion to CNF can be done automatically:

      Original grammar:          In CNF:
      NP → Det Nom               NP → Det Nom
      Nom → N | OptAP Nom        Nom → book | orange | AP Nom
      OptAP → ε | OptAdv A       AP → heavy | orange | Adv A
      A → heavy | orange         A → heavy | orange
      Det → a                    Det → a
      OptAdv → ε | very          Adv → very
      N → book | orange

  (N.B. Converting to CNF sometimes breeds duplication!)

CYK: an example

Let's look at a simple example before we explain the general case, using the CNF grammar above (right column). Now let's parse: a very heavy orange book

Filling out the CYK chart

[figure sequence: the chart for "a very heavy orange book" is filled in cell by cell, shortest spans first. The completed chart:

                1      2      3       4          5
                a      very   heavy   orange     book
    0 a         Det                   NP         NP
    1 very             Adv    AP      Nom        Nom
    2 heavy                   A,AP    Nom        Nom
    3 orange                          Nom,A,AP   Nom
    4 book                                       Nom
]
CYK: The general algorithm

    function CKY-Parse(words, grammar) returns table
      for j ← 1 to Length(words) do                   # loop over the columns
        table[j−1, j] ← {A | A → words[j] ∈ grammar}  # fill bottom cell
        for i ← j−2 downto 0 do                       # fill row i in column j
          for k ← i+1 to j−1 do                       # loop over split locations
                                                      #   between i and j
            table[i, j] ← table[i, j] ∪
                {A | A → B C ∈ grammar,               # check the grammar for rules
                     B ∈ table[i, k],                 #   that link the constituents
                     C ∈ table[k, j]}                 #   in [i, k] with those in
                                                      #   [k, j]; for each rule found,
                                                      #   store LHS in cell [i, j]

A succinct representation of CKY

We have a Boolean table called Chart, such that Chart[A, i, j] is true if there is a sub-phrase according to the grammar that dominates the words between positions i and j. Build this chart recursively, similarly to the Viterbi algorithm.

For j > i + 1:

    Chart[A, i, j] = ⋁_{k=i+1}^{j−1} ⋁_{A → B C} ( Chart[B, i, k] ∧ Chart[C, k, j] )

Seed the chart, for i + 1 = j:

    Chart[A, i, i+1] = True if there exists a rule A → w_{i+1}, where w_{i+1} is the (i+1)th word in the string.
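The Boolean recursion translates almost line by line into code. A sketch in Python (my own encoding; the chart is a dictionary keyed by spans, and the grammar is the CNF example from above):

    from collections import defaultdict

    UNARY = {                                  # A -> w rules, keyed by the word
        "a": {"Det"}, "very": {"Adv"},
        "heavy": {"A", "AP"}, "orange": {"Nom", "A", "AP"},
        "book": {"Nom"},
    }
    BINARY = [                                 # A -> B C rules
        ("NP", "Det", "Nom"),
        ("Nom", "AP", "Nom"),
        ("AP", "Adv", "A"),
    ]

    def cyk(words):
        n = len(words)
        chart = defaultdict(set)               # (i, j) -> set of nonterminals
        for i, w in enumerate(words):          # seed: Chart[A, i, i+1]
            chart[i, i + 1] = set(UNARY.get(w, ()))
        for span in range(2, n + 1):           # fill longer spans bottom-up
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):      # every split point
                    for lhs, b, c in BINARY:
                        if b in chart[i, k] and c in chart[k, j]:
                            chart[i, j].add(lhs)
        return chart

    chart = cyk("a very heavy orange book".split())
    print(chart[0, 5])                         # {'NP'}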
From CYK Recognizer to CYK Parser

- So far, we just have a chart recognizer: a way of determining whether a string belongs to the given language.
- Changing this to a parser requires recording which existing constituents were combined to make each new constituent.
- This requires another field to record the one or more ways in which a constituent spanning (i, j) can be made from constituents spanning (i, k) and (k, j). (More clearly displayed in the graph representation; see next lecture.)
- In any case, for a fixed grammar, the CYK algorithm runs in time O(n³) on an input string of n tokens.
- The algorithm identifies all possible parses.
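A sketch of that extra field, extending the recognizer above (it reuses UNARY and BINARY): each derivable entry (A, i, j) records the split points and sub-constituent pairs that produced it, and all trees can then be read off recursively.

    from collections import defaultdict

    def cyk_with_backpointers(words):
        n = len(words)
        chart = defaultdict(set)
        back = defaultdict(list)                 # (A, i, j) -> [(k, B, C), ...]
        for i, w in enumerate(words):
            chart[i, i + 1] = set(UNARY.get(w, ()))
        for span in range(2, n + 1):
            for i in range(n - span + 1):
                j = i + span
                for k in range(i + 1, j):
                    for lhs, b, c in BINARY:
                        if b in chart[i, k] and c in chart[k, j]:
                            chart[i, j].add(lhs)
                            back[lhs, i, j].append((k, b, c))
        return chart, back

    def trees(back, words, sym, i, j):
        """Yield bracketed trees for sym spanning words[i:j]."""
        if j == i + 1:
            yield f"({sym} {words[i]})"
            return
        for k, b, c in back[sym, i, j]:
            for left in trees(back, words, b, i, k):
                for right in trees(back, words, c, k, j):
                    yield f"({sym} {left} {right})"

    words = "a very heavy orange book".split()
    chart, back = cyk_with_backpointers(words)
    for t in trees(back, words, "NP", 0, len(words)):
        print(t)
    # (NP (Det a) (Nom (AP (Adv very) (A heavy)) (Nom (AP orange) (Nom book))))

For this example there is a single NP analysis; for an ambiguous sentence, trees enumerates one bracketing per analysis, which is the sense in which the chart "implicitly stores all parses".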
CYK-style parse charts

Even without converting a grammar to CNF, we can draw CYK-style parse charts:

                1      2        3         4               5
                a      very     heavy     orange          book
    0 a         Det                       NP              NP
    1 very             OptAdv   OptAP     Nom             Nom
    2 heavy                     A,OptAP   Nom             Nom
    3 orange                              N,Nom,A,OptAP   Nom
    4 book                                                N,Nom

(We haven't attempted to show ε-phrases here. Could in principle use cells below the main diagonal for this...)

However, CYK-style parsing will have run-time worse than O(n³) if e.g. the grammar has rules A → B C D.

Second example

Grammar Rules in CNF:

    S → NP VP                        Nominal → book | flight | money
    S → X1 VP                        Nominal → Nominal Noun
    X1 → Aux NP                      Nominal → Nominal PP
    S → book | include | prefer      VP → book | include | prefer
    S → Verb NP                      VP → Verb NP
    S → X2 PP                        VP → X2 PP
    S → Verb PP                      X2 → Verb NP
    S → VP PP                        VP → Verb PP
    NP → TWA | Houston               VP → VP PP
    NP → Det Nominal                 PP → Preposition NP
    Verb → book | include | prefer   Noun → book | flight | money

Let's parse Book the flight through Houston!
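Encoded in the same hypothetical format, this grammar can be fed to the cyk sketch from earlier. (The lexical entries Det → the and Prep → through are my additions, which the chart on the next slide assumes; the input word Book is lowercased to match the lexical rules.)

    # Rebinding the grammar used by cyk().
    UNARY = {
        "book":    {"S", "VP", "Verb", "Nominal", "Noun"},
        "include": {"S", "VP", "Verb"},
        "prefer":  {"S", "VP", "Verb"},
        "flight":  {"Nominal", "Noun"},
        "money":   {"Nominal", "Noun"},
        "the":     {"Det"},
        "through": {"Prep"},
        "TWA":     {"NP"},
        "Houston": {"NP", "ProperNoun"},
    }
    BINARY = [
        ("S", "NP", "VP"), ("S", "X1", "VP"), ("X1", "Aux", "NP"),
        ("S", "Verb", "NP"), ("S", "X2", "PP"), ("S", "Verb", "PP"),
        ("S", "VP", "PP"),
        ("NP", "Det", "Nominal"),
        ("Nominal", "Nominal", "Noun"), ("Nominal", "Nominal", "PP"),
        ("VP", "Verb", "NP"), ("VP", "X2", "PP"), ("X2", "Verb", "NP"),
        ("VP", "Verb", "PP"), ("VP", "VP", "PP"),
        ("PP", "Prep", "NP"),
    ]

    chart = cyk(["book", "the", "flight", "through", "Houston"])
    print(sorted(chart[0, 5]))                 # ['S', 'VP', 'X2']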
Second example

[figure sequence: filling the chart for "Book the flight through Houston", cell by cell. The completed chart:

    [0,1] S, VP, Verb, Nominal, Noun   [0,2] ∅    [0,3] S, VP, X2        [0,4] ∅    [0,5] S1, S2, S3, VP, X2
    [1,2] Det                          [1,3] NP   [1,4] ∅                [1,5] NP
    [2,3] Nominal, Noun                [2,4] ∅    [2,5] Nominal
    [3,4] Prep                         [3,5] PP
    [4,5] NP, ProperNoun

(three distinct S analyses in [0,5])]

Visualizing the Chart

[figures: the completed chart drawn over the sentence, with constituents as labelled arcs between string positions]
A Tribute to CKY (part 1/3)
You, my CKY algorithm,
dictate every parser’s rhythm,
if Cocke, Younger and Kasami hadn’t bothered,
all of our parsing dreams would have been shattered.
You are so simple, yet so powerful,
and with the proper semiring and time,
you will be truthful,
to return the best parse – anything less would be a crime.
With dynamic programming or memoization,
you are one of a kind,
I really don’t need to mention,
if it weren’t for you, all syntax trees would be behind.
Dynamic Programming as a problem-solving technique

(This slide repeats "Parsing as Dynamic Programming" from above.)

A Tribute to CKY (part 2/3)

Failed attempts have been made to show there are better,
for example, by using matrix multiplication,
all of these impractical algorithms didn't matter –
you came out stronger, insisting on just using summation.

All parsing algorithms to you hail,
at least those with backbones which are context-free,
you will never become stale,
as long as we need to have a syntax tree.

It doesn't matter that the C is always in front,
or that the K and Y can swap,
you are still on the same hunt,
maximizing and summing, nonstop.
A Tribute to CKY (part 3/3)

Every Informatics student knows you intimately,
they have seen your variants dozens of times,
you have earned that respect legitimately,
and you will follow them through their primes.

CKY, going backward and forward, inside and out,
it is so straightforward –
You are the best, there is no doubt.
Summary

- Parsing as search is inefficient (typically exponential time).
- Alternative: use dynamic programming and memoize sub-analyses in a chart to avoid duplicate work.
- The chart can be visualized as a matrix.
- The CYK algorithm builds the chart in O(n³) time. The basic version gives just a recognizer, but it can be made into a parser if more info is recorded in the chart.
Questions to Ask Yourself

- How many spans are there for a given sequence (as a function of the length of the sentence)?
- How long does it take to process each one of them (each "cell"), as a function of the length of the span and the size of the grammar?
- Does CYK perform any unnecessary calculation?