Probabilistic Context-Free Grammars
COMP90042 Natural Language Processing, Lecture 15
Copyright 2020, The University of Melbourne

Ambiguity in Parsing
• Context-free grammars assign hierarchical structure to language
‣ Linguistic notion of a 'syntactic constituent'
‣ Formulated as generating all strings in the language; or
‣ Predicting the structure(s) for a given string
• Raises the problem of ambiguity, e.g., which is better?
Outline
• Probabilistic context-free grammars (PCFGs)
• Parsing using dynamic programming
• Limitations of the 'context-free' assumption and some solutions:
‣ parent annotation
‣ head lexicalisation
Basics of Probabilistic CFGs
• As for CFGs, the same symbol set:
‣ Terminals: words such as book
‣ Non-terminals: syntactic labels such as NP or NN
• The same productions (rules)
‣ LHS non-terminal → ordered list of RHS symbols
• In addition, store a probability with each production
‣ NP → DT NN [p = 0.45]
‣ NN → cat [p = 0.02]
‣ NN → leprechaun [p = 0.00001]
‣ …
Probabilistic CFGs
• Probability values denote the conditional probability of the RHS given the LHS:
‣ Pr(LHS → RHS) = Pr(RHS | LHS)
• Consequently they:
‣ must be values between 0 and 1
‣ must sum to one for a given LHS
• E.g.,
‣ NN → aardvark [p = 0.0003]
‣ NN → cat [p = 0.02]
‣ NN → leprechaun [p = 0.0001]
‣ ∑x Pr(NN → x) = 1
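To make the sum-to-one constraint concrete, here is a minimal sketch (not from the lecture) of a PCFG stored as a Python dictionary from LHS to (RHS, probability) pairs, with a validity check. The toy rules and probabilities are illustrative only.

```python
# A PCFG as a dictionary: LHS -> list of (RHS, probability) pairs.
# Rules and probabilities here are illustrative only.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.45), (("NN",), 0.55)],
    "DT": [(("the",), 1.0)],
    "NN": [(("cat",), 0.02), (("leprechaun",), 0.00001), (("aardvark",), 0.97999)],
    "VP": [(("VB", "NP"), 1.0)],
    "VB": [(("saw",), 1.0)],
}

def check_pcfg(grammar, tol=1e-9):
    """Check that Pr(RHS | LHS) sums to one for every LHS."""
    for lhs, rules in grammar.items():
        total = sum(p for _, p in rules)
        assert abs(total - 1.0) < tol, f"rules for {lhs} sum to {total}, not 1"

check_pcfg(PCFG)   # passes silently; raises AssertionError otherwise
```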
[Figure: an example PCFG, from JM3 Ch. 14]
Stochastic Generation with PCFGs
Almost the same as for a CFG, with one twist:
1. Start with S, the sentence symbol
2. Choose a rule with S as the LHS
‣ Randomly select a RHS according to Pr(RHS | LHS), e.g., S → VP
‣ Apply this rule, e.g., substitute VP for S
3. Repeat step 2 for each non-terminal in the string (here, VP)
4. Stop when no non-terminals remain
This gives us a tree, as before, with a sentence as the yield
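A sketch of this procedure, using the toy dictionary format from the earlier sketch (again, the grammar here is invented for illustration):

```python
import random

# Toy grammar in the LHS -> [(RHS, prob), ...] format from the sketch above.
PCFG = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 0.45), (("NN",), 0.55)],
    "DT": [(("the",), 1.0)],
    "NN": [(("cat",), 0.5), (("leprechaun",), 0.5)],
    "VP": [(("VB", "NP"), 1.0)],
    "VB": [(("saw",), 1.0)],
}

def generate(grammar, symbol="S"):
    """Expand `symbol` top-down, sampling each rewrite from Pr(RHS | LHS);
    returns the yield (a list of words)."""
    if symbol not in grammar:                    # terminal: nothing to rewrite
        return [symbol]
    rules = grammar[symbol]
    rhs = random.choices([r for r, _ in rules],  # step 2: pick RHS ~ Pr(RHS|LHS)
                         weights=[p for _, p in rules])[0]
    out = []
    for child in rhs:                            # step 3: recurse on each symbol
        out.extend(generate(grammar, child))
    return out

print(" ".join(generate(PCFG)))                  # e.g., "the cat saw leprechaun"
```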
How Likely is a Tree?
• Given a tree, we can compute its probability
‣ Decomposes into the probability of each production
• E.g., for the (left) tree:
‣ P(tree) = P(S → VP) × P(VP → Verb NP) × P(Verb → Book) × P(NP → Det Nominal) × P(Det → the) × P(Nominal → Nominal Noun) × P(Nominal → Noun) × P(Noun → dinner) × P(Noun → flight)
How Likely is a Tree?
P(tree) = P(S → VP) × P(VP → Verb NP) × P(Verb → Book) × P(NP → Det Nominal) × P(Det → the) × P(Nominal → Nominal Noun) × P(Nominal → Noun) × P(Noun → dinner) × P(Noun → flight)
        = 0.05 × 0.20 × 0.30 × 0.20 × 0.60 × 0.20 × 0.75 × 0.10 × 0.40
        = 2.2 × 10⁻⁶
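The same computation as a short snippet, using the rule probabilities from the slide:

```python
import math

# Rule probabilities for the (left) tree, values from the slide.
rule_probs = [0.05, 0.20, 0.30, 0.20, 0.60, 0.20, 0.75, 0.10, 0.40]

print(math.prod(rule_probs))                  # 2.16e-06, i.e. ~2.2 x 10^-6

# In practice, sum log-probabilities instead to avoid floating-point
# underflow on long derivations:
log_p = sum(math.log(p) for p in rule_probs)
```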
Resolving Parse Ambiguity
• Can select between different trees based on P(T)
• P(T_left) = 2.2 × 10⁻⁶ vs. P(T_right) = 6.1 × 10⁻⁷
Parsing PCFGs
• Instead of selecting between two trees, can we select a tree from the set of all possible trees?
• Before, we looked at CYK:
‣ for unweighted grammars (CFGs)
‣ finds all possible trees
• But there are often thousands, many completely nonsensical
• Can we solve for the most probable tree?
CYK for PCFGs
• CYK finds all trees for a sentence; we want the best tree
• Probabilistic CYK follows a similar process to standard CYK
• Convert the grammar to Chomsky Normal Form (CNF)
‣ E.g., VP → Verb NP NP [0.10] becomes
  VP → Verb NP+NP [0.10]
  NP+NP → NP NP [1.0]
‣ where NP+NP is a new symbol
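A sketch of this binarisation step, assuming rules stored as (LHS, RHS, prob) triples (the format and function name are my own). New dummy rules get probability 1.0, so the product of probabilities along any tree is unchanged.

```python
def binarise(rules):
    """Split rules whose RHS is longer than 2 into binary rules (CNF-style).
    New dummy symbols like 'NP+NP' get probability 1.0, preserving the
    product of rule probabilities along any derivation."""
    out = []
    for lhs, rhs, p in rules:
        while len(rhs) > 2:
            new_sym = "+".join(rhs[1:])            # ('NP', 'NP') -> 'NP+NP'
            out.append((lhs, (rhs[0], new_sym), p))
            lhs, rhs, p = new_sym, rhs[1:], 1.0    # continue on the remainder
        out.append((lhs, rhs, p))
    return out

print(binarise([("VP", ("Verb", "NP", "NP"), 0.10)]))
# [('VP', ('Verb', 'NP+NP'), 0.1), ('NP+NP', ('NP', 'NP'), 1.0)]
```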
PCFG Parsing Example

Sentence: we eat sushi with chopsticks

Grammar:
  S  → NP VP      [1]        VP → V NP    [1/2]
  NP → NP PP      [1/2]      VP → VP PP   [1/4]
  NP → we         [1/4]      VP → MD V    [1/4]
  NP → sushi      [1/8]      V  → eat     [1]
  NP → chopsticks [1/8]      PP → IN NP   [1]
                             IN → with    [1]

Filling the chart, cell by cell:
‣ Width-1 (lexical) cells: NP 1/4 in [0,1] (we); V 1 in [1,2] (eat); NP 1/8 in [2,3] (sushi); IN 1 in [3,4] (with); NP 1/8 in [4,5] (chopsticks)
‣ [0,2]: Ø (no rule derives NP V)
‣ [1,3]: VP → V NP gives 1/2 × 1 × 1/8 = 1/16
‣ [0,3]: S → NP VP gives 1 × 1/4 × 1/16 = 1/64
‣ [0,4], [1,4], [2,4]: Ø
‣ [3,5]: PP → IN NP gives 1 × 1 × 1/8 = 1/8
‣ [2,5]: NP → NP PP gives 1/2 × 1/8 × 1/8 = 1/128
‣ [1,5]: two competing analyses: VP → V NP gives 1/2 × 1 × 1/128 = 1/256, while VP → VP PP gives 1/4 × 1/16 × 1/8 = 1/512; keep the better one, since 1/256 > 1/512!
‣ [0,5]: S → NP VP gives 1 × 1/4 × 1/256 = 1/1024. Success!

The completed chart (the diagonal reads we, eat, sushi, with, chopsticks):

  NP 1/4 [0,1]   Ø [0,2]     S 1/64 [0,3]    Ø [0,4]       S 1/1024 [0,5]
                 V 1 [1,2]   VP 1/16 [1,3]   Ø [1,4]       VP 1/256 [1,5]
                             NP 1/8 [2,3]    Ø [2,4]       NP 1/128 [2,5]
                                             IN 1 [3,4]    PP 1/8 [3,5]
                                                           NP 1/8 [4,5]
Prob CYK: Retrieving the Parses
• S in the top-right corner of the parse table indicates success
• Retain a back-pointer to the best analysis
• To get the parse(s), follow the pointers back for each match
• Convert back from CNF by removing the new non-terminals
Following the back-pointers from S in cell [0,5] of the chart above recovers the best tree, with P(T) = 1/1024.
Prob. CYK

[Figure: probabilistic CYK pseudocode, from JM3 Ch. 14]

Key differences from standard CYK:
‣ Standard CYK can be thought of as storing all events with probability = 1
‣ The chart now stores probabilities for each span and symbol: instead of storing a set of symbols, store the probability of the best-scoring tree fragment covering span [i,j] with root symbol A
‣ The validity test now checks that the child chart cells have non-zero probability
‣ Overwrite a lower-scoring analysis if this one is better, and record the best production
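A compact sketch of probabilistic CYK (my own formulation, not the JM3 pseudocode), assuming a CNF grammar given as two rule lists; the cell keys and helper names are invented. It implements exactly the updates above: overwrite if better, record a back-pointer.

```python
from collections import defaultdict

def pcky(words, unary, binary):
    """Probabilistic CYK for a CNF grammar.
    unary:  list of (A, word, prob) lexical rules
    binary: list of (A, B, C, prob) rules A -> B C
    Returns (probability, tree) for the best S spanning the sentence,
    or None if the sentence is not in the language."""
    n = len(words)
    best = defaultdict(dict)   # best[i, j][A] = prob of best fragment for [i,j]
    back = defaultdict(dict)   # back[i, j][A] = back-pointer to best analysis

    for i, w in enumerate(words):                    # lexical cells [i, i+1]
        for A, word, p in unary:
            if word == w and p > best[i, i + 1].get(A, 0.0):
                best[i, i + 1][A] = p
                back[i, i + 1][A] = w

    for width in range(2, n + 1):                    # wider spans, bottom-up
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):                # split point
                for A, B, C, p in binary:
                    # validity test: children must have non-zero probability
                    score = p * best[i, k].get(B, 0.0) * best[k, j].get(C, 0.0)
                    if score > best[i, j].get(A, 0.0):   # overwrite if better
                        best[i, j][A] = score
                        back[i, j][A] = (k, B, C)        # record back-pointer

    def build(i, j, A):                              # follow back-pointers
        bp = back[i, j][A]
        if isinstance(bp, str):
            return (A, bp)
        k, B, C = bp
        return (A, build(i, k, B), build(k, j, C))

    if "S" not in best[0, n]:
        return None
    return best[0, n]["S"], build(0, n, "S")
```

Run on the lecture's grammar and sentence, it reproduces the worked example:

```python
unary = [("NP", "we", 1/4), ("NP", "sushi", 1/8), ("NP", "chopsticks", 1/8),
         ("V", "eat", 1.0), ("IN", "with", 1.0)]
binary = [("S", "NP", "VP", 1.0), ("NP", "NP", "PP", 1/2),
          ("PP", "IN", "NP", 1.0), ("VP", "V", "NP", 1/2),
          ("VP", "VP", "PP", 1/4), ("VP", "MD", "V", 1/4)]

prob, tree = pcky("we eat sushi with chopsticks".split(), unary, binary)
print(prob)   # 0.0009765625 == 1/1024
```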
Complexity of CYK
• What's the space and time complexity of this algorithm, in terms of n, the length of the input sentence?
Issues with PCFGs
PCFG Problem 1: Poor Independence Assumptions
• Rewrite decisions are made independently, whereas inter-dependence is often needed to capture global structure
• E.g., NP → Det N: the probability of this rule is independent of the rest of the tree
• No way to represent these contextual differences in PCFG probabilities

[Table: NP statistics in the Switchboard corpus]
Poor Independence Assumptions

[Tables: NP statistics in the Switchboard corpus; PCFG probabilities based on the Switchboard corpus]

• No way to capture the fact that in subject position, NP → PRP should go up to 0.91
• While in object position, NP → DT NN should go up to 0.66
• Solution: add a condition to denote whether the NP is a subject or an object
Solution: Parent Conditioning
• Make non-terminals more explicit by incorporating the parent symbol into each symbol
• NP^S represents subject position (left tree)
• NP^VP denotes object position (right tree)
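For illustration, a minimal sketch of parent annotation over trees encoded as nested (label, child, ...) tuples; the encoding is my own choice, not from the lecture. Note this version also annotates pre-terminals; in practice one often annotates only phrasal categories.

```python
def parent_annotate(tree, parent=None):
    """Append '^parent' to each non-terminal, e.g. NP under S becomes NP^S.
    Trees are (label, child, ...) tuples; leaves are plain word strings."""
    if isinstance(tree, str):                 # word: leave unchanged
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, *(parent_annotate(c, label) for c in children))

tree = ("S", ("NP", ("PRP", "we")),
             ("VP", ("V", "eat"), ("NP", ("NN", "sushi"))))
print(parent_annotate(tree))
# ('S', ('NP^S', ('PRP^NP', 'we')),
#       ('VP^S', ('V^VP', 'eat'), ('NP^VP', ('NN^NP', 'sushi'))))
```

The subject NP becomes NP^S while the object NP becomes NP^VP, so the two can now carry different rule probabilities.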
PCFG Problem 2: Lack of Lexical Conditioning
• Lack of sensitivity to words in the tree
• Prepositional phrase (PP) attachment ambiguity
‣ "Worker dumped sacks into a bin"
PP Attachment Ambiguity
• One reading: "into a bin" describes the resulting location of the sacks
• The other: the sacks to be dumped are the ones that are already in a bin
Coordination Ambiguity
• "dogs in houses and cats"
• "dogs" is semantically a better conjunct for "cats" than "houses" (dogs can't fit into houses!)
Solution: Head Lexicalisation
• Record the head word with parent symbols
‣ the head is the most salient child of a constituent, usually the noun in an NP, the verb in a VP, etc.
‣ VP → VBD NP PP ⇒ VP(dumped) → VBD(dumped) NP(sacks) PP(into)
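A sketch of head lexicalisation using a tiny head-percolation table; the table here is drastically simplified (real parsers use full head rules, e.g. Collins'), and the tree encoding is the same invented tuple format as above.

```python
# Which child supplies the head, per parent category (drastically simplified;
# real systems use full head-percolation tables, e.g. Collins' head rules).
HEAD_RULES = {"S": ["VP"], "VP": ["VBD", "VB"], "NP": ["NNS", "NN"], "PP": ["IN"]}

def lexicalise(tree):
    """Return (lexicalised_tree, head_word): VP becomes VP(dumped), etc.
    Trees are (label, child, ...) tuples; pre-terminals dominate one word."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):  # pre-terminal
        word = children[0]
        return (f"{label}({word})", word), word
    subtrees, head_by_cat = [], {}
    for child in children:
        sub, head = lexicalise(child)
        subtrees.append(sub)
        head_by_cat.setdefault(child[0], head)
    for cat in HEAD_RULES.get(label, []):            # find the head child
        if cat in head_by_cat:
            head = head_by_cat[cat]
            break
    else:
        head = next(iter(head_by_cat.values()))      # fallback: first child
    return (f"{label}({head})", *subtrees), head

tree = ("VP", ("VBD", "dumped"), ("NP", ("NNS", "sacks")),
        ("PP", ("IN", "into"), ("NP", ("DT", "a"), ("NN", "bin"))))
print(lexicalise(tree)[0][0])   # VP(dumped)
```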
Head Lexicalisation
• Incorporate head words into productions, so that the most important links between words are captured
‣ the rule captures correlations between the head tokens of phrases
‣ e.g., the affinity of VP(dumped) / NP(sacks) for PP(into)
• The grammar symbol inventory expands massively!
‣ Many of the productions are much too specific, and seen very rarely
‣ Learning is more involved, to avoid sparsity problems (e.g., zero probabilities)
A Final Word
• PCFGs are widely used, and there are efficient parsers available
‣ Collins parser, Berkeley parser, Stanford parser
‣ all use some form of lexicalisation
• But there are other grammar formalisms
‣ Lexical functional grammar
‣ Head-driven phrase structure grammar
‣ Next lecture: dependency grammar
Required Reading
• J&M3 Ch. 14 – 14.6 (skip 14.6.1)