Probabilistic Context-Free Grammar
COMP90042
Natural Language Processing
Lecture 15
Semester 1 2021 Week 8 Jey Han Lau
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
Ambiguity in Parsing
• Context-free grammars assign hierarchical structure to language
  ‣ Formulated as generating all strings in the language
  ‣ Predicting the structure(s) for a given string
• Raises the problem of ambiguity: which parse is better?
• Probabilistic CFGs!
Outline
• Basics of Probabilistic CFGs (PCFGs)
• PCFG parsing
• Limitations of CFG
Basics of PCFGs
Basics of PCFGs
• Same symbol set:
  ‣ Terminals: words such as book
  ‣ Non-terminals: syntactic labels such as NP or NN
• Same productions (rules):
  ‣ LHS non-terminal → ordered list of RHS symbols
• In addition, store a probability with each production:
  ‣ NP → DT NN        [p = 0.45]
  ‣ NN → cat          [p = 0.02]
  ‣ NN → leprechaun   [p = 0.00001]
  ‣ …
Basics of PCFGs
• Probability values denote conditional probabilities:
  ‣ P(LHS → RHS)
  ‣ P(RHS | LHS)
• Consequently they:
  ‣ must be positive values, between 0 and 1
  ‣ must sum to one for a given LHS
• E.g.,
  ‣ NN → aardvark     [p = 0.0003]
  ‣ NN → cat          [p = 0.02]
  ‣ NN → leprechaun   [p = 0.0001]
  ‣ ∑ₓ P(NN → x) = 1
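These two constraints can be checked mechanically. Below is a minimal sketch in Python; the rule table is a toy example (the `dog` entry is filler probability mass so that NN sums to one, not a rule from the slides):

```python
from collections import defaultdict

# Toy rule table: (LHS, RHS) -> probability. Values are illustrative.
rules = {
    ("NN", ("aardvark",)): 0.0003,
    ("NN", ("cat",)): 0.02,
    ("NN", ("leprechaun",)): 0.0001,
    ("NN", ("dog",)): 0.9796,  # filler mass so the NN rules sum to 1
}

def is_valid_pcfg(rules, tol=1e-9):
    """Check every probability is in (0, 1] and each LHS sums to one."""
    totals = defaultdict(float)
    for (lhs, _), p in rules.items():
        if not (0.0 < p <= 1.0):
            return False
        totals[lhs] += p
    return all(abs(total - 1.0) < tol for total in totals.values())

print(is_valid_pcfg(rules))  # True
```

A table where some LHS does not sum to one (e.g. only NN → cat [0.5]) would fail the check.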
Stochastic Generation with PCFGs
Almost the same as for CFG, with one twist:
1. Start with S, the sentence symbol
2. Choose a rule with S as the LHS
‣ Randomly select a RHS according to P(RHS | LHS)
e.g., S → VP
‣ Apply this rule, e.g., substitute VP for S
3. Repeat step 2 for each non-terminal in the string
(here, VP)
4. Stop when no non-terminals remain
Gives us a tree, as before, with a sentence as the yield
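The generation procedure above can be sketched as follows. This uses a cut-down version of the example grammar from later in the lecture; as an assumption for brevity, the MD V rule is dropped and its mass folded into VP → VP PP so each LHS still sums to one:

```python
import random

# Toy PCFG: non-terminal -> list of (RHS, probability). Terminals are
# any symbols without an entry in the table.
GRAMMAR = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("we",), 0.25), (("sushi",), 0.125),
           (("chopsticks",), 0.5), (("NP", "PP"), 0.125)],
    "VP": [(("V", "NP"), 0.5), (("VP", "PP"), 0.5)],  # MD V rule folded in
    "PP": [(("IN", "NP"), 1.0)],
    "IN": [(("with",), 1.0)],
    "V":  [(("eat",), 1.0)],
}

def generate(symbol="S"):
    """Expand `symbol` top-down, sampling a RHS with P(RHS | LHS)."""
    if symbol not in GRAMMAR:              # terminal: nothing to rewrite
        return symbol
    rhss, probs = zip(*GRAMMAR[symbol])
    rhs = random.choices(rhss, weights=probs)[0]
    return " ".join(generate(s) for s in rhs)

random.seed(0)
print(generate())  # a random sentence over the grammar's terminals
```

Recursion terminates with probability one here because the expected number of recursive NP/VP expansions per rule application is below one.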
How Likely Is a Tree?
• Given a tree, we can compute its probability
  ‣ Decomposes into the probability of each production
P(tree) = P(S → VP)
        × P(VP → Verb NP)
        × P(Verb → Book)
        × P(NP → Det Nominal)
        × P(Det → the)
        × P(Nominal → Nominal Noun)
        × P(Nominal → Noun)
        × P(Noun → dinner)
        × P(Noun → flight)
How Likely Is a Tree?
P(tree) = P(S → VP) × P(VP → Verb NP) × P(Verb → Book)
        × P(NP → Det Nominal) × P(Det → the) × P(Nominal → Nominal Noun)
        × P(Nominal → Noun) × P(Noun → dinner) × P(Noun → flight)
        = 0.05 × 0.20 × 0.30 × 0.20 × 0.60 × 0.20 × 0.75 × 0.10 × 0.40
        = 2.2 × 10⁻⁶
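The product above can be checked directly; the productions and probabilities are the ones from the slide:

```python
import math

# Productions used in the "Book the dinner flight" tree, with the
# slide's probabilities.
productions = [
    ("S → VP", 0.05),
    ("VP → Verb NP", 0.20),
    ("Verb → Book", 0.30),
    ("NP → Det Nominal", 0.20),
    ("Det → the", 0.60),
    ("Nominal → Nominal Noun", 0.20),
    ("Nominal → Noun", 0.75),
    ("Noun → dinner", 0.10),
    ("Noun → flight", 0.40),
]

# P(tree) is the product over all productions in the derivation.
p_tree = math.prod(p for _, p in productions)
print(f"P(tree) = {p_tree:.2e}")  # P(tree) = 2.16e-06
```

In practice parsers sum log probabilities instead of multiplying raw probabilities, to avoid floating-point underflow on long sentences.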
Resolving Parse Ambiguity
• Can select between different trees based on P(T)
• P(T_left) = 2.2 × 10⁻⁶ vs. P(T_right) = 6.1 × 10⁻⁷
PCFG Parsing
Parsing PCFGs
• Before, we looked at:
  ‣ CYK
  ‣ for unweighted grammars (CFGs)
  ‣ finds all possible trees
• But there are often thousands of trees, many completely nonsensical
• Can we solve for the most probable tree?
CYK for PCFGs
• CYK finds all trees for a sentence; we want the best tree
• Prob. CYK follows a similar process to standard CYK
• Convert the grammar to Chomsky Normal Form (CNF):
  ‣ VP → Verb NP NP    [0.10]
  becomes
  ‣ VP → Verb NP+NP    [0.10]
    NP+NP → NP NP      [1.0]
  ‣ where NP+NP is a new symbol
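This binarisation step can be sketched as below. It is a simplified conversion that only handles long right-hand sides (it ignores unary rules and other parts of full CNF conversion); new symbols get probability 1.0 so the derivation's total probability is unchanged:

```python
def binarise(lhs, rhs, prob):
    """Binarise one production: introduce new X+Y symbols for rules
    whose RHS is longer than 2. New rules get probability 1.0 so the
    product over the derivation is preserved."""
    rules = []
    while len(rhs) > 2:
        new_sym = "+".join(rhs[1:])                  # e.g. "NP+NP"
        rules.append((lhs, (rhs[0], new_sym), prob)) # keep original prob
        lhs, rhs, prob = new_sym, rhs[1:], 1.0       # continue with tail
    rules.append((lhs, tuple(rhs), prob))
    return rules

for rule in binarise("VP", ["Verb", "NP", "NP"], 0.10):
    print(rule)
# ('VP', ('Verb', 'NP+NP'), 0.1)
# ('NP+NP', ('NP', 'NP'), 1.0)
```

A rule that is already binary or unary passes through unchanged.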
Example: sentence "we eat sushi with chopsticks", with grammar:
  S  → NP VP        [1]
  NP → NP PP        [1/8]
  NP → we           [1/4]
  NP → sushi        [1/8]
  NP → chopsticks   [1/2]
  PP → IN NP        [1]
  IN → with         [1]
  VP → V NP         [1/2]
  VP → VP PP        [1/4]
  VP → MD V         [1/4]
  V  → eat          [1]
The chart has a cell [i,j] for every span of the sentence; all cells start empty.
Lexical cells are filled first: NP 1/4 in [0,1] (we); V 1 in [1,2] (eat); NP 1/8 in [2,3] (sushi); IN 1 in [3,4] (with); NP 1/2 in [4,5] (chopsticks).
Cell [0,2] = Ø.
Cell [1,3]: VP 1/16 (= 1/2 × 1 × 1/8, via VP → V NP).
Cell [0,3]: S 1/64 (= 1 × 1/4 × 1/16, via S → NP VP).
Cells [0,4], [1,4], [2,4] = Ø.
Cell [3,5]: PP 1/2 (= 1 × 1 × 1/2, via PP → IN NP).
Cell [2,5]: NP 1/128 (= 1/8 × 1/8 × 1/2, via NP → NP PP).
What goes in cell [1,5]? PollEv.com/jeyhanlau569
One candidate for [1,5]: VP 1/256 (= 1/2 × 1 × 1/128, via VP → V NP).
Another candidate for [1,5]: VP 1/128 (= 1/4 × 1/16 × 1/2, via VP → VP PP).
Since 1/128 > 1/256, cell [1,5] keeps VP 1/128 (the VP → VP PP analysis wins).
Cell [0,5]: S 1/512 (= 1 × 1/4 × 1/128, via S → NP VP).
Success! S spans the whole sentence: S 1/512 in [0,5].
Prob. CYK: Retrieving the Parses
• S in the top-right corner of the parse table indicates success
• Retain a back-pointer to the best analysis
• To get the parse(s), follow the pointers back for each match
• Convert back from CNF by removing the new non-terminals
Final chart (best probability per cell):
  [0,1] NP 1/4    [0,2] Ø          [0,3] S 1/64    [0,4] Ø    [0,5] S 1/512
  [1,2] V 1       [1,3] VP 1/16    [1,4] Ø         [1,5] VP 1/128
  [2,3] NP 1/8    [2,4] Ø          [2,5] NP 1/128
  [3,4] IN 1      [3,5] PP 1/2
  [4,5] NP 1/2
P(T) = 1/512
Probabilistic CYK (Source: JM3 Ch 14)
• Both CYK and prob. CYK store all possible NTs
• The validity test now checks that the child chart cells have non-zero probability
• Instead of storing a set of symbols, store the probability of the best-scoring tree fragment covering span [i,j] with root symbol A
• Overwrite a lower-scoring analysis if this one is better, and record the best production
• Complexity in terms of sentence length n? PollEv.com/jeyhanlau569
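The whole procedure can be sketched for the example grammar above. This minimal implementation stores only the best probability per (span, symbol) plus a backpointer, as the bullets describe; the data layout is an assumption, not the JM3 pseudocode verbatim:

```python
from collections import defaultdict

# Example grammar, already in CNF; lexical and binary rules kept apart.
LEX = {  # word -> [(non-terminal, probability)]
    "we": [("NP", 1/4)], "sushi": [("NP", 1/8)],
    "chopsticks": [("NP", 1/2)], "with": [("IN", 1.0)], "eat": [("V", 1.0)],
}
BIN = [  # (A, B, C, probability) for A -> B C
    ("S", "NP", "VP", 1.0), ("NP", "NP", "PP", 1/8), ("PP", "IN", "NP", 1.0),
    ("VP", "V", "NP", 1/2), ("VP", "VP", "PP", 1/4), ("VP", "MD", "V", 1/4),
]

def cyk(words):
    n = len(words)
    chart = defaultdict(dict)  # chart[(i, j)][A] = best probability
    back = {}                  # back[(i, j, A)] = (k, B, C) backpointer
    for i, w in enumerate(words):
        for nt, p in LEX.get(w, []):
            chart[(i, i + 1)][nt] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # split point
                for a, b, c, p in BIN:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        prob = p * chart[(i, k)][b] * chart[(k, j)][c]
                        if prob > chart[(i, j)].get(a, 0.0):  # keep best
                            chart[(i, j)][a] = prob
                            back[(i, j, a)] = (k, b, c)
    return chart, back

chart, back = cyk("we eat sushi with chopsticks".split())
print(chart[(0, 5)]["S"])  # 0.001953125 == 1/512
```

The triple loop over spans, split points, and rules gives the O(n³) behaviour asked about in the poll (times a grammar-dependent constant). Following the backpointers from (0, n, "S") recovers the best tree.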
Limitations of CFG
CFG Problem 1: Poor Independence Assumptions
• Rewrite decisions are made independently, whereas interdependence is often needed to capture global structure
• NP → DT NN   [0.28]
• NP → PRP     [0.25]
  (NP statistics in the Switchboard corpus)
• The probability of a rule is independent of the rest of the tree
• No way to represent these contextual differences in PCFG probabilities
Poor Independence Assumptions
• NP statistics in the Switchboard corpus; PCFG probabilities based on the Switchboard corpus
• NP → PRP should go up to 0.91 as a subject
• NP → DT NN should be 0.66 as an object
• Solution: add a condition to denote whether the NP is a subject or an object
Solution: Parent Conditioning
• Make non-terminals more explicit by incorporating the parent symbol into each symbol
• NP^S represents an NP in subject position (left tree)
• NP^VP denotes an NP in object position (right tree)
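Parent conditioning is a simple tree transformation. A minimal sketch, assuming trees are represented as nested lists `[label, child, ...]` with plain strings as terminals:

```python
def parent_annotate(tree, parent=None):
    """Return a copy of `tree` with each non-terminal rewritten as
    LABEL^PARENT, e.g. a subject NP under S becomes NP^S."""
    if isinstance(tree, str):          # terminal: leave words untouched
        return tree
    label = tree[0] if parent is None else f"{tree[0]}^{parent}"
    return [label] + [parent_annotate(child, tree[0]) for child in tree[1:]]

tree = ["S", ["NP", "we"], ["VP", ["V", "eat"], ["NP", "sushi"]]]
print(parent_annotate(tree))
# ['S', ['NP^S', 'we'], ['VP^S', ['V^VP', 'eat'], ['NP^VP', 'sushi']]]
```

The subject NP and the object NP now carry different symbols (NP^S vs. NP^VP), so they can receive different rule probabilities when the grammar is re-estimated from an annotated treebank.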
CFG Problem 2: Lack of Lexical Conditioning
• Lack of sensitivity to the words in the tree
• Prepositional phrase (PP) attachment ambiguity
  ‣ Worker dumped sacks into a bin
PP Attachment Ambiguity
• Reading 1 (VP attachment): "into a bin" describes the resulting location of the sacks
• Reading 2 (NP attachment): the sacks to be dumped are the ones that are "into a bin"
Coordination Ambiguity
• dogs in houses and cats
• dogs is semantically a better conjunct for cats than houses (dogs can't fit into cats!)
Solution: Head Lexicalisation
• Record the head word with parent symbols
  ‣ the head is the most salient child of a constituent, usually the noun in an NP, the verb in a VP, etc.
  ‣ VP → VBD NP PP  ⇒  VP(dumped) → VBD(dumped) NP(sacks) PP(into)
Head Lexicalisation
• Incorporate head words into productions, to capture the most important links between words
  ‣ Captures correlations between head words of phrases
  ‣ PP(into): VP(dumped) vs. NP(sacks)
• Grammar symbol inventory expands massively!
  ‣ Many of the productions are too specific, rarely seen
  ‣ Learning is more involved to avoid sparsity problems (e.g., zero probabilities)
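Head lexicalisation can also be written as a tree transformation. A sketch, assuming a tiny hypothetical head-percolation table (real parsers use much larger rule sets); trees are `[label, children...]` with `(tag, word)` tuples as pre-terminals:

```python
# Hypothetical head-percolation rules: for each parent label, the child
# categories that can supply the head word, in order of preference.
HEAD_RULES = {"S": ["VP"], "VP": ["VBD"], "NP": ["NNS", "NN"], "PP": ["IN"]}

def head_index(label, child_labels):
    """Pick which child supplies the head; default to the first child."""
    for want in HEAD_RULES.get(label, []):
        if want in child_labels:
            return child_labels.index(want)
    return 0

def lexicalise(tree):
    """Return (annotated_tree, head_word) with labels rewritten LABEL(head)."""
    if isinstance(tree, tuple):            # pre-terminal: (tag, word)
        tag, word = tree
        return f"{tag}({word})", word
    label, children = tree[0], tree[1:]
    results = [lexicalise(c) for c in children]
    idx = head_index(label, [c[0] for c in children])
    head = results[idx][1]                 # percolate the head word up
    return [f"{label}({head})"] + [r[0] for r in results], head

tree = ["VP", ("VBD", "dumped"), ["NP", ("NNS", "sacks")],
        ["PP", ("IN", "into"), ["NP", ("DT", "a"), ("NN", "bin")]]]
ann, head = lexicalise(tree)
print(ann)
# ['VP(dumped)', 'VBD(dumped)', ['NP(sacks)', 'NNS(sacks)'],
#  ['PP(into)', 'IN(into)', ['NP(bin)', 'DT(a)', 'NN(bin)']]]
```

Every annotated label becomes a distinct grammar symbol, which is exactly why the symbol inventory explodes.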
A Final Word
• PCFGs are widely used, and efficient parsers are available
  ‣ Collins parser, Berkeley parser, Stanford parser
  ‣ All use some form of lexicalisation
• But there are other grammar formalisms
  ‣ Lexical functional grammar
  ‣ Head-driven phrase structure grammar
  ‣ Next lecture: dependency grammar!
Required Reading
• JM3 Ch. 13-13.2