
Probabilistic Context-Free Grammar
COMP90042
Natural Language Processing Lecture 15
COPYRIGHT 2020, THE UNIVERSITY OF MELBOURNE

Ambiguity in Parsing
Context-free grammars assign hierarchical structure to language
‣ Linguistic notion of a ‘syntactic constituent’
‣ Formulated as generating all strings in the language; or
‣ Predicting the structure(s) for a given string
Raises the problem of ambiguity, e.g., which parse is better?

Outline
• Probabilistic context-free grammars (PCFGs)
• Parsing using dynamic programming
• Limitations of ‘context-free’ assumption and some solutions:
‣ parent annotation
‣ head lexicalisation

Basics of Probabilistic CFGs
• As for CFGs, same symbol set:
‣ Terminals: words such as book
‣ Non-terminals: syntactic labels such as NP or NN
• Same productions (rules)
‣ LHS non-terminal → ordered list of RHS symbols
• In addition, store a probability with each production
‣ NP → DT NN      [p = 0.45]
‣ NN → cat        [p = 0.02]
‣ NN → leprechaun [p = 0.00001]
‣ …

Probabilistic CFGs
• Probability values denote a conditional probability:
‣ Pr(LHS → RHS), i.e., Pr(RHS | LHS)
• Consequently they:
‣ must be positive values, between 0 and 1
‣ must sum to one for a given LHS, i.e., ∑x Pr(NN → x) = 1
• E.g.,
‣ NN → aardvark   [p = 0.0003]
‣ NN → cat        [p = 0.02]
‣ NN → leprechaun [p = 0.0001]

[Figure: example PCFG (grammar and lexicon), source: JM3 Ch. 14]

Stochastic Generation with PCFGs
Almost the same as for a CFG, with one twist:
1. Start with S, the sentence symbol
2. Choose a rule with S as the LHS
‣ Randomly select a RHS according to Pr(RHS | LHS), e.g., S → VP
‣ Apply this rule, e.g., substitute VP for S
3. Repeat step 2 for each non-terminal in the string (here, VP)
4. Stop when no non-terminals remain
Gives us a tree, as before, with a sentence as the yield. A sampler for this procedure is sketched below.
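A minimal sketch of this generation procedure in Python, assuming a toy grammar stored as a dictionary from each LHS to its (RHS, probability) pairs. The grammar, symbols, and probabilities below are illustrative assumptions, not the lecture's:

import random

# Hypothetical toy PCFG: non-terminals are dict keys; anything that is
# not a key is treated as a terminal word.
GRAMMAR = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 1.0)],
    "VP": [(("V", "NP"), 1.0)],
    "DT": [(("the",), 1.0)],
    "NN": [(("cat",), 0.7), (("leprechaun",), 0.3)],
    "V":  [(("saw",), 1.0)],
}

def generate(symbol="S"):
    """Expand every non-terminal top-down, drawing a RHS according
    to Pr(RHS | LHS); terminals are returned as-is."""
    if symbol not in GRAMMAR:
        return symbol
    rhss, probs = zip(*GRAMMAR[symbol])
    rhs = random.choices(rhss, weights=probs)[0]
    return (symbol, [generate(child) for child in rhs])

print(generate())
# e.g. ('S', [('NP', [('DT', ['the']), ('NN', ['cat'])]), ('VP', ...)])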

How Likely is a Tree?
• Given a tree, we can compute its probability
‣ Decomposes into the probability of each production
• E.g., for the (left) tree:

P(tree) = P(S → VP) × P(VP → Verb NP) × P(Verb → Book) ×
          P(NP → Det Nominal) × P(Det → the) ×
          P(Nominal → Nominal Noun) × P(Nominal → Noun) ×
          P(Noun → dinner) × P(Noun → flight)

How Likely is a Tree?
P(tree) = P(S → VP) × P(VP → Verb NP) × P(Verb → Book) ×
          P(NP → Det Nominal) × P(Det → the) × P(Nominal → Nominal Noun) ×
          P(Nominal → Noun) × P(Noun → dinner) × P(Noun → flight)
        = 0.05 × 0.20 × 0.30 × 0.20 × 0.60 × 0.20 × 0.75 × 0.10 × 0.40
        = 2.2 × 10⁻⁶
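A quick sanity check of this product in Python; summing log-probabilities is the standard way to avoid numerical underflow on longer derivations:

import math

rule_probs = [0.05, 0.20, 0.30, 0.20, 0.60, 0.20, 0.75, 0.10, 0.40]
log_p = sum(math.log(p) for p in rule_probs)  # log P(tree)
print(math.exp(log_p))                        # ≈ 2.16e-06, i.e. 2.2 × 10⁻⁶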

Resolving Parse Ambiguity
• Can select between different trees based on P(T)
• P(T_left) = 2.2 × 10⁻⁶; P(T_right) = 6.1 × 10⁻⁷

Parsing PCFGs
• Instead of selecting between two trees, can we select a tree from the set of all possible trees?
• Before, we looked at CYK:
‣ for unweighted grammars (CFGs)
‣ finds all possible trees
• But there are often 1000s, many completely nonsensical
• Can we solve for the most probable tree?


CYK for PCFGs
• CYK finds all trees for a sentence; we want the best tree
• Prob. CYK follows a similar process to standard CYK
• Convert the grammar to Chomsky Normal Form (CNF); see the sketch below
‣ E.g., VP → Verb NP NP [0.10] becomes
  VP → Verb NP+NP [0.10]
  NP+NP → NP NP [1.0]
‣ where NP+NP is a new symbol
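A minimal sketch of this binarisation step, assuming rules are stored as (LHS, RHS-tuple, probability) triples; full CNF conversion also handles unary and mixed terminal/non-terminal rules, which this sketch skips:

def binarise(rules):
    """Split rules whose RHS is longer than 2 by fusing the tail into
    new symbols with probability 1.0, so derivation scores are
    unchanged: A -> B C D [p] becomes A -> B C+D [p], C+D -> C D [1.0]."""
    out = []
    for lhs, rhs, p in rules:
        rhs = list(rhs)
        while len(rhs) > 2:
            tail = "+".join(rhs[1:])
            out.append((lhs, (rhs[0], tail), p))
            lhs, rhs, p = tail, rhs[1:], 1.0
        out.append((lhs, tuple(rhs), p))
    return out

print(binarise([("VP", ("Verb", "NP", "NP"), 0.10)]))
# [('VP', ('Verb', 'NP+NP'), 0.1), ('NP+NP', ('NP', 'NP'), 1.0)]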

PCFG Parsing Example

The input sentence is “we eat sushi with chopsticks”. The grammar, with probabilities in brackets:

S  → NP VP       [1]
NP → NP PP       [1/2]
NP → we          [1/4]
NP → sushi       [1/8]
NP → chopsticks  [1/8]
PP → IN NP       [1]
IN → with        [1]
VP → V NP        [1/2]
VP → VP PP       [1/4]
VP → MD V        [1/4]
V  → eat         [1]

The chart has a cell [i,j] for every span from word i to word j, all initially empty; it is filled bottom-up as follows.

Step 1 (lexical cells): NP 1/4 in [0,1] (“we”); V 1 in [1,2] (“eat”); NP 1/8 in [2,3] (“sushi”); IN 1 in [3,4] (“with”); NP 1/8 in [4,5] (“chopsticks”).

Step 2: no rule derives “we eat”, so [0,2] = Ø.

Step 3: VP 1/16 in [1,3], via VP → V NP (1/2 × 1 × 1/8).

Step 4: S 1/64 in [0,3], via S → NP VP (1 × 1/4 × 1/16).

Step 5: nothing spans up to “with”: [0,4] = [1,4] = [2,4] = Ø.

Step 6: PP 1/8 in [3,5], via PP → IN NP (1 × 1 × 1/8).

Step 7: NP 1/128 in [2,5], via NP → NP PP (1/2 × 1/8 × 1/8).

Step 8: VP 1/256 in [1,5], via VP → V NP (1/2 × 1 × 1/128).

Step 9: a competing analysis for [1,5]: VP 1/512, via VP → VP PP (1/4 × 1/16 × 1/8).

Step 10: 1/256 > 1/512, so [1,5] keeps VP 1/256, the higher-scoring analysis.

Step 11: S 1/1024 in [0,5], via S → NP VP (1 × 1/4 × 1/256).

Step 12: success! An S covers the whole sentence, span [0,5], with probability 1/1024.

Prob CYK: Retrieving the Parses
• S in the top-right corner of the parse table indicates success
• Retain a back-pointer to the best analysis
• To get the parse(s), follow the pointers back for each match
• Convert back from CNF by removing the new non-terminals

The completed chart (the cell in the row starting at word i, under the column ending at word j, is span [i,j]):

            we        eat      sushi      with      chopsticks
from 0:   NP 1/4      Ø        S 1/64     Ø         S 1/1024
from 1:              V 1       VP 1/16    Ø         VP 1/256
from 2:                        NP 1/8     Ø         NP 1/128
from 3:                                   IN 1      PP 1/8
from 4:                                              NP 1/8

P(T) = 1/1024

Prob. CYK
[Figure: probabilistic CYK pseudocode, source: JM3 Ch. 14]

The changes relative to unweighted CYK (the callouts on that figure):
• CYK can be thought of as storing all events with probability = 1
• The chart now stores probabilities for each span and symbol
‣ instead of storing a set of symbols, store the probability of the best-scoring tree fragment covering span [i,j] with root symbol A
• The validity test now looks to see that the child chart cells have non-zero probability
• Overwrite a lower-scoring analysis if this one is better, and record the best production
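A minimal sketch of the resulting algorithm in Python, assuming a CNF grammar split into lexical rules (A → word) and binary rules (A → B C); the data layout and names are illustrative assumptions, not JM3's pseudocode:

from collections import defaultdict

def pcfg_cyk(words, lexical, binary, start="S"):
    """Probabilistic CYK. lexical: word -> [(A, prob)];
    binary: [(A, B, C, prob)]. Returns the best-parse probability
    and back-pointers for recovering the tree."""
    n = len(words)
    chart = defaultdict(float)   # (i, j, A) -> best probability
    back = {}                    # (i, j, A) -> (k, B, C)

    for i, w in enumerate(words):          # lexical cells [i, i+1]
        for A, p in lexical.get(w, []):
            chart[i, i + 1, A] = p

    for span in range(2, n + 1):           # widen spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):      # every split point
                for A, B, C, p in binary:
                    # children must have non-zero probability
                    q = p * chart[i, k, B] * chart[k, j, C]
                    if q > chart[i, j, A]: # keep the best analysis
                        chart[i, j, A] = q
                        back[i, j, A] = (k, B, C)
    return chart[0, n, start], back

def build_tree(words, back, i, j, A):
    """Follow back-pointers to recover the best tree (CNF-introduced
    symbols such as NP+NP still need removing afterwards)."""
    if (i, j, A) not in back:              # lexical cell
        return (A, [words[i]])
    k, B, C = back[i, j, A]
    return (A, [build_tree(words, back, i, k, B),
                build_tree(words, back, k, j, C)])

Running it on the example grammar reproduces the chart above:

lexical = {"we": [("NP", 1/4)], "eat": [("V", 1.0)],
           "sushi": [("NP", 1/8)], "with": [("IN", 1.0)],
           "chopsticks": [("NP", 1/8)]}
binary = [("S", "NP", "VP", 1.0), ("NP", "NP", "PP", 1/2),
          ("PP", "IN", "NP", 1.0), ("VP", "V", "NP", 1/2),
          ("VP", "VP", "PP", 1/4), ("VP", "MD", "V", 1/4)]
words = "we eat sushi with chopsticks".split()
p, back = pcfg_cyk(words, lexical, binary)
print(p)                                   # 0.0009765625 == 1/1024
print(build_tree(words, back, 0, len(words), "S"))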

Complexity of CYK
• What are the space and time complexity of this algorithm?
‣ in terms of n, the length of the input sentence
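For reference, the standard analysis (the slide leaves this as an exercise): the chart has O(n²) cells, and filling each cell considers O(n) split points, so parsing takes O(n³) time (with a further constant factor for the grammar size) and O(n²) space (times the number of non-terminal symbols).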

Issues with PCFG

PCFG Problem 1: Poor Independence Assumptions
• Rewrite decisions are made independently, whereas inter-dependence is often needed to capture global structure
• NP → Det N
‣ the probability of this rule is independent of the rest of the tree
[Table: NP expansion statistics in the Switchboard corpus]
• No way to represent these contextual differences in PCFG probabilities

Poor Independence Assumptions
[Tables: NP expansion statistics in the Switchboard corpus, and PCFG probabilities estimated from it]
• No way to capture the fact that in subject position, NP → PRP should go up to 0.91
• While in object position, NP → DT NN should go up to 0.66
• Solution: add a condition to denote whether the NP is a subject or object

Solution: Parent Conditioning
• Make non-terminals more explicit by incorporating the parent symbol into each symbol; a sketch of this transformation follows below
• NP^S represents subject position (left tree)
• NP^VP denotes object position (right tree)
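A minimal sketch of parent annotation, assuming trees are nested (label, children) tuples with plain strings as leaves; the example tree is a hypothetical illustration:

def parent_annotate(tree, parent=None):
    """Append the parent's label to every non-terminal: an NP under S
    becomes NP^S, an NP under VP becomes NP^VP, and so on."""
    if isinstance(tree, str):       # leaf word: leave unchanged
        return tree
    label, children = tree
    new_label = f"{label}^{parent}" if parent else label
    return (new_label, [parent_annotate(c, label) for c in children])

tree = ("S", [("NP", ["we"]),
              ("VP", [("V", ["eat"]), ("NP", ["sushi"])])])
print(parent_annotate(tree))
# ('S', [('NP^S', ['we']), ('VP^S', [('V^VP', ['eat']), ('NP^VP', ['sushi'])])])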

PCFG Problem 2: Lack of Lexical Conditioning
• Lack of sensitivity to the words in the tree
• Prepositional phrase (PP) attachment ambiguity
‣ Worker dumped sacks into a bin

PP Attachment Ambiguity
• VP attachment: “into a bin” describes the resulting location of the sacks
• NP attachment: the sacks to be dumped are the ones which are already “into a bin”

Coordination Ambiguity
• dogs in houses and cats
• dogs is semantically a better conjunct for cats than houses (dogs can’t fit into houses!)

Solution: Head Lexicalisation
• Record the head word with parent symbols
‣ the head is the most salient child of a constituent, usually the noun in an NP, the verb in a VP, etc.
‣ VP → VBD NP PP  ⇒  VP(dumped) → VBD(dumped) NP(sacks) PP(into)

Head Lexicalisation
• Incorporate head words into productions, so that the most important links between words are captured
‣ the rule captures correlations between the head tokens of phrases
‣ e.g., the affinity of VP(dumped) and NP(sacks) for PP(into)
• But the grammar symbol inventory expands massively!
‣ many of the productions are much too specific, and seen very rarely
‣ learning is more involved, to avoid sparsity problems (e.g., zero probabilities)
A toy head-finding sketch is given below.
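A minimal sketch of head lexicalisation with toy head-finding rules, assuming the same (label, children) tree representation as above. Real lexicalised parsers (e.g., Collins') use much richer head tables; the rules and names here are illustrative assumptions only:

HEAD_CHILD = {"NP": ["NN", "NNS", "NP"],  # prefer the noun in an NP
              "VP": ["VBD", "V", "VP"],   # prefer the verb in a VP
              "PP": ["IN"],               # the preposition heads a PP
              "S":  ["VP"]}

def lexicalise(tree):
    """Relabel each constituent as label(head_word), propagating the
    head word up from the preferred child. No fallback rule is
    implemented: labels with no matching child get head None."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    new_children = [lexicalise(c) for c in children]
    if len(children) == 1 and isinstance(children[0], str):
        head = children[0]                # preterminal: the word itself
    else:
        head = None
        for cand in HEAD_CHILD.get(label, []):
            for c in new_children:
                if c[0].startswith(cand + "("):
                    head = c[0].split("(", 1)[1].rstrip(")")
                    break
            if head:
                break
    return (f"{label}({head})", new_children)

tree = ("VP", [("VBD", ["dumped"]),
               ("NP", [("NNS", ["sacks"])]),
               ("PP", [("IN", ["into"]), ("NP", [("NN", ["bin"])])])])
print(lexicalise(tree)[0])  # VP(dumped)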

A Final Word
• PCFGs are widely used, and there are efficient parsers available
‣ Collins parser, Berkeley parser, Stanford parser
‣ all use some form of lexicalisation
• But there are other grammar formalisms
‣ Lexical functional grammar
‣ Head-driven phrase structure grammar
‣ Next lecture: dependency grammar

Required Reading
• J&M3 Ch. 14 – 14.6 (skip 14.6.1)