Context-Free Grammar
COMP90042
Natural Language Processing
Lecture 14
Semester 1 2021 Week 7 Jey Han Lau
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
1
COMP90042
L14
Recap
‣ The cat loves Mozart
‣ The cat the dog chased loves Mozart
‣ The cat the dog the rat bit chased loves Mozart
‣ The cat the dog the rat the elephant admired bit chased loves Mozart
• •
•
Center embedding
Cannot be captured by regular expressions (SnVn)
Context-free grammar!
2
COMP90042
L14
•
Symbols
‣ Terminal: word such as book
‣ Non-terminal: syntactic label such as NP or VP
•
Productions (rules) ‣W→XYZ
‣ Exactly one non-terminal on left-hand side (LHS)
‣ An ordered list of symbols on right-hand side (RHS);
•
can be terminals or non-terminals Start symbol: S
3
Basics of Context-Free Grammars
convention:
lowercase for terminals
uppercase for non-terminals
COMP90042
L14
•
W→XYZ
Production rule depends only on the LHS (and not
on ancestors, neighbours)
‣ Analogous to Markov chain
‣ Behaviour at each step depends only on current state
Why “Context Free”
4
COMP90042
L14
•
Context-free languages more general than regular languages
‣ Allows recursive nesting
Context-Free vs. Regular
5
COMP90042
L14
CFG Parsing
•
Given production rules ‣S→aSb ‣S→ab
•
•
And a string ‣ aaabbb
Produce a valid parse tree
6
COMP90042
L14
•
If English can be represented with CFG:
‣ first develop the production rules
‣ can then build a “parser” to automatically judge whether a sentence is grammatical!
• •
But is natural language context-free?
Not quite: cross-serial dependencies (ambncmdn)
What This Means?
7
COMP90042
L14
•
CFG strike a good balance:
‣ CFG covers most syntactic patterns
‣ CFG parsing is computational efficient
•
We use CFG to describe a core fragment of English syntax
But…
8
COMP90042
L14
• • •
Constituents
CYK Algorithm
Representing English with CFGs
Outline
9
COMP90042
L14
Constituents
10
COMP90042
L14
•
Syntactic Constituents Sentences are broken into constituents
‣ word sequence that function as a coherent unit for linguistic analysis
‣ helps build CFG production rules Constituents have certain key properties:
‣ movement
‣ substitution ‣ coordination
•
11
COMP90042
L14
•
Constituents can be moved around sentences ‣ Abigail gave [her brother] [a fish]
‣ Abigail gave [a fish] to [her brother]
•
Contrast: [gave her], [brother a]
Movement
12
COMP90042
L14
•
Constituents can be substituted by other phrases of the same type
‣ Max thanked [his older sister]
‣ Max thanked [her]
Contrast: [Max thanked], [thanked his]
•
Substitution
13
COMP90042
L14
•
Coordination
Constituents can be conjoined with coordinators
like and and or
‣ [Abigail] and [her young brother] brought a fish ‣ Abigail [bought a fish] and [gave it to Max]
‣ Abigail [bought] and [greedily ate] a fish
14
COMP90042
L14
•
•
Once we identify constituents, we use phrases to describe them
•
We can use CFG to formalise these intuitions
Constituents and Phrases
Phrases are determined by their head word: ‣ noun phrase: her younger brother
‣ verb phrase: greedily ate it
15
COMP90042
L14
He gave a lecture and away a pie.
Which of the following is a constituent
• a lecture
• gave a lecture • apie
• away a pie
PollEv.com/jeyhanlau569
16
COMP90042
L14
17
COMP90042
L14
A Simple CFG for English
Terminal symbols: rat, the, ate, cheese
Non-terminal symbols: S, NP, VP, DT, VBD, NN
Productions: S → NP VP
NP → DT NN VP → VBD NP DT → the
NN → rat
NN → cheese VBD → ate
18
COMP90042
L14
Generating Sentences with CFGs
Always start with S (the sentence/start symbol)
S
Apply a rule with S on LHS (S → NP VP), i.e substitute RHS
NP VP
Apply a rule with NP on LHS (NP → DT NN) DT NN VP
Apply rule with DT on LHS (DT → the) the NN VP
Apply rule with NN on LHS (NN → rat) the rat VP
S → NP VP NP → DT NN VP → VBD NP DT → the
NN → rat
NN → cheese VBD → ate
19
COMP90042
L14
Generating Sentences with CFGs
Apply rule with VP on LHS (VP → VBD NP) the rat VBD NP
Apply rule with VBD on LHS (VBD → ate) the rat ate NP
Apply rule with NP on LHS (NP → DT NN) the rat ate DT NN
Apply rule with DT on LHS (DT → the) the rat ate the NN
Apply rule with NN on LHS (NN → cheese) the rat ate the cheese
S → NP VP NP → DT NN VP → VBD NP DT → the
NN → rat
NN → cheese VBD → ate
No non-terminals left, we’re done!
20
COMP90042
L14
CFG Trees
• Generation corresponds to a syntactic tree
• Non-terminals are internal nodes
• Terminals are leaves
(S (NP (DT the) (NN rat))
(VP (VBG ate) (NP (DT the)
(NN cheese))))
• CFG parsing is the reverse
process (sentence → tree)
21
COMP90042
L14
•
• •
S = starting symbol
| = operator OR
Recursive, NUM and S can produce themselves
A CFG for Arithmetic Expressions
22
COMP90042
L14
•
Parsing Is ‘4’ a valid string?
23
COMP90042
L14
•
Parsing Is ‘1+2-3’ a valid string?
24
COMP90042
L14
CYK Algorithm
25
COMP90042
L14
CYK Algorithm
Bottom-up parsing
Tests whether a string is valid given a CFG, without enumerating all possible parses
•
•
• •
Core idea: form small constituents first, and merge them into larger constituents
Requirement: CFGs must be in Chomsky Normal Forms
26
COMP90042
L14
•
Change grammar so all rules of form: ‣ A→BC
‣ A→a
Convert rules of form A → B c into: ‣ A→BX
‣X→c
•
Convert to Chomsky Normal Form
27
COMP90042
L14
Convert to Chomsky Normal Form
Convert rules A → B C D into:
‣A→BY
‣Y→CD
‣ E.g. VP → VP NP NP
for ditransitive cases, “sold [her] [the book]”
•
•
X, Y are new symbols we have introduced
28
COMP90042
L14
• •
Convert to Chomsky Normal Form CNF disallows unary rules, A → B.
Imagine NP → S; and S → NP … leads to infinitely many trees with same yield.
•
•
Replace RHS non-terminal with its productions A → B, B → cat, B → dog
A → cat, A → dog
•
29
COMP90042
L14
• • • • •
Convert grammar to Chomsky Normal Form (CNF) Fill in a parse table (left to right, bottom to top) Use table to derive parse
S in top right corner of table = success!
Convert result back to original grammar
The CYK Parsing Algorithm
30
COMP90042
L14
[0,1]
we
[0,2]
eat
sushi
with
chopsticks
[1,2]
[0,3]
[1,3]
[2,3]
[0,4]
[1,4]
[2,4]
[3,4]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
31
COMP90042
L14
we
V
[1,2]
eat
[1,3]
sushi
NP
[2,3]
[0,4]
[1,4]
[2,4]
IN
[3,4]
with
chopsticks
NP
[0,1]
[0,2]
[0,3]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
32
COMP90042
L14
we
V
[1,2]
eat
[1,3]
sushi
NP
[2,3]
[0,4]
[1,4]
[2,4]
IN
[3,4]
with
chopsticks
NP
[0,1]
Ø [0,2]
[0,3]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
33
COMP90042
L14
NP
[0,1]
we
Ø [0,2]
eat
[0,3]
sushi
[0,4]
with
chopsticks
V
[1,2]
VP
Split=2
[1,3]
[1,4]
[2,4]
IN
[3,4]
[0,5]
[1,5]
NP
[2,3]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
34
COMP90042
L14
NP
[0,1]
we
Ø [0,2]
V
[1,2]
eat
sushi
S
Split=1
[0,3]
VP
Split=2
[1,3]
with
chopsticks
[0,4]
[0,5]
[1,4]
[1,5]
NP
[2,3]
[2,4]
IN
[3,4]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
35
COMP90042
L14
NP
[0,1]
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
[0,5]
[1,5]
NP
[2,3]
Ø [2,4]
IN
[3,4]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
36
COMP90042
L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
NP
[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
[1,5]
NP
[2,3]
Ø [2,4]
IN
[3,4]
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
37
COMP90042
L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
NP
[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
NP
[2,3]
Ø [1,4]
Ø [2,4]
[1,5]
IN
[3,4]
NP
Split=3
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
38
COMP90042
L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
NP
[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
NP
[2,3]
Ø [1,4]
Ø [2,4]
?
[1,5]
IN
[3,4]
NP
Split=3
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
PollEv.com/jeyhanlau569
39
COMP90042
L14
40
COMP90042
L14
NP
[0,1]
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
VP
Split=2
[1,5]
NP
[2,3]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
41
COMP90042
L14
we
eat
sushi
with
chopsticks
NP
[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
[0,5]
VP
Split=2
[1,3]
Split=3
V
[1,2]
Ø [1,4]
Ø [2,4]
Split=2
[1,5]
VP
NP
[2,3]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
42
COMP90042
L14
we
eat
sushi
with
chopsticks
NP
[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
43
COMP90042
L14
we
eat
sushi
with
chopsticks
NP
[0,1]
Ø [0,2]
S
Split=1
[0,3]
[0,4]
sentence is valid!
ØS Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
NP
[2,3]
Ø [1,4]
Split=2
[1,5]
VP
Split=3
Ø [2,4]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
44
COMP90042
L14
•
•
S in the top-right corner of parse table indicates success
CYK: Retrieving the Parses
To get parse(s), follow pointers back for each match
45
COMP90042
L14
we
eat
sushi
with
chopsticks
NP
[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
46
COMP90042
L14
we
eat
sushi
with
chopsticks
NP
[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
47
COMP90042
L14
CYK Algorithm
fill the diagonals (NP → we) bottom to top
going through the ‘splits’
create the links if they are in the production rules
JM3, Ch 12
48
COMP90042
L14
Representing English with CFGs
49
COMP90042
L14 From Toy Grammars to Real Grammars
•
• •
Toy grammars with handful of productions good for demonstration or extremely limited domains
For real texts, we need real grammars Many thousands of production rules
50
COMP90042
L14
• • • • • • •
Sentence (S)
Noun phrase (NP)
Verb phrase (VP) Prepositional phrase (PP) Adjective phrase (AdjP) Adverbial phrase (AdvP) Subordinate clause (SBAR)
Key Constituents in Penn Treebank
51
COMP90042
L14
Example PTB/0001
( (S (NP-SBJ
(NP (NNP Pierre) (NNP Vinken) )
(, ,)
(ADJP
(NP (CD 61) (NNS years) )
(JJ old) )
(, ,) )
(VP (MD will)
(VP (VB join)
(NP (DT the) (NN board) )
(PP-CLR (IN as)
(NP (DT a) (JJ nonexecutive) (NN director) ))
(NP-TMP (NNP Nov.) (CD 29) )))
(. .) ))
52
COMP90042
L14
Basic English Sentence Structures • Declarativesentences(S→NPVP)
‣ The rat ate the cheese Imperative sentences (S → VP)
‣ Eat the cheese!
Yes/no questions (S → VB NP VP)
‣ Did the rat eat the cheese? Wh-subject-questions (S → WH VP)
‣ Who ate the cheese?
Wh-object-questions (S → WH VB NP VP) ‣ What did the rat eat?
•
•
•
•
53
COMP90042
L14
•
•
Pre-modifiers
‣ DT, CD, ADJP, NNP, NN
‣ E.g. the two very best Philly cheese steaks
English Noun Phrases
Post-modifiers
‣ PP, VP, SBAR
‣ A delivery from Bob coming today that I don’t want to miss
NP → DT? CD? ADJP? (NN|NNP)+ PP* VP? SBAR?
54
COMP90042
L14
Verb Phrases ‣ MD, AdvP, VB, TO
‣ E.g should really have tried to wait VP → (MD|VB|TO) AdvP? VP
•
•
Auxiliaries
Arguments and adjuncts ‣ NP, PP, SBAR, VP, AdvP
‣ E.g told him yesterday that I was ready
‣ E.g. gave John a gift for his birthday to make amends
VP → VB NP? NP? PP* AdvP* VP? SBAR?
55
COMP90042
L14
Other Constituents • Prepositional phrase
‣ PP→INNP
• Adjective phrase
‣ AdjP→(AdvP)JJ
• Adverb phrase
‣ AdvP→(AdvP)RB
• Subordinate clause
‣ SBAR→(IN)S
• Coordination
‣ NP→NPCCNP;VP→VPCCVP;etc.
• Complex sentences
‣ S→SSBAR;S→SBARS;etc.
in the house really nice
not too well
since I came here JackandJill
if he goes, I’ll go
56
COMP90042
L14
•
•
Context-free grammars can represent most linguistic structures of natural languages
A Final Word
There are relatively fast dynamic programming algorithms (CYK) to retrieve this structure
57
COMP90042
L14
•
But what about ambiguity? Often more than one tree can describe a string
Parse Ambiguity
58
COMP90042
L14
•
Reading E18 Ch. 9.2, 10.1
59