程序代写代做代考 C chain go algorithm html Context-Free Grammar

Context-Free Grammar
COMP90042
Natural Language Processing Lecture 14
COPYRIGHT 2020, THE UNIVERSITY OF MELBOURNE
1

COMP90042
L14
Recap
‣ The cat loves Mozart
‣ The cat the dog chased loves Mozart
‣ The cat the dog the rat bit chased loves Mozart
‣ The cat the dog the rat the elephant admired bit chased loves Mozart
• •

Center embedding
Cannot be captured by regular expressions (SnVn)
Context-free grammar!
2

COMP90042 L14

Symbols
‣ Terminal: word such as book
‣ Non-terminal: syntactic label such as NP or VP
‣ Convention to use upper and lower-case to distinguish,
or else “quotes” for terminals

Productions (rules)
 W→XYZ
‣ Exactly one non-terminal on left-hand side (LHS)
‣ An ordered list of symbols on right-hand side (RHS)


— can be Terminals or Non-terminals Start symbol: S
Basics of Context-Free Grammars
3

COMP90042
L14

W→XYZ
Production rule depends only on the LHS (and not
on ancestors, neighbours)
‣ Analogous to Markov chain
‣ Behaviour at each step depends only on current state
Why “Context Free”
4

COMP90042
L14

Context-free languages more general 
 than Regular languages
‣ Allows recursive nesting CFG for anbn:
Context-Free vs. Regular

‣S→aSb ‣S→ab
Parse tree for aaabbb
5

COMP90042
L14
• •
Given a string and production rules Produce a valid parse tree
Parsing
‣S→aSb
‣S→ab
aaabbb
6

COMP90042
L14
What This Means?
If English can be represented with CFG:
‣ we can build a “parser” to automatically judge whether a sentence is grammatical!
• •

But is natural language context-free?
Not quite: cross-serial dependencies (ambncmdn)
7

COMP90042
L14

Context-free representations strike a good balance:
‣ CFG cover most syntactic patterns
‣ CFG parsing is computational efficient

We use CFG to describe a core fragment of English syntax
But…
8

COMP90042
L14
Syntactic Constituents Sentences are broken into constituents
‣ word sequence that function as a coherent unit for linguistic analysis


Constituents have certain key properties: ‣ movement
‣ substitution
‣ coordination
9

COMP90042
L14

Constituents can be moved around sentences ‣ Abigail gave [her brother] [a fish]
‣ Abigail gave [a fish] to [her brother]

Contrast: [gave her], [brother a]
Movement
10

COMP90042
L14

Constituents can be substituted by other phrases of the same type
‣ Max thanked [his older sister]
‣ Max thanked [her]
Contrast: [Max thanked], [thanked his]

Substitution
11

COMP90042
L14
Coordination
Constituents can be conjoined with coordinators
like and and or
‣ [Abigail] and [her young brother] brought a fish ‣ Abigail [bought a fish] and [gave it to Max]
‣ Abigail [bought] and [greedily ate] a fish


Contrast: [brother brought], [bought a]
12

COMP90042
L14
Constituents and Phrases



We can use CFG to formalise these intuitions
Once we identify constituents, we use phrases to describe them
Phrases are determined by their head word: ‣ noun phrase: her younger brother
‣ verb phrase: greedily ate it
13

COMP90042
L14
A Simple CFG for English
Terminal symbols: rat, the, ate, cheese
Non-terminal symbols: S, NP, VP, DT, VBD, NN
Productions: S → NP VP
NP → DT NN VP → VBD NP DT → the
NN → rat
NN → cheese VBD → ate
14

COMP90042 L14
Generating Sentences with CFGs
Always start with S (the sentence/start symbol)
S
Apply a rule with S on LHS (S → NP VP), i.e substitute RHS NP VP
Apply a rule with NP on LHS (NP → DT NN) DT NN VP
Apply rule with DT on LHS (DT → the) the NN VP
Apply rule with NN on LHS (NN → rat) the rat VP
In each step we rewrite the left-most non-terminal
15

COMP90042 L14
Generating Sentences with CFGs
Apply rule with VP on LHS (VP → VBD NP) the rat VBD NP
Apply rule with VBD on LHS (VBD → ate) the rat ate NP
Apply rule with NP on LHS (NP → DT NN) the rat ate DT NN
Apply rule with DT on LHS (DT → the) the rat ate the NN
Apply rule with NN on LHS (NN → cheese) the rat ate the cheese
No non-terminals left, we’re done!
16

COMP90042
L14
CFG Trees
• Generation corresponds to a syntactic tree
• Non-terminals are internal nodes
• Terminals are leaves
(S (NP (DT the) (NN rat))
(VP (VBG ate) (NP (DT the)
(NN cheese))))
• Parsing is the 
 reverse process
17

COMP90042 L14
• • •
S = starting symbol
‘|’ = operator OR
Recursive, NUM and S can produce themselves
A CFG for Arithmetic Expressions
18

COMP90042
L14

Parsing Is ‘4’ a valid string?
19

COMP90042
L14

Parsing Is ‘1+2-3’ a valid string?
20

COMP90042
L14
Parse Ambiguity
• Oftenmorethanonetreecandescribeastring
• “WhilehuntinginAfrica,Ishotanelephantinmy
pajamas. How he got into my pajamas, I don’t know.” Animal Crackers (1930)
http://www.nltk.org/book/ch08.html 21

COMP90042
L14
Parsing CFG
22

COMP90042
L14
CYK Algorithm
Bottom-up approach to parsing in CFG
Tests whether a string is valid given a CFG, without enumerating all possible parses


• •
Core idea: form small constituents first, and merge them into larger constituents
Requirement: CFGs must be in Chomsky Normal Forms
23

COMP90042 L14

Change grammar so all rules of form: ‣ A→BC
‣ A→a
Convert rules of form A → B c into: ‣ A→BX
‣X→c

Convert to Chomsky Normal Form
24

COMP90042 L14
Convert to Chomsky Normal Form
Convert rules A → B C D into:
‣A→BY
‣Y→CD
‣E.g.,VP→VPNPNP 

for ditransitive cases, “sold [her] [the book]”


X, Y are new symbols we have introduced
25

COMP90042
L14
• •
CNF (cont) CNF disallows unary rules, A → B.
Imagine NP → S; and S → NP … leads to infinitely many trees with same yield.


Replace RHS non-terminal with its productions E.g convert A → B, B → 1, B → 2 into:

A → 1, A →2
26

COMP90042
L14
The CYK Parsing Algorithm
• • • • •
Convert grammar to Chomsky Normal Form (CNF) Fill in a parse table
Use table to derive parse
S in top right corner of table = success!
Convert result back to original grammar
27

COMP90042 L14

[0,1]
we
[0,2]
eat
sushi
with
chopsticks
[1,2]
[0,3]
[1,3]
[2,3]
[0,4]
[1,4]
[2,4]
[3,4]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
28

COMP90042 L14
we
V
[1,2]
eat
[1,3]
sushi
NP
[2,3]
[0,4]
[1,4]
[2,4]
IN
[3,4]
with
chopsticks

NP

[0,1]
[0,2]
[0,3]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
29

COMP90042 L14
we
V
[1,2]
eat
[1,3]
sushi
NP
[2,3]
[0,4]
[1,4]
[2,4]
IN
[3,4]
with
chopsticks

NP

[0,1]
Ø [0,2]
[0,3]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
30

COMP90042 L14

NP

[0,1]
we
Ø [0,2]
eat
[0,3]
sushi
[0,4]
with
chopsticks
V
[1,2]
VP
Split=2
[1,3]
[1,4]
[2,4]
IN
[3,4]
[0,5]
[1,5]
NP
[2,3]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
31

COMP90042 L14

NP

[0,1]
we
Ø [0,2]
V
[1,2]
eat
sushi
S
Split=1
[0,3]
VP
Split=2
[1,3]
with
chopsticks
[0,4]
[0,5]
[1,4]
[1,5]
NP
[2,3]
[2,4]
IN
[3,4]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
32

COMP90042 L14

NP

[0,1]
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
[0,5]
[1,5]
NP
[2,3]
Ø [2,4]
IN
[3,4]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
33

COMP90042 L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks

NP

[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
[1,5]
NP
[2,3]
Ø [2,4]
IN
[3,4]
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
34

COMP90042 L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks

NP

[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
NP
[2,3]
Ø [1,4]
Ø [2,4]
[1,5]
IN
[3,4]
NP
Split=3
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
35

COMP90042 L14

NP

[0,1]
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
VP
Split=2
[1,5]
NP
[2,3]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
36

COMP90042 L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
[0,5]
VP
Split=2
[1,3]
Split=3
V
[1,2]
Ø [1,4]
Ø [2,4]
Split=2
[1,5]
VP
NP
[2,3]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
37

COMP90042 L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
38

COMP90042
L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
[0,4]
sentence is valid!
ØS Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
NP
[2,3]
Ø [1,4]
Split=2
[1,5]
VP
Split=3
Ø [2,4]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
39

COMP90042
L14
CYK: Retrieving the Parses


S in the top-left corner of parse table indicates success
To get parse(s), follow pointers back for each match
40

COMP90042 L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
41

COMP90042 L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
42

COMP90042
L14
CYK Algorithm
JM3, Ch 12
43

COMP90042 L14
Representing English with CFGs
44

COMP90042 L14

• •
Toy grammars with handful of productions good for demonstration or extremely limited domains
From Toy Grammars to Real Grammars
For real texts, we need real grammars Many thousands of production rules
45

COMP90042 L14
• • • • • • •
Sentence (S)
Noun phrase (NP)
Verb phrase (VP) Prepositional phrase (PP) Adjective phrase (AdjP) Adverbial phrase (AdvP) Subordinate clause (SBAR)
Key Constituents in Penn Treebank
46

COMP90042
L14
Example PTB/0001
( (S (NP-SBJ
(NP (NNP Pierre) (NNP Vinken) )
(, ,)
(ADJP
(NP (CD 61) (NNS years) )
(JJ old) )
(, ,) )
(VP (MD will)
(VP (VB join)
(NP (DT the) (NN board) )
(PP-CLR (IN as)
(NP (DT a) (JJ nonexecutive) (NN director) ))
(NP-TMP (NNP Nov.) (CD 29) )))
(. .) ))
47

COMP90042 L14
Basic English Sentence Structures • Declarativesentences(S→NPVP)
‣ The rat ate the cheese Imperative sentences (S → VP)
‣ Eat the cheese!
Yes/no questions (S → VB NP VP)
‣ Did the rat eat the cheese? Wh-subject-questions (S → WH VP)
‣ Who ate the cheese?
Wh-object-questions (S → WH VB NP VP) ‣ What did the rat eat?




48

COMP90042
L14


Pre-modifiers
‣ DT, CD, ADJP, NNP, NN
‣ E.g. the two very best Philly cheese steaks
English Noun Phrases
Post-modifiers
‣ PP, VP, SBAR
‣ A delivery from Bob coming today that I don’t want to miss
NP → DT? CD? ADJP? (NN|NNP)+ PP* VP? SBAR? NP → PRP
49

COMP90042
L14
Verb Phrases ‣ MD, AdvP, VB, TO
‣ E.g should really have tried to wait VP → (MD|VB|TO) AdvP? VP


Auxiliaries
Arguments and adjuncts ‣ NP, PP, SBAR, VP, AdvP
‣ E.g told him yesterday that I was ready
‣ E.g. gave John a gift for his birthday to make amends
VP → VB NP? NP? PP* AdvP* VP? SBAR?
50

COMP90042
L14
Other Constituents • Prepositional phrase
‣ PP→INNP
• Adjective phrase
‣ AdjP→(AdvP)JJ
• Adverb phrase
‣ AdvP→(AdvP)RB
• Subordinate clause
‣ SBAR→(IN)S
• Coordination
‣ NP→NPCCNP;VP→VPCCVP;etc.
• Complex sentences
‣ S→SSBAR;S→SBARS;etc.
in the house really nice
not too well
since I came here JackandJill
if he goes, I’ll go
51

COMP90042
L14



Context-free grammars can represent linguistic structure
A Final Word
There are relatively fast dynamic programming algorithms to retrieve this structure
But what about ambiguity?
‣ Extreme ambiguity will slow down parsing ‣ If multiple possible parses, which is best?
52

COMP90042
L14

Readings E18 Ch. 9.2, 10.1
53