CS计算机代考程序代写 algorithm chain Context-Free Grammar

Context-Free Grammar
COMP90042
Natural Language Processing
Lecture 14
Semester 1 2021 Week 7 Jey Han Lau
COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
1

COMP90042
L14
Recap
‣ The cat loves Mozart
‣ The cat the dog chased loves Mozart
‣ The cat the dog the rat bit chased loves Mozart
‣ The cat the dog the rat the elephant admired bit chased loves Mozart
• •

Center embedding
Cannot be captured by regular expressions (SnVn)
Context-free grammar!
2

COMP90042
L14

Symbols
‣ Terminal: word such as book
‣ Non-terminal: syntactic label such as NP or VP

Productions (rules) ‣W→XYZ
‣ Exactly one non-terminal on left-hand side (LHS)
‣ An ordered list of symbols on right-hand side (RHS);

can be terminals or non-terminals Start symbol: S
3
Basics of Context-Free Grammars
convention:

lowercase for terminals
 uppercase for non-terminals

COMP90042
L14

W→XYZ
Production rule depends only on the LHS (and not
on ancestors, neighbours)
‣ Analogous to Markov chain
‣ Behaviour at each step depends only on current state
Why “Context Free”
4

COMP90042
L14

Context-free languages more general than regular languages
‣ Allows recursive nesting
Context-Free vs. Regular
5

COMP90042
L14
CFG Parsing

Given production rules ‣S→aSb ‣S→ab


And a string ‣ aaabbb
Produce a valid parse tree
6

COMP90042
L14

If English can be represented with CFG:
‣ first develop the production rules
‣ can then build a “parser” to automatically judge whether a sentence is grammatical!
• •
But is natural language context-free?
Not quite: cross-serial dependencies (ambncmdn)
What This Means?
7

COMP90042
L14

CFG strike a good balance:
‣ CFG covers most syntactic patterns
‣ CFG parsing is computational efficient

We use CFG to describe a core fragment of English syntax
But…
8

COMP90042
L14
• • •
Constituents
CYK Algorithm
Representing English with CFGs
Outline
9

COMP90042
L14
Constituents
10

COMP90042
L14

Syntactic Constituents Sentences are broken into constituents
‣ word sequence that function as a coherent unit for linguistic analysis
‣ helps build CFG production rules Constituents have certain key properties:
‣ movement
‣ substitution ‣ coordination

11

COMP90042
L14

Constituents can be moved around sentences ‣ Abigail gave [her brother] [a fish]
‣ Abigail gave [a fish] to [her brother]

Contrast: [gave her], [brother a]
Movement
12

COMP90042
L14

Constituents can be substituted by other phrases of the same type
‣ Max thanked [his older sister]
‣ Max thanked [her]
Contrast: [Max thanked], [thanked his]

Substitution
13

COMP90042
L14

Coordination
Constituents can be conjoined with coordinators
like and and or
‣ [Abigail] and [her young brother] brought a fish ‣ Abigail [bought a fish] and [gave it to Max]
‣ Abigail [bought] and [greedily ate] a fish
14

COMP90042
L14


Once we identify constituents, we use phrases to describe them

We can use CFG to formalise these intuitions
Constituents and Phrases
Phrases are determined by their head word: ‣ noun phrase: her younger brother
‣ verb phrase: greedily ate it
15

COMP90042
L14
He gave a lecture and away a pie.
Which of the following is a constituent
• a lecture
• gave a lecture • apie
• away a pie
PollEv.com/jeyhanlau569
16

COMP90042
L14
17

COMP90042
L14
A Simple CFG for English
Terminal symbols: rat, the, ate, cheese
Non-terminal symbols: S, NP, VP, DT, VBD, NN
Productions: S → NP VP
NP → DT NN VP → VBD NP DT → the
NN → rat
NN → cheese VBD → ate
18

COMP90042
L14
Generating Sentences with CFGs
Always start with S (the sentence/start symbol)
S
Apply a rule with S on LHS (S → NP VP), i.e substitute RHS
NP VP
Apply a rule with NP on LHS (NP → DT NN) DT NN VP
Apply rule with DT on LHS (DT → the) the NN VP
Apply rule with NN on LHS (NN → rat) the rat VP
S → NP VP NP → DT NN VP → VBD NP DT → the
NN → rat
NN → cheese VBD → ate
19

COMP90042
L14
Generating Sentences with CFGs
Apply rule with VP on LHS (VP → VBD NP) the rat VBD NP
Apply rule with VBD on LHS (VBD → ate) the rat ate NP
Apply rule with NP on LHS (NP → DT NN) the rat ate DT NN
Apply rule with DT on LHS (DT → the) the rat ate the NN
Apply rule with NN on LHS (NN → cheese) the rat ate the cheese
S → NP VP NP → DT NN VP → VBD NP DT → the
NN → rat
NN → cheese VBD → ate
No non-terminals left, we’re done!
20

COMP90042
L14
CFG Trees
• Generation corresponds to a syntactic tree
• Non-terminals are internal nodes
• Terminals are leaves
(S (NP (DT the) (NN rat))
(VP (VBG ate) (NP (DT the)
(NN cheese))))
• CFG parsing is the reverse
 process (sentence → tree)
21

COMP90042
L14

• •
S = starting symbol
| = operator OR
Recursive, NUM and S can produce themselves
A CFG for Arithmetic Expressions
22

COMP90042
L14

Parsing Is ‘4’ a valid string?
23

COMP90042
L14

Parsing Is ‘1+2-3’ a valid string?
24

COMP90042
L14
CYK Algorithm
25

COMP90042
L14
CYK Algorithm
Bottom-up parsing
Tests whether a string is valid given a CFG, without enumerating all possible parses


• •
Core idea: form small constituents first, and merge them into larger constituents
Requirement: CFGs must be in Chomsky Normal Forms
26

COMP90042
L14

Change grammar so all rules of form: ‣ A→BC
‣ A→a
Convert rules of form A → B c into: ‣ A→BX
‣X→c

Convert to Chomsky Normal Form
27

COMP90042
L14
Convert to Chomsky Normal Form
Convert rules A → B C D into:
‣A→BY
‣Y→CD
‣ E.g. VP → VP NP NP 

for ditransitive cases, “sold [her] [the book]”


X, Y are new symbols we have introduced
28

COMP90042
L14
• •
Convert to Chomsky Normal Form CNF disallows unary rules, A → B.
Imagine NP → S; and S → NP … leads to infinitely many trees with same yield.


Replace RHS non-terminal with its productions A → B, B → cat, B → dog
A → cat, A → dog

29

COMP90042
L14
• • • • •
Convert grammar to Chomsky Normal Form (CNF) Fill in a parse table (left to right, bottom to top) Use table to derive parse
S in top right corner of table = success!
Convert result back to original grammar
The CYK Parsing Algorithm
30

COMP90042
L14

[0,1]
we
[0,2]
eat
sushi
with
chopsticks
[1,2]
[0,3]
[1,3]
[2,3]
[0,4]
[1,4]
[2,4]
[3,4]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
31

COMP90042
L14
we
V
[1,2]
eat
[1,3]
sushi
NP
[2,3]
[0,4]
[1,4]
[2,4]
IN
[3,4]
with
chopsticks

NP

[0,1]
[0,2]
[0,3]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
32

COMP90042
L14
we
V
[1,2]
eat
[1,3]
sushi
NP
[2,3]
[0,4]
[1,4]
[2,4]
IN
[3,4]
with
chopsticks

NP

[0,1]
Ø [0,2]
[0,3]
[0,5]
[1,5]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
33

COMP90042
L14

NP

[0,1]
we
Ø [0,2]
eat
[0,3]
sushi
[0,4]
with
chopsticks
V
[1,2]
VP
Split=2
[1,3]
[1,4]
[2,4]
IN
[3,4]
[0,5]
[1,5]
NP
[2,3]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
34

COMP90042
L14

NP

[0,1]
we
Ø [0,2]
V
[1,2]
eat
sushi
S
Split=1
[0,3]
VP
Split=2
[1,3]
with
chopsticks
[0,4]
[0,5]
[1,4]
[1,5]
NP
[2,3]
[2,4]
IN
[3,4]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
35

COMP90042
L14

NP

[0,1]
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
[0,5]
[1,5]
NP
[2,3]
Ø [2,4]
IN
[3,4]
[2,5]
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
36

COMP90042
L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks

NP

[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
[1,5]
NP
[2,3]
Ø [2,4]
IN
[3,4]
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
37

COMP90042
L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks

NP

[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
NP
[2,3]
Ø [1,4]
Ø [2,4]
[1,5]
IN
[3,4]
NP
Split=3
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
38

COMP90042
L14
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks

NP

[0,1]
[0,5]
V
[1,2]
VP
Split=2
[1,3]
NP
[2,3]
Ø [1,4]
Ø [2,4]
?
[1,5]
IN
[3,4]
NP
Split=3
[2,5]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
PollEv.com/jeyhanlau569
39

COMP90042
L14
40

COMP90042
L14

NP

[0,1]
we
Ø [0,2]
eat
sushi
S
Split=1
[0,3]
Ø [0,4]
with
chopsticks
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
VP
Split=2
[1,5]
NP
[2,3]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
41

COMP90042
L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
[0,5]
VP
Split=2
[1,3]
Split=3
V
[1,2]
Ø [1,4]
Ø [2,4]
Split=2
[1,5]
VP
NP
[2,3]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
42

COMP90042
L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
43

COMP90042
L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
[0,4]
sentence is valid!
ØS Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
S → NP VP NP → NP PP PP → IN NP VP → V NP VP → VP PP NP → we
NP → sushi
NP → chopsticks IN → with
V → eat
NP
[2,3]
Ø [1,4]
Split=2
[1,5]
VP
Split=3
Ø [2,4]
NP
Split=3
[2,5]
IN
[3,4]
PP
Split=4
[3,5]
[4,5]
NP
44

COMP90042
L14


S in the top-right corner of parse table indicates success
CYK: Retrieving the Parses
To get parse(s), follow pointers back for each match
45

COMP90042
L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
46

COMP90042
L14
we
eat
sushi
with
chopsticks

NP

[0,1]
Ø [0,2]
S
Split=1
[0,3]
Ø [0,4]
S
Split=1
[0,5]
V
[1,2]
VP
Split=2
[1,3]
Ø [1,4]
Ø [2,4]
IN
[3,4]
Split=2
[1,5]
NP
Split=3
[2,5]
VP
Split=3
NP
[2,3]
PP
Split=4
[3,5]
[4,5]
NP
47

COMP90042
L14
CYK Algorithm
fill the diagonals (NP → we) bottom to top
going through the ‘splits’
create the links if they are in the production rules
JM3, Ch 12
48

COMP90042
L14
Representing English with CFGs
49

COMP90042
L14 From Toy Grammars to Real Grammars

• •
Toy grammars with handful of productions good for demonstration or extremely limited domains
For real texts, we need real grammars Many thousands of production rules
50

COMP90042
L14
• • • • • • •
Sentence (S)
Noun phrase (NP)
Verb phrase (VP) Prepositional phrase (PP) Adjective phrase (AdjP) Adverbial phrase (AdvP) Subordinate clause (SBAR)
Key Constituents in Penn Treebank
51

COMP90042
L14
Example PTB/0001
( (S (NP-SBJ
(NP (NNP Pierre) (NNP Vinken) )
(, ,)
(ADJP
(NP (CD 61) (NNS years) )
(JJ old) )
(, ,) )
(VP (MD will)
(VP (VB join)
(NP (DT the) (NN board) )
(PP-CLR (IN as)
(NP (DT a) (JJ nonexecutive) (NN director) ))
(NP-TMP (NNP Nov.) (CD 29) )))
(. .) ))
52

COMP90042
L14
Basic English Sentence Structures • Declarativesentences(S→NPVP)
‣ The rat ate the cheese Imperative sentences (S → VP)
‣ Eat the cheese!
Yes/no questions (S → VB NP VP)
‣ Did the rat eat the cheese? Wh-subject-questions (S → WH VP)
‣ Who ate the cheese?
Wh-object-questions (S → WH VB NP VP) ‣ What did the rat eat?




53

COMP90042
L14


Pre-modifiers
‣ DT, CD, ADJP, NNP, NN
‣ E.g. the two very best Philly cheese steaks
English Noun Phrases
Post-modifiers
‣ PP, VP, SBAR
‣ A delivery from Bob coming today that I don’t want to miss
NP → DT? CD? ADJP? (NN|NNP)+ PP* VP? SBAR?
54

COMP90042
L14
Verb Phrases ‣ MD, AdvP, VB, TO
‣ E.g should really have tried to wait VP → (MD|VB|TO) AdvP? VP


Auxiliaries
Arguments and adjuncts ‣ NP, PP, SBAR, VP, AdvP
‣ E.g told him yesterday that I was ready
‣ E.g. gave John a gift for his birthday to make amends
VP → VB NP? NP? PP* AdvP* VP? SBAR?
55

COMP90042
L14
Other Constituents • Prepositional phrase
‣ PP→INNP
• Adjective phrase
‣ AdjP→(AdvP)JJ
• Adverb phrase
‣ AdvP→(AdvP)RB
• Subordinate clause
‣ SBAR→(IN)S
• Coordination
‣ NP→NPCCNP;VP→VPCCVP;etc.
• Complex sentences
‣ S→SSBAR;S→SBARS;etc.
in the house really nice
not too well
since I came here JackandJill
if he goes, I’ll go
56

COMP90042
L14


Context-free grammars can represent most linguistic structures of natural languages
A Final Word
There are relatively fast dynamic programming algorithms (CYK) to retrieve this structure
57

COMP90042
L14

But what about ambiguity? Often more than one tree can describe a string
Parse Ambiguity
58

COMP90042
L14

Reading E18 Ch. 9.2, 10.1
59