CS计算机代考程序代写 chain algorithm l14-context-free-grammar-v3

l14-context-free-grammar-v3

COPYRIGHT 2021, THE UNIVERSITY OF MELBOURNE
1

COMP90042
Natural Language Processing

Lecture 14
Semester 1 2021 Week 7

Jey Han Lau

Context-Free Grammar

COMP90042 L14

2

Recap

• Center embedding

‣ The cat loves Mozart

‣ The cat the dog chased loves Mozart

‣ The cat the dog the rat bit chased loves Mozart

‣ The cat the dog the rat the elephant admired bit
chased loves Mozart

• Cannot be captured by regular expressions (SnVn)

• Context-free grammar!

COMP90042 L14

3

Basics of Context-Free Grammars

• Symbols
‣ Terminal: word such as book

‣ Non-terminal: syntactic label such as NP or VP

• Productions (rules)
‣ W → X Y Z

‣ Exactly one non-terminal on left-hand side (LHS)

‣ An ordered list of symbols on right-hand side (RHS);
can be terminals or non-terminals

• Start symbol: S

convention:

lowercase for terminals

uppercase for non-terminals

COMP90042 L14

4

Why “Context Free”

• Production rule depends only on the LHS (and not
on ancestors, neighbours)

‣ Analogous to Markov chain

‣ Behaviour at each step depends only on
current state

W → X Y Z

COMP90042 L14

5

Context-Free vs. Regular

• Context-free languages more general than regular
languages

‣ Allows recursive nesting

COMP90042 L14

6

CFG Parsing

• Given production rules

‣ S → a S b

‣ S → a b

• And a string

‣ aaabbb

• Produce a valid parse tree

COMP90042 L14

7

What This Means?

• If English can be represented with CFG:

‣ first develop the production rules

‣ can then build a “parser” to automatically judge
whether a sentence is grammatical!

• But is natural language context-free?

• Not quite: cross-serial dependencies (ambncmdn)

COMP90042 L14

8

But…

• CFG strike a good balance:

‣ CFG covers most syntactic patterns

‣ CFG parsing is computational efficient

• We use CFG to describe a core fragment of
English syntax

COMP90042 L14

9

Outline

• Constituents

• CYK Algorithm

• Representing English with CFGs

COMP90042 L14

10

Constituents

COMP90042 L14

11

Syntactic Constituents

• Sentences are broken into constituents

‣ word sequence that function as a coherent
unit for linguistic analysis

‣ helps build CFG production rules

• Constituents have certain key properties:

‣ movement

‣ substitution

‣ coordination

COMP90042 L14

12

Movement

• Constituents can be moved around sentences

‣ Abigail gave [her brother] [a fish]

‣ Abigail gave [a fish] to [her brother]

• Contrast: [gave her], [brother a]

COMP90042 L14

13

Substitution

• Constituents can be substituted by other phrases
of the same type

‣ Max thanked [his older sister]

‣ Max thanked [her]

• Contrast: [Max thanked], [thanked his]

COMP90042 L14

14

Coordination

• Constituents can be conjoined with coordinators
like and and or

‣ [Abigail] and [her young brother] brought a fish

‣ Abigail [bought a fish] and [gave it to Max]

‣ Abigail [bought] and [greedily ate] a fish

COMP90042 L14

15

Constituents and Phrases

• Once we identify constituents, we use phrases to
describe them

• Phrases are determined by their head word:

‣ noun phrase: her younger brother

‣ verb phrase: greedily ate it

• We can use CFG to formalise these intuitions

COMP90042 L14

16

• a lecture
• gave a lecture
• a pie
• away a pie

PollEv.com/jeyhanlau569

He gave a lecture and away a pie.
Which of the following is a constituent

http://PollEv.com/jeyhanlau569
http://PollEv.com/jeyhanlau569

COMP90042 L14

17

COMP90042 L14

18

A Simple CFG for English

Terminal symbols: rat, the, ate, cheese
Non-terminal symbols: S, NP, VP, DT, VBD, NN
Productions:

S → NP VP
NP → DT NN
VP → VBD NP
DT → the
NN → rat
NN → cheese
VBD → ate

COMP90042 L14

19

Generating Sentences with CFGs
Always start with S (the sentence/start symbol)

S
Apply a rule with S on LHS (S → NP VP), i.e substitute RHS

NP VP
Apply a rule with NP on LHS (NP → DT NN)

DT NN VP
Apply rule with DT on LHS (DT → the)

the NN VP
Apply rule with NN on LHS (NN → rat)

the rat VP

S → NP VP
NP → DT NN
VP → VBD NP
DT → the
NN → rat
NN → cheese
VBD → ate

COMP90042 L14

20

Generating Sentences with CFGs
Apply rule with VP on LHS (VP → VBD NP)

the rat VBD NP
Apply rule with VBD on LHS (VBD → ate)

the rat ate NP
Apply rule with NP on LHS (NP → DT NN)

the rat ate DT NN
Apply rule with DT on LHS (DT → the)

the rat ate the NN
Apply rule with NN on LHS (NN → cheese)

the rat ate the cheese No non-terminals
left, we’re done!

S → NP VP
NP → DT NN
VP → VBD NP
DT → the
NN → rat
NN → cheese
VBD → ate

COMP90042 L14

21

CFG Trees

• Generation corresponds to a syntactic tree
• Non-terminals are internal nodes
• Terminals are leaves

• CFG parsing is the reverse 

process (sentence → tree)

(S (NP (DT the)
(NN rat))
(VP (VBG ate)
(NP (DT the)
(NN cheese))))

COMP90042 L14

22

A CFG for Arithmetic Expressions

• S = starting symbol

• = operator OR

• Recursive, NUM and S can produce themselves

|

COMP90042 L14

23

Parsing

• Is ‘4’ a valid string?

COMP90042 L14

24

Parsing

• Is ‘1+2-3’ a valid string?

COMP90042 L14

25

CYK Algorithm

COMP90042 L14

26

CYK Algorithm

• Bottom-up parsing

• Tests whether a string is valid given a CFG,
without enumerating all possible parses

• Core idea: form small constituents first, and merge
them into larger constituents

• Requirement: CFGs must be in Chomsky Normal
Forms

COMP90042 L14

27

Convert to Chomsky Normal Form

• Change grammar so all rules of form:
‣ A → B C
‣ A → a

• Convert rules of form A → B c into:
‣ A → B X
‣ X → c

COMP90042 L14

28

Convert to Chomsky Normal Form
• Convert rules A → B C D into:
‣ A → B Y
‣ Y → C D
‣ E.g. VP → VP NP NP 


for ditransitive cases, “sold [her] [the book]”
• X, Y are new symbols we have introduced

COMP90042 L14

29

Convert to Chomsky Normal Form

• CNF disallows unary rules, A → B.

• Imagine NP → S; and S → NP … leads to
infinitely many trees with same yield.

• Replace RHS non-terminal with its productions

• A → B, B → cat, B → dog

• A → cat, A → dog

COMP90042 L14

30

The CYK Parsing Algorithm

• Convert grammar to Chomsky Normal Form (CNF)

• Fill in a parse table (left to right, bottom to top)

• Use table to derive parse

• S in top right corner of table = success!

• Convert result back to original grammar

COMP90042 L14

31

we eat sushi with chopsticks


[0,1] [0,2] [0,3] [0,4] [0,5]

[1,2] [1,3] [1,4] [1,5]

[2,3] [2,4] [2,5]

[3,4] [3,5]

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

COMP90042 L14

32

we eat sushi with chopsticks


NP


[0,1] [0,2] [0,3] [0,4] [0,5]

V

[1,2] [1,3] [1,4] [1,5]

NP

[2,3] [2,4] [2,5]

IN

[3,4] [3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

COMP90042 L14

33

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2] [0,3] [0,4] [0,5]

V

[1,2] [1,3] [1,4] [1,5]

NP

[2,3] [2,4] [2,5]

IN

[3,4] [3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

COMP90042 L14

34

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2] [0,3] [0,4] [0,5]

V

[1,2]

VP

[1,3] [1,4] [1,5]

NP

[2,3] [2,4] [2,5]

IN

[3,4] [3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

COMP90042 L14

35

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3] [0,4] [0,5]

V

[1,2]

VP

[1,3] [1,4] [1,5]

NP

[2,3] [2,4] [2,5]

IN

[3,4] [3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

Split=1

COMP90042 L14

36

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4] [0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4] [1,5]

NP

[2,3]

Ø

[2,4] [2,5]

IN

[3,4] [3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

Split=1

COMP90042 L14

37

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4] [0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4] [1,5]

NP

[2,3]

Ø

[2,4] [2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

Split=1

Split=4

COMP90042 L14

38

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4] [0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4] [1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

Split=1

Split=4

Split=3

COMP90042 L14

39

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4] [0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4]

?

[1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

Split=1

Split=4

Split=3

PollEv.com/jeyhanlau569

http://PollEv.com/jeyhanlau569
http://PollEv.com/jeyhanlau569

COMP90042 L14

40

COMP90042 L14

41

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4] [0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4]

VP

[1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

Split=1

Split=4

Split=3

Split=2

COMP90042 L14

42

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4] [0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4]

VP

[1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=2

Split=4

Split=2

Split=3

Split=1

Split=3

COMP90042 L14

43

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4]

S

[0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4]

VP

[1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=1

Split=2

Split=4

Split=3

Split=2

Split=3

Split=1

COMP90042 L14

44

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4]

S

[0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4]

VP

[1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

S → NP VP
NP → NP PP
PP → IN NP
VP → V NP
VP → VP PP
NP → we
NP → sushi
NP → chopsticks
IN → with
V → eat

Split=1

Split=2

Split=4

Split=3

Split=2

Split=3

Split=1

sentence is valid!

COMP90042 L14

45

CYK: Retrieving the Parses

• S in the top-right corner of parse table indicates
success

• To get parse(s), follow pointers back for each
match

COMP90042 L14

46

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4]

S

[0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4]

VP

[1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

Split=1

Split=2

Split=4

Split=3

Split=2

Split=3

Split=1

COMP90042 L14

47

we eat sushi with chopsticks


NP


[0,1]

Ø

[0,2]

S

[0,3]

Ø

[0,4]

S

[0,5]

V

[1,2]

VP

[1,3]

Ø

[1,4]

VP

[1,5]

NP

[2,3]

Ø

[2,4]

NP

[2,5]

IN

[3,4]

PP

[3,5]

NP

[4,5]

Split=1

Split=2

Split=4

Split=3

Split=2

Split=3

Split=1

COMP90042 L14

48

CYK Algorithm

JM3, Ch 12

fill the diagonals (NP → we)

bottom to top
going through the ‘splits’

create the links if they are in the production rules

COMP90042 L14

49

Representing English with CFGs

COMP90042 L14

50

From Toy Grammars to Real Grammars

• Toy grammars with handful of productions good for
demonstration or extremely limited domains

• For real texts, we need real grammars

• Many thousands of production rules

COMP90042 L14

51

Key Constituents in Penn Treebank

• Sentence (S)

• Noun phrase (NP)

• Verb phrase (VP)

• Prepositional phrase (PP)

• Adjective phrase (AdjP)

• Adverbial phrase (AdvP)

• Subordinate clause (SBAR)

COMP90042 L14

52

Example PTB/0001

( (S 
    (NP-SBJ 
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,) 
      (ADJP 
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
    (VP (MD will) 
      (VP (VB join) 
        (NP (DT the) (NN board) )
        (PP-CLR (IN as) 
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))

COMP90042 L14

53

Basic English Sentence Structures

• Declarative sentences (S → NP VP)
‣ The rat ate the cheese

• Imperative sentences (S → VP)
‣ Eat the cheese!

• Yes/no questions (S → VB NP VP)
‣ Did the rat eat the cheese?

• Wh-subject-questions (S → WH VP)
‣ Who ate the cheese?

• Wh-object-questions (S → WH VB NP VP)
‣ What did the rat eat?

COMP90042 L14

54

English Noun Phrases

• Pre-modifiers
‣ DT, CD, ADJP, NNP, NN
‣ E.g. the two very best Philly cheese steaks

• Post-modifiers
‣ PP, VP, SBAR
‣ A delivery from Bob coming today that I don’t want to

miss

NP → DT? CD? ADJP? (NN|NNP)+ PP* VP? SBAR?

COMP90042 L14

55

Verb Phrases

• Auxiliaries
‣ MD, AdvP, VB, TO
‣ E.g should really have tried to wait

VP → (MD|VB|TO) AdvP? VP

• Arguments and adjuncts
‣ NP, PP, SBAR, VP, AdvP
‣ E.g told him yesterday that I was ready
‣ E.g. gave John a gift for his birthday to make amends

VP → VB NP? NP? PP* AdvP* VP? SBAR?

COMP90042 L14

56

Other Constituents
• Prepositional phrase

‣ PP → IN NP in the house

• Adjective phrase
‣ AdjP → (AdvP) JJ really nice

• Adverb phrase
‣ AdvP → (AdvP) RB not too well

• Subordinate clause
‣ SBAR → (IN) S since I came here

• Coordination
‣ NP → NP CC NP; VP → VP CC VP; etc. Jack and Jill

• Complex sentences
‣ S → S SBAR; S → SBAR S; etc. if he goes, I’ll go

COMP90042 L14

57

A Final Word

• Context-free grammars can represent most
linguistic structures of natural languages

• There are relatively fast dynamic programming
algorithms (CYK) to retrieve this structure

COMP90042 L14

58

Parse Ambiguity

• But what about ambiguity? Often more than one
tree can describe a string

COMP90042 L14

59

Reading

• E18 Ch. 9.2, 10.1