程序代写代做代考 python data structure algorithm Computational Linguistics

Computational Linguistics

Computational

Linguistics

Copyright © 2017 Suzanne

Stevenson , Graeme Hirst

and Gerald Penn. All rights

reserved.

3

3. Chart parsing

Gerald Penn

Department of Computer Science, University of Toronto

CSC 2501 / 485

Fall 2018

Reading: Jurafsky & Martin: 13.3–4. Allen: 3.4, 3.6.

Bird et al: 8.4, online extras 8.2 to end of

section “Chart Parsing in NLTK”.

Efficient parsing
• Want to avoid problems of blind search:

• Avoid redoing analyses that are identical in more
than one path of the search.

• Guide the analysis with both

• the actual input

• the expectations that follow from the choice of a
grammar rule.

• Combine strengths of both top-down and
bottom-up methods.

2

Efficient parsing
• Want to avoid problems of blind search:

• Avoid redoing analyses that are identical in
more than one path of the search.

• Guide the analysis with both

• the actual input

• the expectations that follow from the choice of a
grammar rule.

• Combine strengths of both top-down and
bottom-up methods.

3

4

• Main idea:

• Use data structures to maintain information:
a chart and an agenda

• Agenda:

• List of constituents that need to be processed.

• Chart:

• Records (“memoizes”) work; obviates repetition.

• Related ideas: Well-formed substring table (WFST);
CKY parsing; Earley parsing; dynamic program-
ming.

Chart parsing

Charts 1
• Notation for positions in sentence from 0 to n

(length of sentence):

• 0 The 1 kids 2 opened 3 the 4 box 5

5
From: Steven Bird, Ewan Klein, and Edward Loper, Natural Language

Processing in Python, v. 9.5.3, July 2008. Used under Creative Commons

licence.

RHS not

really there

6

• Contents of chart:

1. Completed constituents (inactive arcs).

• Representation: Labelled arc (edge) from one
point in sentence to another (or same point).

• Directed; always left-to-right (or to self).

• Label is the left nonterminal of the grammar rule
that derived it.

Charts 2

7

• Contents of chart:

2. Partially built constituents (also called active
arcs).
Can think of them as hypotheses.

• Representation: Labelled arc (edge) from one
point in sentence to another (or same point).

• Directed; always left-to-right (or to self).

• Label is grammar rule used for arc.

Charts 2

Notation for arc labels
• Notation: ‘•’ means ‘complete to here’.

• A → X Y • Z
‘In parsing an A, we’ve so far seen an X and a Y,
and our A will be complete once we’ve seen a Z.’

• A → X Y Z •
‘We have seen an X, a Y, and a Z, and hence
completed the parse of an A.’

• A → • X Y Z
‘In parsing an A, so far we haven’t seen anything.’

8

9

F
ro

m
:
S

te
v
e
n
B

ir
d
,

E
w

a
n
K

le
in

,
a
n
d
E

d
w

a
rd

L
o
p
e
r,

N
a
tu

ra
l
L
a
n
g
u
a
g
e

P
ro

c
e
s
s
in

g
i
n
P

y
th

o
n

,
v
.
9
.5

.3
,
J
u
ly

2
0
0
8
.

U
s
e
d
u

n
d
e
r

C
re

a
ti
v
e
C

o
m

m
o
n
s

li
c
e
n
c
e
.

VP → V NP •

VP → V • NP

Fundamental rule of chart parsing

• Arc extension:

Let X, Y, Z be sequences of symbols, where X
and Y are possibly empty.

If the chart contains an active arc from i to j of
the form

A → X • B Y
and a completed arc from j to k of the form

B → Z • or B → word
then add an arc from i to k

A → X B • Y

10

B → Z •A → X • B Y

11
Adapted from: Steven Bird, Ewan Klein, and Edward Loper, Natural Language

Processing in Python, v. 9.5.3, July 2008. Used under Creative Commons

licence.

A → X B • Y

B → Z •

12

Part of a chart from the NLTK

chart parser demo,
nltk.app.chartparser()

13

Part of a chart from the NLTK chart parser demo,
nltk.app.chartparser()

Charts 3
• An arc can connect any positions i, j

(0 ≤ i ≤ j ≤ n).

• Can have > 1 arc on any i,j…

• But only one label for any i-j arc!

• Can associate all arcs on positions i,j with cell
ij of upper-triangular matrix.

14

15

0 1 2 3 4 5 6 7

0

1

2

3

4

5

6

7

The matrix for a seven-word sentence

from the NLTK chart parser demo
nltk.app.chartparser()

Arcs in top right

corner cell cover

the whole sentence.

Those for S are

parse edges.

Bottom-up arc-addition rule

• Arc addition (or prediction):

If the chart contains an completed arc from
i to j of the form

A → X •
and the grammar contains a rule

B → A Z
then add an arc from i to i

B → • A Z
or an arc B → A • Z from i to j.

17

18
Adapted from: Steven Bird, Ewan Klein, and Edward Loper, Natural Language

Processing in Python, v. 9.5.3, July 2008. Used under Creative Commons

licence.

A → X •

B → A • Z

B → • A Z

19

• Initialize chart with each word in the input
sentence (and, in effect, with their lexical
categories).

• Loop until nothing more happens:

• Apply the bottom-up prediction rule wherever you
can.

• Apply the fundamental rule wherever you can.

• Return the trees corresponding to the parse
edges in the chart.

Implies that trees are built as parse progresses and are associated with each arc,
or that each arc keeps pointers to the arcs of its constituents to allow post hoc
reconstruction of trees.

Bottom-up chart parsing BKL’s view

20

>>> nltk.app.chartparser()

Top-down Init Rule

Top-down

Predict Rule

Bottom-up

Left-Corner Strategy

Top-down Strategy

Bottom-up Strategy

Bottom-up Predict Rule

Bottom-up Left-Corner

Predict Rule

Fundamental Rule

Reset Parser

21

• Builds all constituents exactly once (almost –
at least it won’t add more than one inactive
edge with the same label and i-j).

• Never re-computes the prefix of an RHS (of
the same rule – it will if two rules share the
same prefix).

• Exploits context-free nature of rules to reduce
the search. How?

Observations

Controlling the process
• “Wherever you can”: too uncontrolled.

Try to avoid predictions and expansions that
will lead nowhere.

• So use agenda — a list of completed arcs.

• When an arc is completed, it is initially added to
the agenda, not the chart.

• Agenda rules decide which completed arc to move
to the chart next.

• E.g., treat agenda as stack or as queue; or pick
item that looks “most efficient” or “most likely”; or
pick NPs first; or ….

22

23

• Initialize agenda with the list of lexical
categories of each word in the input
sentence.

• Until agenda is empty, repeat:

– Move next constituent C from agenda to chart.

a. Find rules whose RHS starts with C and add
corresponding active arcs to the chart.

b. Find active arcs that continue with C and extend
them; add the new active arcs to the chart.

c. Find active arcs that have been completed; add
their LHS as a new constituent to the agenda.

Bottom-up chart parsing ⁓J&M’s view

33

INITIALIZE:
set Agenda = list of all possible categories of each input word

(in order of input);
set n = length of input;
set Chart = ();

ITERATE:
loop

if Agenda = () then
if there is at least one S constituent from 0 to n then
return SUCCESS
else
return FAIL
end if

else …

Bottom-up chart parsing algorithm 1

34

Set Ci,j = First(Agenda); /* Remove first item from agenda. */
/* Ci,j is a completed constituent of type C from position i to position j

*/
Add Ci,j to Chart;

ARC UPDATE:
a. BOTTOM-UP ARC ADDITION (PREDICTION):

for each grammar rule X → C X1 … XN do
Add arc X → C • X1 … XN, from i to j, to Chart;

b. ARC EXTENSION (FUNDAMENTAL RULE):
for each arc X → X1 … • C … XN, from k to i, do

Add arc X → X1 … C • … XN, from k to j, to Chart;
c. ARC COMPLETION:

for each arc X → X1 … XN C • added in step (a) or step (b) do
Move completed constituent X to Agenda;

end if
end loop

Bottom-up chart parsing algorithm 2

• Ignores useful top-down knowledge (rule
contexts).

Problem with bottom-up

chart parsing

35

36

>>> nltk.app.chartparser()

Add ambiguity to lexicon:

N → saw

V → dog

NP → N

Parse bottom-up:

the dog saw John

• Same as bottom-up, except new arcs are
added to chart only if based on predictions
from existing arcs.

• Initialize chart with unstarted active arcs for S.

S → • X Y

S → • Z Q

• Whenever an active arc is added, also add
unstarted arcs for its next needed constituent.

Top-down chart parsing

38

39

>>> nltk.app.chartparser()

Add ambiguity to lexicon:

N → saw

V → dog

NP → N

Parse top-down:

the dog saw John

41

INITIALIZE:
set Agenda = list of all possible categories of each input word

(in order of input);
set n = length of input;
set Chart = ();
for each grammar rule S → X1 … XN do

Add arc S → • X1 … XN to Chart at position 0;
apply TOP-DOWN ARC ADDITION [step (a’) below] to the new arc;

end for

ITERATE:
loop

if Agenda = () then
if there is at least one S constituent from 0 to n then
return SUCCESS
else
return FAIL
end if

else …

Top-down chart parsing algorithm 1

42

Set Ci,j = First(Agenda); /* Remove first item from agenda. */
/* Ci,j is a completed constituent of type C from position i to

position j */
Add Ci,j to Chart;

ARC UPDATE:
b. ARC EXTENSION (FUNDAMENTAL RULE):

for each arc X → X1 … • C … XN, from k to i, do
Add arc X → X1 … C • … XN, from k to j, to Chart;

a’. TOP-DOWN ARC ADDITION (PREDICTION):
/* Recursive: until no new arcs can be added */

for each arc X → X1 … • XL … XN, from k to j, added in
step (b) or (a’), do

Add arc XL → • Y1 … YM, from j to j, to Chart;
c. ARC COMPLETION:

for each arc X → X1 … XN C • added in step (b) do
Move completed constituent X to Agenda;

end if
end loop

Top-down chart parsing algorithm 2

Notes on chart parsing
• Chart parsing separates:

1. Policy for selecting constituent from agenda;

2. Policy for adding new arcs to chart;

3. Policy for initializing chart and agenda.

• “Top-down” and “bottom-up” now refer to arc-
addition rule.

• Initialization rule gives bottom-up aspect in either
case.

• Polynomial algorithm (around O(n3)), instead
of exponential.

43