LECTURE 10
Grammars and Parsing
Arkaitz Zubiaga, 7th February, 2018
2
What is parsing?
What are constituency and dependency structures?
Probabilistic parsing: Context-Free Grammars (CFG).
Lexicalised parsing.
Dependency parsing.
LECTURE 10: CONTENTS
3
Parsing: process of recognising a sentence and assigning
syntactic structure to it.
Syntactic structure includes:
Constituents: how words group together to behave as single
units.
Grammatical relations, dependencies, subcategorisation:
how words and constituents relate to each other.
GRAMMARS AND PARSING
4
Examples of constituents: Noun Phrase (NP), Verb Phrase (VP)
Grammatical relations:
She ate a large breakfast
“She”: SUBJECT, “large breakfast”: OBJECT
GRAMMARS AND PARSING: EXAMPLE
5
Parsing is key for applications needing deep understanding of
language, e.g.:
Grammar checkers.
Information extraction.
Machine translation.
Dialogue management.
Question answering.
Recently, with deep learning and word embeddings, there has been
ongoing discussion as to whether it is still needed.
WHY IS SYNTAX IMPORTANT?
6
Constituency
Grammatical relations
Dependencies and Heads
Subcategorisation
Key formalisms: Context-Free Grammars, Dependency Grammars
Resources: Treebanks
Algorithms for parsing
SYNTACTIC CONCEPTS THAT WE WILL EXPLORE
7
In general, words forming constituents can move around
together, e.g.:
On 7th September I’d like to fly from London to New York.
I’d like to fly from London to New York on 7th September.
On September I’d like to fly from London to New York 7th. (ungrammatical: the constituent “7th September” cannot be split)
e.g. a prepositional phrase (PP) would need a preposition
followed by a noun (e.g. “in September”, “on time”)
CONSTITUENCY
8
Analysts said Mr. Stronach wants to resume a more influential role in running the company.
CONSTITUENCY (PHRASE STRUCTURE)
9
Dependency structure shows which words depend on (modify
or are arguments of) which other words.
DEPENDENCY STRUCTURE
The boy put the tortoise on the rug
10
The computer on the 3rd floor has crashed.
What has crashed?
The computer.
The 3rd floor.
HOW CAN PARSING AND STRUCTURE HELP?
11
Three different ways of parsing texts:
Probabilistic parsing: context-free grammars.
Lexicalised parsing.
Dependency parsing.
APPROACHES TO PARSING
PROBABILISTIC PARSING:
CONTEXT-FREE GRAMMARS
13
Context-free grammars (CFGs) consist of:
Terminals.
Non-terminals.
Rules.
which allow us to produce grammatical sentences.
CONTEXT-FREE GRAMMARS (CFG)
14
S → NP VP
VP → V NP PP
NP → N
PP → P NP
N → children
N → candy
N → money
V → buy
V → eat
P → with
CFG EXAMPLE
Non-terminals: S, NP, VP, PP, N, V, P. Terminals: the words (children, candy, money, buy, eat, with). Rules: the productions listed above. (See the sketch below.)
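As a minimal sketch (not part of the lecture), the example grammar can be written down directly as Python data and used to generate grammatical sentences; the `rules` layout and the `generate` function name are my own choices.

```python
import random

# The CFG from the slide as plain Python data: each non-terminal maps to a
# list of possible right-hand sides; terminals are lower-case words.
rules = {
    'S':  [['NP', 'VP']],
    'VP': [['V', 'NP', 'PP']],
    'NP': [['N']],
    'PP': [['P', 'NP']],
    'N':  [['children'], ['candy'], ['money']],
    'V':  [['buy'], ['eat']],
    'P':  [['with']],
}

def generate(symbol='S'):
    """Expand a symbol by applying rules until only terminals remain."""
    if symbol not in rules:               # terminal: a word of the sentence
        return [symbol]
    rhs = random.choice(rules[symbol])    # pick one right-hand side
    return [word for child in rhs for word in generate(child)]

print(' '.join(generate()))   # e.g. "children buy candy with money"
```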
15
S → NP VP
VP → V NP PP
NP → N
PP → P NP
N → children
N → candy
N → money
V → buy
V → eat
P → with
CFG EXAMPLE
Parse tree for “children buy candy with money”:
(S (NP (N children))
   (VP (V buy) (NP (N candy)) (PP (P with) (NP (N money)))))
16
Like CFG, but each rule has a probability.
S → NP VP (1.0)
VP → V NP (0.6)
VP → V NP PP (0.4)
N → children (0.5)
N → candy (0.3)
N → money (0.2)
…
PROBABILISTIC CFG (PCFG)
(The probabilities of all rules with the same left-hand side sum to 1.0; see the sketch below.)
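A quick sketch (not from the slides) storing the rules above with their probabilities and checking that each left-hand side sums to 1; the `pcfg` dictionary layout is an assumption of mine.

```python
from collections import defaultdict

# The PCFG rules shown on the slide, with their probabilities.
pcfg = {
    ('S',  ('NP', 'VP')):      1.0,
    ('VP', ('V', 'NP')):       0.6,
    ('VP', ('V', 'NP', 'PP')): 0.4,
    ('N',  ('children',)):     0.5,
    ('N',  ('candy',)):        0.3,
    ('N',  ('money',)):        0.2,
}

# In a proper PCFG, the probabilities of all rules sharing the same
# left-hand side must sum to 1.
totals = defaultdict(float)
for (lhs, rhs), p in pcfg.items():
    totals[lhs] += p
print(dict(totals))   # each left-hand side totals (approximately) 1.0
```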
17
Cocke-Kasami-Younger (CKY) parsing.
We have the sentence:
children buy candy with money
We aim to infer the whole tree, shown below in bracketed form.
CKY PARSING ALGORITHM
Target parse tree:
(S (NP (N children))
   (VP (V buy) (NP (N candy)) (PP (P with) (NP (N money)))))
18
Given the complete grammar (terminals T, non-terminals N, rules R),
use a bottom-up approach to fill in the parse chart (see the sketch below).
CKY PARSING ALGORITHM
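The chart-filling procedure on the following slides can be written down compactly. Below is a minimal Python sketch of probabilistic (Viterbi) CKY, not taken from the lecture: rules are split into lexical, unary and binary rules, and `chart[i][j][X]` holds the best probability of non-terminal X covering words i..j. Backpointers (needed to recover the actual tree) are omitted for brevity, and all names are my own.

```python
from collections import defaultdict

def pcfg_cky(words, lexical, binary, unary):
    """Viterbi CKY for a PCFG.

    lexical: {(POS, word): prob}     e.g. ('N', 'people'): 0.5
    unary:   {(A, B): prob}          e.g. ('NP', 'N'): 0.7
    binary:  {(A, B, C): prob}       e.g. ('S', 'NP', 'VP'): 0.9
    Returns a chart where chart[i][j][X] is the best probability of
    non-terminal X covering words[i:j].
    """
    n = len(words)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    def apply_unaries(i, j):
        # Keep applying unary rules A -> B while any score improves.
        changed = True
        while changed:
            changed = False
            for (a, b), p in unary.items():
                score = p * chart[i][j][b]
                if score > chart[i][j][a]:
                    chart[i][j][a] = score
                    changed = True

    # Length-1 spans: lexical rules (POS tags), then unary closure.
    for i, w in enumerate(words):
        for (pos, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][pos] = max(chart[i][i + 1][pos], p)
        apply_unaries(i, i + 1)

    # Longer spans, bottom-up, in the order shown on the following slides.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for split in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    score = p * chart[i][split][b] * chart[split][j][c]
                    if score > chart[i][j][a]:
                        chart[i][j][a] = score
            apply_unaries(i, j)

    return chart
```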
19
CKY PARSING ALGORITHM
children buy candy with money
20
CKY PARSING ALGORITHM
children buy candy with money
Step 1: fill the five length-1 cells (one cell per word).
21
CKY PARSING ALGORITHM
children buy candy with money
Step 1: the five length-1 cells.
Step 2: the four length-2 cells.
22
CKY PARSING ALGORITHM
children buy candy with money
Step 1: the five length-1 cells.
Step 2: the four length-2 cells.
Step 3: the three length-3 cells.
Step 4: the two length-4 cells.
Step 5: the single length-5 cell covering the whole sentence.
23
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
24
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
25
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
26
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
27
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
The Viterbi (maximum) score
determines the predicted
parsed tree via CKY.
28
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
It is larger, but it does not belong to an “S → …” rule
(i.e. it does not cover the sentence as a whole S).
29
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
If there were more levels
(upwards) in the tree, then
we’d choose this one, and
carry on.
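Running the CKY sketch from slide 18 on this grammar (rule and word probabilities copied from the slides above) reproduces the Viterbi score for the whole sentence; the rule grouping into lexical/unary/binary dictionaries is my own.

```python
# PCFG from the slides, split into lexical, unary and binary rules.
lexical = {('N', 'people'): 0.5, ('V', 'people'): 0.1,
           ('N', 'fish'): 0.2,   ('V', 'fish'): 0.6}
unary = {('S', 'VP'): 0.1, ('VP', 'V'): 0.1, ('NP', 'N'): 0.7}
binary = {('S', 'NP', 'VP'): 0.9, ('VP', 'V', 'NP'): 0.5,
          ('VP', 'V', '@VP_V'): 0.3, ('VP', 'V', 'PP'): 0.1,
          ('@VP_V', 'NP', 'PP'): 1.0, ('NP', 'NP', 'NP'): 0.1,
          ('NP', 'NP', 'PP'): 0.2, ('PP', 'P', 'NP'): 1.0}

chart = pcfg_cky(['people', 'fish'], lexical, binary, unary)
print(chart[0][2]['S'])   # ≈ 0.0189 = 0.9 * 0.35 * 0.06, as on the slide
```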
30
So far we have only considered the probabilities of POS-based
rules, e.g. P(JJ NP) = 0.3.
We can “lexicalise” that by considering the probabilities of
words occurring in different constituents, e.g.:
“money”: noun (most common, P=0.9) or adjective (P=0.1).
“money laundering”: can be JJ NP or NP NP.
However, “money” is more likely to occur in NP NP than in
JJ NP → this increases P(NP NP).
LEXICALISATION OF PCFGs
31
Probability of different verbal complement frames (i.e.,
“subcategorisations”) depends on the verb:
LEXICALISATION OF PCFGs
Local Tree come take think want
VP → V 9.5% 2.6% 4.6% 5.7%
VP → V NP 1.1% 32.1% 0.2% 13.9%
VP → V PP 34.5% 3.1% 7.1% 0.3%
VP → V SBAR 6.6% 0.3% 73.0% 0.2%
VP → V S 2.2% 1.3% 4.8% 70.8%
VP → V NP S 0.1% 5.7% 0.0% 0.3%
VP → V PRT NP 0.3% 5.8% 0.0% 0.0%
VP → V PRT PP 6.1% 1.5% 0.2% 0.0%
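A lexicalised model effectively stores a separate distribution over expansions per head verb. A minimal sketch of part of the table above as data (percentages copied from the slide; the `subcat` dictionary and the selection of cells are my own):

```python
# Lexicalised expansion preferences P(local tree | verb), as percentages
# from the table above (only a few cells shown).
subcat = {
    'come':  {'VP -> V': 9.5, 'VP -> V NP': 1.1, 'VP -> V PP': 34.5},
    'take':  {'VP -> V': 2.6, 'VP -> V NP': 32.1, 'VP -> V PP': 3.1},
    'think': {'VP -> V': 4.6, 'VP -> V NP': 0.2, 'VP -> V SBAR': 73.0},
    'want':  {'VP -> V': 5.7, 'VP -> V NP': 13.9, 'VP -> V S': 70.8},
}

# A lexicalised PCFG scores the same expansion differently per head verb:
print(subcat['take']['VP -> V NP'])    # 32.1 -- "take" strongly prefers a direct object
print(subcat['think']['VP -> V NP'])   # 0.2  -- "think" almost never takes one
```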
32
Trees can be more complex.
CKY: break the tree down bottom-up; POS tags for the leaves, then work upwards.
CONSTITUENT TREE
33
PENN TREEBANK NON-TERMINALS
34
Dependent on the entire tree, e.g. “eat sushi” is different in the
examples below.
STATE-OF-THE-ART
35
PCFG achieves ~73% on the Penn Treebank.
State-of-the-art ~92%: lexicalised PCFG.
STATE-OF-THE-ART
DEPENDENCY PARSING
37
Dependency syntax postulates that syntactic structure consists of
lexical items linked by binary asymmetric relations (“arrows”)
called dependencies.
DEPENDENCY PARSING
38
DEPENDENCY PARSING
39
DEPENDENCY PARSING
[Figure: dependency tree for “Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas”, with labelled arcs such as nsubjpass, auxpass, nn, appos, prep, pobj, cc and conj.]
40
The arrow connects a head (governor, superior, regent)
with a dependent (modifier, inferior, subordinate).
DEPENDENCY PARSING
[Figure: the same dependency tree as on the previous slide.]
41
How do we decide which word is the head?
Heads are usually defined by rules over PCFG productions, marking one child of each rule as the head (see the sketch below), e.g.:
S → NP VP (the VP is the head)
VP → VBD NP PP (the verb VBD is the head)
NP → DT JJ NN (the noun NN is the head)
DEPENDENCY PARSING
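One common way to pick heads is a small table of head rules per parent category. A minimal sketch follows; these particular rules and names are illustrative (loosely in the spirit of standard head-finding tables), not the exact table used by any specific parser.

```python
# Illustrative head rules: for each parent category, the child categories to
# prefer (in priority order) when choosing the head of a production.
HEAD_RULES = {
    'S':  ['VP', 'NP'],
    'VP': ['VBD', 'VB', 'VP', 'NP'],
    'NP': ['NN', 'NNS', 'NP', 'JJ'],
    'PP': ['IN', 'NP'],
}

def head_child(lhs, children):
    """Return the index of the head child in a production lhs -> children."""
    for cat in HEAD_RULES.get(lhs, []):
        if cat in children:
            return children.index(cat)
    return len(children) - 1          # fall back to the rightmost child

print(head_child('VP', ['VBD', 'NP', 'PP']))   # 0: the verb heads the VP
print(head_child('NP', ['DT', 'JJ', 'NN']))    # 2: the noun heads the NP
```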
42
Graph Algorithms:
Consider all word pairs.
Create a Maximum Spanning Tree for the sentence (see the sketch after this slide).
Transition-based Approaches:
Similar to how we parse a program:
Shift-Reduce Parser: MaltParser.
DEPENDENCY PARSING
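For the graph-based approach, a minimal sketch assuming the networkx library is available and using made-up arc scores: as in McDonald et al. (2005), the best dependency tree is the maximum spanning arborescence (Chu-Liu/Edmonds) of a directed graph whose edge weights are the arc scores.

```python
import networkx as nx

# Made-up scores for candidate arcs head -> dependent; a real parser
# would obtain these from a trained model scoring every word pair.
scores = {('ROOT', 'ate'): 10, ('ROOT', 'She'): 2,
          ('ate', 'She'): 9, ('ate', 'breakfast'): 8,
          ('She', 'breakfast'): 1, ('breakfast', 'She'): 1}

G = nx.DiGraph()
for (head, dep), s in scores.items():
    G.add_edge(head, dep, weight=s)

# The highest-scoring tree in which every word has exactly one head:
tree = nx.maximum_spanning_arborescence(G)
print(sorted(tree.edges()))   # [('ROOT', 'ate'), ('ate', 'She'), ('ate', 'breakfast')]
```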
43
MALTPARSER (Nivre, 2008)
A simple form of greedy discriminative dependency parser.
The parser performs a sequence of bottom-up actions.
The parser has:
a stack σ, written with top to the right
starts with the ROOT symbol
a buffer β, written with top to the left
starts with the input sentence
a set of dependency arcs A, which starts off empty
a set of actions
44
MALTPARSER (Nivre, 2008)
Start: σ = [ROOT], β = w_1, …, w_n, A = ∅
1. Left-Arc_r: σ|w_i, w_j|β, A → σ, w_j|β, A ∪ {r(w_j, w_i)}
   Precondition: r'(w_k, w_i) ∉ A, and w_i ≠ ROOT
2. Right-Arc_r: σ|w_i, w_j|β, A → σ|w_i|w_j, β, A ∪ {r(w_i, w_j)}
3. Reduce: σ|w_i, β, A → σ, β, A
   Precondition: r'(w_k, w_i) ∈ A
4. Shift: σ, w_i|β, A → σ|w_i, β, A
Finish: β = ∅
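To make the transition system concrete, here is a minimal Python sketch of the four actions operating on a configuration (stack, buffer, arc set). It illustrates the arc-eager system above but is not MaltParser itself: MaltParser chooses the next action with a trained classifier, whereas here a derivation for a toy sentence is stepped through by hand, and all names are my own.

```python
class Config:
    """Parser configuration: stack σ, buffer β and arc set A."""
    def __init__(self, words):
        self.stack = ['ROOT']        # top of σ is the right end of the list
        self.buffer = list(words)    # top of β is the left end of the list
        self.arcs = set()            # triples (head, relation, dependent)

def left_arc(c, r):
    # σ|wi, wj|β -> σ, wj|β   adding r(wj, wi); wi must not be ROOT
    wi = c.stack.pop()
    c.arcs.add((c.buffer[0], r, wi))

def right_arc(c, r):
    # σ|wi, wj|β -> σ|wi|wj, β   adding r(wi, wj)
    wj = c.buffer.pop(0)
    c.arcs.add((c.stack[-1], r, wj))
    c.stack.append(wj)

def reduce_(c):
    # σ|wi, β -> σ, β   (wi must already have a head in A)
    c.stack.pop()

def shift(c):
    # σ, wi|β -> σ|wi, β
    c.stack.append(c.buffer.pop(0))

# Hand-chosen derivation for the toy sentence "She ate breakfast":
c = Config(['She', 'ate', 'breakfast'])
shift(c)                    # σ = [ROOT, She]
left_arc(c, 'nsubj')        # adds nsubj(ate, She), pops She
right_arc(c, 'root')        # adds root(ROOT, ate)
right_arc(c, 'dobj')        # adds dobj(ate, breakfast); β is now empty
print(c.arcs)
```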
45
MALTPARSER
1. Left-Arc_r: σ|w_i, w_j|β, A → σ, w_j|β, A ∪ {r(w_j, w_i)}
   Precondition: r'(w_k, w_i) ∉ A, and w_i ≠ ROOT
2. Right-Arc_r: σ|w_i, w_j|β, A → σ|w_i|w_j, β, A ∪ {r(w_i, w_j)}
3. Reduce: σ|w_i, β, A → σ, β, A
   Precondition: r'(w_k, w_i) ∈ A
4. Shift: σ, w_i|β, A → σ|w_i, β, A
1. Shift:
2. Left-arc:
3. Shift:
4. Right-arc:
46
MALTPARSER
1. Left-Arc_r: σ|w_i, w_j|β, A → σ, w_j|β, A ∪ {r(w_j, w_i)}
   Precondition: r'(w_k, w_i) ∉ A, and w_i ≠ ROOT
2. Right-Arc_r: σ|w_i, w_j|β, A → σ|w_i|w_j, β, A ∪ {r(w_i, w_j)}
3. Reduce: σ|w_i, β, A → σ, β, A
   Precondition: r'(w_k, w_i) ∈ A
4. Shift: σ, w_i|β, A → σ|w_i, β, A
4. Right-arc:
…
8. Reduce:
…
16. Reduce:
47
RESOURCES
There is a long list of treebanks available for a wide range of
languages; a good list can be found here:
https://en.wikipedia.org/wiki/Treebank
48
REFERENCES
Kasami, T. (1965). An Efficient Recognition and Syntax Analysis
Algorithm for Context-Free Languages.
Eisner, J. M. (1996, August). Three new probabilistic models for
dependency parsing: An exploration. In Proceedings of the 16th
conference on Computational linguistics-Volume 1 (pp. 340-345).
Association for Computational Linguistics.
McDonald, R., Pereira, F., Ribarov, K., & Hajič, J. (2005, October).
Non-projective dependency parsing using spanning tree algorithms. In
Proceedings of the conference on Human Language Technology and
Empirical Methods in Natural Language Processing (pp. 523-530).
Association for Computational Linguistics.
49
REFERENCES
Karlsson, F. (1990, August). Constraint grammar as a framework for
parsing running text. In Proceedings of the 13th conference on
Computational linguistics-Volume 3 (pp. 168-173). Association for
Computational Linguistics.
Nivre, J. (2008). Algorithms for deterministic incremental dependency
parsing. Computational Linguistics, 34(4), 513-553.
50
ASSOCIATED READING
Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language
Processing: An Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition. 2nd edition. Chapter 12.
Jurafsky, Daniel, and James H. Martin. Speech and Language
Processing: An Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition. 3rd edition (draft). Chapters 12-14.