LECTURE 10
Grammars and Parsing
Arkaitz Zubiaga, 7th February, 2018
2
What is parsing?
What are constituency and dependency structures?
Probabilistic parsing: Context-Free Grammars (CFG).
Lexicalised parsing.
Dependency parsing.
LECTURE 10: CONTENTS
3
Parsing: process of recognising a sentence and assigning
syntactic structure to it.
Syntactic structure includes:
Constituents: how words group together to behave as single
units.
Grammatical relations, dependencies, subcategorisation:
how words and constituents relate to each other.
GRAMMARS AND PARSING
4
Examples of constituents: Noun Phrase (NP), Verb Phrase (VP)
Grammatical relations:
She ate a large breakfast
“She”: SUBJECT, “large breakfast”: OBJECT
GRAMMARS AND PARSING: EXAMPLE
5
Parsing is key for applications needing deep understanding of
language, e.g.:
Grammar checkers.
Information extraction.
Machine translation.
Dialogue management.
Question answering.
Recently, with deep learning and word embeddings, there has been
ongoing discussion as to whether it is still needed.
WHY IS SYNTAX IMPORTANT?
6
Constituency
Grammatical relations
Dependencies and Heads
Subcategorisation
Key formalisms: Context-Free Grammars, Dependency Grammars
Resources: Treebanks
Algorithms for parsing
SYNTACTIC CONCEPTS THAT WE WILL EXPLORE
7
In general, words forming constituents can move around
together, e.g.:
On 7th September I’d like to fly from London to New York.
I’d like to fly from London to New York on 7th September.
On September I’d like to fly from London to New York 7th. (ungrammatical: the constituent “7th September” cannot be split)
e.g. a prepositional phrase (PP) would need a preposition
followed by a noun (e.g. “in September”, “on time”)
CONSTITUENCY
8
Analysts said Mr. Stronach wants to resume a more influential role in running the company.
CONSTITUENCY (PHRASE STRUCTURE)
9
Dependency structure shows which words depend on (modify
or are arguments of) which other words.
DEPENDENCY STRUCTURE
The boy put the tortoise on the rug
10
The computer on the 3rd floor has crashed.
What has crashed?
The computer.
The 3rd floor.
HOW CAN PARSING AND STRUCTURE HELP?
11
Three different ways of parsing texts:
Probabilistic parsing: context-free grammars.
Lexicalised parsing.
Dependency parsing.
APPROACHES TO PARSING
PROBABILISTIC PARSING:
CONTEXT-FREE GRAMMARS
13
Context-free grammars (CFGs) consist of:
Terminals.
Non-terminals.
Rules.
which allow us to produce grammatical sentences.
CONTEXT-FREE GRAMMARS (CFG)
14
S → NP VP
VP → V NP PP
NP → N
PP → P NP
N → children
N → candy
N → money
V → buy
V → eat
P → with
CFG EXAMPLE
Non-terminals: S, NP, VP, PP, N, V, P. Terminals: the words (children, candy, money, buy, eat, with). Rules: the productions listed above. (See the sketch below.)
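As a minimal sketch (not part of the lecture), the example grammar can be written down directly as Python data and used to generate grammatical sentences; the `rules` layout and the `generate` function name are my own choices.

```python
import random

# The CFG from the slide as plain Python data: each non-terminal maps to a
# list of possible right-hand sides; terminals are lower-case words.
rules = {
    'S':  [['NP', 'VP']],
    'VP': [['V', 'NP', 'PP']],
    'NP': [['N']],
    'PP': [['P', 'NP']],
    'N':  [['children'], ['candy'], ['money']],
    'V':  [['buy'], ['eat']],
    'P':  [['with']],
}

def generate(symbol='S'):
    """Expand a symbol by applying rules until only terminals remain."""
    if symbol not in rules:               # terminal: a word of the sentence
        return [symbol]
    rhs = random.choice(rules[symbol])    # pick one right-hand side
    return [word for child in rhs for word in generate(child)]

print(' '.join(generate()))   # e.g. "children buy candy with money"
```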
15
S → NP VP
VP → V NP PP
NP → N
PP → P NP
N → children
N → candy
N → money
V → buy
V → eat
P → with
CFG EXAMPLE
Parse tree for “children buy candy with money”:
(S (NP (N children))
   (VP (V buy) (NP (N candy)) (PP (P with) (NP (N money)))))
16
Like CFG, but each rule has a probability.
S → NP VP (1.0)
VP → V NP (0.6)
VP → V NP PP (0.4)
N → children (0.5)
N → candy (0.3)
N → money (0.2)
…
PROBABILISTIC CFG (PCFG)
(The probabilities of all rules with the same left-hand side sum to 1.0; see the sketch below.)
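A quick sketch (not from the slides) storing the rules above with their probabilities and checking that each left-hand side sums to 1; the `pcfg` dictionary layout is an assumption of mine.

```python
from collections import defaultdict

# The PCFG rules shown on the slide, with their probabilities.
pcfg = {
    ('S',  ('NP', 'VP')):      1.0,
    ('VP', ('V', 'NP')):       0.6,
    ('VP', ('V', 'NP', 'PP')): 0.4,
    ('N',  ('children',)):     0.5,
    ('N',  ('candy',)):        0.3,
    ('N',  ('money',)):        0.2,
}

# In a proper PCFG, the probabilities of all rules sharing the same
# left-hand side must sum to 1.
totals = defaultdict(float)
for (lhs, rhs), p in pcfg.items():
    totals[lhs] += p
print(dict(totals))   # each left-hand side totals (approximately) 1.0
```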
17
Cocke-Kasami-Younger (CKY) parsing.
We have the sentence:
children buy candy with money
We aim to infer the whole tree, shown below in bracketed form.
CKY PARSING ALGORITHM
Target parse tree:
(S (NP (N children))
   (VP (V buy) (NP (N candy)) (PP (P with) (NP (N money)))))
18
Given the complete grammar (terminals T, non-terminals N, rules R),
use a bottom-up approach to fill in the parse chart (see the sketch below).
CKY PARSING ALGORITHM
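The chart-filling procedure on the following slides can be written down compactly. Below is a minimal Python sketch of probabilistic (Viterbi) CKY, not taken from the lecture: rules are split into lexical, unary and binary rules, and `chart[i][j][X]` holds the best probability of non-terminal X covering words i..j. Backpointers (needed to recover the actual tree) are omitted for brevity, and all names are my own.

```python
from collections import defaultdict

def pcfg_cky(words, lexical, binary, unary):
    """Viterbi CKY for a PCFG.

    lexical: {(POS, word): prob}     e.g. ('N', 'people'): 0.5
    unary:   {(A, B): prob}          e.g. ('NP', 'N'): 0.7
    binary:  {(A, B, C): prob}       e.g. ('S', 'NP', 'VP'): 0.9
    Returns a chart where chart[i][j][X] is the best probability of
    non-terminal X covering words[i:j].
    """
    n = len(words)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]

    def apply_unaries(i, j):
        # Keep applying unary rules A -> B while any score improves.
        changed = True
        while changed:
            changed = False
            for (a, b), p in unary.items():
                score = p * chart[i][j][b]
                if score > chart[i][j][a]:
                    chart[i][j][a] = score
                    changed = True

    # Length-1 spans: lexical rules (POS tags), then unary closure.
    for i, w in enumerate(words):
        for (pos, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][pos] = max(chart[i][i + 1][pos], p)
        apply_unaries(i, i + 1)

    # Longer spans, bottom-up, in the order shown on the following slides.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for split in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    score = p * chart[i][split][b] * chart[split][j][c]
                    if score > chart[i][j][a]:
                        chart[i][j][a] = score
            apply_unaries(i, j)

    return chart
```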
19
CKY PARSING ALGORITHM
children buy candy with money
20
CKY PARSING ALGORITHM
children buy candy with money
Step 1: fill the five length-1 cells (one cell per word).
21
CKY PARSING ALGORITHM
children buy candy with money
Step 1: the five length-1 cells.
Step 2: the four length-2 cells.
22
CKY PARSING ALGORITHM
children buy candy with money
Step 1: the five length-1 cells.
Step 2: the four length-2 cells.
Step 3: the three length-3 cells.
Step 4: the two length-4 cells.
Step 5: the single length-5 cell covering the whole sentence.
23
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
24
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
25
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
S → NP VP 0.9
S → VP 0.1
VP → V NP 0.5
VP → V 0.1
VP → V @VP_V 0.3
VP → V PP 0.1
@VP_V → NP PP 1.0
NP → NP NP 0.1
NP → NP PP 0.2
NP → N 0.7
PP → P NP 1.0
26
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
27
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
The Viterbi (maximum) score
determines the predicted
parsed tree via CKY.
28
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
It is larger, but it does not belong to an “S → …” rule
(i.e. it does not cover the sentence as a whole S).
29
CKY PARSING ALGORITHM
people fish
[people] cell: NP → 0.35, V → 0.1, N → 0.5
[fish] cell: VP → 0.06, NP → 0.14, V → 0.6, N → 0.2
Rules combining the two cells: S → NP VP (0.9), VP → V NP (0.5), NP → NP NP (0.1)
0.9*0.06*0.35 = 0.0189
0.5*0.14*0.1 = 0.007
0.1*0.14*0.35 = 0.049
If there were more levels
(upwards) in the tree, then
we’d choose this one, and
carry on.
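Running the CKY sketch from slide 18 on this grammar (rule and word probabilities copied from the slides above) reproduces the Viterbi score for the whole sentence; the rule grouping into lexical/unary/binary dictionaries is my own.

```python
# PCFG from the slides, split into lexical, unary and binary rules.
lexical = {('N', 'people'): 0.5, ('V', 'people'): 0.1,
           ('N', 'fish'): 0.2,   ('V', 'fish'): 0.6}
unary = {('S', 'VP'): 0.1, ('VP', 'V'): 0.1, ('NP', 'N'): 0.7}
binary = {('S', 'NP', 'VP'): 0.9, ('VP', 'V', 'NP'): 0.5,
          ('VP', 'V', '@VP_V'): 0.3, ('VP', 'V', 'PP'): 0.1,
          ('@VP_V', 'NP', 'PP'): 1.0, ('NP', 'NP', 'NP'): 0.1,
          ('NP', 'NP', 'PP'): 0.2, ('PP', 'P', 'NP'): 1.0}

chart = pcfg_cky(['people', 'fish'], lexical, binary, unary)
print(chart[0][2]['S'])   # ≈ 0.0189 = 0.9 * 0.35 * 0.06, as on the slide
```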
30
So far we have only considered the probabilities of POS-based
rules, e.g. P(JJ NP) = 0.3.
We can “lexicalise” that by considering the probabilities of
words occurring in different constituents, e.g.:
“money”: noun (most common, P=0.9) or adjective (P=0.1).
“money laundering”: can be JJ NP or NP NP.
However, “money” is more likely to occur in NP NP than in
JJ NP → this increases P(NP NP).
LEXICALISATION OF PCFGs
31
Probability of different verbal complement frames (i.e.,
“subcategorisations”) depends on the verb:
LEXICALISATION OF PCFGs
Local Tree come take think want
VP → V 9.5% 2.6% 4.6% 5.7%
VP → V NP 1.1% 32.1% 0.2% 13.9%
VP → V PP 34.5% 3.1% 7.1% 0.3%
VP → V SBAR 6.6% 0.3% 73.0% 0.2%
VP → V S 2.2% 1.3% 4.8% 70.8%
VP → V NP S 0.1% 5.7% 0.0% 0.3%
VP → V PRT NP 0.3% 5.8% 0.0% 0.0%
VP → V PRT PP 6.1% 1.5% 0.2% 0.0%
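A lexicalised model effectively stores a separate distribution over expansions per head verb. A minimal sketch of part of the table above as data (percentages copied from the slide; the `subcat` dictionary and the selection of cells are my own):

```python
# Lexicalised expansion preferences P(local tree | verb), as percentages
# from the table above (only a few cells shown).
subcat = {
    'come':  {'VP -> V': 9.5, 'VP -> V NP': 1.1, 'VP -> V PP': 34.5},
    'take':  {'VP -> V': 2.6, 'VP -> V NP': 32.1, 'VP -> V PP': 3.1},
    'think': {'VP -> V': 4.6, 'VP -> V NP': 0.2, 'VP -> V SBAR': 73.0},
    'want':  {'VP -> V': 5.7, 'VP -> V NP': 13.9, 'VP -> V S': 70.8},
}

# A lexicalised PCFG scores the same expansion differently per head verb:
print(subcat['take']['VP -> V NP'])    # 32.1 -- "take" strongly prefers a direct object
print(subcat['think']['VP -> V NP'])   # 0.2  -- "think" almost never takes one
```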
32
Trees can be more complex.
CKY: break the tree down bottom-up; POS tags for the leaves, then work upwards.
CONSTITUENT TREE
33
PENN TREEBANK NON-TERMINALS
34
Dependent on the entire tree, e.g. “eat sushi” is different in the
examples below.
STATE-OF-THE-ART
35
PCFG achieves ~73% on the Penn Treebank.
State-of-the-art ~92%: lexicalised PCFG.
STATE-OF-THE-ART
DEPENDENCY PARSING
37
Dependency syntax postulates that syntactic structure consists of
lexical items linked by binary asymmetric relations (“arrows”)
called dependencies.
DEPENDENCY PARSING
38
DEPENDENCY PARSING
39
DEPENDENCY PARSING
[Figure: dependency tree for “Bills on ports and immigration were submitted by Senator Brownback, Republican of Kansas”, with labelled arcs such as nsubjpass, auxpass, nn, appos, prep, pobj, cc and conj.]
40
The arrow connects a head (governor, superior, regent)
with a dependent (modifier, inferior, subordinate).
DEPENDENCY PARSING
[Figure: the same dependency tree as on the previous slide.]
41
How do we decide which word is the head?
Heads are usually defined by rules over PCFG productions, marking one child of each rule as the head (see the sketch below), e.g.:
S → NP VP (the VP is the head)
VP → VBD NP PP (the verb VBD is the head)
NP → DT JJ NN (the noun NN is the head)
DEPENDENCY PARSING
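One common way to pick heads is a small table of head rules per parent category. A minimal sketch follows; these particular rules and names are illustrative (loosely in the spirit of standard head-finding tables), not the exact table used by any specific parser.

```python
# Illustrative head rules: for each parent category, the child categories to
# prefer (in priority order) when choosing the head of a production.
HEAD_RULES = {
    'S':  ['VP', 'NP'],
    'VP': ['VBD', 'VB', 'VP', 'NP'],
    'NP': ['NN', 'NNS', 'NP', 'JJ'],
    'PP': ['IN', 'NP'],
}

def head_child(lhs, children):
    """Return the index of the head child in a production lhs -> children."""
    for cat in HEAD_RULES.get(lhs, []):
        if cat in children:
            return children.index(cat)
    return len(children) - 1          # fall back to the rightmost child

print(head_child('VP', ['VBD', 'NP', 'PP']))   # 0: the verb heads the VP
print(head_child('NP', ['DT', 'JJ', 'NN']))    # 2: the noun heads the NP
```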
42
Graph Algorithms:
Consider all word pairs.
Create a Maximum Spanning Tree for the sentence (see the sketch after this slide).
Transition-based Approaches:
Similar to how we parse a program:
Shift-Reduce Parser: MaltParser.
DEPENDENCY PARSING
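For the graph-based approach, a minimal sketch assuming the networkx library is available and using made-up arc scores: as in McDonald et al. (2005), the best dependency tree is the maximum spanning arborescence (Chu-Liu/Edmonds) of a directed graph whose edge weights are the arc scores.

```python
import networkx as nx

# Made-up scores for candidate arcs head -> dependent; a real parser
# would obtain these from a trained model scoring every word pair.
scores = {('ROOT', 'ate'): 10, ('ROOT', 'She'): 2,
          ('ate', 'She'): 9, ('ate', 'breakfast'): 8,
          ('She', 'breakfast'): 1, ('breakfast', 'She'): 1}

G = nx.DiGraph()
for (head, dep), s in scores.items():
    G.add_edge(head, dep, weight=s)

# The highest-scoring tree in which every word has exactly one head:
tree = nx.maximum_spanning_arborescence(G)
print(sorted(tree.edges()))   # [('ROOT', 'ate'), ('ate', 'She'), ('ate', 'breakfast')]
```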
43
MALTPARSER (Nivre, 2008)
A simple form of greedy discriminative dependency parser.
The parser performs a sequence of bottom-up actions.
The parser has:
a stack σ, written with top to the right
starts with the ROOT symbol
a buffer β, written with top to the left
starts with the input sentence
a set of dependency arcs A, which starts off empty
a set of actions
44
MALTPARSER (Nivre, 2008)
Start: σ = [ROOT], β = w_1, …, w_n, A = ∅
1. Left-Arc_r: σ|w_i, w_j|β, A → σ, w_j|β, A ∪ {r(w_j, w_i)}
   Precondition: r'(w_k, w_i) ∉ A, and w_i ≠ ROOT
2. Right-Arc_r: σ|w_i, w_j|β, A → σ|w_i|w_j, β, A ∪ {r(w_i, w_j)}
3. Reduce: σ|w_i, β, A → σ, β, A
   Precondition: r'(w_k, w_i) ∈ A
4. Shift: σ, w_i|β, A → σ|w_i, β, A
Finish: β = ∅
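To make the transition system concrete, here is a minimal Python sketch of the four actions operating on a configuration (stack, buffer, arc set). It illustrates the arc-eager system above but is not MaltParser itself: MaltParser chooses the next action with a trained classifier, whereas here a derivation for a toy sentence is stepped through by hand, and all names are my own.

```python
class Config:
    """Parser configuration: stack σ, buffer β and arc set A."""
    def __init__(self, words):
        self.stack = ['ROOT']        # top of σ is the right end of the list
        self.buffer = list(words)    # top of β is the left end of the list
        self.arcs = set()            # triples (head, relation, dependent)

def left_arc(c, r):
    # σ|wi, wj|β -> σ, wj|β   adding r(wj, wi); wi must not be ROOT
    wi = c.stack.pop()
    c.arcs.add((c.buffer[0], r, wi))

def right_arc(c, r):
    # σ|wi, wj|β -> σ|wi|wj, β   adding r(wi, wj)
    wj = c.buffer.pop(0)
    c.arcs.add((c.stack[-1], r, wj))
    c.stack.append(wj)

def reduce_(c):
    # σ|wi, β -> σ, β   (wi must already have a head in A)
    c.stack.pop()

def shift(c):
    # σ, wi|β -> σ|wi, β
    c.stack.append(c.buffer.pop(0))

# Hand-chosen derivation for the toy sentence "She ate breakfast":
c = Config(['She', 'ate', 'breakfast'])
shift(c)                    # σ = [ROOT, She]
left_arc(c, 'nsubj')        # adds nsubj(ate, She), pops She
right_arc(c, 'root')        # adds root(ROOT, ate)
right_arc(c, 'dobj')        # adds dobj(ate, breakfast); β is now empty
print(c.arcs)
```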
45
MALTPARSER
1. Left-Arc_r: σ|w_i, w_j|β, A → σ, w_j|β, A ∪ {r(w_j, w_i)}
   Precondition: r'(w_k, w_i) ∉ A, and w_i ≠ ROOT
2. Right-Arc_r: σ|w_i, w_j|β, A → σ|w_i|w_j, β, A ∪ {r(w_i, w_j)}
3. Reduce: σ|w_i, β, A → σ, β, A
   Precondition: r'(w_k, w_i) ∈ A
4. Shift: σ, w_i|β, A → σ|w_i, β, A
1. Shift:
2. Left-arc:
3. Shift:
4. Right-arc:
46
MALTPARSER
1. Left-Arc_r: σ|w_i, w_j|β, A → σ, w_j|β, A ∪ {r(w_j, w_i)}
   Precondition: r'(w_k, w_i) ∉ A, and w_i ≠ ROOT
2. Right-Arc_r: σ|w_i, w_j|β, A → σ|w_i|w_j, β, A ∪ {r(w_i, w_j)}
3. Reduce: σ|w_i, β, A → σ, β, A
   Precondition: r'(w_k, w_i) ∈ A
4. Shift: σ, w_i|β, A → σ|w_i, β, A
4. Right-arc:
…
8. Reduce:
…
16. Reduce:
47
RESOURCES
There is a long list of treebanks available for a wide range of
languages; a good list can be found here:
https://en.wikipedia.org/wiki/Treebank
48
REFERENCES
Kasami, T. (1965). An Efficient Recognition and Syntax Analysis
Algorithm for Context-Free Languages.
Eisner, J. M. (1996, August). Three new probabilistic models for
dependency parsing: An exploration. In Proceedings of the 16th
conference on Computational linguistics-Volume 1 (pp. 340-345).
Association for Computational Linguistics.
McDonald, R., Pereira, F., Ribarov, K., & Hajič, J. (2005, October).
Non-projective dependency parsing using spanning tree algorithms. In
Proceedings of the conference on Human Language Technology and
Empirical Methods in Natural Language Processing (pp. 523-530).
Association for Computational Linguistics.
49
REFERENCES
Karlsson, F. (1990, August). Constraint grammar as a framework for
parsing running text. In Proceedings of the 13th conference on
Computational linguistics-Volume 3 (pp. 168-173). Association for
Computational Linguistics.
Nivre, J. (2008). Algorithms for deterministic incremental dependency
parsing. Computational Linguistics, 34(4), 513-553.
50
ASSOCIATED READING
Jurafsky, Daniel, and James H. Martin. 2009. Speech and Language
Processing: An Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition. 2nd edition. Chapter 12.
Jurafsky, Daniel, and James H. Martin. Speech and Language
Processing: An Introduction to Natural Language Processing, Computational
Linguistics, and Speech Recognition. 3rd edition (draft). Chapters 12-14.