Parsing
Lizhen Qu
Overview of the NLP Lectures
• Introduction to natural language processing (NLP).
• Regular expressions, sentence splitting, tokenization,
part-of-speech tagging.
• Language models.
• Vector semantics.
• Parsing.
– Dependency parsing.
• Semantics.
Relation Extraction
• Find the worksFor(entity_a, entity_b) relation in text.
• Training data: sentences with entity mentions annotated as Per(son) or Org(anization), labelled with the worksFor relation where it holds:
– Mark [Per] has worked for Telstra [Org].
– John [Per] works for Facebook [Org].
– Christina [Per] asks why it is better to work at Facebook [Org] than Google [Org].
– Mark Zuckerberg [Per] said he would have worked for Microsoft [Org].
Use Shortest Paths between Entity Mentions
• The shortest dependency path between two entity mentions is a strong cue for the relation between them:
– (Mark, Telstra): <-nsubj-prep->pobj->
– (John, Facebook): <-nsubj-prep->pobj->
– (Christina, Google): <-nsubj-advcl->prep->pobj->
– (Zuckerberg, Microsoft): <-nsubj-ccomp->prep->pobj->
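Under the hood this is just a shortest-path query on the dependency graph. Below is a minimal sketch, assuming the networkx library; the toy arcs (which encode "John works for Facebook") and all variable names are illustrative:

```python
import networkx as nx

# Minimal sketch: shortest dependency path between two entity mentions.
# Treating the tree as an undirected graph lets the path traverse arcs in
# either direction, which is what the <- and -> marks above record.
arcs = [("works", "John", "nsubj"),
        ("works", "for", "prep"),
        ("for", "Facebook", "pobj")]

g = nx.Graph()
for head, dep, label in arcs:
    g.add_edge(head, dep, label=label)

path = nx.shortest_path(g, source="John", target="Facebook")
labels = [g.edges[u, v]["label"] for u, v in zip(path, path[1:])]
print(path)    # ['John', 'works', 'for', 'Facebook']
print(labels)  # ['nsubj', 'prep', 'pobj'], i.e. the <-nsubj-prep->pobj-> path
```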
Dependency Grammar
• Syntactic structure consists of lexical items, linked by binary asymmetric relations called dependencies.
• Each dependency relates a head to a dependent.
• Head (governor): the grammatically most important word.
• Dependent (modifier): a modifier, object, or complement of the head.
Dependency Trees
• Without labels: bare arcs from heads to dependents.
• With labels: each arc is annotated with its grammatical relation.
Dependency Parsing
• Formal definition for unlabeled dependency trees:
• Dependency parsing: the task of mapping an input string to a dependency graph satisfying certain conditions.
Dependency graph $D = (V, E)$, where
• $V$ is the set of nodes (the words of the input sequence).
• $E$ is the set of arcs indicating grammatical relations.
• $v_i \to v_j$, also written $(v_i, v_j) \in E$, denotes an arc from head $v_i$ to dependent $v_j$.
Projective Dependency Tree
Ram saw a dog yesterday which was a Yorkshire Terrier
(All arcs can be drawn above the words without crossing.)
Non-Projective Dependency Tree
Ram saw a dog yesterday which was a Yorkshire Terrier
• Crossing lines! The relative clause "which was a Yorkshire Terrier" modifies "dog", so its arc crosses the arc for "yesterday".
• English has very few non-projective cases.
Well-Formedness
• A dependency graph is well-formed if and only if it is:
– Single-headed: each word has at most one head.
– Acyclic: the graph contains no cycles.
– Connected: there is a path between every pair of nodes.
– Projective: an edge from word A to word B implies that there exists a directed path in the graph from A to every word between A and B in the sentence.
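These conditions can be checked mechanically. A minimal sketch, with a graph given as (head, dependent) arcs over word positions 0..n-1 and 0 as the artificial root; the helper names are mine, not from the lecture:

```python
# Minimal sketch: checking well-formedness of a dependency graph.
# Connectedness is omitted for brevity; the acyclicity check must pass
# before calling projective(), since dominates() recurses over children.

def single_head(arcs):
    """Each word has at most one head."""
    dependents = [d for _, d in arcs]
    return len(dependents) == len(set(dependents))

def acyclic(arcs, n):
    """No directed cycles (depth-first search for a back edge)."""
    children = {u: [d for h, d in arcs if h == u] for u in range(n)}
    state = [0] * n  # 0 = unvisited, 1 = on the DFS stack, 2 = finished
    def has_cycle(u):
        state[u] = 1
        for v in children[u]:
            if state[v] == 1 or (state[v] == 0 and has_cycle(v)):
                return True
        state[u] = 2
        return False
    return not any(state[u] == 0 and has_cycle(u) for u in range(n))

def projective(arcs):
    """Every arc (h, d) dominates all words strictly between h and d."""
    children = {}
    for h, d in arcs:
        children.setdefault(h, []).append(d)
    def dominates(h, w):
        return any(c == w or dominates(c, w) for c in children.get(h, []))
    return all(dominates(h, w)
               for h, d in arcs
               for w in range(min(h, d) + 1, max(h, d)))

# "Ram saw a dog yesterday": saw(2) heads Ram(1), dog(4), yesterday(5);
# dog heads a(3); 0 is the artificial root.
arcs = [(0, 2), (2, 1), (2, 4), (4, 3), (2, 5)]
print(single_head(arcs), acyclic(arcs, 6), projective(arcs))  # True True True
```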
Parsing Algorithms
• Graph-based parsing
– CYK, Eisner, McDonald
• Transition-based parsing
– Covington; Yamada and Matsumoto; Nivre; etc.
Nivre’s Algorithm (Arc-eager) [5]
• Transition-based.
• Parser configuration $\langle S, I, A \rangle$:
– $S$ is the stack; $S[0]$ is the topmost node.
– $I$ is the list of remaining input words; $I[0]$ is the leftmost word.
– $A$ is the set of current dependencies (arcs) of the dependency graph.
• INPUT: a word sequence $v = v_1|\dots|v_n$ and a set of rules $R$.
Parser Transitions
• Left-Arc (LA): $\langle v_i|S,\ v_j|I,\ A \rangle \Rightarrow \langle S,\ v_j|I,\ A \cup \{(v_j, v_i)\} \rangle$, if $v_j \to v_i \in R$ and $\nexists v_k\,(v_k, v_i) \in A$.
• Right-Arc (RA): $\langle v_i|S,\ v_j|I,\ A \rangle \Rightarrow \langle v_j|v_i|S,\ I,\ A \cup \{(v_i, v_j)\} \rangle$, if $v_i \to v_j \in R$ and $\nexists v_k\,(v_k, v_j) \in A$.
• Reduce (R): $\langle v_i|S,\ I,\ A \rangle \Rightarrow \langle S,\ I,\ A \rangle$, if $\exists (v_k, v_i) \in A$.
• Shift (S): $\langle S,\ v_j|I,\ A \rangle \Rightarrow \langle v_j|S,\ I,\ A \rangle$.
• The $\nexists$ side conditions enforce the single-head constraint.
Parsing Details
• Slight modifications:
– Each dependency graph has an artificial root in order to form a tree.
– Parsing starts with the initial configuration $\langle [ROOT],\ v_1|\dots|v_n,\ \emptyset \rangle$ and terminates when it reaches $\langle [ROOT],\ nil,\ A \rangle$ through a sequence of transitions.
• Nondeterministic transitions?
– Priority ordering of transitions: LA > RA > (if $S[0]$ can be a transitive head of $I[0]$, then Shift, otherwise Reduce).
– Guided parsing.
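Putting the transitions and the priority ordering together gives a short deterministic parser. A minimal runnable sketch, with the rule set encoded lexically as (head, dependent) word pairs for the example on the following slides; function names such as can_be_transitive_head are my own:

```python
# Minimal sketch of deterministic arc-eager parsing with the priority
# ordering LA > RA > (Shift if S[0] can be a transitive head of I[0],
# otherwise Reduce). A robust parser would need more careful guards.

def can_be_transitive_head(h, w, rules):
    """True if a chain of rules h -> ... -> w exists."""
    seen, frontier = {h}, [h]
    while frontier:
        x = frontier.pop()
        for head, dep in rules:
            if head == x and dep not in seen:
                if dep == w:
                    return True
                seen.add(dep)
                frontier.append(dep)
    return False

def arc_eager_parse(words, rules):
    stack = ["ROOT"]       # S; stack[-1] is S[0], the topmost node
    buffer = list(words)   # I; buffer[0] is I[0], the leftmost word
    arcs = set()           # A; the (head, dependent) arcs found so far
    has_head = lambda w: any(d == w for _, d in arcs)
    while buffer:
        s, b = stack[-1], buffer[0]
        if (b, s) in rules and not has_head(s):      # Left-Arc
            arcs.add((b, s)); stack.pop()
        elif (s, b) in rules and not has_head(b):    # Right-Arc
            arcs.add((s, b)); stack.append(buffer.pop(0))
        elif can_be_transitive_head(s, b, rules):    # Shift
            stack.append(buffer.pop(0))
        elif has_head(s):                            # Reduce
            stack.pop()
        else:                                        # fall back to Shift
            stack.append(buffer.pop(0))
    return arcs

rules = {("ROOT", "indicated"), ("indicated", "figures"),
         ("indicated", "stocks"), ("figures", "Red"), ("figures", "on"),
         ("on", "screen"), ("screen", "the"), ("stocks", "falling")}
sentence = "Red figures on the screen indicated falling stocks".split()
print(sorted(arc_eager_parse(sentence, rules)))  # the 8 arcs of the example
```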
Grammatical Rules for the Example
Noun → Adj
ROOT → Verb
Noun → Det
Verb → Prep
Verb → Noun
figures → on
on → screen
Example
• Sentence (with an artificial root node ROOT): "Red figures on the screen indicated falling stocks".
• The arc-eager parser goes through the following steps (stack shown topmost-first):

  step  transition  arc added              stack after
   1    Shift       —                      [Red, ROOT]
   2    Left-arc    figures → Red          [ROOT]
   3    Shift       —                      [figures, ROOT]
   4    Right-arc   figures → on           [on, figures, ROOT]
   5    Shift       —                      [the, on, figures, ROOT]
   6    Left-arc    screen → the           [on, figures, ROOT]
   7    Right-arc   on → screen            [screen, on, figures, ROOT]
   8    Reduce      —                      [on, figures, ROOT]
   9    Reduce      —                      [figures, ROOT]
  10    Left-arc    indicated → figures    [ROOT]
  11    Right-arc   ROOT → indicated       [indicated, ROOT]
  12    Shift       —                      [falling, indicated, ROOT]
  13    Left-arc    stocks → falling       [indicated, ROOT]
  14    Right-arc   indicated → stocks     [stocks, indicated, ROOT]
  15    Reduce      —                      [indicated, ROOT]
  16    Reduce      —                      [ROOT]

Configurations of the Example
• Transition sequence: S, LA, S, RA, S, LA, RA, R, R, LA, RA, S, LA, RA, R, R.
Properties of Nivre’s Algorithm
• O(n): linear time complexity.
• The dependency graphs it produces are well-formed.
Dependency Corpora
• CoNLL dependencies.
– http://www.aclweb.org/anthology/D07-1096
• Stanford typed dependencies.
– http://nlp.stanford.edu/software/dependencies_manual.pdf
• Universal dependencies.
– http://universaldependencies.org/u/overview/syntax.html
Guided Parsing [6]
• Train a classifier to predict parse transitions!
– $f: \mathrm{configuration} \to \{LA, RA, R, S\} \times (A \cup \{nil\})$
– $A$ is a set of typed dependencies (arcs).
• Feature space: features extracted from the current configuration, e.g. the word forms and part-of-speech tags around $S[0]$ and $I[0]$.
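As a toy illustration of f: note that [6] itself uses memory-based learning; plain logistic regression over hand-built configuration features is used as a stand-in here, and the feature templates and training pairs are invented:

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy sketch of guided parsing: learn f: configuration -> transition.
def featurize(stack, buffer):
    s0 = stack[-1] if stack else "<none>"   # topmost stack node S[0]
    i0 = buffer[0] if buffer else "<none>"  # next input word I[0]
    return {"s0": s0, "i0": i0, "s0|i0": s0 + "|" + i0}

# (configuration, oracle transition) pairs derived from gold trees
X = [featurize(["ROOT"], ["Red", "figures"]),
     featurize(["ROOT", "Red"], ["figures", "on"]),
     featurize(["ROOT"], ["figures", "on"])]
y = ["S", "LA", "S"]

model = make_pipeline(DictVectorizer(), LogisticRegression())
model.fit(X, y)
print(model.predict([featurize(["ROOT", "Red"], ["figures", "on"])]))
```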
Arc Standard
• Three parse actions:
– Left-Arc (LA): $\langle v_i|v_j|S,\ I,\ A \rangle \Rightarrow \langle v_i|S,\ I,\ A \cup \{(v_i, v_j)\} \rangle$
– Right-Arc (RA): $\langle v_i|v_j|S,\ I,\ A \rangle \Rightarrow \langle v_j|S,\ I,\ A \cup \{(v_j, v_i)\} \rangle$
– Shift (S): $\langle S,\ v_j|I,\ A \rangle \Rightarrow \langle v_j|S,\ I,\ A \rangle$
• Neural networks for action prediction [9]: parser configurations are fed through an embedding look-up table and hidden layers into a softmax over actions.
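A minimal PyTorch sketch of the architecture of [9] (embedding look-up, a hidden layer with the cube activation used in that paper, and a softmax over the three arc-standard actions); the dimensions, feature count, and class name are illustrative:

```python
import torch
import torch.nn as nn

# Minimal sketch of the feed-forward action predictor of [9]:
# configuration features -> embedding look-up -> hidden layer -> softmax.
class TransitionClassifier(nn.Module):
    def __init__(self, vocab_size, emb_dim=50, n_features=48,
                 hidden_dim=200, n_actions=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.hidden = nn.Linear(n_features * emb_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, n_actions)

    def forward(self, feature_ids):
        # feature_ids: (batch, n_features) indices of words, POS tags and
        # arc labels extracted from the current parser configuration.
        x = self.embed(feature_ids).flatten(start_dim=1)
        h = self.hidden(x) ** 3   # cube activation, as in [9]
        return torch.log_softmax(self.out(h), dim=-1)

model = TransitionClassifier(vocab_size=10000)
dummy = torch.randint(0, 10000, (2, 48))  # a batch of two configurations
print(model(dummy).shape)                 # torch.Size([2, 3]): LA, RA, S
```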
Off-the-Shelf Dependency Parsers
• MaltParser (http://www.maltparser.org/)
• SyntaxNet (https://github.com/tensorflow/models/tree/master/research/syntaxnet )
• Stanford parser (http://nlp.stanford.edu/software/lex-parser.shtml)
• TurboParser (http://www.cs.cmu.edu/~ark/TurboParser/)
Overview of the NLP Lectures
• Introduction to natural language processing (NLP).
• Regular expressions, sentence splitting, tokenization,
part-of-speech tagging.
• Language models.
• Vector semantics.
• Parsing.
– Dependency parsing.
– Constituency parsing.
• Compositional semantics and NLP applications.
Constituency Parsing
• Deeper understanding of word groups and their grammatical relationships.
• Constituent parse tree for "the children ate the cake":
[S [NP [AT the] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]
• Non-terminals (S, NP, VP, …) are clusters or generalizations of symbols; terminals are the words (symbols) themselves.
Constituency
• Constituent: a word or a group of words that behaves as a single unit.
• Why do these words group together?
– They appear in similar syntactic environments:
three parties from Sydney arrive …
Drunk driver fled …
they sit …
but not: *from arrive …, *the fled …, *as sit …
– They can be preposed or postposed as a whole:
On August 30th, I’d like to fly from Canberra to Sydney.
I’d like to fly on August 30th from Canberra to Sydney.
I’d like to fly from Canberra to Sydney on August 30th.
Context-Free Grammars (CFGs)
• A context-free grammar consists of
– a set of context-free rules, each of which expresses the ways that symbols of the language can be grouped and ordered together.
– a lexicon of words and symbols.
• For example:
NP → Det Nominal
Nominal → Noun | Nominal Noun
Det → the | a | …
Noun → bus | stop | …
Derivations
• The sequence of rule expansions is called a derivation of the string of words.
• A derivation can be shown as a parse tree or, equivalently, in bracketed notation:
[NP [Det the] [Nom [Nom [Noun bus]] [Noun stop]]]
A Toy Example
• Grammar:
NP → Det Nominal
Nominal → Noun | Nominal Noun
Det → the | a | an
Noun → bus
Noun → stop
• Parsing "the bus stop" proceeds step by step:
– Tag the words: Det over "the", Noun over "bus", Noun over "stop".
– Apply Nominal → Noun over "bus".
– Apply Nominal → Nominal Noun over "bus stop".
– Apply NP → Det Nominal over "the bus stop".
Formal Definition of CFG
• A context-free grammar $G = (N, \Sigma, R, S)$:
– $N$ is a set of non-terminals.
– $\Sigma$ is a set of terminal symbols, with $N \cap \Sigma = \emptyset$.
– $R$ is a set of rules (productions), each of the form $A \to \beta$, where $A$ is a non-terminal and $\beta$ is a string of symbols from the infinite set of strings $(\Sigma \cup N)^{*}$.
– $S$ is a designated start symbol.
A Toy Example
• Adding the start symbol S with the rule S → NP completes the grammar:
NP → Det Nominal
Nominal → Noun | Nominal Noun
Det → the | a | an
Noun → bus
Noun → stop
S → NP
• The complete parse of "the bus stop" is then:
[S [NP [Det the] [Nom [Nom [Noun bus]] [Noun stop]]]]
Ambiguity of Parsing
• The same string can often be derived in more than one way, yielding multiple parse trees for one sentence.
Probabilistic Context-Free Grammar (PCFG)
• A PCFG attaches a parameter $q(\alpha \to \beta)$ to each grammar rule [3].
• The probability of a tree $t$ built with rules $\alpha_i \to \beta_i$, $i = 1, \dots, n$, is
$$p_G(t) = \prod_{i=1}^{n} q(\alpha_i \to \beta_i)$$
• Parsing then means finding the most likely parse tree, where $\mathcal{T}_G$ is the set of all possible trees:
$$\arg\max_{t \in \mathcal{T}_G} p_G(t)$$
Learning PCFG from Treebanks
• Penn Treebank and English Web Treebank.
• Maximum-likelihood estimation:
$$q^{*}(\alpha \to \beta) = \frac{\mathrm{Count}(\alpha \to \beta)}{\mathrm{Count}(\alpha)}$$
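A minimal sketch of both formulas: estimate $q^{*}$ from rule counts and score a tree as the product of its rule probabilities. The counts below are invented for illustration, not from a real treebank:

```python
from collections import Counter

# Toy rule counts, as if gathered from a treebank.
rule_counts = Counter({
    ("NP", ("Det", "Nominal")): 8,
    ("NP", ("Nominal",)): 2,
    ("Nominal", ("Noun",)): 6,
    ("Nominal", ("Nominal", "Noun")): 4,
})
lhs_counts = Counter()
for (lhs, rhs), c in rule_counts.items():
    lhs_counts[lhs] += c

def q(lhs, rhs):
    """q*(alpha -> beta) = Count(alpha -> beta) / Count(alpha)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

def tree_prob(rules_used):
    """p_G(t): product of q over the rules used in tree t."""
    p = 1.0
    for lhs, rhs in rules_used:
        p *= q(lhs, rhs)
    return p

print(tree_prob([("NP", ("Det", "Nominal")),
                 ("Nominal", ("Nominal", "Noun")),
                 ("Nominal", ("Noun",))]))  # 0.8 * 0.4 * 0.6 = 0.192
```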
Top Down Parsing
• Search for $\arg\max_{t \in \mathcal{T}_G} p_G(t)$ by expanding trees from the start symbol down towards the input string, e.g. "book that flight".
Bottom Up Parsing
• Search for $\arg\max_{t \in \mathcal{T}_G} p_G(t)$ by building trees from the input words up towards the start symbol.
Grammar Equivalence
• Two grammars are equivalent if they generate the same language (set of strings).
• Chomsky Normal Form (CNF):
– Only two types of rules are allowed: the right-hand side of each rule has either two non-terminals or one terminal, with the exception of $S \to \varepsilon$.
– Not in CNF (where $A, B, C, D, E \in N$ and $a \in \Sigma$):
$A \to B\,a\,D$ (mixed right-hand side), $C \to \varepsilon$, $E \to A$ (unit production).
– In CNF (where $B, D, E, F, G \in N$ and $a \in \Sigma$):
$G \to E\,D$, $F \to B\,G$, $E \to a$.
– Every context-free grammar can be transformed into an equivalent one in CNF.
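CNF is exactly what chart parsers such as CYK (listed under graph-based parsing above; see also [2]) rely on, since every span then splits into exactly two sub-spans. A minimal probabilistic CKY sketch over a toy CNF grammar, with invented rule probabilities:

```python
import math
from collections import defaultdict

# Toy CNF grammar: binary rules (B, C) -> [(A, prob)] and lexical
# rules word -> [(A, prob)]. Unit productions such as Nominal -> Noun
# have been collapsed into the lexical rules, as CNF requires.
binary = {("Nominal", "Noun"): [("Nominal", 0.4)],
          ("Det", "Nominal"): [("NP", 1.0)]}
lexical = {"the": [("Det", 1.0)],
           "bus": [("Noun", 0.5), ("Nominal", 0.3)],
           "stop": [("Noun", 0.5), ("Nominal", 0.3)]}

def cky(words):
    n = len(words)
    chart = defaultdict(dict)  # chart[(i, j)][A] = best log-prob over words[i:j]
    for i, w in enumerate(words):               # fill in the diagonal
        for A, p in lexical.get(w, []):
            chart[(i, i + 1)][A] = math.log(p)
    for span in range(2, n + 1):                # grow spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # try every split point
                for B, pb in chart[(i, k)].items():
                    for C, pc in chart[(k, j)].items():
                        for A, p in binary.get((B, C), []):
                            score = math.log(p) + pb + pc
                            if score > chart[(i, j)].get(A, -math.inf):
                                chart[(i, j)][A] = score
    return chart[(0, n)]

print(cky(["the", "bus", "stop"]))  # best log-probability per root label
```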
Dependency Structures vs. Phrase Structures
• Dependency structures explicitly represent
– Head-dependent relations (directed arcs).
– Functional categories (arc labels).
– Predicate-argument structure.
• Dependency structures are largely independent of word order.
– Suitable for free word order languages, such as Indian
languages.
• Phrase structures explicitly represent
– Phrases (non-terminal nodes).
– Structural categories (non-terminal labels).
– Fragments are directly interpretable.
Available Constituency Parsers
• Stanford parser.
– http://nlp.stanford.edu/software/srparser.shtml
• Charniak-Johnson parser.
– http://web.science.mq.edu.au/~mjohnson/Software.htm
• Charniak parser.
– ftp://ftp.cs.brown.edu/pub/nlparser/
References
• [1] Speech and Language Processing (Chapters 12 & 13).
• [2] Lecture Notes on CFGs and CKY Parsing.
– http://web.mit.edu/6.863/www/fall2012/lectures/lecture7-notes.pdf
• [3] http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf
• [4] Richard Socher, Christopher D. Manning, Andrew Y. Ng. Learning
Continuous Phrase Representations and Syntactic Parsing with Recursive
Neural Networks.
• [5] Joakim Nivre. An Efficient Algorithm for Projective Dependency Parsing. (We modify it to allow an artificial tree root.)
• [6] Joakim Nivre, Johan Hall, Jens Nilsson. Memory-Based Dependency
Parsing.
• [7] Exploiting Background Knowledge for Relation Extraction.
– http://www.aclweb.org/anthology/C10-1018
• [8] https://web.stanford.edu/~jurafsky/slp3/14.pdf
• [9] https://cs.stanford.edu/~danqi/papers/emnlp2014.pdf