COMP5046
Natural Language Processing
Lecture 7: Dependency Parsing
Dr. Caren Han
Semester 1, 2021
School of Computer Science, University of Sydney
The course topics
What will you learn in this course?
Week 1: Introduction to Natural Language Processing (NLP)
Week 2: Word Embeddings (Word Vector for Meaning)
Week 3: Word Classification with Machine Learning I
Week 4: Word Classification with Machine Learning II
NLP and Machine Learning
Week 5: Language Fundamental
Week 6: Part of Speech Tagging
Week 7: Dependency Parsing
Week 8: Language Model and Natural Language Generation
NLP Techniques
Week 9: Information Extraction: Named Entity Recognition
Advanced Topic
Week 10: Advanced NLP: Attention and Reading Comprehension
Week 11: Advanced NLP: Transformer and Machine Translation
Week 12: Advanced NLP: Pretrained Model in NLP
Week 13: Future of NLP and Exam Review
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing
Linguistic Structure
Computer Language
Tokenisation
A token can be a variable or function name, an operator or a number.
Linguistic Structure
Parsing Computer Language
Parsing
The parser turns a list of tokens into a tree of nodes, reflecting their logical order
Can we apply this in a human (natural) language?
Linguistic Structure
Parsing Natural Language (Human Language)
Q: Can we apply this in a human (natural) language?
A: Possible! But it is much more difficult than parsing computer language!
Why?
• No types for words
• No brackets around phrases
• Ambiguity!
Linguistic Structure
Natural Language: Linguistic Structure
Let's try to categorise the given words (Part of Speech Tags):
The expensive computer is on the table
DT  Adj       N       V  P  DT   N
However, language is more than just a "bag of words". Grammatical rules apply to combine:
• words into phrases
• phrases into bigger phrases
Example: a sentence includes a subject and a predicate.
The subject is a noun phrase and the predicate is a verb phrase.
Phrase Structure Grammar = Context-Free Grammar (CFG)
Parsing
• Associating tree structures to a sentence, given a grammar
(Context-Free Grammar or Dependency Grammar); we will talk about this soon!
Linguistic Structure
Syntactic Ambiguities
Grammars are declarative
• They don’t specify how the parse tree will be constructed
Ambiguity
1. Prepositional Phrase (PP) attachment ambiguity
2. Coordination Scope
3. Gaps
4. Particles or Prepositions
5. Gerund or adjective
There are many more ambiguities…
Linguistic Structure
Syntactic Ambiguities – PP attachment Ambiguity
I saw the man with the telescope
Two readings: the PP "with the telescope" can attach to the verb (I used the telescope to see him) or to the noun (the man who had the telescope).
Linguistic Structure
Syntactic Ambiguities – PP attachment Ambiguities Multiply
• A key parsing decision is how we ‘attach’ various constituents
• PPs, adverbial or participial phrases, infinitives, coordinations
The board approved its acquisition by Royal Trustco Ltd. of Toronto for $27 a share at its monthly meeting
Linguistic Structure
Syntactic Ambiguities – Coordination Scope Ambiguity
I ate red apples and bananas
Does "red" modify both conjuncts ([red [apples and bananas]]) or only the apples ([red apples] and bananas)?
Linguistic Structure
Syntactic Ambiguities – Gaps
She never saw a dog and did not smile
Linguistic Structure
Syntactic Ambiguities – Particles or Prepositions
• Some verbs are followed by adverb particles.
(e.g. put on, take off, give away, bring up, call in)
She ran up a large bill
She ran up a large hill
Difference between an adverb particle and a preposition.
• the particle is closely tied to its verb to form idiomatic expressions
• the preposition is closely tied to the noun or pronoun it modifies.
Linguistic Structure
Syntactic Ambiguities – Gerund or Adjective
Dancing shoes can provide a nice experience
"Dancing" can be read as a gerund (shoes for dancing) or as an adjective (shoes that are dancing).
Linguistic Structure
When and Where do we use Parsing?
Syntactic Analysis checks whether the generated tokens form a meaningful expression
• Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
• We need to understand sentence structure in order to be able to interpret language correctly
• Grammar Checking
• Question Answering
• Machine Translation
• Information Extraction
• Text Classification
• Chatbot
… and many others
Linguistic Structure
Two main views of linguistic structure
Constituency Grammar (a.k.a context-free grammar, phrase structure grammar)
• Noam Chomsky (1928 – )
• Immediate constituent analysis
• Insists on classification and distribution
Dependency Grammar
• Lucien Tesnière (1893 – 1954)
• Functional dependency relations
• Emphasises the relations between syntactic units, thus adding meaningful links (semantics)
Linguistic Structure
Two main views of linguistic structure
Constituency Parsing
Constituency grammars
One-to-one-or-more correspondence. For every word in a sentence, there is at least one node in the syntactic structure that corresponds to that word.
Dependency Parsing
Dependency grammars
One-to-one relation: for every word in the sentence, there is exactly one node in the syntactic structure that corresponds to that word.
Linguistic Structure
Constituency Grammar
• A basic observation about syntactic structure is that groups of words can act as single units
• Such groups of words are called constituents
• Constituents tend to have similar internal structure, and behave similarly with respect to other units
Examples
• noun phrases (NP)
• she, the house, Robin Hood and his merry men, etc.
• verb phrases (VP)
• blushed, loves Mary, was told to sit down and be quiet, lived happily ever after
• prepositional phrases (PP)
• on it, with the telescope, through the foggy dew, etc.
Linguistic Structure
A sample context-free grammar
I prefer a morning flight
1. Starting unit: words are given a category (part-of-speech)
   I, prefer, a, morning, flight
   PRP, VBP, DT, NN, NN
2. Combining words into phrases with categories
   NP (I), Nominal (morning flight), NP (a morning flight)
3. Combining phrases into bigger phrases recursively
   VP (prefer a morning flight), S (I prefer a morning flight)
Linguistic Structure
A sample context-free grammar
The resulting parse tree, with the root (S) at the top and the leaves (the words) at the bottom:
(S (NP (Pro I))
   (VP (Verb prefer)
       (NP (Det a)
           (Nom (Nom (Noun morning))
                (Noun flight)))))
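The same parse can be reproduced in a few lines of Python. This is a minimal sketch using NLTK's chart parser; the toy grammar is hand-written to cover just this sentence, and the nonterminal names are illustrative rather than from the lecture:

# A toy CFG for "I prefer a morning flight", parsed with NLTK's chart parser.
import nltk

grammar = nltk.CFG.fromstring("""
    S    -> NP VP
    NP   -> Pro | Det Nom
    Nom  -> Nom Noun | Noun
    VP   -> Verb NP
    Pro  -> 'I'
    Verb -> 'prefer'
    Det  -> 'a'
    Noun -> 'morning' | 'flight'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I prefer a morning flight".split()):
    tree.pretty_print()  # draws the S-rooted tree in the terminal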
Linguistic Structure
Treebanks
Corpora where each sentence is annotated with a parse tree
• Treebanks are generally created by
• parsing texts with an existing parser
• having human annotators correct the result
• This requires detailed annotation guidelines for annotating different grammatical constructions
• Penn Treebank is a popular treebank for English (Wall Street Journal section)
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing
Dependency Structure Dependency Structure
Syntactic structure: lexical items linked by binary asymmetrical relations (“arrows”) called dependencies
Example: "Red Hat", with an arrow labelled "compound" drawn from the head to the modifier:
Red – modifier, dependent, child, subordinate
Hat – head, governor, parent, regent
compound – the dependency relation (e.g. subject, prepositional object, etc.)
*Head determines the syntactic/semantic category of the construct
*The arrows are commonly typed with the name of grammatical relations
Dependency Structure Dependency Parsing
Represents lexical/syntactic dependencies between words
• A sentence is parsed by choosing, for each word, which other word (including ROOT) it is a dependent of
Dependencies form a tree (connected, acyclic, single-head)
• How to make the dependencies a tree – constraints:
• Only one word is a dependent of ROOT (the main predicate of a sentence)
• Don't want cycles: A → B, B → A
ROOT France won the world cup by beating Germany
       1     2   3    4    5   6    7       8
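To see such dependencies concretely, here is a small sketch using spaCy; the choice of toolkit is an assumption on my part (the lecture does not prescribe one), and the en_core_web_sm model must first be installed with `python -m spacy download en_core_web_sm`:

# Print each word's head and dependency relation with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("France won the world cup by beating Germany")
for token in doc:
    # Every word has exactly one head; spaCy marks the root token
    # by making it its own head.
    print(f"{token.text:10} <-{token.dep_:10}- {token.head.text}")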
Dependency Structure Dependency Grammar/Parsing History
Panini’s grammar (4th century BCE)
The notion of dependencies between grammatical units
Ibn Maḍāʾ (12th century)
The first grammarian to use the term dependency in the grammatical sense
Sámuel Brassai, Franz Kern, Heimann Hariton Tiktin (1800 – 1930)
The notion of dependency seems to have coexisted side by side with that of phrase structure
Lucien Tesnière (1959)
Dependency grammar was the dominant approach in the "East" in the 20th century (Russia, China, ...); it is well suited to languages with freer word order
David Hays (1962)
The great development surrounding dependency-based theories has come from computational linguistics
Dependency Structure Dependency Grammar/Parsing
Some people draw the arrows one way; some the other way!
• Tesnière had them point from head to dependent…
Usually add a fake ROOT so every word is a dependent of precisely 1 other node
Projectivity vs Non-Projectivity
• Projective: there are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
• Dependencies parallel to a CFG tree must be projective
• Formed by taking one child of each category as head
• But dependency theory normally does allow non-projective structures to account for displaced constituents (a small projectivity check is sketched below)
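A minimal sketch of what "no crossing arcs" means operationally; the function and the example arc sets are mine, and the arcs for the France sentence are just one plausible analysis:

# A tree is projective iff no two arcs cross when drawn above the sentence.
# Arcs are (head, dependent) pairs of word positions; 0 is ROOT.

def is_projective(arcs):
    for h1, d1 in arcs:
        lo1, hi1 = sorted((h1, d1))
        for h2, d2 in arcs:
            lo2, hi2 = sorted((h2, d2))
            # Crossing: exactly one endpoint of arc 2 lies strictly
            # inside the span of arc 1.
            if lo1 < lo2 < hi1 < hi2:
                return False
    return True

# ROOT(0) France(1) won(2) the(3) world(4) cup(5) by(6) beating(7) Germany(8)
arcs = {(2, 1), (0, 2), (5, 3), (5, 4), (2, 5), (7, 6), (2, 7), (7, 8)}
print(is_projective(arcs))               # True: no arcs cross
print(is_projective({(1, 3), (2, 4)}))   # False: the two arcs cross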
Dependency Structure Dependency Relations
• The following list shows the 37 universal syntactic relations used in Universal Dependencies v2.
Universal Dependency Relations: http://universaldependencies.org/u/dep/index.html
Dependency Structure Dependency Relations with annotations
• The idea of dependency structure goes back a long way
• [Universal Dependencies: http://universaldependencies.org/ ; cf. Marcus et al. 1993, The Penn Treebank, Computational Linguistics]
• Starting off, building a treebank seems a lot slower and less useful than building a grammar
• But a treebank gives us many things
• Reusability of the labor
• Many parsers, part-of-speech taggers, etc. can be built on it
• Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate systems
Dependency Structure Dependency Parsing
Exercise – Let’s do it together!
• Dependencies are simpler to parse than context-free grammars
ROOT I prefer a morning flight

ROOT Unionised workers are usually better paid than their non-union counterparts
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing
Dependency Parsing Approaches Methods of Dependency Parsing
• Dynamic programming
Extension of the CYK algorithm to dependency parsing (Eisner, 1996)
• Constraint Satisfaction (Karlsson, 1990)
pos(x) = DET → (label(x) = NMOD, pos(mod(x)) = NN, x < mod(x))
A determiner (DET) modifies a noun (NN) on the right with the label NMOD.
• Graph-based Dependency Parsing
Create a Maximum Spanning Tree for a sentence
McDonald et al.'s (2005) MSTParser scores dependencies independently using a machine learning classifier
• Transition-based Dependency Parsing (Nivre 2008)
• Neural Dependency Parsing
CYK: Cocke–Younger–Kasami algorithm
Dependency Parsing Approaches Graph-based dependency parsers
MST Parser (McDonald et al., 2005)
• Projectivity
• English dependency trees are mostly projective (can be drawn without crossing dependencies)
• Other languages are not
• Idea
• Dependency parsing is equivalent to searching for a maximum spanning tree in a directed graph
• Chu and Liu (1965) and Edmonds (1967) give an efficient algorithm for finding the MST of a directed graph
Dependency Parsing Approaches Graph-based dependency parsers
• Compute a score for every possible dependency (for each edge)
• Then add an edge from each word to its highest-scoring candidate head
• And repeat the same process for each other word
[Figure: picking the head for "big" in "ROOT The big cat sat"; each candidate arc into "big" carries a score, e.g. 0.5, 0.8, 0.3, 2.0]
Dependency Parsing Approaches Graph-based dependency parsers
• Consider the sentence "Lisa saw Bart"
[Figure: a complete directed graph over Root, Lisa, saw, Bart with a score on every candidate arc; the maximum spanning tree (Root → saw, saw → Lisa, saw → Bart) gives the highest-scoring parse]
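A sketch of the greedy first step described above: attach every word to its highest-scoring candidate head. The scores below are invented for illustration; a real parser learns them, and the Chu-Liu/Edmonds algorithm is still needed afterwards to break any cycles so the result is a tree.

# Greedy head selection over a scored complete graph (illustrative scores).
scores = {
    # (head, dependent): score
    ("Root", "saw"): 10, ("Lisa", "saw"): 9,   ("Bart", "saw"): 9,
    ("saw", "Lisa"): 30, ("Bart", "Lisa"): 3,  ("Root", "Lisa"): 9,
    ("saw", "Bart"): 30, ("Lisa", "Bart"): 11, ("Root", "Bart"): 9,
}

for dep in ["Lisa", "saw", "Bart"]:
    # Pick the candidate head with the highest arc score for this word.
    head = max((h for h, d in scores if d == dep), key=lambda h: scores[h, dep])
    print(f"{head} -> {dep} (score {scores[head, dep]})")
# saw -> Lisa, Root -> saw, saw -> Bart: already a tree here, no cycle to fix.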
Dependency Parsing Approaches Methods of Dependency Parsing
Graph-based Dependency Parsing
• Non-deterministic dependency parsing
• Build a complete graph with directed/weighted edges
• Find the highest-scoring tree from the complete dependency graph
Transition-based Dependency Parsing
• Deterministic dependency parsing (a greedy algorithm)
• Build a tree by applying a sequence of transition actions
• Find the highest-scoring action sequence that builds a legal tree
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing
Transition-based Parsing
Greedy transition-based parsing (Nivre 2008)
• A simple form of greedy discriminative dependency parser
• Transition: an operation that searches for a dependency relation between each pair of words (e.g. Left-Arc, Shift, etc.)
• Design a dumber but really fast algorithm and let machine learning (deep learning) do the rest.
• Eisner's algorithm (dynamic programming-based dependency parsing) searches over many different dependency trees at the same time.
• A transition-based dependency parser only builds one tree, in one left-to-right sweep over the input.
Transition-based Parsing
Transition-based parsing – The arc-standard algorithm
• A sequence of bottom up actions
• Roughly like "shift" or "reduce" in a shift-reduce parser, but the "reduce" actions are specialised to create dependencies with the head on the left or on the right
• The arc-standard algorithm is a simple algorithm for transition-based dependency parsing. It is implemented in most practical transition-based dependency parsers, including MaltParser.
Transition-based Parsing Transition-based parsing
A sentence enters the input buffer (w1, w2, w3, ..., wn); partially processed words are held on the stack (s1, s2, ..., sn); finished arcs are collected as dependency relations.
The Parser (Classifier) "predicts" the optimal (best) transition given the current configuration, with machine learning.
Transition-based Parsing
Transition-based parsing – The arc-standard algorithm
A configuration consists of a Stack, a Buffer, and a Dependency Graph (the arcs built so far).
• Initial configuration:
• All words are in the buffer.
• The stack is empty or starts with the ROOT symbol.
• The dependency graph is empty.
Possible Transitions
• Shift: push the next word in the buffer onto the stack
• Left-Arc: add an arc from the topmost word to the 2nd-topmost word on the stack; remove the 2nd-topmost word from the stack
• Right-Arc: add an arc from the 2nd-topmost word to the topmost word on the stack; remove the topmost word from the stack
Preconditions
• Left-Arc and Right-Arc require 2 elements on the stack to be applied
• ROOT cannot have an incoming arc
Worked example: "book me a morning flight"
Step | Stack                            | Buffer                         | Action    | New arc
0    | [ROOT]                           | [book, me, a, morning, flight] | Shift     |
1    | [ROOT, book]                     | [me, a, morning, flight]       | Shift     |
2    | [ROOT, book, me]                 | [a, morning, flight]           | Right-Arc | book → me
3    | [ROOT, book]                     | [a, morning, flight]           | Shift     |
4    | [ROOT, book, a]                  | [morning, flight]              | Shift     |
5    | [ROOT, book, a, morning]         | [flight]                       | Shift     |
6    | [ROOT, book, a, morning, flight] | []                             | Left-Arc  | flight → morning
7    | [ROOT, book, a, flight]          | []                             | Left-Arc  | flight → a
8    | [ROOT, book, flight]             | []                             | Right-Arc | book → flight
9    | [ROOT, book]                     | []                             | Right-Arc | ROOT → book
10   | [ROOT]                           | []                             | (done)    |
• Terminal configuration:
• The buffer is empty.
• The stack contains a single word.
Transition-based Parsing
Transition-based parsing – The arc-standard algorithm
Start: σ = [ROOT], β = w1, ..., wn, A = ∅
1. Shift: (σ, wi|β, A) → (σ|wi, β, A)
2. Left-Arc_r: (σ|wi|wj, β, A) → (σ|wj, β, A ∪ {r(wj, wi)})
3. Right-Arc_r: (σ|wi|wj, β, A) → (σ|wi, β, A ∪ {r(wi, wj)})
Finish: σ = [w], β = ∅
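The three transitions translate almost line-for-line into code. A minimal sketch, where the function names and the `next_action` callback are mine; in a real parser the callback is the trained classifier discussed on the next slides:

# A minimal arc-standard parser (unlabelled arcs, for illustration).
def arc_standard_parse(words, next_action):
    """Parse `words` into a set of (head, dependent) arcs.
    next_action: function (stack, buffer, arcs) -> "shift" | "left-arc" | "right-arc"
    """
    stack = ["ROOT"]
    buffer = list(words)
    arcs = set()
    # Terminal configuration: empty buffer and only ROOT left on the stack.
    while buffer or len(stack) > 1:
        action = next_action(stack, buffer, arcs)
        if action == "shift" and buffer:
            stack.append(buffer.pop(0))        # push next buffer word
        elif action == "left-arc" and len(stack) >= 2 and stack[-2] != "ROOT":
            arcs.add((stack[-1], stack[-2]))   # head = top, dependent = 2nd-top
            del stack[-2]                      # remove the dependent
        elif action == "right-arc" and len(stack) >= 2:
            arcs.add((stack[-2], stack[-1]))   # head = 2nd-top, dependent = top
            stack.pop()                        # remove the dependent
        else:
            raise ValueError(f"illegal action {action!r} in this configuration")
    return arcs

# Replaying the gold action sequence for "book me a morning flight":
gold = iter(["shift", "shift", "right-arc", "shift", "shift", "shift",
             "left-arc", "left-arc", "right-arc", "right-arc"])
print(arc_standard_parse(["book", "me", "a", "morning", "flight"],
                         lambda s, b, a: next(gold)))
# {('book', 'me'), ('flight', 'morning'), ('flight', 'a'),
#  ('book', 'flight'), ('ROOT', 'book')}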
How to choose the next action?
Transition-based Parsing How to choose the next action?
Stand back, you know machine learning!
Goal: Predict the next transition (class), given the current configuration.
• We let the parser run on gold-standard trees.
• Every time there is a choice to make, we simply look into the tree and do 'the right thing'.
• We collect all (configuration, transition) pairs and train a classifier on them.
• When parsing unseen sentences, we use the trained classifier as a guide.
What if the number of pairs is far too large?
Transition-based Parsing Feature Representation
• Define a set of features of configurations that you consider to be relevant for the task of predicting the next transition.
Example: word forms of the topmost two words on the stack and the next two words in the buffer
• Describe every configuration in terms of a feature vector.
• In practical systems, we have thousands of features and hundreds of transitions.
• There are several machine-learning paradigms that can be used to train a guide for such a task
• Examples: perceptron, decision trees, support-vector machines, memory-based learning
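A sketch of the feature extraction described above, using the example feature set (word forms of the top two stack words and the next two buffer words); the feature names and the scikit-learn pointers in the comments are illustrative assumptions, not part of the lecture:

# Turn a parser configuration into a feature dictionary.
def extract_features(stack, buffer):
    feats = {}
    if len(stack) >= 1: feats["s1"] = stack[-1]    # topmost stack word
    if len(stack) >= 2: feats["s2"] = stack[-2]    # 2nd-topmost stack word
    if len(buffer) >= 1: feats["b1"] = buffer[0]   # next buffer word
    if len(buffer) >= 2: feats["b2"] = buffer[1]   # 2nd buffer word
    return feats

print(extract_features(["ROOT", "book", "me"], ["a", "morning", "flight"]))
# {'s1': 'me', 's2': 'book', 'b1': 'a', 'b2': 'morning'}

# The collected (configuration, transition) pairs can then be vectorised and
# fed to an off-the-shelf classifier, e.g. scikit-learn's DictVectorizer
# followed by LogisticRegression.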
Transition-based Parsing Evaluation of Dependency Parsing
Accuracy = (# of correct dependencies) / (# of dependencies)
Unlabeled attachment score (UAS): the head is correct
Labeled attachment score (LAS): both the head and the label are correct
Transition-based Parsing Evaluation of Dependency Parsing
ROOT She saw the video lecture
 0    1   2   3    4     5

Gold Standard                Parsed (assume this is what you classified)
1  2  she      nsubj        1  2  she      nsubj
2  0  saw      root         2  0  saw      root
3  5  the      det          3  4  the      det
4  5  video    nn           4  5  video    nsubj
5  2  lecture  obj          5  2  lecture  ccomp

Unlabeled attachment score (UAS) = 4 / 5 = 80%
Labeled attachment score (LAS) = 2 / 5 = 40%
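The two scores are easy to compute from parallel lists of (head, label) pairs; a small sketch with the example above (the function is mine):

def attachment_scores(gold, pred):
    # gold, pred: lists of (head, label) pairs, one per word, same order.
    assert len(gold) == len(pred)
    uas = sum(g_head == p_head
              for (g_head, _), (p_head, _) in zip(gold, pred)) / len(gold)
    las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
    return uas, las

gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]
print(attachment_scores(gold, pred))  # (0.8, 0.4) -> UAS 80%, LAS 40%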
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing
Deep Learning-based Parsing Distributed Representations
• Represent each word as a d-dimensional dense vector (i.e., word embedding)
• Similar words are expected to have close vectors.
• NNS (plural noun) should be close to NN (singular noun).
• Meanwhile, part-of-speech tags (POS) and dependency labels are also represented as d-dimensional vectors.
• The smaller discrete sets (tags, labels) also exhibit many semantic similarities
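For instance, POS tags can be embedded exactly like words. A tiny sketch, with random untrained vectors standing in for learned embeddings:

# POS tags as d-dimensional dense vectors; after training, related tags
# such as NN and NNS should end up close in the embedding space.
import torch
import torch.nn as nn

tag_vocab = {"NN": 0, "NNS": 1, "VBP": 2, "DT": 3}
tag_embed = nn.Embedding(len(tag_vocab), 50)   # d = 50

nn_vec = tag_embed(torch.tensor(tag_vocab["NN"]))
nns_vec = tag_embed(torch.tensor(tag_vocab["NNS"]))
print(torch.cosine_similarity(nn_vec, nns_vec, dim=0))  # near 0 before training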
Deep Learning-based Parsing
Neural Dependency Parsing (Chen & Manning, 2014)
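A minimal PyTorch sketch of the architecture the paper describes: embed word, POS-tag and arc-label features of the current configuration, concatenate them, apply one hidden layer with the cube activation, and score the transitions. The sizes follow the paper (48 features, 50-dimensional embeddings, 200 hidden units), but the class and variable names are mine and the transition count is simplified:

import torch
import torch.nn as nn

class NeuralParser(nn.Module):
    def __init__(self, n_words, n_tags, n_labels, n_feats=48,
                 emb_dim=50, hidden=200, n_transitions=3):
        super().__init__()
        # One embedding table over words, POS tags and arc labels together.
        self.embed = nn.Embedding(n_words + n_tags + n_labels, emb_dim)
        self.hidden = nn.Linear(n_feats * emb_dim, hidden)
        self.out = nn.Linear(hidden, n_transitions)

    def forward(self, feat_ids):               # (batch, n_feats) feature indices
        x = self.embed(feat_ids).flatten(1)    # concatenate all embeddings
        h = self.hidden(x) ** 3                # cube activation g(x) = x^3
        return self.out(h)                     # scores over transitions

model = NeuralParser(n_words=10000, n_tags=45, n_labels=40)
scores = model(torch.randint(0, 10000, (1, 48)))  # one configuration
print(scores.shape)                               # torch.Size([1, 3])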
Deep Learning-based Parsing Neural Dependency Parsing
Accuracy and parsing speed on PTB (+ Stanford dependencies) and on CTB
Chen, D., & Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP) (pp. 740-750).
• PTB: English Penn Treebank
• CTB: Penn Chinese Treebank
Deep Learning-based Parsing Dependency Parsing Trends – Penn Treebank
https://paperswithcode.com/sota/dependency-parsing-on-penn-treebank
Deep Learning-based Parsing
Semi-Supervised Sequence Modeling with Cross-View Training (Clark, 2018)
https://arxiv.org/pdf/1809.08370v1.pdf
Final Exercise
Thank you! (VB PRP)
Have a great day! (VBP DT JJ NN)
Reference
Reference for this lecture
• Deng, L., & Liu, Y. (Eds.). (2018). Deep Learning in Natural Language Processing. Springer.
• Rao, D., & McMahan, B. (2019). Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O'Reilly Media.
• Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
• Nivre, J. (2016). Transition-based dependency parsing, lecture notes, Uppsala University.
• Manning, C. (2017). Natural Language Processing with Deep Learning, lecture notes, Stanford University.
• Chen, D., & Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740-750).
• Eisner, J. M. (1996, August). Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th conference on Computational linguistics-Volume 1 (pp. 340-345). Association for Computational Linguistics.