
COMP5046
Natural Language Processing
Lecture 7: Dependency Parsing
Dr. Caren Han
Semester 1, 2021
School of Computer Science, University of Sydney

0
The course topics
What will you learn in this course?
Week 1: Introduction to Natural Language Processing (NLP)
Week 2: Word Embeddings (Word Vector for Meaning)
Week 3: Word Classification with Machine Learning I
Week 4: Word Classification with Machine Learning II
NLP and Machine Learning
Week 5: Language Fundamental
Week 6: Part of Speech Tagging
Week 7: Dependency Parsing
Week 8: Language Model and Natural Language Generation
NLP Techniques
Week 9: Information Extraction: Named Entity Recognition
Advanced Topic
Week 10: Advanced NLP: Attention and Reading Comprehension
Week 11: Advanced NLP: Transformer and Machine Translation
Week 12: Advanced NLP: Pretrained Model in NLP
Week 13: Future of NLP and Exam Review

0
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing

1
Linguistic Structure
Computer Language
Tokenisation
A token can be a variable or function name, an operator or a number.

1
Linguistic Structure
Parsing Computer Language
Parsing
The parser turns a list of tokens into a tree of nodes that captures their logical order
Can we apply this in a human (natural) language?
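Before moving to natural language, here is a minimal sketch of the computer-language case using Python's standard tokenize and ast modules (the example expression is made up for illustration): the source is first split into tokens, and the parser then turns the token stream into a tree of nodes.

```python
import ast
import io
import tokenize

source = "price = base + tax * quantity"   # made-up example expression

# Tokenisation: split the source into tokens (names, operators, numbers)
for tok in tokenize.generate_tokens(io.StringIO(source).readline):
    print(tokenize.tok_name[tok.type], repr(tok.string))

# Parsing: turn the token stream into a tree of nodes (an abstract syntax tree)
tree = ast.parse(source)
print(ast.dump(tree, indent=2))   # indent= requires Python 3.9+
```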

1
Linguistic Structure
Parsing Natural Language (Human Language)
Q: Can we apply this in a human (natural) language?
A: Possible! But it is much more difficult than parsing computer language!
Why?
• No types for words
• No brackets around phrases
• Ambiguity!

1
Linguistic Structure
Natural Language: Linguistic Structure
Let’s try to categorise the given words (Part of Speech tags):
The expensive computer is on the table
DT Adj N V P DT N
However, language is more than just a “bag of words”. Grammatical rules apply to combine:
• words into phrases
• phrases into bigger phrases

1
Linguistic Structure
Natural Language: Linguistic Structure
Let’s try to categorise the given words (Part of Speech tags):
The expensive computer is on the table
DT Adj N V P DT N
However, language is more than just a “bag of words”. Grammatical rules apply to combine:
• words into phrases
• phrases into bigger phrases
Example: a sentence consists of a subject and a predicate.
The subject is a noun phrase and the predicate is a verb phrase.
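As a quick illustration of the categorisation step, a minimal sketch with NLTK's off-the-shelf tagger (note it returns Penn Treebank tags such as JJ and VBZ, which are finer-grained than the coarse Adj/V labels above):

```python
import nltk
# One-time downloads (uncomment on first run):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

tokens = nltk.word_tokenize("The expensive computer is on the table")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('expensive', 'JJ'), ('computer', 'NN'),
#       ('is', 'VBZ'), ('on', 'IN'), ('the', 'DT'), ('table', 'NN')]
```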

1
Linguistic Structure
Natural Language: Linguistic Structure
Phrase Structure Grammar = Context-Free Grammar (CFG)
The expensive computer is on the table
DT Adj N V P DT N
However, language is more than just a “bag of words”. Grammatical rules apply to combine:
• words into phrases
• phrases into bigger phrases
Example: a sentence consists of a subject and a predicate.
The subject is a noun phrase and the predicate is a verb phrase.

1
Linguistic Structure
Natural Language: Linguistic Structure
Phrase Structure Grammar = Context-Free Grammar (CFG)
The expensive computer is on the table
DT Adj N V P DT N
However, language is more than just a “bag of words”. Grammatical rules apply to combine:
• words into phrases
• phrases into bigger phrases
Parsing
• Associating a tree structure with a sentence, given a grammar
(Context-Free Grammar or Dependency Grammar – we will talk about this soon!)

1
Linguistic Structure
Syntactic Ambiguities
Grammars are declarative
• They don’t specify how the parse tree will be constructed
Ambiguity
1. Prepositional Phrase (PP) attachment ambiguity
2. Coordination Scope
3. Gaps
4. Particles or Prepositions
5. Gerund or adjective
There are many more ambiguities…

1
Linguistic Structure
Syntactic Ambiguities – PP attachment Ambiguity
I saw the man with the telescope
• Reading 1: [saw [the man] [with the telescope]] – I used the telescope to see the man
• Reading 2: [saw [the man with the telescope]] – the man I saw had the telescope

1
Linguistic Structure
Syntactic Ambiguities – PP attachment Ambiguities Multiply
• A key parsing decision is how we ‘attach’ various constituents
• PPs, adverbial or participial phrases, infinitives, coordinations
The board approved its acquisition by Royal Trustco Ltd.
of Toronto for $27 a share at its monthly meeting

1
Linguistic Structure
Syntactic Ambiguities – Coordination Scope Ambiguity
I ate red apples and bananas
• Does red modify only apples ([red apples] and bananas), or the whole coordination (red [apples and bananas])?

1
Linguistic Structure
Syntactic Ambiguities – Gaps
She never saw a dog and did not smile

1
Linguistic Structure
Syntactic Ambiguities – Particles or Prepositions
• Some verbs are followed by adverb particles.
(e.g. put on, take off, give away, bring up, call in)
She ran up a large bill (ran up = verb + particle, an idiomatic expression)
She ran up a large hill (up a large hill = prepositional phrase)
Difference between an adverb particle and a preposition.
• the particle is closely tied to its verb to form idiomatic expressions
• the preposition is closely tied to the noun or pronoun it modifies.

1
Linguistic Structure
Syntactic Ambiguities – Gerund or Adjective
Dancing shoes can provide a nice experience
• Gerund reading: dancing names the activity (shoes for dancing)
• Adjective reading: dancing modifies shoes (shoes that are dancing)

1
Linguistic Structure
When and Where do we use Parsing?
Syntactic Analysis checks whether the generated tokens form a meaningful expression
• Humans communicate complex ideas by composing words together into bigger units to convey complex meanings
• We need to understand sentence structure in order to be able to interpret language correctly
• Grammar Checking
• Question Answering
• Machine Translation
• Information Extraction
• Text Classification
• Chatbot
… and many others

1
Linguistic Structure
Two main views of linguistic structure
Constituency Grammar (a.k.a context-free grammar, phrase structure grammar)
• Noam Chomsky (1928 – )
• Immediate constituent analysis
• Insists on classification and distribution
Dependency Grammar
• Lucien Tesnière (1893 – 1954)
• Functional dependency relations
• Emphasises the relations between syntactic units,
thus adding meaningful links (semantics)

1
Linguistic Structure
Two main views of linguistic structure
Constituency Parsing
Constituency grammars
One-to-one-or-more correspondence. For every word in a sentence, there is at least one node in the syntactic structure that corresponds to that word.
Dependency Parsing
Dependency grammars
One-to-one relation: for every word in the sentence, there is exactly one node in the syntactic structure that corresponds to that word

1
Linguistic Structure
Constituency Grammar
• A basic observation about syntactic structure is that groups of words can act as single units
• Such groups of words are called constituents
• Constituents tend to have similar internal structure, and behave
similarly with respect to other units
Examples
• noun phrases (NP)
• she, the house, Robin Hood and his merry men, etc.
• verb phrases (VP)
• blushed, loves Mary, was told to sit down and be quiet, lived happily ever after
• prepositional phrases (PP)
• on it, with the telescope, through the foggy dew, etc.

1
Linguistic Structure
A sample context-free grammar
I prefer a morning flight
1. Starting unit: words are given a category (part-of-speech)
2. Combining words into phrases with categories
3. Combining phrases into bigger phrases recursively

1
Linguistic Structure
A sample context-free grammar
1. Starting unit: words are given a category (part-of-speech)
2. Combining words into phrases with categories
3. Combining phrases into bigger phrases recursively
I prefer a morning flight → I, prefer, a, morning, flight
PRP VBP DT NN NN

1
Linguistic Structure
A sample context-free grammar
I, prefer, a, morning, flight
PRP VBP DT NN NN

1
Linguistic Structure
A sample context-free grammar
1. Starting unit: words are given a category (part-of-speech)
2. Combining words into phrases with categories
3. Combining phrases into bigger phrases recursively
I, prefer, a, morning, flight
PRP VBP DT NN NN
Phrases built from these words: NP (I), Nominal (morning flight), NP (a morning flight), VP (prefer a morning flight), S (the whole sentence)

1
Linguistic Structure
A sample context-free grammar

1
Linguistic Structure
A sample context-free grammar
Parse tree (root S at the top, leaves at the bottom):
(S (NP (Pro I))
   (VP (Verb prefer)
       (NP (Det a)
           (Nom (Nom (Noun morning))
                (Noun flight)))))
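A minimal sketch of this grammar in NLTK (the rule set below is a toy version written to reproduce the tree above, not the exact grammar from the slides):

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Pro | Det Nom
VP -> Verb NP
Nom -> Nom Noun | Noun
Pro -> 'I'
Det -> 'a'
Verb -> 'prefer'
Noun -> 'morning' | 'flight'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("I prefer a morning flight".split()):
    tree.pretty_print()   # draws the tree with S as the root and words as leaves
```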

1
Linguistic Structure
Treebanks
Corpora where each sentence is annotated with a parse tree
• Treebanks are generally created by
• parsing texts with an existing parser
• having human annotators correct the result
• This requires detailed annotation guidelines for annotating different
grammatical constructions
• Penn Treebank is a popular treebank for English (Wall Street Journal section)
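NLTK bundles a small sample of the WSJ section of the Penn Treebank, which is a quick way to inspect what treebank annotations look like (a minimal sketch; it assumes the treebank sample has been downloaded):

```python
import nltk
# nltk.download("treebank")   # one-time download of the WSJ sample

from nltk.corpus import treebank

tree = treebank.parsed_sents()[0]       # first annotated sentence
print(" ".join(tree.leaves()))          # the raw words
tree.pretty_print()                     # the human-corrected parse tree
```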

0
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing

2
Dependency Structure Dependency Structure
Syntactic structure: lexical items linked by binary asymmetrical relations (“arrows”) called dependencies
Example: Red Hat, with a dependency arc labelled compound drawn from the head to the modifier
Red – modifier, dependent, child, subordinate
Hat – head, governor, parent, regent
compound – the dependency relation (other relations include subject, prepositional object, etc.)
*Head determines the syntactic/semantic category of the construct
*The arrows are commonly typed with the name of grammatical relations
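To see such arcs in practice, a quick sketch with spaCy (it assumes the small English model has been installed with `python -m spacy download en_core_web_sm`; the exact relation labels depend on the model version):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The expensive computer is on the table")

for token in doc:
    # every word has exactly one head and one typed dependency relation
    print(f"{token.text:>10} --{token.dep_}--> {token.head.text}")
```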

2
Dependency Structure Dependency Parsing
Represents lexical/syntactic dependencies between words
• A sentence is parsed by choosing, for each word, which other word
(including ROOT) it is a dependent of
Dependencies form a tree (connected, acyclic, single-head)
• How to make the dependencies a tree – Constraints
• Only one word is a dependent of ROOT (the main predicate of a sentence)
• Don’t want cycles: A → B, B → A
ROOT France1 won2 the3 world4 cup5 by6 beating7 Germany8
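These constraints are easy to check mechanically. Below is a small helper of my own (not from the lecture) that verifies a proposed set of heads forms a legal tree: exactly one dependent of ROOT, and no cycles.

```python
def is_valid_tree(heads):
    """heads[i] is the head of word i+1; 0 denotes ROOT."""
    n = len(heads)
    if sum(1 for h in heads if h == 0) != 1:     # exactly one word depends on ROOT
        return False
    for i in range(1, n + 1):                    # every word must reach ROOT
        seen, node = set(), i
        while node != 0:
            if node in seen:                     # a cycle such as A -> B, B -> A
                return False
            seen.add(node)
            node = heads[node - 1]
    return True

# France won the world cup by beating Germany
# (head indices chosen to match one plausible analysis of the example)
print(is_valid_tree([2, 0, 5, 5, 2, 7, 2, 7]))   # True
```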

2
Dependency Structure Dependency Grammar/Parsing History
Panini’s grammar (4th century BCE)
The notion of dependencies between grammatical units
Ibn Maḍāʾ (12th century)
The first grammarian to use the term dependency in the grammatical sense
Sámuel Brassai, Franz Kern, Heimann Hariton Tiktin (1800 – 1930)
The notion of dependency seems to have coexisted side by side with that of phrase structure
Lucien Tesnière (1959)
Dependency grammar was the dominant approach in the “East” in the 20th century (Russia, China, …); it works well for languages with freer word order
David Hays (1962)
The great development surrounding dependency-based theories has come from computational linguistics

2
Dependency Structure Dependency Grammar/Parsing
Some people draw the arrows one way; some the other way!
• Tesnière had them point from head to dependent…
Usually add a fake ROOT so every word is a dependent of precisely 1 other node
Projectivity vs Non-Projectivity
• A parse is projective if there are no crossing dependency arcs when the words are laid out in their linear order, with all arcs above the words
• Dependencies parallel to a CFG tree must be projective
• Forming dependencies by taking 1 child of each category as head
• But dependency theory normally does allow non-projective structures to account for displaced constituents
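Projectivity can also be tested directly from the head indices: a tree is non-projective exactly when two arcs cross. A minimal sketch (same head encoding as the earlier helper; the second example is a small constructed non-projective tree):

```python
def is_projective(heads):
    """heads[i] is the head of word i+1; 0 denotes ROOT."""
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for (l1, r1) in arcs:
        for (l2, r2) in arcs:
            # two arcs cross if one starts strictly inside the other
            # and ends strictly outside it
            if l1 < l2 < r1 < r2:
                return False
    return True

print(is_projective([2, 0, 5, 5, 2, 7, 2, 7]))  # True: no crossing arcs
print(is_projective([0, 4, 1, 1]))              # False: arcs 1-3 and 2-4 cross
```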

2
Dependency Structure Dependency Relations
• The following list shows the 37 universal syntactic relations used in Universal Dependencies v2.
Universal Dependency Relations: http://universaldependencies.org/u/dep/index.html

2
Dependency Structure Dependency Relations with annotations
• The idea of dependency structure goes back a long way
• [Universal Dependencies: http://universaldependencies.org/ ;
• cf. Marcus et al. 1993, The Penn Treebank, Computational Linguistics]
• Starting off, building a treebank seems a lot slower and less useful than building a grammar
• But a treebank gives us many things
• Reusability of the labor
• Many parsers, part-of-speech taggers, etc. can be built on it
• Valuable resource for linguistics
• Broad coverage, not just a few intuitions
• Frequencies and distributional information
• A way to evaluate systems

Dependency Structure Dependency Parsing
Exercise – Let’s do it together!
(Dependency grammars are simpler to parse than context-free grammars.)
ROOT I prefer a morning flight
ROOT Unionised workers are usually better paid than their non-union counterparts

0
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing

3
Dependency Parsing Approaches Methods of Dependency Parsing
• Dynamic programming
Extension of the CYK algorithm to dependency parsing (Eisner, 1996)
• Constraint Satisfaction (Karlsson, 1990)
pos(x) = DET → (label(x) = NMOD, pos(head(x)) = NN, x < head(x))
A determiner (DET) modifies a noun (NN) on the right with the label NMOD.
• Graph-based Dependency Parsing
Create a Minimum Spanning Tree for a sentence.
McDonald et al.’s (2005) MSTParser scores dependencies independently using a machine learning classifier.
• Transition-based Dependency Parsing (Nivre 2008)
• Neural Dependency Parsing
CYK: Cocke–Younger–Kasami algorithm

3
Dependency Parsing Approaches
Graph-based dependency parsers
MST Parser (McDonald et al., 2005)
• Projectivity
• English dependency trees are mostly projective (can be drawn without crossing dependencies)
• Other languages are not
• Idea
• Dependency parsing is equivalent to searching for a maximum spanning tree in a directed graph
• Chu and Liu (1965) and Edmonds (1967) give an efficient algorithm for finding the MST of a directed graph

3
Dependency Parsing Approaches
Graph-based dependency parsers
• Compute a score for every possible dependency (for each candidate edge)
• Then add an edge from each word to its highest-scoring candidate head
• And repeat the same process for each other word
(Figure: candidate head scores for “big” in “ROOT The big cat sat”, e.g. 0.5, 0.8, 0.3, 2.0; the highest-scoring head is chosen.)

3
Dependency Parsing Approaches
Graph-based dependency parsers
• Consider the sentence “Lisa saw Bart”
(Figure: a complete directed graph over ROOT, Lisa, saw and Bart with a score on every candidate arc; the parse is the maximum spanning tree of this graph.)

3
Dependency Parsing Approaches
Methods of Dependency Parsing
Graph-based Dependency Parsing
• Non-deterministic dependency parsing
• Build a complete graph with directed/weighted edges
• Find the highest-scoring tree in the complete dependency graph (greedy algorithm)
Transition-based Dependency Parsing
• Deterministic dependency parsing
• Build a tree by applying a sequence of transition actions
• Find the highest-scoring action sequence that builds a legal tree
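As a toy illustration of edge scoring (the score matrix below is invented for this sketch, not taken from the slides), the greedy step of letting every word pick its highest-scoring head looks like this; a real MST parser such as Chu-Liu/Edmonds additionally has to repair any cycles that greedy selection can create.

```python
import numpy as np

# score[h, d] is the score of an arc from head h to dependent d;
# index 0 is reserved for ROOT. Scores are made up for this example.
words = ["ROOT", "The", "big", "cat", "sat"]
score = np.array([
    #  ROOT  The   big   cat   sat      (dependent)
    [  -1,   0.1,  0.5,  0.4,  3.0],    # head = ROOT
    [  -1,  -1,    0.3,  0.2,  0.1],    # head = The
    [  -1,   0.2, -1,    0.3,  0.1],    # head = big
    [  -1,   2.5,  2.0, -1,    0.4],    # head = cat
    [  -1,   0.3,  0.8,  2.8, -1  ],    # head = sat
])

# Greedy choice: every word independently takes its highest-scoring head.
for d in range(1, len(words)):
    h = int(np.argmax(score[:, d]))
    print(f"{words[h]} -> {words[d]}  (score {score[h, d]:.1f})")
```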
0
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing

4
Transition-based Parsing
Greedy transition-based parsing (Nivre 2008)
• A simple form of greedy discriminative dependency parser
• Transition: an operation that searches for a dependency relation between a pair of words (e.g. Left-Arc, Shift, etc.)
• Design a dumber but really fast algorithm and let the machine learning (deep learning) do the rest.
• Eisner’s algorithm (dynamic programming-based dependency parsing) searches over many different dependency trees at the same time.
• A transition-based dependency parser only builds one tree, in one left-to-right sweep over the input.

4
Transition-based Parsing
Transition-based parsing – The arc-standard algorithm
• A sequence of bottom-up actions
• Roughly like “shift” or “reduce” in a shift-reduce parser, but the “reduce” actions are specialised to create dependencies with the head on the left or on the right
• The arc-standard algorithm is a simple algorithm for transition-based dependency parsing. It is implemented in most practical transition-based dependency parsers, including MaltParser.

4
Transition-based Parsing
Transition-based parsing
A sentence goes into the input buffer (w1 w2 w3 ... wn); partially processed words sit on the stack (s1 s2 s3 ...); the parser (a classifier trained with machine learning) “predicts” the optimal (best) transition given the current configuration, and the chosen transitions build up the set of dependency relations.

4
Transition-based Parsing
Transition-based parsing – The arc-standard algorithm
A configuration consists of a Stack, a Buffer and a Dependency Graph.
• Initial configuration (for “book me a morning flight”):
• All words are in the buffer: book me a morning flight
• The stack is empty or starts with the ROOT symbol
• The dependency graph is empty
Possible transitions:
• Shift: push the next word in the buffer onto the stack
• Left-Arc: add an arc from the topmost word to the 2nd-topmost word on the stack; remove the 2nd-topmost word from the stack
• Right-Arc: add an arc from the 2nd-topmost word to the topmost word on the stack; remove the topmost word from the stack
Notes:
• ROOT cannot have an incoming arc
• Left-Arc and Right-Arc require 2 elements on the stack to be applied
4
Transition-based Parsing
Transition-based parsing – The arc-standard algorithm
Worked example, “book me a morning flight” (transition: stack | buffer | new arc):
1. Shift: [ROOT book] | me a morning flight
2. Shift: [ROOT book me] | a morning flight
3. Right-Arc: [ROOT book] | a morning flight | book → me
4. Shift: [ROOT book a] | morning flight
5. Shift: [ROOT book a morning] | flight
6. Shift: [ROOT book a morning flight] | (empty)
7. Left-Arc: [ROOT book a flight] | (empty) | flight → morning
8. Left-Arc: [ROOT book flight] | (empty) | flight → a
9. Right-Arc: [ROOT book] | (empty) | book → flight
10. Right-Arc: [ROOT] | (empty) | ROOT → book
• Terminal configuration:
• The buffer is empty
• The stack contains a single word (ROOT)

4
Transition-based Parsing
Transition-based parsing – The arc-standard algorithm
Start: σ = [ROOT], β = w1, ..., wn, A = ∅
1. Shift: (σ, wi|β, A) → (σ|wi, β, A)
2. Left-Arc_r: (σ|wi|wj, β, A) → (σ|wj, β, A ∪ {r(wj, wi)})
3. Right-Arc_r: (σ|wi|wj, β, A) → (σ|wi, β, A ∪ {r(wi, wj)})
Finish: σ = [w], β = ∅
How to choose the next action?
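The transition system above fits in a few lines of Python. In this sketch the transition sequence for “book me a morning flight” is written by hand; the following slides discuss how a classifier learns to predict it from the current configuration.

```python
def parse(words, transitions):
    stack, buffer, arcs = ["ROOT"], list(words), []
    for t in transitions:
        if t == "SHIFT":
            stack.append(buffer.pop(0))
        elif t == "LEFT-ARC":              # topmost -> 2nd-topmost, pop 2nd
            head, dep = stack[-1], stack.pop(-2)
            arcs.append((head, dep))
        elif t == "RIGHT-ARC":             # 2nd-topmost -> topmost, pop top
            dep, head = stack.pop(), stack[-1]
            arcs.append((head, dep))
        print(f"{t:10} stack={stack} buffer={buffer}")
    assert stack == ["ROOT"] and not buffer   # terminal configuration
    return arcs

seq = ["SHIFT", "SHIFT", "RIGHT-ARC",          # book -> me
       "SHIFT", "SHIFT", "SHIFT",
       "LEFT-ARC", "LEFT-ARC",                 # flight -> morning, flight -> a
       "RIGHT-ARC",                            # book -> flight
       "RIGHT-ARC"]                            # ROOT -> book
print(parse("book me a morning flight".split(), seq))
```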
4
Transition-based Parsing
How to choose the next action?
Stand back, you know machine learning!
Goal: predict the next transition (class), given the current configuration.
• We let the parser run on gold-standard trees.
• Every time there is a choice to make, we simply look into the tree and do “the right thing”.
• We collect all (configuration, transition) pairs and train a classifier on them.
• When parsing unseen sentences, we use the trained classifier as a guide.
What if the number of pairs is far too large?

4
Transition-based Parsing
Feature Representation
• Define a set of features of configurations that you consider to be relevant for the task of predicting the next transition.
Example: word forms of the topmost two words on the stack and the next two words in the buffer
• Describe every configuration in terms of a feature vector.
• In practical systems, we have thousands of features and hundreds of transitions.
• There are several machine-learning paradigms that can be used to train a guide for such a task.
Examples: perceptron, decision trees, support-vector machines, memory-based learning

4
Transition-based Parsing
Evaluation of Dependency Parsing
Accuracy = (# correct dependencies) / (# of dependencies)
• Unlabeled attachment score (UAS): the head is correct
• Labeled attachment score (LAS): both the head and the label are correct

4
Transition-based Parsing
Evaluation of Dependency Parsing
ROOT0 She1 saw2 the3 video4 lecture5
(columns: index, head, word, label)
Gold Standard          Parsed (assume this is what you classified)
1 2 she nsubj          1 2 she nsubj
2 0 saw root           2 0 saw root
3 5 the det            3 4 the det
4 5 video nn           4 5 video nsubj
5 2 lecture obj        5 2 lecture ccomp
Unlabeled attachment score (UAS) = 4 / 5 = 80%
Labeled attachment score (LAS) = 2 / 5 = 40%
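A quick sketch of how the two scores are computed for the example above (gold and predicted analyses written as (head, label) pairs, one per word):

```python
gold = [(2, "nsubj"), (0, "root"), (5, "det"), (5, "nn"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (4, "det"), (5, "nsubj"), (2, "ccomp")]

# UAS counts correct heads; LAS counts correct head + label pairs
uas = sum(g[0] == p[0] for g, p in zip(gold, pred)) / len(gold)
las = sum(g == p for g, p in zip(gold, pred)) / len(gold)
print(f"UAS = {uas:.0%}, LAS = {las:.0%}")   # UAS = 80%, LAS = 40%
```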
0
LECTURE PLAN Lecture 7: Parsing
1. Linguistic Structure
2. Dependency Structure
3. Dependency Parsing Algorithms
4. Transition-based Dependency Parsing
5. Deep Learning-based Dependency Parsing

5
Deep Learning-based Parsing
Distributed Representations
• Represent each word as a d-dimensional dense vector (i.e., word embedding)
• Similar words are expected to have close vectors.
• NNS (plural noun) should be close to NN (singular noun).
• Meanwhile, part-of-speech (POS) tags and dependency labels are also represented as d-dimensional vectors.
• The smaller discrete sets also exhibit many semantic similarities.

5
Deep Learning-based Parsing
Neural Dependency Parsing (Chen & Manning, 2014)

5
Deep Learning-based Parsing
Neural Dependency Parsing
Accuracy and parsing speed on PTB + Stanford dependencies, and on CTB.
• PTB: English Penn Treebank
• CTB: Chinese Penn Treebank
Chen, D., & Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of EMNLP 2014 (pp. 740–750).

5
Deep Learning-based Parsing
Dependency Parsing Trends – Penn Treebank
https://paperswithcode.com/sota/dependency-parsing-on-penn-treebank

5
Deep Learning-based Parsing
Semi-Supervised Sequence Modeling with Cross-View Training (Clark, 2018)
https://arxiv.org/pdf/1809.08370v1.pdf

Final Exercise
Thank you!  (VB PRP)
Have a great day!  (VBP DT JJ NN)

Reference
Reference for this lecture
• Deng, L., & Liu, Y. (Eds.). (2018). Deep Learning in Natural Language Processing. Springer.
• Rao, D., & McMahan, B. (2019). Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning. O’Reilly Media, Inc.
• Manning, C. D., & Schütze, H. (1999). Foundations of Statistical Natural Language Processing. MIT Press.
• Nivre, J. (2016). Transition-based dependency parsing, lecture notes, Uppsala Universitet.
• Manning, C. (2017). Natural Language Processing with Deep Learning, lecture notes, Stanford University.
• Chen, D., & Manning, C. (2014). A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 740–750).
• Eisner, J. M. (1996). Three new probabilistic models for dependency parsing: An exploration. In Proceedings of the 16th Conference on Computational Linguistics (COLING), Volume 1 (pp. 340–345). Association for Computational Linguistics.