
Parsing

Lizhen Qu

Overview of the NLP Lectures

•  Introduction to natural language processing (NLP).

•  Regular expressions, sentence splitting, tokenization,
part-of-speech tagging.

•  Language models.

•  Vector semantics.

•  Parsing.
–  Dependency parsing.

•  Semantics.


Relation Extraction

•  Find the worksFor(entity_a, entity_b) relation in text.

Training data (entity mentions labeled Per/Org):

Mark/Per has worked for Telstra/Org.

John/Per works for Facebook/Org.

Christina/Per asks why it is better to work at Facebook/Org than Google/Org.

Mark Zuckerberg/Per said he would have worked for Microsoft/Org.

Only some of these sentences actually express the worksFor relation.

Use Shortest Paths between Entity Mentions

Shortest dependency paths between the entity mentions:

(Mark, Telstra): <-nsubj-prep->pobj->

(John, Facebook): <-nsubj-prep->pobj->

(Zuckerberg, Microsoft): <-nsubj-ccomp->prep->pobj->

(Christina, Google): <-nsubj-advcl->prep->pobj->
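Such a path can be read off a dependency parse directly. A minimal sketch, assuming spaCy (with its en_core_web_sm model) and networkx are installed; the function name shortest_dep_path is ours, and it matches mentions by surface form only:

```python
# Sketch: shortest dependency path between two entity mentions.
# Assumes spaCy's en_core_web_sm model and networkx are installed.
import networkx as nx
import spacy

nlp = spacy.load("en_core_web_sm")

def shortest_dep_path(text, a, b):
    doc = nlp(text)
    # Undirected graph over token indices, one edge per dependency arc.
    g = nx.Graph((tok.i, tok.head.i) for tok in doc if tok.i != tok.head.i)
    src = next(t.i for t in doc if t.text == a)
    dst = next(t.i for t in doc if t.text == b)
    return [f"{doc[i].text}/{doc[i].dep_}" for i in nx.shortest_path(g, src, dst)]

print(shortest_dep_path("John works for Facebook.", "John", "Facebook"))
# e.g. ['John/nsubj', 'works/ROOT', 'for/prep', 'Facebook/pobj']
```

The exact labels depend on the parser model, but the path shape is what the relation extractor uses as a feature.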

Dependency Grammar

•  Syntactic structure consists of lexical items,
linked by binary asymmetric relations called
dependencies.

•  A dependency links a head to a dependent (head → dependent).
•  head (governor): the grammatically most important word.
•  dependent (modifier): a modifier, object, or complement of the head.


Dependency Trees

•  Without labels.

•  With labels.


Dependency Parsing

•  Formal definition for unlabeled dependency trees:

•  Dependency parsing: task of mapping an input string to a
dependency graph satisfying certain conditions.

Dependency graph D = (V, E), where

•  V is the set of nodes (the words of the input sequence).

•  E is the set of arcs indicating grammatical relations.

•  vi → vj, or (vi, vj) ∈ E, denotes an arc from head vi to dependent vj.

Projective Dependency Tree


Ram saw a dog yesterday which was a Yorkshire Terrier

Non-Projective Dependency Tree


Ram saw a dog yesterday which was a Yorkshire Terrier

Crossing lines!

English has very few non-projective cases.

Well-Formedness

•  A dependency graph is well-formed iff it satisfies:

–  Single head: each word has only one head.

–  Acyclic: the graph contains no cycles.

–  Connected: there is a path between every pair of nodes.

–  Projective: an arc from word A to word B implies a directed path in the graph from A to every word between A and B in the sentence.

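These conditions are easy to check mechanically. A minimal sketch of such a checker (our own helper, not from the lecture), for words indexed 0..n-1 with the artificial root at index 0 and arcs given as (head, dependent) pairs:

```python
# Sketch: check the four well-formedness conditions on a dependency graph.
def is_well_formed(n, arcs, root=0):
    heads = {}
    for h, d in arcs:
        if d in heads:                      # single head violated
            return False
        heads[d] = h

    def chain_to_root(w):
        """Path of nodes from w up to the root, or None on a cycle/dead end."""
        path = [w]
        while w != root:
            if w not in heads:              # disconnected word
                return None
            w = heads[w]
            if w in path:                   # cycle
                return None
            path.append(w)
        return path

    chains = [chain_to_root(w) for w in range(n)]
    if any(c is None for c in chains):      # acyclic + connected violated
        return False
    for h, d in arcs:                       # projectivity
        for w in range(min(h, d) + 1, max(h, d)):
            if h not in chains[w]:          # h must dominate every word between
                return False
    return True

print(is_well_formed(3, {(0, 1), (1, 2)}))  # True
print(is_well_formed(3, {(0, 1), (2, 1)}))  # False: two heads, word 2 unreachable
```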

Parsing Algorithms

•  Graph-based parsing
–  CYK, Eisner, McDonald

•  Transition-based parsing
–  Covington, Yamada and Matsumoto, Nivre, etc.


Nivre’s Algorithm (Arc-eager) [5]

•  Transition-based.

Parser configuration ⟨S, I, A⟩:

•  S is the stack; S[0] is the topmost node.

•  I is the list of remaining input words; I[0] is the leftmost word.

•  A is the set of current dependency arcs for the dependency graph.

INPUT: a word sequence v = v1|…|vn and a set of rules R.

Parser Transitions

Left-Arc (LA): ⟨vi|S, vj|I, A⟩ ⇒ ⟨S, vj|I, A ∪ {(vj, vi)}⟩, if vj → vi ∈ R and ∄vk : (vk, vi) ∈ A

Right-Arc (RA): ⟨vi|S, vj|I, A⟩ ⇒ ⟨vj|vi|S, I, A ∪ {(vi, vj)}⟩, if vi → vj ∈ R and ∄vk : (vk, vj) ∈ A

Reduce (R): ⟨vi|S, I, A⟩ ⇒ ⟨S, I, A⟩, if ∃vk : (vk, vi) ∈ A

Shift (S): ⟨S, vj|I, A⟩ ⇒ ⟨vj|S, I, A⟩

The ∄ preconditions on Left-Arc and Right-Arc enforce the single-head constraint.

Parsing Details

•  Slight modifications:
–  Each dependency graph has an artificial root (ROOT) so that it forms a tree.

–  Parsing starts with the initial configuration ⟨[ROOT], v1|…|vn, ∅⟩ and terminates when it reaches a terminal configuration ⟨S, nil, A⟩ through a sequence of transitions.

•  Nondeterministic transitions?
–  Priority ordering of transitions: LA > RA > (if S[0] can be a transitive head of I[0], then Shift, otherwise Reduce).

–  Guided parsing.

Grammatical Rules for the Example

Noun → Adj

ROOT → Verb

Noun → Det

Verb → Prep

Verb → Noun

figures → on

on → screen
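To make the procedure concrete, here is a minimal Python sketch of the arc-eager loop with the priority ordering above. This is an illustration of ours, not the lecture's reference implementation; it assumes the rules are given at the word level as (head, dependent) pairs and that words in a sentence are distinct:

```python
# Sketch of Nivre's arc-eager algorithm with the priority ordering
# LA > RA > (Shift if S[0] can be a transitive head of I[0], else Reduce).
def can_transitively_head(head, dep, rules):
    """True if the rules contain a directed chain head -> ... -> dep."""
    frontier, seen = [head], set()
    while frontier:
        w = frontier.pop()
        if w == dep:
            return True
        if w not in seen:
            seen.add(w)
            frontier.extend(d for (h, d) in rules if h == w)
    return False

def arc_eager_parse(words, rules):
    stack = ["ROOT"]   # S: stack, topmost node last
    buf = list(words)  # I: remaining input, leftmost word first
    arcs = set()       # A: arcs (head, dependent) built so far
    heads = {}         # dependent -> head, for the single-head condition

    while buf:
        top, nxt = stack[-1], buf[0]
        if top != "ROOT" and (nxt, top) in rules and top not in heads:
            arcs.add((nxt, top)); heads[top] = nxt   # Left-Arc: pop the top
            stack.pop()
        elif (top, nxt) in rules and nxt not in heads:
            arcs.add((top, nxt)); heads[nxt] = top   # Right-Arc: push next word
            stack.append(buf.pop(0))
        elif top in heads and not can_transitively_head(top, nxt, rules):
            stack.pop()                              # Reduce
        else:
            stack.append(buf.pop(0))                 # Shift
    return arcs

# The worked example below, with the rules instantiated at the word level:
rules = {("ROOT", "indicated"), ("indicated", "figures"), ("indicated", "stocks"),
         ("figures", "Red"), ("figures", "on"), ("on", "screen"),
         ("screen", "the"), ("stocks", "falling")}
words = "Red figures on the screen indicated falling stocks".split()
print(sorted(arc_eager_parse(words, rules)))
```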

Example

Input: "Red figures on the screen indicated falling stocks". Initial configuration: stack [ROOT], all words in the input list, A = ∅.

1.  Shift: push Red → stack [ROOT, Red].
2.  Left-Arc: add figures → Red, pop Red → [ROOT].
3.  Shift: push figures → [ROOT, figures].
4.  Right-Arc: add figures → on, push on → [ROOT, figures, on].
5.  Shift: push the → [ROOT, figures, on, the].
6.  Left-Arc: add screen → the, pop the → [ROOT, figures, on].
7.  Right-Arc: add on → screen, push screen → [ROOT, figures, on, screen].
8.  Reduce: pop screen → [ROOT, figures, on].
9.  Reduce: pop on → [ROOT, figures].
10. Left-Arc: add indicated → figures, pop figures → [ROOT].
11. Right-Arc: add ROOT → indicated, push indicated → [ROOT, indicated].
12. Shift: push falling → [ROOT, indicated, falling].
13. Left-Arc: add stocks → falling, pop falling → [ROOT, indicated].
14. Right-Arc: add indicated → stocks, push stocks → [ROOT, indicated, stocks].
15. Reduce: pop stocks → [ROOT, indicated].
16. Reduce: pop indicated → [ROOT]; the input is exhausted, so parsing terminates.

Configurations of the Example

Transition sequence: S, LA, S, RA, S, LA, RA, R, R, LA, RA, S, LA, RA, R, R, from the initial configuration ⟨[ROOT], Red|…|stocks, ∅⟩ to the terminal configuration ⟨[ROOT], nil, A⟩.

Properties of Nivre’s Algorithm

•  O(n): linear time complexity.

•  Complete dependency graphs produced by the parser are well-formed.


Dependency Corpora

•  CoNLL dependencies.
–  http://www.aclweb.org/anthology/D07-1096

•  Stanford typed dependencies.
–  http://nlp.stanford.edu/software/dependencies_manual.pdf

•  Universal dependencies.
–  http://universaldependencies.org/u/overview/syntax.html


Guided Parsing [6]

•  Train a classifier to predict parse transitions!
–  f : configuration → {LA, RA, R, S} × (A ∪ {nil})
–  A is a set of typed dependencies (arcs).

•  Feature space: features extracted from the parser configuration (stack, input list, and partial dependency graph).

Arc Standard

•  Three parse actions.

•  Neural networks for action prediction [9].

Architecture of [9]: parser configuration → embedding look-up table → hidden layers → softmax over actions.

Left-Arc (LA): ⟨vi|vj|S, I, A⟩ ⇒ ⟨vi|S, I, A ∪ {(vi, vj)}⟩

Right-Arc (RA): ⟨vi|vj|S, I, A⟩ ⇒ ⟨vj|S, I, A ∪ {(vj, vi)}⟩

Shift (S): ⟨S, vj|I, A⟩ ⇒ ⟨vj|S, I, A⟩
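A minimal sketch of such an action classifier in the spirit of [9], assuming PyTorch; the feature IDs are word/POS/label indices read off the parser configuration, and the dimensions and cube activation follow [9]:

```python
# Sketch: feed-forward action classifier over configuration features, as in [9].
import torch
import torch.nn as nn

class TransitionClassifier(nn.Module):
    def __init__(self, vocab_size, n_features=48, emb_dim=50,
                 hidden=200, n_actions=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)   # look-up table
        self.hidden = nn.Linear(n_features * emb_dim, hidden)
        self.out = nn.Linear(hidden, n_actions)          # scores for LA, RA, S

    def forward(self, feature_ids):                      # (batch, n_features)
        x = self.embed(feature_ids).flatten(1)           # concatenated embeddings
        h = self.hidden(x) ** 3                          # cube activation, as in [9]
        return self.out(h)                               # logits; softmax in the loss
```

In the labeled case, n_actions grows to cover each (transition, dependency label) pair.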

Off-the-Shelf Dependency Parsers

•  MaltParser (http://www.maltparser.org/)

•  SyntaxNet (https://github.com/tensorflow/models/tree/master/research/syntaxnet )

•  Stanford parser (http://nlp.stanford.edu/software/lex-parser.shtml)

•  TurboParser (http://www.cs.cmu.edu/~ark/TurboParser/)


Overview of the NLP Lectures

•  Introduction to natural language processing (NLP).

•  Regular expressions, sentence splitting, tokenization,
part-of-speech tagging.

•  Language models.

•  Vector semantics.

•  Parsing.
–  Dependency parsing.
–  Constituency parsing.

•  Compositional semantics and NLP applications.

Constituency Parsing

•  Deeper understanding of word groups and their
grammatical relationships.

Constituent parse tree for "the children ate the cake":

[S [NP [AT the] [NNS children]] [VP [VBD ate] [NP [AT the] [NN cake]]]]

Non-terminals (S, NP, VP, AT, NNS, VBD, NN) are clusters or generalizations of symbols; terminals are the words themselves.

Constituency

•  Constituent: a word or a group of words that
behaves as a single unit.

•  Why do these words group together?
–  Appear in similar syntactic environments.

–  Preposed or postposed construction.

Noun phrases appear in similar syntactic environments (here, before a verb):

three parties from Sydney arrive …
Drunk driver fled …
they sit …

but their individual fragments do not (* marks ungrammaticality):

*from arrive …
*the fled …
*as sit …

On August 30th, I’d like to fly from Canberra to Sydney.
I’d like to fly on August 30th from Canberra to Sydney.
I’d like to fly from Canberra to Sydney on August 30th.

Context-Free Grammars (CFGs)

•  A context-free grammar consists of
–  a set of context-free rules, each of which expresses the

ways that symbols of the language can be grouped and
ordered together.

–  a lexicon of words and symbols.

Rules:

NP → Det Nominal
Nominal → Noun | Nominal Noun

Lexicon:

Det → the | a
Noun → bus | stop

Derivations

•  The sequence of rule expansions is called a derivation of the string of words. A derivation can be represented as:
–  a parse tree.
–  bracketed notation.

Parse tree for "the bus stop" (in bracketed notation):

[NP [Det the] [Nom [Nom [Noun bus]] [Noun stop]]]
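As an aside, bracketed trees like this can be loaded and displayed with NLTK (assuming nltk is installed; NLTK's reader uses parentheses rather than square brackets):

```python
from nltk import Tree

t = Tree.fromstring("(NP (Det the) (Nom (Nom (Noun bus)) (Noun stop)))")
t.pretty_print()   # draws the same tree in ASCII
```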

A Toy Example

NP → Det Nominal
Nominal → Noun | Nominal Noun
Det → the | a | an
Noun → bus
Noun → stop

Building the parse tree for "the bus stop" rule by rule:

1.  Tag the words: the/Det, bus/Noun, stop/Noun.
2.  Apply Nominal → Noun over "bus".
3.  Apply Nominal → Nominal Noun over "bus stop".
4.  Apply NP → Det Nominal over "the bus stop".

Formal Definition of CFG

•  A context-free grammar G = (N, Σ, R, S).

–  N is a set of non-terminal symbols.

–  Σ is a set of terminal symbols, with N ∩ Σ = ∅.

–  R is a set of rules (productions), each of the form A → β, where A is a non-terminal and β is a string of symbols from the infinite set of strings (Σ ∪ N)*.

–  S is a designated start symbol.

A Toy Example

For the toy grammar, N = {S, NP, Nominal, Det, Noun}, Σ = {the, a, an, bus, stop}, S is the start symbol, and R contains:

S → NP
NP → Det Nominal
Nominal → Noun | Nominal Noun
Det → the | a | an
Noun → bus
Noun → stop

With S → NP, the parse tree for "the bus stop" is rooted at S.

Ambiguity of Parsing

•  A sentence can have more than one parse tree under the same grammar, so a parser has to choose among them.

Probabilistic Context-Free Grammar (PCFG)

•  A PCFG attaches a parameter to each grammar rule [3]: rule α → β has parameter q(α → β).

•  The probability of a tree t that uses rules α1 → β1, …, αn → βn is

pG(t) = ∏_{i=1}^{n} q(αi → βi)

•  Parsing finds the most likely parse tree, arg max_{t ∈ TG} pG(t), where TG is the set of all possible trees.
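As a concrete reading of the product above, a sketch that scores a tree by multiplying its rule parameters. The encoding is our own assumption: a tree is a (label, children...) tuple whose leaves are words, and q maps (lhs, rhs) pairs to probabilities:

```python
# Sketch: p_G(t) as the product of the parameters of the rules used in t.
def tree_prob(tree, q):
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = q[(label, rhs)]                # parameter of the rule label -> rhs
    for c in children:
        if not isinstance(c, str):     # recurse into non-terminal children
            p *= tree_prob(c, q)
    return p

q = {("S", ("NP", "VP")): 1.0, ("NP", ("we",)): 1.0,
     ("VP", ("sleep",)): 0.4, ("VP", ("eat",)): 0.6}
t = ("S", ("NP", "we"), ("VP", "sleep"))
print(tree_prob(t, q))                 # 1.0 * 1.0 * 0.4 = 0.4
```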

Learning PCFG from Treebanks

•  Penn treebank and English Web treebank.

Maximum-likelihood estimation:

q*(α → β) = Count(α → β) / Count(α)
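A sketch of this estimator over rule occurrences extracted from a treebank (the input format, a list of (lhs, rhs) pairs, is our assumption):

```python
# Sketch: maximum-likelihood rule probabilities from treebank rule counts.
from collections import Counter

def mle_rule_probs(rules_seen):
    rule_counts = Counter(rules_seen)
    lhs_counts = Counter(lhs for lhs, _ in rules_seen)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

rules_seen = [("VP", "V NP"), ("VP", "V"), ("VP", "V NP"), ("NP", "Det N")]
print(mle_rule_probs(rules_seen))
# {('VP', 'V NP'): 0.666..., ('VP', 'V'): 0.333..., ('NP', 'Det N'): 1.0}
```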

Top Down Parsing

•  Search for arg max_{t ∈ TG} pG(t) from the start symbol down toward the words (example input: "book that flight").

Bottom Up Parsing

•  Search for arg max_{t ∈ TG} pG(t) from the words up toward the start symbol.

Grammar Equivalence

•  Two grammars are equivalent if they generate the same language (set of strings).

•  Chomsky Normal Form (CNF).
–  Only two types of rules are allowed: the right-hand side of each rule has either two non-terminals or one terminal. The only exception is S → ε.

Rules not in CNF, where A, B, C, D, E ∈ N and a ∈ Σ:

A → B a D (terminals and non-terminals mixed)
C → ε (empty production)
E → A (unit production)

Grammar Equivalence

•  Every context-free grammar can be transformed into an equivalent grammar in CNF. For example, a non-CNF rule F → B a D can be converted into

F → B G
G → E D
E → a

where B, D, E, F, G ∈ N and a ∈ Σ.
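With the grammar in CNF, the arg max over trees can be computed bottom-up by probabilistic CKY (see [2] and [8]). A minimal sketch, assuming the grammar is given as dictionaries of binary-rule and lexical-rule parameters:

```python
# Sketch: probabilistic CKY for a PCFG in CNF.
# binary: {(A, B, C): q} for rules A -> B C; lexical: {(A, w): q} for A -> w.
from collections import defaultdict

def pcky(words, binary, lexical, start="S"):
    n = len(words)
    pi = defaultdict(float)   # pi[i, j, A]: best prob. of A spanning words[i..j]
    bp = {}                   # back-pointers to recover the best tree
    for i, w in enumerate(words):
        for (A, word), q in lexical.items():
            if word == w:
                pi[i, i, A] = q
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (A, B, C), q in binary.items():
                for k in range(i, j):                 # split point
                    p = q * pi[i, k, B] * pi[k + 1, j, C]
                    if p > pi[i, j, A]:
                        pi[i, j, A] = p
                        bp[i, j, A] = (k, B, C)
    return pi[0, n - 1, start], bp

binary = {("S", "NP", "VP"): 1.0, ("VP", "V", "NP"): 1.0}
lexical = {("NP", "we"): 0.5, ("NP", "fish"): 0.5, ("V", "eat"): 1.0}
prob, _ = pcky("we eat fish".split(), binary, lexical)
print(prob)   # 1.0 * 1.0 * 0.5 * 1.0 * 0.5 = 0.25
```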

Dependency Structures vs. Phrase Structures

•  Dependency structures explicitly represent
–  Head-dependent relations (directed arcs).
–  Functional categories (arc labels).
–  Predicate-argument structure.

•  Dependency structures are largely independent of word order.
–  Suitable for free-word-order languages, such as Indian languages.

•  Phrase structures explicitly represent
–  Phrases (non-terminal nodes).
–  Structural categories (non-terminal labels).
–  Fragments are directly interpretable.


Available Constituency Parsers

•  Stanford parser.
–  http://nlp.stanford.edu/software/srparser.shtml

•  Charniak-Johnson parser.
–  http://web.science.mq.edu.au/~mjohnson/Software.htm

•  Charniak parser.
–  ftp://ftp.cs.brown.edu/pub/nlparser/


References
•  [1] Speech and Language Processing (Chapters 12 & 13).

•  [2] Lecture Notes on CFGs and CKY Parsing.
–  http://web.mit.edu/6.863/www/fall2012/lectures/lecture7-notes.pdf

•  [3] http://www.cs.columbia.edu/~mcollins/courses/nlp2011/notes/pcfgs.pdf

•  [4] Richard Socher, Christopher D. Manning, Andrew Y. Ng. Learning Continuous Phrase Representations and Syntactic Parsing with Recursive Neural Networks.

•  [5] An Efficient Algorithm for Projective Dependency Parsing. (We modify it to allow an artificial tree root.)

•  [6] Joakim Nivre, Johan Hall, Jens Nilsson. Memory-Based Dependency Parsing.

•  [7] Exploiting Background Knowledge for Relation Extraction.
–  http://www.aclweb.org/anthology/C10-1018

•  [8] https://web.stanford.edu/~jurafsky/slp3/14.pdf

•  [9] https://cs.stanford.edu/~danqi/papers/emnlp2014.pdf
