
Structured predictions: trees and graphs
- Syntactic parsing
  - Phrase structure (or constituent) trees
  - Dependency trees
- Graph-based semantic parsing
  - Abstract Meaning Representations
- Needs a grammar that dictates which trees are admissible for a sentence
- Requires an efficient decoding algorithm (e.g., CKY), just like sequence labeling
- The same statistical or neural models extend to parsing, e.g., Naïve Bayes, Perceptron, neural networks

Syntactic Parsing
- Syntactic parsing is the task of assigning a syntactic structure (typically in the form of a tree) to a sentence
- The grammar formalism dictates what a tree looks like syntactically
- The main technical challenge is ambiguity (as in many other NLP problems): multiple trees are possible for a sentence given a grammar
- The general solution is to build a model that assigns a "score" to each possible tree, and to use a decoding (search) algorithm to find the best tree (or the top-k trees)
- Syntactic parsing is one of the core problems in NLP, and syntactic structures are useful for a wide range of NLP applications

Two broad schools of thought on syntactic structure
- Phrase structures
  - The building blocks of phrase structures are "phrases": consecutive groups of words that make sense linguistically
  - Each phrase receives a syntactic category (e.g., "NP", "VP")
  - The phrases are hierarchically organized into a tree structure
- Dependency structures
  - The fundamental building blocks of a dependency structure are relations between words in a sentence, which may or may not be consecutive (they are non-consecutive more often than not)
  - Dependency relations are thus naturally lexicalized
  - The relations are hierarchically organized
  - The syntactic relations between the words are typically labeled
- Each category of syntactic structures also has different flavors

Phrase-structure parsing based on context-free grammars

Context-free grammars
Context-free languages are specified by context-free grammars, which are tuples (N, Σ, R, S) consisting of:
- a finite set of non-terminals N;
- a finite alphabet Σ of terminal symbols;
- a set of production rules R, each of the form A → β, where A ∈ N and β ∈ (Σ ∪ N)*;
- a designated start symbol S.

Example CFG: mathematical operations
S → S OP S | Num
OP → + | − | × | ÷
Num → Num Digit | Digit
Digit → 0 | 1 | 2 | ··· | 9
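As a quick illustration (not part of the slides), here is a minimal Python sketch that encodes this grammar as a dictionary and samples a random derivation from it; the depth cut-off is just a safeguard against unbounded recursion.

```python
import random

# The arithmetic CFG above, encoded as non-terminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["S", "OP", "S"], ["Num"]],
    "OP": [["+"], ["-"], ["×"], ["÷"]],
    "Num": [["Num", "Digit"], ["Digit"]],
    "Digit": [[d] for d in "0123456789"],
}

def generate(symbol="S", depth=0):
    """Derive a string by repeatedly expanding non-terminals with random productions."""
    if symbol not in GRAMMAR:                    # terminal symbol: emit as-is
        return symbol
    options = GRAMMAR[symbol]
    # Past a small depth, fall back to the shortest production so the derivation terminates.
    rhs = random.choice(options) if depth < 8 else min(options, key=len)
    return "".join(generate(s, depth + 1) for s in rhs)

print(generate())  # e.g. "42+7" -- some string derivable from S
```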

Is natural language context-free?
- Context-free grammars can be used to represent syntax, which is the set of rules that determine whether an utterance is judged to be grammatical.
- If this representation were perfectly faithful, then a natural language such as English could be transformed into a formal language, consisting of exactly the (infinite) set of strings that would be judged grammatical by a fluent English speaker.
- Contemporary theories generally do not consider natural languages to be context-free, yet context-free grammars are widely used in natural language parsing.
- The reason is that context-free representations strike a good balance: they cover a broad range of syntactic phenomena, and they can be parsed efficiently.

A phrase-structure grammar
A phrase-structure grammar is one in which sentences are broken down into constituents (phrases), which are contiguous sequences of words that function as coherent units for the purpose of linguistic analysis.
Phrases are labeled with a type that is determined by their heads: noun phrases (NP), verb phrases (VP), etc.
S → NP VP
NP → NP PP | we | sushi | chopsticks
NP → NP CC NP
VP → V NP | VP PP
PP → IN NP
V → eat
CC → and

Chomsky Normal Form
- In Chomsky Normal Form (CNF), the right-hand side of every production consists of either two non-terminals or a single terminal symbol.
- Grammars in CNF can be parsed efficiently, in cubic time.
- But some CFG productions are not naturally in CNF, e.g., NP → NP CC NP.
- The general practice in syntactic parsing is therefore to convert CFGs to CNF for training and decoding (a binarization sketch follows below).
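A minimal sketch of the binarization step (the function name and the left-branching scheme are my own choices; full CNF conversion also has to deal with unary rules and with terminals mixed into long right-hand sides):

```python
def binarize(lhs, rhs):
    """Split an n-ary production (n > 2) into a chain of binary rules with fresh non-terminals."""
    new_rules, i = [], 0
    while len(rhs) > 2:
        i += 1
        new_nt = f"{lhs}{i}"                  # fresh symbol, e.g. NP1
        new_rules.append((new_nt, rhs[:2]))   # NP1 -> NP CC
        rhs = [new_nt] + rhs[2:]
    new_rules.append((lhs, rhs))              # NP -> NP1 NP
    return new_rules

print(binarize("NP", ["NP", "CC", "NP"]))
# [('NP1', ['NP', 'CC']), ('NP', ['NP1', 'NP'])]
```

This is the same transformation used in the weighted grammar later in these slides, where NP → NP CC NP becomes NP1 → NP CC and NP → NP1 NP.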

Ambiguity of different types
Even with a grammar, multiple parses (trees) for a sentence are often possible due to ambiguity:
- Attachment ambiguity: e.g., We eat sushi with chopsticks; I shot an elephant in my pajamas.
- Modifier scope: e.g., southern food store, plastic cup holder
- Particle vs. preposition: e.g., The puppy tore up the staircase.
- Complement structure: e.g., The students complained to the professor that they didn't understand.
- Coordination scope: e.g., "I see," said the blind man, as he picked up the hammer and saw.

Spurious ambiguity
In practice, a grammar often allows parses that do not correspond to human intuition at all. Such cases are called spurious ambiguity.
[Figure: a spurious parse tree for "The post office will hold out discounts and service concessions as incentives"]

Weighted context-free grammars
The major challenge in syntactic parsing is not just to find a possible tree for a sentence, but to find the best tree among all possible trees. One way to do that is to use a weighted grammar, so that each tree can be assigned a score. The example grammar and its weighted version (NP → NP CC NP is binarized into NP1 → NP CC and NP → NP1 NP):

S → NP VP       -3
NP → NNP        -1
NP → NP1 NP     -0.5
NP1 → NP CC     -0.5
NP → ADJP NP    -0.5
NP → NNS        -1
NP → NN         -1
NP → NP PP      -0.3
VP → VBZ NP     -0.5
VP → VBD NP     -0.5
VP → VP PP      -0.2
PP → IN NP       0
ADJP → JJ        0
NNP → John       0

Probabilistic context-free grammars
One special form of weighted context-free grammar is the probabilistic context-free grammar (PCFG), in which the probabilities of all rules with the same left-hand side sum to 1:

S → NP VP       1.0
NP → NNP        0.1
NP → NP1 NP     0.2
NP1 → NP CC     1.0
NP → ADJP NP    0.2
NP → NNS        0.1
NP → NN         0.1
NP → NP PP      0.3
VP → VBZ NP     0.2
VP → VBD NP     0.3
VP → VP PP      0.5
PP → IN NP      1.0
ADJP → JJ       1.0
NNP → John      1.0
NNS → apples    0.4
NNS → oranges   0.4
NNS → noodles   0.2
NN → gravy      1.0
JJ → green      1.0
IN → with       1.0
VBZ → likes     1.0
VBD → ate       1.0
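To make the normalization constraint concrete, here is a small sketch (my own encoding, not part of the slides) that stores the example PCFG as (LHS, RHS, probability) triples and checks that each left-hand side's rule probabilities sum to 1:

```python
from collections import defaultdict

RULES = [
    ("S", ("NP", "VP"), 1.0),
    ("NP", ("NNP",), 0.1), ("NP", ("NP1", "NP"), 0.2), ("NP", ("ADJP", "NP"), 0.2),
    ("NP", ("NNS",), 0.1), ("NP", ("NN",), 0.1), ("NP", ("NP", "PP"), 0.3),
    ("NP1", ("NP", "CC"), 1.0),
    ("VP", ("VBZ", "NP"), 0.2), ("VP", ("VBD", "NP"), 0.3), ("VP", ("VP", "PP"), 0.5),
    ("PP", ("IN", "NP"), 1.0), ("ADJP", ("JJ",), 1.0),
    ("NNP", ("John",), 1.0), ("NNS", ("apples",), 0.4), ("NNS", ("oranges",), 0.4),
    ("NNS", ("noodles",), 0.2), ("NN", ("gravy",), 1.0), ("JJ", ("green",), 1.0),
    ("IN", ("with",), 1.0), ("VBZ", ("likes",), 1.0), ("VBD", ("ate",), 1.0),
]

totals = defaultdict(float)
for lhs, _, prob in RULES:
    totals[lhs] += prob

# Every non-terminal's outgoing rule probabilities must sum to 1 in a proper PCFG.
assert all(abs(total - 1.0) < 1e-9 for total in totals.values())
```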

The probability model of a phrase structure tree
Independence assumptions:
- Place invariance: the location of a subtree does not affect its probability
- Context-freeness: the probability of a subtree is not affected by words outside the subtree
- Ancestor-freeness: the probability of a subtree does not depend on non-terminals outside the subtree

Given these assumptions, the probability of a tree (the joint probability of the tree and the input sentence) is:

P(T) = ∏_{i=1}^{m} P(RHS_i | LHS_i)

The decoding process is to find the tree with the highest probability:

T̂(s) = argmax_{T : s = yield(T)} P(T)

Probability of an example tree
[Figure: the two candidate parse trees T1 and T2]
P(T1) = 1.0 × 0.1 × 1.0 × 0.3 × 1.0 × 0.3 × 0.1 × 0.2 × 1.0 × 1.0 × 0.1 × 1.0 = 0.000018
P(T2) = 1.0 × 0.1 × 1.0 × 0.5 × 0.3 × 1.0 × 0.1 × 0.2 × 1.0 × 1.0 × 0.1 × 1.0 = 0.00003
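The trees T1 and T2 themselves did not survive extraction, but given the example PCFG they are presumably the two attachment readings of a sentence like "John ate noodles with gravy". Under that assumption, a short sketch that multiplies the rule probabilities reproduces the numbers above:

```python
from math import prod  # Python 3.8+

# Rule probabilities taken from the example PCFG (only the rules used by the two trees).
PROB = {
    ("S", "NP VP"): 1.0, ("NP", "NNP"): 0.1, ("NP", "NNS"): 0.1, ("NP", "NN"): 0.1,
    ("NP", "NP PP"): 0.3, ("VP", "VBD NP"): 0.3, ("VP", "VP PP"): 0.5,
    ("PP", "IN NP"): 1.0, ("NNP", "John"): 1.0, ("VBD", "ate"): 1.0,
    ("NNS", "noodles"): 0.2, ("NN", "gravy"): 1.0, ("IN", "with"): 1.0,
}

def tree_prob(productions):
    """P(T) is the product of P(RHS | LHS) over every production used in the tree."""
    return prod(PROB[p] for p in productions)

# T1: the PP attaches to the object NP ("noodles with gravy").
t1 = [("S", "NP VP"), ("NP", "NNP"), ("NNP", "John"), ("VP", "VBD NP"), ("VBD", "ate"),
      ("NP", "NP PP"), ("NP", "NNS"), ("NNS", "noodles"),
      ("PP", "IN NP"), ("IN", "with"), ("NP", "NN"), ("NN", "gravy")]
# T2: the PP attaches to the VP ("ate ... with gravy").
t2 = [("S", "NP VP"), ("NP", "NNP"), ("NNP", "John"), ("VP", "VP PP"), ("VP", "VBD NP"),
      ("VBD", "ate"), ("NP", "NNS"), ("NNS", "noodles"),
      ("PP", "IN NP"), ("IN", "with"), ("NP", "NN"), ("NN", "gravy")]

print(tree_prob(t1))  # ≈ 0.000018
print(tree_prob(t2))  # ≈ 0.00003
```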


Parsing with PCFGs: the CKY algorithm
procedure WCKY(w, G = (N, Σ, R, S))
  for all i, j, X do
    t[i, j, X] ← 0
    b[i, j, X] ← ∅
  for m ∈ {1, 2, ..., M} do                      (lexical step: spans of length 1)
    for all X ∈ N do
      t[m−1, m, X] ← ψ(X → w_m, (m−1, m, m))
  for ℓ ∈ {2, 3, ..., M} do                      (fill spans in order of increasing length ℓ)
    for m ∈ {0, 1, ..., M−ℓ} do
      for all X ∈ N do                           (max over split points k ∈ {m+1, ..., m+ℓ−1})
        t[m, m+ℓ, X] ← max_{k,Y,Z} ψ(X → Y Z, (m, m+ℓ, k)) + t[m, k, Y] + t[k, m+ℓ, Z]
        b[m, m+ℓ, X] ← argmax_{k,Y,Z} ψ(X → Y Z, (m, m+ℓ, k)) + t[m, k, Y] + t[k, m+ℓ, Z]
  return TraceBack(S, 0, M, b)
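Below is a minimal Python sketch of the same algorithm over the example PCFG, working with probabilities and products rather than log-weights and sums. The dictionary encoding, the `apply_unaries` helper, and the handling of unary rules such as NP → NNP are my own additions, not part of the pseudocode above.

```python
from collections import defaultdict

# The example PCFG, split into lexical, unary, and binary rules (probabilities, not logs).
LEXICON = {  # preterminal -> {word: probability}
    "NNP": {"John": 1.0}, "VBD": {"ate": 1.0}, "VBZ": {"likes": 1.0},
    "NNS": {"apples": 0.4, "oranges": 0.4, "noodles": 0.2},
    "NN": {"gravy": 1.0}, "JJ": {"green": 1.0}, "IN": {"with": 1.0},
}
UNARY = {("NP", "NNP"): 0.1, ("NP", "NNS"): 0.1, ("NP", "NN"): 0.1, ("ADJP", "JJ"): 1.0}
BINARY = {
    ("S", "NP", "VP"): 1.0, ("NP", "NP1", "NP"): 0.2, ("NP1", "NP", "CC"): 1.0,
    ("NP", "ADJP", "NP"): 0.2, ("NP", "NP", "PP"): 0.3,
    ("VP", "VBZ", "NP"): 0.2, ("VP", "VBD", "NP"): 0.3, ("VP", "VP", "PP"): 0.5,
    ("PP", "IN", "NP"): 1.0,
}

def cky(words):
    n = len(words)
    chart = defaultdict(dict)  # chart[(i, j)][X] = best probability of an X over words[i:j]
    back = {}                  # backpointers: (i, j, X) -> how the best X was built

    def apply_unaries(i, j):
        # Repeatedly apply unary rules to the cell until no entry improves.
        changed = True
        while changed:
            changed = False
            for (parent, child), p in UNARY.items():
                if child in chart[(i, j)]:
                    score = p * chart[(i, j)][child]
                    if score > chart[(i, j)].get(parent, 0.0):
                        chart[(i, j)][parent] = score
                        back[(i, j, parent)] = (child,)
                        changed = True

    for i, word in enumerate(words):               # lexical step: spans of length 1
        for tag, entries in LEXICON.items():
            if word in entries:
                chart[(i, i + 1)][tag] = entries[word]
                back[(i, i + 1, tag)] = (word,)
        apply_unaries(i, i + 1)

    for length in range(2, n + 1):                 # longer spans, shortest first
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):              # split point
                for (parent, left, right), p in BINARY.items():
                    if left in chart[(i, k)] and right in chart[(k, j)]:
                        score = p * chart[(i, k)][left] * chart[(k, j)][right]
                        if score > chart[(i, j)].get(parent, 0.0):
                            chart[(i, j)][parent] = score
                            back[(i, j, parent)] = (k, left, right)
            apply_unaries(i, j)
    return chart, back

chart, back = cky("John ate noodles with gravy".split())
print(chart[(0, 5)].get("S"))  # ≈ 0.00003, as in the worked example
```

Recovering the best tree from `back` corresponds to the TraceBack step in the pseudocode; it is omitted here for brevity.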

CKY example

[Chart figures: the CKY chart for "John eats noodles with gravy" is filled in order of span length.
Lexical step (length 1): NNP:1.0, NP:0.1 (John); VBD:1.0 (eats); NNS:0.2, NP:0.02 (noodles); IN:1.0 (with); NN:1.0, NP:0.1 (gravy).
Length 2: VP:0.006 over "eats noodles"; PP:0.1 over "with gravy".
Length 3: S:0.0006 over "John eats noodles"; NP:0.0006 over "noodles with gravy".
Length 4: VP:0.0003 (via VP → VP PP) and VP:0.00018 (via VP → VBD NP) over "eats noodles with gravy".]

CKY example

The completed chart (rows: start position; columns: end position; each cell keeps the best score per non-terminal):

    | 1 John          | 2 eats  | 3 noodles        | 4 with | 5 gravy
0   | NNP:1.0, NP:0.1 | –       | S:0.0006         | –      | S:0.00003
1   |                 | VBD:1.0 | VP:0.006         | –      | VP:0.0003 (VP:0.00018)
2   |                 |         | NNS:0.2, NP:0.02 | –      | NP:0.0006
3   |                 |         |                  | IN:1.0 | PP:0.1
4   |                 |         |                  |        | NN:1.0, NP:0.1

Learning PCFGs
Parameters in probabilistic context-free grammars can be estimated by relative frequency, as with HMMs:
ψ(X → α) = log P(X → α)

P̂(X → α) = count(X → α) / count(X)

E.g., the probability of the production NP → DET NN is the corpus count of this production, divided by the count of the non-terminal NP. This applies to terminal productions as well.
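A minimal sketch of this estimator (the tuple tree encoding and the function names are my own, not from the slides), assuming each tree is given as a nested (label, children...) tuple:

```python
from collections import Counter

def count_productions(tree, rule_counts, lhs_counts):
    """Count every production LHS -> RHS (including preterminal -> word) in one tree."""
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        rhs = (children[0],)                        # terminal production, e.g. VBD -> ate
    else:
        rhs = tuple(child[0] for child in children)
        for child in children:
            count_productions(child, rule_counts, lhs_counts)
    rule_counts[(label, rhs)] += 1
    lhs_counts[label] += 1

def estimate(treebank):
    """Relative-frequency estimate: count(X -> alpha) / count(X)."""
    rule_counts, lhs_counts = Counter(), Counter()
    for tree in treebank:
        count_productions(tree, rule_counts, lhs_counts)
    return {rule: c / lhs_counts[rule[0]] for rule, c in rule_counts.items()}

toy_treebank = [
    ("S", ("NP", ("NNP", "John")),
          ("VP", ("VBD", "ate"), ("NP", ("NNS", "noodles")))),
]
probs = estimate(toy_treebank)
print(probs[("NP", ("NNP",))])  # 0.5: NP expands to NNP once out of two NP occurrences
```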

Grammar Refinement
- Grammars extracted from treebanks (e.g., the Penn Treebank) often remain sensitive to ambiguities in the parses, even with weighted productions
- There have been various attempts to augment the vanilla PCFG with more expressive productions:
  - Parent annotation
  - Lexicalization

Parent annotation
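Parent annotation splits each non-terminal by the label of its parent, so a subject NP under S becomes NP^S while an object NP under VP becomes NP^VP; this relaxes the PCFG independence assumptions. A minimal sketch on tuple-encoded trees (my own illustration, not from the slides):

```python
# Suffix each phrasal label with its parent's label; preterminals are left unannotated.
def parent_annotate(tree, parent="ROOT"):
    label, children = tree[0], tree[1:]
    if len(children) == 1 and isinstance(children[0], str):
        return (label, children[0])                  # preterminal -> word
    new_label = f"{label}^{parent}"
    return (new_label,) + tuple(parent_annotate(c, label) for c in children)

tree = ("S", ("NP", ("NNP", "John")),
             ("VP", ("VBD", "ate"), ("NP", ("NNS", "noodles"))))
print(parent_annotate(tree))
# ('S^ROOT', ('NP^S', ('NNP', 'John')), ('VP^S', ('VBD', 'ate'), ('NP^VP', ('NNS', 'noodles'))))
```

With this transform, the refined grammar can learn, for example, different expansion preferences for subject and object NPs.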

Lexicalized CFGs