
Statistical Parsing Using Treebanks
ANLP: Week 6, Unit 1
Shay Cohen
Based on slides from ANLP 2019
Last class
- Recursive Descent Parsing
- Shift-Reduce Parsing
- CYK (a minimal recognizer sketch follows below):
  Seed the chart, for $i + 1 = j$:
    $\text{Chart}[A, i, i+1] = \text{True}$ if there exists a rule $A \to w_{i+1}$, where $w_{i+1}$ is the $(i+1)$th word in the string.
  For $j > i + 1$:
    $\text{Chart}[A, i, j] = \bigvee_{k=i+1}^{j-1} \; \bigvee_{A \to B\,C} \big(\text{Chart}[B, i, k] \wedge \text{Chart}[C, k, j]\big)$
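For concreteness, here is a minimal CYK recognizer sketch in Python. It assumes a grammar in Chomsky normal form given as two dictionaries (`lexical` for word rules, `binary` for binary rules) and a start symbol `S`; the function and data-structure names are illustrative, not from any particular library.

```python
from collections import defaultdict

def cyk_recognize(words, lexical, binary):
    """Return True if the word sequence is in the language of the CNF grammar.

    lexical: dict mapping a word to the set of categories A with A -> word
    binary:  dict mapping a pair (B, C) to the set of categories A with A -> B C
    (Both data structures are assumptions of this sketch.)
    """
    n = len(words)
    chart = defaultdict(set)          # chart[(i, j)] = categories spanning words[i:j]

    # Seed: for spans of length 1, add A whenever A -> w_{i+1}
    for i, w in enumerate(words):
        chart[(i, i + 1)] = set(lexical.get(w, ()))

    # Recurrence: A spans (i, j) if for some split k and rule A -> B C,
    # B spans (i, k) and C spans (k, j)
    for span in range(2, n + 1):
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B in chart[(i, k)]:
                    for C in chart[(k, j)]:
                        chart[(i, j)] |= binary.get((B, C), set())

    return 'S' in chart[(0, n)]       # 'S' assumed to be the start symbol

# Tiny example grammar (an assumption, mirroring the PP-attachment grammar later in the lecture):
lexical = {'kids': {'NP'}, 'saw': {'V'}, 'birds': {'NP'}, 'with': {'P'}, 'fish': {'NP'}}
binary = {('NP', 'VP'): {'S'}, ('V', 'NP'): {'VP'}, ('VP', 'PP'): {'VP'},
          ('NP', 'PP'): {'NP'}, ('P', 'NP'): {'PP'}}
print(cyk_recognize('kids saw birds with fish'.split(), lexical, binary))  # True
```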
How big a problem is disambiguation?
- Early work in computational linguistics tried to develop broad-coverage hand-written grammars.
  - That is, grammars that include all sentences humans would judge as grammatical in their language,
  - while excluding all other sentences.
- As coverage grows, sentences can have hundreds or thousands of parses. Very difficult to write heuristic rules for disambiguation.
- Plus, the grammar is hard to keep track of! Trying to fix one problem can introduce others.
- Enter the treebank grammar.
Towards probabilistic parsing
- We’ve seen various parsing algorithms, including one that parses exhaustively in polynomial time (CKY).
- But we haven’t discussed how to choose which of many possible parses is the right one.
- The obvious solution: probabilities.

Today’s lecture
- What is a treebank, and how do we use it to get a probabilistic grammar?
- How do we use probabilities in parsing?
- What are some problems that probabilistic CFGs help solve?
- What are some remaining weaknesses of simple PCFGs, and what are some ways to address them?
Treebank grammars
- The big idea: instead of paying linguists to write a grammar, pay them to annotate real sentences with parse trees.
- This way, we implicitly get a grammar (for a CFG: read the rules off the trees).
- And we get probabilities for those rules (using any of our favorite estimation techniques).
- We can use these probabilities to improve disambiguation and even speed up parsing.
Example: The Penn Treebank
- The first large-scale parse annotation project, begun in 1989.
- Original corpus of syntactic parses: Wall Street Journal text.
- About 40,000 annotated sentences (1m words).
- Standard phrasal categories like S, NP, VP, PP.
- Now many other data sets (e.g. transcribed speech) and different kinds of annotation; also inspired treebanks in many other languages.
Treebank grammars
For example, if we have this tree in our corpus:

  (S (NP (Pro i))
     (VP (Vt saw)
         (NP (Det the) (N man))))

then we add the rules

  S → NP VP     VP → Vt NP
  NP → Pro      NP → Det N
  Pro → i       Vt → saw
  Det → the     N → man
With more trees, we can start to count rules and estimate their probabilities.
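As an illustration, here is a minimal sketch of reading the rules off a tree, assuming parse trees are represented as nested (category, children) tuples (an assumption of this sketch, not the Penn Treebank file format):

```python
def tree_rules(tree):
    """Yield the CFG rules (lhs, rhs) used in a parse tree.

    A tree is (category, [child, child, ...]); a leaf child is just the word string.
    """
    cat, children = tree
    rhs = [c if isinstance(c, str) else c[0] for c in children]
    yield (cat, tuple(rhs))
    for c in children:
        if not isinstance(c, str):
            yield from tree_rules(c)

# The example tree from the slide: "i saw the man"
t = ('S', [('NP', [('Pro', ['i'])]),
           ('VP', [('Vt', ['saw']),
                   ('NP', [('Det', ['the']), ('N', ['man'])])])])

for lhs, rhs in tree_rules(t):
    print(lhs, '->', ' '.join(rhs))
# S -> NP VP, NP -> Pro, Pro -> i, VP -> Vt NP, Vt -> saw, NP -> Det N, Det -> the, N -> man
```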

Other language treebanks
- Many are annotated with dependency grammar rather than CFG (see next lecture).
- Some require paid licenses, others are free.
- Just a few examples:
  - Danish Dependency Treebank
  - Alpino Treebank (Dutch)
  - Bosque Treebank (Portuguese)
  - Talbanken (Swedish)
  - Prague Dependency Treebank (Czech)
  - TIGER corpus, Tuebingen Treebank, NEGRA corpus (German)
  - Penn Chinese Treebank
  - Penn Arabic Treebank
  - Tuebingen Treebank of Spoken Japanese, Kyoto Text Corpus
Creating a treebank PCFG
A probabilistic context-free grammar (PCFG) is a CFG where each rule A → α (where α is a symbol sequence) is assigned a probability P(α|A).
The sum over all expansions of A must equal 1: $\sum_{\alpha'} P(\alpha' \mid A) = 1$.
Easiest way to create a PCFG from a treebank: MLE
- Count all occurrences of A → α in the treebank.
- Divide by the count of all rules whose LHS is A to get P(α|A) (a sketch follows below).
- But as usual, many rules have very low frequencies, so MLE isn’t good enough and we need to smooth.
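A minimal MLE sketch, building on the hypothetical `tree_rules` helper above: count each rule, then divide by the count of its left-hand side (no smoothing).

```python
from collections import Counter

def mle_pcfg(rule_lists):
    """Relative-frequency estimates P(alpha | A) = count(A -> alpha) / count(A).

    rule_lists: one list of (lhs, rhs) rules per treebank tree,
    e.g. as produced by the tree_rules sketch above.
    """
    rule_counts = Counter()
    lhs_counts = Counter()
    for rules in rule_lists:
        for lhs, rhs in rules:
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {(lhs, rhs): c / lhs_counts[lhs]
            for (lhs, rhs), c in rule_counts.items()}

# With only the single example tree above, NP -> Pro and NP -> Det N
# each get probability 0.5, and every other rule gets 1.0.
```

By construction, these relative-frequency estimates sum to 1 over the expansions of each left-hand side, which is exactly the PCFG constraint on the previous slide.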
The probability of a parse
- Under this model, the probability of a parse t is simply the product of the probabilities of all rules used in the parse:
  $P(t) = \prod_{A \to \alpha \in t} P(\alpha \mid A)$
- Example: see the PP-attachment parses worked through later in the lecture.
The generative model
Like n-gram models and HMMs, PCFGs are a generative model. The model assumes sentences are generated as follows (a sampling sketch follows below):
- Start with the root category S.
- Choose an expansion α for S with probability P(α|S).
- Then recurse on each symbol in α.
- Continue until all symbols are terminals (nothing left to expand).
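Here is a sketch of this generative story, assuming the grammar is given as a dictionary mapping each nonterminal to (expansion, probability) pairs; the dictionary format and function name are illustrative, and the example grammar is the PP-attachment grammar from the next slide.

```python
import random

def generate(symbol, grammar):
    """Expand `symbol` top-down; grammar maps a nonterminal A to a list of
    (rhs_tuple, probability) pairs. Symbols with no entry are terminals."""
    if symbol not in grammar:
        return [symbol]                                   # terminal: nothing left to expand
    expansions, probs = zip(*grammar[symbol])
    rhs = random.choices(expansions, weights=probs)[0]    # choose alpha with P(alpha | A)
    words = []
    for sym in rhs:                                       # recurse on each symbol in alpha
        words.extend(generate(sym, grammar))
    return words

grammar = {
    'S':  [(('NP', 'VP'), 1.0)],
    'VP': [(('V', 'NP'), 0.7), (('VP', 'PP'), 0.3)],
    'PP': [(('P', 'NP'), 1.0)],
    'P':  [(('with',), 1.0)],
    'V':  [(('saw',), 1.0)],
    'NP': [(('NP', 'PP'), 0.4), (('kids',), 0.1), (('birds',), 0.18),
           (('saw',), 0.04), (('fish',), 0.18), (('binoculars',), 0.1)],
}
print(' '.join(generate('S', grammar)))   # e.g. "kids saw fish with binoculars"
```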

Statistical disambiguation example
How can parse probabilities help disambiguate PP attachment?
- Let’s use the following PCFG, inspired by Manning & Schuetze (1999):

    S  → NP VP   1.0        PP → P NP        1.0
    VP → V NP    0.7        VP → VP PP       0.3
    P  → with    1.0        V  → saw         1.0
    NP → NP PP   0.4        NP → kids        0.1
    NP → birds   0.18       NP → saw         0.04
    NP → fish    0.18       NP → binoculars  0.1

- We want to parse kids saw birds with fish.
Probability of parse 1
  (S_1.0 (NP_0.1 kids)
         (VP_0.7 (V_1.0 saw)
                 (NP_0.4 (NP_0.18 birds)
                         (PP_1.0 (P_1.0 with) (NP_0.18 fish)))))

- P(t1) = 1.0·0.1·0.7·1.0·0.4·0.18·1.0·1.0·0.18 = 0.0009072
The probability of a sentence
- Since t implicitly includes the words $\vec{w}$, we have $P(t) = P(t, \vec{w})$.
- So we also have a language model. Sentence probability is obtained by summing over $T(\vec{w})$, the set of valid parses of $\vec{w}$:
  $P(\vec{w}) = \sum_{t \in T(\vec{w})} P(t, \vec{w}) = \sum_{t \in T(\vec{w})} P(t)$
- In our example,
  P(kids saw birds with fish) = 0.0006804 + 0.0009072 = 0.0015876.
Probability of parse 2
  (S_1.0 (NP_0.1 kids)
         (VP_0.3 (VP_0.7 (V_1.0 saw) (NP_0.18 birds))
                 (PP_1.0 (P_1.0 with) (NP_0.18 fish))))

- P(t2) = 1.0·0.1·0.3·0.7·1.0·0.18·1.0·1.0·0.18 = 0.0006804
- which is less than P(t1) = 0.0009072, so t1 is preferred. Yay!
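As a quick sanity check, here is a sketch that recomputes both parse probabilities and the sentence probability; the rule lists for t1 and t2 are written out by hand from the two trees above.

```python
from math import prod

# Rule probabilities from the PP-attachment grammar above.
rule_prob = {('S', ('NP', 'VP')): 1.0,  ('PP', ('P', 'NP')): 1.0,
             ('VP', ('V', 'NP')): 0.7,  ('VP', ('VP', 'PP')): 0.3,
             ('P', ('with',)): 1.0,     ('V', ('saw',)): 1.0,
             ('NP', ('NP', 'PP')): 0.4, ('NP', ('kids',)): 0.1,
             ('NP', ('birds',)): 0.18,  ('NP', ('fish',)): 0.18}

# Rules used by each parse, read off the trees above.
t1 = [('S', ('NP', 'VP')), ('NP', ('kids',)), ('VP', ('V', 'NP')), ('V', ('saw',)),
      ('NP', ('NP', 'PP')), ('NP', ('birds',)),
      ('PP', ('P', 'NP')), ('P', ('with',)), ('NP', ('fish',))]      # PP attached to the NP
t2 = [('S', ('NP', 'VP')), ('NP', ('kids',)), ('VP', ('VP', 'PP')), ('VP', ('V', 'NP')),
      ('V', ('saw',)), ('NP', ('birds',)),
      ('PP', ('P', 'NP')), ('P', ('with',)), ('NP', ('fish',))]      # PP attached to the VP

p1 = prod(rule_prob[r] for r in t1)
p2 = prod(rule_prob[r] for r in t2)
print(p1, p2, p1 + p2)   # ~0.0009072, ~0.0006804, ~0.0015876 (up to floating-point rounding)
```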