Syntax and Parsing 4: Statistical Parsing
This time:
- The problem of ambiguity: lexical ambiguity, structural ambiguity, local vs. global ambiguity
- Probabilistic Context-Free Grammar (PCFG): parse probability and string probability, disambiguation, training
- Some problems with PCFG: lexical preferences, contextual probabilities

Variation and Ambiguity

Two challenges for NLP are variation and ambiguity:
- Variation: there is a practically unlimited number of ways of saying the same thing, so we need huge flexibility in characterising language and interpretations
- Ambiguity: a single linguistic utterance can have many different interpretations, so we need an efficient way of choosing the "right" one
This lecture concentrates on the problem of ambiguity.

Ambiguity

Lexical Ambiguity:
- A word with more than one meaning
- Is fire a verb or a noun? If it's a verb, which sense: shoot or sack?
- Sometimes different meanings have the same part of speech, e.g. the multiple noun senses of star
- Sometimes senses are systematically related: chicken has both an animal sense and a food sense (regular polysemy)
- Often one sense is strongly predominant

Structural Ambiguity:
- The grammar licenses more than one analysis
- The alternative analyses are not equivalent in meaning
- e.g. The man hit the boy with the stick: who has the stick?
- e.g. They saw her duck: did they see a woman ducking down, or a woman's aquatic bird?
Structural Ambiguity

The grammar licenses two analyses of The man hit the boy with the stick, depending on where the PP attaches:

[S [NP [Det The] [N man]]
   [VP [V hit]
       [NP [Det the] [N boy]]
       [PP [Prep with] [NP [Det the] [N stick]]]]]

[S [NP [Det The] [N man]]
   [VP [V hit]
       [NP [Det the] [N boy]
           [PP [Prep with] [NP [Det the] [N stick]]]]]]

Both Lexical and Structural Ambiguity

They saw her duck has two analyses, depending on whether duck is a noun or a verb:

[S [NP They]
   [VP [V saw] [NP [Det her] [N duck]]]]

[S [NP They]
   [VP [V saw] [NP her] [VP [V duck]]]]

Local vs Global Ambiguity
Local Ambiguity:
- A component of the sentence is ambiguous, but the ambiguity is resolved by the rest of the sentence
- e.g. Fruit flies in every direction
- Garden path sentences: The horse raced past the barn fell
- Handled by backtracking, lookahead or parallelism

Global Ambiguity:
- The sentence as a whole has two meanings
- e.g. The man hit the boy with the stick
- Context might help to resolve the ambiguity
Probabilistic Context-Free Grammar (PCFG)

A PCFG is a natural extension of a CFG:
- Assigns probabilities to parses, and thus also to strings
- Useful in choosing between different parses (syntactic ambiguity)
- Can provide robust language processing: offers an approach to partial ungrammaticality

PCFG and Rule Probabilities
Associate a probability with each grammar rule:
- Probabilities assigned to phrase structure rules
- Probabilities assigned to rules introducing terminal symbols

S  → NP VP    (1.0)      Det → the     (1.0)
NP → N        (0.4)      A → patient   (1.0)
NP → Det N    (0.4)      N → nurses    (0.4)
NP → Det A N  (0.2)      N → patient   (0.5)
VP → V        (0.4)      N → sleep     (0.1)
VP → V NP     (0.6)      V → nurses    (0.7)
                         V → sleep     (0.3)
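
To make this concrete, here is the same toy grammar written out in NLTK. This is only a sketch: the lecture doesn't prescribe a toolkit, and NLTK's PCFG class is just one convenient way to experiment with the grammar.

```python
import nltk

# The toy grammar from above; for each non-terminal the rule
# probabilities sum to 1, which nltk.PCFG checks on construction.
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP     [1.0]
    NP  -> N         [0.4]
    NP  -> Det N     [0.4]
    NP  -> Det A N   [0.2]
    VP  -> V         [0.4]
    VP  -> V NP      [0.6]
    Det -> 'the'     [1.0]
    A   -> 'patient' [1.0]
    N   -> 'nurses'  [0.4]
    N   -> 'patient' [0.5]
    N   -> 'sleep'   [0.1]
    V   -> 'nurses'  [0.7]
    V   -> 'sleep'   [0.3]
""")

# ViterbiParser finds the most probable parse.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse('the patient nurses sleep'.split()):
    print(tree, tree.prob())
```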
Illustrating Ambiguity

Example: the patient nurses sleep

One parse:

[S [NP [Det the]
       [A patient]
       [N nurses]]
   [VP [V sleep]]]

An alternative parse:

[S [NP [Det the]
       [N patient]]
   [VP [V nurses]
       [NP [N sleep]]]]
Syntactic Disambiguation

PCFGs provide a way to choose between alternative parses of a string:
- Calculate the probabilities of the different phrase structure trees
- Choose the most probable tree as the preferred interpretation

Probability of a Parse

To calculate the probability of a given phrase structure tree:
- Multiply together the probabilities of the rules involved in its construction
- This assumes independence between rule choices: the essence of what it means to be context-free
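
As a sketch of this calculation (illustrative code, not from the lecture), the following function walks an nltk.Tree, looks up each rule it uses in the grammar object defined earlier, and multiplies the probabilities together:

```python
import nltk

def tree_probability(tree, grammar):
    """Product of the probabilities of all rules used to build the tree."""
    rule_prob = {(p.lhs(), p.rhs()): p.prob() for p in grammar.productions()}
    prob = 1.0
    for production in tree.productions():
        prob *= rule_prob[(production.lhs(), production.rhs())]
    return prob

tree1 = nltk.Tree.fromstring(
    '(S (NP (Det the) (A patient) (N nurses)) (VP (V sleep)))')
print(tree_probability(tree1, grammar))   # 0.0096
```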
Example

Consider the first parse of the patient nurses sleep, with each node annotated with the probability of the constituent (sub-tree) it heads:

[S 0.0096 [NP 0.08 [Det 1.0 the]
                   [A 1.0 patient]
                   [N 0.4 nurses]]
          [VP 0.12 [V 0.3 sleep]]]

The probability assigned to the whole parse tree is 0.0096.

Disambiguation

We can carry out a similar calculation for the second phrase structure tree associated with the patient nurses sleep. If we do this, we find that the probability of that parse is 0.00336. Since 0.0096 > 0.00336, we prefer the first phrase structure tree.
Assigning Probabilities to Strings

We can also assign probabilities to strings:
- The probability of a string is the sum of the probabilities of its parses
- The probability of the patient nurses sleep is therefore 0.0096 + 0.00336 = 0.01296
- This gives us a language model

Training a Statistical Parser

A PCFG contains many parameters:
- One probability per rule
- Grammars containing rules that introduce words have huge numbers of rules
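
A minimal sketch of the string-probability calculation, again assuming NLTK and the toy grammar above: InsideChartParser enumerates the parses of a PCFG together with their probabilities, so we can simply sum them.

```python
from nltk.parse.pchart import InsideChartParser

parser = InsideChartParser(grammar)
parses = list(parser.parse('the patient nurses sleep'.split()))
print(sum(t.prob() for t in parses))   # 0.0096 + 0.00336 = 0.01296
```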
- The probabilities cannot be assigned by a human
- We need to estimate the parameters from text

How are the probabilities obtained?
- From raw text
- From manually 'treebanked' text
- Training from treebanks results in a more accurate system

Training from a Treebank
- Extract the rules used in the treebank
- Estimate rule probabilities from the frequencies observed in the treebank

A sample treebank tree:

( (S
    (NP-SBJ-23
      (NP (DT No) (NN price) )
      (PP (IN for)
        (NP (DT the) (JJ new) (NNS shares) )))
    (VP (VBZ has)
      (VP (VBN been)
        (VP (VBN set)
          (NP (-NONE- *-23) ))))
    (. .) ))

Obtaining Rule Probabilities

Estimating the probabilities associated with grammar rules:
- Count occurrences of rules in a collection of parse trees
- P(NP → Det N) ≈ count(NP → Det N) / Σ_α count(NP → α)
- This is the proportion of times that an NP is expanded with the given rule
- Note that for each non-terminal, rule probabilities sum to 1
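
A sketch of this estimation using the Penn Treebank sample that ships with NLTK (an assumption made for illustration; the lecture itself refers to the full treebank). nltk.induce_pcfg implements this relative-frequency estimate.

```python
import nltk
from nltk.corpus import treebank   # requires nltk.download('treebank')

# Collect every rule used in every treebank parse tree.
productions = []
for tree in treebank.parsed_sents():
    productions += tree.productions()

# P(rule) = count(rule) / total count of expansions of its left-hand side
pcfg = nltk.induce_pcfg(nltk.Nonterminal('S'), productions)
print(pcfg.productions(lhs=nltk.Nonterminal('NP'))[:5])
```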
Assessment of Approach

How well does this work?
- We need a very large treebank, and treebanks are difficult to produce
- Around 70% accuracy when training on the Penn Treebank (1M words of manually analysed text)

Training from an Unannotated Corpus

Unsupervised learning: learning a grammar from raw text
- For a PCFG we can use the inside-outside algorithm, a variation of the forward-backward algorithm used for HMM training

Problems:
- Very slow
- No guarantee of optimality: training can get stuck in local maxima
- Typically results in more non-terminals than necessary
- No guarantee that the structures learned will have any correspondence to linguistic theory
- Even if initialised with a grammar, training might change the meaning of the non-terminal categories
The Problem of Lexical Preferences

PCFGs are not the answer to all our problems: they are unable to model lexical preferences. Consider the two bracketings of the compound emergency brain surgery:

[N [N emergency] [N [N brain] [N surgery]]]

[N [N [N emergency] [N brain]] [N surgery]]

A PCFG can't distinguish between these alternatives: both use the same rule (N → N N) the same number of times, so both receive the same probability.

Lexicalized Grammars

To overcome this we could:
- Create a new rule for each possible lexical head, e.g.
    VP(see) → V(see) NP(man) PP(with)
    VP(see) → V(see) NP(man)
    NP(man) → N(man) PP(with)
- Associate a probability with each new rule
- Condition the probability of the rule on the head: for a given head and non-terminal, probabilities sum to 1
- This gives sparse data: we need to smooth to avoid zero probabilities
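
An illustrative sketch (not from the lecture) of what head-conditioned rule probabilities look like, with add-one smoothing to avoid the zero probabilities that sparse data would otherwise cause. All names and counts here are hypothetical.

```python
from collections import Counter

rule_counts = Counter()   # (head, lhs, rhs) -> count
head_counts = Counter()   # (head, lhs) -> count

def observe(head, lhs, rhs):
    """Record one occurrence of a lexicalized rule from a treebank."""
    rule_counts[(head, lhs, rhs)] += 1
    head_counts[(head, lhs)] += 1

def rule_prob(head, lhs, rhs, num_expansions):
    """P(rhs | lhs, head) with add-one smoothing over possible expansions."""
    return (rule_counts[(head, lhs, rhs)] + 1) / \
           (head_counts[(head, lhs)] + num_expansions)

observe('see', 'VP', ('V', 'NP', 'PP'))   # hypothetical observations
observe('see', 'VP', ('V', 'NP'))
print(rule_prob('see', 'VP', ('V', 'NP', 'PP'), num_expansions=3))  # 0.4
```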
Parsing and Contextual Probabilities

Another problem with PCFG: the probabilities of sub-trees cannot depend on their context.
- A pronoun is more common as a subject than as an object
- A single NP → Pro rule cannot account for this

[S [NP Pro] [VP Verb [NP Det N]]]

[S [NP Det N] [VP Verb [NP Pro]]]

Parsing is an active area of research, and the current best systems are around 90% accurate.
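
One standard remedy, sketched below for illustration (it goes beyond this lecture), is to split a category according to its context, e.g. annotating each NP with its parent so that subject NPs (under S) and object NPs (under VP) get separate expansion probabilities. The counts are hypothetical.

```python
from collections import Counter

# Hypothetical parent-annotated expansion counts from a treebank:
# NP^S is an NP under S (subject), NP^VP an NP under VP (object).
expansions = Counter({
    ('NP^S', 'Pro'): 700, ('NP^S', 'Det N'): 300,
    ('NP^VP', 'Pro'): 200, ('NP^VP', 'Det N'): 800,
})

def prob(lhs, rhs):
    """Relative-frequency estimate of P(rhs | lhs)."""
    total = sum(c for (l, _), c in expansions.items() if l == lhs)
    return expansions[(lhs, rhs)] / total

print(prob('NP^S', 'Pro'))    # 0.7: pronouns are common as subjects
print(prob('NP^VP', 'Pro'))   # 0.2: and less common as objects
```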
Next Topic: Lexical Semantics
- Language and meaning
- Word meaning
- Lexical relations
- WordNet
- Word similarity