Syntax and Parsing 4: Statistical Parsing
This time:
- The problem of ambiguity
  - Lexical ambiguity
  - Structural ambiguity
  - Local vs. global ambiguity
- Probabilistic Context-Free Grammar (PCFG)
  - Parse probability and string probability
  - Disambiguation
  - Training
- Some problems with PCFG
  - Lexical preferences
  - Contextual probabilities
Data Science Group (Informatics) · NLE/ANLP · Autumn 2015
Variation and Ambiguity
Two challenges for NLP are variation and ambiguity:

Variation:
- Practically unlimited number of ways of saying the same thing
- Need huge flexibility in characterising language and interpretations

Ambiguity:
- Many different interpretations of a single linguistic utterance
- Need an efficient way of choosing the “right” interpretation
Will consider the problem of ambiguity in this lecture
Ambiguity
Lexical Ambiguity:
- A word with more than one meaning
  - is fire a verb or a noun?
  - if it’s a verb, which sense: shoot or sack?
- Sometimes different meanings have the same part of speech
  - multiple noun senses of star
- Sometimes the meanings are systematically related
  - chicken has both an animal and a food sense — regular polysemy
- Often one sense is strongly predominant
Ambiguity
Structural Ambiguity:
- The grammar licenses more than one analysis
- The alternative analyses differ in meaning
- e.g. The man hit the boy with the stick
  - Who has the stick?
- e.g. They saw her duck
  - Did they see a woman ducking down, or a woman’s aquatic bird?!
Structural Ambiguity
The two analyses of The man hit the boy with the stick, in bracketed form:

[S [NP [Det The] [N man]]
   [VP [V hit]
       [NP [NP [Det the] [N boy]]
           [PP [Prep with]
               [NP [Det the] [N stick]]]]]]

— the PP attaches to the NP: the boy has the stick

[S [NP [Det The] [N man]]
   [VP [VP [V hit]
           [NP [Det the] [N boy]]]
       [PP [Prep with]
           [NP [Det the] [N stick]]]]]

— the PP attaches to the VP: the stick is the instrument of the hitting
Both Lexical and Structural Ambiguity
The two analyses of They saw her duck:

[S [NP They]
   [VP [V saw]
       [NP [Det her] [N duck]]]]

— her is a determiner, duck a noun: they saw her aquatic bird

[S [NP They]
   [VP [V saw]
       [S [NP her]
          [VP [V duck]]]]]

— her is a pronoun, duck a verb: they saw her ducking down
Local vs Global Ambiguity
Local Ambiguity:
- A component of the sentence is ambiguous, but the ambiguity is resolved by the rest of the sentence
  - Fruit flies in every direction
- Garden path sentences:
  - The horse raced past the barn fell
- Handled by backtracking, lookahead or parallelism
Local vs Global Ambiguity
Global:
- The sentence as a whole has two meanings
  - The man hit the boy with the stick
- Context might help to resolve the ambiguity
Probabilistic Context-Free Grammar (PCFG)
PCFG is a natural extension to CFG:
- Assigns probabilities to parses — and thus also to strings
- Useful in choosing between different parses — syntactic ambiguity
- Can provide robust language processing
  - provides an approach to partial ungrammaticality
PCFG and Rule Probabilities
Associate a probability with each grammar rule:
- Probabilities assigned to phrase structure rules
- Probabilities assigned to rules introducing terminal symbols

S → NP VP       (1.0)
NP → N          (0.4)
NP → Det N      (0.4)
NP → Det A N    (0.2)
VP → V          (0.4)
VP → V NP       (0.6)

Det → the       (1.0)
A → patient     (1.0)
N → nurses      (0.4)
N → patient     (0.5)
N → sleep       (0.1)
V → nurses      (0.7)
V → sleep       (0.3)
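As a concrete illustration (not from the original slides), this toy grammar can be written down directly with NLTK; a minimal sketch, assuming the nltk package is available:

    # The toy grammar above, expressed as an NLTK PCFG.
    import nltk

    toy = nltk.PCFG.fromstring("""
        S -> NP VP [1.0]
        NP -> N [0.4] | Det N [0.4] | Det A N [0.2]
        VP -> V [0.4] | V NP [0.6]
        Det -> 'the' [1.0]
        A -> 'patient' [1.0]
        N -> 'nurses' [0.4] | 'patient' [0.5] | 'sleep' [0.1]
        V -> 'nurses' [0.7] | 'sleep' [0.3]
    """)

    # ViterbiParser returns the single most probable parse.
    parser = nltk.ViterbiParser(toy)
    for tree in parser.parse('the patient nurses sleep'.split()):
        print(tree)   # the tree is printed together with its probability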
Illustrating Ambiguity
Example: the patient nurses sleep
[S [NP [Det the] [A patient] [N nurses]]
   [VP [V sleep]]]
Illustrating Ambiguity (cont.)
An alternative parse:
[S [NP [Det the] [N patient]]
   [VP [V nurses]
       [NP [N sleep]]]]
Syntactic Disambiguation
PCFGs provide a way to choose between alternative parses of a string.

Using PCFG for disambiguation:
- Calculate the probabilities of the different phrase structure trees
- Choose the most probable tree as the preferred interpretation
Probability of a Parse
To calculate the probability of a given phrase structure tree:
- Multiply together the probabilities of the rules involved in its construction
- Assumes independence between rule choices
  - the essence of what it means to be context-free
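In symbols, for a tree T built by the rule applications r1, ..., rn:

    P(T) = P(r1) × P(r2) × ... × P(rn)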
Example
Consider the first parse of the patient nurses sleep, with the probability of each constituent written against its node:

[S 0.0096 [NP 0.08 [Det 1.0 the] [A 1.0 patient] [N 0.4 nurses]]
          [VP 0.12 [V 0.3 sleep]]]

The number against each node represents the probability of that constituent (sub-tree).

The probability assigned to the whole parse tree is 0.0096.
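Spelled out, this is the product of the seven rule probabilities used to build the tree:

    P(T1) = P(S → NP VP) × P(NP → Det A N) × P(Det → the) × P(A → patient)
            × P(N → nurses) × P(VP → V) × P(V → sleep)
          = 1.0 × 0.2 × 1.0 × 1.0 × 0.4 × 0.4 × 0.3
          = 0.0096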
Disambiguation
We can carry out a similar calculation for the second phrase structure tree associated with the patient nurses sleep.

If we do this, we find that the probability of this parse is 0.00336. Since 0.0096 > 0.00336, we prefer the first phrase structure tree.
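For the second tree, the corresponding product (using the same rule probabilities) is:

    P(T2) = P(S → NP VP) × P(NP → Det N) × P(Det → the) × P(N → patient)
            × P(VP → V NP) × P(V → nurses) × P(NP → N) × P(N → sleep)
          = 1.0 × 0.4 × 1.0 × 0.5 × 0.6 × 0.7 × 0.4 × 0.1
          = 0.00336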
Assigning Probabilities to Strings
We can also assign probabilities to strings:
- The probability of a string is the sum of the probabilities of its parses
- The probability of the patient nurses sleep is therefore 0.0096 + 0.00336 = 0.01296
- This gives us a language model
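A sketch of the same computation in NLTK, reusing the toy grammar object from the earlier sketch (InsideChartParser enumerates all parses, unlike ViterbiParser):

    # String probability = sum of the probabilities of all parses.
    from nltk.parse.pchart import InsideChartParser

    all_parses = InsideChartParser(toy)
    p_string = sum(t.prob() for t in all_parses.parse('the patient nurses sleep'.split()))
    print(p_string)   # 0.0096 + 0.00336 = 0.01296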
Training a Statistical Parser
A PCFG contains many parameters:
- One probability per rule
- Grammars containing rules that introduce words have huge numbers of rules
- The probabilities cannot be assigned by a human; the parameters must be estimated from text

How are the probabilities obtained?
- From raw text
- From manually ‘treebanked’ text

Training from treebanks results in a more accurate system.
Training from a Treebank
- Extract rules from the treebank
- Estimate rule probabilities from frequencies observed in the treebank

( (S
    (NP-SBJ-23
      (NP (DT No) (NN price) )
      (PP (IN for)
        (NP (DT the) (JJ new) (NNS shares) )))
    (VP (VBZ has)
      (VP (VBN been)
        (VP (VBN set)
          (NP (-NONE- *-23) ))))
    (. .) ))
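As an illustration (not from the slides), NLTK ships a 10% sample of the Penn Treebank, and inducing a PCFG from it takes a few lines; a sketch, assuming the 'treebank' corpus has been downloaded via nltk.download('treebank'):

    # Read rules off every tree in the treebank sample, then estimate
    # rule probabilities by relative frequency.
    import nltk
    from nltk.corpus import treebank

    productions = []
    for tree in treebank.parsed_sents():
        productions += tree.productions()   # the rules used in this parse tree

    grammar = nltk.induce_pcfg(nltk.Nonterminal('S'), productions)
    for prod in grammar.productions(lhs=nltk.Nonterminal('NP'))[:5]:
        print(prod)   # NP rules with their estimated probabilities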
Obtaining Rule Probabilities
Estimating the probabilities associated with grammar rules:
- Count occurrences of rules in a collection of parse trees

    P(NP → Det N) ≈ count(NP → Det N) / Σ_α count(NP → α)

- This is the proportion of times that an NP is expanded with the given rule
- Note that for each non-terminal, the rule probabilities sum to 1
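The same estimate, computed by hand over a list of productions (for instance the productions list from the previous sketch):

    # Relative-frequency estimation of rule probabilities.
    from collections import Counter

    rule_counts = Counter(productions)                    # count(A -> beta)
    lhs_counts = Counter(p.lhs() for p in productions)    # total expansions of A

    def rule_prob(p):
        """Estimate P(A -> beta | A) by relative frequency."""
        return rule_counts[p] / lhs_counts[p.lhs()]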
Assessment of Approach
How well does this work?
- Need a very large treebank
- Treebanks are difficult to produce
- Around 70% accuracy when training on the Penn Treebank
  - 1M words of manually analysed text
Training from an Unannotated Corpus
Unsupervised learning:
- Learning a grammar from raw text
- For PCFG we can use the inside-outside algorithm
  - a variation of the forward-backward algorithm used for HMM training

Problems:
- Very slow
- No guarantee of optimality — can get stuck in local maxima
- Typically results in more non-terminals than necessary
- No guarantee that the structures learned will have any correspondence to linguistic theory
- Even if initialised with a grammar, training might change the meaning of non-terminal categories
The Problem of Lexical Preferences
PCFGs are not the answer to all our problems. They are unable to model lexical preferences, e.g. the two bracketings of emergency brain surgery:

[N [N [N emergency] [N brain]] [N surgery]]   — surgery on an “emergency brain”?
[N [N emergency] [N [N brain] [N surgery]]]   — brain surgery performed in an emergency

A PCFG can’t distinguish between these alternatives: both trees are built from the same rules (two applications of N → N N plus the same lexical rules), so they receive exactly the same probability.
Lexicalized Grammars
To overcome this we could:
- Create a new rule for each possible lexical head:

    VP(see) → V(see) NP(man) PP(with)
    VP(see) → V(see) NP(man)
    NP(man) → N(man) PP(with)

- Associate a probability with each new rule
  - the probability of the rule is conditioned on the head
  - for a given head and non-terminal, the probabilities sum to 1
- Sparse data: need to smooth to avoid zero probabilities (see the sketch below)
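A minimal sketch of one such smoothing scheme (an illustration, not the slides' method): interpolate the head-conditioned relative-frequency estimate with an unlexicalized backoff estimate. The names head_counts, backoff_prob and lam are assumptions for the example:

    # Interpolated estimate of P(rule | head): the lexicalized MLE is
    # smoothed with an unlexicalized backoff, so unseen (rule, head)
    # pairs never get probability zero.
    def lexicalized_prob(rule, head, head_counts, backoff_prob, lam=0.8):
        seen = head_counts.get(head, {})          # rule counts observed with this head
        total = sum(seen.values())
        mle = seen.get(rule, 0) / total if total else 0.0
        return lam * mle + (1 - lam) * backoff_prob(rule)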
Parsing and Contextual Probabilities
Another problem with PCFG:
- Probabilities of sub-trees cannot depend on context
- A pronoun is more common as a subject than as an object, but a single NP → Pro rule cannot account for this:

[S [NP Pro] [VP Verb [NP Det N]]]   — pronoun as subject
[S [NP Det N] [VP Verb [NP Pro]]]   — pronoun as object
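One standard remedy, sketched below purely as an illustration (it is not covered in these slides), is parent annotation: split each non-terminal by the category of its parent, so that NP^S (subject position) and NP^VP (object position) carry separate NP → Pro probabilities.

    # Parent annotation on an nltk.Tree: NP under S becomes NP^S,
    # NP under VP becomes NP^VP, and each gets its own rule statistics.
    import nltk

    def annotate_parents(tree, parent=None):
        if isinstance(tree, nltk.Tree):
            label = tree.label() + ('^' + parent if parent else '')
            return nltk.Tree(label, [annotate_parents(c, tree.label()) for c in tree])
        return tree   # a leaf (word) is left unchanged

    t = nltk.Tree.fromstring('(S (NP (Pro she)) (VP (Verb saw) (NP (Det the) (N man))))')
    print(annotate_parents(t))
    # (S (NP^S (Pro^NP she)) (VP^S (Verb^VP saw) (NP^VP (Det^NP the) (N^NP man))))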
Parsing is an active area of research, and the current best systems are around 90% accurate
Next Topic: Lexical Semantics
- Language and meaning
- Word meaning
- Lexical relations
- WordNet
- Word similarity