More on Context-Free Grammars
ANLP: Week 5, Unit 3
Shay Cohen
Based on slides from ANLP 2019
1/26
Last class
Constituents and their heads Context-free grammars
Structural ambiguity
Conversion to Chomsky Normal Form
Replace all words in an RHS with a preterminal that rewrites to that word
Break all RHSes into a sequence of RHSes with two nonterminals, possibly introducing new nonterminals:
transforms into
S → A1A2A3
S → A1B B → A2A3
.
2/26
4/26
Chomsky Normal Form
A context-free grammar is in Chomsky normal form if all productionsareoftheformA→BC orA→awhereA,B,C are nonterminals in the grammar and a is a word in the grammar.
Disregarding the empty string, every CFG is equivalent to a grammar in Chomsky normal form (the grammars’ string languages are identical)
Why is that important?
A normal form constrains the possible ways to represent an object
Makes parsing efficient
3/26
Sentence types
Among the large number of constructions for English sentences, four are particularly common:
Declarative
I prefer a morning flight.
Imperative
Give me the newspaper.
Yes-no question
Do any of these flights have stops?
Wh-questions
What is your name?
5/26
Declarative
S → NP VP
I want a flight from Ontario to Chicago.
The flight should be eleven a.m. tomorrow.
The return flight should leave at around seven. I will be back tomorrow.
Yes-no questions
Often used to ask questions or requests. S → Aux NP VP
Do any of these flights have stops?
Does American’s flight eighteen twenty five serve dinner? Can you give me the same information for United?
6/26
8/26
Imperative
Often begin with a VP and have no subject. S → VP
Show the lowest fare.
Give me Sunday’s flights arriving in Las Vegas from New York
City.
List all flights between five and seven.
Go home.
7/26
Wh-subject-question
Identical to the declarative structure, except that the first NP contains a wh-word.
S → Wh-NP VP
What airlines fly from Burbank to Denver?
Which flights depart Burbank after noon and arrive in Denver by six?
Whose flights serve breakfast?
9/26
Wh-non-subject-question
Wh-phrase is not the subject of the sentence, and so the sentence includes another subject with the auxiliary before the subject NP
S → Wh-NP Aux NP VP
What flights do you have from Burbank to Tacoma?
Noun phrases and determiners
NP → Det Nominal
NP can begin with simple determiners:
a stop
the flight
this flight
those flights any flights some flights
More complex expressions can act as determiners: United’s flight
United’s pilot’s union
Denver’s mayor’s mother’s cancelled flight
The determiner can be a possessive expression: Det → NP’s Determiners are not obligatory:
I like water. I like apples.
12/26
10/26
Long-range dependencies and movement
Wh-non-subject-questions contain long-distance dependencies:
What flights do you have from Burbank to Tacoma? Wh-NP what flights is separated from the predicate that it is
related to the VP have.
Some annotations and linguistic theories (minimalism) contain a small marker called a “trace” or “empty category” that is inserted after the verb to indicate long-distance dependency.
This is to denote that the object has “moved” from the object position to the beginning of the sentence:
I have an 11am flight from Burbank to Tacoma
11/26
NP: the Nominal
The nominal follows the Det and may contain other modifiers. In its simplest form:
Nominal → Noun
Numbers and other quantifiers:
two friends the second leg many flights the last flight
the next day one stop
Adjectives:
a first-class fare a non-stop flight
the longest layover the earliest flight
Adjectives can be grouped into adjective phrase (AP): the least expensive fare
13/26
NP: Nominal
The head noun can also be followed by postmodifiers: Prepositional phrase (PP):
all flights [from Cleveland]
all flights [from Cleveland] [to Newark]
arrival [in San Jose] [before seven]
a reservation [on flight sixty two] [from Tampa] [to Montreal]
A rule to account for postnominal PPs: Nominal → Nominal PP
NP: Nominal
14/26
Relative clauses – a clause that begins with a relative pronoun (that or who). The relative pronoun functions as the subject of the embedded verb:
a flight [that serves breakfast] flights [that leave in the morning] the man [who arrived late]
Nominal → Nominal RelClause RelClause → (who|that) VP
Various postnominal modifiers can be combined:
A flight [from Phoenix to Detroit] [leaving Monday evening] Evening flights [from Nashville] [that serve dinner]
16/26
NP: Nominal
Non-finite clauses:
1. Gerundive(-ing) postmodifiers – VP that begins with the gerundive (-ing) form of the verb:
Nominal → Nominal GerundVP any flights [arriving after eleven]
flights [leaving on Thursday]
2. Infinitives:
the last flight [to arrive in Boston]
3. -ed forms:
I need to have dinner [served]
Which is the aircraft [used by this flight]?
15/26
Verb Phrase
The verb phrase consists of the verb and a number of other constituents:
VP → Verb
VP → Verb NP
VP → Verb NP PP VP → Verb PP
disappear
prefer [a morning flight]
leave [Boston] [in the morning] leaving [on Thursday]
More complex constituents are also possible:
Another VP:
I want [VP to fly from Milwaukee to Orlando] Sentential complement:
I [VP [V think] [S I would like to take the early flight]]
17/26
Conjunction
Major phrase types can be combined with conjunctions like and, or, but:
I need to know [NP [NP the aircraft] and [NP the flight number]] NP → NP and NP
The ability to form coordinate phrases through conjunctions is used to test for constituency:
I need to know the [Nom [Nom aircraft] and [Nom flight number]].
Grammars in Practice
Read off treebanks
May contain thousands of rules
We will talk more about treebanks when we add probabilities to grammars
18/26
20/26
Conjunction
VP conjunctions:
What flights do you have [VP [VP leaving Denver] and [VP
arriving in San Francisco]]
S conjunctions:
[S [S I’m interested in a flight from Dallas to Chicago] and [S I’m also interested in going to Baltimore]]
VP → VP and VP S → S and S
Meta-rule: X → X and X
Any non-terminal can be conjoined with the same non-terminal to yield a constituent of the same type.
19/26
Agreement phenomena
In programming languages, typing rules enforce type agreement between different (often separated) constituents of a program:
int i=0; …; if (i>2) …
There are somewhat similar phenomena in NL: constituents of a sentence (often separated) may be constrained to agree on an attribute such as person, number, gender.
You, I imagine, are unable to attend.
The hills are looking lovely today, aren’t they? He came very close to injuring himself.
21/26
Agreement in various languages
These examples illustrate that in English:
Verbs agree in person and number with their subjects;
Tag questions agree in person, number, tense and mode with their main statement, and have the opposite polarity.
Reflexive pronouns follow suit in person, number and gender. French has much more by way of agreement phenomena:
Adjectives agree with their head noun in gender and number.
Le petit chien, La petite souris, Les petites mouches
Participles of ˆetre verbs agree with their subject:
Il est arriv ́e, Elles sont arriv ́ees
Participles of other verbs agree with preceding direct objects: Il a vu la femme, Il l’a vue
How can we capture these kinds of constraints in a grammar?
Node-splitting via attributes
One solution is to refine our grammar by splitting certain non-terminals according to various attributes. Examples of attributes and their associated values are:
Person: 1st, 2nd, 3rd
Number: singular, plural
Gender: masculine, feminine, neuter
Case: nominative, accusative, dative, . . . Tense: present, past, future, . . .
In principle these are language-specific, though certain common patterns recur in many languages.
22/26
We can then split phrase categories as the language demands, e.g. Split NP on person, number, case (e.g. NP[3,sg,nom]),
Split VP on person, number, tense (e.g. VP[3,sg,fut]).
24/26
Agreement rules: why bother?
Modelling agreement is obviously important if we want to generate grammatically correct NL text.
But even for understanding input text, agreement can be useful for resolving ambiguity.
E.g. the following sentence is ambiguous . . .
The boy who eats flies ducks. . . . whilst the following are less so:
The boys who eat fly ducks. The boys who eat flies duck.
23/26
Parameterized CFG productions
We can often use such attributes to enforce agreement constraints. This works because of the head phrase structure typical of NLs. E.g. we may write parameterized rules such as:
S → NP[p,n,nom] VP[p,n] NP[3,n,c] → Det[n] Nom[n]
Each of these really abbreviates a finite number of rules obtained by specializing the attribute variables. (Still a CFG!)
When specializing, each variable must take the same value everywhere, e.g.
S →
S → NP[3,pl,acc] →
NP[3,sg,nom] VP[3,sg] NP[1,pl,nom] VP[1,pl] Det[pl] Nom[pl]
Parsing algorithms can be adapted to work with this machinery: don’t have to ‘build’ all the specialized rules individually.
25/26
Example: subject-verb agreement in English
S → NP[p,n,c] → Pro[1,sg,nom] → Pro[1,sg,acc] → NP[3,n,c] → Nom[n] → N[sg] → N[pl] → RelOpt[n] → VP[p,n] → VV[p,n] → V[3,sg] → BE[p,n] → VG →
(Other rules omitted.)
NP[p,n,nom] VP[p,n] Pro[p,n,c]
I, etc.
me, etc.
Det[n] Nom[n] RelOpt[n] N[n] | Adj Nom[n] person, etc.
people, etc.
ε | who VP[3,n] VV[p,n] NP[p’,n’,acc] V[p,n] | BE[p,n] VG teaches, etc.
is, etc.
teaching, etc.
26/26
Example: subject-verb agreement in English
S → NP[p,n,c] → Pro[1,sg,nom] → Pro[1,sg,acc] → NP[3,n,c] → Nom[n] → N[sg] → N[pl] → RelOpt[n] → VP[p,n] → VV[p,n] → V[3,sg] → BE[p,n] → VG →
NP[p,n,nom] VP[p,n] Pro[p,n,c]
I, etc.
me, etc.
Det[n] Nom[n] RelOpt[n] N[n] | Adj Nom[n] person, etc.
people, etc.
ε | who VP[3,n] VV[p,n] NP[p’,n’,acc] V[p,n] | BE[p,n] VG teaches, etc.
is, etc.
teaching, etc.
27/26