8. Tree grammars
1 Review: Stringsets and string grammars
The kind of thing we’ve done with strings many times now follows this pattern:
(1) a. Identify an alphabet of symbols; call it Σ.
b. This determines a certain set of strings over this alphabet; usually written Σ∗. c. Identify some subset of Σ∗ as the stringset of interest; call this L, so L ⊆ Σ∗. d. Ask what (string) grammar(s) can generate exactly that set of strings L.
Remember that step (1b) involves an important recursive definition:
(2) For any set Σ, we define Σ∗ as the smallest set such that: • ε∈Σ∗,and
• ifx∈Σandu∈Σ∗ then(x:u)∈Σ∗.
So if Σ = {a, b}, then Σ∗ contains things like a:(a:(b:ε)), which we abbreviate as ‘aab’.
Then, in step (1c), we identify some stringsets that we might be interested in:
(3) a. L1 = {w | w ∈ Σ∗ and every ‘a’ is immediately followed by a ‘b’}
b. L2 ={w|w∈Σ∗ andw̸=εandthefirstandlastsymbolsofwarethesame}
c. L3 ={w|w∈Σ∗ andthenumberofoccurrencesof‘a’inwiseven}
d. L4 = {w | w ∈ Σ∗ and w contains an ‘a’ that is followed (not necessarily immediately) by a ‘b’} e. L5 ={anbn |n∈Nandn≥0}
f. L6 ={w+wR |w∈Σ∗} (wherewR isthereverseofthestringw) g. L7 ={w+w|w∈Σ∗}
And for each such stringset L, we can ask (step (1d)) what kinds of grammars can generate exactly L.
2 Generalizing: Treesets and tree grammars
Things will follow an analogous pattern here:
(4) a. Identify an alphabet of symbols; call it Σ.
b. This determines a certain set of trees over this alphabet; usually written TΣ. c. Identify some subset of TΣ as the treeset of interest; call this L, so L ⊆ TΣ. d. Ask what (tree) grammar(s) can generate exactly that set of trees L.
Ling185A, Winter 2021 — Tim Hunter, UCLA
2.1 The set of trees over an alphabet
(5) For any set Σ, we define TΣ as the smallest set such that:
• ifx∈Σ,thenx[]∈TΣ,and
• if x ∈ Σ and t1,t2,…,tk ∈ TΣ, then x[t1,t2,…,tk] ∈ TΣ.
Those square brackets in this definition are analogous to the colon in the definition of Σ∗. The colon makes strings out of symbols, and the square brackets make trees out of symbols. (These pieces of punctuation correspond to constructors in Haskell.)
So for example, if Σ = {a, b, c}, then the set TΣ looks something like this:
(6) TΣ = a[] , b[] , c[] , a[a[]] , …, a[b[],b[],c[]] , …, b[c[a[]],a[b[],b[]]] , …
But just as we allow ourselves to write a:(a:(b:ε)) more conveniently as ‘aab’, we allow ourselves to write b[c[a[]], a[b[], b[]]] more conveniently as:
(7)
b
ca
abb
Also it’s sometimes convenient to leave off empty pairs of brackets, so instead of b[c[a[]],a[b[],b[]]] we sometimes write b[c[a], a[b, b]].
One more definition is useful:
(8) For any set Σ and any natural number n, we define TΣn as the set of all trees in TΣ in which every
node has at most n daughters.
So the tree in (7), for example, is a member of TΣ2 and is also a member of TΣ3, but is not a member of TΣ1.
The largest number of daughters of any node in a tree is sometimes called the tree’s branching degree. So TΣn is the set of all trees in TΣ with branching degree less than or equal to n. The branching degree of the tree in (7) is 2.
2.2 Subsets of TΣ (“treesets”)
Using the alphabet Σ = {a, b}, here are some treesets we might be interested in:
(9)
2.3
(10)
a. L1 = {t ∈ TΣ2 | the number of occurrences of ‘a’ in t is even}
b. L2 = {t ∈ TΣ2 | every ‘b’ in t dominates a binary-branching ‘a’}
c. L3 = {t ∈ TΣ2 | t contains a binary-branching ‘a’ whose left daughter subtree contains an ‘a’
and whose right daughter subtree contains a ‘b’} d. L4 = {t ∈ TΣ2 | t contains equal numbers of occurrences of ‘a’ and ‘b’}
One kind of tree grammar
A (bottom-up) finite-state tree automaton (FSTA) is a four-tuple (Q, Σ, F, ∆) where:
• Q is a finite set of states;
• Σ, the alphabet, is a finite set of symbols;
• F ⊆ Q is the set of ending states; and
• ∆⊆Q∗ ×Σ×Qisthesetoftransitions,whichmustbefinite.
Ling185A, Winter 2021 — Tim Hunter, UCLA
For any FSTA G = (Q, Σ, F, ∆), underG is a function from TΣ × Q to booleans: (11) underG(x[])(q) = ∆([], x, q)
underG(x[t1,…,tk])(q)= ··· ∆([q1,…,qk],x,q)∧underG(t1)(q1)∧···∧underG(tk)(qk) q1 ∈Q qk ∈Q
And L(G) is a subset of TΣ:
(12) t ∈ L(G) ⇐⇒ underG(t)(q) ∧ F (q)
q∈Q
As usual, slot in your favourite semiring as desired!
2.4 Examples
2.4.1 Even/odd
The FSTA G1 in (13) generates the treeset L1 from (9) above (requiring an even number of ‘a’s).
(13) G1 = ({even, odd}, {a, b}, {even}, ∆) where∆= {
([even, even], a, ([even, odd], a, ([odd, even], a, ([odd, odd], a, ([even], a, ([odd], a, ([], a,
odd), ([even, even], b, even), ([even, odd], b, even), ([odd, even], b, odd), ([odd, odd], b, odd), ([even], b, even), ([odd], b, odd), ([], b,
even), odd), odd), even), even), odd), even),
(14)
odd
aba
}
even
a
odd
b
odd even
a
even odd
b
This grammar is bottom-up deterministic: given a sequence of “child states” and a symbol, there’s at most one applicable transition. This reflects the fact that there’s a function that determines whether a tree contains an even or odd number of ‘a’s. But this grammar is not top-down deterministic: note the “choices” one has to make at binary-branching nodes when working top-down.
Ling185A, Winter 2021 — Tim Hunter, UCLA
2.4.2 Another abstract example
The grammar in (15) generates the treeset L2 from (9) above (requiring that every ‘b’ dominates a binary- branching ‘a’).
(15)
(16)
G2 = ({0,1},{a,b},{0,1},∆)
where ∆ = {
}
([0,0], ([0, 1], ([1, 0], ([1, 1], ([0], ([1], ([],
a, 1), a, 1), a, 1), a, 1), a, 0), a, 1), a, 0),
([0, 1], ([1, 0], ([1, 1],
b, 1), b, 1), b, 1),
([1], b, 1),
11
bbb
1011
aa
bbb
2.4.3
000101
aaaaaa
11
aa
00 00
aa aa
00
aa
A more linguistic example
Now let’s suppose that the alphabet Σ is the set of English words, plus the additional symbol ∗. (17) Σ = {∗, the, cat, dog, anybody, ever, not, nobody, . . . }
Then the FSTA in (19) encodes a simple version of the NPI-licensing constraint: an NPI such as ‘anybody’ or ‘ever’ must be c-commanded by a licensor such as ‘not’ or ‘nobody’.
(18)
(19)
a. Nobody met anybody
b. * John met anybody
c. Nobody thinks that John met anybody
d. The fact that nobody met anybody surprised John e. * The fact that nobody met John surprised anybody
G3 = ({0, lic, neg}, Σ, {0, lic}, ∆)
where∆= {
([neg, neg], ([0, neg], ([neg, 0], ([0, 0],
([lic, neg], ([lic, 0], ([0, lic], ([lic, lic],
∗, neg), ∗, neg), ∗, neg), ∗, 0),
∗, 0),
∗, 0),
∗, 0),
∗, 0) }
([], anybody,
([], ever,
([], not,
([], nobody,
([], s,
neg),
neg),
lic),
lic),
0) foranyothers∈Σ−{∗},
Ling185A, Winter 2021 — Tim Hunter, UCLA
(20)
0
∗
0 ∗0∗
0 that 0 ∗
lic
∗
neg
0
met anybody
00
surprised John
nobody
neg
Notice that the pattern of two-daughter transitions and zero-daughter transitions in this grammar ensures that the generated trees will contain only (i) binary nodes with the symbol ∗, and (ii) leaf nodes with other symbols.
2.5 So what do FSTAs gain for us?
But wait a minute — if the goal is just to account for the facts about strings in (18), then we can do the same thing with a plain old CFG.
(21)
0
00
0000
that lic neg surprised John nobody 0 neg
met anybody
Of course we’re used to seeing other things as the labels for those internal nodes, and using those labels to enforce certain other requirements (e.g. the requirement that an S is made up of an NP and a VP). But we can just bundle all that information together.1
(22) S0
CP0
VP0
C0S0 V0NP0
that NPlic VPneg surprised John
nobody V0 NPneg met anybody
We can even use a similar trick for “movement”!
1Because the cartesian product of finite sets is necessarily finite.
Ling185A, Winter 2021 — Tim Hunter, UCLA
(23)
S0
NP0
VP0
NP0 surprised John
So while FSTAs are a useful tool for conceptualizing long-distance dependencies (such as NPI-licensing or wh-movement), it turns out that if a stringset can be derived by “reading along the bottom” of all the trees generated by an FSTA, then that stringset can also be generated by a CFG.
3 Stringsets beyond context-free
Some examples of stringsets that cannot be generated by a CFG:
(24) a. {w+w|w∈{a,b}∗} b. {anbncn | n ≥ 0}
c. {anbncndn | n ≥ 0}
d. {aibjcidj |i≥0,j≥0}
These all exhibit crossing dependencies, rather than nesting dependencies of the sort that CFGs can handle. (Imagine trying to recognize these stringsets by by moving through strings from left to right, with an unbounded stack as your available memory.)
3.1 Non-context-free string patterns in natural language
To start, consider the following kinds of sentences in English:
(25) a. we [paint houses]
b. we [help [SC John paint houses]]
c. we [let [SC children help [SC John paint houses]]]
The subject of each “small clause” (SC) gets its case from the verb just above it. In many languages this would be shown overtly on the noun phrases somehow. And in many languages, the choice of verb would affect exactly which case (e.g. accusative or dative) gets assigned to each small clause subject. So we can imagine that the surface strings in fact look like this:
(26) a. we [paint houses]
b. we [help-dat [SC John-dat paint houses]]
c. we [let-acc [SC children-acc help-dat [SC John-dat paint houses]]]
Let’s restrict attention to cases where the accusative-subject small clauses are all “outside” the dative- subject small clauses. Then the English word order pattern can be generated by an FSA: each accusative- assigning verb needs an accusative NP immediately after it, and likewise for each dative-assigning verb. The pattern is analogous to {(V1N1)i(V2N2)j | i ≥ 0, j ≥ 0}.
In a head-final language, we might expect to see a word-order like this:
NP0 RC0
those NP+wh
who NP0 we
met
V0
S=wh
VP=wh
V0 NP=wh
ε
Ling185A, Winter 2021 — Tim Hunter, UCLA
(27) a. we [houses paint]
b. we [[SC John-dat houses paint] help-dat]
c. we [[SC children-acc [SC John-dat houses paint] help-dat] let-acc]
This is beyond an FSA, but possible with a CFG: the very first NP is associated with the very last verb. This is analogous to {N1iN2jV2jV1i | i ≥ 0,j ≥ 0}.
Now the amazing fact: in (at least a certain dialect of) Swiss German, we find the following word order:
(28) a. we [houses paint]
b. we [John-dat houses help-dat paint]
c. we [children-acc John-dat houses let-acc help-dat paint]
This pattern is analogous to {N1iN2jV1iV2j | i ≥ 0,j ≥ 0}, which no CFG can generate. For the record, this is what the relevant parts of the actual sentences look like.
(29) a.
b.
daß mer that we “that we daß mer that we “that we
em Hans es huus ha ̈lfe aastru ̈che
Hans.dat the house.acc helped paint
helped Hans paint the house”
d’chind em Hans es huus lond ha ̈lfe aastru ̈che the children.acc Hans.dat the house.acc let help paint
let the children help Hans paint the house”
Papers presenting this argument were published in the mid-1980s by Riny Huybregts and by Stuart Shieber. Geoff Pullum’s short article entitled “Footloose and context-free” (1986, NLLT) is a very amusing little (six-page) account of the historical development of the ideas.
3.2 A more powerful grammar formalism
Here’s the big idea, building on FSTAs:
• We’ve seen that in order to allow a left-to-right string-processing automaton to recognize patterns
like anbn, we need memory in the form of an unbounded stack, rather than just a single state.
• Let’s introduce the same kind of unbounded stack memory into a bottom-to-top tree-processing
automaton.
• We’ll restrict/simplify the use of the stack slightly: stack information can only flow between a parent
and one of its daughters.
This means that we can generate patterns like anbn along the leaves of a strictly left-branching (or strictly
right-branching) tree. It’s sort of like CFG parsing in disguise.
(30)
[]
Xa
[]
Y
[A]
Yb
[A,A]
Yb
[A,A]
X
[A]
Xa
ε
Ling185A, Winter 2021 — Tim Hunter, UCLA
But when we combine this stack-based memory with center-embedding structures, magic happens! Here’s the rough idea for how we can generate {anbncndn | n ≥ 0}.
(31)
a
a
bX
[]
bXc
ε
Notice that the highest/outermost (a,d) pair is “matched up with” the lowest/innermost (b,c) pair: think- ing bottom-up, the lowest (b,c) pair pushed the deepest ‘A’ onto the stack, and the highest (a,d) pair popped off that ‘A’.
This means that the first ‘a’ is in a dependency with the first ‘c’ — so we have generated crossing dependencies!
Here’s the rough idea for how we can generate {w + w | w ∈ {a, b}∗}.
(32)
a
a
[]
Y
[A]
Yd
[A,A]
Yd
[A,A]
X
[A]
c
[]
Y
[A]
Y
[A,A]
Y
bY
[A,A,B]
X
[A,A]
Xb
[A]
a
[A,A,B]
X
[]
Xa
ε
In a similar way we can generate {aibjcidj | i ≥ 0,j ≥ 0}, which corresponds to the Swiss German case-marking pattern.
Ling185A, Winter 2021 — Tim Hunter, UCLA