12/8/13 & 14/8/13
Basque dataset – Kate’s answer guide
Revision:
Before we get started with the Basque dataset, let’s revise the steps we took to help us
analyse the Mbabaram dataset last week:
• Establish presumptions: e.g. word order, overt expression of grammatical functions,
etc.
• Start analysing the simplest sentences, and define what they could possibly consist
of based on the presumptions we’ve already established.
• Gradually work up to the more complex sentences.
• Use substitution tables to assist with analysis/categorisation of words into classes
Basque – Part A
We’re going to use a similar process of analysis with the Basque dataset. In Part A, you’ve
been provided with 12 sentences and told that 1 of them contains a mistake. You need to
study the sentences, find the regularities in their structure, and on this basis you need to
identify the incorrect sentence and correct it.
Now in order to do all of this, you’re going to need to consider both distributional evidence
and morphological evidence:
Distributional evidence: which words have the same distribution, i.e. words belonging to the
same word class can fill the same slot/frame (e.g. in English, ‘The huge ___ chased the tiny
dog’ = nouns).
Morphological evidence: how syntactic information is expressed through morphological
marking on words, i.e. affixes (e.g. last week we saw the suffix –ul in Mbabaram marking
case).
So, before we start analysing the Basque dataset in detail, what do we need to do?
Establishing presumptions:
• Presumption 1: Basque has word order (as found in about 80% of languages)
• Presumption 2: Basque requires overt expression of core grammatical functions as
an independent word/phrase.
If we consider one of the simplest looking examples in our data:
(1) Gizona joaten da
We have words of between 1 and 3 syllables, so it’s possible we could have some affixation
happening on the verb here that is providing information about our grammatical functions
(similar to the Wubuy example I showed you last week). However, I always find it easier to
begin an analysis with the presumption that all the core grammatical functions (i.e. SUBJ,
DO, IO) need to be overtly expressed as a word/phrase. Then, if this presumption is
incorrect, the analysis just won’t work and I know that I possibly need to go back and revise
this.
Part A – sample solution
One useful way to start is to first group the examples into 3, 4 and 5 word sentences. Then,
by quickly scanning the sentences, it can be seen that either joaten or ikusten occurs in every
sentence in the same position in the sentence (i.e. second last word – penultimate position),
which suggests that together they form a word class. Notice also that they co-occur with all
the other elements in the dataset, but they don’t co-occur with each other. They also differ
in the number of other elements that they occur with (i.e. joaten always occurs with 2 or 3
other elements, whereas ikusten always occurs with 3 or 4). Based on these observations,
we might speculate that these are our verbs, and that they might differ in transitivity. They
also possibly share some kind of suffix, -en or -ten, which could be marking tense. Having
noticed all of this, we can draw up substitution tables for 3 and 4 word sentences containing
joaten and 4 and 5 word sentences containing ikusten:
Table 1. Three word sentences with joaten
      Class A        Class Bi    Class C
1     Gizona         joaten      da
5     Txakurra-k     joaten      d-ir-a
11    Txakurra       joaten      zan
Table 2. Four word sentences with joaten
      Class A        Class D     Class Bi    Class C
3     Astoa          atzo        joaten      zan
4     Gizona-k       atzo        joaten      z-ir-an
Table 3. Four word sentences with ikusten
      Class A        Class A     Class Bii   Class C
2     Gizona-kin     zaldia      ikusten     du
6     Zaldia-kin     gizona      ikusten     du
7     Astoa-kin      zaldia-k    ikusten     zuen
9     Txakurra-kin   astoa-k     ikusten     d-it-u
Table 4. Five word sentences with ikusten
      Class A        Class A     Class D     Class Bii   Class C
8     Gizona-kin     txakurra    atzo        ikusten     zuen
10    Zaldia-kin     gizona-k    atzo        ikusten     z-it-uen
12    Gizona-kin     astoa-k     atzo        ikusten     z-it-uen
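The first scanning step described above can be sketched in a few lines of Python. This is only an illustration of the procedure, not part of the assignment: the sentence numbers and forms are copied from Tables 1 to 4, while the function and variable names are made up for the example.

```python
from collections import defaultdict

# Part A sentences, morpheme-segmented as in Tables 1 to 4.
SENTENCES = {
    1: "Gizona joaten da",
    2: "Gizona-kin zaldia ikusten du",
    3: "Astoa atzo joaten zan",
    4: "Gizona-k atzo joaten z-ir-an",
    5: "Txakurra-k joaten d-ir-a",
    6: "Zaldia-kin gizona ikusten du",
    7: "Astoa-kin zaldia-k ikusten zuen",
    8: "Gizona-kin txakurra atzo ikusten zuen",
    9: "Txakurra-kin astoa-k ikusten d-it-u",
    10: "Zaldia-kin gizona-k atzo ikusten z-it-uen",
    11: "Txakurra joaten zan",
    12: "Gizona-kin astoa-k atzo ikusten z-it-uen",
}

def group_by_length(sentences):
    """Map each sentence length (in words) to the sentence numbers of that length."""
    groups = defaultdict(list)
    for number, text in sentences.items():
        groups[len(text.split())].append(number)
    return dict(groups)

def words_at(sentences, position):
    """Collect the set of words found at one position (e.g. -2 = penultimate)."""
    return {text.lower().split()[position] for text in sentences.values()}

def companions(sentences):
    """For each penultimate word, record how many other words it occurs with."""
    counts = defaultdict(set)
    for text in sentences.values():
        words = text.lower().split()
        counts[words[-2]].add(len(words) - 1)
    return dict(counts)

by_length = group_by_length(SENTENCES)
penultimate = words_at(SENTENCES, -2)
other_words = companions(SENTENCES)

print(by_length)
print(penultimate)
print(other_words)
```

The output reproduces the observations in the text: only joaten and ikusten ever fill the penultimate slot, joaten occurs with 2 or 3 other words, and ikusten with 3 or 4.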
Interestingly, the shortest sentences we have are 3 word sentences. According to the presumptions
we made earlier, there are a few possibilities as to what these might consist of:
– Trans V, 2 Ns (Subj, Obj)
– Intrans V, N (Subj), Modifier
There is another possibility that we didn’t discuss last week regarding what the ‘extra’ word in a 3
word intransitive sentence might be. We have these in English and they usually accompany the verb
when we add aspect, modality or emphasis to a sentence, or if we change the voice to passive.1
AUXILIARIES!
– Intrans V, N (Subj), Aux
We can’t really make any claims about which of these options is the most likely until we try
to identify some grammatical morphemes within some of our word classes. First of all, looking
across all of our substitution tables, we can see that we have a set of words that can substitute for
each other in initial sentence position with sentences containing joaten and in both initial and
second position in sentences containing ikusten, and which are also able to take two different
suffixes, -k and –kin.
Class A: {gizona, txakurra, astoa, zaldia}
Class A grammatical morphemes: {-k, -kin}
Also notice that joaten only ever co-occurs with one member of this class, whereas ikusten can co-
occur with two members, in which case the first member takes the suffix -kin. Based on this, we can
assume that Class A are our nouns (and –kin is possibly some kind of case marker on direct objects or
transitive subjects), joaten is probably an intransitive verb and ikusten is probably a transitive verb:
Class Bi (intransitive verbs): {joaten}
Class Bii (transitive verbs): {ikusten}
Therefore, in our 3 word (intransitive) sentences, we’re left with figuring out Class C, which occurs
sentence final. Notice that this is the same class that appears sentence final in our 4 word
intransitive sentences (zan is present in both substitution tables). In fact, if we compare all
sentence final words across the dataset in Part A, we can notice a lot of similarities in form:
1 Aspect: a grammatical category that expresses how an action, event or state, denoted by a verb, relates to
the flow of time (e.g. Chomsky yawned = simple past; Chomsky was yawning = past progressive; Chomsky had
yawned = past perfect).
Modality: a grammatical category that expresses likelihood, ability, permission and obligation, e.g. Chomsky
may yawn, Chomsky must yawn.
Voice: a grammatical category that expresses the relationship between the action (or state) expressed by the
verb and the verb’s arguments (subject, object, etc.), e.g. The dog bit Chomsky (active voice) vs. Chomsky was
bitten (by the dog) (passive voice).
Table 5. Sentence final words
Beginning with d-    Beginning with z-
da                   zan
d-ir-a               z-ir-an
du                   zuen
d-it-u               z-it-uen
There are some clear formal similarities here: dira is da with an ‘infix’ of -ir-; ziran is zan with the
same infix. Ditu looks like du with infix -it-, and zituen looks like zuen with the same infix.
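As a quick sanity check on these formal similarities, the comparison can be done mechanically. The sketch below is only an illustration: `find_infix` is a hypothetical helper that tests whether a longer form equals a shorter form with one extra chunk inserted somewhere inside it, and the forms are the unsegmented versions of the words in Table 5.

```python
def find_infix(short, long):
    """Return the material inserted inside `short` to give `long`, or None.

    Only insertions strictly inside the word count, so at least one
    character of `short` must remain on each side of the inserted chunk.
    """
    extra = len(long) - len(short)
    if extra <= 0:
        return None
    for i in range(1, len(short)):
        # Does `long` begin like the start of `short` and end like its tail?
        if long.startswith(short[:i]) and long.endswith(short[i:]):
            return long[i:i + extra]
    return None

# Pairs of (shorter form, longer form) taken from Table 5.
PAIRS = [("da", "dira"), ("zan", "ziran"), ("du", "ditu"), ("zuen", "zituen")]

for short, long in PAIRS:
    print(short, long, find_infix(short, long))
```

Each pair yields either `ir` or `it`, which matches the segmentation used in the tables.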
Although we can’t necessarily decide whether these words are modifiers or auxiliaries yet, we
already have enough information to identify which sentence is likely to be incorrect. We do this
by comparing the distribution of these infixes to the distribution of the grammatical morphology
we already identified in the other word classes. When we look to see where the infixes occur,
we find that they correlate with the presence of the –k suffix on the Class A words (i.e. nouns).
The exception to this is sentence (7):
(7) Astoa-kin zaldia-k ikusten zuen
To make this a grammatical sentence, it should be changed to either:
Astoa-kin zaldia-k ikusten z-it-uen.
Or:
Astoa-kin zaldia ikusten zuen.
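The check that singles out sentence (7) can also be written as a short script. This is a minimal sketch under the assumption stated in the text (an -ir-/-it- infix on the final word should go together with a -k suffix on a Class A word); the helper names are invented for the example and the data is copied from Tables 1 to 4.

```python
# Part A sentences, morpheme-segmented as in Tables 1 to 4.
SENTENCES = {
    1: "Gizona joaten da",
    2: "Gizona-kin zaldia ikusten du",
    3: "Astoa atzo joaten zan",
    4: "Gizona-k atzo joaten z-ir-an",
    5: "Txakurra-k joaten d-ir-a",
    6: "Zaldia-kin gizona ikusten du",
    7: "Astoa-kin zaldia-k ikusten zuen",
    8: "Gizona-kin txakurra atzo ikusten zuen",
    9: "Txakurra-kin astoa-k ikusten d-it-u",
    10: "Zaldia-kin gizona-k atzo ikusten z-it-uen",
    11: "Txakurra joaten zan",
    12: "Gizona-kin astoa-k atzo ikusten z-it-uen",
}

def has_k_suffix(words):
    """True if any word before the last two slots ends in the -k suffix."""
    return any(word.lower().endswith("-k") for word in words[:-2])

def has_infix(final_word):
    """True if the sentence final word contains the -ir- or -it- infix."""
    return "-ir-" in final_word or "-it-" in final_word

def find_mismatches(sentences):
    """Return the numbers of sentences where -k and the infix do not co-occur."""
    mismatches = []
    for number, text in sentences.items():
        words = text.split()
        if has_k_suffix(words) != has_infix(words[-1]):
            mismatches.append(number)
    return mismatches

mismatches = find_mismatches(SENTENCES)
print(mismatches)
```

Run on the Part A data, the only sentence flagged is (7), in line with the analysis above.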
Basque – Part B
In Part B, you have been given an additional 12 sentences from Basque, and you need to
decide what you think the most probable grammatical structure for all 24 sentences is:
– Give a lexicon of word classes, and state the evidence for grouping their member
words together → we already started doing this in Part A;
– Can you (tentatively) identify any of your word classes as particular parts of speech
(e.g. N, V)? If so, what is the evidence (i.e. distributional/morphological) on which
you base your identification of these parts of speech? → we already started doing
this in Part A.
– Can you guess anything about the role of what look like grammatical morphemes?
→ we already started doing some of this in Part A
– State how many distinct types of sentences there are and what their structure is
(you can use Phrase Structure Rules and/or any other kind of formula or statement
to do this) → example of sentence (1): S → A B C
Part B – Sample solution
Class A: Nouns
Lexicon:
Class A: {gizona, txakurra, astoa, zaldia}
Class A Grammatical Morphemes: {-k(e), -kin}
Syntactic evidence for Class A membership:
– They can substitute for one another
– They always occur in the same sentence position (i.e. first position in sentences with a
Bi word, first and second position in sentences with a Bii word).
Morphological evidence for Class A membership:
– They can all take the same inflectional suffixes (i.e. –k(e), -kin). When a Class A
word takes both of these, they are ordered in the following way: -ke-kin. Note that
the first morpheme has two allomorphs: -k word finally and –ke word medially.
Part of speech/role of grammatical morphemes:
I would hypothesise this as the class of nouns, because two of these elements can occur in
what appears to be a transitive sentence, and their form changes both in co-variation with
Class C words and in accordance with their position in the sentence (i.e. initial, second
position) and number (i.e. whether there are one or two members from Class A present). As
–kin always occurs on the first Class A word in transitive sentences and never elsewhere, it is
likely that it is marking case, most likely ergative case on transitive subjects which would
mean an SOV word order (as in 41% of languages). It is not so likely to be marking accusative
case on direct objects, as this would mean the word order is OSV, which is much less
common (only 0.3% of languages). The suffix –k, on the other hand, is not likely to be
marking case considering the following near minimal pair:
(5) Txakurrak joaten dira.
(11) Txakurra joaten zan.
It is also not likely that –k marks gender or noun class, as the same noun does not always
take this suffix (as in 5 and 11). Instead, it is more likely that –k marks some other inherent
property of nouns, most likely plural number (as will be demonstrated in the discussion of
Class C words).
Class B: Verbs
Lexicon:
Bi = {joat}
Bii = {ikust, maluskat}
Class B inflectional morphemes = {-en, -ua}
Syntactic evidence for class membership:
– ikust and maluskat substitute for one another in the penultimate position of
sentences with two Class A words (i.e. nouns), just before the Class C word.
– joat always occurs in the same penultimate position with only one Class A word.
This justifies the two sub-groupings Bi and Bii.
Morphological evidence for class membership:
– All words in the B classes can take the –en ending.
– All words in the Bii class can take the –ua ending.
– B class words co-vary in form with Class C words: the -ua ending on a Bii class word
causes the accompanying Class C word to adopt the same inflectional form it would
have with a Bi class word.2 Also, maluskat only occurs with the Class C element z-n.
Part of speech/grammatical morphemes:
I suggest B class words are (lexical) verbs. Evidence is that they divide into subclasses
depending on the number of Class A elements in the sentence. Bi is the class of intransitive
verbs, since it occurs with just one Class A element. Bii is the class of transitive verbs since it
(normally) occurs with two Class A elements.
Class C: Auxiliary Verbs
Lexicon:
C = {d-, z-n}
Class C inflectional morphemes = {-a/-a-, -u/-ue-, -e/-t-, -it-/-ir-}
Syntactic evidence for class membership:
– Class C words always occur last in the sentence.
– They occur in all sentences and substitute for one another.
– The only difference between d- and z-n is that the latter is required when Class D is
present in the sentence.
Morphological evidence:
– They share similar patterns of inflection.
– They have a very rich inflectional range which co-varies with the form and number of
Class A words in the sentence and with the subclass and endings of Class B words.
2 The suffix –ua seems to have the effect of making a Bii class i.e. transitive verb behave formally like a Bi class
i.e. intransitive verb: its accompanying Class C element has the inflectional form appropriate to a Bi class, and it
may occur with just one Class A word in the appropriate intransitive form (e.g. sentences 18 and 21).
The auxiliaries can be grouped according to combinations of inflections as follows:3
Table 6. Combinations of NP1/NP2 number agreement in Class C

Intransitive sentence:   Singular intransitive NP   Plural intransitive NP
                         1a. d-a                    1b. d-ir-a
                             z-a-n                      z-ir-a-n

Transitive sentence:     Singular NP2               Plural NP2
  Singular NP1           2a. d-u                    2b. d-it-u
                             z-ue-n                     z-it-ue-n
  Plural NP1             3a. d-u-e                  3b. d-it-u-e
                             z-u-t-e-n                  z-it-u-t-e-n
The different affixes above could have the following functions:
• -a/-a- intransitive
• -u/-ue- transitive
• -ir-/-it- plural intransitive NP/plural transitive NP2
• -e/-t- plural transitive NP1
Notice that in this analysis we have some tricky morphology: an infix inside another
infix in the z-n forms of 3a and 3b (-t- is inserted inside –ue-).
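To see that these affix functions really do generate every cell of Table 6, the analysis can be turned into a small form builder. This is a sketch of the hypothesis above, not a claim about Basque grammar more generally: the function `build_aux` and its parameter names are invented for the example, and `plural_np` covers the single NP of an intransitive sentence or NP2 of a transitive one.

```python
def build_aux(frame, transitive, plural_np=False, plural_np1=False):
    """Assemble a Class C form from the affixes hypothesised for Table 6.

    frame      -- "d" or "z-n", the two Class C elements
    transitive -- True selects -u/-ue-, False selects -a/-a-
    plural_np  -- plural intransitive NP, or plural NP2 in a transitive sentence
    plural_np1 -- plural NP1 (only meaningful in transitive sentences)
    """
    if frame not in ("d", "z-n"):
        raise ValueError("frame must be 'd' or 'z-n'")
    if plural_np1 and not transitive:
        raise ValueError("NP1 agreement only applies to transitive sentences")

    parts = [frame[0]]                      # initial d- or z-
    if plural_np:
        parts.append("it" if transitive else "ir")
    if not transitive:
        parts.append("a")
    elif frame == "d":
        parts.append("u")
        if plural_np1:
            parts.append("e")               # -e follows -u in the d- forms
    elif plural_np1:
        parts.extend(["u", "t", "e"])       # -t- sits inside -ue- in the z-n forms
    else:
        parts.append("ue")
    if frame == "z-n":
        parts.append("n")                   # closing -n of z-n
    return "-".join(parts)

# Regenerate the twelve forms of Table 6, cell by cell.
for label, args in [
    ("1a", (False, False, False)), ("1b", (False, True, False)),
    ("2a", (True, False, False)), ("2b", (True, True, False)),
    ("3a", (True, False, True)), ("3b", (True, True, True)),
]:
    print(label, build_aux("d", *args), build_aux("z-n", *args))
```

The two branches for the z-n frame make the 'infix inside an infix' point explicit: the d- forms simply add -e after -u, whereas the z-n forms have to split -ue- around -t-.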
Hypothesis:
The rich inflection and dependencies between C and A, as well as C and B, suggest that Class
C is a verbal category (i.e. Class C is not likely to be a nominal modifier such as an adjective,
because Class C members in 3b are demonstrating agreement with both NP1 and NP2 in
transitive sentences).
The fact that we already have another verbal category in the sentence consisting of longer
stems suggests that Class C may be auxiliary verbs that mainly exist to carry the verbal
grammatical categories (e.g. transitivity, agreement, etc). This possibility is supported cross
linguistically.
The inflectional infix –ir-/-it-, which co-varies with the Class A –k(e) suffix and occurs in all
the ‘b’ groupings above (1b, 2b, 3b), is likely to be number agreement (i.e. with a plural NP in
intransitive sentences and with a plural NP2 in transitive sentences). Notice that -e/-t- also
co-varies with the Class A –k(e) suffix when it is the NP1 in transitive sentences that is plural.
Furthermore, when NP1 and NP2 are both marked with the –k(e) suffix, you find both
the –ir-/-it- and the -e/-t- affixes on the auxiliary.
So why decide that the –k(e) suffix marks plurality on nouns? We already discussed earlier
why it is not likely to be marking case or gender/noun class. Other options would be that it
marks referential or deictic status (i.e. definiteness, deictic location), but it is not
crosslinguistically common for these properties to be agreed with by verbal categories. On
the other hand, it is very common for verbal categories to agree with the number of
subjects/objects.
3 Note that –ir-/-it- appear to be allomorphs conditioned by whether the following vowel is –a or –u.
Class D: Adverbs
Lexicon:
D: {atzo}
Syntactic evidence for class membership:
– It optionally occurs just before a Class B word.
– It only ever occurs when the Class C element is z-n.
Morphological evidence for class membership:
It does not inflect.
Part of speech/grammatical morphemes:
These characteristics suggest that atzo is an adverb, and moreover the co-variation with
Class C would suggest that it is an adverb of a sort to co-vary with verbal marking such as
tense / aspect; perhaps an adverb of time.
Construction types (grammar):
Assuming that the hypothesised part of speech labels are correct, the basic structure of
sentences in the language can be given by the following phrase structure rule:
S → (N) N (Adv) V Aux
However, this does not account for:
• The ‘subcategorisation’ of verbs, into transitive and intransitive
• The inflectional differences between different types of construction
We will talk later in the semester about how best to deal with these facts. For now, the
schematic representations given below illustrate the different construction types in terms of
transitivity differences and inflectional differences (excluding the co-variation due to
number marking on NPs and on Auxs, and the optional appearance of atzo).
Basic intransitive sentences:
N V1-en Aux (1)
Basic transitive sentences:
N-kin N V2-en Aux (2)/(3)
Sentences with –ua:
(N-kin) N V2-ua Aux (1)
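The gap between the single phrase structure rule and the construction types can be made concrete with a small checker. This is a hedged sketch covering only the Part A vocabulary (so no –ua forms): the lexicon sets, `matches_rule` and `respects_transitivity` are names invented for the example, and the last test sentence is deliberately constructed, not taken from the dataset.

```python
import re

# Part A lexicon under the hypothesised part of speech labels.
NOUNS = {"gizona", "txakurra", "astoa", "zaldia"}
VERBS = {"joaten": "V1", "ikusten": "V2"}      # V1 intransitive, V2 transitive
ADVERBS = {"atzo"}
AUXILIARIES = {"da", "dira", "zan", "ziran", "du", "ditu", "zuen", "zituen"}

def tag(word):
    """Assign a category label to one (optionally morpheme-segmented) word."""
    stem = word.lower().split("-")[0]           # strips -k / -kin from nouns
    form = word.lower().replace("-", "")        # unsegmented form
    if stem in NOUNS:
        return "N"
    if form in VERBS:
        return "V"
    if form in ADVERBS:
        return "Adv"
    if form in AUXILIARIES:
        return "Aux"
    raise ValueError(f"unknown word: {word}")

def matches_rule(sentence):
    """Check a sentence against S -> (N) N (Adv) V Aux."""
    tags = " ".join(tag(word) for word in sentence.split())
    return re.fullmatch(r"(N )?N (Adv )?V Aux", tags) is not None

def respects_transitivity(sentence):
    """Check that V1 occurs with one noun and V2 with two."""
    words = sentence.split()
    tags = [tag(word) for word in words]
    verb = words[tags.index("V")].lower().replace("-", "")
    expected_nouns = 1 if VERBS[verb] == "V1" else 2
    return tags.count("N") == expected_nouns

print(matches_rule("Gizona joaten da"))
print(matches_rule("Gizona-kin zaldia ikusten du"))
# Constructed example: two nouns with the intransitive verb.
print(matches_rule("Gizona-kin zaldia joaten da"))
print(respects_transitivity("Gizona-kin zaldia joaten da"))
```

The constructed sentence passes the phrase structure rule but fails the transitivity check, which is exactly the subcategorisation problem noted above.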