CS 447 Homework 4: Dependency Parsing
In this homework you will build a neural transition-based dependency parser, based on the paper A Fast and Accurate Dependency Parser using Neural Networks.
The setup for a dependency parser is somewhat more involved than tasks like classification or translation. Therefore, this homework contains many small functions that can be tested incrementally. In addition to the Gradescope tests, we also provide substantial tests in the notebook that you can use to debug your code.
We suggest that you work on this homework on CPU until you are ready to train. At that point, switch your runtime to GPU by going to Runtime > Change Runtime Type and selecting "GPU" from the dropdown menu.
You will find it easier to debug on CPU, and the error messages will be more understandable.
Google monitors your GPU usage and will occasionally restrict GPU access if you use it too much. In these cases, you can either switch to a different Google account or wait for your access to be restored.
Step 0: Provided Testing Functions
The following cells set up the tests that you can use in the notebook. You should not edit any of these cells.
In [2]:
### DO NOT EDIT ###
if __name__ == '__main__':
    import cloudpickle as cp
    from urllib.request import urlopen
    testing_bundle = cp.load(urlopen("https://drive.google.com/uc?export=download&id=1-9lWbXuXZYGjJWQRKD7um_83O-I3ER63"))
In [3]:
### DO NOT EDIT ###
def test(test_fxn, i=None, test_bundle=None, to_print='all', do_return=False, display_sent=True, do_raise=True):
    to_test = test_fxn.__name__ + (('_' + str(i)) if i is not None else '')
    if test_fxn.__name__ not in {'get_gold_action', 'get_lc', 'get_rc', 'get_top3_stack_features', 'get_top3_buffer_features'}: assert i is not None
    assert to_test in {'get_gold_action', 'get_lc', 'get_rc', 'get_top3_stack_features', 'get_top3_buffer_features', 'get_lc1_lc2_features_1',
                       'get_lc1_lc2_features_2', 'get_rc1_rc2_features_1', 'get_rc1_rc2_features_2', 'get_llc_rrc_features_1', 'get_llc_rrc_features_2'}
    assert to_print in {'all', 'incorrect', 'none'}
    test_bundle = testing_bundle if test_bundle is None else test_bundle
    sentences, configs, gold_actions, gold_features, _ = test_bundle
    vocab = Vocabulary(sentences)
    testsents = vocab.buildSentences(sentences)
    sents_correct, printag = 0, None # printag collects the first error you encounter (used in Gradescope)
    for i in range(len(sentences)):
        printsent = "" # printsent is the full printout for this sentence (only needed when to_print != 'none')
        sent, gold_deps = testsents[i][0], testsents[i][1]
        if to_test == 'get_gold_action':
            printsent += 'gold_dependencies: ' + str([str(arc) for arc in gold_deps.arcs]) + '\n\n'
        for j in range(len(configs[i])):
            # Manually create Stack, Buffer, Dependencies & ParserConfiguration objects from the test case
            stk, buf, deps = configs[i][j]
            stack, buffer, dependencies = Stack([sent[idx] for idx in stk]), Buffer([sent[idx] for idx in buf]), Dependencies(vocab)
            for arc in deps: dependencies.add(sent[arc[0]], sent[arc[1]], arc[2])
            parser_config = ParserConfiguration(sent, vocab)
            parser_config.stack, parser_config.buffer, parser_config.dependencies = stack, buffer, dependencies
            if to_test == 'get_gold_action' or not (gold_actions[i][j] is None or gold_actions[i][j] == 'DONE'): # Don't need to test if DONE or None when not testing get_gold_action
                # Call the student code
                arg = stack.get_si(1) if to_test in {'get_lc', 'get_rc'} else (int(to_test[-1]) if to_test[-1] in {'1', '2'} else None)
                if to_test == 'get_gold_action': fxn = lambda: test_fxn(stack, buffer, gold_deps)
                elif to_test in {'get_top3_stack_features', 'get_top3_buffer_features'}: fxn = lambda: test_fxn(parser_config)
                else: fxn = lambda: test_fxn(parser_config, arg)
                tt = to_test[:-2] if to_test[-1] in {'1', '2'} else to_test
                exception = None
                try:
                    yours = fxn()
                except Exception as e:
                    yours = "Raised Exception: " + str(e)
                    exception = e
                correct = gold_actions[i][j] if to_test == 'get_gold_action' else gold_features[i][j][to_test]
                fxn_name = 'get_gold_action(stack, buffer, gold_dependencies)' if to_test == 'get_gold_action' else tt + ('(parser_config, 1)' if to_test not in {'get_lc', 'get_rc'} else '(parser_config, ' + str(arg) + ')')
                if to_test in {'get_lc', 'get_rc'} and exception is None:
                    if type(yours) != list or (len(yours) > 0 and type(yours[0]) != Arc):
                        yours = 'Your ' + to_test + '(...) did not return a list of Arcs' # note: exact quote used below, so if this needs to change, change it in both places
                    else: yours = [str(arc) for arc in yours]
                # if random.random() < 0.05: yours = [None, None, None, None, None, None] # simulate getting it wrong
                is_correct = yours == correct
                # Make the printouts!
                printsent += 'Step ' + str(j+1) + ' | stack: ' + str([str(word) if to_test != 'get_gold_action' else word.word for word in stack]) + '\tbuffer: ' + str([str(word) if to_test != 'get_gold_action' else word.word for word in buffer]) + (('\tdependencies: ' + str([str(arc) for arc in dependencies.arcs])) if to_test != 'get_gold_action' else "") + '\n'
                printsent += '\tTesting ' + fxn_name + ':\n'
                printsent += '\t\tYour Result: ' + str(yours) + '\t\tCorrect Result: ' + str(correct) + ('\t\tCorrect!' if is_correct else "") + '\n'
                if not is_correct:
                    printsent += '\t\tIncorrect! Halting parse of this sentence.\n\n'
                    if printag is None:
                        statement = yours if yours == 'Your ' + to_test + '(...) did not return a list of Arcs' else str(yours)[:20] + ("... " if len(str(yours)) > 20 else "")
                        statement = "Your first error (on a hidden test sentence): You returned " + statement + "; expected " + str(correct) + "."
                        printag = statement if exception is None else (statement, exception)
                    if to_print != 'none':
                        print("Testing Sentence " + str(i+1) + '...')
                        if display_sent: display_sentence(sentences[i])
                        print(printsent)
                        print('Test Sentence ' + str(i+1) + ' failed.')
                        print(('\n' + '-'*100 + '\n'))
                    if do_raise and exception is not None:
                        raise exception
                    break
                else: printsent += '\n'
        else:
            sents_correct += 1
            if to_print == 'all':
                print("Testing Sentence " + str(i+1) + '...')
                if display_sent: display_sentence(sentences[i])
                print(printsent)
                print('Test Sentence ' + str(i+1) + ' passed!')
                print(('\n' + '-'*100 + '\n'))
    score = sents_correct / len(sentences)
    print(sents_correct, '/', len(sentences), '=', str(score*100) + '%', 'of test cases passed!')
    if do_return:
        return score, printag
def test_generate_training_examples(test_fxn, feat_extract=None, test_bundle=None, to_print='all', do_return=False, display_sent=True, do_raise=True):
    assert test_fxn.__name__ == 'generate_training_examples' and (feat_extract is None or feat_extract.__name__ == 'extract_features')
    test_bundle = testing_bundle if test_bundle is None else test_bundle
    sentences, _, _, _, gold_examples = test_bundle
    vocab = Vocabulary(sentences)
    testsents = vocab.buildSentences(sentences)
    sents_correct, printag = 0, None
    for i in range(len(sentences)):
        sent, gold_deps = testsents[i][0], testsents[i][1]
        exception = None
        try:
            your_examples = test_fxn(sent, gold_deps, vocab, feat_extract=feat_extract if feat_extract is not None else lambda p: [])
        except Exception as e:
            your_examples = "Raised Exception: " + str(e)
            exception = e
        # if random.random() < 0.5: your_examples = [None, None, None, None, None, None, None, None] # simulate getting it wrong
        is_correct = your_examples == gold_examples[i][0 if feat_extract is None else 1]
        if to_print == 'all' or to_print == 'incorrect' and not is_correct:
            print("Testing Sentence " + str(i+1) + '...')
            if display_sent: display_sentence(sentences[i])
            print("sentence: " + str([word.word for word in sent]) + '\tgold_dependencies: ' + str([str(arc) for arc in gold_deps.arcs]))
            print("\tTesting generate_training_examples(sentence, gold_dependencies, vocab" + ('' if feat_extract is None else ', feat_extract=extract_features') + "):")
            print('\t\tYour Training Examples: ', your_examples, '\n\t\tCorrect Training Examples:', gold_examples[i][0 if feat_extract is None else 1])
            print('\t\tCorrect!\n\n' if is_correct else '\t\tIncorrect!\n\n')
        if not is_correct and printag is None:
            statement = "Your first error (on a hidden test sentence): You returned " + str(your_examples)[0:20] + ("... " if len(str(your_examples)) > 20 else "") + "; expected " + str(gold_examples[i]) + "."
            printag = statement if exception is None else (statement, exception)
        if to_print == 'all' or to_print == 'incorrect' and not is_correct:
            print(('Test Sentence ' + str(i+1) + ' passed!') if is_correct else ('Test Sentence ' + str(i+1) + ' failed.'))
            print('\n' + '-'*100, '\n')
        if is_correct: sents_correct += 1
        if do_raise and exception is not None:
            raise exception
    score = sents_correct / len(sentences)
    print(sents_correct, '/', len(sentences), '=', str(score*100) + '%', 'of test cases passed!')
    if do_return:
        return score, printag
Step 1: Prepare Data
In [4]:
### DO NOT EDIT ###
import numpy as np
from spacy import displacy
import random
Read & Visualize Data
We provide the data in a format preprocessed for this assignment. Here are some links if you are interested in learning more about the data:
We use one of the Universal Dependencies datasets from http://universaldependencies.org/docsv1/. Specifically, we use the UD_English dataset in version 1.4.
Refer to https://universaldependencies.org/ if you want to know more about the Universal Dependencies framework in general.
The data license can be found here: https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.4
Run the following cells to load the data and see the number of sentences. You do not need to edit these cells.
In [6]:
### DO NOT EDIT ###
def load_data():
    train_set = cp.load(urlopen("https://drive.google.com/uc?export=download&id=1N4B4bC4ua0bFMIYeNdtNQQ72LxxKLPas"))
    test_set = cp.load(urlopen("https://drive.google.com/uc?export=download&id=1TE2AflhABbz41dLMGmD1kTm7WevYpU31"))
    return train_set, test_set
In [7]:
### DO NOT EDIT ###
if __name__ == '__main__':
    train_set, test_set = load_data()
    print("Num. Train Examples:", len(train_set))
    print("Num. Test Examples: ", len(test_set))
Num. Train Examples: 12543
Num. Test Examples: 2002
Next, we visualize the training data, which contains labeled dependency trees. At test time, our goal will be to predict the dependency arcs & labels for a given sentence.
In [8]:
### DO NOT EDIT ###
def display_sentence(sent):
    res = {'words': [{'text': '<ROOT>', 'tag': 'POS_ROOT'}], 'arcs': []}
    for i in range(len(sent['word'])):
        res['words'].append({'text': sent['word'][i], 'tag': sent['pos'][i]})
        s = i + 1
        e = sent['head'][i]
        direc = 'left'
        if s > e:
            s = sent['head'][i]
            e = i + 1
            direc = 'right'
        cur = {'start': s, 'end': e, 'label': sent['label'][i], 'dir': direc}
        res['arcs'].append(cur)
    displacy.render(res, style='dep', manual=True, jupyter=True, options={'distance': 70})
In [9]:
if __name__ == '__main__':
    MIN_SENT_LEN, MAX_SENT_LEN = 4, 17 # You can change these if you're interested in seeing shorter/longer sentences
    for x in random.sample(list(filter(lambda x: len(x['word']) >= MIN_SENT_LEN and len(x['word']) <= MAX_SENT_LEN, train_set)), 5):
        display_sentence(x)
[displacy renders five dependency trees here, one per sampled sentence: each token is shown with its POS tag, with labeled arcs (e.g. nsubj, aux, root, punct) drawn between heads and dependents.]
Build Vocabulary
Next, we build the Vocabulary class. This maps each word, part-of-speech tag (POS), and label to an id (index), which we will later use in our embeddings. We also need to enumerate each possible transition, and map it to an id, since this is what our neural network will try to predict. The Vocabulary class does this as follows. Suppose there are $n$ labels. For each label, we create a Left-Arc (LA) and Right-Arc (RA) action for that label; and we also create a separate Shift (S) action. This creates a total of $2n+1$ actions that our dependency parser will be able to choose from.
You do not need to edit this cell.
In [10]:
### DO NOT EDIT ###
class Vocabulary(object):
    def __init__(self, dataset):
        UNK = '<UNK>'
        NULL = '<NULL>'
        ROOT = '<ROOT>'

        # Find the label of the root
        root_labels = list([l for ex in dataset for (h, l) in zip(ex['head'], ex['label']) if h == 0])
        assert len(set(root_labels)) == 1
        self.root_label = root_labels[0]

        # Create mapping from transitions to ids
        labels = sorted(list(set([w for ex in dataset for w in ex['label'] if w != self.root_label]))) # list of unique non-root labels
        labels = [self.root_label] + labels # add root label too
        self.n_labels = len(labels)
        transitions = ['LA-' + l for l in labels] + ['RA-' + l for l in labels] + ['S']
        self.n_trans = len(transitions) # 2*n_labels + 1
        self.tran2id = {t: i for (i, t) in enumerate(transitions)}
        self.id2tran = {i: t for (i, t) in enumerate(transitions)}

        # Create mapping from word, pos, & label to id
        # Do labels first
        self.LABEL_PREFIX = '<l>:'
        self.tok2id = {self.LABEL_PREFIX + l: i for (i, l) in enumerate(labels)}
        self.LABEL_NULL = self.tok2id[self.LABEL_PREFIX + NULL] = len(self.tok2id) # Add null label in

        # Do pos's
        self.POS_PREFIX = '<p>:'
        all_pos = sorted(set([self.POS_PREFIX + w for ex in dataset for w in ex['pos']])) # Get pos's
        self.tok2id.update({w: index + len(self.tok2id) for (index, w) in enumerate(all_pos)}) # Add pos's in
        self.POS_NULL = self.tok2id[self.POS_PREFIX + NULL] = len(self.tok2id) # Add null pos
        self.POS_ROOT = self.tok2id[self.POS_PREFIX + ROOT] = len(self.tok2id) # Add root pos
        self.n_pos = 2 + len(all_pos) # +2 for null, root

        # Do words
        all_word = sorted(set([w for ex in dataset for w in ex['word']]))
        self.tok2id.update({w: index + len(self.tok2id) for (index, w) in enumerate(all_word)}) # Add words in
        self.WORD_UNK = self.tok2id[UNK] = len(self.tok2id) # Add unk word
        self.WORD_NULL = self.tok2id[NULL] = len(self.tok2id) # Add null word
        self.WORD_ROOT = self.tok2id[ROOT] = len(self.tok2id) # Add root word
        self.n_words = 3 + len(all_word) # +3 for unk, null, root

        self.id2tok = {v: k for (k, v) in self.tok2id.items()} # Flip it
        self.n_tokens = len(self.tok2id)

    def printStats(self):
        print('Num. labels:', self.n_labels)
        print('Num. transitions (2*n_labels + 1):', self.n_trans)
        print('Num. pos:', self.n_pos)
        print('Num. words:', self.n_words)
        print('Num. tokens:', self.n_tokens)

    def buildSentences(self, examples):
        processed_sentences = []
        for ex in examples:
            # Initialize words & dependencies (index 0 is the special <ROOT> word)
            words = [Word('<ROOT>', '<ROOT>', 0, self.WORD_ROOT, self.POS_ROOT)]
            deps = []
            # Loop over words in sentence
            for i in range(len(ex['word'])):
                w = ex['word'][i]
                word_id = (self.tok2id[w] if w in self.tok2id else self.WORD_UNK)
                pos = self.POS_PREFIX + ex['pos'][i]
                pos_id = self.tok2id[pos]
                word = Word(ex['word'][i], ex['pos'][i], i+1, word_id, pos_id)
                words.append(word)
                deps.append((ex['head'][i], word, ex['label'][i]))
            # Create dependencies
            dependencies = Dependencies(self)
            for dep in deps:
                dependencies.add(words[dep[0]], dep[1], dep[2])
            processed_sentences.append((words, dependencies))
        return processed_sentences
Run the following cell to see some stats from the vocabulary.
In [11]:
### DO NOT EDIT ###
if __name__ == '__main__':
    Vocabulary(train_set).printStats()
Num. labels: 47
Num. transitions (2*n_labels + 1): 95
Num. pos: 52
Num. words: 16635
Num. tokens: 16735
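To make the action space concrete, here is a small sanity check you could run (a sketch, assuming train_set from the cells above; in this dataset the root label is 'root', and the constructor places it first in the label list):

if __name__ == '__main__':
    _vocab = Vocabulary(train_set)
    print(_vocab.id2tran[_vocab.tran2id['S']])        # 'S' -- the single shift action
    print(_vocab.id2tran[0])                          # 'LA-root' -- left-arc actions come first
    print(_vocab.n_trans == 2 * _vocab.n_labels + 1)  # True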
Step 2: Parser Data Structures [12 points]
In this section, we define some useful data structures for dependency parsing. In particular, you will implement the Stack and Buffer structures, which make up a ParserConfiguration. You will also write a function to update these data structures based on a particular transition.
Helpful Data Structures
First, we define some data classes that you will find useful. You will be working with these a lot, so you should understand the data they contain as well as their methods. You do not need to edit this cell.
In [12]:
### DO NOT EDIT ###
class Word(object):
    '''
    Represents a word in the sentence.
    '''
    def __init__(self, word, pos, idx, word_id=None, pos_id=None):
        self.word = word
        self.pos = pos
        self.idx = idx
        self.word_id = word_id
        self.pos_id = pos_id

    def __str__(self):
        return 'Word(idx=' + str(self.idx) + ", word='" + self.word + "', pos='" + self.pos + "', word_id=" + str(self.word_id) + ', pos_id=' + str(self.pos_id) + ')'

    def copy(self):
        return Word(self.word, self.pos, self.idx, self.word_id, self.pos_id)

    def __eq__(self, obj):
        if not isinstance(obj, Word): return False
        if obj.idx == self.idx:
            assert obj.word == self.word and obj.pos == self.pos and obj.word_id == self.word_id and obj.pos_id == self.pos_id, 'Your Word object has been corrupted.'
        return obj.idx == self.idx


class Arc(object):
    '''
    Represents an arc between two words.
    '''
    def __init__(self, head, dependent, label, label_id):
        self.head = head           # Word object
        self.dependent = dependent # Word object
        self.label = label
        self.label_id = label_id

    def __str__(self):
        return 'Arc(head_idx=' + str(self.head.idx) + ', dep_idx=' + str(self.dependent.idx) + ', label_id=' + str(self.label_id) + ')'


class Dependencies(object):
    '''
    Represents the dependency arcs in a sentence.
    '''
    def __init__(self, vocab):
        self.arcs = []
        self.vocab = vocab
        self.dep_to_head_mapping = {} # For fast lookup

    def add(self, head, dependent, label):
        '''
        Add a dependency from head to dependent with label.
        Inputs:
            head: Word object
            dependent: Word object
            label: str
        '''
        assert label[:3] != 'LA-' and label[:3] != 'RA-', 'You need to pass in just the label to add(...), not the entire action.'
        assert head is not None and dependent is not None, "You must pass in two Word objects to add(...)."
        self.arcs.append(Arc(head, dependent, label, self.vocab.tok2id[self.vocab.LABEL_PREFIX + label]))
        assert dependent.idx not in self.dep_to_head_mapping
        self.dep_to_head_mapping[dependent.idx] = self.arcs[-1]

    def getArcToHead(self, dependent):
        '''
        Returns the Arc that connects the head of dependent to dependent.
        Inputs:
            dependent: Word object
        '''
        if dependent.idx == 0: # Special case for ROOT
            return Arc(None, dependent, None, None)
        return self.dep_to_head_mapping[dependent.idx]

    def __iter__(self):
        return iter(self.arcs) # Allows you to iterate "for x in Dependencies(...)"
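As a quick illustration of how these classes fit together (a sketch, assuming train_set and Vocabulary from the cells above), you can build the gold Dependencies for one training sentence and look up the arc from a word to its head:

if __name__ == '__main__':
    _vocab = Vocabulary(train_set)
    _sent, _gold_deps = _vocab.buildSentences(train_set[:1])[0]  # list of Words + gold Dependencies
    print(_gold_deps.getArcToHead(_sent[1]))                     # Arc attaching the first real word to its head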
TODO: Stack & Buffer [6 points]
Here, we provide you with the outline of stack and buffer data structures. Your task is to implement the push(...) and pop(...) methods of the Stack, and the pop(...) method of the Buffer. Each method is worth 2 points.
In [13]:
class Stack(object):
    def __init__(self, input=[]):
        '''
        Initialize an (empty) stack.
        '''
        self.stack = [word.copy() for word in input]

    def push(self, item):
        '''
        Push item onto (the end of) self.stack. Returns nothing.
        '''
        ### TODO ###
        self.stack.append(item)

    def pop(self):
        '''
        Pop item from (the end of) self.stack. Returns the item popped.
        '''
        assert len(self.stack) > 0
        ### TODO ###
        return self.stack.pop()

    def get_si(self, i):
        '''
        Returns si (the ith element of the stack) if it exists, otherwise None.
        '''
        assert i > 0, 'Must provide i > 0'
        return self.stack[-i] if len(self.stack) >= i else None

    def __getitem__(self, idx):
        return self.stack[idx]

    def __len__(self):
        return len(self.stack)

    def __str__(self):
        return str([str(x) for x in self.stack])


class Buffer(object):
    def __init__(self, sentence):
        '''
        Initialize as a list of words in sentence.
        '''
        self.buffer = [word.copy() for word in sentence]

    def pop(self):
        '''
        Pop item from (the beginning of) self.buffer. Returns the item popped.
        '''
        assert len(self.buffer) > 0
        ### TODO ###
        return self.buffer.pop(0)

    def get_bi(self, i):
        '''
        Returns bi (the ith element of the buffer) if it exists, otherwise None.
        '''
        assert i > 0, 'Must provide i > 0'
        return self.buffer[i-1] if len(self.buffer) >= i else None

    def __getitem__(self, idx):
        return self.buffer[idx]

    def __len__(self):
        return len(self.buffer)

    def __str__(self):
        return str([str(x) for x in self.buffer])
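A quick sanity check for these methods (a sketch, assuming train_set and the classes above):

if __name__ == '__main__':
    _vocab = Vocabulary(train_set)
    _sent, _ = _vocab.buildSentences(train_set[:1])[0]
    _stack, _buffer = Stack([_sent[0]]), Buffer(_sent[1:])
    _stack.push(_buffer.pop())                 # shift the first word onto the stack
    print(_stack.get_si(1).word, len(_buffer)) # the shifted word; the buffer shrank by one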
TODO: Parser Configuration [6 points]
Next, we create a ParserConfiguration class, which contains the Stack, Buffer, and Dependencies data structures. You only need to implement parse_step(self, transition), which modifies these structures based on transition. This method is worth 6 points.
More specifically, let $\sigma$ represent the stack, $\beta$ the buffer, and $A$ the set of arcs (dependencies). Based on the value of transition, parse_step(self, transition) should do the following:
transition = ‘S’: Shift $w_k$ from the buffer to the stack. $(\sigma, w_k|\beta, A) \Rightarrow (\sigma|w_k, \beta, A)$
transition = ‘LA-label’: Add a left arc with label $label$ from $w_j$ to $w_i$. $(\sigma |w_i w_j , \beta, A) \Rightarrow (\sigma |w_j, \beta, A \cup \{(w_j, label, w_i)\})$
transition = ‘RA-label’: Add a right arc with label $label$ from $w_i$ to $w_j$. $(\sigma |w_i w_j , \beta, A) \Rightarrow (\sigma |w_i, \beta, A \cup \{(w_i, label, w_j)\})$
Hint: Use your push(...) and pop(...) operations here, and look at the methods of the Dependencies class to see how to add an arc to it.
In [81]:
class ParserConfiguration(object):
    def __init__(self, sentence, vocab):
        '''
        Inputs:
            sentence: list of Word objects
            vocab: Vocabulary object
        '''
        self.vocab = vocab
        assert sentence[0].word_id == self.vocab.WORD_ROOT
        self.stack = Stack([sentence[0]])  # Initialize stack with ROOT
        self.buffer = Buffer(sentence[1:]) # Initialize buffer with the rest of the sentence
        self.dependencies = Dependencies(vocab)

    def parse_step(self, transition):
        '''
        Update stack, buffer, and dependencies based on transition.
        Inputs:
            transition: str, "S", "LA-label", or "RA-label", where label is a valid label
        '''
        assert transition in self.vocab.tran2id
        ### TODO ###
        if transition == 'S':
            # Shift: move the front of the buffer onto the stack
            if len(self.buffer) > 0:
                self.stack.push(self.buffer.pop())
        elif transition[:3] == 'LA-':
            # Left arc: s1 becomes the head of s2
            if len(self.stack) >= 2:
                wj = self.stack.pop() # s1
                wi = self.stack.pop() # s2
                self.stack.push(wj)
                self.dependencies.add(wj, wi, transition[3:])
        elif transition[:3] == 'RA-':
            # Right arc: s2 becomes the head of s1
            if len(self.stack) >= 2:
                wj = self.stack.pop() # s1
                wi = self.stack.pop() # s2
                self.stack.push(wi)
                self.dependencies.add(wi, wj, transition[3:])
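Here is a minimal exercise of parse_step (a sketch; the two shifts and the 'LA-nsubj' label are purely illustrative, and it assumes the first training sentence has at least two words):

if __name__ == '__main__':
    _vocab = Vocabulary(train_set)
    _sent, _ = _vocab.buildSentences(train_set[:1])[0]
    _pc = ParserConfiguration(_sent, _vocab)
    _pc.parse_step('S')
    _pc.parse_step('S')
    _pc.parse_step('LA-nsubj')  # attaches s2 to s1 with the (illustrative) label 'nsubj'
    print([str(arc) for arc in _pc.dependencies.arcs])  # one Arc; its head is the word left on the stack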
Step 3: Generate Training Data [44 points]
As you saw above, the dataset contains many sentences along with their gold dependency parses. In order to use a transition-based parsing algorithm, we must predict the next parser action at each time step, based on the current parser configuration. Thus, we will need to transform each sentence into a series of training examples, where the input features are based on the current parser configuration and the correct label is the gold action.
In this section, you will first write an algorithm to select the next parser action at each time step based on the gold dependency parse. Then, you will extract features from each step’s parser configuration, which the neural network will use to predict the next action.
TODO: Compute Gold Action [8 points]
You will now write a function get_gold_action(stack, buffer, gold_dependencies). Given a stack and buffer, this function should return the next parser action based on the gold dependencies. We encourage you to review the sentence-parsing example in the lecture slides before attempting to implement this function. This method is worth 8 points.
Let $s_i$ be the $i$th element of the stack and $h(s_i)$ be the head word of $s_i$. The pseudocode is as follows:
1. If the stack only contains ROOT:
    - If the buffer is not empty, return S (shift).
    - If the buffer is empty, return DONE, indicating the parse is complete.
2. If $h(s_2)=s_1$, return LA-label. Here, label is the label of the arc that attaches $s_2$ to $s_1$.
3. If $h(s_1)=s_2$ and $h(b_i) \neq s_1$ for all words $b_i$ in the buffer, return RA-label. Here, label is the label of the arc that attaches $s_1$ to $s_2$. This condition means that you cannot attach $s_1$ until everything in the buffer that depends on $s_1$ is attached. You should think about why this condition is necessary!
4. Otherwise:
    - If the buffer is not empty, return S.
    - If the buffer is empty, return None, indicating a failed parse (i.e., the sentence is non-projective).
Hint: To get the $i$th word on the stack or buffer, call stack.get_si(i) or buffer.get_bi(i). To find the head of a word w, call gold_dependencies.getArcToHead(w).
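To make the oracle concrete, here is a hand trace for a hypothetical two-word sentence "He ate", whose gold arcs are ROOT → ate (label root) and ate → He (label nsubj):

Stack [ROOT]            Buffer [He, ate]   =>  S         (only ROOT on the stack, buffer non-empty)
Stack [ROOT, He]        Buffer [ate]       =>  S         (no gold arc between s1 and s2)
Stack [ROOT, He, ate]   Buffer []          =>  LA-nsubj  (h(s2) = s1)
Stack [ROOT, ate]       Buffer []          =>  RA-root   (h(s1) = s2, and the buffer is empty)
Stack [ROOT]            Buffer []          =>  DONE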
In [15]:
def get_gold_action(stack, buffer, gold_dependencies):
    '''
    Given stack & buffer, compute the next gold action based on gold_dependencies.
    Args:
        - stack: Stack object
        - buffer: Buffer object
        - gold_dependencies: Dependencies object
    Returns:
        - action: str; 'S', 'LA-label', or 'RA-label', where 'label' is a valid label. Return None if no action is possible and 'DONE' if the parse is complete.
    '''
    ### TODO ###
    # Case 1: Stack contains only ROOT -- shift if possible, otherwise we are done
    if len(stack) == 1:
        return 'S' if len(buffer) != 0 else 'DONE'

    s1, s2 = stack.get_si(1), stack.get_si(2)

    # Case 2: s1 is the head of s2 -- left arc
    arc_to_s2 = gold_dependencies.getArcToHead(s2)
    if arc_to_s2.head == s1:
        return 'LA-' + arc_to_s2.label

    # Case 3: s2 is the head of s1, and nothing left in the buffer depends on s1 -- right arc
    arc_to_s1 = gold_dependencies.getArcToHead(s1)
    if arc_to_s1.head == s2 and all(gold_dependencies.getArcToHead(b).head != s1 for b in buffer):
        return 'RA-' + arc_to_s1.label

    # Case 4: shift if possible; if the buffer is empty, the parse fails (non-projective)
    return 'S' if len(buffer) != 0 else None
We provide you with 10 sentences to test this function. The first sentence is the example from the lecture slides, the next 8 sentences are artificial sentences designed to test edge cases, and the last sentence is an example from the training set. For each sentence, we have hard-coded the stack & buffer configurations that you should encounter as well as the correct action.
The to_print argument of this method may be set to one of the following values:
all: Print every sentence.
incorrect: Print only the sentences that your function gets incorrect.
none: Don’t print any of the sentences (only show the score).
You may also toggle do_raise, which controls whether an exception in your code is raised (halting execution) or suppressed (thus allowing all test sentences to complete so you can see your score).
Note that Gradescope uses a set of different (hidden) tests, so you will want to fully test your code here.
In [16]:
if __name__ == '__main__':
    test(get_gold_action, to_print='incorrect', do_raise=True)
10 / 10 = 100.0% of test cases passed!
TODO: Generate Training Examples [8 points]
Now you will write a function to generate the training examples. Recall that each sentence needs to be converted into a series of separate training examples. Each training example will essentially be a partial parser configuration along with its gold action; the goal of the neural network will be to predict this action from the parser configuration.
In order to make this prediction, you need to extract features from the parser configuration. You will implement the feature extraction method in a future section; for now, we pass in a dummy function feat_extract(parser_config) that returns an empty feature list.
This function is worth 8 points.
In [17]:
def generate_training_examples(sentence, gold_dependencies, vocab, feat_extract = lambda parser_config: []):
    '''
    Create training instances for sentence.
    Inputs:
        sentence: list of Word objects
        gold_dependencies: Dependencies object that contains the complete gold dependency tree
        vocab: Vocabulary object
        feat_extract: Feature extraction function
    Outputs:
        training_examples: List of tuples (features, label), where features is a list and label is a string
    Pseudocode:
        (1) Initialize your parser configuration (note that the __init__ method of ParserConfiguration creates the stack & buffer for you)
        (2) Repeatedly call get_gold_action(...) on your current parser configuration until the gold action is 'DONE'
        (3) If the gold action is None at any step, return [] (indicating the sentence cannot be parsed; it is non-projective)
        (4) Otherwise, append the tuple (features, gold action) to training_examples, where features is the result of calling feat_extract on your parser configuration
        (5) Update your parser configuration according to the gold action
        (6) Return training_examples
    '''
    training_examples = []
    ### TODO ###
    parser_config = ParserConfiguration(sentence, vocab)
    action = get_gold_action(parser_config.stack, parser_config.buffer, gold_dependencies)
    while action != 'DONE':
        if action is None:
            return [] # Non-projective sentence
        training_examples.append((feat_extract(parser_config), action))
        parser_config.parse_step(action)
        action = get_gold_action(parser_config.stack, parser_config.buffer, gold_dependencies)
    return training_examples
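To see the output format (a sketch using the dummy feature extractor, assuming train_set from above; a non-projective sentence would instead yield []):

if __name__ == '__main__':
    _vocab = Vocabulary(train_set)
    _sent, _gold_deps = _vocab.buildSentences(train_set[:1])[0]
    print(generate_training_examples(_sent, _gold_deps, _vocab)[:3])  # e.g. [([], 'S'), ([], 'S'), ...]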
We provide you with a test for this function on the same test sentences we used above.
In [18]:
if __name__ == '__main__':
    test_generate_training_examples(generate_training_examples, to_print='incorrect', do_raise=True)
10 / 10 = 100.0% of test cases passed!
The following function calls generate_training_examples(…) on every sentence in the dataset to create the full training data. You do not need to edit it.
In [19]:
### DO NOT EDIT ###
def generate_all_training_examples(vocab, sentences, feat_extract = lambda parser_config: []):
    '''
    Generate training examples for all sentences.
    '''
    all_training_examples = []
    successful_sents = 0
    for sentence in sentences:
        training_examples = generate_training_examples(sentence[0], sentence[1], vocab, feat_extract)
        if training_examples != []:
            all_training_examples += training_examples
            successful_sents += 1
    print("Successfully generated training examples for", successful_sents, "/", len(sentences), "sentences")
    print("Number of training examples:", len(all_training_examples))
    return all_training_examples
In [20]:
### DO NOT EDIT ###
if __name__ == '__main__':
    _vocab = Vocabulary(train_set) # Variable just used in this cell
    generate_all_training_examples(_vocab, _vocab.buildSentences(train_set))
Successfully generated training examples for 11888 / 12543 sentences
Number of training examples: 373100
TODO: Extract Features [28 points]
By this point, you have written code to create individual training instances. Each instance is made up of a parser configuration along with the gold action that the classifier should be trained to predict.
In order to make this prediction, your neural network will have to rely on features extracted from each parser configuration. We follow the procedure described at the end of Section 3.1 of A Fast and Accurate Dependency Parser using Neural Networks. In total, we will extract 48 features from the parser configuration (18 word features, 18 POS features, and 12 label features):
Word & POS features for $s_1$, $s_2$, $s_3$ (top 3 items of the stack)
Word & POS features for $b_1$, $b_2$, $b_3$ (top 3 items of the buffer)
Word, POS, & label features for $lc_1(s_1)$, $lc_2(s_1)$, $lc_1(s_2)$, $lc_2(s_2)$ (the first & second leftmost children of the top 2 items on the stack)
Word, POS, & label features for $rc_1(s_1)$, $rc_2(s_1)$, $rc_1(s_2)$, $rc_2(s_2)$ (the first & second rightmost children of the top 2 items on the stack)
Word, POS, & label features for $lc_1(lc_1(s_1))$, $lc_1(lc_1(s_2))$, $rc_1(rc_1(s_1))$, $rc_1(rc_1(s_2))$ (the leftmost of the leftmost & rightmost of the rightmost children of the top 2 items on the stack)
You will write a separate function for each of the 5 bullets above. Each function will return a list of word features, a list of POS features, and (in the relevant cases) a list of label features. We also provide you with a test for each function.
A "feature" refers to the id (index in the Vocabulary) of a word, POS, or label. Your neural network will then construct embeddings for each id, much as you have seen in previous homeworks.
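For orientation, here is a hedged sketch of how the 48 feature ids could be assembled once the five getter functions below are written (extract_features_sketch is a hypothetical name, and the exact feature ordering used later may differ):

def extract_features_sketch(parser_config):
    # 18 word ids + 18 POS ids + 12 label ids, grouped by type
    sw, sp = get_top3_stack_features(parser_config)  # s1, s2, s3
    bw, bp = get_top3_buffer_features(parser_config) # b1, b2, b3
    words, poss, labels = sw + bw, sp + bp, []
    for i in (1, 2):                                 # children of s1, then of s2
        for getter in (get_lc1_lc2_features, get_rc1_rc2_features, get_llc_rrc_features):
            gw, gp, gl = getter(parser_config, i)
            words, poss, labels = words + gw, poss + gp, labels + gl
    return words + poss + labels                     # 18 + 18 + 12 = 48 ids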
First, write 2 functions to extract features corresponding to the words at the top of the stack & buffer:
get_top3_stack_features(parser_config) [3 points]: Return word & POS features (ids) for $s_1$, $s_2$, $s_3$ (top 3 words on the stack).
get_top3_buffer_features(parser_config) [3 points]: Return word & POS features (ids) for $b_1$, $b_2$, $b_3$ (top 3 words on the buffer).
Wherever a particular word does not exist (such as when the stack or buffer has length $< 3$), use the appropriate NULL token. This is necessary to ensure that our neural network sees an equally sized feature vector for each example.
In [21]:
def get_top3_stack_features(parser_config):
    '''
    Get the word and POS features for s1, s2, and s3 (the top 3 items on the stack)
    Returns:
        word_features: List of word ids for s1, s2, s3 (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for s1, s2, s3 (use vocab.POS_NULL if a word does not exist)
    '''
    word_features, pos_features = [parser_config.vocab.WORD_NULL]*3, [parser_config.vocab.POS_NULL]*3
    ### TODO ###
    for i in range(3):
        si = parser_config.stack.get_si(i+1) # s1, s2, s3
        if si is not None:
            word_features[i] = si.word_id
            pos_features[i] = si.pos_id
    return word_features, pos_features
In [22]:
if __name__ == '__main__':
    test(get_top3_stack_features, to_print='incorrect', do_raise=True)
10 / 10 = 100.0% of test cases passed!
In [23]:
def get_top3_buffer_features(parser_config):
    '''
    Get the word and POS features for b1, b2, and b3 (the top 3 items on the buffer)
    Returns:
        word_features: List of word ids for b1, b2, b3 (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for b1, b2, b3 (use vocab.POS_NULL if a word does not exist)
    '''
    word_features, pos_features = [parser_config.vocab.WORD_NULL]*3, [parser_config.vocab.POS_NULL]*3
    ### TODO ###
    for i in range(3):
        bi = parser_config.buffer.get_bi(i+1) # b1, b2, b3
        if bi is not None:
            word_features[i] = bi.word_id
            pos_features[i] = bi.pos_id
    return word_features, pos_features
In [24]:
if __name__ == '__main__':
    test(get_top3_buffer_features, to_print='incorrect', do_raise=True)
10 / 10 = 100.0% of test cases passed!
The remaining features have to do with the leftmost & rightmost children of the words at the top of the stack & buffer. Write the following 2 helper functions to make it easier to access these dependents:
get_lc(parser_config, word) [2 points]: Return a list of arcs to dependents of word, sorted from left to right. Only include dependents that are to the left of word in the sentence.
get_rc(parser_config, word) [2 points]: Return a list of arcs to dependents of word, sorted from right to left. Only include dependents that are to the right of word in the sentence.
Hint: You can sort a list of objects using sorted(...) with the key parameter.
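For instance, assuming arcs is a (hypothetical) list of Arc objects:

sorted(arcs, key=lambda arc: arc.dependent.idx)                # dependents left to right
sorted(arcs, key=lambda arc: arc.dependent.idx, reverse=True)  # dependents right to left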
In [27]:
def get_lc(parser_config, word):
    '''
    Finds the left dependents of word, sorted from left to right.
    Returns:
        A list of Arcs whose head is word, sorted by the indices of the dependent word from left to right.
    '''
    ### TODO ###
    arcs = [arc for arc in parser_config.dependencies if arc.head == word and arc.dependent.idx < word.idx]
    return sorted(arcs, key=lambda arc: arc.dependent.idx)
In [28]:
if __name__ == '__main__':
    test(get_lc, to_print='incorrect', do_raise=True)
10 / 10 = 100.0% of test cases passed!
In [29]:
def get_rc(parser_config, word):
    '''
    Finds the right dependents of word, sorted from right to left.
    Returns:
        A list of Arcs whose head is word, sorted by the indices of the dependent word from right to left.
    '''
    ### TODO ###
    arcs = [arc for arc in parser_config.dependencies if arc.head == word and arc.dependent.idx > word.idx]
    return sorted(arcs, key=lambda arc: arc.dependent.idx, reverse=True)
In [30]:
if __name__ == '__main__':
    test(get_rc, to_print='incorrect', do_raise=True)
10 / 10 = 100.0% of test cases passed!
Let $lc_j(s_i)$ be the $j$th leftmost child of the $i$th item on the stack. Write the following function:
get_lc1_lc2_features(parser_config, i) [6 points]: Return word & POS features for $lc_1(s_i)$ and $lc_2(s_i)$. Additionally, return label features (the label ids) for the arcs that attach $lc_1(s_i)$ to $s_i$ and $lc_2(s_i)$ to $s_i$. As before, wherever a particular word does not exist, use the appropriate NULL token.
We will call this function with i=1 and i=2, accounting for the words $lc_1(s_1)$, $lc_2(s_1)$, $lc_1(s_2)$, $lc_2(s_2)$.
In [33]:
def get_lc1_lc2_features(parser_config, i):
    '''
    Get the word, POS, and label features for lc1(si) and lc2(si), where i in {1, 2}
    Returns:
        word_features: List of word ids for lc1(si), lc2(si) (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for lc1(si), lc2(si) (use vocab.POS_NULL if a word does not exist)
        label_features: List of label ids for lc1(si), lc2(si) (use vocab.LABEL_NULL if a word does not exist)
    '''
    assert i in {1, 2}
    word_features, pos_features, label_features = [parser_config.vocab.WORD_NULL]*2, [parser_config.vocab.POS_NULL]*2, [parser_config.vocab.LABEL_NULL]*2
    ### TODO ###
    lc = get_lc(parser_config, parser_config.stack.get_si(i))
    for j in range(min(2, len(lc))): # lc1(si), lc2(si)
        word_features[j] = lc[j].dependent.word_id
        pos_features[j] = lc[j].dependent.pos_id
        label_features[j] = lc[j].label_id
    return word_features, pos_features, label_features
In [34]:
if __name__ == '__main__':
    test(get_lc1_lc2_features, i=1, to_print='incorrect', do_raise=True) # call with i=1
10 / 10 = 100.0% of test cases passed!
In [35]:
if __name__ == '__main__':
    test(get_lc1_lc2_features, i=2, to_print='incorrect', do_raise=True) # call with i=2
10 / 10 = 100.0% of test cases passed!
You will now write the analogous function for the rightmost children. Let $rc_j(s_i)$ be the $j$th rightmost child of the $i$th item on the stack. Write the following function:
get_rc1_rc2_features(parser_config, i) [6 points]: Return word & POS features for $rc_1(s_i)$ and $rc_2(s_i)$. Additionally, return label features (the label ids) for the arcs that attach $rc_1(s_i)$ to $s_i$ and $rc_2(s_i)$ to $s_i$. As before, wherever a particular word does not exist, use the appropriate NULL token.
We will call this function with i=1 and i=2, accounting for the words $rc_1(s_1)$, $rc_2(s_1)$, $rc_1(s_2)$, $rc_2(s_2)$.
In [36]:
def get_rc1_rc2_features(parser_config, i):
    '''
    Get the word, POS, and label features for rc1(si) and rc2(si), where i in {1, 2}
    Returns:
        word_features: List of word ids for rc1(si), rc2(si) (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for rc1(si), rc2(si) (use vocab.POS_NULL if a word does not exist)
        label_features: List of label ids for rc1(si), rc2(si) (use vocab.LABEL_NULL if a word does not exist)
    '''
    assert i in {1, 2}
    word_features, pos_features, label_features = [parser_config.vocab.WORD_NULL]*2, [parser_config.vocab.POS_NULL]*2, [parser_config.vocab.LABEL_NULL]*2
    ### TODO ###
    rc = get_rc(parser_config, parser_config.stack.get_si(i))
    for j in range(min(2, len(rc))): # rc1(si), rc2(si)
        word_features[j] = rc[j].dependent.word_id
        pos_features[j] = rc[j].dependent.pos_id
        label_features[j] = rc[j].label_id
    return word_features, pos_features, label_features
In [37]:
if __name__ == '__main__':
    test(get_rc1_rc2_features, i=1, to_print='incorrect', do_raise=True) # call with i=1
10 / 10 = 100.0% of test cases passed!
In [38]:
if __name__ == '__main__':
    test(get_rc1_rc2_features, i=2, to_print='incorrect', do_raise=True) # call with i=2
10 / 10 = 100.0% of test cases passed!
Finally, write the following function:
get_llc_rrc_features(parser_config, i) [6 points]: Return word & POS features for $lc_1(lc_1(s_i))$ and $rc_1(rc_1(s_i))$. Additionally, return label features (the label ids) for the arcs that attach $lc_1(lc_1(s_i))$ to $lc_1(s_i)$ and $rc_1(rc_1(s_i))$ to $rc_1(s_i)$. As before, wherever a particular word does not exist, use the appropriate NULL token.
We will call this function with i=1 and i=2, accounting for the words $lc_1(lc_1(s_1))$, $lc_1(lc_1(s_2))$, $rc_1(rc_1(s_1))$, $rc_1(rc_1(s_2))$.
In [39]:
def get_llc_rrc_features(parser_config, i):
    '''
    Get the word, POS, and label features for lc1(lc1(si)) and rc1(rc1(si)), where i in {1, 2}
    Returns:
        word_features: List of word ids for lc1(lc1(si)) and rc1(rc1(si)) (use vocab.WORD_NULL if a word does not exist)