CS 447 Homework 4: Dependency Parsing
In this homework you will build a neural transition-based dependency parser, based on the paper A Fast and Accurate Dependency Parser using Neural Networks.

The setup for a dependency parser is somewhat more sophisticated than tasks like classification or translation. Therefore, this homework contains many small functions that can be tested incrementally. In addition to the Gradescope tests, we also provide substantial tests in the notebook that you can use to debug your code.

We suggest that you work on this homework on CPU until you are ready to train. At that point, you should switch your runtime to GPU by going to Runtime > Change Runtime Type and selecting "GPU" from the dropdown menu.

You will find it easier to debug on CPU, and the error messages will be more understandable.
Google monitors your GPU usage and will occasionally restrict GPU access if you use it too much. In these cases, you can either switch to a different Google account or wait for your access to be restored.

Step 0: Provided Testing Functions
The following cells set up the tests that you can use in the notebook. You should not edit any of these cells.

In [2]:

### DO NOT EDIT ###

if __name__ == '__main__':
    import cloudpickle as cp
    from urllib.request import urlopen

    testing_bundle = cp.load(urlopen("https://drive.google.com/uc?export=download&id=1-9lWbXuXZYGjJWQRKD7um_83O-I3ER63"))

In [3]:

### DO NOT EDIT ###

def test(test_fxn, i=None, test_bundle=None, to_print='all', do_return=False, display_sent=True, do_raise=True):
    to_test = test_fxn.__name__ + (('_' + str(i)) if i is not None else '')
    if test_fxn.__name__ not in {'get_gold_action', 'get_lc', 'get_rc', 'get_top3_stack_features', 'get_top3_buffer_features'}: assert i is not None
    assert to_test in {'get_gold_action', 'get_lc', 'get_rc', 'get_top3_stack_features', 'get_top3_buffer_features', 'get_lc1_lc2_features_1',
                       'get_lc1_lc2_features_2', 'get_rc1_rc2_features_1', 'get_rc1_rc2_features_2', 'get_llc_rrc_features_1', 'get_llc_rrc_features_2'}
    assert to_print in {'all', 'incorrect', 'none'}
    test_bundle = testing_bundle if test_bundle is None else test_bundle
    sentences, configs, gold_actions, gold_features, _ = test_bundle
    vocab = Vocabulary(sentences)
    testsents = vocab.buildSentences(sentences)
    sents_correct, printag = 0, None  # printag will collect the first error you encounter (used in Gradescope)
    for i in range(len(sentences)):

        printsent = ""  # printsent is the full printout for this sentence (only needed when to_print != 'none')
        sent, gold_deps = testsents[i][0], testsents[i][1]
        if to_test == 'get_gold_action':
            printsent += 'gold_dependencies: ' + str([str(arc) for arc in gold_deps.arcs]) + '\n\n'

        for j in range(len(configs[i])):
            # Manually create Stack, Buffer, Dependencies & ParserConfiguration objects from test case
            stk, buf, deps = configs[i][j]
            stack, buffer, dependencies = Stack([sent[idx] for idx in stk]), Buffer([sent[idx] for idx in buf]), Dependencies(vocab)
            for arc in deps: dependencies.add(sent[arc[0]], sent[arc[1]], arc[2])
            parser_config = ParserConfiguration(sent, vocab)
            parser_config.stack, parser_config.buffer, parser_config.dependencies = stack, buffer, dependencies

            if to_test == 'get_gold_action' or not (gold_actions[i][j] is None or gold_actions[i][j] == 'DONE'):  # Don't need to test if DONE or None when not testing get_gold_action

                # Call the student code
                arg = stack.get_si(1) if to_test in {'get_lc', 'get_rc'} else (int(to_test[-1]) if to_test[-1] in {'1', '2'} else None)
                if to_test == 'get_gold_action': fxn = lambda: test_fxn(stack, buffer, gold_deps)
                elif to_test in {'get_top3_stack_features', 'get_top3_buffer_features'}: fxn = lambda: test_fxn(parser_config)
                else: fxn = lambda: test_fxn(parser_config, arg)
                tt = to_test[:-2] if to_test[-1] in {'1', '2'} else to_test

                exception = None
                try:
                    yours = fxn()
                except Exception as e:
                    yours = "Raised Exception: " + str(e)
                    exception = e
                correct = gold_actions[i][j] if to_test == 'get_gold_action' else gold_features[i][j][to_test]
                fxn_name = 'get_gold_action(stack, buffer, gold_dependencies)' if to_test == 'get_gold_action' else tt + ('(parser_config, s1)' if to_test in {'get_lc', 'get_rc'} else ('(parser_config)' if arg is None else '(parser_config, ' + str(arg) + ')'))

                if to_test in {'get_lc', 'get_rc'} and exception is None:
                    if type(yours) != list or (len(yours) > 0 and type(yours[0]) != Arc):
                        yours = 'Your ' + to_test + '(...) did not return a list of Arcs'  # note: exact quote used below, so if you need to change it, change it in both places
                    else: yours = [str(arc) for arc in yours]
                # if random.random() < 0.05: yours = [None, None, None, None, None, None] # simulate getting it wrong
                is_correct = yours == correct

                # Make the printouts!
                printsent += 'Step ' + str(j+1) + ' | stack: ' + str([str(word) if to_test != 'get_gold_action' else word.word for word in stack]) + '\tbuffer: ' + str([str(word) if to_test != 'get_gold_action' else word.word for word in buffer]) + (('\tdependencies: ' + str([str(arc) for arc in dependencies.arcs])) if to_test != 'get_gold_action' else "") + '\n'
                printsent += '\tTesting ' + fxn_name + ':\n'
                printsent += '\t\tYour Result: ' + str(yours) + '\t\tCorrect Result: ' + str(correct) + ('\t\tCorrect!' if is_correct else "") + '\n'
                if not is_correct:
                    printsent += '\t\tIncorrect! Halting parse of this sentence.\n\n'
                    if printag is None:
                        statement = yours if yours == 'Your ' + to_test + '(...) did not return a list of Arcs' else str(yours)[:20] + ("... " if len(str(yours)) > 20 else "")
                        statement = "Your first error (on a hidden test sentence): You returned " + statement + "; expected " + str(correct) + "."
                        printag = statement if exception is None else (statement, exception)
                    if to_print != 'none':
                        print("Testing Sentence " + str(i+1) + '...')
                        if display_sent: display_sentence(sentences[i])
                        print(printsent)
                        print('Test Sentence ' + str(i+1) + ' failed.')
                        print(('\n' + '-'*100 + '\n'))
                    if do_raise and exception is not None:
                        raise exception
                    break
                else: printsent += '\n'

        else:  # for-else: runs only if the loop above never hit an incorrect step (no break)
            sents_correct += 1
            if to_print == 'all':
                print("Testing Sentence " + str(i+1) + '...')
                if display_sent: display_sentence(sentences[i])
                print(printsent)
                print('Test Sentence ' + str(i+1) + ' passed!')
                print(('\n' + '-'*100 + '\n'))
    score = sents_correct / len(sentences)
    print(sents_correct, '/', len(sentences), '=', str(score*100) + '%', 'of test cases passed!')
    if do_return:
        return score, printag

def test_generate_training_examples(test_fxn, feat_extract=None, test_bundle=None, to_print='all', do_return=False, display_sent=True, do_raise=True):
    assert test_fxn.__name__ == 'generate_training_examples' and (feat_extract is None or feat_extract.__name__ == 'extract_features')
    test_bundle = testing_bundle if test_bundle is None else test_bundle
    sentences, _, _, _, gold_examples = test_bundle
    vocab = Vocabulary(sentences)
    testsents = vocab.buildSentences(sentences)
    sents_correct, printag = 0, None
    for i in range(len(sentences)):
        sent, gold_deps = testsents[i][0], testsents[i][1]
        exception = None
        try:
            your_examples = test_fxn(sent, gold_deps, vocab, feat_extract=feat_extract if feat_extract is not None else lambda p: [])
        except Exception as e:
            your_examples = "Raised Exception: " + str(e)
            exception = e
        # if random.random() < 0.5: your_examples = [None, None, None, None, None, None, None, None] # simulate getting it wrong
        is_correct = your_examples == gold_examples[i][0 if feat_extract is None else 1]
        if to_print == 'all' or to_print == 'incorrect' and not is_correct:
            print("Testing Sentence " + str(i+1) + '...')
            if display_sent: display_sentence(sentences[i])
            print("sentence: " + str([word.word for word in sent]) + '\tgold_dependencies: ' + str([str(arc) for arc in gold_deps.arcs]))
            print("\tTesting generate_training_examples(sentence, gold_dependencies, vocab" + ('' if feat_extract is None else ', feat_extract=extract_features') + "):")
            print('\t\tYour Training Examples: ', your_examples, '\n\t\tCorrect Training Examples:', gold_examples[i][0 if feat_extract is None else 1])
            print('\t\tCorrect!\n\n' if is_correct else '\t\tIncorrect!\n\n')
        if not is_correct and printag is None:
            statement = "Your first error (on a hidden test sentence): You returned " + str(your_examples)[0:20] + ("... " if len(str(your_examples)) > 20 else "") + "; expected " + str(gold_examples[i]) + "."
            printag = statement if exception is None else (statement, exception)
        if to_print == 'all' or to_print == 'incorrect' and not is_correct:
            print(('Test Sentence ' + str(i+1) + ' passed!') if is_correct else ('Test Sentence ' + str(i+1) + ' failed.'))
            print('\n' + '-'*100, '\n')
        if is_correct: sents_correct += 1
        if do_raise and exception is not None:
            raise exception

    score = sents_correct / len(sentences)
    print(sents_correct, '/', len(sentences), '=', str(score*100) + '%', 'of test cases passed!')
    if do_return:
        return score, printag

Step 1: Prepare Data¶

In [4]:

### DO NOT EDIT ###

import numpy as np
from spacy import displacy
import random

Read & Visualize Data
We provide the data in a format preprocessed for this assignment. Here are some links if you are interested in learning more about the data:

We use one of the Universal Dependencies datasets from http://universaldependencies.org/docsv1/. Specifically, we use the UD_English dataset in version 1.4.
Refer to https://universaldependencies.org/ if you want to know more about the Universal Dependencies framework in general.
The data license can be found here: https://lindat.mff.cuni.cz/repository/xmlui/page/licence-UD-1.4

Run the following cells to load the data and see the number of sentences. You do not need to edit these cells.

In [6]:

### DO NOT EDIT ###

def load_data():
    train_set = cp.load(urlopen("https://drive.google.com/uc?export=download&id=1N4B4bC4ua0bFMIYeNdtNQQ72LxxKLPas"))
    test_set = cp.load(urlopen("https://drive.google.com/uc?export=download&id=1TE2AflhABbz41dLMGmD1kTm7WevYpU31"))
    return train_set, test_set

In [7]:

### DO NOT EDIT ###

if __name__ == '__main__':
    train_set, test_set = load_data()
    print("Num. Train Examples:", len(train_set))
    print("Num. Test Examples: ", len(test_set))

Num. Train Examples: 12543
Num. Test Examples: 2002

Next, we visualize the training data, which contains labeled dependency trees. At test time, our goal will be to predict the dependency arcs & labels for a given sentence.

In [8]:

### DO NOT EDIT ###

def display_sentence(sent):
    res = {'words': [{'text': '<ROOT>', 'tag': 'POS_ROOT'}], 'arcs': []}
    for i in range(len(sent['word'])):
        res['words'].append({'text': sent['word'][i], 'tag': sent['pos'][i]})
        s = i + 1
        e = sent['head'][i]
        direc = "left"
        if s > e:
            s = sent['head'][i]
            e = i + 1
            direc = 'right'
        cur = {'start': s, 'end': e, 'label': sent['label'][i], 'dir': direc}
        res['arcs'].append(cur)
    displacy.render(res, style="dep", manual=True, jupyter=True, options={'distance': 70})

In [9]:

if __name__ == '__main__':
    MIN_SENT_LEN, MAX_SENT_LEN = 4, 17  # You can change this if you're interested in seeing shorter/longer sentences
    for x in random.sample(list(filter(lambda x: len(x['word']) >= MIN_SENT_LEN and len(x['word']) <= MAX_SENT_LEN, train_set)), 5):
        display_sentence(x)
[displacy output: five randomly sampled training sentences rendered as labeled dependency trees, e.g. "they do nt hurt unless you r stupid enough to press on them ." with its POS tags and the arcs nsubj, aux, neg, root, mark, nsubj, cop, advcl, advmod, mark, advcl, case, nmod, punct]
Build Vocabulary
Next, we build the Vocabulary class. This maps each word, part-of-speech tag (POS), and label to an id (index), which we will later use in our embeddings. We also need to enumerate each possible transition, and map it to an id, since this is what our neural network will try to predict. The Vocabulary class does this as follows. Suppose there are $n$ labels. For each label, we create a Left-Arc (LA) and Right-Arc (RA) action for that label; and we also create a separate Shift (S) action. This creates a total of $2n+1$ actions that our dependency parser will be able to choose from.
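
For example, if a dataset had just two labels, the enumeration below (a toy illustration, not the real label set) would yield $2 \cdot 2 + 1 = 5$ actions:

labels = ['root', 'nsubj']   # hypothetical two-label set
transitions = ['LA-' + l for l in labels] + ['RA-' + l for l in labels] + ['S']
print(transitions)           # ['LA-root', 'LA-nsubj', 'RA-root', 'RA-nsubj', 'S']
print(len(transitions))      # 5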

You do not need to edit this cell.

In [10]:

### DO NOT EDIT ###

class Vocabulary(object):
    def __init__(self, dataset):

        UNK = '<UNK>'
        NULL = '<NULL>'
        ROOT = '<ROOT>'

        # Find the label of the root
        root_labels = list([l for ex in dataset for (h, l) in zip(ex['head'], ex['label']) if h == 0])
        assert len(set(root_labels)) == 1
        self.root_label = root_labels[0]

        # Create mapping from transitions to ids
        labels = sorted(list(set([w for ex in dataset for w in ex['label'] if w != self.root_label])))  # list of unique non-root labels
        labels = [self.root_label] + labels  # add root label too
        self.n_labels = len(labels)

        transitions = ['LA-' + l for l in labels] + ['RA-' + l for l in labels] + ['S']
        self.n_trans = len(transitions)  # 2*n_labels + 1
        self.tran2id = {t: i for (i, t) in enumerate(transitions)}
        self.id2tran = {i: t for (i, t) in enumerate(transitions)}

        # Create mapping from word, pos, & label to id
        # Do labels first
        self.LABEL_PREFIX = '<l>:'
        self.tok2id = {self.LABEL_PREFIX + l: i for (i, l) in enumerate(labels)}
        self.LABEL_NULL = self.tok2id[self.LABEL_PREFIX + NULL] = len(self.tok2id)  # Add null label in

        # Do pos's
        self.POS_PREFIX = '<p>:'
        all_pos = sorted(set([self.POS_PREFIX + w for ex in dataset for w in ex['pos']]))  # Get pos's
        self.tok2id.update({w: index + len(self.tok2id) for (index, w) in enumerate(all_pos)})  # Add pos's in
        self.POS_NULL = self.tok2id[self.POS_PREFIX + NULL] = len(self.tok2id)  # Add null pos
        self.POS_ROOT = self.tok2id[self.POS_PREFIX + ROOT] = len(self.tok2id)  # Add root pos
        self.n_pos = 2 + len(all_pos)  # +2 for null, root

        # Do words
        all_word = sorted(set([w for ex in dataset for w in ex['word']]))
        self.tok2id.update({w: index + len(self.tok2id) for (index, w) in enumerate(all_word)})  # Add words in
        self.WORD_UNK = self.tok2id[UNK] = len(self.tok2id)    # Add unk word
        self.WORD_NULL = self.tok2id[NULL] = len(self.tok2id)  # Add null word
        self.WORD_ROOT = self.tok2id[ROOT] = len(self.tok2id)  # Add root word
        self.n_words = 3 + len(all_word)  # +3 for unk, null, root

        self.id2tok = {v: k for (k, v) in self.tok2id.items()}  # Flip it
        self.n_tokens = len(self.tok2id)

    def printStats(self):
        print('Num. labels:', self.n_labels)
        print('Num. transitions (2*n_labels + 1):', self.n_trans)
        print('Num. pos:', self.n_pos)
        print('Num. words:', self.n_words)
        print('Num. tokens:', self.n_tokens)

    def buildSentences(self, examples):
        processed_sentences = []
        for ex in examples:
            # Initialize words & dependencies
            words = [Word('<ROOT>', '<ROOT>', 0, self.WORD_ROOT, self.POS_ROOT)]
            deps = []

            # Loop over words in sentence
            for i in range(len(ex['word'])):
                w = ex['word'][i]
                word_id = (self.tok2id[w] if w in self.tok2id else self.WORD_UNK)
                pos = self.POS_PREFIX + ex['pos'][i]
                pos_id = self.tok2id[pos]
                word = Word(ex['word'][i], ex['pos'][i], i+1, word_id, pos_id)

                words.append(word)
                deps.append((ex['head'][i], word, ex['label'][i]))

            # Create dependencies
            dependencies = Dependencies(self)
            for dep in deps:
                dependencies.add(words[dep[0]], dep[1], dep[2])

            processed_sentences.append((words, dependencies))

        return processed_sentences

Run the following cell to see some stats from the vocabulary.

In [11]:

### DO NOT EDIT ###

if __name__ == '__main__':
    Vocabulary(train_set).printStats()

Num. labels: 47
Num. transitions (2*n_labels + 1): 95
Num. pos: 52
Num. words: 16635
Num. tokens: 16735

Step 2: Parser Data Structures [12 points]
In this section, we define some useful data structures for dependency parsing. In particular, you will implement the Stack and Buffer structures, which make up a ParserConfiguration. You will also write a function to update these data structures based on a particular transition.

Helpful Data Structures
First, we define some data classes that you will find useful. You will be working with these a lot, so you should understand the data they contain as well as their methods. You do not need to edit this cell.

In [12]:

### DO NOT EDIT ###

class Word(object):
    '''
    Represents a word in the sentence.
    '''

    def __init__(self, word, pos, idx, word_id=None, pos_id=None):
        self.word = word
        self.pos = pos
        self.idx = idx
        self.word_id = word_id
        self.pos_id = pos_id

    def __str__(self):
        return 'Word(idx=' + str(self.idx) + ", word='" + self.word + "', pos='" + self.pos + "', word_id=" + str(self.word_id) + ', pos_id=' + str(self.pos_id) + ')'

    def copy(self):
        return Word(self.word, self.pos, self.idx, self.word_id, self.pos_id)

    def __eq__(self, obj):
        if not isinstance(obj, Word): return False
        if obj.idx == self.idx:
            assert obj.word == self.word and obj.pos == self.pos and obj.word_id == self.word_id and obj.pos_id == self.pos_id, 'Your Word object has been corrupted.'
        return obj.idx == self.idx


class Arc(object):
    '''
    Represents an arc between two words.
    '''

    def __init__(self, head, dependent, label, label_id):
        self.head = head            # Word object
        self.dependent = dependent  # Word object
        self.label = label
        self.label_id = label_id

    def __str__(self):
        return 'Arc(head_idx=' + str(self.head.idx) + ', dep_idx=' + str(self.dependent.idx) + ', label_id=' + str(self.label_id) + ')'


class Dependencies(object):
    '''
    Represents the dependency arcs in a sentence.
    '''

    def __init__(self, vocab):
        self.arcs = []
        self.vocab = vocab
        self.dep_to_head_mapping = {}  # For fast lookup

    def add(self, head, dependent, label):
        '''
        Add a dependency from head to dependent with label.
        Inputs:
            head: Word object
            dependent: Word object
            label: str
        '''
        assert label[:3] != 'LA-' and label[:3] != 'RA-', 'You need to pass in just the label to add(...), not the entire action.'
        assert head is not None and dependent is not None, "You must pass in two Word objects to add(...)."
        self.arcs.append(Arc(head, dependent, label, self.vocab.tok2id[self.vocab.LABEL_PREFIX + label]))
        assert dependent.idx not in self.dep_to_head_mapping
        self.dep_to_head_mapping[dependent.idx] = self.arcs[-1]

    def getArcToHead(self, dependent):
        '''
        Returns the Arc that connects the head of dependent to dependent.
        Inputs:
            dependent: Word object
        '''
        if dependent.idx == 0:  # Special case for ROOT
            return Arc(None, dependent, None, None)
        return self.dep_to_head_mapping[dependent.idx]

    def __iter__(self):
        return iter(self.arcs)  # Allows you to iterate "for x in Dependencies(...)"
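
To get a feel for these classes before implementing anything, here is a short exploratory sketch (it assumes train_set and Vocabulary from Step 1; any training sentence works):

vocab = Vocabulary(train_set)
sent, gold_deps = vocab.buildSentences(train_set)[0]  # (list of Words, Dependencies)
w = sent[1]                                           # first real word; sent[0] is the ROOT placeholder
print(w)                                              # Word(idx=1, word='...', pos='...', ...)
arc = gold_deps.getArcToHead(w)                       # the Arc connecting w's gold head to w
print(arc, arc.label)
print(len([a for a in gold_deps]))                    # Dependencies is iterable over its Arcs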

TODO: Stack & Buffer [6 points]
Here, we provide you with the outline of stack and buffer data structures. Your task is to implement the push(…) and pop(…) methods of the Stack, and the pop(…) method of the Buffer. Each method is worth 2 points.

In [13]:

class Stack(object):
    def __init__(self, input=[]):
        '''
        Initialize an (empty) stack.
        '''
        self.stack = [word.copy() for word in input]

    def push(self, item):
        '''
        Push item onto (the end of) self.stack. Returns nothing.
        '''
        ### TODO ###
        self.stack.append(item)

    def pop(self):
        '''
        Pop item from (the end of) self.stack. Returns the item popped.
        '''
        assert len(self.stack) > 0
        ### TODO ###
        return self.stack.pop()

    def get_si(self, i):
        '''
        Returns si (the ith element of the stack) if it exists, otherwise None.
        '''
        assert i > 0, 'Must provide i > 0'
        return self.stack[-i] if len(self.stack) >= i else None

    def __getitem__(self, idx):
        return self.stack[idx]

    def __len__(self):
        return len(self.stack)

    def __str__(self):
        return str([str(x) for x in self.stack])


class Buffer(object):
    def __init__(self, sentence):
        '''
        Initialize as a list of words in sentence.
        '''
        self.buffer = [word.copy() for word in sentence]

    def pop(self):
        '''
        Pop item from (the beginning of) self.buffer. Returns the item popped.
        '''
        assert len(self.buffer) > 0
        ### TODO ###
        return self.buffer.pop(0)

    def get_bi(self, i):
        '''
        Returns bi (the ith element of the buffer) if it exists, otherwise None.
        '''
        assert i > 0, 'Must provide i > 0'
        return self.buffer[i-1] if len(self.buffer) >= i else None

    def __getitem__(self, idx):
        return self.buffer[idx]

    def __len__(self):
        return len(self.buffer)

    def __str__(self):
        return str([str(x) for x in self.buffer])
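
A quick sanity check of the intended semantics (assumes train_set and Vocabulary from Step 1, and a sentence with at least two words after ROOT):

vocab = Vocabulary(train_set)
sent = vocab.buildSentences(train_set)[0][0]  # list of Word objects; sent[0] is ROOT
stack, buffer = Stack(sent[:1]), Buffer(sent[1:])

stack.push(buffer.pop())            # simulate a shift
assert stack.get_si(1) == sent[1]   # s1 is the most recently pushed word
assert stack.pop() == sent[1]       # the stack pops from the end (LIFO)
assert buffer.get_bi(1) == sent[2]  # b1 is the next word to be shifted
assert buffer.pop() == sent[2]      # the buffer pops from the front (FIFO)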

TODO: Parser Configuration [6 points]
Next, we create a ParserConfiguration class, which contains the Stack, Buffer, and Dependencies data structures. You only need to implement parse_step(self, transition), which modifies these structures based on transition. This method is worth 6 points.

More specifically, let $\sigma$ represent the stack, $\beta$ the buffer, and $A$ the set of arcs (dependencies). Based on the value of transition, parse_step(self, transition) should do the following:

transition = 'S':  Shift $w_k$ from the buffer to the stack. $(\sigma, w_k|\beta, A) \Rightarrow (\sigma|w_k, \beta, A)$
transition = 'LA-label':  Add a left arc with label $label$ from $w_j$ to $w_i$. $(\sigma|w_i w_j, \beta, A) \Rightarrow (\sigma|w_j, \beta, A \cup \{(w_j, label, w_i)\})$
transition = 'RA-label':  Add a right arc with label $label$ from $w_i$ to $w_j$. $(\sigma|w_i w_j, \beta, A) \Rightarrow (\sigma|w_i, \beta, A \cup \{(w_i, label, w_j)\})$

Hint: Use your push(...) and pop(...) operations here, and look at the methods of the Dependencies class to see how to add an arc to it. A worked trace of these three transitions appears after the implementation cell below.

In [81]:

class ParserConfiguration(object):
    def __init__(self, sentence, vocab):
        '''
        Inputs:
            sentence: list of Word objects
            vocab: Vocabulary object
        '''
        self.vocab = vocab

        assert sentence[0].word_id == self.vocab.WORD_ROOT
        self.stack = Stack([sentence[0]])   # Initialize stack with ROOT
        self.buffer = Buffer(sentence[1:])  # Initialize buffer with sentence
        self.dependencies = Dependencies(vocab)

    def parse_step(self, transition):
        '''
        Update stack, buffer, and dependencies based on transition.
        Inputs:
            transition: str, "S", "LA-label", or "RA-label", where label is a valid label
        '''
        assert transition in self.vocab.tran2id

        ### TODO ###
        if transition == 'S':
            if len(self.buffer) > 0:
                self.stack.push(self.buffer.pop())  # move b1 from the buffer onto the stack
        elif transition[:3] == 'LA-':
            if len(self.stack) >= 2:
                wj = self.stack.pop()  # s1
                wi = self.stack.pop()  # s2
                self.stack.push(wj)
                self.dependencies.add(wj, wi, transition[3:])  # arc from wj to wi
        elif transition[:3] == 'RA-':
            if len(self.stack) >= 2:
                wj = self.stack.pop()  # s1
                wi = self.stack.pop()  # s2
                self.stack.push(wi)
                self.dependencies.add(wi, wj, transition[3:])  # arc from wi to wj
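
To make the transition semantics concrete, here is a hypothetical trace on a made-up two-word sentence "he slept". The words, POS tags, and dummy ids are illustrative only; it assumes 'nsubj' is a valid label in the vocabulary (it is in UD_English):

vocab = Vocabulary(train_set)
root = Word('<ROOT>', '<ROOT>', 0, vocab.WORD_ROOT, vocab.POS_ROOT)
he = Word('he', 'PRP', 1, vocab.WORD_UNK, vocab.POS_NULL)        # dummy ids; only idx matters here
slept = Word('slept', 'VBD', 2, vocab.WORD_UNK, vocab.POS_NULL)

pc = ParserConfiguration([root, he, slept], vocab)
for t in ['S', 'S', 'LA-nsubj', 'RA-' + vocab.root_label]:
    pc.parse_step(t)

print(len(pc.stack), len(pc.buffer))      # 1 0: only ROOT remains, buffer exhausted
print([str(a) for a in pc.dependencies])  # arcs slept->he (nsubj) and ROOT->slept (root)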

Step 3: Generate Training Data [44 points]
As you saw above, the dataset contains many sentences along with their gold dependency parses. In order to use a transition-based parsing algorithm, we must predict the next parser action at each time step, based on the current parser configuration. Thus, we will need to transform each sentence into a series of training examples, where the input features are based on the current parser configuration and the correct label is the gold action.

In this section, you will first write an algorithm to select the next parser action at each time step based on the gold dependency parse. Then, you will extract features from each step’s parser configuration, which the neural network will use to predict the next action.

TODO: Compute Gold Action [8 points]
Next, you will write a function get_gold_action(stack, buffer, gold_dependencies). Given a stack and buffer, this function should return the next action of the parser based on the gold dependencies. We encourage you to review the example of sentence parsing in the lecture slides before attempting to implement this function. This method is worth 8 points.

Let $s_i$ be the $i$th element of the stack and $h(s_i)$ be the head word of $s_i$. The pseudocode is as follows:

If the stack only contains ROOT:
(a) If the buffer is not empty, return S (shift).
(b) If the buffer is empty, return DONE, indicating the parse is complete.
If $h(s_2)=s_1$, return LA-label. Here, label is the label of the arc that attaches $s_2$ to $s_1$.
If $h(s_1)=s_2$ and $h(b_i) \neq s_1$ for all words $b_i$ in the buffer, return RA-label. Here, label is the label of the arc that attaches $s_1$ to $s_2$. This condition means that you cannot attach $s_1$ until everything in the buffer that depends on $s_1$ is attached. You should think about why this condition is necessary!
Otherwise:
(a) If the buffer is not empty, return S.
(b) If the buffer is empty, return None, indicating a failed parse (i.e., the sentence is non-projective).

(A small worked example appears after the implementation cell below.)

Hint: To get the $i$th word on the stack or buffer, call stack.get_si(i) or buffer.get_bi(i). To find the head of a word w, call gold_dependencies.getArcToHead(w).

In [15]:

def get_gold_action(stack, buffer, gold_dependencies):
    '''
    Given stack & buffer, compute the next gold action based on gold_dependencies.
    Args:
        stack: Stack object
        buffer: Buffer object
        gold_dependencies: Dependencies object
    Returns:
        action: str; 'S', 'LA-label', or 'RA-label', where 'label' is a valid label. Return None
                if no action is possible and 'DONE' if the parse is complete.
    '''
    ### TODO ###
    # Case 1: only ROOT is on the stack
    if len(stack) == 1:
        return 'S' if len(buffer) != 0 else 'DONE'

    s1, s2 = stack.get_si(1), stack.get_si(2)

    # Case 2: s1 is the head of s2, so we can take a left arc
    arc_to_s2 = gold_dependencies.getArcToHead(s2)
    if arc_to_s2.head == s1:
        return 'LA-' + arc_to_s2.label

    # Case 3: s2 is the head of s1 and nothing left in the buffer depends on s1,
    # so we can take a right arc
    arc_to_s1 = gold_dependencies.getArcToHead(s1)
    if arc_to_s1.head == s2 and all(gold_dependencies.getArcToHead(item).head != s1 for item in buffer):
        return 'RA-' + arc_to_s1.label

    # Case 4: otherwise shift if possible; None signals a non-projective (failed) parse
    return 'S' if len(buffer) != 0 else None
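
As a minimal sanity check of the pseudocode (reusing the toy "he slept" setup from Step 2, with 'nsubj' assumed to be a valid label), the oracle should emit S, S, LA-nsubj, RA-root, and finally DONE:

vocab = Vocabulary(train_set)
root = Word('<ROOT>', '<ROOT>', 0, vocab.WORD_ROOT, vocab.POS_ROOT)
he = Word('he', 'PRP', 1, vocab.WORD_UNK, vocab.POS_NULL)
slept = Word('slept', 'VBD', 2, vocab.WORD_UNK, vocab.POS_NULL)
gold = Dependencies(vocab)
gold.add(slept, he, 'nsubj')             # he is a dependent of slept
gold.add(root, slept, vocab.root_label)  # slept is the root of the sentence

pc = ParserConfiguration([root, he, slept], vocab)
action = get_gold_action(pc.stack, pc.buffer, gold)
while action not in (None, 'DONE'):
    print(action)                        # S, S, LA-nsubj, RA-root
    pc.parse_step(action)
    action = get_gold_action(pc.stack, pc.buffer, gold)
print(action)                            # DONE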

We provide you with 10 sentences to test this function. The first sentence is the example from the lecture slides, the next 8 sentences are artificial sentences designed to test edge cases, and the last sentence is an example from the training set. For each sentence, we have hard-coded the stack & buffer configurations that you should encounter as well as the correct action.

The to_print argument of the test function may be set to one of the following values:

all: Print every sentence.
incorrect: Print only the sentences that your function gets incorrect.
none: Don’t print any of the sentences (only show the score).

You may also toggle do_raise, which controls whether an exception in your code is raised (halting execution) or suppressed (thus allowing all test sentences to complete so you can see your score).

Note that Gradescope uses a set of different (hidden) tests, so you will want to fully test your code here.

In [16]:

if __name__ == '__main__':
    test(get_gold_action, to_print='incorrect', do_raise=True)

10 / 10 = 100.0% of test cases passed!

TODO: Generate Training Examples [8 points]
Now you will write a function to generate the training examples. Recall that each sentence needs to be converted into a series of separate training examples. Each training example will essentially be a partial parser configuration along with its gold action; the goal of the neural network will be to predict this action from the parser configuration.

In order to make this prediction, you need to extract features from the parser configuration. You will implement the feature extraction method in a future section; for now, we pass in a dummy function feat_extract(parser_config) that returns an empty feature list.

This function is worth 8 points.

In [17]:

def generate_training_examples(sentence, gold_dependencies, vocab, feat_extract=lambda parser_config: []):
    '''
    Create training instances for sentence.
    Inputs:
        sentence: list of Word objects
        gold_dependencies: Dependencies object that contains the complete gold dependency tree
        vocab: Vocabulary object
        feat_extract: Feature extraction function
    Outputs:
        training_examples: List of tuples (features, label), where features is a list and label is a string
    Pseudocode:
        (1) Initialize your parser configuration (note that the __init__ method of ParserConfiguration creates the stack & buffer for you)
        (2) Repeatedly call get_gold_action(...) on your current parser configuration until the gold action is 'DONE'
        (3) If the gold action is None at any step, return [] (indicating the sentence cannot be parsed; it is non-projective)
        (4) Otherwise, append the tuple (features, gold action) to training_examples, where features is the result of calling feat_extract on your parser configuration
        (5) Update your parser configuration according to the gold action
        (6) Return training_examples
    '''
    training_examples = []

    ### TODO ###
    parser = ParserConfiguration(sentence, vocab)
    action = get_gold_action(parser.stack, parser.buffer, gold_dependencies)
    while action != 'DONE':
        if action is None:
            return []  # non-projective sentence
        training_examples.append((feat_extract(parser), action))
        parser.parse_step(action)
        action = get_gold_action(parser.stack, parser.buffer, gold_dependencies)

    return training_examples
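
A quick way to eyeball the output, using the provided test bundle (the features are empty lists here because the dummy extractor is used, and the exact actions depend on the sentence):

sentences, _, _, _, _ = testing_bundle
vocab = Vocabulary(sentences)
sent, gold_deps = vocab.buildSentences(sentences)[0]
examples = generate_training_examples(sent, gold_deps, vocab)
print(len(examples))  # one (features, action) pair per oracle step
print(examples[:3])   # e.g. [([], 'S'), ([], 'S'), ([], 'LA-...')]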

We provide you with a test for this function on the same test sentences we used above.

In [18]:

if __name__ == '__main__':
    test_generate_training_examples(generate_training_examples, to_print='incorrect', do_raise=True)

10 / 10 = 100.0% of test cases passed!

The following function calls generate_training_examples(…) on every sentence in the dataset to create the full training data. You do not need to edit it.

In [19]:

### DO NOT EDIT ###

def generate_all_training_examples(vocab, sentences, feat_extract=lambda parser_config: []):
    '''
    Generate training examples for all sentences.
    '''
    all_training_examples = []
    successful_sents = 0
    for sentence in sentences:
        training_examples = generate_training_examples(sentence[0], sentence[1], vocab, feat_extract)
        if training_examples != []:
            all_training_examples += training_examples
            successful_sents += 1

    print("Successfully generated training examples for", successful_sents, "/", len(sentences), "sentences")
    print("Number of training examples:", len(all_training_examples))

    return all_training_examples

In [20]:

### DO NOT EDIT ###

if __name__ == '__main__':
    _vocab = Vocabulary(train_set)  # Variable just used in this cell
    generate_all_training_examples(_vocab, _vocab.buildSentences(train_set))

Successfully generated training examples for 11888 / 12543 sentences
Number of training examples: 373100

TODO: Extract Features [28 points]
By this point, you have written code to create individual training instances. Each instance is made up of a parser configuration along with the gold action that the classifier should be trained to predict.

In order to make this prediction, your neural network will have to rely on features extracted from each parser configuration. We follow the procedure described at the end of Section 3.1 of A Fast and Accurate Dependency Parser using Neural Networks. In total, we will extract 48 features from the parser configuration (18 word features, 18 POS features, and 12 label features):

Word & POS features for $s_1$, $s_2$, $s_3$ (top 3 items of the stack)
Word & POS features for $b_1$, $b_2$, $b_3$ (top 3 items of the buffer)
Word, POS, & label features for $lc_1(s_1)$, $lc_2(s_1)$, $lc_1(s_2)$, $lc_2(s_2)$ (the first & second leftmost children of the top 2 items on the stack)
Word, POS, & label features for $rc_1(s_1)$, $rc_2(s_1)$, $rc_1(s_2)$, $rc_2(s_2)$ (the first & second rightmost children of the top 2 items on the stack)
Word, POS, & label features for $lc_1(lc_1(s_1))$, $lc_1(lc_1(s_2))$, $rc_1(rc_1(s_1))$, $rc_1(rc_1(s_2))$ (the leftmost of the leftmost & rightmost of the rightmost children of the top 2 items on the stack)

You will write a separate function for each of the 5 bullets above. Each function will return a list of word features, a list of POS features, and (in the relevant cases) a list of label features. We also provide you with a test for each function.

A “feature” refers to the id (index in the Vocabulary) of a word, POS, or label. Your neural network will then construct embeddings for each id, much as you have seen in previous homeworks.
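
Once the five functions below are written, chaining them yields the full 48-id feature vector. The notebook's graded extract_features comes later; the sketch below is one plausible assembly (the concatenation order is illustrative, and the helper names refer to the functions you are about to implement):

def extract_features_sketch(parser_config):
    # 3 + 3 word ids and POS ids from the tops of the stack & buffer
    s_words, s_pos = get_top3_stack_features(parser_config)
    b_words, b_pos = get_top3_buffer_features(parser_config)
    words, pos, labels = s_words + b_words, s_pos + b_pos, []
    # 12 more word/POS ids and 12 label ids from the children of s1 and s2
    for fxn in (get_lc1_lc2_features, get_rc1_rc2_features, get_llc_rrc_features):
        for i in (1, 2):
            w, p, l = fxn(parser_config, i)
            words, pos, labels = words + w, pos + p, labels + l
    return words + pos + labels  # 18 word + 18 POS + 12 label ids = 48 features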

First, write 2 functions to extract features corresponding to the words at the top of the stack & buffer:

get_top3_stack_features(parser_config) [3 points]: Return word & POS features (ids) for $s_1$, $s_2$, $s_3$ (top 3 words on the stack).
get_top3_buffer_features(parser_config) [3 points]: Return word & POS features (ids) for $b_1$, $b_2$, $b_3$ (top 3 words on the buffer).

Wherever a particular word does not exist (such as when the stack or buffer has length $< 3$), use the appropriate NULL token. This is necessary to ensure that our neural network sees an equally sized feature vector for each example.

In [21]:

def get_top3_stack_features(parser_config):
    '''
    Get the word and POS features for s1, s2, and s3 (the top 3 items on the stack)
    Returns:
        word_features: List of word ids for s1, s2, s3 (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for s1, s2, s3 (use vocab.POS_NULL if a word does not exist)
    '''
    word_features, pos_features = [parser_config.vocab.WORD_NULL]*3, [parser_config.vocab.POS_NULL]*3

    ### TODO ###
    s1, s2, s3 = parser_config.stack.get_si(1), parser_config.stack.get_si(2), parser_config.stack.get_si(3)
    if s1 is not None:
        word_features[0], pos_features[0] = s1.word_id, s1.pos_id
    if s2 is not None:
        word_features[1], pos_features[1] = s2.word_id, s2.pos_id
    if s3 is not None:
        word_features[2], pos_features[2] = s3.word_id, s3.pos_id

    return word_features, pos_features

In [22]:

if __name__ == '__main__':
    test(get_top3_stack_features, to_print='incorrect', do_raise=True)

10 / 10 = 100.0% of test cases passed!

In [23]:

def get_top3_buffer_features(parser_config):
    '''
    Get the word and POS features for b1, b2, and b3 (the top 3 items on the buffer)
    Returns:
        word_features: List of word ids for b1, b2, b3 (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for b1, b2, b3 (use vocab.POS_NULL if a word does not exist)
    '''
    word_features, pos_features = [parser_config.vocab.WORD_NULL]*3, [parser_config.vocab.POS_NULL]*3

    ### TODO ###
    b1, b2, b3 = parser_config.buffer.get_bi(1), parser_config.buffer.get_bi(2), parser_config.buffer.get_bi(3)
    if b1 is not None:
        word_features[0], pos_features[0] = b1.word_id, b1.pos_id
    if b2 is not None:
        word_features[1], pos_features[1] = b2.word_id, b2.pos_id
    if b3 is not None:
        word_features[2], pos_features[2] = b3.word_id, b3.pos_id

    return word_features, pos_features

In [24]:

if __name__ == '__main__':
    test(get_top3_buffer_features, to_print='incorrect', do_raise=True)

10 / 10 = 100.0% of test cases passed!

The remaining features have to do with the leftmost & rightmost children of the words at the top of the stack. Write the following 2 helper functions to make it easier to access these dependents:

get_lc(parser_config, word) [2 points]: Return a list of arcs to dependents of word, sorted from left to right. Only include dependents that are to the left of word in the sentence.
get_rc(parser_config, word) [2 points]: Return a list of arcs to dependents of word, sorted from right to left. Only include dependents that are to the right of word in the sentence.

Hint: You can sort a list of objects using sorted(...) with the key parameter.

In [27]:

def get_lc(parser_config, word):
    '''
    Finds the left dependents of word, sorted from left to right.
    Returns:
        A list of Arcs whose head is word, sorted by the indices of the dependent word from left to right.
    '''
    ### TODO ###
    arcs = [x for x in parser_config.dependencies if x.head == word and x.dependent.idx < word.idx]
    return sorted(arcs, key=lambda a: a.dependent.idx)

In [28]:

if __name__ == '__main__':
    test(get_lc, to_print='incorrect', do_raise=True)

10 / 10 = 100.0% of test cases passed!

In [29]:

def get_rc(parser_config, word):
    '''
    Finds the right dependents of word, sorted from right to left.
    Returns:
        A list of Arcs whose head is word, sorted by the indices of the dependent word from right to left.
    '''
    ### TODO ###
    arcs = [x for x in parser_config.dependencies if x.head == word and x.dependent.idx > word.idx]
    return sorted(arcs, key=lambda a: a.dependent.idx, reverse=True)

In [30]:

if __name__ == '__main__':
    test(get_rc, to_print='incorrect', do_raise=True)

10 / 10 = 100.0% of test cases passed!

Let $lc_j(s_i)$ be the $j$th leftmost child of the $i$th item on the stack. Write the following function:

get_lc1_lc2_features(parser_config, i) [6 points]: Return word & POS features for $lc_1(s_i)$ and $lc_2(s_i)$. Additionally, return label features (the label ids) for the arcs that attach $lc_1(s_i)$ to $s_i$ and $lc_2(s_i)$ to $s_i$. As before, wherever a particular word does not exist, use the appropriate NULL token.

We will call this function with i=1 and i=2, accounting for the words $lc_1(s_1)$, $lc_2(s_1)$, $lc_1(s_2)$, $lc_2(s_2)$.

In [33]:

def get_lc1_lc2_features(parser_config, i):
    '''
    Get the word, POS, and label features for lc1(si) and lc2(si), where i in {1, 2}
    Returns:
        word_features: List of word ids for lc1(si), lc2(si) (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for lc1(si), lc2(si) (use vocab.POS_NULL if a word does not exist)
        label_features: List of label ids for lc1(si), lc2(si) (use vocab.LABEL_NULL if a word does not exist)
    '''
    assert i in {1, 2}
    word_features, pos_features, label_features = [parser_config.vocab.WORD_NULL]*2, [parser_config.vocab.POS_NULL]*2, [parser_config.vocab.LABEL_NULL]*2

    ### TODO ###
    si = parser_config.stack.get_si(i)
    lc = get_lc(parser_config, si)
    if len(lc) >= 1:
        lc1 = lc[0].dependent
        word_features[0], pos_features[0], label_features[0] = lc1.word_id, lc1.pos_id, lc[0].label_id
    if len(lc) >= 2:
        lc2 = lc[1].dependent
        word_features[1], pos_features[1], label_features[1] = lc2.word_id, lc2.pos_id, lc[1].label_id

    return word_features, pos_features, label_features

In [34]:

if __name__ == '__main__':
    test(get_lc1_lc2_features, i=1, to_print='incorrect', do_raise=True)  # call with i=1

10 / 10 = 100.0% of test cases passed!

In [35]:

if __name__ == '__main__':
    test(get_lc1_lc2_features, i=2, to_print='incorrect', do_raise=True)  # call with i=2

10 / 10 = 100.0% of test cases passed!

You will now write the analogous function for the rightmost children. Let $rc_j(s_i)$ be the $j$th rightmost child of the $i$th item on the stack. Write the following function:

get_rc1_rc2_features(parser_config, i) [6 points]: Return word & POS features for $rc_1(s_i)$ and $rc_2(s_i)$. Additionally, return label features (the label ids) for the arcs that attach $rc_1(s_i)$ to $s_i$ and $rc_2(s_i)$ to $s_i$. As before, wherever a particular word does not exist, use the appropriate NULL token.

We will call this function with i=1 and i=2, accounting for the words $rc_1(s_1)$, $rc_2(s_1)$, $rc_1(s_2)$, $rc_2(s_2)$.

In [36]:

def get_rc1_rc2_features(parser_config, i):
    '''
    Get the word, POS, and label features for rc1(si) and rc2(si), where i in {1, 2}
    Returns:
        word_features: List of word ids for rc1(si), rc2(si) (use vocab.WORD_NULL if a word does not exist)
        pos_features: List of POS ids for rc1(si), rc2(si) (use vocab.POS_NULL if a word does not exist)
        label_features: List of label ids for rc1(si), rc2(si) (use vocab.LABEL_NULL if a word does not exist)
    '''
    assert i in {1, 2}
    word_features, pos_features, label_features = [parser_config.vocab.WORD_NULL]*2, [parser_config.vocab.POS_NULL]*2, [parser_config.vocab.LABEL_NULL]*2

    ### TODO ###
    si = parser_config.stack.get_si(i)
    rc = get_rc(parser_config, si)
    if len(rc) >= 1:
        rc1 = rc[0].dependent
        word_features[0], pos_features[0], label_features[0] = rc1.word_id, rc1.pos_id, rc[0].label_id
    if len(rc) >= 2:
        rc2 = rc[1].dependent
        word_features[1], pos_features[1], label_features[1] = rc2.word_id, rc2.pos_id, rc[1].label_id

    return word_features, pos_features, label_features

In [37]:

if __name__ == '__main__':
    test(get_rc1_rc2_features, i=1, to_print='incorrect', do_raise=True)  # call with i=1

10 / 10 = 100.0% of test cases passed!

In [38]:

if __name__ == '__main__':
    test(get_rc1_rc2_features, i=2, to_print='incorrect', do_raise=True)  # call with i=2

10 / 10 = 100.0% of test cases passed!

Finally, write the following function:

get_llc_rrc_features(parser_config, i) [6 points]: Return word & POS features for $lc_1(lc_1(s_i))$ and $rc_1(rc_1(s_i))$. Additionally, return label features (the label ids) for the arcs that attach $lc_1(lc_1(s_i))$ to $lc_1(s_i)$ and $rc_1(rc_1(s_i))$ to $rc_1(s_i)$. As before, wherever a particular word does not exist, use the appropriate NULL token.

We will call this function with i=1 and i=2, accounting for the words $lc_1(lc_1(s_1))$, $lc_1(lc_1(s_2))$, $rc_1(rc_1(s_1))$, $rc_1(rc_1(s_2))$.

In [39]:

def get_llc_rrc_features(parser_config, i):
    '''
    Get the word, POS, and label features for lc1(lc1(si)) and rc1(rc1(si)), where i in {1, 2}
    Returns:
        word_features: List of word ids for lc1(lc1(si)) and rc1(rc1(si)) (use vocab.WORD_NULL if a word does not exist)