
CS 224d: Assignment #3
Updated Thursday 12th May, 2016 at 11:30pm
Due date: 5/21 11:59 PM PST (You are allowed to use three (3) late days maximum for this assignment)
This handout consists of several homework problems, as well as instructions on the “deliverables” associated with the coding portions of this assignment.
These questions require thought, but do not require long answers. Please be as concise as possible.
We encourage students to discuss in groups for assignments. However, each student must finish the problem set and programming assignment individually, and must turn in her/his assignment. We ask that you abide by the university Honor Code and that of the Computer Science department, and make sure that all of your submitted work is done by yourself.
Please review any additional instructions posted on the assignment page at http://cs224d.stanford.edu/assignments.html. When you are ready to submit, please follow the instructions on the course website.
1 RNNs (Recursive Neural Networks)
Welcome to SAIL (Stanford Artificial Intelligence Lab): Congrats! You have just been given a Research Assistantship in SAIL. Your task is to discover the power of Recursive Neural Networks (RNNs), so you plan an experiment to show their effectiveness on Positive/Negative Sentiment Analysis. In this part, you will derive the forward and backpropagation equations, implement them, and test the results.
Our RNN has one ReLU layer and one softmax layer, and uses cross-entropy loss as its cost function. We follow the parse tree from the leaf nodes up to the top of the tree, evaluating the cost at each node. During backprop, we follow the exact opposite path. Figure 1 shows an example of such an RNN applied to the simple sentence "I love this assignment". The following equations are sufficient to describe our model:

[Figure 1: RNN (Recursive Neural Network) example]
CE(y, ŷ) = −∑_i y_i log(ŷ_i)
where y is the label represented as a one-hot row vector, and ŷ is a row vector containing the predicted probabilities for all classes. In our case, y ∈ R^{1×5} and ŷ ∈ R^{1×5}, representing our 5 sentiment classes: Really Negative, Negative, Neutral, Positive, and Really Positive.
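As a concrete reference, a minimal numpy version of this loss might look like the sketch below (the helper name ce_loss is ours for illustration, not the starter code's):

    import numpy as np

    def ce_loss(y, y_hat):
        # Cross-entropy for a one-hot row vector y against predictions y_hat.
        # With one-hot y this reduces to -log(probability of the true class).
        return -np.sum(y * np.log(y_hat))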
Furthermore,

h^{(1)} = max([h^{(1)}_{Left}, h^{(1)}_{Right}] W^{(1)} + b^{(1)}, 0)

ŷ = softmax(h^{(1)} U + b^{(s)})

where h^{(1)}_{Left} is the vector representation of the left subtree (possibly a word vector) and h^{(1)}_{Right} that of the right subtree. For clarity, L ∈ R^{|V|×d}, W^{(1)} ∈ R^{2d×d}, b^{(1)} ∈ R^{1×d}, U ∈ R^{d×5}, b^{(s)} ∈ R^{1×5}.
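In code, one forward step at a single node might look like the following sketch (the function names and signatures here are ours for illustration; the starter code's interface may differ):

    import numpy as np

    def softmax(scores):
        # Row-wise softmax with max-subtraction for numerical stability.
        e = np.exp(scores - np.max(scores, axis=-1, keepdims=True))
        return e / np.sum(e, axis=-1, keepdims=True)

    def node_forward(h_left, h_right, W1, b1, U, bs):
        # Concatenate the children, apply the ReLU layer, then classify.
        # h_left, h_right: (1, d) rows; returns h (1, d) and y_hat (1, 5).
        h = np.maximum(np.hstack([h_left, h_right]).dot(W1) + b1, 0)
        return h, softmax(h.dot(U) + bs)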
(a) (20 points) Follow the example parse tree in Figure 1, in which we are given a parse tree and truth labels y for each node. Starting with Node 1, then moving to Node 2, and finishing with Node 3, write the update rules for W^{(1)}, b^{(1)}, U, b^{(s)}, and L after the evaluation of ŷ against our truth y. This means that at each node we evaluate:
δ_3 = ŷ − y

as our first error vector, and we backpropagate that error through the network, aggregating the gradient at each node for:
∂J/∂U, ∂J/∂b^{(s)}, ∂J/∂W^{(1)}, ∂J/∂b^{(1)}, ∂J/∂L_i
Points will be deducted if you do not express the derivative of the activation function (ReLU) in terms of its function value (as in Assignments 1 and 2), or if you do not express the gradients using an "error vector" (δ_i) propagated back to each layer. A tip on notation: use δ_below for error being sent down to the node below and δ_above for error arriving from the node above; this will help you think about the problem in the right way. Note that you should not be updating the gradient for L_i at Node 1, but error should certainly be leaving Node 1!
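As a sanity check on the notation (this is only the familiar softmax-layer pattern from Assignments 1 and 2, not the full derivation you are asked to produce), the contribution of a single node to the softmax-layer gradients is:

∂J/∂U = h^{(1)⊤} δ_3,   ∂J/∂b^{(s)} = δ_3

The remaining gradients follow by sending δ_3 U^⊤ into the ReLU layer and on down the tree.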
(b) (80 points) Implementation time! We have simplified the problem to reduce training time by binarizing the sentence-level sentiment labels: every sentence is either positive or negative. The internal nodes, however, can be positive, negative, or neutral. During training, the cost function includes predictions over all nodes that have a sentiment associated with them and ignores the neutral nodes. At test time, we are only interested in performance at the full-sentence level. This is all provided for you in the starter code.
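To make the node-level masking concrete, here is a sketch of a recursive forward pass that accumulates cross-entropy only at labeled nodes. It reuses the softmax and node_forward sketches above; the node fields (.isLeaf, .word, .left, .right, .label) and the parameter container p are illustrative, not the starter code's API:

    import numpy as np

    def forward_tree(node, p):
        # Returns (h, cost) for the subtree rooted at node.
        # p holds the parameters L, W1, b1, U, bs.
        if node.isLeaf:
            h = p.L[node.word].reshape(1, -1)        # (1, d) word-vector lookup
            y_hat = softmax(h.dot(p.U) + p.bs)
            cost = 0.0
        else:
            h_l, cost_l = forward_tree(node.left, p)
            h_r, cost_r = forward_tree(node.right, p)
            h, y_hat = node_forward(h_l, h_r, p.W1, p.b1, p.U, p.bs)
            cost = cost_l + cost_r
        if node.label is not None:                    # neutral nodes are skipped
            cost -= np.log(y_hat[0, node.label])      # cross-entropy at this node
        return h, cost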
• (a) Download, unzip, and have a look at the code base.
• (b) From the command line, run ./setup.sh to download the labeled parse tree dataset.
• (c) Begin implementing the model in rnn.py by filling out the outline (a shape-only parameter sketch appears after this list).
• (d) Run the model and show us the resulting loss plot, final training accuracy, and final validation accuracy. You should be able to achieve an accuracy of 75%.
• (e) (10 points extra credit) Tune the parameters of the model to improve performance, and report your new loss plot, final training accuracy, and final validation accuracy. Explain why the changes you made helped achieve the improvement.
• (f) Ensure that your code uses the best model configuration and that the trained weights are present in the weights folder. Run ./prepare_submission.sh and submit the resulting zip file.
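For step (c), the parameter shapes stated in Section 1 translate directly into code. The skeleton below is offered only as a shape reference, using a common small-random-weights initialization; the names and initialization scheme in the actual starter code may differ:

    import numpy as np

    class RNNParams:
        def __init__(self, vocab_size, d, n_classes=5, seed=0):
            rng = np.random.RandomState(seed)
            self.L  = 0.01 * rng.randn(vocab_size, d)   # L     in R^{|V| x d}
            self.W1 = 0.01 * rng.randn(2 * d, d)        # W^(1) in R^{2d x d}
            self.b1 = np.zeros((1, d))                  # b^(1) in R^{1 x d}
            self.U  = 0.01 * rng.randn(d, n_classes)    # U     in R^{d x 5}
            self.bs = np.zeros((1, n_classes))          # b^(s) in R^{1 x 5}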