Assignment II
COMP 5900 Advanced Machine Learning Fall 2019
Note: Submit your assignment as a single typed PDF. No handwritten notes will be accepted.

Part I LSTM and RNN
In this part you will use an RNN and an LSTM for sentiment analysis of IMDB movie reviews. You will use the Colab environment that you are familiar with from Assignment 1. Please download the Jupyter notebook provided on the course website, save it in your Google Drive, and open it in Colab. Go to Edit -> Notebook settings and set the Runtime type to Python 3 and the Hardware accelerator to GPU. The provided code classifies movie reviews as either positive or negative using two bidirectional LSTM layers stacked on top of each other.
Q1.1 (1 Mark) Run the cells one-by-one and follow the instructions provided in the notebook. Train the network for 5 epochs. How many parameters are in the two bi-LSTM layers? What is the accuracy on the test set? Write your answers in the table below. Hint: Your two-level LSTM looks like this:
LSTM(100, 200, num_layers=2, dropout=0.5, bidirectional=True)
Q1.2 (1 Mark) Modify the code such that it implements a 2-level bi-RNN. Train the network for 5 epochs. How many parameters are in the two bi-RNN layers? What is the accuracy on the test set? Write your answers in the table below. Hint: your two-level RNN should look like this:
RNN(100, 200, num_layers=2, dropout=0.5, bidirectional=True)
Q1.3 (1 Mark) Modify the code such that it implements a one-level bi-RNN. Train the network for 5 epochs. How many parameters are in the bi-RNN layer? What is the accuracy on the test set? Write your answers in the table below. Hint: your one-level bi-RNN should look like this:
RNN(100, 200, dropout=0.5, bidirectional=True)
Q1.4 (1 Mark) Modify the code such that it implements a one-level unidirectional RNN. Train the network for 5 epochs. How many parameters are in the RNN layer? What is the accuracy on the test set? Write your answers in the table below. Hint: your one-level RNN should look like this:
RNN(100, 200, dropout=0.5)
Model           | Total # of parameters in RNN/LSTM layer(s) | Test accuracy %
2-level bi-LSTM |                                            |
2-level bi-RNN  |                                            |
bi-RNN          |                                            |
RNN             |                                            |
Note: No need to submit your code.
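The parameter counts asked for in Q1.1 to Q1.4 can be sanity-checked by hand. The sketch below is one way to do that, assuming the notebook uses PyTorch's torch.nn.RNN and torch.nn.LSTM, where each direction of each layer stores an input-to-hidden weight matrix, a hidden-to-hidden weight matrix, and two bias vectors. The function name recurrent_param_count and the tiny layer sizes are hypothetical and chosen only for illustration; dropout adds no trainable parameters.

```python
def recurrent_param_count(input_size, hidden_size, num_layers=1,
                          bidirectional=False, gates=1):
    """Count trainable parameters in a stacked recurrent module.

    Assumes PyTorch's layout: each direction of each layer holds
    weight_ih (gates*H x in), weight_hh (gates*H x H) and two bias
    vectors of length gates*H. Use gates=1 for a plain RNN and
    gates=4 for an LSTM.
    """
    directions = 2 if bidirectional else 1
    total = 0
    for layer in range(num_layers):
        # Layers above the first read the concatenated outputs of
        # all directions of the layer below.
        layer_input = input_size if layer == 0 else hidden_size * directions
        per_direction = gates * hidden_size * (layer_input + hidden_size + 2)
        total += directions * per_direction
    return total


# Deliberately tiny, hypothetical sizes (not the assignment's).
print(recurrent_param_count(4, 3, gates=4))             # one-level LSTM
print(recurrent_param_count(4, 3, num_layers=2,
                            bidirectional=True))        # 2-level bi-RNN
```

If the notebook's model is a PyTorch module, the result can be compared against summing p.numel() over the parameters of the recurrent layer.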

Part II Triplet Loss
In this part you are going to explore how “triplet loss” works. There is no implementation in this part. Triplet loss is a loss function for artificial neural networks where a baseline (anchor) input is compared to a positive (truthy) input and a negative (falsy) input. The distance from the anchor input to the positive input is minimized, and the distance from the anchor input to the negative input is maximized. The loss function can be described using a Euclidean distance function. The triplet is formed by drawing an anchor input, a positive input that describes the same entity as the anchor, and a negative input that does not describe the same entity as the anchor. These inputs are then run through the network, and the outputs are used in the loss function.
[Figure: an input sample $x_i$ is passed through the model $g$ to produce the embedding $f_i$.]
Let $g$ be an arbitrary model (e.g. a neural network) and $f_i$ be the embedding for the input sample $x_i$. We are going to learn the embeddings $f_i$. For simplicity, assume that $g$ consists of only a single dense layer with parameters $W$. Therefore $f_i = W x_i$.
Let $t = (f_a, f_p, f_n)$ be a triplet where $f_a$, $f_p$ and $f_n$ are embeddings corresponding to “anchor”, “positive” and “negative” samples respectively. Let’s define the loss for the t-th triplet as
$$L^t = L^t(f_a, f_p, f_n) = \max\left(0,\ \tfrac{1}{2}\|f_a - f_p\|^2 + \alpha - \tfrac{1}{2}\|f_a - f_n\|^2\right), \qquad (1)$$
and the overall loss $L = \sum_{t=1}^{T} L^t$ where $T$ is the total number of triplets.
Q2 (1.5 Marks) Determine $\frac{\partial L^t}{\partial f_a}$, $\frac{\partial L^t}{\partial f_p}$ and $\frac{\partial L^t}{\partial f_n}$, if $\tfrac{1}{2}\|f_a - f_p\|^2 + \alpha - \tfrac{1}{2}\|f_a - f_n\|^2 \le 0$.
Q3 (1.5 Marks) Determine $\frac{\partial L^t}{\partial f_a}$, $\frac{\partial L^t}{\partial f_p}$ and $\frac{\partial L^t}{\partial f_n}$, if $\tfrac{1}{2}\|f_a - f_p\|^2 + \alpha - \tfrac{1}{2}\|f_a - f_n\|^2 > 0$.
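To make the two cases in Q2 and Q3 concrete, the following minimal sketch evaluates Equation (1) for two sets of embeddings. The function names and the 2-D embedding values are hypothetical and chosen only for illustration. The sketch computes loss values only; the gradients are left to Q2 and Q3.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(f_a, f_p, f_n, alpha):
    """Equation (1): max(0, 1/2*||f_a - f_p||^2 + alpha - 1/2*||f_a - f_n||^2)."""
    margin_term = (0.5 * squared_distance(f_a, f_p) + alpha
                   - 0.5 * squared_distance(f_a, f_n))
    return max(0.0, margin_term)


# Hypothetical 2-D embeddings, chosen only for illustration.
anchor, positive = [0.0, 0.0], [1.0, 0.0]
far_negative, near_negative = [3.0, 0.0], [1.0, 1.0]

# Negative is far from the anchor: the term inside max is <= 0 (Q2 case).
print(triplet_loss(anchor, positive, far_negative, alpha=1.0))
# Negative is close to the anchor: the term inside max is > 0 (Q3 case).
print(triplet_loss(anchor, positive, near_negative, alpha=1.0))
```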
Assume that our dataset has six samples from two classes. Here is our dataset D.
Sample      | $x_1$ | $x_2$ | $x_3$ | $x_4$ | $x_5$ | $x_6$
Class Label | 1     | 1     | 1     | 2     | 2     | 2
Q4 (1 Mark) How many unique triplets can be generated from dataset D?
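As an illustration of how triplets are formed from class labels, here is a small sketch on a hypothetical four-sample dataset (not dataset D). It assumes that a triplet is an ordered (anchor, positive, negative) tuple in which the anchor and positive are distinct samples of the same class and the negative belongs to a different class. Whether swapping anchor and positive yields a different triplet is a counting convention you should state in your answer.

```python
from itertools import permutations


def enumerate_triplets(labels):
    """Return all (anchor, positive, negative) index triplets.

    Assumption: triplets are ordered, the anchor and positive are
    distinct samples of the same class, and the negative comes from
    a different class.
    """
    triplets = []
    for a, p in permutations(range(len(labels)), 2):
        if labels[a] != labels[p]:
            continue
        for n, label in enumerate(labels):
            if label != labels[a]:
                triplets.append((a, p, n))
    return triplets


# Hypothetical toy labels: four samples, two per class (not dataset D).
toy_labels = [1, 1, 2, 2]
toy_triplets = enumerate_triplets(toy_labels)
print(len(toy_triplets))
print(toy_triplets[:3])
```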

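Assume that only 10 out of all triplets satisfy $\tfrac{1}{2}\|f_a - f_p\|^2 + \alpha - \tfrac{1}{2}\|f_a - f_n\|^2 > 0$. Here are the 10 triplets.
$t_1 = (x_1, x_2, x_5)$, $t_2 = (x_1, x_3, x_4)$, $t_3 = (x_2, x_1, x_6)$, $t_4 = (x_2, x_3, x_4)$, $t_5 = (x_3, x_1, x_5)$,
$t_6 = (x_5, x_6, x_2)$, $t_7 = (x_6, x_4, x_3)$, $t_8 = (x_6, x_4, x_2)$.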
We are going to update the weights $W$ using gradient descent. The general procedure is as follows:
1. Perform the forward pass for the input data and compute the embeddings $f_1, \ldots, f_6$.
2. Compute $L^t$ for every triplet using Equation (1) and compute the overall loss $L$.
3. Compute the gradient of the loss $L$ for every training sample, i.e. compute $\Delta_1, \cdots, \Delta_6$ where $\Delta_i = \frac{\partial L}{\partial f_i} \frac{\partial f_i}{\partial W}$.
4. Update $W$ using gradient descent: $W_{new} = W_{old} - \eta \frac{1}{6} \sum_{i=1}^{6} \Delta_i$.
Let’s take a closer look at step 3. In our simple network $f_i = W x_i$. Therefore $\frac{\partial f_i}{\partial W} = x_i$. For computing $\Delta_i = \frac{\partial L}{\partial f_i} x_i$ we need to compute the gradient of the loss $L$ with respect to the embedding of every sample, i.e. we need to compute $\frac{\partial L}{\partial f_1}, \cdots, \frac{\partial L}{\partial f_6}$. We want to minimize the loss for all triplets, so the loss is $L = \sum_{t=1}^{8} L^t$ where $L^t$ is the triplet loss that works on three samples of the triplet $t$. In Q6, you are going to compute $\frac{\partial L}{\partial f_1}, \cdots, \frac{\partial L}{\partial f_6}$ for our toy dataset and in Q5 you will compute some intermediate values needed to do so.
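The four-step procedure can be tried out numerically before deriving anything by hand. The sketch below is a rough illustration under stated assumptions: it uses a hypothetical toy problem (three 2-D samples and a single triplet, not dataset D), and it replaces the analytic gradients of step 3 with a central-difference estimate of the gradient of L with respect to W. By the chain rule, the sum of the per-sample terms in step 3 equals this gradient. All function names are made up for this example. A check of this kind can also be used to verify hand-derived answers.

```python
def embed(W, x):
    """Single dense layer without bias: f = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def overall_loss(W, samples, triplets, alpha):
    """Steps 1 and 2: forward pass, then sum Equation (1) over triplets."""
    f = [embed(W, x) for x in samples]
    loss = 0.0
    for a, p, n in triplets:
        loss += max(0.0, 0.5 * squared_distance(f[a], f[p]) + alpha
                    - 0.5 * squared_distance(f[a], f[n]))
    return loss


def numerical_gradient(W, samples, triplets, alpha, eps=1e-6):
    """Stand-in for step 3: central-difference estimate of dL/dW."""
    grad = [[0.0] * len(row) for row in W]
    for r, row in enumerate(W):
        for c, original in enumerate(row):
            row[c] = original + eps
            loss_plus = overall_loss(W, samples, triplets, alpha)
            row[c] = original - eps
            loss_minus = overall_loss(W, samples, triplets, alpha)
            row[c] = original  # restore the weight
            grad[r][c] = (loss_plus - loss_minus) / (2 * eps)
    return grad


def gradient_step(W, grad, eta, num_samples):
    """Step 4: W_new = W_old - eta * (1 / num_samples) * dL/dW."""
    return [[w - eta * g / num_samples for w, g in zip(w_row, g_row)]
            for w_row, g_row in zip(W, grad)]


# Hypothetical toy problem (not dataset D): three 2-D samples, one triplet.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
triplets = [(0, 1, 2)]  # (anchor, positive, negative) indices
W = [[1.0, 0.0], [0.0, 1.0]]
alpha, eta = 1.0, 0.1

loss_before = overall_loss(W, samples, triplets, alpha)
grad = numerical_gradient(W, samples, triplets, alpha)
W_new = gradient_step(W, grad, eta, num_samples=len(samples))
loss_after = overall_loss(W_new, samples, triplets, alpha)
print(loss_before, loss_after)
```

Finite differences are used here only because they need no derivation. In practice the analytic gradients from Q2 and Q3 are cheaper and exact.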
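Q5 (4 Marks)
Q5.1 Determine $\frac{\partial L^{1}}{\partial f_1}$, $\frac{\partial L^{1}}{\partial f_2}$, $\frac{\partial L^{1}}{\partial f_3}$, $\frac{\partial L^{1}}{\partial f_4}$, $\frac{\partial L^{1}}{\partial f_5}$ and $\frac{\partial L^{1}}{\partial f_6}$.
Q5.2 Determine $\frac{\partial L^{2}}{\partial f_1}$, $\frac{\partial L^{2}}{\partial f_2}$, $\frac{\partial L^{2}}{\partial f_3}$, $\frac{\partial L^{2}}{\partial f_4}$, $\frac{\partial L^{2}}{\partial f_5}$ and $\frac{\partial L^{2}}{\partial f_6}$.
Q5.3 Determine $\frac{\partial L^{3}}{\partial f_1}$, $\frac{\partial L^{3}}{\partial f_2}$, $\frac{\partial L^{3}}{\partial f_3}$, $\frac{\partial L^{3}}{\partial f_4}$, $\frac{\partial L^{3}}{\partial f_5}$ and $\frac{\partial L^{3}}{\partial f_6}$.
Q5.4 Determine $\frac{\partial L^{4}}{\partial f_1}$, $\frac{\partial L^{4}}{\partial f_2}$, $\frac{\partial L^{4}}{\partial f_3}$, $\frac{\partial L^{4}}{\partial f_4}$, $\frac{\partial L^{4}}{\partial f_5}$ and $\frac{\partial L^{4}}{\partial f_6}$.
Q5.5 Determine $\frac{\partial L^{5}}{\partial f_1}$, $\frac{\partial L^{5}}{\partial f_2}$, $\frac{\partial L^{5}}{\partial f_3}$, $\frac{\partial L^{5}}{\partial f_4}$, $\frac{\partial L^{5}}{\partial f_5}$ and $\frac{\partial L^{5}}{\partial f_6}$.
Q5.6 Determine $\frac{\partial L^{8}}{\partial f_1}$, $\frac{\partial L^{8}}{\partial f_2}$, $\frac{\partial L^{8}}{\partial f_3}$, $\frac{\partial L^{8}}{\partial f_4}$, $\frac{\partial L^{8}}{\partial f_5}$ and $\frac{\partial L^{8}}{\partial f_6}$.
Q5.7 Determine $\frac{\partial L^{9}}{\partial f_1}$, $\frac{\partial L^{9}}{\partial f_2}$, $\frac{\partial L^{9}}{\partial f_3}$, $\frac{\partial L^{9}}{\partial f_4}$, $\frac{\partial L^{9}}{\partial f_5}$ and $\frac{\partial L^{9}}{\partial f_6}$.
Q5.8 Determine $\frac{\partial L^{10}}{\partial f_1}$, $\frac{\partial L^{10}}{\partial f_2}$, $\frac{\partial L^{10}}{\partial f_3}$, $\frac{\partial L^{10}}{\partial f_4}$, $\frac{\partial L^{10}}{\partial f_5}$ and $\frac{\partial L^{10}}{\partial f_6}$.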

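Q6 (3 Marks) Determine $\frac{\partial L}{\partial f_1}$, $\frac{\partial L}{\partial f_2}$, $\frac{\partial L}{\partial f_3}$, $\frac{\partial L}{\partial f_4}$, $\frac{\partial L}{\partial f_5}$ and $\frac{\partial L}{\partial f_6}$. The final expression should be in its simplest form.
Hint: for example, $\frac{\partial L}{\partial f_1} = \frac{\partial L^{1}}{\partial f_1} + \frac{\partial L^{2}}{\partial f_1} + \cdots + \frac{\partial L^{8}}{\partial f_1}$.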