CSCI 567
Homework #2 Programming Assignments Due: 11:59 pm, Oct. 7th, 2018
General Instructions
Starting point Your repository will now have a directory ‘homework2/’. Please do not change the name of this repository or the names of any files we have added to it. Run git pull and you will find the following:
- Python scripts which you need to amend:
– logistic.py – dnn_misc.py – dnn_cnn_2.py
- Python scripts which you are not allowed to modify:
– dnn_mlp.py, dnn_mlp_nononlinear.py, dnn_cnn.py, hw2_dnn_check.py, dnn_im2col.py and data_loader.py
- Helper scripts that you will use to generate output files: q33.sh, q34.sh, q35.sh, q36.sh, q37.sh, q38.sh, q310.sh, logistic_binary.sh and logistic_multiclass.sh
Environment This assignment has to be done in Python 3.5.2. Make sure you have the correct version installed. As in previous assignments, there are multiple ways you can install Python 3, for example:
- Install Virtualbox and import this virtual machine. Everything is set up already in this VM. Your submission will eventually be graded within this VM too.
- Alternatively you can also use virtualenv or miniconda to create a Python 3.5.2 environment for this programming assignment.
- [For testing purposes only] Other online coding platforms such as Google Colab, which pre-installs all the packages used in this assignment (but with different versions).
Python Packages You are allowed to use the following Python packages:
• all built-in packages in Python 3.5.2, such as sys.
• numpy (1.13.1)
• scipy (0.19.1)
• matplotlib (2.0.2)
You will use Numpy mostly; in contrast, Scipy is usually not needed unless you have special needs. As for Matplotlib, you can use it to visualize the results, but it is not required for implementing this assignment.
You will also need the following package for testing your code, but do not import it yourself or use any of its functions in your implementation:
• sklearn (0.19.0)
Download the data Please download mnist_subset.json from Piazza/Resource/Homework. DO NOT push it into your repository when you submit your results; otherwise, you will get a 20% deduction from your score for this assignment.
Submission Instructions The following will constitute your submission:
• The three Python scripts that you need to amend according to Sect. 2 and Sect. 3. Make sure that you have committed your changes.
• Seven .json files and two .out files. These are the outputs from the nine helper scripts.
– logistic_binary.out
– logistic_multiclass.out
– MLP_lr0.01_m0.0_w0.0_d0.0.json
– MLP_lr0.01_m0.0_w0.0_d0.5.json
– MLP_lr0.01_m0.0_w0.0_d0.95.json
– LR_lr0.01_m0.0_w0.0_d0.0.json
– CNN_lr0.01_m0.0_w0.0_d0.5.json
– CNN_lr0.01_m0.9_w0.0_d0.5.json
– CNN2_lr0.001_m0.9_w0.0_d0.5.json
Problem 1 High-level descriptions
1.1 Dataset (Same as in Homework 1.) We will use mnist_subset (images of handwritten digits from 0 to 9). The dataset is stored in a JSON-formatted file mnist_subset.json. You can access its training, validation, and test splits using the keys ‘train’, ‘valid’, and ‘test’, respectively. For example, suppose we load mnist_subset.json to the variable x. Then, x['train'] refers to the training set of mnist_subset. This set is a list with two elements: x['train'][0] containing the features of size N (samples) × D (dimension of features), and x['train'][1] containing the corresponding labels of size N.
Besides, for logistic regression in Sect. 2, you will be using synthetic datasets with two, three and five classes.
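As an illustration of the layout described above, the following minimal sketch reads one split from a file with the same structure. It is not the provided data_loader.py, whose interface is not shown here. The helper name load_split and the tiny stand-in file are assumptions made only so that the snippet runs without mnist_subset.json.

```python
import json
import os
import tempfile

import numpy as np


def load_split(path, split):
    """Return (features, labels) for one split of a mnist_subset-style JSON file.

    Hypothetical helper for illustration; `split` is 'train', 'valid' or 'test'.
    """
    with open(path, "r") as f:
        data = json.load(f)
    features, labels = data[split]      # each split is a list with two elements
    return np.asarray(features, dtype=float), np.asarray(labels)


# Tiny stand-in file with the same layout (3 samples, 2 features per sample).
toy = {s: [[[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]], [3, 7, 1]]
       for s in ("train", "valid", "test")}
toy_path = os.path.join(tempfile.mkdtemp(), "toy_subset.json")
with open(toy_path, "w") as f:
    json.dump(toy, f)

Xtrain, ytrain = load_split(toy_path, "train")
print(Xtrain.shape, ytrain.shape)       # N x D features and N labels
```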
1.2 Tasks You will be asked to implement binary and multiclass classification (Sect. 2) and neural networks (Sect. 3). Specifically, you will
- finish the implementation of all python functions in our template codes.
- run your code by calling the specified scripts to generate output files.
- add, commit, and push (1) all *.py files, and (2) all *.json and *.out files that you have amended or created.
In the next two subsections, we will provide a high-level checklist of what you need to do. You are not responsible for loading/pre-processing data; we have done that for you. For specific instructions, please refer to text in Sect. 2 and Sect. 3, as well as corresponding python scripts.
1.2.1 Logistic regression
Coding In logistic.py, finish implementing the following functions: binary_train, binary_predict, multinomial_train, multinomial_predict, OVR_train and OVR_predict. Refer to logistic.py and Sect. 2 for more information.
Running your code Run the scripts logistic_binary.sh and logistic_multiclass.sh after you finish your implementation. This will output:
• logistic_binary.out
• logistic_multiclass.out
What to submit Submit logistic.py, logistic_binary.out, logistic_multiclass.out.
1.2.2 Neural networks
Preparation Read Sect. 3 as well as dnn_mlp.py and dnn_cnn.py.
Coding First, in dnn_misc.py, finish implementing
• forward and backward functions in class linear_layer
• forward and backward functions in class relu
• backward function in class dropout (before that, please read the forward function).
Refer to dnn_misc.py and Sect. 3 for more information.
Second, in dnn_cnn_2.py, finish implementing the main function. There are five TODO items. Refer to dnn_cnn_2.py and Sect. 3 for more information.
Running your code Run the scripts q33.sh, q34.sh, q35.sh, q36.sh, q37.sh, q38.sh, q310.sh after you finish your implementation. This will generate, respectively,
• MLP_lr0.01_m0.0_w0.0_d0.0.json
• MLP_lr0.01_m0.0_w0.0_d0.5.json
• MLP_lr0.01_m0.0_w0.0_d0.95.json
• LR_lr0.01_m0.0_w0.0_d0.0.json
• CNN_lr0.01_m0.0_w0.0_d0.5.json
• CNN_lr0.01_m0.9_w0.0_d0.5.json
• CNN2_lr0.001_m0.9_w0.0_d0.5.json
What to submit Submit dnn_misc.py, dnn_cnn_2.py, and the above seven .json files.
1.3 Cautions
- Do not import packages that are not listed above (See Python Packages section).
- Follow the instructions in each section strictly to code up your solutions.
- DO NOT CHANGE THE OUTPUT FORMAT.
- DO NOT MODIFY THE CODE UNLESS WE INSTRUCT YOU TO DO SO.
- A homework solution that mismatches the provided setup, such as format, name, initializations, etc., will not be graded.
- It is your responsibility to make sure that your code runs with Python 3.5.2 in the VM.
1.4 Advice We are extensively using the softmax and sigmoid functions in this homework. To avoid numerical issues such as overflow and underflow caused by numpy.exp() and numpy.log(), please use the following implementations:
• Let x be an input vector to the softmax function. Use $\tilde{x} = x - \max(x)$ instead of using x directly for the softmax function f. That is, if you want to compute $f(x)_i$, compute $f(\tilde{x})_i = \frac{\exp(\tilde{x}_i)}{\sum_{j=1}^{D} \exp(\tilde{x}_j)}$ instead, which is clearly mathematically equivalent but numerically more stable.
• If you are using numpy.log(), make sure the input to the log function is positive. Also, there may be chances that one of the outputs of softmax, e.g. $f(\tilde{x})_i$, is extremely small but you need the value $\ln(f(\tilde{x})_i)$. In this case you should convert the computation equivalently into $\tilde{x}_i - \ln\left(\sum_{j=1}^{D} \exp(\tilde{x}_j)\right)$.
We have implemented and run the code ourselves without problems, so if you follow the instructions and settings provided in the python files, you should not encounter overflow or underflow.
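The two pieces of advice above can be sketched as follows. This is a minimal illustration with hypothetical function names (stable_softmax, stable_log_softmax), not the code in the template files.

```python
import numpy as np


def stable_softmax(x):
    """Softmax along the last axis, computed on x_tilde = x - max(x)."""
    x_tilde = x - np.max(x, axis=-1, keepdims=True)
    exp_x = np.exp(x_tilde)             # every entry is at most 1, so no overflow
    return exp_x / np.sum(exp_x, axis=-1, keepdims=True)


def stable_log_softmax(x):
    """ln(softmax(x)) computed as x_tilde - ln(sum_j exp(x_tilde_j))."""
    x_tilde = x - np.max(x, axis=-1, keepdims=True)
    # The sum is at least exp(0) = 1, so the argument of log is always positive.
    return x_tilde - np.log(np.sum(np.exp(x_tilde), axis=-1, keepdims=True))


x = np.array([1000.0, 1001.0, 1002.0])  # np.exp(x) on its own would overflow
probs = stable_softmax(x)
log_probs = stable_log_softmax(x)
```

Both helpers work on a single vector or on a matrix whose rows are separate inputs, because the maximum and the sum are taken along the last axis.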
4
Problem 2 Logistic Regression (20 Points)
For this assignment you are asked to implement Logistic Regression for binary and multiclass classification.
Q2.1 (6 Points)
In lecture 3 we discussed logistic regression for binary classification. In this problem, you are given a training set $D = \{(x_n, y_n)\}_{n=1}^{N}$, where $y_i \in \{0, 1\}\ \forall i = 1, \dots, N$. Important: note that here the binary labels are not −1 or +1 as used in the lecture, so be very careful about applying formulas from the lecture notes.
Your task is to learn the linear model specified by $w^T x + b$ that minimizes the logistic loss. Note that we do not explicitly append the feature 1 to the data, so you need to explicitly learn the bias/intercept term b too. Specifically you need to implement function binary_train in logistic.py, which uses gradient descent (not stochastic gradient descent) to find the optimal parameters (recall logistic regression does not admit a closed-form solution).
In addition you need to implement function binary_predict in logistic.py. We discussed two ways of making predictions in logistic regression in lecture 4: deterministic prediction or randomized prediction. Here you need to use the deterministic prediction.
After finishing implementation, please run logistic_binary.sh, which generates logistic_binary.out.
What to submit:
• logistic.py
• logistic_binary.out
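To make the setup concrete, here is a minimal sketch of full-batch gradient descent for labels in {0, 1}, with an explicit bias term and deterministic prediction. The function names, the default step size and iteration count, the zero initialization, the averaging of the loss over N, and the tie rule at probability 0.5 are all assumptions for illustration; the required signatures and settings are the ones given in logistic.py.

```python
import numpy as np


def sigmoid(z):
    # tanh form of 1 / (1 + exp(-z)); avoids overflow for large |z|
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def binary_train_sketch(X, y, step_size=0.5, max_iterations=1000):
    """Gradient descent on the average logistic loss (hypothetical helper)."""
    N, D = X.shape
    w = np.zeros(D)
    b = 0.0
    for _ in range(max_iterations):
        error = sigmoid(X.dot(w) + b) - y       # shape (N,), uses all N samples
        w -= step_size * X.T.dot(error) / N     # gradient w.r.t. w
        b -= step_size * np.mean(error)         # gradient w.r.t. b
    return w, b


def binary_predict_sketch(X, w, b):
    """Deterministic prediction: class 1 when sigmoid(w^T x + b) >= 0.5."""
    return (X.dot(w) + b >= 0.0).astype(int)


X_toy = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y_toy = np.array([0, 0, 1, 1])
w_toy, b_toy = binary_train_sketch(X_toy, y_toy)
preds_toy = binary_predict_sketch(X_toy, w_toy, b_toy)
```

With labels in {0, 1}, the gradient of the per-sample loss with respect to $w^T x + b$ is simply sigmoid minus label, which is why no ±1 label formula from the lecture appears here.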
Q2.2 (7 Points) In the lectures you learned several methods to perform multiclass classification. One of them was the one-versus-rest (or one-versus-all) approach.
For one-versus-rest classification in a problem with K classes, we need to train K classifiers using a black-box. Classifier k is trained on a binary problem, where the two labels correspond to belonging or not belonging to class k. After that, the multiclass prediction is made based on the combination of all predictions from the K binary classifiers.
In this problem you will implement one-versus-rest using binary logistic regression (that you have implemented in Q2.1) as the black-box. Important: the way to predict discussed in the lecture is to randomize over the classifiers that say “yes”; however, here since binary logistic regression naturally predicts a probability for each class (recall the sigmoid model), we will simply predict the class with the highest probability (using numpy argmax).
To sum up, you need to complete functions OVR_train and OVR_predict to perform one-versus-rest classification. After you finish the implementation, please run the logistic_multiclass.sh script, which will produce logistic_multiclass.out.
What to submit: logistic.py and logistic_multiclass.out.
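The one-versus-rest scheme described above can be sketched as below. The names train_binary, ovr_train_sketch and ovr_predict_sketch, the class labels 0..K-1, and the step size are assumptions for illustration; train_binary is only a compact stand-in for the Q2.1 black-box.

```python
import numpy as np


def train_binary(X, y, step_size=0.1, max_iterations=1000):
    """Stand-in black box: binary logistic regression by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(max_iterations):
        error = 0.5 * (1.0 + np.tanh(0.5 * (X.dot(w) + b))) - y
        w -= step_size * X.T.dot(error) / len(y)
        b -= step_size * np.mean(error)
    return w, b


def ovr_train_sketch(X, y, num_classes):
    """Train one 'class k versus rest' classifier per class."""
    W = np.zeros((num_classes, X.shape[1]))
    b = np.zeros(num_classes)
    for k in range(num_classes):
        # Binary labels: 1 if the sample belongs to class k, else 0.
        W[k], b[k] = train_binary(X, (y == k).astype(float))
    return W, b


def ovr_predict_sketch(X, W, b):
    """Predict the class whose classifier gives the highest probability."""
    scores = X.dot(W.T) + b
    # The sigmoid is increasing, so argmax over scores equals argmax over
    # the K predicted probabilities.
    return np.argmax(scores, axis=1)


X_toy = np.array([[0.0, 3.0], [0.2, 2.5], [3.0, 0.0],
                  [2.5, 0.2], [-3.0, -3.0], [-2.5, -2.8]])
y_toy = np.array([0, 0, 1, 1, 2, 2])
W_toy, b_toy = ovr_train_sketch(X_toy, y_toy, num_classes=3)
preds_toy = ovr_predict_sketch(X_toy, W_toy, b_toy)
```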
Q2.3 (7 Points) Yet another multiclass classification method you learned was multinomial logistic regression. Complete the functions multinomial_train and multinomial_predict to perform multinomial logistic regression, following the same notes as in Q2.1, that is, 1) explicitly learn the bias term; 2) perform gradient descent instead of stochastic gradient descent; 3) make deterministic predictions.
After you finish the implementation, please run the logistic_multiclass.sh script, which will produce logistic_multiclass.out.
What to submit: logistic.py and logistic_multiclass.out.
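A minimal sketch of multinomial logistic regression under the three notes above follows. The function names, class labels 0..C-1, zero initialization, step size, and the averaging of the cross-entropy loss over N are assumptions for illustration; the template in logistic.py defines the actual interface.

```python
import numpy as np


def softmax_rows(A):
    A = A - np.max(A, axis=1, keepdims=True)    # stabilize before exponentiating
    E = np.exp(A)
    return E / np.sum(E, axis=1, keepdims=True)


def multinomial_train_sketch(X, y, num_classes, step_size=0.1, max_iterations=1000):
    """Full-batch gradient descent on the average cross-entropy loss."""
    N, D = X.shape
    W = np.zeros((num_classes, D))
    b = np.zeros(num_classes)                   # explicit bias term
    Y = np.eye(num_classes)[y]                  # one-hot labels, shape (N, C)
    for _ in range(max_iterations):
        error = softmax_rows(X.dot(W.T) + b) - Y    # shape (N, C)
        W -= step_size * error.T.dot(X) / N
        b -= step_size * error.mean(axis=0)
    return W, b


def multinomial_predict_sketch(X, W, b):
    """Deterministic prediction: the class with the largest score."""
    return np.argmax(X.dot(W.T) + b, axis=1)


X_toy = np.array([[0.0, 3.0], [0.2, 2.5], [3.0, 0.0],
                  [2.5, 0.2], [-3.0, -3.0], [-2.5, -2.8]])
y_toy = np.array([0, 0, 1, 1, 2, 2])
W_toy, b_toy = multinomial_train_sketch(X_toy, y_toy, num_classes=3)
preds_toy = multinomial_predict_sketch(X_toy, W_toy, b_toy)
```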
[Figure 1 diagram: x (input features) → linear(1) → u → relu → h → linear(2) → a → softmax → z → ŷ (predicted label)]
Figure 1: A diagram of a multi-layer perceptron (MLP). The edges mean mathematical operations (modules), and the circles mean variables. The term relu stands for rectified linear units.
Problem 3 Neural networks: multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs) (30 Points)
Background
In recent years, neural networks have been one of the most powerful machine learning models. Many toolboxes/platforms (e.g., TensorFlow, PyTorch, Torch, Theano, MXNet, Caffe, CNTK) are publicly available for efficiently constructing and training neural networks. The core idea of these toolboxes is to treat a neural network as a combination of data transformation (or mathematical operation) modules.
For example, in Fig. 1 we provide a diagram of a multi-layer perceptron (MLP, just another term for the fully connected feedforward networks we discussed in the lecture) for a K-class classification problem. The edges correspond to modules and the circles correspond to variables. Let $(x \in \mathbb{R}^D, y \in \{1, 2, \cdots, K\})$ be a labeled instance; such an MLP performs the following computations:
\begin{align}
\text{input features:} \quad & x \in \mathbb{R}^D \tag{1} \\
\text{linear}^{(1)}: \quad & u = W^{(1)} x + b^{(1)}, \quad W^{(1)} \in \mathbb{R}^{M \times D} \text{ and } b^{(1)} \in \mathbb{R}^{M} \tag{2} \\
\text{relu:} \quad & h = \max\{0, u\} = \begin{bmatrix} \max\{0, u_1\} \\ \vdots \\ \max\{0, u_M\} \end{bmatrix} \tag{3} \\
\text{linear}^{(2)}: \quad & a = W^{(2)} h + b^{(2)}, \quad W^{(2)} \in \mathbb{R}^{K \times M} \text{ and } b^{(2)} \in \mathbb{R}^{K} \tag{4} \\
\text{softmax:} \quad & z = \begin{bmatrix} \frac{e^{a_1}}{\sum_k e^{a_k}} \\ \vdots \\ \frac{e^{a_K}}{\sum_k e^{a_k}} \end{bmatrix} \tag{5} \\
\text{predicted label:} \quad & \hat{y} = \arg\max_k z_k. \tag{6}
\end{align}
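Equations (1) to (6) translate almost line by line into NumPy. The sketch below handles a single input vector; the function name, the random parameters, and the 0-based class index are assumptions for illustration (the text numbers classes from 1 to K).

```python
import numpy as np


def mlp_forward_sketch(x, W1, b1, W2, b2):
    u = W1.dot(x) + b1                  # Eq. (2): linear(1)
    h = np.maximum(0.0, u)              # Eq. (3): relu, applied elementwise
    a = W2.dot(h) + b2                  # Eq. (4): linear(2)
    e = np.exp(a - np.max(a))           # shift by max(a) for numerical stability
    z = e / np.sum(e)                   # Eq. (5): softmax
    y_hat = int(np.argmax(z))           # Eq. (6), returned as a 0-based index
    return z, y_hat


rng = np.random.RandomState(0)
D, M, K = 4, 5, 3
z, y_hat = mlp_forward_sketch(rng.randn(D),
                              rng.randn(M, D), rng.randn(M),
                              rng.randn(K, M), rng.randn(K))
```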
For a K-class classification problem, one popular loss function for training (i.e., to learn $W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}$) is the cross-entropy loss. Specifically we denote the cross-entropy loss with respect to the training example (x, y) by l:
\[ l = -\log(z_y) = \log\left(1 + \sum_{k \neq y} e^{a_k - a_y}\right) \]
Note that one should look at l as a function of the parameters of the network, that is, $W^{(1)}, b^{(1)}, W^{(2)}$ and $b^{(2)}$. For ease of notation, let us define the one-hot (i.e., 1-of-K) encoding of a class y as
\[ \mathbf{y} \in \mathbb{R}^K \quad \text{and} \quad y_k = \begin{cases} 1, & \text{if } y = k, \\ 0, & \text{otherwise,} \end{cases} \tag{7} \]
so that
\[ l = -\sum_k y_k \log z_k = -\mathbf{y}^T \begin{bmatrix} \log z_1 \\ \vdots \\ \log z_K \end{bmatrix} = -\mathbf{y}^T \log z. \tag{8} \]
We can then perform error-backpropagation, a way to compute partial derivatives (or gradients) w.r.t. the parameters of a neural network, and use gradient-based optimization to learn the parameters.
Modules
Now we will provide more information on modules for this assignment. Each module has its own parameters (but note that a module may have no parameters). Moreover, each module can perform a forward pass and a backward pass. The forward pass performs the computation of the module, given the input to the module. The backward pass computes the partial derivatives of the loss function w.r.t. the input and parameters, given the partial derivatives of the loss function w.r.t. the output of the module. Consider a module ⟨module name⟩. Let ⟨module name⟩.forward and ⟨module name⟩.backward be its forward and backward passes, respectively.
For example, the linear module may be defined as follows.
forward pass: $u = \text{linear}^{(1)}.\text{forward}(x) = W^{(1)} x + b^{(1)}$, (9)
where $W^{(1)}$ and $b^{(1)}$ are its parameters.
backward pass: $\left[\frac{\partial l}{\partial x}, \frac{\partial l}{\partial W^{(1)}}, \frac{\partial l}{\partial b^{(1)}}\right] = \text{linear}^{(1)}.\text{backward}\left(x, \frac{\partial l}{\partial u}\right)$. (10)
Let us assume that we have implemented all the desired modules. Then, getting $\hat{y}$ for x is equivalent to running the forward pass of each module in order, given x. All the intermediate variables (i.e., u, h, etc.) will be computed along the forward pass. Similarly, getting the partial derivatives of the loss function w.r.t. the parameters is equivalent to running the backward pass of each module in reverse order, given $\frac{\partial l}{\partial z}$.
In this question, we provide a Python environment based on the idea of modules. Every module is defined as a class, so you can create multiple modules of the same functionality by creating multiple object instances of the same class. Your work is to finish the implementation of several modules, where these modules are elements of a multi-layer perceptron (MLP) or a convolutional neural network (CNN). We will apply these models to the same 10-class classification problem introduced in Sect. 2. We will train the models using stochastic gradient descent with mini-batches, and explore how different hyperparameters of optimizers and regularization techniques affect training and validation accuracies over training epochs. For deeper understanding, check out, e.g., the seminal work of Yann LeCun et al. “Gradient-based learning applied to document recognition,” written in 1998.
We give a specific example below. Suppose that, at iteration t, you sample a mini-batch of N examples $\{(x_i \in \mathbb{R}^D, y_i \in \mathbb{R}^K)\}_{i=1}^{N}$ from the training set (K = 10). Then, the loss of such a mini-batch given by Fig. 1 is
\begin{align}
l_{mb} &= \frac{1}{N} \sum_{i=1}^{N} l\big(\text{softmax.forward}(\text{linear}^{(2)}.\text{forward}(\text{relu.forward}(\text{linear}^{(1)}.\text{forward}(x_i)))), y_i\big) \tag{11} \\
&= \frac{1}{N} \sum_{i=1}^{N} l\big(\text{softmax.forward}(\text{linear}^{(2)}.\text{forward}(\text{relu.forward}(u_i))), y_i\big) \tag{12} \\
&= \cdots \tag{13} \\
&= \frac{1}{N} \sum_{i=1}^{N} l\big(\text{softmax.forward}(a_i), y_i\big) \tag{14} \\
&= -\frac{1}{N} \sum_{i=1}^{N} \sum_{k=1}^{K} y_{ik} \log z_{ik}. \tag{15}
\end{align}
That is, in the forward pass, we can perform the computation of a certain module on all the N input examples, and then pass the N output examples to the next module. The same holds for the backward pass. For example, according to Fig. 1, given the partial derivatives of the loss w.r.t. $\{a_i\}_{i=1}^{N}$,
\[ \frac{\partial l_{mb}}{\partial \{a_i\}_{i=1}^{N}} = \begin{bmatrix} \left(\frac{\partial l_{mb}}{\partial a_1}\right)^T \\ \left(\frac{\partial l_{mb}}{\partial a_2}\right)^T \\ \vdots \\ \left(\frac{\partial l_{mb}}{\partial a_{N-1}}\right)^T \\ \left(\frac{\partial l_{mb}}{\partial a_N}\right)^T \end{bmatrix}, \tag{16} \]
linear(2).backward will compute $\frac{\partial l_{mb}}{\partial \{h_i\}_{i=1}^{N}}$ and pass it back to relu.backward.
Preparation
Q3.1 Please read through dnn_mlp.py and dnn_cnn.py. Both files will use modules defined in dnn_misc.py (which you will modify). Your work is to understand how modules are created, how they are linked to perform the forward and backward passes, and how parameters are updated based on gradients (and momentum). The architectures of the MLP and CNN defined in dnn_mlp.py and dnn_cnn.py are shown in Fig. 2 and Fig. 3, respectively. What to submit: Nothing.
[Figure 2 diagram: x → linear(1) → relu → dropout → linear(2) → softmax → ŷ]
Figure 2: The diagram of the MLP implemented in dnn_mlp.py. The circles mean variables and edges mean modules.
[Figure 3 diagram: x → convolution → relu → max pooling → flatten → dropout → linear → softmax → ŷ]
Figure 3: The diagram of the CNN implemented in dnn_cnn.py. The circles correspond to variables and edges correspond to modules. Note that the input to the CNN may not be a vector (e.g., in dnn_cnn.py it is an image, which can be represented as a 3-dimensional tensor). The flatten layer reshapes its input into a vector.
Coding: Modules
Q3.2 (14 Points) You will modify dnn_misc.py. This script defines all modules that you will need to construct the MLP and CNN in dnn_mlp.py and dnn_cnn.py, respectively. You have three tasks. First, finish the implementation of forward and backward functions in class linear_layer. Please follow Eqn. (2) for the forward pass and derive the partial derivatives accordingly. Second, finish the implementation of forward and backward functions in class relu. Please follow Eqn. (3) for the forward pass and derive the partial derivatives accordingly. Third, finish the implementation of the backward function in class dropout. We define the forward and the backward passes as follows.
forward pass:
\[ s = \text{dropout.forward}(q \in \mathbb{R}^J) = \frac{1}{1 - r} \times \begin{bmatrix} \mathbf{1}[p_1 \geq r] \times q_1 \\ \vdots \\ \mathbf{1}[p_J \geq r] \times q_J \end{bmatrix}, \tag{17} \]
where $p_j$ is sampled uniformly from [0, 1), ∀j ∈ {1, · · · , J}, and r ∈ [0, 1) is a pre-defined scalar named dropout rate.
backward pass:
\begin{align}
\frac{\partial l}{\partial q} &= \text{dropout.backward}\left(q, \frac{\partial l}{\partial s}\right) \tag{18} \\
&= \frac{1}{1 - r} \times \begin{bmatrix} \mathbf{1}[p_1 \geq r] \times \frac{\partial l}{\partial s_1} \\ \vdots \\ \mathbf{1}[p_J \geq r] \times \frac{\partial l}{\partial s_J} \end{bmatrix}. \tag{19}
\end{align}
Note that $p_j$, j ∈ {1, · · · , J} and r are not learned, so we do not need to compute the derivatives w.r.t. them. Moreover, $p_j$, j ∈ {1, · · · , J} are re-sampled every forward pass, and are kept for the following backward pass. The dropout rate r is set to 0 during testing.
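The forward and backward passes of dropout defined above can be sketched as follows. The class name DropoutSketch, the is_train flag, and the seeded random generator are assumptions for illustration; the method signatures required for the assignment are the ones documented in dnn_misc.py.

```python
import numpy as np


class DropoutSketch:
    def __init__(self, r, seed=0):
        self.r = r                              # dropout rate in [0, 1)
        self.rng = np.random.RandomState(seed)
        self.mask = None                        # kept for the backward pass

    def forward(self, q, is_train=True):
        r = self.r if is_train else 0.0         # r is set to 0 during testing
        p = self.rng.uniform(0.0, 1.0, size=q.shape)   # re-sampled every call
        self.mask = (p >= r).astype(float) / (1.0 - r)
        return q * self.mask                    # forward pass, Eq. (17)

    def backward(self, q, grad_s):
        # Same indicator and 1 / (1 - r) scaling as in the forward pass.
        return grad_s * self.mask


layer = DropoutSketch(r=0.5)
q = np.ones((4, 3))
s = layer.forward(q)                            # entries are either 0 or 2
grad_q = layer.backward(q, np.ones_like(q))
s_test = layer.forward(q, is_train=False)       # identity when r = 0
```

Scaling the kept entries by 1/(1 − r) keeps the expected value of each output equal to its input, which is why no rescaling is needed at test time.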
Detailed descriptions/instructions about each pass (i.e., what to compute and what to return) are included in dnn_misc.py. Please read them carefully.
Note that in this script we do import numpy as np. Thus, to call a function XX from numpy, please use np.XX.
What to do and submit: Finish the implementation of the 5 functions specified above in dnn_misc.py. Submit your completed dnn_misc.py. We provide a checking script, hw2_dnn_check.py, to check your implementation.
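One way to gain confidence in a backward pass is to compare it against a finite-difference estimate. The sketch below does this for a linear module followed by relu on a mini-batch whose rows are examples. The class names, the parameter shapes (W stored as M × D, following Eq. (2)), the initialization, and the choice of derivative 0 at u = 0 are assumptions for illustration and may differ from the conventions in dnn_misc.py and hw2_dnn_check.py.

```python
import numpy as np


class LinearSketch:
    def __init__(self, input_dim, output_dim, seed=0):
        rng = np.random.RandomState(seed)
        self.W = 0.1 * rng.randn(output_dim, input_dim)  # W in R^{M x D}
        self.b = np.zeros(output_dim)

    def forward(self, X):
        return X.dot(self.W.T) + self.b         # row i is W x_i + b

    def backward(self, X, grad_U):
        grad_W = grad_U.T.dot(X)                # dl/dW, shape (M, D)
        grad_b = grad_U.sum(axis=0)             # dl/db, shape (M,)
        grad_X = grad_U.dot(self.W)             # dl/dX, shape (N, D)
        return grad_X, grad_W, grad_b


class ReluSketch:
    def forward(self, U):
        return np.maximum(0.0, U)

    def backward(self, U, grad_H):
        return grad_H * (U > 0)                 # derivative taken as 0 at U == 0


rng = np.random.RandomState(1)
X = rng.randn(4, 3)
G = rng.randn(4, 2)                             # stand-in upstream gradient dl/dH
lin, act = LinearSketch(3, 2), ReluSketch()
U = lin.forward(X)
grad_X, grad_W, grad_b = lin.backward(X, act.backward(U, G))

# Central-difference estimate of dl/dW[0, 0] for l = sum(G * relu(linear(X))).
eps = 1e-6
lin.W[0, 0] += eps
l_plus = np.sum(G * act.forward(lin.forward(X)))
lin.W[0, 0] -= 2 * eps
l_minus = np.sum(G * act.forward(lin.forward(X)))
lin.W[0, 0] += eps                              # restore the parameter
numeric = (l_plus - l_minus) / (2 * eps)
```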
Testing dnn_misc.py with multi-layer perceptron (MLP)
Q3.3 (2 Points) What to do and submit: run script q33.sh. It will output MLP_lr0.01_m0.0_w0.0_d0.0.json. Add, commit, and push this file before the due date.
What it does: q33.sh will run python3 dnn_mlp.py with learning rate 0.01, no momentum, no weight decay, and dropout rate 0.0. The output file stores the training and validation accuracies over 30 training epochs.
Q3.4 (2 Points) What to do and submit: run script q34.sh. It will output MLP_lr0.01_m0.0_w0.0_d0.5.json. Add, commit, and push this file before the due date.
What it does: q34.sh will run python3 dnn_mlp.py --dropout_rate 0.5 with learning rate 0.01, no momentum, no weight decay, and dropout rate 0.5. The output file stores the training and validation accuracies over 30 training epochs.
Q3.5 (2 Points) What to do and submit: run script q35.sh. It will output MLP_lr0.01_m0.0_w0.0_d0.95.json. Add, commit, and push this file before the due date.
What it does: q35.sh will run python3 dnn_mlp.py --dropout_rate 0.95 with learning rate 0.01, no momentum, no weight decay, and dropout rate 0.95. The output file stores the training and validation accuracies over 30 training epochs.
You will observe that the model in Q3.4 will give better validation accuracy (at epoch 30) compared to Q3.3. Specifically, dropout is widely used to prevent over-fitting. However, if we use too large a dropout rate (like the one in Q3.5), the validation accuracy (together with the training accuracy) will be relatively lower, essentially under-fitting the training data.
Q3.6 (2 Points) What to do and submit: run script q36.sh. It will output LR_lr0.01_m0.0_w0.0_d0.0.json. Add, commit, and push this file before the due date.
What it does: q36.sh will run python3 dnn_mlp_nononlinear.py with learning rate 0.01, no momentum, no weight decay, and dropout rate 0.0. The output file stores the training and validation accuracies over 30 training epochs.
The network has the same structure as the one in Q3.3, except that we remove the relu (nonlinear) layer. You will see that the validation accuracies drop significantly (the gap is around 0.03). Essentially, without the nonlinear layer, the model is learning multinomial logistic regression similar to Q2.3.
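The reason is that two linear layers with nothing in between compose into a single linear map: $W^{(2)}(W^{(1)}x + b^{(1)}) + b^{(2)} = (W^{(2)}W^{(1)})x + (W^{(2)}b^{(1)} + b^{(2)})$. The following small numerical check illustrates this; the dimensions and random parameters are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)
D, M, K = 6, 4, 3
W1, b1 = rng.randn(M, D), rng.randn(M)
W2, b2 = rng.randn(K, M), rng.randn(K)
x = rng.randn(D)

# linear(2) applied directly to the output of linear(1), with no relu between
two_layers = W2.dot(W1.dot(x) + b1) + b2

# The same map written as one linear layer with effective parameters
W_eff = W2.dot(W1)                  # shape (K, D)
b_eff = W2.dot(b1) + b2             # shape (K,)
one_layer = W_eff.dot(x) + b_eff
```

Feeding either result into softmax therefore gives a model of the same form as multinomial logistic regression on x.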
Testing dnn_misc.py with convolutional neural networks (CNN)
Q3.7 (2 Points) What to do and submit: run script q37.sh. It will output CNN_lr0.01_m0.0_w0.0_d0.5.json. Add, commit, and push this file before the due date.
What it does: q37.sh will run python3 dnn_cnn.py with learning rate 0.01, no momentum, no weight decay, and dropout rate 0.5. The output file stores the training and validation accuracies over 30 training epochs.
Q3.8 (2 Points) What to do and submit: run script q38.sh. It will output CNN_lr0.01_m0.9_w0.0_d0.5.json. Add, commit, and push this file before the due date.
What it does: q38.sh will run python3 dnn_cnn.py --alpha 0.9 with learning rate 0.01, momentum 0.9, no weight decay, and dropout rate 0.5. The output file stores the training and validation accuracies over 30 training epochs.
You will see that Q3.8 will lead to faster convergence than Q3.7 (i.e., the training/validation accuracies will be higher than 0.94 after 1 epoch). That is, using momentum will lead to more stable updates of the parameters.
Coding: Building a deeper architecture
Q3.9 (2 Points) The CNN architecture in dnn_cnn.py has only one convolutional layer. In this question, you are going to construct a two-convolutional-layer CNN (see Fig. 4) using the modules you implemented in Q3.2. Please modify the main function in dnn_cnn_2.py. The code in dnn_cnn_2.py is similar to that in dnn_cnn.py, except that there are a few parts marked as TODO. You need to fill in your code so as to construct the CNN in Fig. 4.
[Figure 4 diagram: x → conv → relu → max-p → conv → relu → max-p → flatten → dropout → linear → softmax → ŷ]
Figure 4: The diagram of the CNN you are going to implement in dnn_cnn_2.py. The term conv stands for convolution; max-p stands for max pooling. The circles correspond to variables and edges correspond to modules. Note that the input to the CNN may not be a vector (e.g., in dnn_cnn_2.py it is an image, which can be represented as a 3-dimensional tensor). The flatten layer reshapes its input into a vector.
What to do and submit: Finish the implementation of the main function in dnn_cnn_2.py (search for TODO in main). Submit your completed dnn_cnn_2.py.
Testing dnn_cnn_2.py
Q3.10 (2 Points) What to do and submit: run script q310.sh. It will output CNN2_lr0.001_m0.9_w0.0_d0.5.json. Add, commit, and push this file before the due date.
What it does: q310.sh will run python3 dnn_cnn_2.py --alpha 0.9 with learning rate 0.001 (as reflected in the output file name), momentum 0.9, no weight decay, and dropout rate 0.5. The output file stores the training and validation accuracies over 30 training epochs.
You will see that you can achieve slightly higher validation accuracies than those in Q3.8.