CMPT 419/726: Assignment 2 (Fall 2018) Instructor: Greg Mori

Assignment 2: Classification / Deep learning

Due October 29 at 11:59pm

This assignment is to be done individually.

Important Note: The university policy on academic dishonesty (cheating) will be taken very seriously in this course. You may not provide or use any solution, in whole or in part, to or by another student.

You are encouraged to discuss the concepts involved in the questions with other students. If you are in doubt as to what constitutes acceptable discussion, please ask! Further, please take advantage of office hours offered by the instructor and the TA if you are having difficulties with this assignment.

DO NOT:

• Give/receive code or proofs to/from other students

• Use Google to find solutions for assignment

DO:

• Meet with other students to discuss assignment (it is best not to take any notes during such meetings, and to re-work assignment on your own)

• Use online resources (e.g. Wikipedia) to understand the concepts needed to solve the assignment


1 Softmax for Multi-Class Classification (10 marks)

The softmax function is a multi-class generalization of the logistic sigmoid:

$$p(C_k \mid x) = \frac{\exp(a_k)}{\sum_j \exp(a_j)} \tag{1}$$

Consider a case where the activation functions $a_j$ are linear functions of the input. Assume there are 3 classes ($C_1$, $C_2$, $C_3$), and the input is $x = (x_1, x_2) \in \mathbb{R}^2$.

• $a_1 = 3x_1 + 1x_2 + 1$
• $a_2 = 1x_1 + 3x_2 + 2$
• $a_3 = -3x_1 + 1.5x_2 + 2$

The image below shows the 3 decision regions induced by these activation functions, their common intersection point (in green), and the decision boundaries (in red).

Answer the following questions. For 2 and 3, you may provide qualitative answers (i.e. no need to analyze limits).

  1. (3 marks) What are the probabilities $p(C_k \mid x)$ at the green point?
  2. (3 marks) What happens to the probabilities along each of the red lines? What happens as we move along a red line (away from the green point)?
  3. (4 marks) What happens to the probabilities as we move far away from the intersection point, staying in the middle of one region?
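To explore these questions numerically, the following minimal sketch evaluates Equation (1) for the three linear activations given above. The function names `softmax` and `activations` are illustrative choices, and the evaluation point is arbitrary (it is not the green point from the figure).

```python
import math

def softmax(activations):
    """Return softmax probabilities for a sequence of activations a_j."""
    # Subtract the maximum before exponentiating for numerical stability;
    # the common factor cancels in the ratio, so the result is unchanged.
    a_max = max(activations)
    exps = [math.exp(a - a_max) for a in activations]
    total = sum(exps)
    return [e / total for e in exps]

def activations(x1, x2):
    """Linear activations a_1, a_2, a_3 from the question."""
    return [3 * x1 + 1 * x2 + 1,
            1 * x1 + 3 * x2 + 2,
            -3 * x1 + 1.5 * x2 + 2]

# Evaluate at an arbitrary illustrative point.
probs = softmax(activations(1.0, 0.0))
print(probs)
```

Calling `softmax(activations(x1, x2))` at other points, for example along a line through the plane, is one way to check a qualitative answer.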


2 Error Backpropagation (30 marks)

We will derive error derivatives using back-propagation on the network below.
Notation: Please use notation following the examples of names for weights given in the figure.

For activations/outputs, the red node would have activation $a_2^{(2)} = w_{21}^{(1)} x_1 + w_{22}^{(1)} x_2 + w_{23}^{(1)} x_3$ and output $z_2^{(2)} = h(a_2^{(2)})$.

Activation functions: Assume the activation functions h(·) for the hidden layers are logistic sigmoids. For the final output node assume the activation function is an identity function h(a) = a.

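Error function: Assume this network is doing regression, trained using the standard squared error so that $E_n(w) = \frac{1}{2}\left(y(x_n, w) - t_n\right)^2$.

[Figure: network diagram with inputs $x_1$, $x_2$, $x_3$ on the input side and a single output node on the output side; edges carry example weight labels.]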

Consider the output layer.

• Calculate $\frac{\partial E_n(w)}{\partial a_1^{(4)}}$. Note that $a_1^{(4)}$ is the activation of the output node, and that $\frac{\partial E_n(w)}{\partial a_1^{(4)}} \equiv \delta_1^{(4)}$.

• Use this result to calculate $\frac{\partial E_n(w)}{\partial w_{12}^{(3)}}$.

Next, consider the penultimate layer of nodes.

• Write an expression for $\frac{\partial E_n(w)}{\partial a_1^{(3)}}$. Use $\delta_1^{(4)}$ in this expression.

• Use this result to calculate $\frac{\partial E_n(w)}{\partial w_{11}^{(2)}}$.

Finally, consider the weights connecting from the inputs.

• Write an expression for $\frac{\partial E_n(w)}{\partial a_1^{(2)}}$. Use the set of $\delta_k^{(3)}$ in this expression.

• Use this result to calculate $\frac{\partial E_n(w)}{\partial w_{11}^{(1)}}$.
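A hand-derived gradient can be checked numerically with central finite differences. The sketch below is one possible way to do this, not part of the assignment code. It assumes a fully connected network with 3 inputs, two hidden layers of 3 logistic units each, and one identity output; the layer sizes, the helper names, and the convention that `W1[i-1, j-1]` stores $w_{ij}^{(1)}$ are all assumptions made for illustration.

```python
import numpy as np

def numeric_grad(f, w, eps=1e-6):
    """Central-difference estimate of df/dw for a scalar function f of array w."""
    w = np.asarray(w, dtype=float)
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        step = np.zeros_like(w)
        step[idx] = eps
        grad[idx] = (f(w + step) - f(w - step)) / (2 * eps)
    return grad

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def squared_error(W1, W2, W3, x, t):
    """E_n for the assumed network (layer sizes are an assumption)."""
    z2 = sigmoid(W1.dot(x))    # outputs of layer 2
    z3 = sigmoid(W2.dot(z2))   # outputs of layer 3
    y = W3.dot(z3)[0]          # identity output, equal to a_1^(4)
    return 0.5 * (y - t) ** 2

# Arbitrary weights, input, and target for the check.
rng = np.random.RandomState(0)
W1, W2, W3 = rng.randn(3, 3), rng.randn(3, 3), rng.randn(1, 3)
x, t = np.array([0.5, -1.0, 2.0]), 1.0

# Numerical dE_n/dW3; compare its entries with a hand-derived expression.
dW3 = numeric_grad(lambda W: squared_error(W1, W2, W, x, t), W3)
print(dW3)
```

The same call with `W1` or `W2` as the varied argument gives numerical values for the earlier layers.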


3 Vanishing Gradients (40 marks)

Consider the network below. Use the same notation as the previous question.

[Figure: network diagram from input to output, with weights $w$ labeled on the connections between successive layers.]

  • Write an expression for $\frac{\partial E_n(w)}{\partial w_{11}^{(l)}}$ for all layers $l$ in the network.
  • Suppose we use logistic sigmoid activation functions in this network. Describe what would happen to the gradient for weights early in the network, $\frac{\partial E_n(w)}{\partial w_{11}^{(l)}}$ for smaller $l$. For example, when would these gradients be very small? When would they be reasonable in magnitude?
  • Suppose we use rectified linear units (ReLU) as activation functions. When would the gradients be zero?
  • Suppose we modify the graph to have complete bipartite connections at each layer, still using ReLU activation functions. When would the gradients be zero?
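As an aid for exploring these questions numerically, the following minimal sketch evaluates the derivative of the logistic sigmoid and of ReLU at a few activation values. The function names are illustrative and not part of the assignment code; the ReLU derivative at exactly 0 is set to 0 here by convention.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def sigmoid_grad(a):
    """h'(a) = h(a) * (1 - h(a)) for the logistic sigmoid."""
    s = sigmoid(a)
    return s * (1.0 - s)

def relu_grad(a):
    """Derivative of max(0, a); the value at a == 0 is taken to be 0."""
    return 1.0 if a > 0 else 0.0

# Print both derivatives over a small range of activations.
for a in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(a, sigmoid_grad(a), relu_grad(a))
```

Because back-propagation multiplies such factors along a path through the network, printing them for a range of activations can help in reasoning about the magnitude of the resulting gradient.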


4 Logistic Regression (40 marks)

In this question you will examine optimization for logistic regression.

  1. Download the assignment 2 code and data from the website. Run the script logistic_regression.py in the lr directory. This code performs gradient descent to find w which minimizes negative log-likelihood (i.e. maximizes likelihood).

    Include the final output of Figures 2 and 3 (plot of separator path in slope-intercept space; plot of neg. log likelihood over epochs) in your report.

    Why are these plots oscillating? Briefly explain why in your report.

  2. Create a Python script logistic_regression_mod.py for the following.

    Modify logistic_regression.py to run gradient descent with the learning rates η = 0.5, 0.3, 0.1, 0.05, 0.01.

    Include in your report a single plot comparing negative log-likelihood versus epoch for these different learning rates.

    Compare these results. What are the relative advantages of the different rates?

  3. Create a Python script logistic_regression_sgd.py for the following.

    Modify this code to do stochastic gradient descent. Use the parameters η = 0.5, 0.3, 0.1, 0.05, 0.01.

    Include in your report a new plot comparing negative log-likelihood versus iteration using stochastic gradient descent.

    Is stochastic gradient descent faster than gradient descent? Explain using your plots.
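The difference between the two update schemes can be sketched as follows. This is a self-contained illustration on synthetic data, not the provided logistic_regression.py: the function names, the use of the mean (rather than summed) negative log-likelihood, the zero initialization, and the data set are all assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neg_log_likelihood(w, X, t):
    """Mean negative log-likelihood of labels t in {0, 1}."""
    y = np.clip(sigmoid(X.dot(w)), 1e-12, 1 - 1e-12)  # avoid log(0)
    return -np.mean(t * np.log(y) + (1 - t) * np.log(1 - y))

def gradient(w, X, t):
    """Gradient of the mean negative log-likelihood with respect to w."""
    return X.T.dot(sigmoid(X.dot(w)) - t) / X.shape[0]

def run_gd(X, t, eta, epochs=50):
    """Batch gradient descent: one update per pass over the data."""
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(epochs):
        w = w - eta * gradient(w, X, t)
        history.append(neg_log_likelihood(w, X, t))
    return w, history

def run_sgd(X, t, eta, epochs=50, seed=0):
    """Stochastic gradient descent: one update per example, shuffled each epoch."""
    rng = np.random.RandomState(seed)
    w = np.zeros(X.shape[1])
    history = []
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):
            w = w - eta * gradient(w, X[i:i + 1], t[i:i + 1])
        history.append(neg_log_likelihood(w, X, t))
    return w, history

# Synthetic two-class data (hypothetical; the course data set is not used here).
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(50, 2) + 1.5, rng.randn(50, 2) - 1.5])
X = np.hstack([X, np.ones((100, 1))])   # append a bias column
t = np.concatenate([np.ones(50), np.zeros(50)])

for eta in (0.5, 0.3, 0.1, 0.05, 0.01):
    _, gd_hist = run_gd(X, t, eta)
    _, sgd_hist = run_sgd(X, t, eta)
    print(eta, gd_hist[-1], sgd_hist[-1])
```

Each `history` list holds one value per epoch, so the lists for different learning rates can be drawn on a single plot. For a per-iteration plot, the loss would instead be recorded inside the inner loop of `run_sgd`.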


5 Fine-Tuning a Pre-Trained Network (30 marks)

In this question you will experiment with fine-tuning a pre-trained network. This is a standard workflow in adapting existing deep networks to a new task.

We will utilize PyTorch (https://pytorch.org), a machine learning library for Python.
The provided code builds upon ResNet 50, a state-of-the-art deep network for image classification.

ResNet 50 has been designed for ImageNet image classification with 1000 output classes.

The ResNet 50 model has been adapted to solve a (simpler) different task, classifying an image as one of 10 classes on the CIFAR10 dataset.

The code imagenet_finetune.py does the following:

  • Constructs a deep network. This network starts with ResNet 50 up to its average pooling layer. Then, a small network with 32 hidden nodes then 10 output nodes (dense connections) is added on top.
  • Initializes the weights of the ResNet 50 portion with the parameters from training on ImageNet.
  • Performs training on only the new layers using the CIFAR10 dataset; all other weights are fixed to their values learned on ImageNet.
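The structure described in the list above can be sketched roughly as follows. This is not the contents of imagenet_finetune.py: the class name `FineTuneNet`, the ReLU between the two new layers, the optimizer settings, and the tiny stand-in backbone (used so the sketch runs without downloading weights) are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class FineTuneNet(nn.Module):
    """Frozen feature extractor followed by a small trainable head."""

    def __init__(self, backbone, feat_dim, hidden=32, num_classes=10):
        super(FineTuneNet, self).__init__()
        self.backbone = backbone
        # Fix the backbone weights so that only the new layers are trained.
        for p in self.backbone.parameters():
            p.requires_grad = False
        # New layers: 32 hidden nodes, then 10 output nodes.
        # The ReLU nonlinearity is an assumption.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_classes),
        )

    def forward(self, x):
        feats = self.backbone(x)
        feats = feats.view(feats.size(0), -1)  # flatten the pooled features
        return self.head(feats)

# Tiny stand-in backbone (hypothetical) in place of ResNet 50.
toy_backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
)
model = FineTuneNet(toy_backbone, feat_dim=8)

# Pass only the parameters of the new layers to the optimizer.
optimizer = torch.optim.SGD(model.head.parameters(), lr=0.01)
scores = model(torch.randn(4, 3, 32, 32))
print(scores.shape)
```

With torchvision, one common way to obtain ResNet 50 up to its average pooling layer is `nn.Sequential(*list(torchvision.models.resnet50(pretrained=True).children())[:-1])` with `feat_dim=2048`. The `pretrained=True` argument matches the torchvision version named in this handout; newer releases use a `weights` argument instead.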

The code and data can be found on the course website. For convenience, Anaconda (https://www.anaconda.com) environment config files with the latest stable release of PyTorch and torchvision are provided for Python 2.7 and Python 3.6 for Linux and macOS users. You can use one of the config files to create virtual environments and test your code. To set up the virtual environment, install Anaconda and run the following command:

    conda env create -f CONFIG_FILE

Replace CONFIG_FILE with the path to the config files you downloaded. To activate the virtual environment, run the following command:

    source activate ENV_NAME

Replace ENV_NAME with cmpt419-pytorch-python27 or cmpt419-pytorch-python36, depending on your Python version.

Windows users: please follow the instructions on the PyTorch website (https://pytorch.org) to install manually. PyTorch only supports Python 3 on Windows!

If you wish to download and install PyTorch by yourself, you will need PyTorch (v 0.4.1), torchvision (v 0.2.1), and their dependencies.

What to do:

Start by running the code provided. It will be *very* slow to train since the code runs on a CPU. You can try figuring out how to change the code to train on a GPU if you have a good GPU and want to accelerate training. Try to do one of the following tasks:


  • Write a Python function to be used at the end of training that generates HTML output showing each test image and its classification scores. You could produce an HTML table output for example.
  • Run validation of the model every few training epochs on validation or test set of the dataset and save the model with the best validation error.
  • Try applying L2 regularization to the coefficients in the small networks we added.
  • Try running this code on one of the datasets in torchvision.datasets (https://pytorch.org/docs/stable/torchvision/datasets.html) except CIFAR100. You may need to change some layers in the network. Try creating a custom dataloader that loads data from your own dataset and run the code using your dataloader. (Hints: Your own dataset should not come from torchvision.datasets. A standard approach is to implement your own torch.utils.data.Dataset and wrap it with torch.utils.data.DataLoader)
  • Try modifying the structure of the new layers that were added on top of ResNet 50.
  • Try adding data augmentation for the training data using torchvision.transforms and then implementing your custom image transformation methods not available in torchvision.transforms, like Gaussian blur.
  • The current code is inefficient because it recomputes the output of ResNet 50 every time a training/validation example is seen, even though those layers aren’t being trained. Change this by saving the output of ResNet 50 and using these as input rather than the dataloader currently used.
  • The current code does not train the layers in ResNet 50. After training the new layers for a while (until good values have been obtained), turn on training for the ResNet 50 layers to see if better performance can be achieved.
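For the HTML output task in the list above, one possible shape for such a function is sketched below using only the Python 3 standard library. The function name `scores_to_html`, its arguments, and the example file names and scores are hypothetical; how image paths and scores are collected from the model is left open.

```python
import html

def scores_to_html(image_paths, scores, class_names):
    """Build an HTML table with one row per image and one column per class score."""
    header = "".join("<th>{}</th>".format(html.escape(c)) for c in class_names)
    rows = []
    for path, row_scores in zip(image_paths, scores):
        cells = "".join("<td>{:.3f}</td>".format(s) for s in row_scores)
        rows.append('<tr><td><img src="{}" width="64"></td>{}</tr>'.format(
            html.escape(path, quote=True), cells))
    return ('<html><body><table border="1">'
            "<tr><th>image</th>{}</tr>{}</table></body></html>"
            ).format(header, "".join(rows))

# Hypothetical file names and scores, for illustration only.
page = scores_to_html(
    ["img/0.png", "img/1.png"],
    [[0.9, 0.1], [0.2, 0.8]],
    ["cat", "dog"],
)
print(page)
```

Writing the returned string to a file such as `results.html` and opening it in a browser shows the table, provided the image files exist at the given paths.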

Put your code and a readme file for Problem 5 under a separate directory named P5 in the code.zip file you submit for this assignment. The readme file should describe what you implemented for this problem and what each one of your code files does. It should also include the command to run your code. If you have any figures or tables to show, put them in your report for this assignment and mention them in your readme file.


Submitting Your Assignment

The assignment must be submitted online at https://courses.cs.sfu.ca. You must submit three files:

1. An assignment report in PDF format, called report.pdf. This report must contain the solutions to questions 1-3 as well as the figures / explanations requested for 4-5.

2. A .zip file of all your code, called code.zip.
