Assignment2
Assignment 2: Build a CNN for image recognition.
Due Date: March 29, 11:59PM
Name: [Your-Name?]
Introduction:
In this assignment, you will build a convolutional neural network (CNN) to classify CIFAR-10 images.
You can load the dataset directly from many deep learning packages.
You can use any deep learning package, such as PyTorch, Keras, or TensorFlow, for this assignment.
Requirements:
You need to load the CIFAR-10 data and split the entire training set into a training set and a validation set.
You will implement a CNN model with the provided structure to classify CIFAR-10 images.
You need to plot the training and validation accuracy or loss obtained in the above step.
Then use the tuned hyper-parameters to train on the entire training set.
You should report the test accuracy of the model trained on the complete training data.
You may try changing the structure (e.g., adding a BN layer or a dropout layer, …) and analyze your findings.
Google Colab
If you do not have a GPU, training a CNN can be slow. Google Colab is a good option.
Batch Normalization (BN)
Background:
Batch normalization is a technique that speeds up training and makes the model more stable.
In simple terms, batch normalization is another network layer inserted between one hidden layer and the next. It takes the outputs of the first hidden layer and normalizes them before passing them on as the input of the next hidden layer.
For more detailed information, you may refer to the original paper: https://arxiv.org/pdf/1502.03167.pdf.
BN Algorithm:
Input: values of $x$ over a mini-batch $\mathbf{B} = \{x_1, \ldots, x_m\}$;
Output: $\{y_i = BN_{\gamma,\beta}(x_i)\}$, where $\gamma$ and $\beta$ are learnable parameters.
Normalization of the Input:
$$\mu_{\mathbf{B}} = \frac{1}{m}\sum_{i=1}^m x_i$$
$$\sigma_{\mathbf{B}}^2 = \frac{1}{m}\sum_{i=1}^m (x_i - \mu_{\mathbf{B}})^2$$
$$\hat{x_i} = \frac{x_i - \mu_{\mathbf{B}}}{\sqrt{\sigma_{\mathbf{B}}^2 + \epsilon}}$$
Re-scaling and Offsetting:
$$y_i = \gamma \hat{x_i} + \beta = BN_{\gamma,\beta}(x_i)$$
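The algorithm above can be sketched in NumPy as follows. This is a minimal, training-mode sketch that assumes a 2-D mini-batch of shape (m, d) with statistics computed per feature; the function name `batch_norm_forward` and the default `eps=1e-5` are illustrative choices, not taken from any particular library. It leaves out the running mean and variance that library BN layers keep for use at inference time.

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    """Training-mode BN over a mini-batch x of shape (m, d).

    Mean and variance are computed per feature (column) across the m samples,
    following the equations above.
    """
    mu = x.mean(axis=0)                      # mu_B, shape (d,)
    var = ((x - mu) ** 2).mean(axis=0)       # sigma_B^2, divides by m
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalized input
    return gamma * x_hat + beta              # re-scale and offset

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=3.0, size=(64, 4))   # m = 64 samples, d = 4 features
gamma = np.array([1.0, 2.0, 0.5, 1.0])
beta = np.array([0.0, 1.0, -1.0, 3.0])
y = batch_norm_forward(x, gamma, beta)
print(y.mean(axis=0).round(3))  # per-feature mean, close to beta
print(y.std(axis=0).round(3))   # per-feature std, close to gamma
```

Because $\hat{x_i}$ has zero mean and (almost) unit variance, the output statistics are controlled entirely by the learnable $\gamma$ and $\beta$.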
Advantages of BN:
Improves gradient flow through the network.
Allows use of saturating nonlinearities and higher learning rates.
Makes weights easier to initialize.
Acts as a form of regularization and may reduce the need for dropout.
Implementation:
The batch normalization layer is already implemented in many packages; you can simply call the corresponding function to build the layer, for example torch.nn.BatchNorm2d() in PyTorch or keras.layers.BatchNormalization() in Keras.
Location of the BN layer: make sure the BatchNormalization layer is placed between a Conv/Dense layer and its activation layer.
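For convolutional feature maps, layers such as torch.nn.BatchNorm2d, or keras.layers.BatchNormalization with its default axis=-1 on channels-last data, keep one mean/variance pair per channel, pooled over the batch and both spatial dimensions. The NumPy sketch below mimics that behavior in training mode and applies it in the recommended order, Conv -> BN -> ReLU. It assumes NHWC layout, and `conv_out` is random placeholder data standing in for a real Conv output; `batch_norm_2d` and `relu` are hypothetical helper names.

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Training-mode BN for conv feature maps in NHWC layout.

    One mean/variance per channel, pooled over batch, height and width.
    """
    mu = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)   # np.var divides by N
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta                   # gamma, beta broadcast over channels

def relu(x):
    return np.maximum(x, 0.0)

rng = np.random.default_rng(1)
# Placeholder for a Conv output: 8 feature maps of size 30x30 with 32 channels
conv_out = rng.normal(loc=2.0, scale=4.0, size=(8, 30, 30, 32))
gamma = np.ones(32)
beta = np.zeros(32)

# BN sits between the Conv output and the activation
h = relu(batch_norm_2d(conv_out, gamma, beta))
print(h.shape)
```

Putting BN before the ReLU means the activation sees inputs that are centered per channel, so roughly half of each channel's values fall on each side of zero at the start of training.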
1. Data preparation
1.1. Load data
# Load CIFAR-10 data
# This is just an example; you may load the dataset from other packages.
import keras
import numpy as np
### If you cannot load the Keras dataset, uncomment these two lines.
#import ssl
#ssl._create_default_https_context = ssl._create_unverified_context
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
print('shape of x_train: ' + str(x_train.shape))
print('shape of y_train: ' + str(y_train.shape))
print('shape of x_test: ' + str(x_test.shape))
print('shape of y_test: ' + str(y_test.shape))
print('number of classes: ' + str(np.max(y_train) - np.min(y_train) + 1))
shape of x_train: (50000, 32, 32, 3)
shape of y_train: (50000, 1)
shape of x_test: (10000, 32, 32, 3)
shape of y_test: (10000, 1)
number of classes: 10
1.2. One-hot encode the labels (5 points)
In the input, a label is a scalar in $\{0, 1, \cdots , 9\}$. One-hot encoding transforms such a scalar into a $10$-dim vector. E.g., a scalar y_train[j]=3 is transformed to the vector y_train_vec[j]=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0].
Implement a function to_one_hot that transforms an $n\times 1$ array into an $n\times 10$ matrix.
Apply the function to y_train and y_test.
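def to_one_hot(y, num_class=10):
    # Flatten the (n, 1) label array to shape (n,)
    y = np.asarray(y).reshape(-1)
    # One row per sample; put a 1 in the column given by the label
    y_vec = np.zeros((y.shape[0], num_class))
    y_vec[np.arange(y.shape[0]), y] = 1
    return y_vec

y_train_vec = to_one_hot(y_train)
y_test_vec = to_one_hot(y_test)
print('Shape of y_train_vec: ' + str(y_train_vec.shape))
print('Shape of y_test_vec: ' + str(y_test_vec.shape))
print(y_train[0])
print(y_train_vec[0])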
Remark: the outputs should be
Shape of y_train_vec: (50000, 10)
Shape of y_test_vec: (10000, 10)
[6]
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
1.3. Randomly partition the training set into training and validation sets (5 points)
Randomly partition the 50K training samples into 2 sets:
a training set containing 40K samples: x_tr, y_tr
a validation set containing 10K samples: x_val, y_val
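# Shuffle the 50K indices; the first 40K go to training, the remaining 10K to validation
perm = np.random.permutation(x_train.shape[0])
x_tr, y_tr = x_train[perm[:40000]], y_train_vec[perm[:40000]]
x_val, y_val = x_train[perm[40000:]], y_train_vec[perm[40000:]]
print('Shape of x_tr: ' + str(x_tr.shape))
print('Shape of y_tr: ' + str(y_tr.shape))
print('Shape of x_val: ' + str(x_val.shape))
print('Shape of y_val: ' + str(y_val.shape))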
2. Build a CNN and tune its hyper-parameters (50 points)
Build a convolutional neural network model with the structure below:
It should have the structure: Conv - ReLU - Max Pooling - Conv - ReLU - Max Pooling - Dense - ReLU - Dense - Softmax
In the graph, the input has the dimension of a CIFAR-10 image (32×32×3); the first Conv layer has 32 filters, and the dimension becomes 30×30 after the convolution.
All convolutional layers (Conv) should have stride = 1 and no padding.
Max pooling has a pool size of 2 by 2.
You may use the validation data to tune the hyper-parameters (e.g., learning rate and optimization algorithm).
Do NOT use test data for hyper-parameter tuning!!!
Try to achieve a validation accuracy as high as possible.
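The 32×32 → 30×30 change for the first Conv layer follows from the output-size formula $\lfloor (n + 2p - k)/s \rfloor + 1$ with stride $s = 1$ and padding $p = 0$, which implies a 3×3 kernel. The pure-Python sketch below traces the shapes through the required structure. Reusing a 3×3 kernel in the second Conv layer and giving it 64 filters are hypothetical assumptions for illustration only; the Dense layer widths are not covered.

```python
def conv_out_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

def pool_out_size(n, pool=2):
    """Spatial size after non-overlapping max pooling (stride = pool size)."""
    return n // pool

# Assumptions (hypothetical): 3x3 kernels in both Conv layers, 64 filters in
# the second one. Only the first layer's 32 filters come from the assignment.
KERNEL = 3
FILTERS = [32, 64]

size, trace = 32, [("input", (32, 32, 3))]
for i, filters in enumerate(FILTERS, start=1):
    size = conv_out_size(size, KERNEL)          # stride 1, no padding
    trace.append((f"conv{i} + ReLU", (size, size, filters)))
    size = pool_out_size(size, 2)               # 2x2 max pooling
    trace.append((f"maxpool{i}", (size, size, filters)))

flat = size * size * FILTERS[-1]                # input width of the first Dense layer
trace.append(("flatten", (flat,)))

for name, shape in trace:
    print(f"{name:>14}: {shape}")
```

Note that the second pooling step floors 13 to 6, so one row and one column of the 13×13 maps are dropped.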
# Build the model
# Define model optimizer and loss function
# Train the model and store model parameters/loss values
3. Plot the training and validation loss curves versus epochs (5 points)
# Plot the loss curve
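A minimal plotting helper, assuming matplotlib is installed and that training used Keras `model.fit(..., validation_data=(x_val, y_val))`, whose returned `History` object stores per-epoch values in `history.history` under `"loss"` and `"val_loss"`. The function name `plot_loss_curves` is hypothetical, and the dummy numbers only exercise the code; they are not results.

```python
import matplotlib.pyplot as plt

def plot_loss_curves(history):
    """Plot training vs. validation loss per epoch.

    `history` is assumed to be a dict shaped like Keras' History.history,
    with per-epoch lists under "loss" and "val_loss".
    """
    epochs = range(1, len(history["loss"]) + 1)
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.plot(epochs, history["loss"], "b-", label="training loss")
    ax.plot(epochs, history["val_loss"], "r--", label="validation loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("loss")
    ax.legend()
    fig.tight_layout()
    return fig, ax

# Placeholder numbers only to exercise the function; they are not results.
dummy_history = {"loss": [2.0, 1.5, 1.2, 1.0], "val_loss": [1.9, 1.6, 1.4, 1.45]}
fig, ax = plot_loss_curves(dummy_history)
```

In a notebook you would call `plot_loss_curves(history.history)` and then `plt.show()`. A validation curve that turns upward while the training curve keeps falling is the usual sign of overfitting.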
4. Train (again) and evaluate the model (5 points)
At this point, you have found the “best” hyper-parameters.
Now, fix the hyper-parameters and train the network on the entire training set (all 50K training samples).
Evaluate your model on the test set.
Train the model on the entire training set
Why? Previously, you used 40K samples for training and set aside 10K samples for hyper-parameter tuning. Now that you know the hyper-parameters, why not use all 50K samples for training?
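One common heuristic (not prescribed by the assignment) for fixing the number of training epochs before retraining is to take the epoch with the lowest validation loss. The sketch uses a hypothetical helper `best_epoch`; in Keras the list would come from `history.history["val_loss"]`, and the values below are placeholders, not results.

```python
def best_epoch(val_losses):
    """Return the 1-based epoch with the lowest validation loss (first one on ties)."""
    best_idx = min(range(len(val_losses)), key=lambda i: val_losses[i])
    return best_idx + 1

# Placeholder validation-loss history (not real results).
val_loss = [1.90, 1.60, 1.42, 1.38, 1.41, 1.47]
n_epochs = best_epoch(val_loss)
print(n_epochs)
```

You could then build a fresh model and call `model.fit(x_train, y_train_vec, epochs=n_epochs, ...)` with the other tuned settings unchanged. Because the full training set is 25% larger, the ideal epoch count may shift somewhat, so treat this as an approximation.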
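# Build a fresh model with the tuned hyper-parameters
# Train it on all 50K training samples (x_train, y_train_vec)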
5. Evaluate the model on the test set (5 points)
Do NOT use the test set before this point. Make sure that your model parameters and hyper-parameters are independent of the test set.
# Evaluate your model performance (testing accuracy) on testing data.
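Test accuracy is the fraction of test images whose highest-probability class matches the true label. The sketch below computes it from a matrix of predicted class probabilities and one-hot labels such as `y_test_vec`; the helper name `accuracy_from_probs` and the tiny hand-made arrays are illustrative only.

```python
import numpy as np

def accuracy_from_probs(probs, y_onehot):
    """Fraction of samples whose arg-max prediction matches the one-hot label."""
    pred = np.argmax(probs, axis=1)
    true = np.argmax(y_onehot, axis=1)
    return float(np.mean(pred == true))

# Tiny hand-made example: 3 samples, 3 classes; the first two are correct.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.5, 0.3, 0.2]])
y_true = np.array([[1, 0, 0],
                   [0, 1, 0],
                   [0, 0, 1]])
print(accuracy_from_probs(probs, y_true))  # 2 of 3 correct
```

With Keras, `probs` would come from `model.predict(x_test)`. Alternatively, `model.evaluate(x_test, y_test_vec)` returns the loss together with any metrics passed to `compile()`, such as `"accuracy"`.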
6. Building a model with a new structure (25 points)
In this section, you can build your model by adding new layers (e.g., BN layers or dropout layers, …).
If you want to regularize a Conv/Dense layer, you should place a Dropout layer before the Conv/Dense layer.
Compare the loss curves and test accuracies of the different structures and analyze your findings.
You need to try at least two different model structures.
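Keras `Dropout` and `torch.nn.Dropout` both implement inverted dropout: during training each input unit is zeroed with probability `rate` and the survivors are scaled by 1/(1 - rate); at test time the layer passes inputs through unchanged. The NumPy sketch below shows this and places dropout before the Dense layer it regularizes. The helpers `dropout` and `dense`, and the all-ones activations, are hypothetical placeholders.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout: zero each unit with probability `rate` and scale
    the survivors by 1 / (1 - rate) during training; identity at test time."""
    if not training or rate == 0.0:
        return x
    keep = rng.random(x.shape) >= rate          # True with probability 1 - rate
    return np.where(keep, x / (1.0 - rate), 0.0)

def dense(x, W, b):
    return x @ W + b

rng = np.random.default_rng(0)
x = np.ones((1000, 256))                        # placeholder activations feeding a Dense layer
W = rng.normal(scale=0.01, size=(256, 10))
b = np.zeros(10)

# Dropout is applied to the *input* of the Dense layer it regularizes.
out_train = dense(dropout(x, 0.5, training=True, rng=rng), W, b)
out_test = dense(dropout(x, 0.5, training=False, rng=rng), W, b)
print(out_train.shape, out_test.shape)
```

Scaling by 1/(1 - rate) keeps the expected input to the Dense layer the same in training and testing, which is why no rescaling is needed at test time.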