General Note: The neural network you develop only needs to approximate the desired output. With the sigmoid activation function, the output value will never be exactly 0 or 1; the network only needs to produce a probability that is above or below 0.5, depending on whether the desired output is 1 or 0.
The Task: In this assignment you will implement a neural network with one hidden layer and train it on the Fashion-MNIST dataset. You will need to separate the dataset into training, validation, and test sets.
The Fashion-MNIST dataset contains 60000 training images and 10000 testing images. Images are 28 × 28 greyscale. Load the Fashion-MNIST dataset from https://github.com/zalandoresearch/fashion-mnist. Separate the training set into 59000 images used for training and 1000 images used for validation.
• Each neuron in the hidden layer carries out the computation
y = \sigma\left(b + \sum_{i=1}^{n} w_i x_i\right), (1)
where the x_i are the inputs and \sigma is the sigmoid activation function
\sigma(x) = \frac{1}{1 + e^{-x}}. (2)
• The loss function for the network should be a softmax classifier with cross-entropy loss.
• Assume we are using stochastic gradient descent with no mini-batch (update with each example).
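For reference, a minimal NumPy sketch of the computations described above (hidden-layer sigmoid, softmax output, and cross-entropy loss) is given below. The variable names, the hidden-layer size of 128, and the random initialization are illustrative assumptions, not part of the assignment specification.

import numpy as np

def sigmoid(x):
    # Sigmoid activation, Eq. (2): 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Numerically stable softmax over the output scores
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def forward(x, W1, b1, W2, b2):
    # Hidden layer: each neuron computes sigma(b + sum_i w_i x_i), Eq. (1)
    h = sigmoid(W1 @ x + b1)
    # Output layer: softmax over the class scores
    p = softmax(W2 @ h + b2)
    return h, p

def cross_entropy(p, y):
    # y is the integer class label; the loss is -log p_y
    return -np.log(p[y] + 1e-12)

# Illustrative shapes for a 784 -> 128 -> 10 network (hidden size assumed)
rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.01, (128, 784)); b1 = np.zeros(128)
W2 = rng.normal(0.0, 0.01, (10, 128));  b2 = np.zeros(10)

x = rng.random(784)                    # one flattened 28 x 28 image in [0, 1]
h, p = forward(x, W1, b1, W2, b2)
loss = cross_entropy(p, y=3)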
You need to:
1. Draw a schematic of the overall architecture for the network.
2. Derive the derivative of the loss function with respect to the network weights for the hidden units and the output units. You can use σ(·) to represent the sigmoid in your derivation. (A sketch of the standard output-layer gradient is given at the end of this handout.)
3. Write a program that loads the Fashion-MNIST dataset and builds the neural network. You can decide how many neurons to add in the hidden layer. Your program should include forward / backward propagation for the network.
(a) Implement stochastic gradient descent and train your network.
(b) Incorporate L2 regularization for the weights in the hidden layer and the output layer (a generic sketch of the regularized update appears at the end of this handout).
You should implement this part in raw code (e.g., Python), i.e., not using software such as PyTorch or TensorFlow.
4. Write a report on the performance of the network.
(a) Plot the loss function on the training set over the number of epochs.
(b) Compare the training loss over time to the loss on the validation set.
(c) Using the validation set, experiment with different settings of the learning rate, and report the effect.
(d) Implement a momentum term and report on its effect, explaining how momentum works (see the update sketch at the end of this handout).
(e) What difference does varying the number of hidden units make?
(f) Compare the performance of your best performing model to your initial settings on the test set. Report the overall average accuracy and the accuracy for each of the output classes on the training, validation, and test set.
Note that you can use PyTorch or raw code for Question 4 (i.e., you may re-implement the same network in PyTorch).
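As a starting point for Question 2, the standard output-layer gradient of a softmax classifier with cross-entropy loss can be sketched as follows. The symbols (z_k for output scores, p_k for softmax probabilities, t_k for the one-hot target, h_j for hidden activations, w_{kj} for output-layer weights) are notation introduced here for illustration; the derivation for the hidden-layer weights is left as part of the assignment.

% Softmax probabilities and cross-entropy loss for a single example
p_k = \frac{e^{z_k}}{\sum_m e^{z_m}}, \qquad L = -\sum_k t_k \log p_k

% Standard result: gradient of the loss with respect to an output score
\frac{\partial L}{\partial z_k} = p_k - t_k

% With z_k = b_k + \sum_j w_{kj} h_j, the chain rule gives the weight gradient
\frac{\partial L}{\partial w_{kj}} = (p_k - t_k)\, h_j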
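For Questions 3(a), 3(b), and 4(d), a generic sketch of a per-example update with L2 regularization and momentum is shown below. The function and parameter names (sgd_step, lr, lam, beta) are placeholders chosen for illustration, not the required implementation; setting beta = 0 recovers plain stochastic gradient descent.

import numpy as np

def sgd_step(W, grad_W, velocity, lr=0.01, lam=1e-4, beta=0.9):
    # grad_W: gradient of the cross-entropy loss for the current example
    # lam:    L2 regularization strength (adds lam * W to the gradient)
    # beta:   momentum coefficient; the velocity accumulates past gradients
    g = grad_W + lam * W
    velocity = beta * velocity - lr * g
    return W + velocity, velocity

# Illustrative usage: update the output-layer weights after one example
W2 = np.zeros((10, 128))
v2 = np.zeros_like(W2)
grad_W2 = np.random.default_rng(0).normal(size=W2.shape)   # stand-in gradient
W2, v2 = sgd_step(W2, grad_W2, v2)

Momentum keeps an exponentially decaying average of past gradients, which damps oscillations across noisy per-example updates and speeds progress along directions of consistent descent.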