Report
Topic 1
How effective are early stopping methods at reducing overfitting?
In this topic, I will investigate the effectiveness of early stopping at reducing overfitting.
I implement early stopping as follows. In the train method of the Optimiser class, the validation error is computed at a fixed interval; if the validation error does not decrease, training is ended.
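A minimal sketch of this check (not the exact Optimiser code; here train_one_epoch and compute_valid_error are placeholders for the corresponding steps of the training loop):

    import numpy as np

    def train_with_early_stopping(train_one_epoch, compute_valid_error,
                                  num_epochs=100, check_interval=5):
        """Stop training once the validation error stops decreasing."""
        best_valid_error = np.inf
        for epoch in range(1, num_epochs + 1):
            train_error = train_one_epoch()          # one pass over the training set
            if epoch % check_interval == 0:
                valid_error = compute_valid_error()  # monitor on the validation set
                if valid_error >= best_valid_error:
                    # validation error did not decrease: end training early
                    break
                best_valid_error = valid_error
        return epoch, train_error, best_valid_error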
As training time increases, the training error will keep decreasing, but the validation error may go up. If the validation error goes up, this may indicate that overfitting is happening. The hypothesis is that if we end training when the validation error stops decreasing, we can avoid overfitting.
To test the hypothesis, I design the experiment as follows. I use a MultipleLayerModel with 5 AffineLayers and carry out two runs with the same parameters and only one difference: one uses early stopping and the other does not.
The parameters for both runs are batch_size = 100, num_epochs = 100, learning_rate = 0.2, init_scale = 0.5, and hidden_dim = 100; the experiments in lab 3 show these are suitable.
Figure 1. Error plot without early stopping.
Figure 2. Error plot with early stopping.
with_early_stop    final error (train)    final error (valid)
No                 1.81e-03               1.47e-01
Yes                4.85e-02               1.23e-01
Table 1. Final error with and without early stopping.
From the graphs, we can see that in the plot without early stopping, the validation error starts increasing after about 40 epochs, indicating overfitting. In the plot with early stopping, training stops at epoch 30 instead of 100, because the validation error at epoch 30 increases slightly, so training is ended.
From the final errors, we can see that although the training error with early stopping is more than 10 times larger than without it, its final validation error is much smaller. This shows that early stopping indeed reduces overfitting. It is also a simple method to implement, and by stopping early it saves training time.
Topic 2
Data Augmentation
In this topic, I will investigate whether data augmentation can reduce overfitting and improve the performance.
The hypothesis is that data augmentation effectively increases the amount of data, reduces overfitting and helps the model generalize better.
My data augmentation works as follows. For each batch of images, 50% of the images are rotated randomly between -30 and 30 degrees, and then 50% of the resulting samples are shifted randomly by between -4 and 4 pixels in both the X and Y axes.
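A rough sketch of this transform using scipy.ndimage, assuming the inputs arrive as flattened 28x28 images (the exact array layout and random-number handling in my training code may differ):

    import numpy as np
    from scipy.ndimage import rotate, shift

    def augment_batch(inputs, rng=np.random):
        """Rotate a random half of a batch, then shift a random half of the result.

        inputs: array of shape (batch_size, 784) holding flattened 28x28 images.
        """
        images = inputs.reshape(-1, 28, 28).copy()
        batch_size = images.shape[0]

        # Rotate 50% of the images by a random angle in [-30, 30] degrees.
        for i in rng.choice(batch_size, batch_size // 2, replace=False):
            images[i] = rotate(images[i], rng.uniform(-30, 30),
                               reshape=False, order=1)

        # Shift 50% of the resulting samples by up to 4 pixels in X and Y.
        for i in rng.choice(batch_size, batch_size // 2, replace=False):
            images[i] = shift(images[i], rng.uniform(-4, 4, size=2), order=1)

        return images.reshape(batch_size, -1)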
To test the effectiveness, I train two models that are identical except that one uses data augmentation and the other does not.
The parameters for both are batch_size = 100, num_epochs = 100, learning_rate = 0.01, init_scale = 0.5, and hidden_dim = 100; the experiments in lab 5 show these are suitable. The model structure is MultipleLayerModel([AffineLayer, ReluLayer, AffineLayer, ReluLayer, AffineLayer]).
Figure 3. Error plot with (left) and without (right) data augmentation.
Figure 4. Accuracy plot with (left) and without (right) data augmentation.
Data Augmentation    final error (train)    final error (valid)    final acc (train)    final acc (valid)
Yes                  6.88e-02               6.17e-02               0.9788               0.9808
No                   3.92e-04               1.06e-01               1.0000               0.9803
Table 2. Final Error and Accuracy with and without data augmentation.
In the error plot, the validation error with data augmentation decreases steadily, while the validation error without data augmentation increases after about 30 epochs.
In the accuracy plot, the validation accuracy with data augmentation increases steadily, but the validation accuracy without data augmentation stays about the same after 50 epochs.
The final validation error with data augmentation is also much lower than that without data augmentation.
All of this shows that the model without data augmentation overfits, and that data augmentation indeed reduces overfitting and improves accuracy.
Topic 3
Models with convolutional layers
In this topic, I will investigate whether adding a convolutional layer can improve the model's performance.
I use the provided skeleton code to implement the ConvolutionalLayer class. In the fprop method, for each window of the image subx and each filter weight w, the output is np.sum(subx * w) + b. In the bprop method, for each window of the output gradients dout and each filter weight w, dout * w is added to the input gradients. In the grads_wrt_params method, for each image window subx and output gradient dout, dout * subx is added to dw and dout to db. The implementation passes all the tests.
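As a slow reference sketch (single image, single channel, valid convolution; not the exact skeleton-code interface), the three methods correspond to the following NumPy loops:

    import numpy as np

    def conv_fprop(x, w, b):
        """x: (H, W) image, w: (kH, kW) kernel, b: scalar -> (H-kH+1, W-kW+1) output."""
        kh, kw = w.shape
        out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                subx = x[i:i + kh, j:j + kw]        # window of the image
                out[i, j] = np.sum(subx * w) + b    # correlate window with the kernel
        return out

    def conv_bprop(dout, w, x_shape):
        """Scatter each output gradient back over its input window."""
        kh, kw = w.shape
        dx = np.zeros(x_shape)
        for i in range(dout.shape[0]):
            for j in range(dout.shape[1]):
                dx[i:i + kh, j:j + kw] += dout[i, j] * w
        return dx

    def conv_grads_wrt_params(x, dout, kernel_shape):
        """Accumulate dout * window into dw and dout into db."""
        kh, kw = kernel_shape
        dw, db = np.zeros(kernel_shape), 0.0
        for i in range(dout.shape[0]):
            for j in range(dout.shape[1]):
                dw += dout[i, j] * x[i:i + kh, j:j + kw]
                db += dout[i, j]
        return dw, db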
In the experiment, I create a model with one ConvolutionalLayer followed by an AffineLayer, and another model without the convolutional layer but with the same other parameters, to compare their performance.
I use one kernel with dimensions (2, 2) in the ConvolutionalLayer. Because the model with the convolutional layer takes much more time to train, I only train both models for 20 epochs.
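For reference, assuming 28x28 inputs and a valid convolution with no padding or stride, a single (2, 2) kernel produces a 27x27 output map, i.e. 729 features feeding the following AffineLayer:

    # Output size of a valid convolution: (H - kH + 1) x (W - kW + 1).
    image_h = image_w = 28                # input image size (assumed 28x28)
    kernel_h = kernel_w = 2               # single 2x2 kernel
    out_h = image_h - kernel_h + 1        # 27
    out_w = image_w - kernel_w + 1        # 27
    affine_input_dim = out_h * out_w      # 729 inputs to the following AffineLayer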
Figure 5. Accuracy plot with (left) and without (right) convolution layer.
Convolutional Layer    final error (train)    final error (valid)    final acc (train)    final acc (valid)
Yes                    2.60e-01               2.61e-01               0.9280               0.9277
No                     2.72e-01               2.67e-01               0.9243               0.9258
Table 3. Final Error and Accuracy with and without convolution layer.
From the accuracy plot and the table, we can see that both the highest accuracy and the final accuracy of the model with the convolution layer are higher than those of the model without it. This indicates that a convolution layer can indeed improve our model's accuracy.