Convolutional Networks
So far we have worked with deep fully-connected networks, using them to explore different optimization strategies and network architectures. Fully-connected networks are a good testbed for experimentation because they are very computationally efficient, but in practice all state-of-the-art results use convolutional networks instead.
First you will implement several layer types that are used in convolutional networks. You will then use these layers to train a convolutional network on the CIFAR-10 dataset.
In [1]:
# As usual, a bit of setup
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.cnn import *
from cs231n.data_utils import get_CIFAR10_data
from cs231n.gradient_check import eval_numerical_gradient_array, eval_numerical_gradient
from cs231n.layers import *
from cs231n.fast_layers import *
from cs231n.solver import Solver
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
In [2]:
# Load the (preprocessed) CIFAR10 data.
data = get_CIFAR10_data()
for k, v in data.iteritems():
    print '%s: ' % k, v.shape
X_val: (1000, 3, 32, 32)
X_train: (49000, 3, 32, 32)
X_test: (1000, 3, 32, 32)
y_val: (1000,)
y_train: (49000,)
y_test: (1000,)
Convolution: Naive forward pass
The core of a convolutional network is the convolution operation. In the file cs231n/layers.py, implement the forward pass for the convolution layer in the function conv_forward_naive.
You don’t have to worry too much about efficiency at this point; just write the code in whatever way you find most clear.
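For reference, here is one minimal way to write the naive forward pass. It is a sketch, not the reference solution, and it assumes the layout used by the test cell: x has shape (N, C, H, W), w has shape (F, C, HH, WW), b has shape (F,), and conv_param holds integer 'stride' and 'pad'. The name conv_forward_naive_sketch and the contents of the cache are our own choices; your conv_backward_naive decides what it actually needs. Each output spatial size is 1 + (H + 2 * pad - HH) / stride, which for the test input gives 1 + (4 + 2 - 4) / 2 = 2.

```python
import numpy as np

def conv_forward_naive_sketch(x, w, b, conv_param):
    """Naive convolution (cross-correlation) forward pass.

    x: input of shape (N, C, H, W)
    w: filters of shape (F, C, HH, WW)
    b: biases of shape (F,)
    conv_param: dict with integer 'stride' and 'pad'
    Returns out of shape (N, F, H_out, W_out) and a cache tuple.
    """
    stride, pad = conv_param['stride'], conv_param['pad']
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    # Output size; assumes (H + 2*pad - HH) is divisible by the stride.
    H_out = 1 + (H + 2 * pad - HH) // stride
    W_out = 1 + (W + 2 * pad - WW) // stride
    # Zero-pad only the two spatial axes.
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    out = np.zeros((N, F, H_out, W_out))
    for n in range(N):
        for f in range(F):
            for i in range(H_out):
                for j in range(W_out):
                    hs, ws = i * stride, j * stride
                    window = x_pad[n, :, hs:hs + HH, ws:ws + WW]
                    # Dot product of the window with filter f, plus its bias.
                    out[n, f, i, j] = np.sum(window * w[f]) + b[f]
    cache = (x, w, b, conv_param)
    return out, cache
```

Called on the inputs of the next cell, this sketch reproduces correct_out; the four nested loops are deliberately simple and will be slow on real data.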
You can test your implementation by running the following:
In [12]:
x_shape = (2, 3, 4, 4)
w_shape = (3, 3, 4, 4)
x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)
w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)
b = np.linspace(-0.1, 0.2, num=3)
conv_param = {'stride': 2, 'pad': 1}
out, _ = conv_forward_naive(x, w, b, conv_param)
correct_out = np.array([[[[-0.08759809, -0.10987781],
                          [-0.18387192, -0.2109216 ]],
                         [[ 0.21027089,  0.21661097],
                          [ 0.22847626,  0.23004637]],
                         [[ 0.50813986,  0.54309974],
                          [ 0.64082444,  0.67101435]]],
                        [[[-0.98053589, -1.03143541],
                          [-1.19128892, -1.24695841]],
                         [[ 0.69108355,  0.66880383],
                          [ 0.59480972,  0.56776003]],
                         [[ 2.36270298,  2.36904306],
                          [ 2.38090835,  2.38247847]]]])
# Compare your output to ours; difference should be around 1e-8
print 'Testing conv_forward_naive'
print 'difference: ', rel_error(out, correct_out)
print out
(2, 3, 2, 2)
(2, 2)
Testing conv_forward_naive
difference: 2.21214764175e-08
[[[[-0.08759809 -0.10987781]
[-0.18387192 -0.2109216 ]]
[[ 0.21027089 0.21661097]
[ 0.22847626 0.23004637]]
[[ 0.50813986 0.54309974]
[ 0.64082444 0.67101435]]]
[[[-0.98053589 -1.03143541]
[-1.19128892 -1.24695841]]
[[ 0.69108355 0.66880383]
[ 0.59480972 0.56776003]]
[[ 2.36270298 2.36904306]
[ 2.38090835 2.38247847]]]]
Aside: Image processing via convolutions
As a fun way to both check your implementation and gain a better understanding of the type of operation that convolutional layers can perform, we will set up an input containing two images and manually set up filters that perform common image processing operations (grayscale conversion and edge detection). The convolution forward pass will apply these operations to each of the input images. We can then visualize the results as a sanity check.
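Before running the cell, a toy version of the same two filters on a tiny synthetic image shows what each one computes. To stay self-contained it uses a small single-channel helper of our own, correlate3x3_same (hypothetical, not part of cs231n), which behaves like the conv layer with {'stride': 1, 'pad': 1} on one channel. The 4x4 "image" is made up: its top two rows are bright (200) and its bottom two rows are dark (0).

```python
import numpy as np

def correlate3x3_same(img, kernel):
    """Cross-correlate a 2-D array with a 3x3 kernel using zero padding of 1
    and stride 1, so the output has the same size as the input."""
    H, W = img.shape
    padded = np.pad(img, 1, mode='constant')
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

# Hypothetical 3-channel 4x4 image: top two rows bright, bottom two rows dark.
img = np.zeros((3, 4, 4))
img[:, :2, :] = 200.0

# Grayscale filter: only the center tap is non-zero, so each output pixel is
# 0.3*R + 0.6*G + 0.1*B taken at that same location.
center = np.zeros((3, 3))
center[1, 1] = 1.0
gray = sum(wc * correlate3x3_same(img[c], center)
           for c, wc in enumerate([0.3, 0.6, 0.1]))

# Horizontal-edge filter from the cell below, applied to the blue channel,
# plus the same offset of 128 used as its bias.
sobel = np.array([[1, 2, 1], [0, 0, 0], [-1, -2, -1]], dtype=float)
edges = correlate3x3_same(img[2], sobel) + 128

print(gray[:, 1])   # about 200 in the bright rows, 0 in the dark rows
print(edges[:, 1])  # -672, 928, 928, 128 down the column
```

Rows 1 and 2 straddle the bright-to-dark boundary and respond strongly, while the flat bottom row stays at the offset of 128. Row 0 also responds, because zero padding makes the image border look like an edge; and the -672 shows that the offset of 128 does not rule out negative values for strong edges, which is harmless here because imshow_noax rescales each output before display.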
In [14]:
from scipy.misc import imread, imresize
kitten, puppy = imread('kitten.jpg'), imread('puppy.jpg')
# kitten is wide, and puppy is already square
d = kitten.shape[1] - kitten.shape[0]
kitten_cropped = kitten[:, d/2:-d/2, :]
img_size = 200 # Make this smaller if it runs too slow
x = np.zeros((2, 3, img_size, img_size))
x[0, :, :, :] = imresize(puppy, (img_size, img_size)).transpose((2, 0, 1))
x[1, :, :, :] = imresize(kitten_cropped, (img_size, img_size)).transpose((2, 0, 1))
# Set up convolutional weights holding 2 filters, each 3×3
w = np.zeros((2, 3, 3, 3))
# The first filter converts the image to grayscale.
# Set up the red, green, and blue channels of the filter.
w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]
w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]
w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]
# Second filter detects horizontal edges in the blue channel.
w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
# Vector of biases. We don’t need any bias for the grayscale
# filter, but for the edge detection filter we want to add 128
# to each output so that nothing is negative.
b = np.array([0, 128])
# Compute the result of convolving each input in x with each filter in w,
# offsetting by b, and storing the results in out.
out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})
def imshow_noax(img, normalize=True):
    """ Tiny helper to show images as uint8 and remove axis labels """
    if normalize:
        img_max, img_min = np.max(img), np.min(img)
        img = 255.0 * (img - img_min) / (img_max - img_min)
    plt.imshow(img.astype('uint8'))
    plt.gca().axis('off')
# Show the original images and the results of the conv operation
plt.subplot(2, 3, 1)
imshow_noax(puppy, normalize=False)
plt.title('Original image')
plt.subplot(2, 3, 2)
imshow_noax(out[0, 0])
plt.title('Grayscale')
plt.subplot(2, 3, 3)
imshow_noax(out[0, 1])
plt.title('Edges')
plt.subplot(2, 3, 4)
imshow_noax(kitten_cropped, normalize=False)
plt.subplot(2, 3, 5)
imshow_noax(out[1, 0])
plt.subplot(2, 3, 6)
imshow_noax(out[1, 1])
plt.show()
(2, 2, 200, 200)
(200, 200)

Convolution: Naive backward pass
Implement the backward pass for the convolution operation in the function conv_backward_naive in the file cs231n/layers.py. Again, you don’t need to worry too much about computational efficiency.
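One way to structure the backward pass, sketched under our own assumptions: loop over output positions, and for each one send dout back to the weights and to the window of the padded input that produced it. To be self-contained the block carries its own compact forward pass. The names conv_forward_sketch and conv_backward_sketch, and the cache layout (x, w, b, stride, pad), are hypothetical; the notebook's functions take a conv_param dict instead.

```python
import numpy as np

def conv_forward_sketch(x, w, b, stride, pad):
    """Naive conv forward; returns out and a cache for the backward pass."""
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    H_out = 1 + (H + 2 * pad - HH) // stride
    W_out = 1 + (W + 2 * pad - WW) // stride
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    out = np.zeros((N, F, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            hs, ws = i * stride, j * stride
            window = x_pad[:, :, hs:hs + HH, ws:ws + WW]   # (N, C, HH, WW)
            # Contract over (C, HH, WW) for every (image, filter) pair -> (N, F).
            out[:, :, i, j] = np.tensordot(window, w, axes=([1, 2, 3], [1, 2, 3])) + b
    return out, (x, w, b, stride, pad)

def conv_backward_sketch(dout, cache):
    """Gradients w.r.t. x, w and b, given dout = dLoss/d(out)."""
    x, w, b, stride, pad = cache
    N, C, H, W = x.shape
    F, _, HH, WW = w.shape
    _, _, H_out, W_out = dout.shape
    x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)), mode='constant')
    dx_pad = np.zeros_like(x_pad)
    dw = np.zeros_like(w)
    # b[f] is added at every output position of filter f.
    db = dout.sum(axis=(0, 2, 3))
    for i in range(H_out):
        for j in range(W_out):
            hs, ws = i * stride, j * stride
            window = x_pad[:, :, hs:hs + HH, ws:ws + WW]   # (N, C, HH, WW)
            g = dout[:, :, i, j]                           # (N, F)
            # out[n, f, i, j] = sum(window[n] * w[f]) + b[f], hence:
            dw += np.tensordot(g, window, axes=([0], [0]))  # (F, C, HH, WW)
            dx_pad[:, :, hs:hs + HH, ws:ws + WW] += np.tensordot(g, w, axes=([1], [0]))
    # Strip the padding to get the gradient w.r.t. the unpadded input.
    dx = dx_pad[:, :, pad:pad + H, pad:pad + W]
    return dx, dw, db
```

The `+=` on dx_pad matters: neighbouring windows overlap whenever the stride is smaller than the filter, so one input pixel receives contributions from several output positions.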
When you are done, run the following to check your backward pass with a numeric gradient check.
In [16]:
x = np.random.randn(4, 3, 5, 5)
w = np.random.randn(2, 3, 3, 3)
b = np.random.randn(2,)
dout = np.random.randn(4, 2, 5, 5)
conv_param = {'stride': 1, 'pad': 1}
dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)
out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)
# Your errors should be around 1e-9
print 'Testing conv_backward_naive function'
print 'dx error: ', rel_error(dx, dx_num)
print 'dw error: ', rel_error(dw, dw_num)
print 'db error: ', rel_error(db, db_num)
Testing conv_backward_naive function
dx error: 3.89923709543e-09
dw error: 5.89435225423e-10
db error: 6.56536490817e-12
Max pooling: Naive forward
Implement the forward pass for the max-pooling operation in the function max_pool_forward_naive in the file cs231n/layers.py. Again, don’t worry too much about computational efficiency.
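A minimal sketch of the forward pass, assuming the pool_param keys used by the test cell ('pool_height', 'pool_width', 'stride') and no padding. The function name and the cache (x, pool_param) are our own choices. Each output spatial size is 1 + (H - pool_height) / stride.

```python
import numpy as np

def max_pool_forward_naive_sketch(x, pool_param):
    """Naive max pooling over each (pool_height x pool_width) window.

    x: input of shape (N, C, H, W); no padding is applied.
    Returns out of shape (N, C, H_out, W_out) and a cache (x, pool_param).
    """
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    N, C, H, W = x.shape
    H_out = 1 + (H - ph) // stride
    W_out = 1 + (W - pw) // stride
    out = np.zeros((N, C, H_out, W_out))
    for i in range(H_out):
        for j in range(W_out):
            hs, ws = i * stride, j * stride
            # Max over the window's two spatial axes, for all images and channels at once.
            out[:, :, i, j] = x[:, :, hs:hs + ph, ws:ws + pw].max(axis=(2, 3))
    return out, (x, pool_param)
```

On the linspace input of the next cell, the top-left window of the first image covers the first, second, fifth and sixth values, so its maximum is the sixth value, -0.3 + 5 * 0.7 / 95 ≈ -0.26315789, matching the first entry of correct_out.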
Check your implementation by running the following:
In [25]:
x_shape = (2, 3, 4, 4)
x = np.linspace(-0.3, 0.4, num=np.prod(x_shape)).reshape(x_shape)
pool_param = {'pool_width': 2, 'pool_height': 2, 'stride': 2}
out, _ = max_pool_forward_naive(x, pool_param)
correct_out = np.array([[[[-0.26315789, -0.24842105],
[-0.20421053, -0.18947368]],
[[-0.14526316, -0.13052632],
[-0.08631579, -0.07157895]],
[[-0.02736842, -0.01263158],
[ 0.03157895, 0.04631579]]],
[[[ 0.09052632, 0.10526316],
[ 0.14947368, 0.16421053]],
[[ 0.20842105, 0.22315789],
[ 0.26736842, 0.28210526]],
[[ 0.32631579, 0.34105263],
[ 0.38526316, 0.4 ]]]])
# Compare your output with ours. Difference should be around 1e-8.
print 'Testing max_pool_forward_naive function:'
print 'difference: ', rel_error(out, correct_out)
Testing max_pool_forward_naive function:
difference: 4.16666651573e-08
Max pooling: Naive backward
Implement the backward pass for the max-pooling operation in the function max_pool_backward_naive in the file cs231n/layers.py. You don’t need to worry about computational efficiency.
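The max operation passes its upstream gradient only to the input that won the max; every other position in the window gets zero. A sketch of that routing, assuming the forward pass cached (x, pool_param) (our assumption; your cache may differ). On ties, np.argmax picks the first maximal entry in row-major order.

```python
import numpy as np

def max_pool_backward_naive_sketch(dout, cache):
    """Route each upstream gradient to the input position that won the max."""
    x, pool_param = cache
    ph, pw = pool_param['pool_height'], pool_param['pool_width']
    stride = pool_param['stride']
    N, C, H_out, W_out = dout.shape
    dx = np.zeros_like(x)
    for n in range(N):
        for c in range(C):
            for i in range(H_out):
                for j in range(W_out):
                    hs, ws = i * stride, j * stride
                    window = x[n, c, hs:hs + ph, ws:ws + pw]
                    r, s = np.unravel_index(np.argmax(window), window.shape)
                    # += so overlapping windows (stride < pool size) accumulate.
                    dx[n, c, hs + r, ws + s] += dout[n, c, i, j]
    return dx

# Worked example: one 2x2 window whose max (4.0) sits at the bottom right.
x = np.array([[[[1.0, 2.0], [3.0, 4.0]]]])
dout = np.array([[[[5.0]]]])
params = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
print(max_pool_backward_naive_sketch(dout, (x, params)))
# The whole upstream gradient (5.0) lands where the 4.0 was; the rest stays 0.
```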
Check your implementation with numeric gradient checking by running the following:
In [34]:
x = np.random.randn(3, 2, 8, 8)
dout = np.random.randn(3, 2, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
dx_num = eval_numerical_gradient_array(lambda x: max_pool_forward_naive(x, pool_param)[0], x, dout)
out, cache = max_pool_forward_naive(x, pool_param)
dx = max_pool_backward_naive(dout, cache)
# Your error should be around 1e-12
print 'Testing max_pool_backward_naive function:'
print 'dx error: ', rel_error(dx, dx_num)
# print dx
# print '\n'
# print dx_num
Testing max_pool_backward_naive function:
dx error: 3.27562232596e-12
Fast layers
Making convolution and pooling layers fast can be challenging. To spare you the pain, we’ve provided fast implementations of the forward and backward passes for convolution and pooling layers in the file cs231n/fast_layers.py.
The fast convolution implementation depends on a Cython extension; to compile it you need to run the following from the cs231n directory:
python setup.py build_ext --inplace
The API for the fast versions of the convolution and pooling layers is exactly the same as the naive versions that you implemented above: the forward pass receives data, weights, and parameters and produces outputs and a cache object; the backward pass receives upstream derivatives and the cache object and produces gradients with respect to the data and weights.
NOTE: The fast implementation for pooling will only perform optimally if the pooling regions are non-overlapping and tile the input. If these conditions are not met then the fast pooling implementation will not be much faster than the naive implementation.
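Why do those conditions matter? When the pooling windows do not overlap and exactly tile the input (stride equal to the pool size, with H and W divisible by it), max pooling reduces to one reshape plus one reduction, with no Python loops. The sketch below is our own illustration of that trick (max_pool_reshape_sketch is a hypothetical name, and cs231n/fast_layers.py may be organized differently); it is checked against a plain loop. If windows overlap or leave a remainder, no single reshape can line them up, so an implementation has to fall back to a slower general path.

```python
import numpy as np

def max_pool_reshape_sketch(x, pool_height, pool_width):
    """Max pooling for the special case stride == pool size, windows tiling x."""
    N, C, H, W = x.shape
    assert H % pool_height == 0 and W % pool_width == 0, 'pooling windows must tile the input'
    # Split each spatial axis into (window index, position within window) ...
    tiles = x.reshape(N, C, H // pool_height, pool_height, W // pool_width, pool_width)
    # ... then take one vectorized max over the two within-window axes.
    return tiles.max(axis=(3, 5))

def max_pool_loop(x, ph, pw):
    """Plain loop version (stride == pool size) for comparison."""
    N, C, H, W = x.shape
    out = np.zeros((N, C, H // ph, W // pw))
    for i in range(H // ph):
        for j in range(W // pw):
            out[:, :, i, j] = x[:, :, i * ph:(i + 1) * ph, j * pw:(j + 1) * pw].max(axis=(2, 3))
    return out

x = np.random.randn(100, 3, 32, 32)
print(np.array_equal(max_pool_reshape_sketch(x, 2, 2), max_pool_loop(x, 2, 2)))  # True
```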
You can compare the performance of the naive and fast versions of these layers by running the following:
In [35]:
from cs231n.fast_layers import conv_forward_fast, conv_backward_fast
from time import time
x = np.random.randn(100, 3, 31, 31)
w = np.random.randn(25, 3, 3, 3)
b = np.random.randn(25,)
dout = np.random.randn(100, 25, 16, 16)
conv_param = {'stride': 2, 'pad': 1}
t0 = time()
out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)
t1 = time()
out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)
t2 = time()
print 'Testing conv_forward_fast:'
print 'Naive: %fs' % (t1 - t0)
print 'Fast: %fs' % (t2 - t1)
print 'Speedup: %fx' % ((t1 - t0) / (t2 - t1))
print 'Difference: ', rel_error(out_naive, out_fast)
t0 = time()
dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)
t1 = time()
dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)
t2 = time()
print '\nTesting conv_backward_fast:'
print 'Naive: %fs' % (t1 - t0)
print 'Fast: %fs' % (t2 - t1)
print 'Speedup: %fx' % ((t1 - t0) / (t2 - t1))
print 'dx difference: ', rel_error(dx_naive, dx_fast)
print 'dw difference: ', rel_error(dw_naive, dw_fast)
print 'db difference: ', rel_error(db_naive, db_fast)
Testing conv_forward_fast:
Naive: 4.943772s
Fast: 0.031066s
Speedup: 159.138012x
Difference: 3.65915476673e-10
Testing conv_backward_fast:
Naive: 5.644231s
Fast: 0.014896s
Speedup: 378.905232x
dx difference: 2.50352845147e-11
dw difference: 1.69592598111e-12
db difference: 2.35443647763e-13
In [36]:
from cs231n.fast_layers import max_pool_forward_fast, max_pool_backward_fast
x = np.random.randn(100, 3, 32, 32)
dout = np.random.randn(100, 3, 16, 16)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
t0 = time()
out_naive, cache_naive = max_pool_forward_naive(x, pool_param)
t1 = time()
out_fast, cache_fast = max_pool_forward_fast(x, pool_param)
t2 = time()
print 'Testing pool_forward_fast:'
print 'Naive: %fs' % (t1 - t0)
print 'fast: %fs' % (t2 - t1)
print 'speedup: %fx' % ((t1 - t0) / (t2 - t1))
print 'difference: ', rel_error(out_naive, out_fast)
t0 = time()
dx_naive = max_pool_backward_naive(dout, cache_naive)
t1 = time()
dx_fast = max_pool_backward_fast(dout, cache_fast)
t2 = time()
print '\nTesting pool_backward_fast:'
print 'Naive: %fs' % (t1 - t0)
print 'speedup: %fx' % ((t1 - t0) / (t2 - t1))
print 'dx difference: ', rel_error(dx_naive, dx_fast)
Testing pool_forward_fast:
Naive: 0.015819s
fast: 0.007003s
speedup: 2.258954x
difference: 0.0
Testing pool_backward_fast:
Naive: 0.327402s
speedup: 22.242047x
dx difference: 0.0
Convolutional “sandwich” layers
Previously we introduced the concept of “sandwich” layers that combine multiple operations into commonly used patterns. In the file cs231n/layer_utils.py you will find sandwich layers that implement a few commonly used patterns for convolutional networks.
In [37]:
from cs231n.layer_utils import conv_relu_pool_forward, conv_relu_pool_backward
x = np.random.randn(2, 3, 16, 16)
w = np.random.randn(3, 3, 3, 3)
b = np.random.randn(3,)
dout = np.random.randn(2, 3, 8, 8)
conv_param = {'stride': 1, 'pad': 1}
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)
dx, dw, db = conv_relu_pool_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], b, dout)
print 'Testing conv_relu_pool'
print 'dx error: ', rel_error(dx_num, dx)
print 'dw error: ', rel_error(dw_num, dw)
print 'db error: ', rel_error(db_num, db)
Testing conv_relu_pool
dx error: 6.01771601528e-08
dw error: 4.3826243535e-09
db error: 2.86986280184e-11
In [38]:
from cs231n.layer_utils import conv_relu_forward, conv_relu_backward
x = np.random.randn(2, 3, 8, 8)
w = np.random.randn(3, 3, 3, 3)
b = np.random.randn(3,)
dout = np.random.randn(2, 3, 8, 8)
conv_param = {'stride': 1, 'pad': 1}
out, cache = conv_relu_forward(x, w, b, conv_param)
dx, dw, db = conv_relu_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: conv_relu_forward(x, w, b, conv_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: conv_relu_forward(x, w, b, conv_param)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: conv_relu_forward(x, w, b, conv_param)[0], b, dout)
print 'Testing conv_relu:'
print 'dx error: ', rel_error(dx_num, dx)
print 'dw error: ', rel_error(dw_num, dw)
print 'db error: ', rel_error(db_num, db)
Testing conv_relu:
dx error: 4.18743855383e-08
dw error: 6.60765351596e-09
db error: 6.31149384747e-12
Three-layer ConvNet
Now that you have implemented all the necessary layers, we can put them together into a simple convolutional network.
Open the file cs231n/classifiers/cnn.py and complete the implementation of the ThreeLayerConvNet class. Run the following cells to help you debug:
Sanity check loss
After you build a new network, one of the first things you should do is sanity check the loss. When we use the softmax loss, we expect the loss for random weights (and no regularization) to be about log(C) for C classes. When we add regularization this should go up.
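Where does log(C) come from? With small random weights all C class scores are close to zero, so the softmax is nearly uniform and the probability assigned to the correct class is about 1/C. The softmax loss is then -log(1/C) = log(C). A worked check for the idealized case of exactly equal scores:

```python
import numpy as np

C = 10
scores = np.zeros(C)                       # idealized: all class scores equal
probs = np.exp(scores) / np.sum(np.exp(scores))
expected_loss = -np.log(probs[0])          # the same for whichever label is correct
print(expected_loss)                       # about 2.3026, i.e. log(10)
```

The unregularized initial loss printed by the next cell (2.3025828...) sits right at this value; adding the L2 penalty with reg = 0.5 pushes it up, as expected.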
In [42]:
model = ThreeLayerConvNet()
N = 50
X = np.random.randn(N, 3, 32, 32)
y = np.random.randint(10, size=N)
loss, grads = model.loss(X, y)
print 'Initial loss (no regularization): ', loss
model.reg = 0.5
loss, grads = model.loss(X, y)
print 'Initial loss (with regularization): ', loss
Initial loss (no regularization): 2.3025828006
Initial loss (with regularization): 2.50895817989
Gradient check
After the loss looks reasonable, use numeric gradient checking to make sure that your backward pass is correct. When you use numeric gradient checking you should use a small amount of artificial data and a small number of neurons at each layer.
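Numeric gradient checking perturbs one parameter entry at a time by ±h and takes a centered difference, so every entry costs two full forward passes; that is why the network and data should be tiny, and why the cell below uses float64. The traceback later in this notebook shows eval_numerical_gradient doing exactly this in place (x[ix] = oldval + h, then oldval - h). The in-place perturbation is also why f = lambda _: model.loss(X, y)[0] can ignore its argument: the array being perturbed is the one stored in model.params. A standalone sketch of the same idea (numerical_gradient_sketch is our own name):

```python
import numpy as np

def numerical_gradient_sketch(f, x, h=1e-5):
    """Centered-difference gradient of a scalar function f at array x.

    x is perturbed in place, one entry at a time, and restored afterwards.
    """
    grad = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        old = x[idx]
        x[idx] = old + h
        f_plus = f(x)
        x[idx] = old - h
        f_minus = f(x)
        x[idx] = old                       # restore the original value
        grad[idx] = (f_plus - f_minus) / (2 * h)
    return grad

def rel_error(x, y):
    """Same relative-error formula as the setup cell."""
    return np.max(np.abs(x - y) / np.maximum(1e-8, np.abs(x) + np.abs(y)))

# f(v) = sum(v**3) has the analytic gradient 3 * v**2.
v = np.random.RandomState(0).randn(4, 3)
num = numerical_gradient_sketch(lambda a: np.sum(a ** 3), v)
print(rel_error(num, 3 * v ** 2))          # far below 1e-6
```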
In [47]:
num_inputs = 2
input_dim = (3, 16, 16)
reg = 0.0
num_classes = 10
X = np.random.randn(num_inputs, *input_dim)
y = np.random.randint(num_classes, size=num_inputs)
model = ThreeLayerConvNet(num_filters=3, filter_size=3,
input_dim=input_dim, hidden_dim=7,
dtype=np.float64)
loss, grads = model.loss(X, y)
for param_name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)
    # print param_grad_num
    # print grads[param_name]
    e = rel_error(param_grad_num, grads[param_name])
    print '%s max relative error: %e' % (param_name, e)
W1 max relative error: 1.140221e-04
W2 max relative error: 3.654967e-03
W3 max relative error: 1.149018e-04
b1 max relative error: 1.217879e-05
b2 max relative error: 3.882741e-07
b3 max relative error: 1.270369e-09
In [132]:
# test conv
from cs231n.classifiers.convnet import *
num_inputs = 2
input_dim = (3, 16, 16)
reg = 0.0
num_classes = 10
X = np.random.randn(num_inputs, *input_dim)
y = np.random.randint(num_classes, size=num_inputs)
model = ConvoConnectedNet( [(32, 7)], [10], use_batchnorm = True, dropout = 0.2,
input_dim=input_dim,
dtype=np.float64)
loss, grads = model.loss(X, y)
for param_name in sorted(grads):
    f = lambda _: model.loss(X, y)[0]
    param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)
    # print param_grad_num
    # print grads[param_name]
    e = rel_error(param_grad_num, grads[param_name])
    # print grads[param_name]
    print '%s max relative error: %e' % (param_name, e)
AW0 max relative error: 1.000000e+00
AW1 max relative error: 1.000000e+00
Ab0 max relative error: 1.000000e+00
Ab1 max relative error: 1.000000e+00
Abeta0 max relative error: 1.000000e+00
Agamma0 max relative error: 1.000000e+00
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
16 for param_name in sorted(grads):
17 f = lambda _: model.loss(X, y)[0]
---> 18 param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)
19
20 # print param_grad_num
/Users/liumeng/Documents/task-2016/deep-learning/assignment2/cs231n/gradient_check.pyc in eval_numerical_gradient(f, x, verbose, h)
19 oldval = x[ix]
20 x[ix] = oldval + h # increment by h
---> 21 fxph = f(x) # evalute f(x + h)
22 x[ix] = oldval - h
23 fxmh = f(x) # evaluate f(x - h)
15 loss, grads = model.loss(X, y)
16 for param_name in sorted(grads):
---> 17 f = lambda _: model.loss(X, y)[0]
18 param_grad_num = eval_numerical_gradient(f, model.params[param_name], verbose=False, h=1e-6)
19
/Users/liumeng/Documents/task-2016/deep-learning/assignment2/cs231n/classifiers/convnet.py in loss(self, X, y)
310 if self.use_batchnorm:
311
--> 312 dout, dW, db, dgamma, dbeta = affine_batchnorm_relu_backward(dout, acache[i])
313
314 grads['Agamma' + str(i)] = dgamma
/Users/liumeng/Documents/task-2016/deep-learning/assignment2/cs231n/layer_utils.py in affine_batchnorm_relu_backward(dout, cache)
39 db, dgamma, dbeta = batchnorm_backward_alt(da, batch_cache)
40
---> 41 dx, dw, db = affine_backward(db, fc_cache)
42
43 return dx, dw, db, dgamma, dbeta
/Users/liumeng/Documents/task-2016/deep-learning/assignment2/cs231n/layers.pyc in affine_backward(dout, cache)
62 dx = dout.dot(w.T).reshape(x.shape)
63
---> 64 db = np.sum(dout, axis=0)
65
66 #############################################################################
/Applications/conda/conda-27/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.pyc in sum(a, axis, dtype, out, keepdims)
1833 else:
1834 return _methods._sum(a, axis=axis, dtype=dtype,
-> 1835 out=out, keepdims=keepdims)
1836
1837
/Applications/conda/conda-27/anaconda/lib/python2.7/site-packages/numpy/core/_methods.pyc in _sum(a, axis, dtype, out, keepdims)
30
31 def _sum(a, axis=None, dtype=None, out=None, keepdims=False):
---> 32 return umr_sum(a, axis, dtype, out, keepdims)
33
34 def _prod(a, axis=None, dtype=None, out=None, keepdims=False):
KeyboardInterrupt:
Overfit small data
A nice trick is to train your model with just a few training samples. You should be able to overfit small datasets, which will result in very high training accuracy and comparatively low validation accuracy.
In [131]:
num_train = 100
small_data = {
  'X_train': data['X_train'][:num_train],
  'y_train': data['y_train'][:num_train],
  'X_val': data['X_val'],
  'y_val': data['y_val'],
}
model = ThreeLayerConvNet(weight_scale=1e-2)
model = ConvoConnectedNet( [(32, 7), (32,7)], [100], weight_scale=1e-2, use_batchnorm = True, dropout = 0.2)
solver = Solver(model, small_data,
                num_epochs=10, batch_size=50,
                update_rule='adam',
                optim_config={
                  'learning_rate': 1e-3,
                },
                verbose=True, print_every=1)
solver.train()
(Iteration 1 / 20) loss: 2.294288
(Epoch 0 / 10) train acc: 0.430000; val_acc: 0.193000
(Iteration 2 / 20) loss: 2.234753
(Epoch 1 / 10) train acc: 0.450000; val_acc: 0.197000
(Iteration 3 / 20) loss: 2.218887
(Iteration 4 / 20) loss: 2.130002
(Epoch 2 / 10) train acc: 0.470000; val_acc: 0.213000
(Iteration 5 / 20) loss: 2.120628
(Iteration 6 / 20) loss: 2.041231
(Epoch 3 / 10) train acc: 0.520000; val_acc: 0.218000
(Iteration 7 / 20) loss: 2.022885
(Iteration 8 / 20) loss: 2.029267
(Epoch 4 / 10) train acc: 0.550000; val_acc: 0.223000
(Iteration 9 / 20) loss: 2.007008
(Iteration 10 / 20) loss: 1.891127
(Epoch 5 / 10) train acc: 0.550000; val_acc: 0.220000
(Iteration 11 / 20) loss: 1.936409
(Iteration 12 / 20) loss: 1.950077
(Epoch 6 / 10) train acc: 0.570000; val_acc: 0.216000
(Iteration 13 / 20) loss: 1.928492
(Iteration 14 / 20) loss: 1.817291
(Epoch 7 / 10) train acc: 0.590000; val_acc: 0.216000
(Iteration 15 / 20) loss: 1.798725
(Iteration 16 / 20) loss: 1.835590
(Epoch 8 / 10) train acc: 0.680000; val_acc: 0.216000
(Iteration 17 / 20) loss: 1.736534
(Iteration 18 / 20) loss: 1.721814
(Epoch 9 / 10) train acc: 0.660000; val_acc: 0.217000
(Iteration 19 / 20) loss: 1.666741
(Iteration 20 / 20) loss: 1.698333
(Epoch 10 / 10) train acc: 0.710000; val_acc: 0.210000
Plotting the loss, training accuracy, and validation accuracy should show clear overfitting:
In [87]:
plt.subplot(2, 1, 1)
plt.plot(solver.loss_history, 'o')
plt.xlabel('iteration')
plt.ylabel('loss')
plt.subplot(2, 1, 2)
plt.plot(solver.train_acc_history, '-o')
plt.plot(solver.val_acc_history, '-o')
plt.legend(['train', 'val'], loc='upper left')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.show()

Train the net
By training the three-layer convolutional network for one epoch, you should achieve greater than 40% accuracy on the training set:
In [50]:
model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)
solver = Solver(model, data,
                num_epochs=1, batch_size=50,
                update_rule='adam',
                optim_config={
                  'learning_rate': 1e-3,
                },
                verbose=True, print_every=20)
solver.train()
(Iteration 1 / 980) loss: 2.304631
(Epoch 0 / 1) train acc: 0.105000; val_acc: 0.079000
(Iteration 21 / 980) loss: 2.307392
(Iteration 41 / 980) loss: 1.916440
(Iteration 61 / 980) loss: 2.053262
(Iteration 81 / 980) loss: 2.059085
(Iteration 101 / 980) loss: 1.738660
(Iteration 121 / 980) loss: 1.864290
(Iteration 141 / 980) loss: 1.746119
(Iteration 161 / 980) loss: 1.708535
(Iteration 181 / 980) loss: 1.728784
(Iteration 201 / 980) loss: 1.727301
(Iteration 221 / 980) loss: 1.924585
(Iteration 241 / 980) loss: 1.201118
(Iteration 261 / 980) loss: 1.707081
(Iteration 281 / 980) loss: 1.623061
(Iteration 301 / 980) loss: 1.637996
(Iteration 321 / 980) loss: 1.836819
(Iteration 341 / 980) loss: 1.719912
(Iteration 361 / 980) loss: 1.788795
(Iteration 381 / 980) loss: 1.349020
(Iteration 401 / 980) loss: 1.690917
(Iteration 421 / 980) loss: 1.520324
(Iteration 441 / 980) loss: 1.762731
(Iteration 461 / 980) loss: 1.487806
(Iteration 481 / 980) loss: 2.040019
(Iteration 501 / 980) loss: 1.493980
(Iteration 521 / 980) loss: 1.443606
(Iteration 541 / 980) loss: 1.386908
(Iteration 561 / 980) loss: 1.537202
(Iteration 581 / 980) loss: 1.549965
(Iteration 601 / 980) loss: 1.945880
(Iteration 621 / 980) loss: 1.607954
(Iteration 641 / 980) loss: 1.596187
(Iteration 661 / 980) loss: 1.556345
(Iteration 681 / 980) loss: 1.496723
(Iteration 701 / 980) loss: 1.791031
(Iteration 721 / 980) loss: 1.555505
(Iteration 741 / 980) loss: 1.412481
(Iteration 761 / 980) loss: 1.649932
(Iteration 781 / 980) loss: 1.760851
(Iteration 801 / 980) loss: 1.592211
(Iteration 821 / 980) loss: 1.599358
(Iteration 841 / 980) loss: 1.666108
(Iteration 861 / 980) loss: 1.561748
(Iteration 881 / 980) loss: 1.435852
(Iteration 901 / 980) loss: 1.389881
(Iteration 921 / 980) loss: 1.658525
(Iteration 941 / 980) loss: 1.352616
(Iteration 961 / 980) loss: 1.514745
(Epoch 1 / 1) train acc: 0.470000; val_acc: 0.474000
Visualize Filters
You can visualize the first-layer convolutional filters from the trained network by running the following:
In [51]:
from cs231n.vis_utils import visualize_grid
grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))
plt.axis('off')
plt.gcf().set_size_inches(5, 5)
plt.show()

Spatial Batch Normalization
We already saw that batch normalization is a very useful technique for training deep fully-connected networks. Batch normalization can also be used for convolutional networks, but we need to tweak it a bit; the modification will be called “spatial batch normalization.”
Normally batch-normalization accepts inputs of shape (N, D) and produces outputs of shape (N, D), where we normalize across the minibatch dimension N. For data coming from convolutional layers, batch normalization needs to accept inputs of shape (N, C, H, W) and produce outputs of shape (N, C, H, W) where the N dimension gives the minibatch size and the (H, W) dimensions give the spatial size of the feature map.
If the feature map was produced using convolutions, then we expect the statistics of each feature channel to be relatively consistent both between different images and between different locations within the same image. Therefore spatial batch normalization computes a mean and variance for each of the C feature channels by computing statistics over both the minibatch dimension N and the spatial dimensions H and W.
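Concretely, "statistics over N, H and W" means reducing over axes (0, 2, 3) so that each channel gets one mean and one variance. A training-mode sketch under our own assumptions (hypothetical name, eps = 1e-5, and no running averages, so it does not handle bn_param['mode'] == 'test'):

```python
import numpy as np

def spatial_batchnorm_train_sketch(x, gamma, beta, eps=1e-5):
    """Training-mode spatial batchnorm with per-channel statistics over N, H and W.

    x: (N, C, H, W); gamma, beta: (C,). Running averages for test mode are omitted.
    """
    # keepdims=True leaves shape (1, C, 1, 1) so the stats broadcast against x.
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.RandomState(0)
x = 4 * rng.randn(2, 3, 4, 5) + 10
out = spatial_batchnorm_train_sketch(x, np.array([3.0, 4.0, 5.0]), np.array([6.0, 7.0, 8.0]))
print(out.mean(axis=(0, 2, 3)))   # close to beta = [6, 7, 8]
print(out.std(axis=(0, 2, 3)))    # close to gamma = [3, 4, 5]
```

An equivalent route is to transpose x to (N, H, W, C), reshape it to (N*H*W, C), and reuse a vanilla batchnorm forward pass on that matrix; both give the same per-channel normalization.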
Spatial batch normalization: forward
In the file cs231n/layers.py, implement the forward pass for spatial batch normalization in the function spatial_batchnorm_forward. Check your implementation by running the following:
In [61]:
# Check the training-time forward pass by checking means and variances
# of features both before and after spatial batch normalization
N, C, H, W = 2, 3, 4, 5
x = 4 * np.random.randn(N, C, H, W) + 10
print 'Before spatial batch normalization:'
print ' Shape: ', x.shape
print ' Means: ', x.mean(axis=(0, 2, 3))
print ' Stds: ', x.std(axis=(0, 2, 3))
# Means should be close to zero and stds close to one
gamma, beta = np.ones(C), np.zeros(C)
bn_param = {'mode': 'train'}
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print 'After spatial batch normalization:'
print ' Shape: ', out.shape
print ' Means: ', out.mean(axis=(0, 2, 3))
print ' Stds: ', out.std(axis=(0, 2, 3))
# Means should be close to beta and stds close to gamma
gamma, beta = np.asarray([3, 4, 5]), np.asarray([6, 7, 8])
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print 'After spatial batch normalization (nontrivial gamma, beta):'
print ' Shape: ', out.shape
print ' Means: ', out.mean(axis=(0, 2, 3))
print ' Stds: ', out.std(axis=(0, 2, 3))
# print out
Before spatial batch normalization:
Shape: (2, 3, 4, 5)
Means: [ 8.34145733 10.18774677 9.50767467]
Stds: [ 3.75543964 3.75386883 4.11212784]
After spatial batch normalization:
Shape: (2, 3, 4, 5)
Means: [ 1.99840144e-16 1.77635684e-16 -2.22044605e-17]
Stds: [ 0.99999965 0.99999965 0.9999997 ]
After spatial batch normalization (nontrivial gamma, beta):
Shape: (2, 3, 4, 5)
Means: [ 6. 7. 8.]
Stds: [ 2.99999894 3.99999858 4.99999852]
In [63]:
# Check the test-time forward pass by running the training-time
# forward pass many times to warm up the running averages, and then
# checking the means and variances of activations after a test-time
# forward pass.
N, C, H, W = 10, 4, 11, 12
bn_param = {'mode': 'train'}
gamma = np.ones(C)
beta = np.zeros(C)
for t in xrange(50):
    x = 2.3 * np.random.randn(N, C, H, W) + 13
    spatial_batchnorm_forward(x, gamma, beta, bn_param)
bn_param['mode'] = 'test'
x = 2.3 * np.random.randn(N, C, H, W) + 13
a_norm, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
# Means should be close to zero and stds close to one, but will be
# noisier than training-time forward passes.
print 'After spatial batch normalization (test-time):'
print ' means: ', a_norm.mean(axis=(0, 2, 3))
print ' stds: ', a_norm.std(axis=(0, 2, 3))
After spatial batch normalization (test-time):
means: [ 0.05936135 0.02505848 0.03810269 0.03127471]
stds: [ 1.00869769 0.9709786 0.97645297 0.98916183]
Spatial batch normalization: backward
In the file cs231n/layers.py, implement the backward pass for spatial batch normalization in the function spatial_batchnorm_backward. Run the following to check your implementation using a numeric gradient check:
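Because spatial batch normalization is vanilla batch normalization applied to the (N*H*W, C) view of the data, the backward pass can reuse the vanilla formula on that view and then undo the reshape. A self-contained sketch under our own assumptions: the helper names are hypothetical, the cache layout is our choice, and only training mode is handled. With m = N*H*W values per channel, the compact chain-rule result is dx = (1/m) * inv_std * (m * dx_hat - sum(dx_hat) - x_hat * sum(dx_hat * x_hat)), where dx_hat = dout * gamma.

```python
import numpy as np

def to_rows(a):
    """(N, C, H, W) -> (N*H*W, C): one row per spatial location of each image."""
    N, C, H, W = a.shape
    return a.transpose(0, 2, 3, 1).reshape(-1, C)

def from_rows(rows, shape):
    """Inverse of to_rows."""
    N, C, H, W = shape
    return rows.reshape(N, H, W, C).transpose(0, 3, 1, 2)

def spatial_bn_forward(x, gamma, beta, eps=1e-5):
    """Training-mode forward pass (no running averages), returning a cache."""
    rows = to_rows(x)
    mu, var = rows.mean(axis=0), rows.var(axis=0)
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = (rows - mu) * inv_std
    out = from_rows(gamma * x_hat + beta, x.shape)
    return out, (x_hat, inv_std, gamma, x.shape)

def spatial_bn_backward(dout, cache):
    """Vanilla batchnorm backward applied to the (N*H*W, C) view."""
    x_hat, inv_std, gamma, shape = cache
    d = to_rows(dout)
    m = d.shape[0]                         # N * H * W values per channel
    dbeta = d.sum(axis=0)
    dgamma = (d * x_hat).sum(axis=0)
    dx_hat = d * gamma
    # Compact form of the chain rule through the batch mean and variance.
    dx_rows = inv_std / m * (m * dx_hat - dx_hat.sum(axis=0)
                             - x_hat * (dx_hat * x_hat).sum(axis=0))
    return from_rows(dx_rows, shape), dgamma, dbeta
```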
In [65]:
N, C, H, W = 2, 3, 4, 5
x = 5 * np.random.randn(N, C, H, W) + 12
gamma = np.random.randn(C)
beta = np.random.randn(C)
dout = np.random.randn(N, C, H, W)
bn_param = {'mode': 'train'}
fx = lambda x: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]
fg = lambda a: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]
fb = lambda b: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0]
dx_num = eval_numerical_gradient_array(fx, x, dout)
da_num = eval_numerical_gradient_array(fg, gamma, dout)
db_num = eval_numerical_gradient_array(fb, beta, dout)
_, cache = spatial_batchnorm_forward(x, gamma, beta, bn_param)
dx, dgamma, dbeta = spatial_batchnorm_backward(dout, cache)
print 'dx error: ', rel_error(dx_num, dx)
print 'dgamma error: ', rel_error(da_num, dgamma)
print 'dbeta error: ', rel_error(db_num, dbeta)
dx error: 1.59355515677e-07
dgamma error: 1.72797375419e-11
dbeta error: 3.27543314215e-12
Experiment!
Experiment and try to get the best performance that you can on CIFAR-10 using a ConvNet. Here are some ideas to get you started:
Things you should try:
• Filter size: Above we used 7×7; this makes pretty pictures but smaller filters may be more efficient
• Number of filters: Above we used 32 filters. Do more or fewer do better?
• Batch normalization: Try adding spatial batch normalization after convolution layers and vanilla batch normalization after affine layers. Do your networks train faster?
• Network architecture: The network above has three layers of trainable parameters. Can you do better with a deeper network? You can implement alternative architectures in the file cs231n/classifiers/convnet.py. Some good architectures to try include:
▪ [conv-relu-pool]xN - conv - relu - [affine]xM - [softmax or SVM]
▪ [conv-relu-pool]xN - [affine]xM - [softmax or SVM]
▪ [conv-relu-conv-relu-pool]xN - [affine]xM - [softmax or SVM]
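Before committing to an architecture, it helps to track feature-map sizes and parameter counts. The helpers below are our own sketch; they assume "same" padding of (F - 1) / 2 for odd filter sizes and 2x2, stride-2 pooling. They show how [conv-relu-pool]x3 shrinks a 32x32 CIFAR-10 image, and why smaller filters can be more efficient: three stacked 3x3 convolutions cover the same 7x7 receptive field as one 7x7 convolution with fewer weights (here with a hypothetical 32-to-32-channel layer).

```python
def conv_output_size(size, filter_size, stride=1, pad=None):
    """Spatial output size of a conv layer; pad defaults to 'same' padding for odd filters."""
    if pad is None:
        pad = (filter_size - 1) // 2
    return 1 + (size + 2 * pad - filter_size) // stride

def pool_output_size(size, pool=2, stride=2):
    """Spatial output size of a pooling layer without padding."""
    return 1 + (size - pool) // stride

def conv_param_count(in_channels, num_filters, filter_size):
    """Weights plus one bias per filter."""
    return num_filters * (in_channels * filter_size * filter_size + 1)

# Spatial size after each block of [conv-relu-pool]x3 with 3x3 'same' convs.
size, sizes = 32, []
for _ in range(3):
    size = pool_output_size(conv_output_size(size, 3))
    sizes.append(size)
print(sizes)                               # [16, 8, 4]

# One 7x7 layer vs. three stacked 3x3 layers (same 7x7 receptive field), 32 -> 32 channels.
print(conv_param_count(32, 32, 7))         # 50208
print(3 * conv_param_count(32, 32, 3))     # 27744
```

The stacked version also inserts two extra ReLUs between the layers, which adds nonlinearity at no parameter cost.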
Tips for training
For each network architecture that you try, you should tune the learning rate and regularization strength. When doing this there are a couple of important things to keep in mind:
• If the parameters are working well, you should see improvement within a few hundred iterations
• Remember the coarse-to-fine approach for hyperparameter tuning: start by testing a large range of hyperparameters for just a few training iterations to find the combinations of parameters that are working at all.
• Once you have found some sets of parameters that seem to work, search more finely around these parameters. You may need to train for more epochs.
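Learning rates and regularization strengths act multiplicatively, so the coarse-to-fine search described above usually samples them on a log scale rather than uniformly. A sketch of the sampling step only; the ranges and the "best" region used for the fine stage are hypothetical placeholders, and in this notebook each (lr, reg) pair would be trained with its own Solver and ranked by validation accuracy.

```python
import numpy as np

rng = np.random.RandomState(0)

def sample_log_uniform(low_exp, high_exp, size):
    """Draw 10**u with u uniform on [low_exp, high_exp): every decade is equally likely."""
    return 10.0 ** rng.uniform(low_exp, high_exp, size)

# Coarse stage: wide ranges; each pair would be trained only briefly
# (a few hundred iterations) to see which combinations work at all.
coarse = list(zip(sample_log_uniform(-6, -1, 20), sample_log_uniform(-5, 0, 20)))

# Fine stage: suppose, hypothetically, the best coarse runs had a learning rate
# and a regularization strength near 1e-3; re-sample in a narrower band around
# them and train these candidates for more epochs.
fine = list(zip(sample_log_uniform(-3.5, -2.5, 10), sample_log_uniform(-3.5, -2.5, 10)))
for lr, reg in fine[:3]:
    print('lr=%.2e reg=%.2e' % (lr, reg))
```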
Going above and beyond
If you are feeling adventurous there are many other features you can implement to try and improve your performance. You are not required to implement any of these; however they would be good things to try for extra credit.
• Alternative update steps: For the assignment we implemented SGD+momentum, RMSprop, and Adam; you could try alternatives like AdaGrad or AdaDelta.
• Alternative activation functions such as leaky ReLU, parametric ReLU, or MaxOut.
• Model ensembles
• Data augmentation
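AdaGrad, one of the alternative update rules named above, can be written in the same `(w, dw, config) -> (next_w, config)` form as the SGD+momentum, RMSprop and Adam rules the assignment already implemented. This is a minimal sketch, not a reference implementation, and the default learning rate and epsilon are assumptions. AdaGrad divides each step by the square root of the running sum of squared gradients. Unlike RMSprop, that sum never decays, so the effective step size only shrinks over training.

```python
import numpy as np


def adagrad(w, dw, config=None):
    """
    AdaGrad update in the (w, dw, config) -> (next_w, config) style.

    config keys:
    - learning_rate: scalar step size (default is an assumption)
    - epsilon: small constant for numerical stability
    - cache: running sum of squared gradients, same shape as w
    """
    if config is None:
        config = {}
    config.setdefault('learning_rate', 1e-2)
    config.setdefault('epsilon', 1e-8)
    config.setdefault('cache', np.zeros_like(w))

    # Accumulate squared gradients over the whole run (no decay, unlike RMSprop).
    config['cache'] += dw ** 2
    next_w = w - config['learning_rate'] * dw / (np.sqrt(config['cache']) + config['epsilon'])
    return next_w, config


# Usage: minimize f(w) = sum(w ** 2), whose gradient is 2 * w.
w = np.array([1.0, -2.0])
config = {'learning_rate': 0.5}
for _ in range(500):
    w, config = adagrad(w, 2 * w, config)
print(w)
```

If this function were added to `cs231n/optim.py` next to the existing rules, the Solver could presumably select it with `update_rule='adagrad'`, the same way the training cell selects `'adam'`.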
If you do decide to implement something extra, clearly describe it in the “Extra Credit Description” cell below.
What we expect¶
At a minimum, you should be able to train a ConvNet that gets at least 65% accuracy on the validation set. This is just a lower bound; if you are careful, it should be possible to get much higher accuracies. Extra credit points will be awarded for particularly high-scoring models or unique approaches.
You should use the space below to experiment and train your network. The final cell in this notebook should contain the training, validation, and test set accuracies for your final trained network. In this notebook you should also write an explanation of what you did, any additional features that you implemented, and any visualizations or graphs that you make in the process of training and evaluating your network.
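For the final cell described above, the accuracy on each split can be computed with one helper. This sketch evaluates in minibatches, so scoring the 49,000 training images does not require a single huge forward pass. `compute_accuracy` is a hypothetical name. It assumes, as the test-set cell at the end of this notebook does, that `model.loss(X)` called without labels returns an (N, num_classes) array of scores. The toy scorer and data below only exercise the function.

```python
import numpy as np


def compute_accuracy(score_fn, X, y, batch_size=100):
    """
    Classification accuracy of score_fn on (X, y), evaluated in minibatches.

    score_fn: callable mapping an (N, ...) array to (N, num_classes) scores.
    """
    num_correct = 0
    for start in range(0, X.shape[0], batch_size):
        scores = score_fn(X[start:start + batch_size])
        y_pred = np.argmax(scores, axis=1)
        num_correct += np.sum(y_pred == y[start:start + batch_size])
    return num_correct / float(X.shape[0])


def toy_scores(X):
    """Toy 'model' for illustration: uses the first 3 features as class scores."""
    return X[:, :3]


rng = np.random.RandomState(0)
X_toy = rng.randn(250, 5)
y_toy = np.argmax(X_toy[:, :3], axis=1)   # labels that agree with toy_scores
y_noisy = y_toy.copy()
y_noisy[:50] = (y_noisy[:50] + 1) % 3     # corrupt 50 of the 250 labels
print(compute_accuracy(toy_scores, X_toy, y_toy, batch_size=64))
print(compute_accuracy(toy_scores, X_toy, y_noisy, batch_size=64))
```

With the trained network, `compute_accuracy(model.loss, data['X_val'], data['y_val'])` would give the validation accuracy that the 65% target refers to. Swapping in `'X_train'`/`'y_train'` or `'X_test'`/`'y_test'` gives the other two numbers.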
Have fun and happy training!
In [145]:
# Train a really good model on CIFAR-10
# num_train = 100
# small_data = {
#   'X_train': data['X_train'][:num_train],
#   'y_train': data['y_train'][:num_train],
#   'X_val': data['X_val'],
#   'y_val': data['y_val'],
# }
# model = ThreeLayerConvNet(weight_scale=1e-2)
# model = ConvoConnectedNet([(32, 7)], [100], weight_scale=1e-2, use_batchnorm=True)
# solver = Solver(model, data,
#                 num_epochs=1, batch_size=200,
#                 update_rule='adam',
#                 optim_config={
#                   'learning_rate': 1e-3,
#                 },
#                 verbose=True, print_every=10)
# solver.train()
# (Epoch 1 / 1) train acc: 0.586000; val_acc: 0.606000
model = ConvoConnectedNet([(32, 7)], [100], weight_scale=1e-2, use_batchnorm=True, reg=0.001)
solver = Solver(model, data,
                num_epochs=10, batch_size=100,
                update_rule='adam',
                optim_config={
                  'learning_rate': 1e-3,
                },
                verbose=True, print_every=10)
solver.train()
(Iteration 1 / 4900) loss: 2.349475
(Epoch 0 / 10) train acc: 0.195000; val_acc: 0.179000
(Iteration 11 / 4900) loss: 2.108084
(Iteration 21 / 4900) loss: 2.038587
(Iteration 31 / 4900) loss: 1.946151
(Iteration 41 / 4900) loss: 1.860595
(Iteration 51 / 4900) loss: 1.692558
(Iteration 61 / 4900) loss: 1.666435
(Iteration 71 / 4900) loss: 1.671963
(Iteration 81 / 4900) loss: 1.589735
(Iteration 91 / 4900) loss: 1.572495
(Iteration 101 / 4900) loss: 1.619588
(Iteration 111 / 4900) loss: 1.612160
(Iteration 121 / 4900) loss: 1.436860
(Iteration 131 / 4900) loss: 1.483802
(Iteration 141 / 4900) loss: 1.392461
(Iteration 151 / 4900) loss: 1.510770
(Iteration 161 / 4900) loss: 1.475851
(Iteration 171 / 4900) loss: 1.502406
(Iteration 181 / 4900) loss: 1.442877
(Iteration 191 / 4900) loss: 1.469351
(Iteration 201 / 4900) loss: 1.341603
(Iteration 211 / 4900) loss: 1.424621
(Iteration 221 / 4900) loss: 1.557821
(Iteration 231 / 4900) loss: 1.491304
(Iteration 241 / 4900) loss: 1.526476
(Iteration 251 / 4900) loss: 1.368115
(Iteration 261 / 4900) loss: 1.451643
(Iteration 271 / 4900) loss: 1.344265
(Iteration 281 / 4900) loss: 1.332669
(Iteration 291 / 4900) loss: 1.365798
(Iteration 301 / 4900) loss: 1.362626
(Iteration 311 / 4900) loss: 1.468915
(Iteration 321 / 4900) loss: 1.236515
(Iteration 331 / 4900) loss: 1.348746
(Iteration 341 / 4900) loss: 1.352060
(Iteration 351 / 4900) loss: 1.296297
(Iteration 361 / 4900) loss: 1.369860
(Iteration 371 / 4900) loss: 1.244419
(Iteration 381 / 4900) loss: 1.314676
(Iteration 391 / 4900) loss: 1.329043
(Iteration 401 / 4900) loss: 1.433609
(Iteration 411 / 4900) loss: 1.312244
(Iteration 421 / 4900) loss: 1.192586
(Iteration 431 / 4900) loss: 1.292096
(Iteration 441 / 4900) loss: 1.244677
(Iteration 451 / 4900) loss: 1.225453
(Iteration 461 / 4900) loss: 1.178217
(Iteration 471 / 4900) loss: 1.338978
(Iteration 481 / 4900) loss: 1.126398
(Epoch 1 / 10) train acc: 0.620000; val_acc: 0.606000
(Iteration 491 / 4900) loss: 1.135781
(Iteration 501 / 4900) loss: 1.154302
(Iteration 511 / 4900) loss: 1.241877
(Iteration 521 / 4900) loss: 1.243900
(Iteration 531 / 4900) loss: 1.148066
(Iteration 541 / 4900) loss: 1.105620
(Iteration 551 / 4900) loss: 1.231018
(Iteration 561 / 4900) loss: 1.074925
(Iteration 571 / 4900) loss: 1.236390
(Iteration 581 / 4900) loss: 1.222042
(Iteration 591 / 4900) loss: 0.982776
(Iteration 601 / 4900) loss: 1.315873
(Iteration 611 / 4900) loss: 1.158725
(Iteration 621 / 4900) loss: 1.035297
(Iteration 631 / 4900) loss: 1.233601
(Iteration 641 / 4900) loss: 1.138010
(Iteration 651 / 4900) loss: 1.226809
(Iteration 661 / 4900) loss: 1.093917
(Iteration 671 / 4900) loss: 1.165608
(Iteration 681 / 4900) loss: 1.196853
(Iteration 691 / 4900) loss: 1.219479
(Iteration 701 / 4900) loss: 1.186441
(Iteration 711 / 4900) loss: 1.307014
(Iteration 721 / 4900) loss: 1.167079
(Iteration 731 / 4900) loss: 1.194765
(Iteration 741 / 4900) loss: 0.950362
(Iteration 751 / 4900) loss: 1.214124
(Iteration 761 / 4900) loss: 1.105128
(Iteration 771 / 4900) loss: 1.174585
(Iteration 781 / 4900) loss: 1.352781
(Iteration 791 / 4900) loss: 1.101397
(Iteration 801 / 4900) loss: 1.057030
(Iteration 811 / 4900) loss: 1.154894
(Iteration 821 / 4900) loss: 1.138571
(Iteration 831 / 4900) loss: 1.267678
(Iteration 841 / 4900) loss: 0.905551
(Iteration 851 / 4900) loss: 1.025931
(Iteration 861 / 4900) loss: 1.122380
(Iteration 871 / 4900) loss: 1.136210
(Iteration 881 / 4900) loss: 1.122291
(Iteration 891 / 4900) loss: 1.067022
(Iteration 901 / 4900) loss: 1.095408
(Iteration 911 / 4900) loss: 1.023535
(Iteration 921 / 4900) loss: 1.090005
(Iteration 931 / 4900) loss: 1.074765
(Iteration 941 / 4900) loss: 1.004796
(Iteration 951 / 4900) loss: 1.215702
(Iteration 961 / 4900) loss: 1.046244
(Iteration 971 / 4900) loss: 1.234910
(Epoch 2 / 10) train acc: 0.676000; val_acc: 0.621000
(Iteration 981 / 4900) loss: 1.005130
(Iteration 991 / 4900) loss: 1.155998
(Iteration 1001 / 4900) loss: 1.269907
(Iteration 1011 / 4900) loss: 1.051985
(Iteration 1021 / 4900) loss: 1.334582
(Iteration 1031 / 4900) loss: 1.085363
(Iteration 1041 / 4900) loss: 1.252961
(Iteration 1051 / 4900) loss: 1.104328
(Iteration 1061 / 4900) loss: 0.970909
(Iteration 1071 / 4900) loss: 0.922423
(Iteration 1081 / 4900) loss: 0.980690
(Iteration 1091 / 4900) loss: 1.072732
(Iteration 1101 / 4900) loss: 0.951115
(Iteration 1111 / 4900) loss: 1.035462
(Iteration 1121 / 4900) loss: 0.972250
(Iteration 1131 / 4900) loss: 0.947233
(Iteration 1141 / 4900) loss: 0.822635
(Iteration 1151 / 4900) loss: 1.072279
(Iteration 1161 / 4900) loss: 1.256793
(Iteration 1171 / 4900) loss: 1.079357
(Iteration 1181 / 4900) loss: 1.216935
(Iteration 1191 / 4900) loss: 1.063472
(Iteration 1201 / 4900) loss: 1.081911
(Iteration 1211 / 4900) loss: 1.161888
(Iteration 1221 / 4900) loss: 1.044020
(Iteration 1231 / 4900) loss: 1.188951
(Iteration 1241 / 4900) loss: 0.923921
(Iteration 1251 / 4900) loss: 1.119771
(Iteration 1261 / 4900) loss: 1.087451
(Iteration 1271 / 4900) loss: 1.126568
(Iteration 1281 / 4900) loss: 0.872293
(Iteration 1291 / 4900) loss: 1.122028
(Iteration 1301 / 4900) loss: 1.023909
(Iteration 1311 / 4900) loss: 0.933171
(Iteration 1321 / 4900) loss: 0.967757
(Iteration 1331 / 4900) loss: 0.908554
(Iteration 1341 / 4900) loss: 0.900652
(Iteration 1351 / 4900) loss: 1.003968
(Iteration 1361 / 4900) loss: 1.017485
(Iteration 1371 / 4900) loss: 0.885782
(Iteration 1381 / 4900) loss: 0.981572
(Iteration 1391 / 4900) loss: 0.957649
(Iteration 1401 / 4900) loss: 1.014009
(Iteration 1411 / 4900) loss: 1.025379
(Iteration 1421 / 4900) loss: 0.910411
(Iteration 1431 / 4900) loss: 1.116400
(Iteration 1441 / 4900) loss: 1.125180
(Iteration 1451 / 4900) loss: 1.031659
(Iteration 1461 / 4900) loss: 1.012139
(Epoch 3 / 10) train acc: 0.707000; val_acc: 0.651000
(Iteration 1471 / 4900) loss: 0.949418
(Iteration 1481 / 4900) loss: 1.008946
(Iteration 1491 / 4900) loss: 0.914326
(Iteration 1501 / 4900) loss: 1.248458
(Iteration 1511 / 4900) loss: 0.933757
(Iteration 1521 / 4900) loss: 1.117807
(Iteration 1531 / 4900) loss: 1.157050
(Iteration 1541 / 4900) loss: 0.864228
(Iteration 1551 / 4900) loss: 1.047004
(Iteration 1561 / 4900) loss: 0.970869
(Iteration 1571 / 4900) loss: 0.883831
(Iteration 1581 / 4900) loss: 0.854006
(Iteration 1591 / 4900) loss: 1.011948
(Iteration 1601 / 4900) loss: 1.017881
(Iteration 1611 / 4900) loss: 0.991586
(Iteration 1621 / 4900) loss: 1.128257
(Iteration 1631 / 4900) loss: 0.995421
(Iteration 1641 / 4900) loss: 0.991380
(Iteration 1651 / 4900) loss: 0.921027
(Iteration 1661 / 4900) loss: 0.948548
(Iteration 1671 / 4900) loss: 1.027954
(Iteration 1681 / 4900) loss: 0.960170
(Iteration 1691 / 4900) loss: 0.915296
(Iteration 1701 / 4900) loss: 0.900243
(Iteration 1711 / 4900) loss: 1.005571
(Iteration 1721 / 4900) loss: 1.099925
(Iteration 1731 / 4900) loss: 0.966131
(Iteration 1741 / 4900) loss: 0.878082
(Iteration 1751 / 4900) loss: 0.881398
(Iteration 1761 / 4900) loss: 1.134881
(Iteration 1771 / 4900) loss: 0.995468
(Iteration 1781 / 4900) loss: 0.965173
(Iteration 1791 / 4900) loss: 0.933680
(Iteration 1801 / 4900) loss: 1.081002
(Iteration 1811 / 4900) loss: 0.886615
(Iteration 1821 / 4900) loss: 0.959413
(Iteration 1831 / 4900) loss: 0.988531
(Iteration 1841 / 4900) loss: 0.933293
(Iteration 1851 / 4900) loss: 1.068693
(Iteration 1861 / 4900) loss: 0.875663
(Iteration 1871 / 4900) loss: 1.056919
(Iteration 1881 / 4900) loss: 1.259342
(Iteration 1891 / 4900) loss: 1.106606
(Iteration 1901 / 4900) loss: 1.079232
(Iteration 1911 / 4900) loss: 0.863394
(Iteration 1921 / 4900) loss: 0.849612
(Iteration 1931 / 4900) loss: 0.876831
(Iteration 1941 / 4900) loss: 0.877882
(Iteration 1951 / 4900) loss: 0.792745
(Epoch 4 / 10) train acc: 0.764000; val_acc: 0.673000
(Iteration 1961 / 4900) loss: 0.887813
(Iteration 1971 / 4900) loss: 0.847412
(Iteration 1981 / 4900) loss: 1.061428
(Iteration 1991 / 4900) loss: 1.046224
(Iteration 2001 / 4900) loss: 0.941656
(Iteration 2011 / 4900) loss: 0.915866
(Iteration 2021 / 4900) loss: 0.832122
(Iteration 2031 / 4900) loss: 0.877350
(Iteration 2041 / 4900) loss: 1.072725
(Iteration 2051 / 4900) loss: 0.849924
(Iteration 2061 / 4900) loss: 1.000968
(Iteration 2071 / 4900) loss: 0.909729
(Iteration 2081 / 4900) loss: 0.986123
(Iteration 2091 / 4900) loss: 1.103448
(Iteration 2101 / 4900) loss: 1.077264
(Iteration 2111 / 4900) loss: 0.901037
(Iteration 2121 / 4900) loss: 0.850147
(Iteration 2131 / 4900) loss: 0.905657
(Iteration 2141 / 4900) loss: 1.066942
(Iteration 2151 / 4900) loss: 0.980156
(Iteration 2161 / 4900) loss: 0.859521
(Iteration 2171 / 4900) loss: 1.032124
(Iteration 2181 / 4900) loss: 0.958032
(Iteration 2191 / 4900) loss: 0.831273
(Iteration 2201 / 4900) loss: 1.006524
(Iteration 2211 / 4900) loss: 0.887646
(Iteration 2221 / 4900) loss: 0.934334
(Iteration 2231 / 4900) loss: 0.923419
(Iteration 2241 / 4900) loss: 0.988292
(Iteration 2251 / 4900) loss: 0.940045
(Iteration 2261 / 4900) loss: 0.926120
(Iteration 2271 / 4900) loss: 0.847050
(Iteration 2281 / 4900) loss: 0.879536
(Iteration 2291 / 4900) loss: 0.971689
(Iteration 2301 / 4900) loss: 1.021186
(Iteration 2311 / 4900) loss: 1.142980
(Iteration 2321 / 4900) loss: 0.890221
(Iteration 2331 / 4900) loss: 0.895106
(Iteration 2341 / 4900) loss: 0.755692
(Iteration 2351 / 4900) loss: 0.797354
(Iteration 2361 / 4900) loss: 1.107999
(Iteration 2371 / 4900) loss: 0.884316
(Iteration 2381 / 4900) loss: 0.964876
(Iteration 2391 / 4900) loss: 1.033086
(Iteration 2401 / 4900) loss: 0.942267
(Iteration 2411 / 4900) loss: 0.863802
(Iteration 2421 / 4900) loss: 0.919329
(Iteration 2431 / 4900) loss: 0.897507
(Iteration 2441 / 4900) loss: 0.828159
(Epoch 5 / 10) train acc: 0.755000; val_acc: 0.677000
(Iteration 2451 / 4900) loss: 0.890574
(Iteration 2461 / 4900) loss: 1.084460
(Iteration 2471 / 4900) loss: 0.792382
(Iteration 2481 / 4900) loss: 0.796894
(Iteration 2491 / 4900) loss: 0.837069
(Iteration 2501 / 4900) loss: 0.806074
(Iteration 2511 / 4900) loss: 1.048511
(Iteration 2521 / 4900) loss: 0.754703
(Iteration 2531 / 4900) loss: 0.893880
(Iteration 2541 / 4900) loss: 0.798263
(Iteration 2551 / 4900) loss: 0.927424
(Iteration 2561 / 4900) loss: 0.969523
(Iteration 2571 / 4900) loss: 0.858452
(Iteration 2581 / 4900) loss: 1.044043
(Iteration 2591 / 4900) loss: 1.019671
(Iteration 2601 / 4900) loss: 0.892999
(Iteration 2611 / 4900) loss: 0.854015
(Iteration 2621 / 4900) loss: 0.662875
(Iteration 2631 / 4900) loss: 0.846160
(Iteration 2641 / 4900) loss: 0.838161
(Iteration 2651 / 4900) loss: 0.763272
(Iteration 2661 / 4900) loss: 0.826610
(Iteration 2671 / 4900) loss: 0.825466
(Iteration 2681 / 4900) loss: 1.066117
(Iteration 2691 / 4900) loss: 0.815104
(Iteration 2701 / 4900) loss: 0.718516
(Iteration 2711 / 4900) loss: 0.757613
(Iteration 2721 / 4900) loss: 0.878024
(Iteration 2731 / 4900) loss: 0.933245
(Iteration 2741 / 4900) loss: 0.840712
(Iteration 2751 / 4900) loss: 0.814755
(Iteration 2761 / 4900) loss: 0.837787
(Iteration 2771 / 4900) loss: 0.773391
(Iteration 2781 / 4900) loss: 0.960278
(Iteration 2791 / 4900) loss: 0.764837
(Iteration 2801 / 4900) loss: 0.813582
(Iteration 2811 / 4900) loss: 0.892240
(Iteration 2821 / 4900) loss: 0.893417
(Iteration 2831 / 4900) loss: 0.990944
(Iteration 2841 / 4900) loss: 0.820242
(Iteration 2851 / 4900) loss: 0.896815
(Iteration 2861 / 4900) loss: 0.906842
(Iteration 2871 / 4900) loss: 0.893291
(Iteration 2881 / 4900) loss: 0.833539
(Iteration 2891 / 4900) loss: 1.105986
(Iteration 2901 / 4900) loss: 0.846088
(Iteration 2911 / 4900) loss: 0.854187
(Iteration 2921 / 4900) loss: 0.818431
(Iteration 2931 / 4900) loss: 0.721106
(Epoch 6 / 10) train acc: 0.781000; val_acc: 0.665000
(Iteration 2941 / 4900) loss: 0.813465
(Iteration 2951 / 4900) loss: 0.876677
(Iteration 2961 / 4900) loss: 0.824349
(Iteration 2971 / 4900) loss: 0.872870
(Iteration 2981 / 4900) loss: 0.968806
(Iteration 2991 / 4900) loss: 0.798334
(Iteration 3001 / 4900) loss: 0.606209
(Iteration 3011 / 4900) loss: 0.743586
(Iteration 3021 / 4900) loss: 0.825604
(Iteration 3031 / 4900) loss: 0.930333
(Iteration 3041 / 4900) loss: 0.733789
(Iteration 3051 / 4900) loss: 0.843146
(Iteration 3061 / 4900) loss: 1.019768
(Iteration 3071 / 4900) loss: 0.895059
(Iteration 3081 / 4900) loss: 0.768567
(Iteration 3091 / 4900) loss: 0.809840
(Iteration 3101 / 4900) loss: 0.736426
(Iteration 3111 / 4900) loss: 0.816224
(Iteration 3121 / 4900) loss: 1.046517
(Iteration 3131 / 4900) loss: 0.852326
(Iteration 3141 / 4900) loss: 0.846737
(Iteration 3151 / 4900) loss: 0.722145
(Iteration 3161 / 4900) loss: 0.887418
(Iteration 3171 / 4900) loss: 0.818104
(Iteration 3181 / 4900) loss: 0.723976
(Iteration 3191 / 4900) loss: 0.876916
(Iteration 3201 / 4900) loss: 0.769441
(Iteration 3211 / 4900) loss: 0.938827
(Iteration 3221 / 4900) loss: 0.880592
(Iteration 3231 / 4900) loss: 0.830734
(Iteration 3241 / 4900) loss: 0.832731
(Iteration 3251 / 4900) loss: 0.796142
(Iteration 3261 / 4900) loss: 0.784375
(Iteration 3271 / 4900) loss: 1.014950
(Iteration 3281 / 4900) loss: 0.798339
(Iteration 3291 / 4900) loss: 0.695761
(Iteration 3301 / 4900) loss: 0.807420
(Iteration 3311 / 4900) loss: 0.973377
(Iteration 3321 / 4900) loss: 0.863173
(Iteration 3331 / 4900) loss: 1.017492
(Iteration 3341 / 4900) loss: 0.891199
(Iteration 3351 / 4900) loss: 0.762813
(Iteration 3361 / 4900) loss: 0.819447
(Iteration 3371 / 4900) loss: 0.863040
(Iteration 3381 / 4900) loss: 0.746993
(Iteration 3391 / 4900) loss: 0.816747
(Iteration 3401 / 4900) loss: 0.881394
(Iteration 3411 / 4900) loss: 0.864796
(Iteration 3421 / 4900) loss: 0.921203
(Epoch 7 / 10) train acc: 0.811000; val_acc: 0.695000
(Iteration 3431 / 4900) loss: 0.908046
(Iteration 3441 / 4900) loss: 0.887571
(Iteration 3451 / 4900) loss: 0.783812
(Iteration 3461 / 4900) loss: 0.789480
(Iteration 3471 / 4900) loss: 0.712670
(Iteration 3481 / 4900) loss: 0.845727
(Iteration 3491 / 4900) loss: 0.742852
(Iteration 3501 / 4900) loss: 0.680383
(Iteration 3511 / 4900) loss: 0.784017
(Iteration 3521 / 4900) loss: 0.709126
(Iteration 3531 / 4900) loss: 0.919844
(Iteration 3541 / 4900) loss: 0.760931
(Iteration 3551 / 4900) loss: 0.722423
(Iteration 3561 / 4900) loss: 0.786753
(Iteration 3571 / 4900) loss: 0.786641
(Iteration 3581 / 4900) loss: 0.750231
(Iteration 3591 / 4900) loss: 0.929126
(Iteration 3601 / 4900) loss: 1.046636
(Iteration 3611 / 4900) loss: 0.773860
(Iteration 3621 / 4900) loss: 0.797793
(Iteration 3631 / 4900) loss: 0.672807
(Iteration 3641 / 4900) loss: 0.970435
(Iteration 3651 / 4900) loss: 0.846172
(Iteration 3661 / 4900) loss: 0.801360
(Iteration 3671 / 4900) loss: 0.718122
(Iteration 3681 / 4900) loss: 0.759350
(Iteration 3691 / 4900) loss: 0.672899
(Iteration 3701 / 4900) loss: 0.753029
(Iteration 3711 / 4900) loss: 0.877542
(Iteration 3721 / 4900) loss: 0.821379
(Iteration 3731 / 4900) loss: 0.749943
(Iteration 3741 / 4900) loss: 0.735686
(Iteration 3751 / 4900) loss: 0.681666
(Iteration 3761 / 4900) loss: 0.772523
(Iteration 3771 / 4900) loss: 0.816757
(Iteration 3781 / 4900) loss: 0.844597
(Iteration 3791 / 4900) loss: 0.900891
(Iteration 3801 / 4900) loss: 0.849011
(Iteration 3811 / 4900) loss: 0.682869
(Iteration 3821 / 4900) loss: 0.797422
(Iteration 3831 / 4900) loss: 0.915422
(Iteration 3841 / 4900) loss: 0.863164
(Iteration 3851 / 4900) loss: 1.033201
(Iteration 3861 / 4900) loss: 0.712044
(Iteration 3871 / 4900) loss: 0.668526
(Iteration 3881 / 4900) loss: 0.746382
(Iteration 3891 / 4900) loss: 0.834883
(Iteration 3901 / 4900) loss: 0.798530
(Iteration 3911 / 4900) loss: 0.888593
(Epoch 8 / 10) train acc: 0.817000; val_acc: 0.676000
(Iteration 3921 / 4900) loss: 0.872623
(Iteration 3931 / 4900) loss: 0.895267
(Iteration 3941 / 4900) loss: 0.849411
(Iteration 3951 / 4900) loss: 0.690541
(Iteration 3961 / 4900) loss: 0.708850
(Iteration 3971 / 4900) loss: 0.853350
(Iteration 3981 / 4900) loss: 0.703812
(Iteration 3991 / 4900) loss: 0.803158
(Iteration 4001 / 4900) loss: 0.716203
(Iteration 4011 / 4900) loss: 0.850785
(Iteration 4021 / 4900) loss: 0.831967
(Iteration 4031 / 4900) loss: 0.897323
(Iteration 4041 / 4900) loss: 0.767640
(Iteration 4051 / 4900) loss: 0.755837
(Iteration 4061 / 4900) loss: 0.609939
(Iteration 4071 / 4900) loss: 0.706501
(Iteration 4081 / 4900) loss: 0.708737
(Iteration 4091 / 4900) loss: 0.677101
(Iteration 4101 / 4900) loss: 0.748415
(Iteration 4111 / 4900) loss: 0.723392
(Iteration 4121 / 4900) loss: 0.607537
(Iteration 4131 / 4900) loss: 0.632094
(Iteration 4141 / 4900) loss: 0.733898
(Iteration 4151 / 4900) loss: 0.718885
(Iteration 4161 / 4900) loss: 0.805234
(Iteration 4171 / 4900) loss: 0.865545
(Iteration 4181 / 4900) loss: 0.671142
(Iteration 4191 / 4900) loss: 0.829241
(Iteration 4201 / 4900) loss: 0.644056
(Iteration 4211 / 4900) loss: 0.715216
(Iteration 4221 / 4900) loss: 0.856073
(Iteration 4231 / 4900) loss: 0.992739
(Iteration 4241 / 4900) loss: 0.897028
(Iteration 4251 / 4900) loss: 0.808812
(Iteration 4261 / 4900) loss: 0.863981
(Iteration 4271 / 4900) loss: 0.724470
(Iteration 4281 / 4900) loss: 0.711593
(Iteration 4291 / 4900) loss: 0.819189
(Iteration 4301 / 4900) loss: 0.788668
(Iteration 4311 / 4900) loss: 0.853776
(Iteration 4321 / 4900) loss: 0.958360
(Iteration 4331 / 4900) loss: 0.661267
(Iteration 4341 / 4900) loss: 0.790899
(Iteration 4351 / 4900) loss: 0.779448
(Iteration 4361 / 4900) loss: 0.934495
(Iteration 4371 / 4900) loss: 0.803143
(Iteration 4381 / 4900) loss: 0.771951
(Iteration 4391 / 4900) loss: 0.722691
(Iteration 4401 / 4900) loss: 0.787155
(Epoch 9 / 10) train acc: 0.828000; val_acc: 0.695000
(Iteration 4411 / 4900) loss: 0.772397
(Iteration 4421 / 4900) loss: 0.765978
(Iteration 4431 / 4900) loss: 0.877139
(Iteration 4441 / 4900) loss: 0.883751
(Iteration 4451 / 4900) loss: 0.726202
(Iteration 4461 / 4900) loss: 0.807448
(Iteration 4471 / 4900) loss: 0.826968
(Iteration 4481 / 4900) loss: 0.700057
(Iteration 4491 / 4900) loss: 0.744502
(Iteration 4501 / 4900) loss: 0.692437
(Iteration 4511 / 4900) loss: 0.818369
(Iteration 4521 / 4900) loss: 0.908699
(Iteration 4531 / 4900) loss: 0.776671
(Iteration 4541 / 4900) loss: 0.819699
(Iteration 4551 / 4900) loss: 0.815685
(Iteration 4561 / 4900) loss: 0.713748
(Iteration 4571 / 4900) loss: 0.585705
(Iteration 4581 / 4900) loss: 0.809625
(Iteration 4591 / 4900) loss: 0.848895
(Iteration 4601 / 4900) loss: 0.906940
(Iteration 4611 / 4900) loss: 0.930338
(Iteration 4621 / 4900) loss: 0.730985
(Iteration 4631 / 4900) loss: 0.750572
(Iteration 4641 / 4900) loss: 0.838269
(Iteration 4651 / 4900) loss: 0.841107
(Iteration 4661 / 4900) loss: 0.788277
(Iteration 4671 / 4900) loss: 0.782542
(Iteration 4681 / 4900) loss: 0.771011
(Iteration 4691 / 4900) loss: 0.753001
(Iteration 4701 / 4900) loss: 0.702424
(Iteration 4711 / 4900) loss: 0.915836
(Iteration 4721 / 4900) loss: 0.759142
(Iteration 4731 / 4900) loss: 0.759592
(Iteration 4741 / 4900) loss: 0.553609
(Iteration 4751 / 4900) loss: 0.670858
(Iteration 4761 / 4900) loss: 0.752814
(Iteration 4771 / 4900) loss: 0.862292
(Iteration 4781 / 4900) loss: 0.870768
(Iteration 4791 / 4900) loss: 0.706257
(Iteration 4801 / 4900) loss: 0.775441
(Iteration 4811 / 4900) loss: 0.720358
(Iteration 4821 / 4900) loss: 0.637500
(Iteration 4831 / 4900) loss: 0.946981
(Iteration 4841 / 4900) loss: 0.707923
(Iteration 4851 / 4900) loss: 0.781114
(Iteration 4861 / 4900) loss: 0.730278
(Iteration 4871 / 4900) loss: 0.880367
(Iteration 4881 / 4900) loss: 0.691554
(Iteration 4891 / 4900) loss: 0.591691
(Epoch 10 / 10) train acc: 0.844000; val_acc: 0.677000
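In the epoch summary lines above, training accuracy climbs from 0.620 after epoch 1 to 0.844 after epoch 10. Validation accuracy peaks at 0.695 (epochs 7 and 9) and ends at 0.677. A widening gap like this is a common sign of overfitting, which suggests trying a larger `reg` or keeping the parameters from the best validation epoch.

The sketch below pulls those numbers out of a pasted log so runs can be compared. `parse_epoch_log` and `summarize` are hypothetical helpers, and the regular expression assumes exactly the epoch-line format printed above. If the `Solver` object is still in memory, its recorded accuracy histories (for example `solver.val_acc_history`, if your Solver keeps one) are a simpler source.

```python
import re

# Matches the per-epoch summary lines printed by the Solver, e.g.
# "(Epoch 7 / 10) train acc: 0.811000; val_acc: 0.695000"
EPOCH_RE = re.compile(
    r'\(Epoch (\d+) / \d+\) train acc: ([0-9.]+); val_acc: ([0-9.]+)')


def parse_epoch_log(text):
    """Return [(epoch, train_acc, val_acc), ...] for every epoch line in text."""
    return [(int(e), float(tr), float(va)) for e, tr, va in EPOCH_RE.findall(text)]


def summarize(history):
    """Best validation epoch (the first one on ties) and the final train/val gap."""
    best_epoch, _, best_val = max(history, key=lambda h: h[2])
    _, last_train, last_val = history[-1]
    return {'best_epoch': best_epoch,
            'best_val_acc': best_val,
            'final_gap': last_train - last_val}


# The epoch summary lines from the 10-epoch run above; iteration lines are ignored.
LOG = """
(Epoch 0 / 10) train acc: 0.195000; val_acc: 0.179000
(Epoch 1 / 10) train acc: 0.620000; val_acc: 0.606000
(Epoch 2 / 10) train acc: 0.676000; val_acc: 0.621000
(Epoch 3 / 10) train acc: 0.707000; val_acc: 0.651000
(Epoch 4 / 10) train acc: 0.764000; val_acc: 0.673000
(Epoch 5 / 10) train acc: 0.755000; val_acc: 0.677000
(Epoch 6 / 10) train acc: 0.781000; val_acc: 0.665000
(Epoch 7 / 10) train acc: 0.811000; val_acc: 0.695000
(Epoch 8 / 10) train acc: 0.817000; val_acc: 0.676000
(Epoch 9 / 10) train acc: 0.828000; val_acc: 0.695000
(Epoch 10 / 10) train acc: 0.844000; val_acc: 0.677000
"""

history = parse_epoch_log(LOG)
print(summarize(history))
```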
In [146]:
# With no labels, model.loss runs a test-time forward pass and returns class scores
y_test_pred = np.argmax(model.loss(data['X_test']), axis=1)
print 'Test set accuracy: ', (y_test_pred == data['y_test']).mean()
Test set accuracy: 0.676
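A single test accuracy hides how the errors are spread across the ten classes. The sketch below computes per-class accuracy from a vector of predictions, such as `y_test_pred` above. `per_class_accuracy` is a hypothetical helper, and the small arrays in the usage lines are made up purely to show the computation.

```python
import numpy as np


def per_class_accuracy(y_pred, y_true, num_classes=10):
    """Fraction of examples of each true class that were predicted correctly."""
    y_pred = np.asarray(y_pred)
    y_true = np.asarray(y_true)
    correct = np.bincount(y_true[y_pred == y_true], minlength=num_classes)
    total = np.bincount(y_true, minlength=num_classes)
    # Guard against classes that never appear (avoids division by zero).
    return correct / np.maximum(total, 1).astype(float)


# Made-up example with 3 classes: class 0 is half right, class 1 fully right,
# class 2 never right.
print(per_class_accuracy([0, 1, 1, 1, 0], [0, 0, 1, 1, 2], num_classes=3))
```

In the notebook, `per_class_accuracy(y_test_pred, data['y_test'])` would return ten values indexed by CIFAR-10 label. This can point to which classes to target with data augmentation or a deeper architecture.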
Extra Credit Description¶
If you implement any additional features for extra credit, clearly describe them here with pointers to any code in this or other files if applicable.