Topic 3: Convolution Layer
Convolution Layer Implementation
In [1]:
import numpy as np
import mlp.layers as layers
import mlp.initialisers as init
class ConvolutionalLayer(layers.LayerWithParameters):
    """Layer implementing a 2D convolution-based transformation of its inputs.

    The layer is parameterised by a set of 2D convolutional kernels, a four
    dimensional array of shape
        (num_output_channels, num_input_channels, kernel_dim_1, kernel_dim_2)
    and a bias vector, a one dimensional array of shape
        (num_output_channels,)
    i.e. one shared bias per output channel.

    Assuming no padding is applied to the inputs, so that outputs are only
    calculated for positions where the kernel filters fully overlap with the
    inputs, and that unit strides are used, the outputs will have spatial extent
        output_dim_1 = input_dim_1 - kernel_dim_1 + 1
        output_dim_2 = input_dim_2 - kernel_dim_2 + 1
    """
    def __init__(self, num_input_channels, num_output_channels,
                 input_dim_1, input_dim_2,
                 kernel_dim_1, kernel_dim_2,
                 kernels_init=init.UniformInit(-0.01, 0.01),
                 biases_init=init.ConstantInit(0.),
                 kernels_penalty=None, biases_penalty=None):
        """Initialises a parameterised convolutional layer.

        Args:
            num_input_channels (int): Number of channels in inputs to
                layer (this may be the number of colour channels in the input
                images if used as the first layer in a model, or the
                number of output channels, a.k.a. feature maps, from a
                previous convolutional layer).
            num_output_channels (int): Number of channels in outputs
                from the layer, a.k.a. number of feature maps.
            input_dim_1 (int): Size of first input dimension of each 2D
                channel of inputs.
            input_dim_2 (int): Size of second input dimension of each 2D
                channel of inputs.
            kernel_dim_1 (int): Size of first dimension of each 2D channel of
                kernels.
            kernel_dim_2 (int): Size of second dimension of each 2D channel of
                kernels.
            kernels_init: Initialiser for the kernel parameters.
            biases_init: Initialiser for the bias parameters.
            kernels_penalty: Kernel-dependent penalty term (regulariser) or
                None if no regularisation is to be applied to the kernels.
            biases_penalty: Biases-dependent penalty term (regulariser) or
                None if no regularisation is to be applied to the biases.
        """
        self.num_input_channels = num_input_channels
        self.num_output_channels = num_output_channels
        self.input_dim_1 = input_dim_1
        self.input_dim_2 = input_dim_2
        self.kernel_dim_1 = kernel_dim_1
        self.kernel_dim_2 = kernel_dim_2
        self.kernels_init = kernels_init
        self.biases_init = biases_init
        self.kernels_shape = (
            num_output_channels, num_input_channels, kernel_dim_1, kernel_dim_2
        )
        self.inputs_shape = (
            None, num_input_channels, input_dim_1, input_dim_2
        )
        self.kernels = self.kernels_init(self.kernels_shape)
        self.biases = self.biases_init(num_output_channels)
        self.kernels_penalty = kernels_penalty
        self.biases_penalty = biases_penalty
    def fprop(self, inputs):
        """Forward propagates activations through the layer transformation.

        For inputs `x`, outputs `y`, kernels `K` and biases `b` the layer
        corresponds to `y = conv2d(x, K) + b`.

        Args:
            inputs: Array of layer inputs of shape
                (batch_size, num_input_channels, input_dim_1, input_dim_2).

        Returns:
            outputs: Array of layer outputs of shape
                (batch_size, num_output_channels, output_dim_1, output_dim_2).
        """
        stride = 1
        # Flip the kernels spatially so that sliding them over the inputs
        # (a cross-correlation) implements a true convolution.
        w = self.kernels[:, :, ::-1, ::-1]
        b = self.biases
        batch_size, num_input_channels, input_dim_1, input_dim_2 = inputs.shape
        num_output_channels, num_input_channels, kernel_dim_1, kernel_dim_2 = w.shape
        out = np.zeros((batch_size, num_output_channels,
                        1 + (input_dim_1 - kernel_dim_1) // stride,
                        1 + (input_dim_2 - kernel_dim_2) // stride))
        hc = 0
        for i in range(0, input_dim_1 - kernel_dim_1 + 1, stride):
            wc = 0
            for j in range(0, input_dim_2 - kernel_dim_2 + 1, stride):
                # Patch of the inputs the kernel currently overlaps.
                subx = inputs[:, :, i:i + kernel_dim_1, j:j + kernel_dim_2]
                for u in range(batch_size):
                    for v in range(num_output_channels):
                        out[u, v, hc, wc] = np.sum(subx[u] * w[v]) + b[v]
                wc += 1
            hc += 1
        return out
    def bprop(self, inputs, outputs, grads_wrt_outputs):
        """Back propagates gradients through a layer.

        Given gradients with respect to the outputs of the layer calculates the
        gradients with respect to the layer inputs.

        Args:
            inputs: Array of layer inputs of shape
                (batch_size, num_input_channels, input_dim_1, input_dim_2).
            outputs: Array of layer outputs calculated in forward pass of shape
                (batch_size, num_output_channels, output_dim_1, output_dim_2).
            grads_wrt_outputs: Array of gradients with respect to the layer
                outputs of shape
                (batch_size, num_output_channels, output_dim_1, output_dim_2).

        Returns:
            Array of gradients with respect to the layer inputs of shape
            (batch_size, num_input_channels, input_dim_1, input_dim_2).
        """
        dout = grads_wrt_outputs
        stride = 1
        pad = 0
        hc = 0
        # Use the same spatially flipped kernels as in fprop.
        w = self.kernels[:, :, ::-1, ::-1]
        batch_size, num_input_channels, input_dim_1, input_dim_2 = inputs.shape
        num_output_channels, num_input_channels, kernel_dim_1, kernel_dim_2 = w.shape
        dinputs = np.zeros_like(inputs)
        for i in range(0, input_dim_1 - kernel_dim_1 + 1, stride):
            wc = 0
            for j in range(0, input_dim_2 - kernel_dim_2 + 1, stride):
                # View into the gradient array for the current input patch:
                # each output position scatters its gradient back over the
                # patch it was computed from.
                dsubx = dinputs[:, :, i:i + kernel_dim_1, j:j + kernel_dim_2]
                for u in range(batch_size):
                    for v in range(num_output_channels):
                        dsubx[u] += dout[u, v, hc, wc] * w[v]
                wc += 1
            hc += 1
        dx = dinputs[:, :, pad:input_dim_1 - pad, pad:input_dim_2 - pad]
        return dx
    def grads_wrt_params(self, inputs, grads_wrt_outputs):
        """Calculates gradients with respect to layer parameters.

        Args:
            inputs: Array of inputs to layer of shape
                (batch_size, num_input_channels, input_dim_1, input_dim_2).
            grads_wrt_outputs: Array of gradients with respect to the layer
                outputs of shape
                (batch_size, num_output_channels, output_dim_1, output_dim_2).

        Returns:
            List of arrays of gradients with respect to the layer parameters
            `[grads_wrt_kernels, grads_wrt_biases]`.
        """
        dout = grads_wrt_outputs
        stride = 1
        hc = 0
        # Gradients are accumulated with respect to the flipped kernels and
        # flipped back at the end so they line up with `self.kernels`.
        w = self.kernels[:, :, ::-1, ::-1]
        b = self.biases
        batch_size, num_input_channels, input_dim_1, input_dim_2 = inputs.shape
        num_output_channels, num_input_channels, kernel_dim_1, kernel_dim_2 = w.shape
        dw = np.zeros_like(w)
        db = np.zeros_like(b)
        for i in range(0, input_dim_1 - kernel_dim_1 + 1, stride):
            wc = 0
            for j in range(0, input_dim_2 - kernel_dim_2 + 1, stride):
                subx = inputs[:, :, i:i + kernel_dim_1, j:j + kernel_dim_2]
                for u in range(batch_size):
                    for v in range(num_output_channels):
                        dw[v] += dout[u, v, hc, wc] * subx[u]
                        db[v] += dout[u, v, hc, wc]
                wc += 1
            hc += 1
        dw = dw[:, :, ::-1, ::-1]
        return [dw, db]
    def params_penalty(self):
        """Returns the parameter dependent penalty term for this layer.

        If no parameter-dependent penalty terms are set this returns zero.
        """
        params_penalty = 0
        if self.kernels_penalty is not None:
            params_penalty += self.kernels_penalty(self.kernels)
        if self.biases_penalty is not None:
            params_penalty += self.biases_penalty(self.biases)
        return params_penalty

    @property
    def params(self):
        """A list of layer parameter values: `[kernels, biases]`."""
        return [self.kernels, self.biases]

    @params.setter
    def params(self, values):
        self.kernels = values[0]
        self.biases = values[1]

    def __repr__(self):
        return (
            'ConvolutionalLayer(\n'
            '    num_input_channels={0}, num_output_channels={1},\n'
            '    input_dim_1={2}, input_dim_2={3},\n'
            '    kernel_dim_1={4}, kernel_dim_2={5}\n'
            ')'
            .format(self.num_input_channels, self.num_output_channels,
                    self.input_dim_1, self.input_dim_2, self.kernel_dim_1,
                    self.kernel_dim_2)
        )
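The nested-loop `fprop` above is written for clarity rather than speed. If SciPy is available, the optional cell below gives one way to sanity-check it against `scipy.signal.convolve2d` (a sketch under that assumption; the helper name `scipy_conv_fprop` is made up here and is not part of the provided framework).
In [ ]:
# Optional sanity check against SciPy (assumes scipy is installed).
from scipy.signal import convolve2d

def scipy_conv_fprop(layer, inputs):
    """Recompute the layer forward pass channel-by-channel with convolve2d."""
    batch_size = inputs.shape[0]
    num_out, num_in = layer.kernels.shape[:2]
    out_dim_1 = layer.input_dim_1 - layer.kernel_dim_1 + 1
    out_dim_2 = layer.input_dim_2 - layer.kernel_dim_2 + 1
    ref = np.zeros((batch_size, num_out, out_dim_1, out_dim_2))
    for b in range(batch_size):
        for o in range(num_out):
            for c in range(num_in):
                # 'valid' keeps only positions where the kernel fully overlaps the input.
                ref[b, o] += convolve2d(inputs[b, c], layer.kernels[o, c], mode='valid')
            ref[b, o] += layer.biases[o]
    return ref

layer = ConvolutionalLayer(3, 2, 4, 4, 2, 2)
x = np.random.uniform(size=(5, 3, 4, 4))
assert np.allclose(layer.fprop(x), scipy_conv_fprop(layer, x))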
Convolution Layer Testing
In [2]:
import numpy as np
def test_conv_layer_fprop(layer_class, do_cross_correlation=False):
    """Tests `fprop` method of a convolutional layer.

    Checks the outputs of `fprop` method for a fixed input against known
    reference values for the outputs and raises an AssertionError if
    the outputted values are not consistent with the reference values. If
    tests are all passed returns True.

    Args:
        layer_class: Convolutional layer implementation following the
            interface defined in the provided skeleton class.
        do_cross_correlation: Whether the layer implements an operation
            corresponding to cross-correlation (True) i.e. kernels are
            not flipped before sliding over inputs, or convolution
            (False) with filters being flipped.

    Raises:
        AssertionError: Raised if output of `layer.fprop` is inconsistent
            with reference values either in shape or values.
    """
    inputs = np.arange(96).reshape((2, 3, 4, 4))
    kernels = np.arange(-12, 12).reshape((2, 3, 2, 2))
    if do_cross_correlation:
        kernels = kernels[:, :, ::-1, ::-1]
    biases = np.arange(2)
    true_output = np.array(
        [[[[ -958., -1036., -1114.],
           [-1270., -1348., -1426.],
           [-1582., -1660., -1738.]],
          [[ 1707.,  1773.,  1839.],
           [ 1971.,  2037.,  2103.],
           [ 2235.,  2301.,  2367.]]],
         [[[-4702., -4780., -4858.],
           [-5014., -5092., -5170.],
           [-5326., -5404., -5482.]],
          [[ 4875.,  4941.,  5007.],
           [ 5139.,  5205.,  5271.],
           [ 5403.,  5469.,  5535.]]]]
    )
    layer = layer_class(
        num_input_channels=kernels.shape[1],
        num_output_channels=kernels.shape[0],
        input_dim_1=inputs.shape[2],
        input_dim_2=inputs.shape[3],
        kernel_dim_1=kernels.shape[2],
        kernel_dim_2=kernels.shape[3]
    )
    layer.params = [kernels, biases]
    layer_output = layer.fprop(inputs)
    assert layer_output.shape == true_output.shape, (
        'Layer fprop gives incorrect shaped output. '
        'Correct shape is \n\n{0}\n\n but returned shape is \n\n{1}.'
        .format(true_output.shape, layer_output.shape)
    )
    assert np.allclose(layer_output, true_output), (
        'Layer fprop does not give correct output. '
        'Correct output is \n\n{0}\n\n but returned output is \n\n{1}.'
        .format(true_output, layer_output)
    )
    return True
def test_conv_layer_bprop(layer_class, do_cross_correlation=False):
    """Tests `bprop` method of a convolutional layer.

    Checks the outputs of `bprop` method for a fixed input against known
    reference values for the gradients with respect to inputs and raises
    an AssertionError if the returned values are not consistent with the
    reference values. If tests are all passed returns True.

    Args:
        layer_class: Convolutional layer implementation following the
            interface defined in the provided skeleton class.
        do_cross_correlation: Whether the layer implements an operation
            corresponding to cross-correlation (True) i.e. kernels are
            not flipped before sliding over inputs, or convolution
            (False) with filters being flipped.

    Raises:
        AssertionError: Raised if output of `layer.bprop` is inconsistent
            with reference values either in shape or values.
    """
    inputs = np.arange(96).reshape((2, 3, 4, 4))
    kernels = np.arange(-12, 12).reshape((2, 3, 2, 2))
    if do_cross_correlation:
        kernels = kernels[:, :, ::-1, ::-1]
    biases = np.arange(2)
    grads_wrt_outputs = np.arange(-20, 16).reshape((2, 2, 3, 3))
    outputs = np.array(
        [[[[ -958., -1036., -1114.],
           [-1270., -1348., -1426.],
           [-1582., -1660., -1738.]],
          [[ 1707.,  1773.,  1839.],
           [ 1971.,  2037.,  2103.],
           [ 2235.,  2301.,  2367.]]],
         [[[-4702., -4780., -4858.],
           [-5014., -5092., -5170.],
           [-5326., -5404., -5482.]],
          [[ 4875.,  4941.,  5007.],
           [ 5139.,  5205.,  5271.],
           [ 5403.,  5469.,  5535.]]]]
    )
    true_grads_wrt_inputs = np.array(
        [[[[ 147.,  319.,  305.,  162.],
           [ 338.,  716.,  680.,  354.],
           [ 290.,  608.,  572.,  294.],
           [ 149.,  307.,  285.,  144.]],
          [[  23.,   79.,   81.,   54.],
           [ 114.,  284.,  280.,  162.],
           [ 114.,  272.,  268.,  150.],
           [  73.,  163.,  157.,   84.]],
          [[-101., -161., -143.,  -54.],
           [-110., -148., -120.,  -30.],
           [ -62.,  -64.,  -36.,    6.],
           [  -3.,   19.,   29.,   24.]]],
         [[[  39.,   67.,   53.,   18.],
           [  50.,   68.,   32.,   -6.],
           [   2.,  -40.,  -76.,  -66.],
           [ -31.,  -89., -111.,  -72.]],
          [[  59.,  115.,  117.,   54.],
           [ 114.,  212.,  208.,   90.],
           [ 114.,  200.,  196.,   78.],
           [  37.,   55.,   49.,   12.]],
          [[  79.,  163.,  181.,   90.],
           [ 178.,  356.,  384.,  186.],
           [ 226.,  440.,  468.,  222.],
           [ 105.,  199.,  209.,   96.]]]])
    layer = layer_class(
        num_input_channels=kernels.shape[1],
        num_output_channels=kernels.shape[0],
        input_dim_1=inputs.shape[2],
        input_dim_2=inputs.shape[3],
        kernel_dim_1=kernels.shape[2],
        kernel_dim_2=kernels.shape[3]
    )
    layer.params = [kernels, biases]
    layer_grads_wrt_inputs = layer.bprop(inputs, outputs, grads_wrt_outputs)
    assert layer_grads_wrt_inputs.shape == true_grads_wrt_inputs.shape, (
        'Layer bprop returns incorrect shaped array. '
        'Correct shape is \n\n{0}\n\n but returned shape is \n\n{1}.'
        .format(true_grads_wrt_inputs.shape, layer_grads_wrt_inputs.shape)
    )
    assert np.allclose(layer_grads_wrt_inputs, true_grads_wrt_inputs), (
        'Layer bprop does not return correct values. '
        'Correct output is \n\n{0}\n\n but returned output is \n\n{1}'
        .format(true_grads_wrt_inputs, layer_grads_wrt_inputs)
    )
    return True
def test_conv_layer_grad_wrt_params(
        layer_class, do_cross_correlation=False):
    """Tests `grads_wrt_params` method of a convolutional layer.

    Checks the outputs of `grads_wrt_params` method for fixed inputs
    against known reference values for the gradients with respect to
    kernels and biases, and raises an AssertionError if the returned
    values are not consistent with the reference values. If tests
    are all passed returns True.

    Args:
        layer_class: Convolutional layer implementation following the
            interface defined in the provided skeleton class.
        do_cross_correlation: Whether the layer implements an operation
            corresponding to cross-correlation (True) i.e. kernels are
            not flipped before sliding over inputs, or convolution
            (False) with filters being flipped.

    Raises:
        AssertionError: Raised if output of `layer.grads_wrt_params` is
            inconsistent with reference values either in shape or values.
    """
    inputs = np.arange(96).reshape((2, 3, 4, 4))
    kernels = np.arange(-12, 12).reshape((2, 3, 2, 2))
    biases = np.arange(2)
    grads_wrt_outputs = np.arange(-20, 16).reshape((2, 2, 3, 3))
    true_kernel_grads = np.array(
        [[[[ -240.,  -114.],
           [  264.,   390.]],
          [[-2256., -2130.],
           [-1752., -1626.]],
          [[-4272., -4146.],
           [-3768., -3642.]]],
         [[[ 5268.,  5232.],
           [ 5124.,  5088.]],
          [[ 5844.,  5808.],
           [ 5700.,  5664.]],
          [[ 6420.,  6384.],
           [ 6276.,  6240.]]]])
    if do_cross_correlation:
        kernels = kernels[:, :, ::-1, ::-1]
        true_kernel_grads = true_kernel_grads[:, :, ::-1, ::-1]
    true_bias_grads = np.array([-126., 36.])
    layer = layer_class(
        num_input_channels=kernels.shape[1],
        num_output_channels=kernels.shape[0],
        input_dim_1=inputs.shape[2],
        input_dim_2=inputs.shape[3],
        kernel_dim_1=kernels.shape[2],
        kernel_dim_2=kernels.shape[3]
    )
    layer.params = [kernels, biases]
    layer_kernel_grads, layer_bias_grads = (
        layer.grads_wrt_params(inputs, grads_wrt_outputs))
    assert layer_kernel_grads.shape == true_kernel_grads.shape, (
        'grads_wrt_params gives incorrect shaped kernel gradients output. '
        'Correct shape is \n\n{0}\n\n but returned shape is \n\n{1}.'
        .format(true_kernel_grads.shape, layer_kernel_grads.shape)
    )
    assert np.allclose(layer_kernel_grads, true_kernel_grads), (
        'grads_wrt_params does not give correct kernel gradients output. '
        'Correct output is \n\n{0}\n\n but returned output is \n\n{1}.'
        .format(true_kernel_grads, layer_kernel_grads)
    )
    assert layer_bias_grads.shape == true_bias_grads.shape, (
        'grads_wrt_params gives incorrect shaped bias gradients output. '
        'Correct shape is \n\n{0}\n\n but returned shape is \n\n{1}.'
        .format(true_bias_grads.shape, layer_bias_grads.shape)
    )
    assert np.allclose(layer_bias_grads, true_bias_grads), (
        'grads_wrt_params does not give correct bias gradients output. '
        'Correct output is \n\n{0}\n\n but returned output is \n\n{1}.'
        .format(true_bias_grads, layer_bias_grads)
    )
    return True
An example of using the test functions is given in the cell below. This assumes you implement a convolution (rather than cross-correlation) operation. If the implementation is correct, all three tests should pass.
In [3]:
all_correct = test_conv_layer_fprop(ConvolutionalLayer, False)
all_correct &= test_conv_layer_bprop(ConvolutionalLayer, False)
all_correct &= test_conv_layer_grad_wrt_params(ConvolutionalLayer, False)
if all_correct:
    print('All tests passed.')
All tests passed.
In [5]:
%matplotlib notebook
import numpy as np
import matplotlib.pyplot as plt
import logging
from mlp.layers import AffineLayer, SoftmaxLayer, SigmoidLayer
from mlp.errors import CrossEntropyError, CrossEntropySoftmaxError
from mlp.models import SingleLayerModel, MultipleLayerModel
from mlp.initialisers import UniformInit
from mlp.learning_rules import GradientDescentLearningRule
from mlp.data_providers import MNISTDataProvider
from mlp.optimisers import Optimiser
plt.style.use('ggplot')
def train_model_and_plot_stats(
        model, error, learning_rule, train_data, valid_data,
        num_epochs, stats_interval, early_stop=False):
    # As well as monitoring the error over training also monitor classification
    # accuracy i.e. proportion of most-probable predicted classes being equal to targets
    data_monitors = {'acc': lambda y, t: (y.argmax(-1) == t.argmax(-1)).mean()}
    # Use the created objects to initialise a new Optimiser instance.
    optimiser = Optimiser(
        model, error, learning_rule, train_data, valid_data,
        data_monitors, [], False, early_stop)
    # Run the optimiser for num_epochs full passes through the training set,
    # printing statistics every stats_interval epochs.
    stats, keys, run_time = optimiser.train(
        num_epochs=num_epochs, stats_interval=stats_interval)
    # Plot the change in the validation and training set error over training.
    fig_1 = plt.figure(figsize=(8, 4))
    ax_1 = fig_1.add_subplot(111)
    for k in ['error(train)', 'error(valid)']:
        ax_1.plot(np.arange(1, stats.shape[0]) * stats_interval,
                  stats[1:, keys[k]], label=k)
    ax_1.legend(loc=0)
    ax_1.set_xlabel('Epoch number')
    # Plot the change in the validation and training set accuracy over training.
    fig_2 = plt.figure(figsize=(8, 4))
    ax_2 = fig_2.add_subplot(111)
    for k in ['acc(train)', 'acc(valid)']:
        ax_2.plot(np.arange(1, stats.shape[0]) * stats_interval,
                  stats[1:, keys[k]], label=k)
    ax_2.legend(loc=0)
    ax_2.set_xlabel('Epoch number')
    return stats, keys, run_time, fig_1, ax_1, fig_2, ax_2
Model with Convolution Layer
In [6]:
# Seed a random number generator
seed = 6102016
rng = np.random.RandomState(seed)
# Set up a logger object to print info about the training run to stdout
logger = logging.getLogger()
logger.setLevel(logging.INFO)
logger.handlers = [logging.StreamHandler()]
# Create data provider objects for the MNIST data set
train_data = MNISTDataProvider('train', rng=rng)
valid_data = MNISTDataProvider('valid', rng=rng)
input_dim, output_dim = 784, 10
# Set training run hyperparameters
batch_size = 100 # number of data points in a batch
init_scale = 0.01 # scale for random parameter initialisation
learning_rate = 0.1 # learning rate for gradient descent
num_epochs = 20 # number of training epochs to perform
stats_interval = 2 # epoch interval between recording and printing stats
# Reset random number generator and data provider states on each run
# to ensure reproducibility of results
rng.seed(seed)
train_data.reset()
valid_data.reset()
# Alter data-provider batch size
train_data.batch_size = batch_size
valid_data.batch_size = batch_size
# Create a parameter initialiser which will sample random uniform values
# from [-init_scale, init_scale]
param_init = UniformInit(-init_scale, init_scale, rng=rng)
input_dim_1 = 28
input_dim_2 = 28
kernel_dim_1 = 2
kernel_dim_2 = 2
num_kernel = 1
output_dim_1 = input_dim_1 - kernel_dim_1 + 1
output_dim_2 = input_dim_2 - kernel_dim_2 + 1
# Create reshape + convolutional + reshape + affine + softmax model
model = MultipleLayerModel([
    layers.ReshapeLayer((1, input_dim_1, input_dim_2)),
    ConvolutionalLayer(1, num_kernel, input_dim_1, input_dim_2, kernel_dim_1, kernel_dim_2),
    layers.ReshapeLayer((num_kernel * output_dim_1 * output_dim_2,)),
    AffineLayer(num_kernel * output_dim_1 * output_dim_2, output_dim, param_init, param_init),
    SoftmaxLayer()
])
# Initialise a cross entropy error object
error = CrossEntropyError()
# Use a basic gradient descent learning rule
learning_rule = GradientDescentLearningRule(learning_rate=learning_rate)
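For reference, the optional cell below traces the shape of one batch through the model just defined, assuming `model.fprop` returns the list of per-layer activations as in the course framework (the zero-valued dummy batch is purely illustrative):
In [ ]:
# Illustrative shape trace for one batch through the model above.
dummy_batch = np.zeros((batch_size, input_dim))
for layer, activation in zip(model.layers, model.fprop(dummy_batch)[1:]):
    print(layer.__class__.__name__, activation.shape)
# Expected output:
#   ReshapeLayer (100, 1, 28, 28)
#   ConvolutionalLayer (100, 1, 27, 27)   # 28 - 2 + 1 = 27
#   ReshapeLayer (100, 729)
#   AffineLayer (100, 10)
#   SoftmaxLayer (100, 10)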
In [7]:
res = train_model_and_plot_stats(model, error, learning_rule, train_data, valid_data, num_epochs, stats_interval)
Epoch 0:
error(train)=2.30e+00, acc(train)=9.01e-02, error(valid)=2.30e+00, acc(valid)=9.15e-02, params_penalty=0.00e+00
Epoch 2: 533.94s to complete
error(train)=3.02e-01, acc(train)=9.13e-01, error(valid)=2.90e-01, acc(valid)=9.17e-01, params_penalty=0.00e+00
Epoch 4: 564.65s to complete
error(train)=2.86e-01, acc(train)=9.19e-01, error(valid)=2.80e-01, acc(valid)=9.24e-01, params_penalty=0.00e+00
Epoch 6: 550.75s to complete
error(train)=2.75e-01, acc(train)=9.24e-01, error(valid)=2.65e-01, acc(valid)=9.26e-01, params_penalty=0.00e+00
Epoch 8: 561.45s to complete
error(train)=2.71e-01, acc(train)=9.24e-01, error(valid)=2.66e-01, acc(valid)=9.26e-01, params_penalty=0.00e+00
Epoch 10: 534.42s to complete
error(train)=2.74e-01, acc(train)=9.24e-01, error(valid)=2.68e-01, acc(valid)=9.27e-01, params_penalty=0.00e+00
Epoch 12: 555.73s to complete
error(train)=2.66e-01, acc(train)=9.25e-01, error(valid)=2.63e-01, acc(valid)=9.28e-01, params_penalty=0.00e+00
Epoch 14: 551.09s to complete
error(train)=2.65e-01, acc(train)=9.27e-01, error(valid)=2.59e-01, acc(valid)=9.30e-01, params_penalty=0.00e+00
Epoch 16: 570.47s to complete
error(train)=2.61e-01, acc(train)=9.26e-01, error(valid)=2.60e-01, acc(valid)=9.30e-01, params_penalty=0.00e+00
Epoch 18: 571.44s to complete
error(train)=2.71e-01, acc(train)=9.23e-01, error(valid)=2.71e-01, acc(valid)=9.24e-01, params_penalty=0.00e+00
Epoch 20: 553.71s to complete
error(train)=2.60e-01, acc(train)=9.28e-01, error(valid)=2.61e-01, acc(valid)=9.28e-01, params_penalty=0.00e+00
[Figures: training and validation set error and classification accuracy against epoch number for the model with the convolutional layer.]
Model without Convolution Layer
In [8]:
rng.seed(seed)
train_data.reset()
valid_data.reset()
input_dim, output_dim = 784, 10
# Alter data-provider batch size
train_data.batch_size = batch_size
valid_data.batch_size = batch_size
# Create a parameter initialiser which will sample random uniform values
# from [-init_scale, init_scale]
param_init = UniformInit(-init_scale, init_scale, rng=rng)
input_dim_1 = 28
input_dim_2 = 28
kernel_dim_1 = 2
kernel_dim_2 = 2
num_kernel = 1
output_dim_1 = input_dim_1 - kernel_dim_1 + 1
output_dim_2 = input_dim_2 - kernel_dim_2 + 1
# Create affine + softmax model
model = MultipleLayerModel([
AffineLayer(input_dim, output_dim, param_init, param_init),
SoftmaxLayer()
])
# Initialise a cross entropy error object
error = CrossEntropyError()
# Use a basic gradient descent learning rule
learning_rule = GradientDescentLearningRule(learning_rate=learning_rate)
withOutConvRes = train_model_and_plot_stats(model, error, learning_rule, train_data, valid_data, num_epochs, stats_interval)
Epoch 0:
error(train)=2.31e+00, acc(train)=1.04e-01, error(valid)=2.31e+00, acc(valid)=9.90e-02, params_penalty=0.00e+00
Epoch 2: 0.23s to complete
error(train)=3.53e-01, acc(train)=9.02e-01, error(valid)=3.25e-01, acc(valid)=9.12e-01, params_penalty=0.00e+00
Epoch 4: 0.22s to complete
error(train)=3.19e-01, acc(train)=9.11e-01, error(valid)=2.98e-01, acc(valid)=9.16e-01, params_penalty=0.00e+00
Epoch 6: 0.23s to complete
error(train)=3.06e-01, acc(train)=9.15e-01, error(valid)=2.88e-01, acc(valid)=9.20e-01, params_penalty=0.00e+00
Epoch 8: 0.25s to complete
error(train)=2.94e-01, acc(train)=9.18e-01, error(valid)=2.79e-01, acc(valid)=9.22e-01, params_penalty=0.00e+00
Epoch 10: 0.37s to complete
error(train)=2.88e-01, acc(train)=9.20e-01, error(valid)=2.76e-01, acc(valid)=9.23e-01, params_penalty=0.00e+00
Epoch 12: 0.23s to complete
error(train)=2.83e-01, acc(train)=9.20e-01, error(valid)=2.73e-01, acc(valid)=9.23e-01, params_penalty=0.00e+00
Epoch 14: 0.22s to complete
error(train)=2.80e-01, acc(train)=9.22e-01, error(valid)=2.71e-01, acc(valid)=9.23e-01, params_penalty=0.00e+00
Epoch 16: 0.22s to complete
error(train)=2.76e-01, acc(train)=9.23e-01, error(valid)=2.69e-01, acc(valid)=9.25e-01, params_penalty=0.00e+00
Epoch 18: 0.23s to complete
error(train)=2.73e-01, acc(train)=9.24e-01, error(valid)=2.66e-01, acc(valid)=9.25e-01, params_penalty=0.00e+00
Epoch 20: 0.24s to complete
error(train)=2.72e-01, acc(train)=9.24e-01, error(valid)=2.67e-01, acc(valid)=9.26e-01, params_penalty=0.00e+00
[Figures: training and validation set error and classification accuracy against epoch number for the affine-only model.]
Final Stats with and without Convolution Layer
In [11]:
stats = res[0]
keys = res[1]
stats2 = withOutConvRes[0]
keys2 = withOutConvRes[1]
print('|convolution layer | final error(train) | final error(valid) | final acc(train) | final acc(valid) |')
print('|------------------|--------------------|--------------------|------------------|------------------|')
print('| {0:8s} | {1:.2e} | {2:.2e} | {3:.4f} | {4:.4f} |'.format(
    'Yes',
    stats[-1, keys['error(train)']], stats[-1, keys['error(valid)']],
    stats[-1, keys['acc(train)']], stats[-1, keys['acc(valid)']]))
print('| {0:8s} | {1:.2e} | {2:.2e} | {3:.4f} | {4:.4f} |'.format(
    'No',
    stats2[-1, keys2['error(train)']], stats2[-1, keys2['error(valid)']],
    stats2[-1, keys2['acc(train)']], stats2[-1, keys2['acc(valid)']]))
|convolution layer | final error(train) | final error(valid) | final acc(train) | final acc(valid) |
|------------------|--------------------|--------------------|------------------|------------------|
| Yes | 2.60e-01 | 2.61e-01 | 0.9280 | 0.9277 |
| No | 2.72e-01 | 2.67e-01 | 0.9243 | 0.9258 |
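The final numbers above are close; for context, the back-of-the-envelope sketch below (using the layer shapes defined earlier) compares the number of learnable parameters in the two models.
In [ ]:
# Rough parameter-count comparison for the two models above.
conv_layer_params = num_kernel * 1 * kernel_dim_1 * kernel_dim_2 + num_kernel  # 1*1*2*2 + 1 = 5
affine_after_conv = num_kernel * output_dim_1 * output_dim_2 * output_dim + output_dim  # 729*10 + 10
affine_only = input_dim * output_dim + output_dim  # 784*10 + 10
print('with conv layer:   {0} parameters'.format(conv_layer_params + affine_after_conv))  # 7305
print('affine-only model: {0} parameters'.format(affine_only))  # 7850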