COMP9444
Neural Networks and Deep Learning
4b. Image Processing

Outline
Image Datasets and Tasks
AlexNet
Data Augmentation (7.4)
Weight Initialization (8.4)
Batch Normalization (8.7.1)
Residual Networks
Dense Networks
Style Transfer

Textbook, Sections 7.4, 8.4, 8.7.1
MNIST Handwritten Digit Dataset
black and white, resolution 28×28
60,000 images
10 classes (0,1,2,3,4,5,6,7,8,9)

CIFAR Image Dataset
color, resolution 32×32
50,000 images
10 classes
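As a concrete illustration (not part of the original slides), the following sketch loads both datasets with torchvision; the root directory "./data" and the plain ToTensor transform are placeholder choices.

from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()

# MNIST: 60,000 training images, 28x28 greyscale, 10 digit classes
mnist = datasets.MNIST(root="./data", train=True, download=True, transform=to_tensor)

# CIFAR-10: 50,000 training images, 32x32 colour, 10 object classes
cifar = datasets.CIFAR10(root="./data", train=True, download=True, transform=to_tensor)

print(len(mnist), mnist[0][0].shape)    # 60000  torch.Size([1, 28, 28])
print(len(cifar), cifar[0][0].shape)    # 50000  torch.Size([3, 32, 32])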
ImageNet LSVRC Dataset
color, resolution 227×227
1.2 million images
1000 classes

Image Processing Tasks
image classification
object detection
object segmentation
style transfer
generating images
generating art
image captioning
Object Detection

LeNet trained on MNIST
The 5 × 5 window of the first convolution layer extracts from the original 32 × 32 image a 28 × 28 array of features. Subsampling then halves this size to 14 × 14. The second convolution layer uses another 5 × 5 window to extract a 10 × 10 array of features, which the second subsampling layer reduces to 5 × 5. These activations then pass through two fully connected layers into the 10 output units corresponding to the digits '0' to '9'.
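The layer sizes described above translate directly into code. The following is a minimal PyTorch sketch; the channel counts (6, 16) and hidden sizes (120, 84) follow the classic LeNet-5 and are not specified on the slide, so treat them as assumptions.

import torch
import torch.nn as nn

class LeNet(nn.Module):
    # Sketch of the architecture described above; the channel counts (6, 16)
    # and hidden sizes (120, 84) are assumed from the classic LeNet-5.
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 6, kernel_size=5)    # 32x32 -> 28x28
        self.pool  = nn.MaxPool2d(2)                   # halves each dimension
        self.conv2 = nn.Conv2d(6, 16, kernel_size=5)   # 14x14 -> 10x10
        self.fc1   = nn.Linear(16 * 5 * 5, 120)
        self.fc2   = nn.Linear(120, 84)
        self.out   = nn.Linear(84, 10)                 # digits '0' to '9'

    def forward(self, x):
        x = self.pool(torch.relu(self.conv1(x)))       # 28x28 -> 14x14
        x = self.pool(torch.relu(self.conv2(x)))       # 10x10 -> 5x5
        x = x.flatten(1)
        x = torch.relu(self.fc1(x))
        x = torch.relu(self.fc2(x))
        return self.out(x)

print(LeNet()(torch.zeros(1, 1, 32, 32)).shape)        # torch.Size([1, 10])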
ImageNet Architectures
AlexNet, 8 layers (2012)
VGG, 19 layers (2014)
GoogLeNet, 22 layers (2014)
ResNets, 152 layers (2015)

AlexNet Architecture
AlexNet Details
650K neurons
630M connections
60M parameters
more parameters than images → danger of overfitting
5 convolutional layers + 3 fully connected layers
max pooling with overlapping stride
softmax with 1000 classes
2 parallel GPUs which interact only at certain layers

Enhancements
Rectified Linear Units (ReLUs)
overlapping pooling (width=3, stride=2)
stochastic gradient descent with momentum and weight decay
data augmentation to reduce overfitting
50% dropout in the fully connected layers
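A minimal sketch of how several of these enhancements appear in PyTorch code; the flattened feature size (256 × 6 × 6), learning rate, momentum and weight decay values are illustrative assumptions rather than figures from the slide.

import torch.nn as nn
import torch.optim as optim

# The fully connected part of an AlexNet-style network, with ReLUs and 50% dropout;
# the flattened feature size 256*6*6 and all hyperparameter values are assumptions.
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),              # softmax over 1000 classes is applied in the loss
)

pool = nn.MaxPool2d(kernel_size=3, stride=2)             # overlapping pooling (width=3, stride=2)

optimizer = optim.SGD(classifier.parameters(), lr=0.01,
                      momentum=0.9, weight_decay=5e-4)   # SGD with momentum and weight decay
loss_fn = nn.CrossEntropyLoss()                          # log-softmax + negative log likelihood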
Data Augmentation (7.4)
patches of size 224 × 224 are randomly cropped from the original images
images can be reflected horizontally
also include changes in intensity of RGB channels
at test time, average the predictions on 10 different crops of each test image
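These augmentations map naturally onto torchvision transforms. The sketch below is an approximation: ColorJitter stands in for AlexNet's PCA-based perturbation of RGB intensities, TenCrop provides the ten test-time crops, and the jitter strengths are arbitrary.

import torch
from torchvision import transforms

# Training-time augmentation (approximate): random 224x224 crops, horizontal
# reflection, and changes to RGB intensity. ColorJitter is a stand-in for the
# PCA-based colour perturbation actually used in AlexNet.
train_transform = transforms.Compose([
    transforms.RandomCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
])

# Test-time: ten crops per image (four corners + centre, and their reflections);
# the network's predictions over these ten crops are then averaged.
test_transform = transforms.Compose([
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])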
Convolution Kernels
filters on GPU-1 (upper) are color agnostic
filters on GPU-2 (lower) are color specific
these resemble Gabor filters
Dealing with Deep Networks
> 10 layers
◮ weight initialization
◮ batch normalization
> 30 layers
◮ skip connections
> 100 layers
◮ identity skip connections

Statistics Example: Coin Tossing
Example: Toss a coin once, and count the number of Heads
    Mean μ = (1/2)(0 + 1) = 0.5
    Variance = (1/2)((0 − 0.5)² + (1 − 0.5)²) = 0.25
    Standard Deviation σ = √Variance = 0.5

Example: Toss a coin 100 times, and count the number of Heads
    Mean μ = 100 × 0.5 = 50
    Variance = 100 × 0.25 = 25
    Standard Deviation σ = √Variance = 5

Example: Toss a coin 10000 times, and count the number of Heads
    μ = 5000,  σ = √2500 = 50
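A quick numpy simulation confirming these figures (the 100,000 repetitions are an arbitrary choice):

import numpy as np

rng = np.random.default_rng(0)

# Count the number of Heads in n tosses, repeated 100,000 times, and compare
# the empirical mean and standard deviation with the theory: mean = n * 0.5,
# standard deviation = sqrt(n * 0.25).
for n in (1, 100, 10000):
    heads = rng.binomial(n, 0.5, size=100_000)
    print(n, heads.mean(), heads.std(), "theory:", n * 0.5, np.sqrt(n * 0.25))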
Statistics
The mean and variance of a set of n samples x_1, …, x_n are given by
    Mean[x] = (1/n) Σ_{k=1}^{n} x_k
    Var[x] = (1/n) Σ_{k=1}^{n} (x_k − Mean[x])² = (1/n) Σ_{k=1}^{n} x_k² − Mean[x]²
If w_k, x_k are independent and y = Σ_{k=1}^{n} w_k x_k then
    Var[y] = n Var[w] Var[x]

Weight Initialization (8.4)
Consider one layer (i) of a deep neural network with weights w^(i)_jk connecting the activations {x^(i)_k}_{1≤k≤n_i} at the previous layer to {x^(i+1)_j}_{1≤j≤n_{i+1}} at the next layer, where g() is the transfer function and
    x^(i+1)_j = g(sum^(i)_j) = g( Σ_{k=1}^{n_i} w^(i)_jk x^(i)_k )
Then
    Var[sum^(i)_j] = n_i Var[w^(i)] Var[x^(i)]
    Var[x^(i+1)] ≃ G_0 n_i Var[w^(i)] Var[x^(i)]
where G_0 is a constant whose value is estimated to take account of the transfer function.
If some layers are not fully connected, we replace n_i with the average number n^in_i of incoming connections to each node at layer i+1.

Weight Initialization
If the network has D layers, with input x = x^(1) and output z = x^(D+1), then
    Var[z] ≃ ( ∏_{i=1}^{D} G_0 n^in_i Var[w^(i)] ) Var[x]
When we apply gradient descent through backpropagation, the differentials will follow a similar pattern:
    Var[∂/∂x] ≃ ( ∏_{i=1}^{D} G_1 n^out_i Var[w^(i)] ) Var[∂/∂z]
In this equation, n^out_i is the average number of outgoing connections for each node at layer i, and G_1 is meant to estimate the average value of the derivative of the transfer function.
In order to have healthy forward and backward propagation, each term in the product must be approximately equal to 1. Any deviation from this could cause the activations to either vanish or saturate, and the differentials to either decay or explode exponentially.
We therefore choose the initial weights {w^(i)_jk} in each layer (i) such that
    G_1 n^out_i Var[w^(i)] = 1
For Rectified Linear Units, we can assume G_0 = G_1 = 1/2
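A minimal sketch of this initialization rule for ReLU layers, together with the equivalent built-in PyTorch call; the layer sizes are arbitrary.

import torch.nn as nn

def init_relu_layer(layer: nn.Linear) -> None:
    # Choose Var[w] = 1 / (G1 * n_out) with G1 = 1/2 for ReLU,
    # i.e. Var[w] = 2 / n_out, as on the slide above; biases start at zero.
    n_out = layer.weight.shape[0]            # nn.Linear stores weights as (n_out, n_in)
    nn.init.normal_(layer.weight, mean=0.0, std=(2.0 / n_out) ** 0.5)
    nn.init.zeros_(layer.bias)

layer = nn.Linear(256, 128)
init_relu_layer(layer)

# Equivalent built-in call (He/Kaiming initialization):
nn.init.kaiming_normal_(layer.weight, mode="fan_out", nonlinearity="relu")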
Weight Initialization
22-layer ReLU network (left): Var[w] = 2/n converges faster than Var[w] = 1/n
30-layer ReLU network (right): Var[w] = 2/n is successful while Var[w] = 1/n fails to learn at all

Batch Normalization (8.7.1)
We can normalize the activations x^(i)_k of node k in layer (i) relative to the mean and variance of those activations, calculated over a mini-batch of training items:
    x̂^(i)_k = ( x^(i)_k − Mean[x^(i)_k] ) / √Var[x^(i)_k]
These activations can then be shifted and re-scaled to
    y^(i)_k = β^(i)_k + γ^(i)_k x̂^(i)_k
β^(i)_k, γ^(i)_k are additional parameters, for each node, which are trained by backpropagation along with the other parameters (weights) in the network.
After training is complete, Mean[x^(i)_k] and Var[x^(i)_k] are either pre-computed on the entire training set, or updated using running averages.
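A numpy sketch of this normalization for a single mini-batch; the small constant eps, added for numerical stability, is a standard detail that the slide omits.

import numpy as np

def batch_norm(x, beta, gamma, eps=1e-5):
    # x: activations of one layer over a mini-batch, shape (batch_size, num_nodes).
    # Normalize each node over the mini-batch, then shift and re-scale by the
    # trainable parameters beta and gamma (one pair per node).
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return beta + gamma * x_hat

x = 5.0 + 3.0 * np.random.default_rng(1).standard_normal((64, 10))
y = batch_norm(x, beta=np.zeros(10), gamma=np.ones(10))
print(y.mean(axis=0).round(3), y.std(axis=0).round(3))   # approximately 0 and 1 per node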
Going Deeper
If we simply stack additional layers, it can lead to higher training error as well as higher test error.
Idea: Take any two consecutive stacked layers in a deep network and add a "skip" connection which bypasses these layers and is added to their output.
Residual Networks
the preceding layers attempt to do the "whole" job, making x as close as possible to the target output of the entire network
F(x) is a residual component which corrects the errors from previous layers, or provides additional details which the previous layers were not powerful enough to compute
with skip connections, both training and test error drop as you add more layers
with more than 100 layers, need to apply ReLU before adding the residual instead of afterwards. This is called an identity skip connection (see the sketch after the Dense Networks slide below).

Dense Networks
Recently, good results have been achieved using networks with densely connected blocks, within which each layer is connected by shortcut connections to all the preceding layers.
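The following is a minimal PyTorch sketch of a residual block with the skip connection described in the Residual Networks slide, in the pre-activation (identity skip connection) style where batch normalization and ReLU are applied before each convolution so that the skip path is a pure identity; the channel count and kernel size are arbitrary choices.

import torch
import torch.nn as nn

class PreActResidualBlock(nn.Module):
    # Pre-activation residual block: BatchNorm and ReLU are applied before each
    # convolution, so the skip connection is a pure identity (He et al., 2016).
    def __init__(self, channels):
        super().__init__()
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        out = self.conv1(torch.relu(self.bn1(x)))
        out = self.conv2(torch.relu(self.bn2(out)))
        return x + out          # output = x + F(x)

print(PreActResidualBlock(16)(torch.zeros(1, 16, 8, 8)).shape)   # torch.Size([1, 16, 8, 8])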
Texture Synthesis

Neural Texture Synthesis
1. pretrain CNN on ImageNet (VGG-19)
2. pass input texture through CNN; compute feature map F^l_ik for the ith filter at spatial location k in layer (depth) l
3. compute the Gram matrix for each pair of features
       G^l_ij = Σ_k F^l_ik F^l_jk
4. feed (initially random) image into CNN
5. compute L2 distance between Gram matrices of original and new image
6. backprop to get gradient on image pixels
7. update image and go to step 5.
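A sketch of steps 3 to 7 in PyTorch. The function features(img), which returns the chosen (flattened) VGG-19 feature maps, is assumed and not shown; the step count and learning rate are arbitrary.

import torch

def gram_matrix(F):
    # F: feature maps of one layer, flattened to shape (N_l filters, M_l locations),
    # so that F[i, k] is the response of filter i at spatial location k.
    return F @ F.t()                        # G[i, j] = sum_k F[i, k] * F[j, k]

def synthesize_texture(features, texture_img, steps=200, lr=0.01):
    # features(img) is assumed to return a list of flattened VGG-19 feature maps,
    # one per chosen layer; its construction is not shown here.
    targets = [gram_matrix(F).detach() for F in features(texture_img)]
    img = torch.rand_like(texture_img, requires_grad=True)    # initially random image
    opt = torch.optim.Adam([img], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = sum(((gram_matrix(F) - G) ** 2).sum()          # L2 distance between
                   for F, G in zip(features(img), targets))   # the Gram matrices
        loss.backward()                                       # gradient on image pixels
        opt.step()                                            # update image, repeat
    return img.detach()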
Neural Texture Synthesis
We can introduce a scaling factor w_l for each layer l in the network, and define the cost function as
    E_style = (1/4) Σ_{l=0}^{L} ( w_l / (N_l² M_l²) ) Σ_{i,j} ( G^l_ij − A^l_ij )²
where N_l, M_l are the number of filters, and size of feature maps, in layer l, and G^l_ij, A^l_ij are the Gram matrices for the original and synthetic image.

Neural Style Transfer
content + style → new image
For Neural Style Transfer, we minimize a cost function which is
    E_total = α E_content + β E_style
            = (α/2) Σ_{i,k} || F^l_ik(x) − F^l_ik(x_c) ||²  +  (β/4) Σ_{l=0}^{L} ( w_l / (N_l² M_l²) ) Σ_{i,j} ( G^l_ij − A^l_ij )²
where
    x_c, x          = content image, synthetic image
    F^l_ik          = ith filter at position k in layer l
    N_l, M_l        = number of filters, and size of feature maps, in layer l
    w_l             = weighting factor for layer l
    G^l_ij, A^l_ij  = Gram matrices for style image, and synthetic image
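A sketch of this combined cost in PyTorch. The feature lists, the style Gram matrices and the layer weights are assumed to come from fixed VGG-19 layers (as in the texture-synthesis sketch above), and the default values of alpha and beta are placeholders.

import torch

def style_transfer_loss(x_content_feats, content_feats, x_style_feats, style_grams,
                        layer_weights, alpha=1.0, beta=1000.0):
    # Content term: squared distance between the feature maps of the synthetic
    # image x and those of the content image x_c, at the chosen content layer(s).
    e_content = sum(((Fx - Fc) ** 2).sum() / 2
                    for Fx, Fc in zip(x_content_feats, content_feats))
    # Style term: weighted squared distance between the Gram matrices of the
    # synthetic image (G) and of the style image (A), over the chosen style layers.
    e_style = 0.0
    for Fx, A, w_l in zip(x_style_feats, style_grams, layer_weights):
        N_l, M_l = Fx.shape                 # number of filters, size of feature map
        G = Fx @ Fx.t()
        e_style = e_style + w_l * ((G - A) ** 2).sum() / (4 * N_l**2 * M_l**2)
    return alpha * e_content + beta * e_style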
References
“ImageNet Classification with Deep Convolutional Neural Networks”, Krizhevsky et al., 2012.
“Understanding the difficulty of training deep feedforward neural networks”, Glorot & Bengio, 2010.
“Batch normalization: Accelerating deep network training by reducing internal covariate shift”, Ioffe & Szegedy, ICML 2015.
“Deep Residual Learning for Image Recognition”, He et al., 2016.
“Densely Connected Convolutional Networks”, Huang et al., 2016.
“A Neural Algorithm of Artistic Style”, Gatys et al., 2015.