
[06-30213][06-30241][06-25024]
Computer Vision and Imaging &
Robot Vision
Dr Hyung Jin Chang Dr Yixing Gao
h.j.chang@bham.ac.uk y.gao.8@bham.ac.uk
School of Computer Science

Previously
• Brief history of the neural network
• Shallow vs deep network
• Training neural network
– Convolution layer
– Non-linearity (activation functions)
– Backpropagation
– Pooling
– Calculating the number of parameters (= size of the network)
– Dropout
– Image augmentation
– Representative networks: AlexNet, LeNet, VGG, ResNet
Hyung Jin Chang

DEEP LEARNING II

Today
• Training Neural Network (continued)
– Data preprocessing
– Weight initialization
– Batch normalization
– Which Loss and Activation Functions should I use?
– End-to-end learning
• Deep Learning Methods
– Generative method (GAN)
• Deep Learning Framework and Practical Tips
– TensorFlow vs PyTorch
– Representative network architectures
– GPUs
– Editors
– Online courses & books & tutorials
• Current Research Topics
Hyung Jin Chang

Today
• Training Neural Network (continued)
– Data preprocessing
– Weight initialization
– Batch normalization
– Which Loss and Activation Functions should I use?
– End-to-end learning
• Deep Learning Methods
– Generative method (GAN)
• Deep Learning Framework and Practical Tips
– TensorFlow vs PyTorch
– Representative network architectures
– GPUs
– Editors
– Online courses & books & tutorials
• Current Research Topics
Hyung Jin Chang

Data Preprocessing
Weight initialization
Batch normalization
Which Loss and Activation Functions should I use?
End-to-end learning

Data Preprocessing
Remember: Consider what happens when the input to a neuron is always positive…
What can we say about the gradients on w? Always all positive or all negative 🙁
(this is also why you want zero-mean data!)
[Figure: with only all-positive or all-negative gradient update directions allowed, reaching a hypothetical optimal w vector requires an inefficient zig-zag path]
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Data Preprocessing
(Assume X [NxD] is data matrix, each example in a row)
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
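A minimal numpy sketch of the preprocessing steps this slide refers to (zero-centering, then normalizing each feature), assuming X is the N x D data matrix described above; the toy data and variable names are illustrative only.

import numpy as np

X = 3.0 * np.random.randn(100, 5) + 10.0               # toy data: N=100 examples, D=5 features

X_centered = X - X.mean(axis=0)                         # zero-center: subtract the per-feature mean
X_normalized = X_centered / X_centered.std(axis=0)      # normalize: divide by the per-feature std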

Data Preprocessing
In practice, you may also see PCA and Whitening of the data
(PCA: data has a diagonal covariance matrix; Whitening: the covariance matrix is the identity matrix)
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
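A hedged numpy sketch of PCA and whitening as described above; the toy data, the small epsilon constant, and the variable names are my own illustration, not from the slides.

import numpy as np

X = np.random.randn(100, 5)
X_centered = X - X.mean(axis=0)

cov = X_centered.T.dot(X_centered) / X_centered.shape[0]   # D x D covariance matrix
U, S, _ = np.linalg.svd(cov)                               # eigenvectors (U) and eigenvalues (S)

X_pca = X_centered.dot(U)                  # decorrelated data: diagonal covariance matrix
X_white = X_pca / np.sqrt(S + 1e-5)        # whitened data: covariance is ~identity matrix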

Data Preprocessing
Before normalization: classification loss very sensitive to changes in weight matrix; hard to optimize
After normalization: less sensitive to small changes in weights; easier to optimize
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Data Preprocessing
In practice for Images: center only
e.g. consider CIFAR-10 example with [32,32,3] images
– Subtract the mean image (e.g. AlexNet) (mean image = [32,32,3] array)
– Subtract per-channel mean (e.g. VGGNet) (mean along each channel = 3 numbers)
– Subtract per-channel mean and divide by per-channel std (e.g. ResNet)
(mean and std along each channel = 3 numbers each)
Not common to do PCA or whitening
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
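A minimal sketch of the per-channel centering and normalization listed above for a CIFAR-10-style batch of [32,32,3] images; the batch size and variable names are illustrative.

import numpy as np

images = np.random.randint(0, 256, size=(5000, 32, 32, 3)).astype(np.float32)  # CIFAR-10-like images

channel_mean = images.mean(axis=(0, 1, 2))   # 3 numbers, one per channel (VGGNet-style)
channel_std = images.std(axis=(0, 1, 2))     # 3 numbers, one per channel (ResNet-style)

images_centered = images - channel_mean              # subtract per-channel mean
images_normalized = images_centered / channel_std    # also divide by per-channel std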

Data Preprocessing
Weight initialization
Batch normalization
Which Loss and Activation Functions should I use? End-to-end learning

Weight Initialization
– First idea: Small random numbers
(Gaussian with zero mean and 1e-2 standard deviation)
Works ~okay for small networks, but problems with deeper networks.
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: Activation statistics
Forward pass for a 6-layer net with hidden size 4096
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: Activation statistics
Forward pass for a 6-layer net with hidden size 4096
All activations tend to zero for deeper network layers
Q: What do the gradients dL/dW look like?
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: Activation statistics
Forward pass for a 6-layer net with hidden size 4096
All activations tend to zero for deeper network layers
Q: What do the gradients dL/dW look like?
A: All zero, no learning =(
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
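A small numpy sketch (not from the slides) reproducing the forward-pass experiment described above: a 6-layer tanh network with hidden size 4096, weights drawn with std 0.01. The printed per-layer activation std collapses towards zero, which is why the gradients dL/dW vanish.

import numpy as np

dims = [4096] * 7                           # input layer + 6 hidden layers, width 4096
x = np.random.randn(16, dims[0])            # a small batch of inputs
for Din, Dout in zip(dims[:-1], dims[1:]):
    W = 0.01 * np.random.randn(Din, Dout)   # "small random numbers" initialization
    x = np.tanh(x.dot(W))
    print('activation std: %f' % x.std())   # shrinks towards zero layer by layer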

Weight Initialization: Activation statistics
Increase std of initial weights from 0.01 to 0.05
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: Activation statistics
Increase std of initial weights from 0.01 to 0.05
All activations saturate
Q: What do the gradients look like?
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: Activation statistics
Increase std of initial weights from 0.01 to 0.05
All activations saturate
Q: What do the gradients look like?
A: Local gradients all zero, no learning =(
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!
Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!
For conv layers, Din is kernel_size² * input_channels
Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: “Xavier” Initialization
“Xavier” initialization: std = 1/sqrt(Din)
“Just right”: Activations are nicely scaled for all layers!
For conv layers, Din is kernel_size² * input_channels
Derivation: for y = Wx and h = f(y),
Var(yᵢ) = Din * Var(xᵢwᵢ)                         [assume x, w are i.i.d.]
        = Din * (E[xᵢ²]E[wᵢ²] − E[xᵢ]²E[wᵢ]²)      [assume x, w independent]
        = Din * Var(xᵢ) * Var(wᵢ)                  [assume x, w are zero-mean]
so if Var(wᵢ) = 1/Din then Var(yᵢ) = Var(xᵢ)
Glorot and Bengio, “Understanding the difficulty of training deep feedforward neural networks”, AISTATS 2010
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
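The same experiment with “Xavier” initialization, as a hedged sketch: drawing W with std = 1/sqrt(Din) keeps the activation scale roughly constant across layers, as the variance derivation above predicts.

import numpy as np

dims = [4096] * 7
x = np.random.randn(16, dims[0])
for Din, Dout in zip(dims[:-1], dims[1:]):
    W = np.random.randn(Din, Dout) / np.sqrt(Din)   # Xavier: Var(w) = 1/Din
    x = np.tanh(x.dot(W))
    print('activation std: %f' % x.std())           # stays roughly constant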

Weight Initialization: What about ReLU?
Change from tanh to ReLU
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: What about ReLU?
Change from tanh to ReLU
Xavier assumes a zero-centered activation function
Activations collapse to zero again, no learning =(
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Weight Initialization: Kaiming / MSRA Initialization
ReLU correction: std = sqrt(2 / Din)
“Just right”: Activations are nicely scaled for all layers!
He et al, “Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification”, ICCV 2015
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
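A corresponding sketch (again, my own illustration) for ReLU networks with the Kaiming/MSRA correction std = sqrt(2/Din); the factor 2 compensates for ReLU zeroing half of the units.

import numpy as np

dims = [4096] * 7
x = np.random.randn(16, dims[0])
for Din, Dout in zip(dims[:-1], dims[1:]):
    W = np.random.randn(Din, Dout) * np.sqrt(2.0 / Din)  # Kaiming / MSRA initialization
    x = np.maximum(0, x.dot(W))                          # ReLU
    print('activation std: %f' % x.std())                # nicely scaled for all layers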

Proper initialization is an active area of research…
Understanding the difficulty of training deep feedforward neural networks
by Glorot and Bengio, 2010
Exact solutions to the nonlinear dynamics of learning in deep linear neural networks by Saxe et al, 2013
Random walk initialization for training very deep feedforward networks by Sussillo and Abbott, 2014
Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification by He et al., 2015
Data-dependent Initializations of Convolutional Neural Networks by Krähenbühl et al., 2015
All you need is a good init by Mishkin and Matas, 2015
Fixup Initialization: Residual Learning Without Normalization, Zhang et al, 2019
The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks, Frankle and Carbin, 2019
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Data Preprocessing
Weight initialization
Batch normalization
Which Loss and Activation Functions should I use?
End-to-end learning

Batch Normalization
“you want zero-mean unit-variance activations? just make them so.”
consider a batch of activations at some layer. To make each dimension zero-mean unit-variance, apply (per dimension):
x̂ = (x − E[x]) / √(Var[x])
this is a vanilla differentiable function…
[Ioffe and Szegedy, 2015]
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Batch Normalization
[Ioffe and Szegedy, 2015]
Input: x, shape is N x D
Per-channel mean, shape is D
Per-channel var, shape is D
Normalized x, shape is N x D
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Batch Normalization
[Ioffe and Szegedy, 2015]
Input: x, shape is N x D
Per-channel mean, shape is D
Per-channel var, shape is D
Normalized x, shape is N x D
Problem: What if zero-mean, unit variance is too hard of a constraint?
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Batch Normalization
Input: x, shape is N x D
Learnable scale and shift parameters: γ, β (each of shape D)
Learning γ = (per-channel std), β = (per-channel mean) will recover the identity function!
[Ioffe and Szegedy, 2015]
Per-channel mean, shape is D
Per-channel var, shape is D
Normalized x, Shape is N x D
Output, shape is N x D
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
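A minimal numpy sketch of the batch-norm training-time forward pass for a fully-connected layer, with the learnable scale gamma and shift beta described above; the function and variable names are illustrative, not from the slides.

import numpy as np

def batchnorm_train(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                      # per-channel mean, shape D
    var = x.var(axis=0)                      # per-channel variance, shape D
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero-mean, unit-variance, shape N x D
    return gamma * x_hat + beta              # output, shape N x D

x = 5.0 * np.random.randn(32, 100) + 2.0     # a minibatch of activations (N=32, D=100)
y = batchnorm_train(x, gamma=np.ones(100), beta=np.zeros(100))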

Batch Normalization: Test Time
Input: x, shape is N x D
Learnable scale and shift parameters: γ, β (each of shape D)
Learning γ = (per-channel std), β = (per-channel mean) will recover the identity function!
Estimates depend on minibatch; can’t do this at test-time!
Per-channel mean, shape is D
Per-channel var, shape is D
Normalized x, Shape is N x D
Output, shape is N x D
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Batch Normalization: Test-Time
Input:
Learnable scale and shift parameters:
During testing batchnorm becomes a linear operator! Can be fused with the previous fully-connected or conv layer
Per-channel mean, shape is D
Per-channel var, shape is D
(Running) average of values seen during training
(Running) average of values seen during training
Normalized x, Shape is N x D
Output, shape is N x D
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
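A hedged sketch of the test-time behaviour: the minibatch statistics are replaced by running averages collected during training, so batch norm reduces to a fixed per-channel linear map that can be fused into the preceding FC/conv layer. Names are illustrative.

import numpy as np

def batchnorm_test(x, gamma, beta, running_mean, running_var, eps=1e-5):
    x_hat = (x - running_mean) / np.sqrt(running_var + eps)
    return gamma * x_hat + beta

# Equivalent linear form y = scale * x + shift, foldable into the previous layer:
#   scale = gamma / sqrt(running_var + eps)
#   shift = beta - running_mean * scale

D = 100
x = np.random.randn(32, D)
y = batchnorm_test(x, np.ones(D), np.zeros(D), running_mean=np.zeros(D), running_var=np.ones(D))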

Batch Normalization
[Ioffe and Szegedy, 2015]
Usually inserted after Fully Connected or Convolutional layers, and before nonlinearity.
… → FC → BN → tanh → FC → BN → tanh → …

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Batch Normalization
[Ioffe and Szegedy, 2015]
… → FC → BN → tanh → FC → BN → tanh → …

– Makes deep networks much easier to train!
– Improves gradient flow
– Allows higher learning rates, faster convergence
– Networks become more robust to initialization
– Acts as regularization during training
– Zero overhead at test-time: can be fused with conv!
– Behaves differently during training and testing: this is a very common source of bugs!
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Summary
We looked in detail at:
– Activation Functions (use ReLU)
– Data Preprocessing (images: subtract mean)
– Weight Initialization (use Xavier/He init)
– Batch Normalization (use)
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Data Preprocessing
Weight initialization
Batch normalization
Which Loss and Activation Functions should I use?
End-to-end learning

Regression: Predicting a numerical value
E.g. predicting the price of a product
Final Activation Function: Linear, or Tanh (especially if the outputs are somehow constrained to lie in [−1,1])
Loss Function: Mean Squared Error (MSE)

Categorical: Predicting a binary outcome
E.g. predicting a transaction is fraud or not
Final Activation Function: Sigmoid
Loss Function: Binary Cross-Entropy

Categorical: Predicting a single label from multiple classes
E.g. predicting the document’s subject
Final Activation Function: Softmax
Loss Function: Categorical Cross-Entropy

Categorical: Predicting multiple labels from multiple classes
E.g. predicting the presence of animals in an image
Final Activation Function: Sigmoid
Loss Function: Binary Cross-Entropy (applied per label)

Summary
– Regression → Linear or Tanh, with MSE loss
– Binary classification → Sigmoid, with binary cross-entropy loss
– Single-label multi-class → Softmax, with categorical cross-entropy loss
– Multi-label multi-class → Sigmoid, with binary cross-entropy loss
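A hedged tf.keras sketch of these activation/loss pairings; the layer sizes, input shape, and helper name are placeholders, not from the slides.

import tensorflow as tf

def head(num_outputs, final_activation):
    # A tiny fully-connected model ending in the given final activation.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation='relu', input_shape=(64,)),
        tf.keras.layers.Dense(num_outputs, activation=final_activation),
    ])

regression = head(1, 'linear')
regression.compile(optimizer='adam', loss='mse')                       # predicting a numerical value

binary = head(1, 'sigmoid')
binary.compile(optimizer='adam', loss='binary_crossentropy')           # binary outcome

multiclass = head(10, 'softmax')
multiclass.compile(optimizer='adam', loss='categorical_crossentropy')  # single label from many classes

multilabel = head(5, 'sigmoid')
multilabel.compile(optimizer='adam', loss='binary_crossentropy')       # multiple labels per example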

End-to-end Deep Learning

What is the end-to-end learning?
“There has been some data processing systems or learning systems that require multiple stages of processing and what end-to-end deep learning does is it can take all those multiple stages and replace it usually with just a single neural network.” – Andrew Ng
X → Y
Traditional pipeline: Audio → Features → Phonemes → Words → Transcript
End-to-end: Audio → Transcript

Slide credit: Andrew Ng

Pros and cons of end-to-end deep learning
• Pros:
• Let the data speak
• Less hand-designing
• Cons:
• May need large amount of data
• Excludes potentially useful hand-designed components

Slide credit: Andrew Ng

Example: Face Recognition
[Figure: two pipelines mapping a camera image to a person ID, a multi-stage pipeline vs. an end-to-end network]

Good choice? Bad Choice?
• Good choice:
• When you have a well-labelled LARGE dataset.
• Hand-designed components are not working well enough.
• It can work really well and really simplify the system, without requiring you to build so many hand-designed individual components.
• Bad choice:
• You might need a lot of data before it works well. With a smaller dataset the traditional pipeline approach often works even better; the end-to-end approach really shines only once you have a large dataset.
https://www.youtube.com/watch?v=ImUoubi_t7s&t=185s Slide credit: Andrew Ng

Today
• Training Neural Network (continued)
– Data preprocessing
– Weight initialization
– Batch normalization
– Which Loss and Activation Functions should I use?
– End-to-end learning
• Deep Learning Methods
– Generative method (GAN)
• Deep Learning Framework and Practical Tips
– TensorFlow vs PyTorch
– Representative network architectures
– GPUs
– Editors
– Online courses & books & tutorials
• Current Research Keywords
Hyung Jin Chang
02/05/2021

Deep Learning Methods?
Too Many!
Generative Methods
RNN / LSTM
Reinforcement Learning
Vision and Language
Applications?
New Architectures?

Most Effective & Most Popular & Most Interesting !
Generative Methods
Generative Adversarial Network (GAN)

GAN?!
• The first GAN paper was published in 2014 by Ian Goodfellow at NeurIPS.
• A practical & successful architecture was proposed in DCGAN (ICLR 2016).

Unsupervised Representation Learning with Deep Convolutional Generative
Adversarial Networks (DCGAN)
Alec Radford, Luke Metz, Soumith Chintala
Many slide materials are from Taehoon Kim’s talk https://github.com/carpedm20/DCGAN-tensorflow

Deep Convolutional Generative Adversarial Networks

Deep Convolutional Generative Adversarial Networks

Deep Learning
Deep Convolutional Generative Adversarial Networks

Deep Convolutional = ConvNet
[Figure: Image → CNN → “Dog or Cat”]
Normally, this is all for CNN classification
Generative Adversarial Networks

Generative Model
Deep Convolutional Generative Adversarial Networks
z ~ P(z) → Generator → Image, i.e. the model represents P(y|z)

Adversarial Learning
Deep Convolutional Generative Adversarial Networks

Neural Network
Deep Convolutional Generative Adversarial Networks

DCGAN ∉ Supervised Learning
(models to predict tomorrow’s stock prices, predict ratings, and identify photos)

DCGAN ∈ Unsupervised Learning
(clustering, anomaly detection, generative models)

DCGAN ∈ Generative Model
“What I cannot create, I do not understand.”
– Richard Feynman

Deep Convolutional Generative Adversarial Network

A model to classify dogs and cats

It’s nothing more than just learning about features

Generative Model
for generating dogs and cats pictures?

Generative Model understands dogs and cats better than
classification models.

If the model understands dogs and cats well, then classifying them is an easy problem

This is why the Generative Model is important!

f(z) : Learning a function

Example latent vectors: z = [0.4, −0.1, 0.2], z = [0.4, 0.9, −0.1], z = [0.2, −0.7, 0.1]
f(z) : Function


f(z) : Function that generates faces

f(z) : Function that generates face images
[Figure: a face photo shown as an RGB grid of pixel values]

f(z) = P(y) : a probabilistic distribution that generates face images
[Figure: the RGB pixel grid, with each value in the range 0–255]

[Figure: face images from the CelebA dataset, each an RGB grid of pixel values]

P(y) : the probabilistic distribution of pixels
[Figure: pixel values (0–255) from the face images and the distribution P(y) over them]


P(y|z) : the distribution over pixel values conditioned on a latent code z, where P(z) is a Gaussian distribution
[Figure: sampling z ~ P(z) and the corresponding conditional distribution over pixel values (0–255)]


DCGAN models the conditional distribution P(y|z) over pixel values (0–255)

Deep Convolutional Generative Adversarial Network

DCGAN

DCGAN

DCGAN Discriminator
Convolution layers

DCGAN
Generator
Deconvolution layers

DCGAN Discriminator
Police
Trying to detect fake money

DCGAN
Generator
Counterfeiter
Trying to make realistic fake money

DCGAN Adversarial Learning
Police Counterfeiter Trying to detect fake money Trying to make realistic fake money

Adversarial Learning
Police Counterfeiter

Adversarial Learning
Police Counterfeiter Train iteratively one-by-one

1. Training on Police →O?X?
→X?O?
Police
The police learns the way of classifying real and fake images

1. Training on Police O
X
Police
The police learns the way of classifying real and fake images

2. Training on Counterfeiter
Police
Counterfeiter
→O!
The counterfeiter learns the way of generating realistic fake images which can fool the police

Police
→O?X? →X?O?
→O!
Counterfeiter

Police is gradually distinguishing between real and fake →O!
Police
Counterfeiter
→X!
→O!

Police
→O! →X!
→O!
Counterfeiter

→O! → X?!
Police
Counterfeiter is gradually generating realistic fakes
→O!
Counterfeiter

Police
→O! → X?!
→O!
Counterfeiter

Police
→O! → O?!
→O!
Counterfeiter

Police
→O! →O?
→O!
Counterfeiter

Police
→O! →O!
→O!
Counterfeiter

Police
→O! →O!
→O!
Counterfeiter

Police
→O! →O!
→O!
Counterfeiter

Maximise fake detection
Police
→O! →O!
→O!
Counterfeiter
Minimise fake detection
Nash Equilibrium

Discriminator = Police

[Figure: real images (O) and generated fakes (X) are fed to the discriminator]

Discriminator = Police
The discriminator outputs Real (1) or Fake (0): real images (O) → 1, fake images (X) → 0

Generator = Counterfeiter

Input: random numbers with zero mean, sampled from a Gaussian distribution (e.g. 0.6, −0.3, −0.1)
The generator maps this latent vector to a fake image (X), which the discriminator scores between 0 (fake) and 1 (real)
Train the generator so that the discriminator classifies its fakes (X) as real (O)

Deep Convolutional Generative Adversarial Network

Key insights of the DCGAN Architecture
No pooling
No fully connected layers
Batch normalisation
ReLU and Leaky ReLU
Adam optimizer

Image
Discriminator

Conv 1
Discriminator

Conv 1
Discriminator
Example of a 2-strided convolution: the 4×4 input [[1,2,3,4],[5,6,7,8],[3,2,1,0],[1,2,3,4]] convolved with the 2×2 kernel [[1,0],[0,1]] at stride 2 gives the 2×2 output [[7,11],[5,5]]

Conv 1
Pooling?
Discriminator
For downsampling, the same 4×4 input could instead go through 2×2 max pooling, giving the 2×2 output [[6,8],[3,4]] (compare the stride-2 convolution output [[7,11],[5,5]])

Conv 1
No pooling
Discriminator
DCGAN uses no pooling: downsampling is done by strided convolutions (4×4 kernels with stride 2), so the network adaptively learns the optimal downsampling kernel instead of using a fixed pooling rule
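A small numpy check of the 4×4 example above, comparing the stride-2 convolution with 2×2 max pooling; both downsample to 2×2, but the convolution kernel is learned while the pooling rule is fixed.

import numpy as np

x = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [3, 2, 1, 0],
              [1, 2, 3, 4]])
k = np.array([[1, 0],
              [0, 1]])

conv = np.array([[np.sum(x[i:i+2, j:j+2] * k) for j in (0, 2)] for i in (0, 2)])
pool = np.array([[np.max(x[i:i+2, j:j+2]) for j in (0, 2)] for i in (0, 2)])

print(conv)   # [[ 7 11] [ 5  5]]  -- stride-2 convolution with a (learnable) kernel
print(pool)   # [[ 6  8] [ 3  4]]  -- fixed 2x2 max pooling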

4×4 2 strided convolution
Conv 1
Discriminator

Discriminator
4×4 2 strided convolution
Conv 2

Discriminator
4×4 2 strided convolution
Conv 3

Discriminator
4×4 2 strided convolution
Conv 4
Reshape

Discriminator

Discriminator
Batch Normalization
Normalise the previous layer’s outputs to zero mean and unit variance
http://sanghyukchun.github.io/88/

Discriminator
Leaky ReLU

Generator
Image

Matrix multiplication (Fully connected)
FC
Generator

Matrix multiplication (Fully connected)
FC Deconv 1
Generator

Matrix multiplication (Fully connected)
FC Deconv 1
Generator
3×3 -> 5×5

4×4 2-strided deconvolution
Matrix multiplication (Fully connected)
FC Deconv 1
Generator

Generator
4×4 2 strided deconvolution
Deconv 2

Generator
4×4 2 strided deconvolution
Deconv 3

Generator
4×4 2 strided deconvolution
Deconv 4

Generator

Generator
Batch Normalization

ReLU
tanh
Generator
ReLU
tanh

Training Results

MNIST
(Real)

MNIST
(Fake)

MNIST
(Fake)

MNIST
(Fake)

CelebA
(Real)

CelebA
(Fake)

CelebA
(Fake)

Korean face dataset
(Real)

Korean face dataset
(Real)

MIC Glasses B/W
Korean face dataset
(Real)

Korean face dataset
(Fake)

Korean face dataset
(Fake)

Generated bedrooms (LSUN dataset)

DCGAN learns features that are interesting
Random filters Trained filters

Top row: un-modified samples from model
Bottom row: the same samples generated with dropping out “window” filters

Vector arithmetic for visual concepts

A ‘turn’ vector was created from four averaged samples of faces looking left vs looking right.
By adding interpolations along this axis to random samples we were able to reliably transform their pose.

https://github.com/carpedm20/DCGAN-tensorflow http://carpedm20.github.io/faces/

import tensorflow as tf

Inputs

images = tf.placeholder(tf.float32, [32, 84, 84, 3])
O
Inputs

images = tf.placeholder(tf.float32, [32, 84, 84, 3])
z = tf.placeholder(tf.float32, [32, 100], name='z')
O
Inputs

Discriminator

h0 = conv2d(images, 64, 4, 4, 2, 2)
h0
Discriminator

Discriminator
h1_logits = batch_norm(conv2d(h0, 64 * 2, 4, 4, 2, 2))
h1 = lrelu(h1_logits)
h1

Discriminator
h2_logits = batch_norm(conv2d(h1, 64 * 4, 4, 4, 2, 2))
h2 = lrelu(h2_logits)
h3_logits = batch_norm(conv2d(h2, 64 * 8, 4, 4, 2, 2))
h3 = lrelu(h3_logits)
h2 h3

h4 = linear(tf.reshape(h3, [32, -1]))
Discriminator
h4

Generator

z_ = linear(z, 64*8*4*4)
h0_logits = tf.reshape(z_, [-1, 4, 4, 64 * 8])
z z_
Generator

z_ = linear(z, 64*8*4*4)
h0_logits = tf.reshape(z_, [-1, 4, 4, 64 * 8])
h0 = tf.nn.relu(batch_norm(h0_logits))
z z_ h0
Generator

Generator
h1_logits = deconv2d(h0, [32, 8, 8, 64*4])
h1 = tf.nn.relu(batch_norm(h1_logits))
h2_logits = deconv2d(h1, [32, 8, 8, 64*4])
h2 = tf.nn.relu(batch_norm(h2_logits))
h1
h2

h3 = tf.nn.tanh(deconv2d(h2, [32, 8, 8, 64*4]))
Generator

Adversarial Learning

G = generator(z)
Adversarial Learning
X
G

Adversarial Learning
G = generator(z)
D = discriminator(images)
O
D
X
G

Adversarial Learning
G = generator(z)
D = discriminator(images)
D_ = discriminator(G, reuse=True)
O
X
D
D_
X
G

Adversarial Learning
d_loss_real = tf.reduce_mean(
tf.nn.sigmoid_cross_entropy_with_logits(D, tf.ones_like(D)))
O
X
D
O
d_loss_real
XG

Adversarial Learning
d_loss_real = tf.reduce_mean(
tf.nn.sigmoid_cross_entropy_with_logits(D, tf.ones_like(D)))
d_loss_fake= tf.reduce_mean(
tf.nn.sigmoid_cross_entropy_with_logits(D_, tf.zeros_like(D_)))
O
XX
d_loss_fake
D_
X
G

Adversarial Learning
g_loss = tf.reduce_mean(
tf.nn.sigmoid_cross_entropy_with_logits(D_, tf.ones_like(D_)))
X
D_
O
X
g_loss

Adversarial Learning
d_loss = d_loss_real + d_loss_fake

Adversarial Learning
d_loss = d_loss_real + d_loss_fake
d_optim = tf.train.AdamOptimizer(0.0002).minimize(d_loss)
g_optim = tf.train.AdamOptimizer(0.0002).minimize(g_loss)

Adversarial Learning
d_loss = d_loss_real + d_loss_fake
d_optim = tf.train.AdamOptimizer(0.0002).minimize(d_loss)
g_optim = tf.train.AdamOptimizer(0.0002).minimize(g_loss)
while True:
sess.run(d_optim, { images: batch_images, z: batch_z })
sess.run(g_optim, { z: batch_z })

Today
• Training Neural Network (continued)
– Data preprocessing
– Weight initialization
– Batch normalization
– Which Loss and Activation Functions should I use?
– End-to-end learning
• Deep Learning Methods
– Generative method (GAN)
• Deep Learning Framework and Practical Tips
– TensorFlow vs PyTorch
– Representative network architectures
– GPUs
– Editors
– Online courses & books & tutorials
• Current Research Keywords
Hyung Jin Chang

Frequently Asked Questions from Students
✓ Which deep learning framework do I have to use?
✓ Which editors to choose?
✓ Which architecture should I use?
✓ Do I have to use a GPU?
✓ Any useful materials?

Frequently Asked Questions from Students
✓ Which deep learning framework do I have to use?
✓ Which editors to choose?
✓ Which architecture should I use?
✓ Do I have to use a GPU?
✓ Any useful materials?

Deep Learning Frameworks (2019)
(UC Berkeley)
(NYU / Facebook)
(U Montreal)
(Baidu)
(Facebook)
(Facebook)
(Amazon)
(Google)
(Microsoft)
(Google)
+ MatConvNet etc…

TensorFlow Versions
Pre-2.0 (1.13 latest)
– Default static graph
– Optionally dynamic graph (eager mode)
2.0 (March 2020)
– Default dynamic graph
– Optionally static graph
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Static vs Dynamic Graphs
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Static vs Dynamic Graphs
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Keras: High-Level Wrapper
• Keras is a layer on top of TensorFlow that makes common things easy to do
• Used to be third-party, now officially merged into TensorFlow
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung
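A minimal, hedged tf.keras example of that “common things easy” workflow (define a model, compile, fit); the dataset, layer sizes, and hyperparameters are placeholders, not from the slides.

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation='relu', input_shape=(32,)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

x = np.random.randn(256, 32).astype('float32')   # placeholder features
y = np.random.randint(0, 10, size=(256,))        # placeholder integer labels
model.fit(x, y, batch_size=32, epochs=2)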

TensorFlow: High-Level Wrappers
– Keras (https://keras.io/)
– tf.keras (https://www.tensorflow.org/api_docs/python/tf/keras)
– tf.estimator (https://www.tensorflow.org/api_docs/python/tf/estimator)
– Sonnet (https://github.com/deepmind/sonnet)
– TFLearn (http://tflearn.org/)
– TensorLayer (http://tensorlayer.readthedocs.io/en/latest/)
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

TensorFlow: Tensorboard
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

TensorFlow: Tensor Processing Units
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

TensorFlow: Tensor Processing Units
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

vs

1.4
(Newly added)
Slide credit: Lex Fridman

Suggestions
✓ For beginners – Keras
✓ For Google Cloud Platform – TensorFlow
✓ For research – PyTorch
✓ For Amazon Web Services (AWS) – MXNet
✓ For MS Azure – CNTK
✓ For Java developers – DL4J
✓ For iOS developers – Core ML
✓ For Interoperability – ONNX

Frequently Asked Questions from Students
✓ Which deep learning framework do I have to use?
✓ Which editors to choose?
✓ Which architecture should I use?
✓ Do I have to use a GPU?
✓ Any useful materials?

Jupyter Notebook
• Jupyter Notebook
(formerly IPython Notebooks) is a web-based interactive computational environment for creating Jupyter notebook documents.
• The Jupyter Notebook has become a popular user interface for cloud computing
• E.g. Amazon’s SageMaker Notebooks, Google’s Colaboratory, Microsoft’s Azure Notebooks, and Jupyo.

Jupyter Notebook
✓ Jupyter Notebooks really shine when you are still in the prototyping phase.
✓ The code is written in independent cells, which are executed individually. This allows the user to test a specific block of code in a project without having to execute the code from the start of the script.
✓ Incredibly flexible and interactive, with many powerful extensions.
✓ Allows you to run other languages besides Python, like R, SQL, etc.

Comprehensive Beginner’s Guide to Jupyter Notebooks for Data Science & Machine Learning

Python IDE
• PyCharm
• PyCharm is one of the most widely used IDEs for the Python programming language.
• PyCharm integrates with tools and libraries such as NumPy and Matplotlib, allowing you to work with array viewers and interactive plots.
• It offers features such as a code editor, error highlighting, and a powerful debugger with a graphical interface, as well as Git, SVN, and Mercurial integration.

Frequently Asked Questions from Students
✓ Which deep learning framework do I have to use?
✓ Which editors to choose?
✓ Which architecture should I use?
✓ Do I have to use a GPU?
✓ Any useful materials?

Review: Convolutional Neural Networks (CNN)
Slide credit: Taeoh Kim

Plain Networks
Slide credit: Taeoh Kim

ResNet
Slide credit: Taeoh Kim

Slide credit: Taeoh Kim

ResNet Variants
Slide credit: Taeoh Kim

ResNet Variants
Slide credit: Taeoh Kim

ResNet with Cardinality
Slide credit: Taeoh Kim

DenseNet
Slide credit: Taeoh Kim

Slide credit: Taeoh Kim

Inception / NASNet
Slide credit: Taeoh Kim

CNN Review
Slide credit: Taeoh Kim

CNN Performances
https://github.com/CeLuigi/models-comparison.pytorch
Slide credit: Taeoh Kim

Frequently Asked Questions from Students
✓ Which deep learning framework do I have to use?
✓ Which editors to choose?
✓ Which architecture should I use?
✓ Do I have to use a GPU?
✓ Any useful materials?

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

CPU vs GPU in practice
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

CPU vs GPU
Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

Slide credit: Fei-Fei Li & Justin Johnson & Serena Yeung

GPU Resources for MSc @ UoB CS
• 74 new GPU nodes in LG04, each with an NVIDIA RTX 2060

Google Colab (https://colab.research.google.com/)
• Google provides a free cloud service based on Jupyter Notebooks that supports free GPU (Tesla K80) and TPU!
• You can develop deep learning applications using popular libraries such as PyTorch, TensorFlow, Keras, and OpenCV.
• It supports Python 2.7 and 3.6, but not R or Scala yet.
• There is a limit to your session time: the session is renewed every 12 hours.

Google Colab (https://colab.research.google.com/)
• You can
✓ create/upload/store/share notebooks,
✓ mount your Google Drive and use whatever you’ve got stored in there,
✓ upload notebooks directly from GitHub,
✓ upload Kaggle files,

Google Colab (https://colab.research.google.com/)

Google Colab (https://colab.research.google.com/)

Google Colab (https://colab.research.google.com/)
import tensorflow as tf
import timeit

# See https://www.tensorflow.org/tutorials/using_gpu#allowing_gpu_memory_growth
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.device('/cpu:0'):
    random_image_cpu = tf.random_normal((100, 100, 100, 3))
    net_cpu = tf.layers.conv2d(random_image_cpu, 32, 7)
    net_cpu = tf.reduce_sum(net_cpu)

with tf.device('/gpu:0'):
    random_image_gpu = tf.random_normal((100, 100, 100, 3))
    net_gpu = tf.layers.conv2d(random_image_gpu, 32, 7)
    net_gpu = tf.reduce_sum(net_gpu)

sess = tf.Session(config=config)

# Test execution once to detect errors early.
try:
    sess.run(tf.global_variables_initializer())
except tf.errors.InvalidArgumentError:
    print(
        '\n\nThis error most likely means that this notebook is not '
        'configured to use a GPU. Change this in Notebook Settings via the '
        'command palette (cmd/ctrl-shift-P) or the Edit menu.\n\n')
    raise

def cpu(): sess.run(net_cpu)
def gpu(): sess.run(net_gpu)

# Runs the op several times.
print('Time (s) to convolve 32x7x7x3 filter over random 100x100x100x3 images '
      '(batch x height x width x channel). Sum of ten runs.')
print('CPU (s):')
cpu_time = timeit.timeit('cpu()', number=10, setup="from __main__ import cpu")
print(cpu_time)
print('GPU (s):')
gpu_time = timeit.timeit('gpu()', number=10, setup="from __main__ import gpu")
print(gpu_time)
print('GPU speedup over CPU: {}x'.format(int(cpu_time/gpu_time)))
sess.close()

Frequently Asked Questions from Students
✓ Which deep learning framework do I have to use?
✓ Which editors to choose?
✓ Which architecture should I use?
✓ Can I use GPUs for the project?
✓ Any useful materials?

Free Online Courses
• Summer School
• Deep Learning & Reinforcement Learning Summer School
• (2017) https://goo.gl/4WthXN
• (2018) https://goo.gl/Z7stFe
• Machine Learning Summer School (MPI) https://goo.gl/5hQtA1
• Module
• DeepLearning.ai (Andrew Ng) https://goo.gl/mMr4AW
• CS231n (Stanford) https://goo.gl/WTLZkg
• CS224d (Stanford) https://goo.gl/nmY6Ws
• Fast.ai (Fast.ai) https://goo.gl/aBkesx
• Reinforcement Learning
• Deep RL bootcamp (UC Berkeley) https://goo.gl/i6CbtR
• UCL lectures (DeepMind) https://goo.gl/gF7EoY
• CS294 (UC Berkeley) https://goo.gl/d17a5x

Free Deep Learning Books
1. Deep Learning
by Ian Goodfellow, Yoshua Bengio and Aaron Courville
2. Deep Learning Tutorial
by LISA Lab, University of Montreal
3. Deep Learning: Methods and Applications by Li Deng and Dong Yu
4. First Contact with TensorFlow, get started with Deep Learning Programming by Jordi Torres
5. Neural Networks and Deep Learning by Michael Nielsen
6. A Brief Introduction to Neural Networks by David Kriesel
7. Neural Network Design (2nd edition)
by Martin T. Hagan, Howard B. Demuth, Mark H. Beale and Orlando De Jesús
8. Neural Networks and Learning Machines (3rd edition) by Simon Haykin

Free Machine Learning Books
• “Pattern Recognition and Machine Learning” https://goo.gl/EMbNKm
• “The Elements of Statistical Learning” https://goo.gl/Y8GqqG
• “Gaussian Processes for Machine Learning” https://goo.gl/4LU3Df
• “Dive into Deep Learning” https://goo.gl/Bk5wF5
• “Deep Learning” https://goo.gl/4kVPrm

Today
• Training Neural Network (continued)
– Data preprocessing
– Weight initialization
– Batch normalization
– Which Loss and Activation Functions should I use?
– End-to-end learning
• Deep Learning Methods
– Generative method (GAN)
• Deep Learning Framework and Practical Tips
– TensorFlow vs PyTorch
– Representative network architectures
– GPUs
– Editors
– Online courses & books & tutorials
• Current Research Topics
Hyung Jin Chang

Top Academic Venues
❑ Top-tier CV Conferences
• CVPR / ICCV / ECCV
• BMVC / WACV
❑ Top-tier Deep Learning & Machine Learning Conferences
• NeurIPS / ICLR / ICML
❑ Top-tier Robotics Conferences
• ICRA / IROS
❑ Top Journals
• IEEE TPAMI / IJCV / JMLR / IEEE TRO / IJRR / IEEE TIP

Current CV Research Trend Keywords
• Attention
• 3D
• Multi-modal & multi-task
• Active learning and online learning
• Graph neural networks
• Vision and language
• Efficient network
• Deep fake
• Ethical AI
… and many more!

Topics covered
• Introduction to computer vision & human vision
• Image formation and colour
• Camera parameters
• Camera calibration
• Image noise
• Image filtering
• Image gradient
• Edge detection
• Matching
• Binary image analysis
• Image texture
• Image segmentation
• Fitting: voting and the Hough transform
• Image warping
• Image Stitching
• Local features: image matching and detection
• Multiple-view Geometry
• Active Imaging (3D imaging technologies)
• Face recognition
• Deep learning for computer vision I
• Deep learning for computer vision II
Hyung Jin Chang

Thank you for your attention!