Demo 3: Deep Learning
David Lee
How to Recognise an Image?
https://towardsdatascience.com/the-most-intuitive-and-easiest-guide-for-convolutional-neural-network-3607be47480
Deep Learning: CNN, RNN, GAN, VAEs
• Convolution NN (pp2 of this slide):
https://www.youtube.com/watch?v=JB8T_zN7ZC0 (1 hour)
• Recurrent NN (pp47):
https://www.youtube.com/watch?v=UNmqTiOnRfg (22 mins)
• Generative Adversarial Networks (GANs) (pp90)
• https://www.youtube.com/watch?v=-Upj_VhjTBs (Watch 4 mins)
• https://www.youtube.com/watch?v=dCKbRCUyop8 (Watch 25 mins)
• Variational Autoencoders (VAEs) (pp90)
• https://www.youtube.com/watch?v=9zKuYvjFFS8 (Watch 15 mins)
CNN
https://towardsdatascience.com/a-comprehensive-guide-to-
convolutional-neural-networks-the-eli5-way-3bd2b1164a53
CNN
• The goal of this field is to enable machines to view the world as
humans do: to perceive it in a similar manner and to use that
knowledge for a multitude of tasks such as image and video
recognition, image analysis and classification, media recreation,
recommendation systems, natural language processing, etc. The
advancements in computer vision with deep learning have been
constructed and perfected over time, primarily around one particular
algorithm: the Convolutional Neural Network.
• Convolution refers to the mathematical combination of two functions to
produce a third function.
• It merges two sets of information. In the case of a CNN, the convolution is performed
on the input data with the use of a filter or kernel (these terms are used
interchangeably) to then produce a feature map.
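As an illustration (a minimal NumPy sketch with made-up values, not from the slides), "valid" cross-correlation of an input with a small kernel produces a feature map:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' cross-correlation: slide the kernel over the image and
    sum the elementwise products at each position."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.array([[1., 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]])
kernel = np.array([[1., -1],
                   [-1, 1]])       # a made-up 2x2 "checkerboard" filter
fmap = conv2d(image, kernel)       # 3x3 feature map
```

Here the checkerboard filter matches the image at every other position, so the feature map alternates between strong positive and strong negative responses.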
• A pooling layer is another building block of a CNN.
• Pooling. Its function is to progressively reduce the spatial size of the representation,
which cuts the number of parameters and the amount of computation in the network.
The pooling layer operates on each feature map independently.
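A max-pooling pass can be sketched the same way (illustrative NumPy, toy values): each output entry is simply the maximum of one window of the feature map.

```python
import numpy as np

def max_pool(fmap, size=2, stride=2):
    """Shrink the feature map: keep only the max of each window."""
    h, w = fmap.shape
    oh = (h - size) // stride + 1
    ow = (w - size) // stride + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            win = fmap[i * stride:i * stride + size,
                       j * stride:j * stride + size]
            out[i, j] = win.max()
    return out

fmap = np.array([[1., 3, 2, 4],
                 [5, 6, 7, 8],
                 [3, 2, 1, 0],
                 [1, 2, 3, 4]])
pooled = max_pool(fmap)   # [[6, 8], [3, 4]]
```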
• The ReLU (Rectified Linear Unit) Layer
• ReLU refers to the rectified linear unit, the most commonly deployed activation function
for the outputs of CNN neurons.
• Flattening converts the data into a 1-dimensional array for
input to the next layer.
• We flatten the output of the convolutional layers to create a single long
feature vector. And it is connected to the final classification model, which is
called a fully-connected layer.
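The flattening step and the fully-connected layer can be sketched as follows (illustrative NumPy with random, untrained weights and toy sizes): the pooled maps become one long vector, and each class score is a weighted vote over that vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pooled output: 3 feature maps of size 2x2 (toy sizes).
pooled = rng.standard_normal((3, 2, 2))

# Flattening: reshape everything into one long feature vector.
features = pooled.reshape(-1)        # shape (12,)

# Fully-connected layer: every feature votes for every class through
# a weight; softmax turns the votes into class probabilities.
W = rng.standard_normal((12, 2))     # 12 features -> 2 classes
b = np.zeros(2)
logits = features @ W + b
probs = np.exp(logits - logits.max())
probs = probs / probs.sum()
```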
https://towardsdatascience.com/the-most-intuitive-and-
easiest-guide-for-convolutional-neural-network-3607be47480
How convolutional neural networks work,
in depth
• Image matching, filtering, pooling (window size, stride), normalisation (ReLU layer –
change negatives to zero), deep stacking (do it again and again): shrink it!
• Fully connected layer ( Every number becomes a node for neural network, vote for it)
• https://www.youtube.com/watch?v=JB8T_zN7ZC0 (First 15 minutes)
• In deep learning, a convolutional neural network is a class of deep neural networks most
commonly applied to analysing visual imagery; it is specifically designed to process pixel
data. CNNs are also known as shift-invariant or space-invariant artificial neural networks,
based on their shared-weights architecture and translation-invariance characteristics.
Convolution Network (ConvNet)
• 1. Features: Matching Pieces of Images
• 2. Filtering: Matching with Same Dimension of Matrix
• 3. Convolution: Filter with Value Lying Between (-1,1)
– Cross Correlation
• 4. Convolution Layer: A set of Filtered Images
• 5. Pooling: Shrink it down to a Smaller Matrix (e.g.
Max Pooling – Taking the max value)
• 6. Pooling Layer: A Stack of Images Becomes a
Smaller Stack of Images
• 7. Normalisation (Rectified Linear ReLUs): Convert
Negative numbers to Zero
• 8. Stacking: Layers Get Stacked
• 9. Deep Stacking: Layering or Layers repeated many
times
• 10. Fully Connected Layer: Standard Neural Network
with all nodes connected
To handle differences in translation, scaling,
rotation and weight!
• A pixel, pel, or picture element (noun) is, in computer science, the smallest discrete
component of an image or picture on a CRT screen (usually a coloured dot).
• Here is an 8×8 pixel image.
1. Features!
2. Filtering
The match is 1!
This does not match!
• Overall score:
(sum of all)/9 = 0.55
The match is only 0.55!
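The 0.55 score can be reproduced with a toy calculation (illustrative ±1 pixel values, not the slide's exact images): the match score is the mean of the elementwise products of patch and filter, so 7 agreements and 2 disagreements out of 9 give (7 - 2)/9 ≈ 0.55.

```python
import numpy as np

feature = np.array([[ 1, -1, -1],
                    [-1,  1, -1],
                    [-1, -1,  1]])   # a 3x3 "diagonal" filter
patch = np.array([[ 1, -1, -1],
                  [-1,  1,  1],      # one flipped pixel
                  [ 1, -1,  1]])     # one flipped pixel
# Mean of elementwise products: +1 per agreement, -1 per disagreement.
score = (feature * patch).mean()     # (7 - 2) / 9 = 0.555...
```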
3. Convolution–A Filtered Version with Max 1
•3 filtered versions of the
original image!
4. Convolution Layer-A SET of Filtered Images
5. Pooling – Shrink it down a little (Max Pooling – taking the max value)
6. Pooling Layer – A Stack of Images Becomes a
Smaller Stack of Images
7. Normalisation (Rectified Linear ReLUs)
Convert Negative
Numbers to Zero
8. Stacking- Layers Get Stacked
9. Deep Stacking-Layers repeated many times
10. Fully Connected Layer-Standard Neural
Network
• Thickness of line is the weight
Thickness is the strength of the vote!
4 Pixel Image
Shades • But there are two versions of horizontal!
• The answer stays between -1 and 1!
White line = 1, black line = -1 (inverse)
• Rectified Linear
Units (ReLUs)
• Take only Positive
• All complicated
patterns are taken
care of!
• It solved the
problem of two
horizontals!
Convolution Network (ConvNet)
• 1. Features: Matching Pieces of Images
• 2. Filtering: Matching with Same Dimension of Matrix
• 3. Convolution: Filter with Value Lying Between (-1,1)
– Cross Correlation
• 4. Convolution Layer: A set of Filtered Images
• 5. Pooling: Shrink it down to a Smaller Matrix (e.g.
Max Pooling – Taking the max value)
• 6. Pooling Layer: A Stack of Images Becomes a
Smaller Stack of Images
• 7. Normalisation (Rectified Linear ReLUs): Convert
Negative numbers to Zero
• 8. Stacking: Layers Get Stacked
• 9. Deep Stacking: Layering or Layers repeated many
times
• 10. Fully Connected Layer: Standard Neural Network
with all nodes connected
• 11. Optimisation and Cross-Validation
We know how to optimise!
Gradient descent algorithms, etc.!
Comments
• Cross-validation etc. for the selection of hyperparameters
• The following work well with convolutional networks: images, sound, text…
• Local Spatial Data: Good
• Spatial data are of two types according to the storing technique
• Raster data are composed of grid cells identified by row and column. The whole
geographic area is divided into groups of individual cells, which represent an image.
Satellite images, photographs, scanned images, etc., are examples of raster data.
• Vector data are composed of points, polylines, and polygons. Wells, houses, etc., are
represented by points. Roads, rivers, streams, etc., are represented by polylines.
Villages and towns are represented by polygons.
• Not useful: if the data are just as useful after swapping any of the columns
with each other!
Types of Spatial Data
• Point Data
• Points in a multidimensional space
• Raster data such as satellite imagery, where each pixel stores a
measured value
• Feature vectors extracted from text
• Region Data
• Objects with location and boundary
• geometric approximations constructed using line segments,
polygons, etc., called vector data.
https://www.slideshare.net/Rkrahulkr17/pspatial-data
ConvNet • Useful for sound, text and images!
Not Useful for ConvNet!
Recurrent Neural Networks
• https://www.youtube.com/watch?v=UNmqTiOnRfg
• A recurrent neural network is a class of artificial neural networks
where connections between nodes form a directed graph along a
temporal sequence.
• This allows it to exhibit temporal dynamic behavior. Unlike
feedforward neural networks, RNNs can use their internal state to
process variable length sequences of inputs.
• A recurrent neural network (RNN) is a type of artificial neural
network commonly used in speech recognition and natural
language processing (NLP).
• RNNs are designed to recognize a data’s sequential characteristics
and use patterns to predict the next likely scenario.
• RNN, commonly known as a Recurrent Neural Network, is a very
popular deep learning model used to carry out a number of deep
learning tasks such as time-series prediction, image captioning,
Google's auto-complete feature, etc. As the name suggests, an RNN
uses recurrence to build models.
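The recurrence can be sketched as a single update rule (illustrative NumPy with random, untrained weights and toy sizes): the same weights are applied at every time step, and the hidden state carries information forward through the sequence.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_input = 4, 3
W_h = 0.1 * rng.standard_normal((n_hidden, n_hidden))  # state-to-state
W_x = 0.1 * rng.standard_normal((n_hidden, n_input))   # input-to-state
b = np.zeros(n_hidden)

def rnn_step(h, x):
    """One recurrent update: the new state depends on both the
    previous state and the current input."""
    return np.tanh(W_h @ h + W_x @ x + b)

h = np.zeros(n_hidden)
sequence = rng.standard_normal((5, n_input))  # a length-5 input sequence
for x in sequence:                            # same weights at every step
    h = rnn_step(h, x)
```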
Applications: GANs and VAEs
• https://towardsdatascience.com/deep-latent-variable-models-unravel-hidden-structures-
a5df0fd32ae2
• What are Generative Adversarial Networks (GANs) and how do they work?
• https://www.youtube.com/watch?v=-Upj_VhjTBs (Watch 4mins)
• https://www.youtube.com/watch?v=dCKbRCUyop8
• Variational Autoencoders
• https://www.youtube.com/watch?v=9zKuYvjFFS8 (Watch 15mins)
• https://www.youtube.com/watch?v=fcvYpzHmhvA
GAN
• A generative adversarial network (GAN) is a class of machine learning systems invented by
Ian Goodfellow and his colleagues in 2014.
• Two neural networks contest with each other in a game (in the sense of game theory, often
but not always in the form of a zero-sum game).
• Given a training set, this technique learns to generate new data with the same statistics as
the training set.
• For example, a GAN trained on photographs can generate new photographs that look at least
superficially authentic to human observers, having many realistic characteristics.
• Though originally proposed as a form of generative model for unsupervised learning, GANs
have also proven useful for semi-supervised learning, fully supervised learning, and
reinforcement learning. In a 2016 seminar, Yann LeCun described GANs as “the coolest idea
in machine learning in the last twenty years”
Train D so that D(x) is as close as possible to 1 on real images!
Train G to minimise the difference between the fakes G(z) and the
real data!
D outputs a value between 0 and 1: 1 means real, 0 means fake!
Unsupervised, because the only labels we know are whether the data
came from the real dataset or from the generator.
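The objective described above can be evaluated numerically (a toy sketch with made-up discriminator outputs, not a trained GAN):

```python
import numpy as np

def d_loss(d_real, d_fake):
    """Discriminator objective (as a loss to minimise): it wants
    D(x) near 1 on real data and D(G(z)) near 0 on fakes."""
    return -(np.log(d_real) + np.log(1.0 - d_fake)).mean()

def g_loss(d_fake):
    """Generator objective: it wants D(G(z)) near 1, i.e. it
    minimises log(1 - D(G(z)))."""
    return np.log(1.0 - d_fake).mean()

d_real = np.array([0.9, 0.8])   # made-up D outputs on real images
d_fake = np.array([0.1, 0.2])   # made-up D outputs on fakes

sharp = d_loss(d_real, d_fake)                       # D separates well
confused = d_loss(np.full(2, 0.5), np.full(2, 0.5))  # D is guessing
```

The discriminator's loss is smaller when it separates real from fake, and the generator's loss falls as D(G(z)) rises toward 1, i.e. as the fakes fool the discriminator.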
Applications: GANs and VAEs
• https://towardsdatascience.com/deep-latent-variable-models-unravel-hidden-structures-
a5df0fd32ae2
• What are Generative Adversarial Networks (GANs) and how do they work?
• https://www.youtube.com/watch?v=-Upj_VhjTBs (Watch 4 mins)
• https://www.youtube.com/watch?v=dCKbRCUyop8 (Watch 25 mins)
• Variational Autoencoders
• https://www.youtube.com/watch?v=9zKuYvjFFS8 (Watch 15 mins)
• https://www.youtube.com/watch?v=fcvYpzHmhvA
Variational Autoencoders
• Variational autoencoders (VAEs) are generative models, like Generative
Adversarial Networks.
• Their association with this group of models derives mainly from the architectural
affinity with the basic autoencoder (the final training objective has an encoder
and a decoder), but their mathematical formulation differs significantly.
• VAEs are directed probabilistic graphical models (DPGM) whose posterior is
approximated by a neural network, forming an autoencoder-like architecture.
• Differently from discriminative modeling that aims to learn a predictor given the
observation, generative modeling tries to simulate how the data are generated,
in order to understand the underlying causal relations.
• Causal relations have indeed the great potential of being generalizable.
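A minimal sketch of how a VAE samples its latent code (illustrative NumPy; mu and log_var are made-up encoder outputs): the reparameterisation trick draws standard-normal noise and shifts/scales it, so z is a sample from N(mu, sigma²) while mu and sigma remain differentiable parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up encoder outputs for one input: the mean and log-variance
# of the approximate posterior q(z|x).
mu = np.array([0.5, -1.0])
log_var = np.array([0.0, -0.5])
sigma = np.exp(0.5 * log_var)

# Reparameterisation: draw eps ~ N(0, I), then shift and scale it.
# Each row of z is then a sample from N(mu, sigma^2).
eps = rng.standard_normal((10000, 2))
z = mu + sigma * eps

z_mean = z.mean(axis=0)   # empirically close to mu
z_std = z.std(axis=0)     # empirically close to sigma
```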
Normal Autoencoder
Denoising Autoencoder
Variational Autoencoders
Latent recreation as close as possible to the distribution
Reparameterisation
Disentangled VAE
• The last five will be fixed when
you change the latent vector!
Reinforcement Learning – Run on the
Compressed Representation
Too General • Too Much Detail
(Overfitting)
Difference between a Normal and a Variational AE
• An autoencoder accepts input, compresses it, and then recreates the original
input. This is an unsupervised technique because all you need is the original data,
without any labels of known, correct results. The two main uses of an
autoencoder are to compress data to two (or three) dimensions so it can be
graphed, and to compress and decompress images or documents, which removes
noise in the data.
• A variational autoencoder assumes that the source data has some sort of
underlying probability distribution (such as Gaussian) and then attempts to find
the parameters of the distribution. Implementing a variational autoencoder is
much more challenging than implementing an autoencoder. The one main use of
a variational autoencoder is to generate new data that’s related to the original
source data. Now exactly what the additional data is good for is hard to say. A
variational autoencoder is a generative system, and serves a similar purpose as a
generative adversarial network (although GANs work quite differently).
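The "compress, then recreate" behaviour of an autoencoder can be illustrated with a linear toy (NumPy SVD on synthetic data; a real autoencoder would use trained nonlinear layers): projecting onto the top principal components and back is the optimal linear autoencoder.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 100 points in 5-D that really live near a 2-D subspace.
latent = rng.standard_normal((100, 2))
mix = rng.standard_normal((2, 5))
X = latent @ mix + 0.01 * rng.standard_normal((100, 5))

# "Encoder": project onto the top-2 right singular vectors.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
code = X @ Vt[:2].T       # compressed 2-D representation
# "Decoder": recreate the original input from the code.
X_hat = code @ Vt[:2]

err = np.mean((X - X_hat) ** 2)   # tiny, because X is nearly 2-D
```

The 2-D `code` is exactly the kind of representation the paragraph above describes as useful for graphing, and the small reconstruction error shows the noise being discarded.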
The Difference Between an Autoencoder and a Variational Autoencoder
The Difference Between an Autoencoder and a Variational Autoencoder
Here, DKL stands for the Kullback–Leibler divergence.
Kullback–Leibler divergence (also called relative entropy)
is a measure of how one probability distribution is
different from a second, reference probability distribution.
The prior over the latent variables is usually set to be the
centred isotropic multivariate Gaussian
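For the usual choice above (a diagonal-Gaussian posterior and a centred isotropic Gaussian prior), DKL has a closed form; a sketch with illustrative values:

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the
    latent dimensions: 0.5 * sum(sigma^2 + mu^2 - 1 - log(sigma^2))."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var)

# Zero when the posterior equals the prior, growing as it moves away.
at_prior = kl_to_standard_normal(np.zeros(2), np.zeros(2))       # 0.0
away = kl_to_standard_normal(np.array([2.0, 0.0]), np.zeros(2))  # 2.0
```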
AEs and VAEs
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
Add Disentangled VAEs
https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence
RAISR
• To help users around the world save some bucks while viewing
spotless pictures, Google has turned to RAISR (Rapid and Accurate
Image Super Resolution), a technique that incorporates machine
learning to produce high-quality versions of low-resolution images.
• https://analyticsindiamag.com/google-raisr-use-machine-learning/