Machine Learning for Financial Data
January 2021
DEEP LEARNING (PART 1)
Contents
◦ What is Deep Learning
◦ Multilayer Perceptrons (MLP)
◦ Convolutional Neural Networks (CNN)
◦ Recurrent Neural Networks (RNN)
◦ Generative Adversarial Networks (GAN)
◦ Deep Reinforcement Learning
◦ Gradient Descent Optimization
What is Deep Learning
Deep learning is a form of artificial intelligence that uses a machine learning model called an artificial neural network, with multiple hidden layers, to learn hierarchical representations of the underlying data and make predictions on new data.
Training deep neural networks was widely considered infeasible in the 1990s, and most researchers had abandoned the idea
▪ In 2006, Geoffrey Hinton et al. published a paper showing how to train a deep neural network capable of recognizing handwritten digits with state-of-the-art precision (>98%)
▪ They branded this technique Deep Learning
▪ The paper revived the interest of the scientific community and before long many new papers demonstrated that DL was not only possible, but capable of mind-blowing achievements that no other ML technique could hope to match
▪ This enthusiasm soon extended to many other areas of ML
▪ Fast-forward 15 years and ML has conquered the industry: it is now at the heart of much of the magic in today's high-tech products
Artificial Neuron: Perceptron
An artificial neural network is an ML algorithm based on a very crude approximation of a biological neural network in a brain
Artificial neural networks work quite differently from real biological neural networks; however, they were inspired by their biological counterpart
Copyright (c) by Daniel K.C. Chan. All Rights Reserved. 7
Deep Learning
A perceptron takes a collection of inputs that can carry different weights and produces outputs to other perceptrons
[Figure: a perceptron taking inputs x1, x2, x3 and producing an output y]
The weight on each input can be either increased or decreased
[Figure: the same perceptron with weights ω1, ω2, ω3 applied to inputs x1, x2, x3]
The activation function determines how much output the perceptron produces given the weighted sum of its input values
[Figure: the weighted sum is an affine (linear) function of the inputs; the activation function applied to it can be non-linear, e.g. a sigmoid whose output satisfies 0 ≤ y ≤ 1]
Bias is a weighted input that can be used to control the output value
[Figure: a bias weight ω0 is added to the perceptron; its input can be assumed to be 1]
Increasing the bias will shift the activation function to the left
Decreasing the bias will shift the activation function to the right
Modifying the weight parameters will change the behaviour of the perceptron
y = σ( ∑_{i=1}^{m} ω_i x_i + ω_0 )

[Figure: a perceptron with inputs x1, x2, x3, weights ω1, ω2, ω3, bias ω0 and activation function σ producing the output y]
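As a minimal sketch (not from the original slides), the perceptron computation above can be written in a few lines of NumPy; the sigmoid activation and the example input/weight values are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    # logistic activation: squashes the affine sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def perceptron(x, w, w0):
    # y = sigma( sum_i w_i * x_i + w0 ): affine function followed by the activation
    return sigmoid(np.dot(w, x) + w0)

x  = np.array([0.5, -1.0, 2.0])    # inputs x1, x2, x3 (illustrative values)
w  = np.array([0.8,  0.2, -0.5])   # weights w1, w2, w3 (illustrative values)
w0 = 0.1                           # bias weight; its input is fixed at 1
print(perceptron(x, w, w0))
```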
Artificial Neural Network
An artificial neural network is obtained by connecting the inputs and outputs of perceptrons into a network
[Figure: the perceptrons become the nodes of a graph; their weighted connections become the edges]
An artificial neural network is composed of an input layer, an output layer, and one or more hidden layers in between
[Figure: input, hidden, and output layers]
Forward propagation uses the current network parameters to compute a prediction for each training example
An incorrect prediction (the prediction error) will be used to teach the network to change the weights of its connections
The labels of the training dataset are used to determine whether or not the network made a correct prediction
Backward propagation uses the prediction error to update the weights of the connections between neurons
Gradient descent is used to decide whether to increase or decrease the edge weights
The learning rate is used to decide by how much to increase or decrease them
Forward and backward propagation are repeated for each training example until the weights of the network become stable
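A compact PyTorch sketch of this training loop is shown below (not from the original slides); the data, layer sizes, loss function, and learning rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative data: 4 samples with 3 features each and binary labels (assumed).
X = torch.rand(4, 3)
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(nn.Linear(3, 4), nn.ReLU(), nn.Linear(4, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):          # repeat until the weights become stable
    y_pred = model(X)             # forward propagation
    loss = loss_fn(y_pred, y)     # prediction error against the labels
    optimizer.zero_grad()
    loss.backward()               # backward propagation of the error
    optimizer.step()              # gradient-descent update of the weights
```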
The network of nodes and edges is typically represented using much more computationally efficient data structures
[Figure: the connection weights stored as one matrix per layer: W1 (3×3) with rows ω1.*, ω2.*, ω3.*; W2 (4×3) with rows ω4.*, ω5.*, ω6.*, ω7.*; W3 (2×4) with rows ω8.*, ω9.*]
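For instance, the weights of a 3 → 3 → 4 → 2 network like the one above can be stored as three matrices, and the whole forward pass reduces to matrix-vector products; this NumPy sketch uses random weights and omits biases for brevity (assumptions, not the original figure's values).

```python
import numpy as np

sigma = lambda z: 1.0 / (1.0 + np.exp(-z))   # activation (sigmoid assumed)

# One weight matrix per layer; each row holds one neuron's incoming weights.
W1 = np.random.randn(3, 3)   # omega_1.* .. omega_3.*
W2 = np.random.randn(4, 3)   # omega_4.* .. omega_7.*
W3 = np.random.randn(2, 4)   # omega_8.* .. omega_9.*

x  = np.array([0.2, 0.7, 0.1])   # input vector
h1 = sigma(W1 @ x)               # first hidden layer
h2 = sigma(W2 @ h1)              # second hidden layer
y  = sigma(W3 @ h2)              # output layer
print(y)
```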
Deep Neural Network
Deep neural networks have more hidden layers allowing them to model progressively more complex functions
A face recognition deep neural network is trained by feeding to the input layer a set of labelled images of human faces
The first hidden layers would learn to detect geometric primitives, e.g. horizontal/vertical/diagonal lines
The middle hidden layers would learn to detect more complex facial features (e.g. eyes, noses, mouths)
The final hidden layers would learn to detect the general pattern for entire faces
The output layer would learn to detect the most abstract representation of a person, e.g. the person's name (Alice, Bob, Ivy, Kim)
[Figure: a simple neural network versus a deep neural network; successive layers form progressively more abstract representations]
Tensor processing units (TPUs) are being developed to further accelerate the performance of deep learning
▪ Deep learning’s popularity is due to its accuracy
▪ It has achieved higher accuracy levels than other algorithms have ever achieved for complex data problems such as natural language processing (NLP)
▪ It requires, and actually capitalizes on, vast amounts of data to achieve an optimal solution
▪ It also requires considerable computing power to process such large amounts of data without taking weeks or more to train
Deep Learning Architectures
▪ Multilayer Perceptron (MLP)
▪ the standard network architecture used in most basic neural network applications
▪ Convolutional Neural Networks (CNN)
▪ a network architecture that works well for images, audio, and video
▪ Recurrent Neural Networks (RNN)
▪ a network architecture that works well for processing sequences of data over time
▪ Generative Adversarial Networks (GAN)
▪ a technique where we place two opposing neural networks in competition with one another in order to improve each other's performance
▪ Deep Reinforcement Learning (RL)
▪ a technique for providing reward signals when multiple steps are necessary to achieve a goal
Multilayer Perceptron (MLP)
Each perceptron in the preceding layer can be connected to every perceptron in the subsequent layer
Perceptrons in a preceding layer are only ever connected to perceptrons in a subsequent layer: there is no cycle or loop in the graph of connections
Architecture parameters include inputs, outputs, number of layers, perceptrons per layer & the activation functions
▪ number of inputs
▪ number of outputs
▪ number of layers
▪ perceptrons per layer
▪ activation functions (φ)
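As an illustrative sketch using Keras (one of the libraries in the references), all of these architecture parameters appear explicitly when an MLP is defined; the input size, layer widths, and activations below are assumptions, not values from the slides.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(8,)),               # number of inputs
    keras.layers.Dense(16, activation="relu"),    # layer 1: 16 perceptrons, ReLU
    keras.layers.Dense(16, activation="relu"),    # layer 2: 16 perceptrons, ReLU
    keras.layers.Dense(3, activation="softmax"),  # number of outputs: 3 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
model.summary()
```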
Activation Functions
Non-linearity is what allows deep neural networks to model complex functions
Common activation functions (each with its own active region): Linear, Logistic (sigmoid), Hyperbolic Tangent (tanh), Rectified Linear Unit (ReLU), Step (binary)
Interpretation of the Activation Functions

▪ Linear: σ(x) = c·x, value range −∞ to +∞. Cannot be used with backward propagation because its derivative is a constant with no relation to the input x; all layers of the network collapse into one, leaving a simple linear regression model.

▪ Logistic (sigmoid): σ(x) = 1 / (1 + e^(−x)), value range 0 to 1. Normalizes any real-valued input; good for classifiers. For x above 2 or below −2 it pushes the prediction very close to 1 or 0, giving clear predictions, but it suffers from the vanishing gradient problem for large positive or negative x.

▪ Hyperbolic Tangent (tanh): σ(x) = 2 / (1 + e^(−2x)) − 1, value range −1 to +1. Zero-centred, making it easier to model inputs that are strongly negative, neutral, or strongly positive. Its gradient is stronger than the sigmoid's, giving more optimized solutions; otherwise it behaves like the sigmoid.

▪ ReLU (Rectified Linear Unit): σ(x) = 0 if x < 0, x if x ≥ 0, value range 0 to +∞. Makes the activations sparse and efficient; converges faster than the other functions (it is computationally cheap), speeding up training, and avoids the vanishing gradient problem. Typically combined with a softmax output for classification or a linear output for regression.

▪ Step (binary): σ(x) = −1 if x < 0, 1 if x ≥ 0, value range −1 to +1. A threshold-based activation: the perceptron outputs one of two fixed values depending on whether the input is above or below the threshold.
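The table above translates directly into NumPy; this is a minimal sketch of the five functions.

```python
import numpy as np

def linear(x, c=1.0):   # sigma(x) = c*x, range (-inf, +inf)
    return c * x

def sigmoid(x):         # sigma(x) = 1 / (1 + e^-x), range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):            # sigma(x) = 2 / (1 + e^-2x) - 1, range (-1, 1)
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0

def relu(x):            # sigma(x) = 0 if x < 0 else x, range [0, +inf)
    return np.maximum(0.0, x)

def step(x):            # sigma(x) = -1 if x < 0 else 1, values {-1, +1}
    return np.where(x < 0, -1.0, 1.0)

x = np.linspace(-3.0, 3.0, 7)
print(relu(x), sigmoid(x))
```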
Each added perceptron increases the network complexity and therefore the required processing power
The increase in complexity is not linear in the number of perceptrons added: increases in width and depth lead to an explosion in complexity and training time for large neural networks
Convolutional Neural Networks (CNN)
A CNN is a type of deep neural network architecture designed for specific tasks like image classification
▪ Input: typically a multi-dimensional array of neurons corresponding to the pixels of an image
▪ Convolution Layer: a combination of sparsely connected convolution layers that are hidden
▪ Pooling Layer: down-sampling layers that further reduce the number of neurons necessary in subsequent layers
▪ The convolution and pooling layers are repeated multiple times
▪ Fully Connected Layer: one or more fully connected layers connecting the pooling layers to the output layer
▪ Output: typically a 1D array of output neurons, one neuron for each category of image being classified
Convolution is a technique to extract visual features from an image in small chunks
Input image (a 5×5 matrix of neurons):
1 1 1 0 0
0 1 1 1 0
0 0 1 1 1
0 0 1 1 0
0 1 1 0 0

Filter (kernel), 3×3:
1 0 1
0 1 0
1 0 1

Input layer: the input image is represented as a matrix of neurons
A filter (or kernel) extracts a feature over a region of the image defined by its dimensions (a bounding box)
Conceptually, the filter moves across the image and performs mathematical operations on individual regions of the image
Convolved layer: each neuron in a convolution layer is responsible for a small cluster of neurons in the preceding layer
The convolved feature is obtained by multiplying the image values covered by the filter element-wise by the filter values and summing the result; the filter then slides across the image one position at a time (stride length 1), producing one value per position

For example, at the top-left position the overlap gives 1·1 + 1·0 + 1·1 + 0·0 + 1·1 + 1·0 + 0·1 + 0·0 + 1·1 = 4

Convolved feature (3×3):
4 3 4
2 4 3
2 3 4
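A small NumPy sketch reproduces the walkthrough above: sliding the 3×3 filter over the 5×5 image with stride 1 yields exactly the convolved feature shown (the helper function convolve2d is written here for illustration).

```python
import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])

kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])

def convolve2d(img, k, stride=1):
    kh, kw = k.shape
    oh = (img.shape[0] - kh) // stride + 1   # output height
    ow = (img.shape[1] - kw) // stride + 1   # output width
    out = np.zeros((oh, ow), dtype=img.dtype)
    for i in range(oh):
        for j in range(ow):
            region = img[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(region * k)   # element-wise multiply, then sum
    return out

print(convolve2d(image, kernel))
# [[4 3 4]
#  [2 4 3]
#  [2 3 4]]
```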
Filters mathematically modify the input of a convolution to help detect certain types of features in the image
Identity:
0 0 0
0 1 0
0 0 0

Blur (×1/16):
1 2 1
2 4 2
1 2 1

Sharpen:
0 −1 0
−1 5 −1
0 −1 0

Edge detection:
−1 −1 −1
−1 8 −1
−1 −1 −1
The edges of an image can be padded with 0-valued pixels to fully scan the original image and preserve its dimensions
[Figure: the image border padded with 0-valued pixels; the filter scans with stride length = 1]
▪ In practice, we don't explicitly define the filters that our convolutional layer will use
▪ We instead parameterize the filters and let the network learn the best filters to use during training
Pooling reduces the number of neurons in the previous convolution layer while retaining the most important information
Convolved feature (4×4):
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4

A 2×2 max pool keeps only the largest value in each 2×2 window:
max(1, 1, 5, 6) = 6
max(2, 4, 7, 8) = 8
max(3, 2, 1, 2) = 3
max(1, 0, 3, 4) = 4

Pooled feature (2×2):
6 8
3 4
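The same 2×2 max-pooling step can be sketched in NumPy (max_pool is an illustrative helper, not a library function):

```python
import numpy as np

conv_feature = np.array([[1, 1, 2, 4],
                         [5, 6, 7, 8],
                         [3, 2, 1, 0],
                         [1, 2, 3, 4]])

def max_pool(feature, size=2, stride=2):
    oh = (feature.shape[0] - size) // stride + 1
    ow = (feature.shape[1] - size) // stride + 1
    out = np.zeros((oh, ow), dtype=feature.dtype)
    for i in range(oh):
        for j in range(ow):
            window = feature[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = window.max()   # keep only the strongest activation
    return out

print(max_pool(conv_feature))
# [[6 8]
#  [3 4]]
```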
CNNs work well for a variety of tasks including image recognition, image processing, image segmentation, video analysis, and natural language processing
▪ input image
▪ a convolution layer with multiple filters, with a matrix output for each filter
▪ a pooling layer produces a down-sampled feature matrix for each convolution filter
▪ the convolution and pooling steps are repeated multiple times, using the previous features as input
▪ a few fully connected layers classify the image and produce the classification prediction
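Put together, the whole pipeline might look like the following Keras sketch; the input size, filter counts, and kernel sizes are illustrative assumptions.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(28, 28, 1)),               # input image
    keras.layers.Conv2D(16, (3, 3), activation="relu"),  # convolution layer (16 filters)
    keras.layers.MaxPooling2D((2, 2)),                   # pooling layer
    keras.layers.Conv2D(32, (3, 3), activation="relu"),  # repeat convolution + pooling
    keras.layers.MaxPooling2D((2, 2)),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),            # fully connected layer
    keras.layers.Dense(10, activation="softmax"),         # one output per image category
])
model.summary()
```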
Recurrent Neural Networks (RNN)
In a feed-forward neural network, input x flows through one or more hidden layers of neurons h, to the output y
[Figure: x → h → y]
Unlike feed-forward neural networks, RNNs contain feedback loops
[Figure: input layer x, one or more hidden layers h with a feedback loop, output layer y]
RNNs can operate effectively on sequences of data with variable input length
Observing the RNN over time (unfolding it), the hidden node h1 uses the input x1 to produce the output y1
The RNN uses knowledge of its previous state as an input for its current prediction, giving the network a short-term memory
[Figure: unfolded over time, h2 receives both the input x2 and the previous state h1, and produces y2]
The process of using the previous state in the current prediction can be repeated an arbitrary number of times
[Figure: the network unfolded over time, with inputs x1, x2, x3 flowing through hidden states h1, h2, h3 to outputs y1, y2, y3]
This makes RNNs very effective for working with sequences of data that occur over time. Examples:
• time-series data
• sequences of characters
• sequences of words
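A bare-bones NumPy sketch of one recurrent cell unrolled over a sequence is shown below; the layer sizes are illustrative and biases are omitted for brevity.

```python
import numpy as np

n_in, n_hidden, n_out = 4, 8, 3               # illustrative sizes
W_x = np.random.randn(n_hidden, n_in)         # input -> hidden state
W_h = np.random.randn(n_hidden, n_hidden)     # previous hidden state -> hidden state
W_y = np.random.randn(n_out, n_hidden)        # hidden state -> output

def rnn_forward(xs):
    h = np.zeros(n_hidden)                    # initial state
    ys = []
    for x_t in xs:                            # one step per element of the sequence
        h = np.tanh(W_x @ x_t + W_h @ h)      # new state uses the input AND the old state
        ys.append(W_y @ h)                    # output for this time step
    return ys

sequence = [np.random.randn(n_in) for _ in range(5)]   # e.g. 5 time steps
outputs = rnn_forward(sequence)
```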
RNN Applications
▪ Time-series data
  ▪ Changes in stock prices
▪ Natural language processing
  ▪ Speech recognition
  ▪ Language translation
  ▪ Conversation modelling
▪ Image captioning
  ▪ Visual Q&A
When predicting the next letter a person is likely to type, the letter just typed and all previously typed letters are important
The user types the letter "h" (input x1); the network predicts the next letter (output y1 = ?)
[Figure: unfolded over four time steps, inputs x1…x4 flow through hidden states h1…h4 to predictions y1…y4]
The network predicts "i", based on previous training examples that included the word "hi"
Generative Adversarial Networks (GAN)
[Example slides: "Which photo is fake?" and "Is this fake?", showing real photographs alongside GAN-generated images and video]
https://www.youtube.com/watch?v=9NSVdQfD0K8&feature=youtu.be
The Generative Adversarial Network (GAN) is a combination of two deep learning neural networks:
▪ a Generator Network, trying to produce synthetic data indistinguishable from real data
▪ a Discriminator Network, trying to detect fake data
The two networks are adversaries in the sense that they are both competing to beat one another
Generative Adversarial Networks (GANs) can be deployed for image generation and image enhancement
[Figure: the Image Generator (a deconvolutional neural network) produces fake images, e.g. an AI news anchor; the Image Detector (a convolutional neural network) compares them against real-world images, e.g. a human news anchor, and detects which are fake]
The generator and discriminator networks can be trained jointly as a minimax game
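In the standard formulation (Goodfellow et al., 2014), the discriminator D and the generator G play the two-player minimax game

min_G max_D V(D, G) = E_{x∼p_data}[ log D(x) ] + E_{z∼p_z}[ log(1 − D(G(z))) ]

where D is trained to label real and generated samples correctly while G is trained to fool it.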
GAN Applications: Image-to-Image Translation
GAN Applications: Text-to-Image Synthesis
This small bird has a pink breast and crown, and black primaries and secondaries
This magnificent fellow is almost all black with a red crest and white cheek patch
The flower has petals that are bright pinkish purple with white stigma
This white and yellow flower has thin white petals and a round yellow stamen
GAN Applications: Video & Speech Synthesis
https://www.youtube.com/watch?v=o2DDU4g0PRo https://www.youtube.com/watch?v=HJcdVjkqiW8
Deep Reinforcement Learning
Reinforcement learning is a general-purpose framework for making optimal decisions from experience
[Figure: the agent takes an action in the environment; the environment returns a reward (or penalty) and a new state]
▪ Agent: policy (strategy), possible actions, history of states, history of rewards
▪ Environment: states, reward (penalty)
◦ An agent has the capacity to act
◦ Each action influences the agent's future state
◦ Success is measured by a scalar reward signal
◦ The goal is to select actions to maximise future reward
A self-driving taxi operates in an area that can be represented by a 5x5 grid with 4 pick-up/drop-off locations
◦ The State Space (environment) is the set of all possible situations the self-driving taxi (agent) could inhabit
◦ The state should contain useful information the agent needs to make the right action
◦ The layout on the left will be used as the training environment where the agent will transport people to four different locations: R (0,0), G (0,4), Y (4,0), B (4,3)
The self-driving taxi is to take a passenger safely from the pick-up to the drop-off point in the minimum time possible
◦ Assumes that the agent is the only vehicle in the environment
◦ The current location of the agent is the position having coordinate (3,1)
◦ Passenger at Y wishes to go to location R
◦ The total number of possible states is 500:
◦ the taxi can be at 25 (5×5) possible locations
◦ the passenger can be at one of the 4 pick-up points or inside the taxi (5 possibilities)
◦ there are 4 possible drop-off destinations
◦ 25 × 5 × 4 = 500
The self-driving taxi has 6 possible actions: moving the taxi in one of the four directions or pick-up/drop-off passenger
◦ The agent encounters one of the 500 states and takes an action
◦ The action space contains 6 possible actions:
◦ move south (action number 0)
◦ move north (action number 1)
◦ move east (action number 2)
◦ move west (action number 3)
◦ pick-up (action number 4)
◦ drop-off (action number 5)
The self-driving taxi is reward-motivated and will navigate by trial experiences that come with rewards or penalties
◦ A high positive reward (e.g. +20 points) for a successful drop-off because this behavior is highly desirable
◦ A penalty (e.g. -10 points) if it tries to drop off a passenger at the wrong location
◦ A slight negative reward (e.g. -1 point) for not making it to the destination after every time-step
◦ The penalty is only slight because it is preferable for the agent to arrive late rather than make wrong moves
Reinforcement learning will learn a mapping of states to the optimal action to take in that state
◦ The agent explores the environment and takes actions based on the rewards defined in the environment
◦ The optimal action for each state is the action that has the highest cumulative long-term reward
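A condensed Q-learning sketch in the spirit of the OpenAI Gym Taxi tutorial cited in the references is shown below (classic Gym API assumed; the hyperparameters are illustrative).

```python
import numpy as np
import gym

env = gym.make("Taxi-v3")                    # 500 states, 6 actions
q_table = np.zeros([env.observation_space.n, env.action_space.n])

alpha, gamma, epsilon = 0.1, 0.6, 0.1        # learning rate, discount, exploration rate

for episode in range(10000):
    state = env.reset()
    done = False
    while not done:
        if np.random.uniform() < epsilon:
            action = env.action_space.sample()      # explore: try a random action
        else:
            action = np.argmax(q_table[state])      # exploit: best known action
        next_state, reward, done, info = env.step(action)   # reward is -1, -10 or +20
        # Move Q(s, a) towards the reward plus the discounted best future value.
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action])
        state = next_state
```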
Deep Reinforcement Learning Applications
▪ Robotics for industrial automation
▪ Business strategy planning
▪ Machine learning and data processing
▪ Training systems that provide customised instruction and materials according to students' needs
▪ Aircraft control and robot motion control
Gradient Descent Optimization
A linear regression problem can be solved using a single perceptron with the identity function as the activation function
The best-fit line is obtained by minimising a loss function called the mean squared error (MSE)
The objective of the gradient descent optimization is to locate the smallest value of the loss function
[Figure: the Mean Squared Error (MSE) plotted as a surface over the weight and bias; the goal is the point with the smallest value of MSE]
Starting from random values for the parameters, the optimization walks down the surface in the direction of the lowest (best possible) value of the loss function
The learning rate controls the size of the step taken during the descent: a small step size takes longer to converge, while a big step size may not converge at all
[Figure: starting from the initial value of the MSE, gradient descent converges on the best value of the loss function]
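A minimal NumPy sketch of gradient descent on the MSE of a one-variable regression line is shown below; the toy data and learning rate are illustrative assumptions.

```python
import numpy as np

# Toy data roughly following y = 2x + 1 (illustrative).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

w, b = 0.0, 0.0        # weight and bias, starting from arbitrary values
lr = 0.05              # learning rate: the size of each downhill step

for step in range(1000):
    y_pred = w * x + b                   # forward pass (identity activation)
    error = y_pred - y
    mse = np.mean(error ** 2)            # the loss surface being descended
    grad_w = 2 * np.mean(error * x)      # dMSE/dw
    grad_b = 2 * np.mean(error)          # dMSE/db
    w -= lr * grad_w                     # step against the gradient
    b -= lr * grad_b

print(w, b, mse)
```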
Backward propagation allows the weights and biases of the perceptrons to converge to their final values
Forward Propagation: Error = Prediction − Actual
Backward Propagation: Optimization = adjustment of the parameters using an optimizer
[Figure: input, hidden, and output layers with forward and backward propagation arrows]
References
“Hands-On Machine Learning with Scikit-Learn and TensorFlow”, Aurélien Géron, O'Reilly Media, Inc., 2017
“The Deep Learning with PyTorch Workshop”, Hyatt Saleh, Packt Publishing, 2020
▪ "7TypesofNeuralNetworkActivationFunctions:HowtoChoose?"(https://missinglink.ai/guides/neural-network- concepts/7-types-neural-network-activation-functions-right/)
▪ "Convolutionalneuralnetworks,JeremyJordan(https://www.jeremyjordan.me/convolutional-neural-networks/) ▪ “ReinforcementQ-LearningfromScratchinPythonwithOpenAIGym”,SatwikKansal&BrendanMartin
(https://www.learndatasci.com/tutorials/reinforcement-q-learning-scratch-python-openai-gym/)
THANK YOU