COMP3308/3608, Lecture 9b
ARTIFICIAL INTELLIGENCE
Deep Learning
Russell and Norvig, ch. 21
Tutorials on Deep Learning:
1) http://cs.stanford.edu/~quocle/tutorial1.pdf
2) http://cs.stanford.edu/~quocle/tutorial2.pdf
3) http://deeplearning.stanford.edu/tutorial/
• What is deep learning?
• Autoencoder neural networks
• Convolutional neural networks
• Applications
What is Deep Learning? (high-level definition)
(Deep Learning, MIT Press, 2019)
Part of AI that focuses on creating large NNs that are capable of making accurate data-driven decisions
Particularly suited for applications where the data is complex and where large datasets are available
Who uses it?
• Facebook to analyse text in online conversations
• Google, Baidu and Microsoft for image search and machine translation
• Almost all smart phones for speech recognition and face detection
• Self-driving cars – for localization, motion planning and steering, as well as tracking driver state
• Healthcare – for processing medical images (X-ray, CT, MRI)
Deep Learning and AlphaGo
• AlphaGo – defeated the world's top Go players in 2016 and 2017 (Lee Sedol in 2016 and Ke Jie in 2017) https://theconversation.com/ai-has-beaten-us-at-go-so-what-next-for-humanity-55945
• AlphaGo’s success was surprising!
• Most people expected that it would take much longer before a computer could compete with top human Go players
• Go is much more difficult for computers than chess – massive search space:
• Approximately 10^170 possible board configurations
• More states (board configurations) than the number of atoms in the universe!
• Compare progress in chess and Go:
• Chess: It took 30 years for chess programs to progress from human to world champion level (from 1967 to 1997)
• Go: Using deep learning it took only 7 years to progress from advanced amateur to world champion (from 2009 to 2016)
• => revolutionary impact of deep learning; big acceleration of performance, also applicable to other fields, not only games
Deep Learning in the News
• Google Translate http://www.nature.com/news/deep-learning-boosts-google-translate-tool-1.20696
• Self-driving cars http://spectrum.ieee.org/cars-that-think/transportation/advanced-cars/deep-learning-makes-driverless-cars-better-at-spotting-pedestrians
Deep Learning in the News (2)
• http://www.timesnow.tv/technology-science/article/deep-learning-google-maps-to-become-more-accurate-through-artificial-intelligence/60610
• https://venturebeat.com/2017/04/07/how-olay-skin-advisor-built-their-deep-learning-algorithms/
• http://www.newyorker.com/magazine/2017/04/03/ai-versus-md
• https://www.techemergence.com/deep-learning-applications-in-medical-
• https://www.wired.com/2014/02/netflix-deep-learning/
• https://www.theguardian.com/technology/commentisfree/2020/sep/11/artificial-intelligence-robot-writing-gpt-3
What is Deep Learning? (more specific definitions)
• Deep Learning means different things to different people in AI:
1. The NN has more than 1 hidden layer
2. No need for human-invented and pre-selected features – the NN is able to learn the important features automatically
3. Some deep learning architectures use unlabeled data for pre-training of the NN layers, which is followed by supervised learning
What is Deep Learning? (2)
• Deep Learning: NNs that learn hierarchical feature representations
• Novel techniques developed in the last 10 years
Backpropagation NNs – Issues
Training is slow – requires many epochs
The NN is typically fully connected – too many parameters to adjust
The weights are initialized randomly and then adjusted by gradient descent – is there a better way to do this?
With many hidden layers, the learning becomes less effective
• The vanishing gradient problem – the weight changes in the lower layers are very small, so these layers learn more slowly than the higher hidden layers (see the sketch after this list)
Require a large dataset of labeled data – this may not be available or may be difficult to obtain
May get stuck in a local minimum and not find a good solution
Require feature-engineering to select useful features and represent them appropriately (most ML algorithms require this); can we learn the important features automatically?
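The vanishing gradient problem mentioned above can be seen in a small numerical experiment. Below is a minimal numpy sketch (the number of layers, layer width and weight scale are arbitrary choices for illustration, not values from the lecture): an error signal is propagated backwards through a stack of sigmoid layers, and its norm shrinks at every layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, width = 8, 50          # illustrative sizes only

# Forward pass through a stack of sigmoid layers, keeping the activations
weights = [rng.normal(0.0, 0.1, size=(width, width)) for _ in range(n_layers)]
activations = [rng.standard_normal(width)]           # the input vector
for W in weights:
    z = W @ activations[-1]
    activations.append(1.0 / (1.0 + np.exp(-z)))      # sigmoid

# Backward pass: start with an error signal of ones at the top layer and
# propagate it down, printing the norm of the propagated error at each layer
delta = np.ones(width)
for layer in reversed(range(n_layers)):
    a = activations[layer + 1]
    delta = weights[layer].T @ (delta * a * (1.0 - a))  # sigmoid derivative
    print(f"layer {layer}: error-signal norm = {np.linalg.norm(delta):.2e}")

# The norms shrink rapidly towards the lower layers, so those layers receive
# tiny weight updates and learn very slowly.
```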
Why do we Need More than One Hidden Layer?
Cybenko's Theorem: Backpropagation NNs with 1 hidden layer are universal approximators – they can approximate any continuous function with arbitrarily low error. Why, then, do we need more than 1 hidden layer?
1) This is an existence theorem – it says that there is a NN with 1 hidden layer that can do this, but it doesn't tell us how to find this NN
2) It doesn't mean that 1 hidden layer is the most effective representation – the one that results in the fastest learning, easiest implementation or best generalization (ability to correctly classify new examples)
Deep Learning Architectures
1. Stacked autoencoder networks
2. Convolutional networks
3. Recurrent neural networks
• e.g. Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU)
• Used for sequences, e.g. text processing (sequences of words or characters) – predict the class of a sequence or output another sequence relevant to the input sequence
4. Restricted Boltzmann machines
We will study 1 and 2
Autoencoder Neural Networks
Autoencoder NN
• We have a set of input vectors without their class (unlabelled data): x = {x1, x2, x3, …}
• Each xi is an n-dim vector representing 1 input vector
• An autoencoder NN:
• Sets the target values to be the same as the input values (yi = xi) and uses the backpropagation algorithm to learn this mapping
• => the number of input and output neurons is the same
• Has 1 hidden layer with a smaller number of neurons than the number of input neurons
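A minimal sketch of such an autoencoder in PyTorch (the layer sizes, random training data and optimizer settings are illustrative assumptions, not part of the lecture): the targets passed to the loss are the inputs themselves, and training uses backpropagation as usual.

```python
import torch
import torch.nn as nn

# Minimal autoencoder sketch: n_in inputs, a smaller hidden layer, n_in outputs.
# Sizes, data and training settings below are illustrative assumptions.
n_in, n_hidden = 100, 50

encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())
autoencoder = nn.Sequential(encoder, decoder)

x = torch.rand(500, n_in)              # 500 unlabelled input vectors
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

for epoch in range(200):
    optimizer.zero_grad()
    reconstruction = autoencoder(x)
    loss = loss_fn(reconstruction, x)  # target = input (yi = xi)
    loss.backward()                    # backpropagation
    optimizer.step()

h = encoder(x)                         # compressed representation of each xi
print(h.shape)                         # torch.Size([500, 50])
```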
Autoencoders – History
• Autoencoders were first mentioned by Rumelhart, Hinton and Williams in 1986 in the paper which introduced the backpropagation algorithm: http://www.cs.toronto.edu/~fritz/absps/pdp8.pdf
• They are typically used for dimensionality reduction, image and data compression
• More recently – in deep NN for pre-training of the network (weight initialization)
Autoencoder NN – Main Idea
• We are interested in the hidden layer, in particular the outputs of the hidden neurons
• hi – the vector at the hidden layer for input vector xi
• The hidden layer can be seen as trying to learn a compressed version of the input vector
• Compressed because the number of hidden neurons is smaller than the number of input neurons
• Example – we can use the autoencoder for image compression:
• The inputs x are the pixel values of a 10×10 image => xi is a 100-dim vector
• We have 50 hidden neurons – hi is a 50-dim vector
• The network learns a compressed representation of the image – hi is a compressed version of xi
• The compressed representation can be used for different purposes, including as input to another NN or ML classifier (see the sketch below)
[Figure: the original image x at the input/output and the compressed image h at the hidden layer]
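Once such an autoencoder has been trained on 10×10 images, the 50-dim code h(x) can be extracted and passed on to another NN or ML classifier. A brief sketch of this step (the encoder is re-created with random weights here purely as a placeholder for a trained one, and the 10-class linear classifier is hypothetical):

```python
import torch
import torch.nn as nn

# Placeholder encoder standing in for an already-trained one (random weights)
encoder = nn.Sequential(nn.Linear(100, 50), nn.Sigmoid())

images = torch.rand(32, 100)          # a batch of flattened 10x10 images
with torch.no_grad():
    codes = encoder(images)           # 50-dim compressed representations h

# The codes can be used as input features for another classifier,
# e.g. a simple linear classifier over 10 hypothetical classes:
classifier = nn.Linear(50, 10)
logits = classifier(codes)
print(codes.shape, logits.shape)      # torch.Size([32, 50]) torch.Size([32, 10])
```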
Autoencoders – Traditional Applications
• In addition to image and data compression, autoencoders can be used for encryption
• The weights W1 perform encoding
• The weights W2 perform decoding
• The receiver needs W2 to decode the encrypted input
[Figure: the original input is encoded (encrypted) by W1 into the encoded input, which is then decoded (reconstructed) by W2]
Autoencoders as Initialization Method for Deep NN
• Can be used to pre-train the layers of a deep NN in advance
• 1 layer at a time, 1 autoencoder for each layer
• The training of a deep NN will include 3 steps:
1. Pre-training step: Train a sequence of autoencoders, 1 for each layer (unsupervised)
2. Fine-tuning step 1: Train the last layer using backpropagation (supervised)
3. Fine-tuning step 2: Train the whole network using backpropagation (supervised)
Example: Let's use this method to pre-train a deep NN with 2 hidden layers, h1 and h2
Autoencoders as Initialization Method – Example
• Pre-training step: Train a sequence of autoencoders, 1 for each layer (unsupervised) = 2 autoencoders for our example
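A sketch of the components used in this example, in PyTorch (all layer sizes and the number of output classes are assumptions made for illustration only); the training itself is sketched at the end of the walkthrough that follows.

```python
import torch.nn as nn

# Illustrative sizes: 100 inputs, h1 = 50, h2 = 25 hidden neurons, 10 classes
n_in, n_h1, n_h2, n_out = 100, 50, 25, 10

# Autoencoder 1: learns W1 from the raw inputs x
autoencoder1 = nn.Sequential(
    nn.Linear(n_in, n_h1), nn.Sigmoid(),   # W1  (kept)
    nn.Linear(n_h1, n_in), nn.Sigmoid(),   # W1' (discarded after training)
)

# Autoencoder 2: learns W2 from h1(x), the outputs of the first hidden layer
autoencoder2 = nn.Sequential(
    nn.Linear(n_h1, n_h2), nn.Sigmoid(),   # W2  (kept)
    nn.Linear(n_h2, n_h1), nn.Sigmoid(),   # W2' (discarded after training)
)

# The deep NN whose first two layers will be initialized with W1 and W2
deep_nn = nn.Sequential(
    nn.Linear(n_in, n_h1), nn.Sigmoid(),   # initialized from autoencoder 1
    nn.Linear(n_h1, n_h2), nn.Sigmoid(),   # initialized from autoencoder 2
    nn.Linear(n_h2, n_out),                # output layer, trained supervised
)
```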
How is the Pre-training Done? (1)
Pre-training means finding W1 and W2 for our deep NN
To find W1, we train Autoencoder 1 with weights W1 and W1’ and h1 number of hidden neurons
• Unsupervised, using the input vectors x only
After the training is completed:
The learned W1 is set in the deep NN as values for the weights between the input and first hidden layer
W1’ is not needed; it is discarded
But we also need to find W2 – the weights between the hidden layer h1 and the hidden layer h2
This will be done using Autoencoder 2, but we need to compute the input for Autoencoder 2
[Figure: the deep NN with hidden layers h1 and h2; Autoencoder 1 (weights W1, W1') and Autoencoder 2 (weights W2, W2') used to pre-train them]
How is the Pre-training Done? (2)
Computing the input for Autoencoder 2:
h1 has formed h1(x), a compressed representation of the input data x, i.e. has discovered and extracted useful structure/pattern (we hope)
We use the learned W1 to compute the values of the neurons in h1 in Autoencoder 1 for all the data (all training examples), i.e. we compute h1(x)
These values will be used as an input to Autoencoder 2
h1(x) can be seen as a different representation of the training data – a transformation applied to the training data
[Figure: Autoencoder 1 produces h1(x) from x; h1(x) is then used as the input to Autoencoder 2, which produces h2(h1(x))]
How is the Pre-training Done? (3)
• To find W2, we train Autoencoder 2 with weights W2 and W2’ and h2 number of hidden neurons
• Unsupervised using the output produced by Autoencoder 1, i.e. using h1(x)
• After the training is completed:
• The learned W2 is set in the deep NN as values for the weights between the first hidden and second hidden layer
• W2’ is discarded
[Figure: as above – the learned W2 from Autoencoder 2 is copied into the deep NN between h1 and h2]
How is Fine-tuning Step 1 Done?
• The next step is:
Fine-tuning step 1: Train the last layer using backpropagation (supervised)
• We need to compute the input for this training, which is the output of h2
• h2 has formed h2(h1(x)), a compressed representation of h1(x)
• We use the learned W2 to compute the values of the neurons in h2 in Autoencoder 2 for all our data
Fine-tuning step 2: train the whole network using backpropagation (supervised) – as usual
[Figure: as above – h2(h1(x)) computed by Autoencoder 2 is the input representation used when training the last layer]
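Putting the whole walkthrough together, here is a minimal PyTorch sketch of the three steps: pre-training with two autoencoders, then the two supervised fine-tuning steps. The layer sizes, random data, labels, epochs and learning rate are all illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n_in, n_h1, n_h2, n_classes = 100, 50, 25, 10      # illustrative sizes
x = torch.rand(500, n_in)                          # inputs (used unlabelled for pre-training)
y = torch.randint(0, n_classes, (500,))            # labels (used only for fine-tuning)

def train(model, inputs, targets, loss_fn, params, epochs=100, lr=1e-3):
    """Generic backpropagation training loop used for every step below."""
    optimizer = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()

# --- Pre-training step (unsupervised) ------------------------------------
# Autoencoder 1: learn W1 from x; the decoder weights W1' are then discarded
enc1 = nn.Sequential(nn.Linear(n_in, n_h1), nn.Sigmoid())     # W1
dec1 = nn.Sequential(nn.Linear(n_h1, n_in), nn.Sigmoid())     # W1'
train(nn.Sequential(enc1, dec1), x, x, nn.MSELoss(), list(enc1.parameters()) + list(dec1.parameters()))

with torch.no_grad():
    h1_x = enc1(x)                   # h1(x): a new representation of the data

# Autoencoder 2: learn W2 from h1(x); the decoder weights W2' are discarded
enc2 = nn.Sequential(nn.Linear(n_h1, n_h2), nn.Sigmoid())     # W2
dec2 = nn.Sequential(nn.Linear(n_h2, n_h1), nn.Sigmoid())     # W2'
train(nn.Sequential(enc2, dec2), h1_x, h1_x, nn.MSELoss(), list(enc2.parameters()) + list(dec2.parameters()))

# Build the deep NN: the first two layers carry the pre-trained W1 and W2
output_layer = nn.Linear(n_h2, n_classes)
deep_nn = nn.Sequential(enc1, enc2, output_layer)

# --- Fine-tuning step 1 (supervised): train only the last layer ----------
train(deep_nn, x, y, nn.CrossEntropyLoss(), output_layer.parameters())

# --- Fine-tuning step 2 (supervised): train the whole network ------------
train(deep_nn, x, y, nn.CrossEntropyLoss(), deep_nn.parameters())
```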
Stacked Autoencoders
• Using several autoencoders for pre-training in this way is called stacking autoencoders
• Each layer of the network learns an encoding of the layer below
• The network can learn hierarchical features in an unsupervised way
• The network is called a stacked autoencoder
Image from https://www.mql5.com/en/articles/1103
Other Types of Autoencoders
• Sparse autoencoder – an autoencoder with more hidden neurons than inputs
• It doesn't compress the input but may still discover interesting structure in the data – a different representation that may be useful
• Denoising autoencoder (a sketch follows below)
• A percentage of the input values is randomly corrupted (e.g. set to 0), while the target remains the clean input
• This forces the autoencoder to learn robust features that generalize better
• Similar to another idea – dropout – see next slides
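A minimal sketch of a denoising autoencoder in PyTorch (the corruption rate, sizes and data are illustrative assumptions): a fraction of each input is randomly zeroed out, but the reconstruction target is the clean input.

```python
import torch
import torch.nn as nn

n_in, n_hidden, corruption = 100, 50, 0.3      # illustrative values

encoder = nn.Sequential(nn.Linear(n_in, n_hidden), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(n_hidden, n_in), nn.Sigmoid())
autoencoder = nn.Sequential(encoder, decoder)

x = torch.rand(500, n_in)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(autoencoder.parameters(), lr=1e-3)

for epoch in range(200):
    mask = (torch.rand_like(x) > corruption).float()
    x_noisy = x * mask                        # randomly zero out ~30% of the values
    optimizer.zero_grad()
    loss = loss_fn(autoencoder(x_noisy), x)   # reconstruct the CLEAN input
    loss.backward()
    optimizer.step()
```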
Visualizing a Trained Autoencoder
• Consider image processing
• We have trained the autoencoder on 20×20 images and have 100 hidden neurons
• After the training has completed, we would like to visualize what the autoencoder has learnt (i.e. the function computed by each hidden neuron hi)
• We can do this as an image – for each hidden neuron, visualize the input that maximizes the neuron's activation
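A sketch of how such a visualization can be produced (following the approach in the Stanford tutorial linked earlier): for a sigmoid hidden neuron with incoming weights w, the norm-bounded input that maximizes its activation is proportional to w itself, so each neuron's weight vector can simply be displayed as a 20×20 image. W1 below is a random placeholder standing in for the learned input-to-hidden weights.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder for the learned input-to-hidden weights:
# 100 hidden neurons, each with 400 incoming weights (20x20 inputs)
W1 = np.random.randn(100, 400)

fig, axes = plt.subplots(10, 10, figsize=(8, 8))
for neuron, ax in enumerate(axes.flat):
    w = W1[neuron]
    image = (w / np.linalg.norm(w)).reshape(20, 20)  # maximally activating input
    ax.imshow(image, cmap="gray")
    ax.axis("off")
plt.show()
```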
Visualizing a Trained Autoencoder (2)
• Each square shows the image that maximally activated each of the 100 hidden neurons
• Some of the hidden neurons have learned to detect edges at different positions and with different orientations
• These are useful features for object recognition
Visualizing a Trained Stacked Autoencoder
• A stacked autoencoder can learn a hierarchy of features
• Example: handwritten digit recognition (MNIST dataset, 60 000 training and 10 000 testing examples of 28×28 handwritten digits)
• 3 stacked autoencoders were used to pre-train a NN
• 1st hidden layer has learned stroke-like features
• 2nd hidden layer – digit parts
• 3rd layer – entire digits
Image from Erhan et al. (2010) – Why does unsupervised pre-training help deep learning? JMLR 2010, http://www.jmlr.org/papers/volume11/erhan10a/erhan10a.pdf
Autoencoders – Advantages
• Able to automatically learn features from unlabeled data
• Especially important for sensory data applications – computer vision, audio processing and natural language processing – where researchers have spent many years manually devising good features (vision, audio and text)
• Note: in many domains the features learnt by autoencoders are still not superior to the best hand-engineered features, but there are some emerging cases where they are (with more sophisticated autoencoders)
• These learned features can be used in conjunction with other ML/NN algorithms – extract the features, apply another ML algorithm
• Useful for pre-training layers of deep NNs – Erhan et al. (2010)
• Shown experimentally that NNs pre-trained with autoencoders converge faster and have better generalization ability, i.e. find a better local minimum
• In contrast, a standard randomly initialized deep NN is slower to train and easily gets stuck in a poor local minimum
Why Does Unsupervised Pre-training Help Deep Learning?
• Erhan et al (2010), http://www.jmlr.org/papers/volume11/erhan10a/erhan10a.pdf
• Compared deep NNs with and without pre-training experimentally on several big datasets – results:
• NNs with pre-training have better accuracy on test data than NNs without pre-training
• In NNs without pre-training, the probability of finding a poor local minimum increases as the number of hidden layers increases; NNs with pre-training are robust to this
• NNs with pre-training start from a better position – in a "basin" with a better local minimum
Results With and Without Pre-training
Pre-trained NN:
1. Not a big difference – pre-training already provided a good starting position; the fine-tuning doesn't seem to change the weights significantly
2. The fine-tuning changes the first layer the least
Layers 2 and 3 don't seem to learn structured features (at least not visually interpretable features)
[Figure: visualizations of the first hidden layer weights – pre-trained NN: after pre-training and after fine-tuning with backpropagation; not pre-trained NN (randomly initialized): after training with backpropagation]
Convolutional Neural Networks
Convolutional NNs
• Introduced by LeCun et al. in 1989 http://yann.lecun.org/exdb/publis/pdf/lecun-89e.pdf
• A special type of multilayer NNs
• Trained with the backpropagation algorithm, as most other multilayer NNs, but have a different architecture
• Designed to recognize visual patterns directly from pixel images with minimal pre-processing
• Can recognize patterns with high variability, e.g. handwritten characters, and are robust to distortions and geometric transformations such as shifting
• Used in speech and image recognition; have shown excellent performance in hand-written digit classification, face detection, image classification (e.g. the ImageNet dataset)
http://yann.lecun.com/exdb/lenet/
Main Idea 1 – Local Connectivity
• Fully connected network – each neuron from a given layer is connected with each neuron in the next layer => too many connections per neuron
• Ex.: The input is a 100×100 pixel image => the input vector is 10^4-dimensional; each hidden neuron in the first hidden layer will have 10^4 connections = 10^4 weights (+ 1 bias weight) to learn = too computationally expensive
• Instead, we can restrict the connections – each hidden neuron is connected only to a small subset of inputs, corresponding to adjacent pixels (a patch, continuous region in the image)
• Inspired by biological neural systems, e.g. neurons in the visual cortex have localized receptive fields (i.e. respond only to stimuli in a certain location)
[Figure: a fully connected neuron vs a locally connected neuron]
Local Connectivity (2)
• With local connectivity, each neuron is responsive to changes in its inputs only (i.e. in its receptive field)
receptive field of neuron i
• We can extend this idea to all layers
• We can easily modify the backpropagation algorithm to work with local connectivity (see the sketch below):
• Forward pass – assume that the missing connections have weights 0
• Backward pass – no need to compute the gradient for the missing connections
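A minimal PyTorch sketch of the weight-0 trick (layer sizes and the width of each receptive field are assumptions): missing connections are represented by zeros in a fixed mask, so they contribute nothing in the forward pass and automatically receive zero gradient in the backward pass.

```python
import torch
import torch.nn as nn

n_in, n_hidden, field = 20, 6, 5          # each hidden neuron sees a patch of 5 inputs

# Fixed 0/1 mask: each hidden neuron is connected to one contiguous patch
mask = torch.zeros(n_hidden, n_in)
for i in range(n_hidden):
    start = i * (n_in - field) // (n_hidden - 1)
    mask[i, start:start + field] = 1.0

W = nn.Parameter(torch.randn(n_hidden, n_in) * 0.1)
b = nn.Parameter(torch.zeros(n_hidden))

x = torch.rand(8, n_in)                    # a batch of 8 input vectors
h = torch.sigmoid(x @ (W * mask).t() + b)  # locally connected hidden layer

h.sum().backward()
print((W.grad * (1 - mask)).abs().max())   # tensor(0.) - missing connections
                                           # receive no gradient
```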
Main Idea 2 – Sharing Weights
• The number of connections can be further reduced by weight sharing – some of the weights are constrained to be equal to each other – example:
[Figure: a convolutional layer in which the weights are shared: w1=w4=w7, w2=w5=w8, w3=w6=w9]
• => we need to store a smaller number of weights – instead of storing weights from w1 to w9, we will store w1, w2 and w3 only
• Weight sharing means applying the same weights to different parts of the image (see the sketch below)
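A minimal PyTorch sketch of weight sharing as a 1-D convolution (the 5-input / 3-neuron layout is an assumption chosen to match the w1…w9 example above): only the 3 shared weights are stored and learned, and they are slid across the input to compute the 3 hidden neurons.

```python
import torch
import torch.nn as nn

# One shared kernel of size 3: the 9 connections of the example use only
# 3 distinct weights (w1, w2, w3)
conv = nn.Conv1d(in_channels=1, out_channels=1, kernel_size=3, bias=False)

print(conv.weight.shape)     # torch.Size([1, 1, 3]) -> only w1, w2, w3 are stored

x = torch.rand(1, 1, 5)      # (batch, channels, length): 5 input values
h = conv(x)                  # the same 3 weights are slid across the input
print(h.shape)               # torch.Size([1, 1, 3]) -> 3 hidden neurons
```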