Deep Learning
By Majid Babaei
What is Deep Learning?
• Deep learning is a subset of machine learning that uses multi-layered neural networks to learn patterns from data for use in decision making.
Multi-Layer Perceptron
It consists of a single input layer, one or more hidden layers, and finally an output layer. Each layer consists of a collection of perceptrons.
The input layer holds one or more features of the input data.
Every hidden layer consists of one or more neurons; each hidden layer processes certain aspects of the features and sends the processed information on to the next hidden layer.
The output layer receives the data from the last hidden layer and finally outputs the result.
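A minimal NumPy sketch of this flow is shown below; it pushes a single input vector through one hidden layer and an output layer (the layer sizes and the sigmoid activation are illustrative assumptions, not part of the original slides):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])                  # input layer: 3 features
W1, b1 = np.random.randn(4, 3), np.zeros(4)     # hidden layer: 4 neurons
W2, b2 = np.random.randn(2, 4), np.zeros(2)     # output layer: 2 neurons

h = sigmoid(W1 @ x + b1)    # the hidden layer processes the features
y = sigmoid(W2 @ h + b2)    # the output layer produces the result
print(y)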
What makes a single-layer Perceptron unsuitable for real-world challenges?
A single-layer Perceptron simply cannot produce the sort of performance that we expect from a modern neural-network architecture.
A single-layer Perceptron is not able to approximate the complex input-output relationships that occur in real-life scenarios.
The solution is a multi-layer Perceptron.
Convolutional Neural Network (CNN)
It is one of the most popular ANN architectures and is widely used in the fields of image and video recognition.
Like ordinary neural networks, CNNs are made up of neurons that have learnable weights and biases.
Its primary purpose is to extract features from the input image.
It has three important layers (Convolution, Pooling, and Fully Connected).
Convolution as a mathematical operation
• Convolution is a mathematical operation on two functions that produces a third function expressing how the shape of one is modified by the other.
We will not go into the mathematical details of convolution here, but will try to understand how it works through some examples.
Layers in Convolutional Neural Network (CNN)
1. Convolution layer − It is the primary building block and performs the computational tasks.
2. Pooling layer − It is arranged next to the convolution layer and is used to reduce the size of its inputs by removing unnecessary information.
3. Fully connected layer − It is arranged after a series of convolution and pooling layers and classifies the input into various categories.
Why are regular ANNs not suitable for processing images?
Problems with regular ANNs
• Regular ANNs don’t scale well to full images.
• Suppose an image of size 32*32*3 (32 wide, 32 high, 3 color channels)
• So a single fully-connected neuron in a first hidden layer of a regular Neural Network would have 32*32*3 = 3072 weights.
• This amount still seems manageable, but clearly this fully-connected structure does not scale to larger images.
Example
• For example, an image of more respectable size, e.g. 200*200*3, would lead to neurons that have 200*200*3 = 120,000 weights.
• Clearly, this full connectivity is wasteful and the huge number of parameters would quickly lead to overfitting.
What is the intuition behind the CNN?
An Intuitive Explanation of CNNs
• Convolutional Neural Networks take advantage of the fact that the input consists of images and they constrain the architecture in a more sensible way.
• Unlike a regular Neural Network, the layers of a CNN have neurons arranged in 3 dimensions: width, height, depth.
• As we will soon see, the neurons in a layer will only be connected to a small region of the layer before it!
Connection to a small region
• An input volume in red (e.g. a 32x32x3 image), and an example volume of neurons in the first layer.
How does a CNN work step by step?
The Convolution Step [1]
• As we discussed, every image can be considered as a matrix of pixel values.
• Consider a 5 x 5 image whose pixel values are only 0 and 1.
The Convolution Step [2]
• Also, consider another 3 x 3 matrix as shown below:
The Convolution Step [3]
• We slide the orange matrix over the original image (green) by 1 pixel at a time.
• For every position, we compute an element-wise multiplication.
• We then add the multiplication outputs to get a single integer, which forms one element of the output matrix.
The Convolution Step [4]
• The output matrix is called Convolved Feature or Feature Map.
Take a moment to understand how the computation is being done.
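The sketch below walks through the same sliding computation in NumPy; the 5 x 5 image and 3 x 3 filter values are illustrative stand-ins for the ones pictured on the slide:

import numpy as np

image = np.array([[1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0]])      # 5 x 5 image of 0s and 1s
kernel = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 0, 1]])           # 3 x 3 filter

feature_map = np.zeros((3, 3))           # output size is (5-3+1) x (5-3+1)
for i in range(3):
    for j in range(3):
        window = image[i:i+3, j:j+3]                  # current 3 x 3 region of the image
        feature_map[i, j] = np.sum(window * kernel)   # element-wise multiply, then sum
print(feature_map)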
The Convolution Step [*]
• In cNN terminology, the 3×3 matrix is called a ‘filter‘ or ‘kernel’ or ‘feature detector’.
• The matrix formed by sliding the filter over the image and computing the dot product is called the ‘Convolved Feature’ or ‘Activation Map’ or the ‘Feature Map‘.
• It is important to note that filters act as feature detectors on the original input image.
The Convolution Step [**]
• It is evident that different values of the filter matrix will produce different Feature Maps for the same input image.
• As an example, consider the following input image:
• In this table, we can see the effects of convolving the image with different filters.
• As shown, we can perform different operations just by changing the numeric values of our filter matrix.
• This means that different filters can detect different features.
The Convolution Step [***]
The more filters we have, the more image features get extracted.
The Pooling Step [1]
• Pooling is also called subsampling or down-sampling.
• It reduces the dimensionality of each feature map but retains the most important information.
• Pooling can be of different types: Max, Average, Sum etc.
The Pooling Step [2]
• In case of Max Pooling, we define a spatial neighborhood (for example, a 2×2 window)
• Then take the largest element within that window.
• Instead of taking the largest element we could also take the average or sum
In practice, Max Pooling has been shown to work better.
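A minimal NumPy sketch of Max Pooling with a 2 x 2 window and a stride of 2; the 4 x 4 feature map values are illustrative:

import numpy as np

feature_map = np.array([[1, 3, 2, 1],
                        [4, 6, 5, 2],
                        [3, 1, 8, 7],
                        [2, 0, 4, 9]])

pooled = np.zeros((2, 2))
for i in range(0, 4, 2):
    for j in range(0, 4, 2):
        # keep only the largest value in each 2 x 2 window
        pooled[i // 2, j // 2] = feature_map[i:i+2, j:j+2].max()
print(pooled)    # [[6. 5.] [3. 9.]]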
The ReLU operation
Since convolution is a linear operation (element-wise matrix multiplication and addition), we account for non-linearity by introducing a non-linear function such as ReLU.
The ReLU operation
It shows the ReLU operation applied to one of the feature maps.
Other non-linear functions such as tanh or sigmoid can also be used instead of ReLU, but ReLU shows better performance in most situations.
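In code, ReLU simply replaces every negative value in a feature map with zero; a one-line NumPy illustration (the values are made up):

import numpy as np

feature_map = np.array([[ 2.0, -1.5],
                        [-0.3,  4.0]])
rectified = np.maximum(0, feature_map)   # ReLU: element-wise max(0, x)
print(rectified)                          # negative entries become 0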
The Pooling Step [2]
• Pooling operation is applied separately to each feature map
The Pooling operation
• It shows the effect of Pooling on the Rectified Feature Map
The Pooling Step [3]
Makes the input representations (feature dimension) smaller and more manageable
Reduces the number of parameters and computations in the network, therefore, controlling overfitting
It helps us detect objects in an image no matter where they are located.
So far we have seen how Convolution, ReLU and Pooling work.
It is important to understand that these layers are the basic building blocks of any CNN.
Overview of a CNN
• Together these layers extract the useful features from the images, introduce non-linearity into our network, and reduce the feature dimensions.
Fully Connected layer
Fully Connected layer [1]
• The Fully Connected layer is a traditional Multi-Layer Perceptron that uses a softmax activation function in the output layer
• The term “Fully Connected” implies that every neuron in the previous layer is connected to every neuron on the next layer.
Fully Connected layer [2]
• The output from the convolutional and pooling layers represents high-level features of the input image.
• The purpose of the Fully Connected layer is to use these features for classifying the input image into various classes based on the training dataset.
• For example, the image classification task we set out to perform has four possible outputs.
Fully Connected layer [3]
• The sum of the output probabilities from the Fully Connected layer is 1.
• This is ensured by using Softmax as the activation function in the output layer of the Fully Connected layer.
• Apart from classification, adding a fully connected layer is also a way of learning non-linear combinations of these features. Combinations of those features might be even better!
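A small NumPy sketch of the Softmax computation for four illustrative class scores:

import numpy as np

scores = np.array([2.0, 1.0, 0.1, -1.0])           # raw outputs for four classes
probs = np.exp(scores) / np.sum(np.exp(scores))    # Softmax turns scores into probabilities
print(probs, probs.sum())                           # the probabilities sum to 1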
Putting it all together
• The Convolution + Pooling layers act as Feature Extractors from the input image, while the Fully Connected layer acts as a classifier.
• In the above image, since the input image is a boat, the target probability is 1 for Boat class and 0 for other three classes
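A hedged Keras sketch of such a network is shown below; the input shape, filter counts, layer widths and the four class names are illustrative assumptions, not the exact architecture in the figure:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(32, (3, 3), activation = 'relu', input_shape = (32, 32, 3)))   # convolution + ReLU
model.add(MaxPooling2D(pool_size = (2, 2)))                                      # pooling
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Flatten())                                   # flatten the feature maps
model.add(Dense(128, activation = 'relu'))             # fully connected layer
model.add(Dense(4, activation = 'softmax'))            # 4 output classes (e.g. dog, cat, boat, bird)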
The overall training process of the CNN
This process essentially means that all the weights and parameters of the CNN have been optimized to correctly classify images from the training set.
If our training set is large enough, the network will (hopefully) generalize well to new images and classify them into correct categories.
We can do it in five steps!
The training process
• Step 1: We initialize all filters and weights with random values.
• Step 2: The network takes a training image as input, goes through the forward propagation step (convolution, ReLU and pooling operations along with forward propagation in the Fully Connected layer) and finds the output probabilities for each class.
The training process
• Step 3: Calculate the total error at the output layer.
• Step 4: Use Backpropagation to calculate the gradients of the error with respect to all weights in the network, and use gradient descent to update all filter values / weights and parameter values to minimize the output error.
What are the gradient and gradient descent?
The training process
What does “to minimize the output error” mean in Step 4?
• Note 1: This means that the network has learnt to classify this particular image correctly by adjusting its weights such that the output error is reduced.
• Note 2: Parameters like the number of filters, filter sizes, and the architecture of the network have all been fixed before Step 1 and do not change during the training process; only the values of the filter matrices and connection weights get updated.
What is gradient?
The gradient gives the slope of a function at any point on its curve; for a function of several variables, it is the vector of partial derivatives and points in the direction of steepest ascent.
What is gradient descent?
Gradient descent is an optimization algorithm used to minimize some function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.
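A tiny sketch of gradient descent minimizing the illustrative function f(w) = w^2, whose gradient is 2w (the starting point and learning rate are arbitrary):

w = 5.0                     # initial guess
learning_rate = 0.1
for step in range(50):
    gradient = 2 * w                     # derivative of f(w) = w**2
    w = w - learning_rate * gradient     # step in the direction of steepest descent
print(w)                    # close to 0, the minimum of f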
The training process
• Step 5: Repeat Steps 2-4 with all images in the training set.
Visualizing Convolutional Neural Networks
• Adam Harley created amazing visualizations of a Convolutional Neural Network trained on the MNIST Database of handwritten digits.
• I highly recommend playing around with it to understand the details of how a CNN works.
• https://www.cs.ryerson.ca/~aharley/vis/conv/flat.html
Overview of Keras
Keras runs on top of open source machine learning libraries like TensorFlow, Theano or Cognitive Toolkit (CNTK).
Theano is a Python library used for fast numerical computation tasks.
TensorFlow is the most famous symbolic math library used for creating neural networks and deep learning models.
TensorFlow
• When installing TensorFlow, you can choose either the CPU-only or GPU-supported version.
• I’d recommend installing the CPU version if you only need to design and train simple machine learning models, or if you’re just starting out.
• However, the CPU version can be slow for complex tasks, especially those involving image processing; in that case, I’d recommend installing the GPU-supported version.
Installation with pip
• Once the installation completes, check for the version of pip running on your system:
• pip3 --version
• pip2 --version
• /usr/bin/pip2 --version [Linux Users]
• After that, you only have to run one simple command to install TensorFlow:
• pip3 install --upgrade tensorflow
• The command will take some time to execute, so remain patient.
Interested reader?!
If you want to learn more beyond this then we recommend trying a more detailed resource, like the Hands-On Machine Learning with Scikit-Learn and TensorFlow book.
Add tensorflow package to your PyCharm
Add Keras package to your PyCharm
• Keras API can be divided into three main categories:
• Model
• Layer
• Core Modules
Architecture of Keras
• In Keras, every ANN is represented by Keras Models.
• Every Keras Model is a composition of Keras Layers, such as input, hidden and output layers, convolution layers, pooling layers, etc.
• Keras Models and Layers access Keras Modules for activation functions, loss functions, etc.
Architecture of Keras
• Using Keras Models, Keras Layers, and Keras Modules, any ANN algorithm (CNN, RNN, etc.) can be represented in a simple and efficient manner.
Keras: Model
• Sequential Model − Sequential model is basically a linear composition of Keras Layers.
• The Sequential model is easy to use, and it can represent nearly all available neural networks.
• Functional API − The Functional API is used to create complex models, such as multi-output models or models with shared layers.
Keras: Model: An Example
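The code listing from the original slide is not reproduced here; the lines described below probably looked roughly like this (the layer size and input shape on Line 4 are assumptions):

from keras.models import Sequential                                 # Line 1
from keras.layers import Dense, Activation                          # Line 2
model = Sequential()                                                # Line 3
model.add(Dense(512, activation = 'relu', input_shape = (784,)))    # Line 4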
• Line 1 imports Sequential model from Keras models
• Line 2 imports Dense layer and Activation module
• Line 3 creates a new sequential model using the Sequential API.
• Line 4 [Not Now!]
Keras: Layer
• Each Keras layer in the Keras model represents the corresponding layer (input layer, hidden layer or output layer) in the actual proposed neural network model.
• Keras provides a lot of pre-built layers so that any complex neural network can be easily created.
• Some of the important Keras layers are listed below:
• Core Layers
• Convolution Layers
• Pooling Layers
• Recurrent Layers
Keras: Layer: An Example
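Again, the slide's code listing is not reproduced here; a plausible reconstruction in which the model.add calls correspond to the line numbers described below (the 512-unit widths, the 0.2 dropout rate and the 10 output classes are assumptions):

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(512, activation = 'relu', input_shape = (784,)))   # Line 3: dense layer with relu
model.add(Dropout(0.2))                                            # Line 4: dropout against overfitting
model.add(Dense(512, activation = 'relu'))                         # Line 5: another dense layer
model.add(Dropout(0.2))                                            # Line 6: another dropout layer
model.add(Dense(10, activation = 'softmax'))                       # Line 7: final dense layer with softmax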
• Line 3 adds a dense layer with relu activation function.
• Line 4 adds a dropout layer to handle overfitting.
• Line 5 adds another dense layer with relu activation function.
• Line 6 adds another dropout layer to handle overfitting.
• Line 7 adds final dense layer with softmax activation function.
Keras: Core Modules
• Keras also provides a lot of built-in neural network related functions to properly create the Keras model and Keras layers.
• Some of these functions are as follows:
• Activations module − provides many activation functions like softmax, relu, etc.
• Loss module − provides loss functions like mean_squared_error, mean_absolute_error, poisson, etc.
• Optimizer module − provides optimizer functions like adam, sgd, etc.
• Regularizers module − provides functions like the L1 regularizer, L2 regularizer, etc.
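These modules typically come together when a model is compiled; a hedged sketch (the tiny architecture and the specific optimizer and loss choices are illustrative):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(10, activation = 'relu', input_shape = (4,)))
model.add(Dense(1))
model.compile(optimizer = 'sgd',                 # from the Optimizer module
              loss = 'mean_squared_error',       # from the Loss module
              metrics = ['mae'])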
Initializers
• In deep learning, an initial weight is assigned to every input connection. The Initializers module provides different functions to set these initial weights.
• Some of the Keras initializer functions are:
• Zeros
• Ones
• Constant
• RandomNormal
• RandomUniform
• etc.
Initializers: Zeros
• Initializes all the weights to 0.
• Here, kernel_initializer represents the initializer for the kernel (weights) of the model.
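A sketch in the same style as the Constant and RandomUniform examples that follow (the layer size and input shape are assumptions):

from keras.models import Sequential
from keras.layers import Dense
from keras import initializers

my_init = initializers.Zeros()
model = Sequential()
model.add(Dense(512, activation = 'relu', input_shape = (784,), kernel_initializer = my_init))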
Initializers: Ones
• Initializes all the weights to 1.
Initializers: Constant
• Generates a constant value (say, 5), specified by the user, for all the weights.
• Here, value represents the constant value.
from keras.models import Sequential
from keras.layers import Activation, Dense
from keras import initializers

my_init = initializers.Constant(value = 5)
model = Sequential()
model.add(Dense(512, activation = 'relu', input_shape = (784,), kernel_initializer = my_init))
Initializers: RandomNormal
• Generates initial weights from a normal distribution.
• mean represents the mean of the random values to generate
• stddev represents the standard deviation of the random values to generate
• seed represents the seed value used to generate the random numbers
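The slide's code is not reproduced here; it presumably mirrored the RandomUniform example on the next slide, roughly:

from keras.models import Sequential
from keras.layers import Dense
from keras import initializers

my_init = initializers.RandomNormal(mean = 0.0, stddev = 0.05, seed = None)
model = Sequential()
model.add(Dense(512, activation = 'relu', input_shape = (784,), kernel_initializer = my_init))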
Initializers: RandomUniform
• Generates initial weights from a uniform distribution.
• minval represents the lower bound of the random values to generate
• maxval represents the upper bound of the random values to generate
from keras import initializers
my_init = initializers.RandomUniform(minval = -0.05, maxval = 0.05, seed = None)
model.add(Dense(512, activation = 'relu', input_shape = (784,), kernel_initializer = my_init))
Activations
• In deep learning, an activation function is a special function used to determine whether a specific neuron is activated or not.
• Basically, the activation function applies a non-linear transformation to the input data and thus enables the neurons to learn better.
Activations: linear
• Applies the linear (identity) function; the input is passed through unchanged.
• Here, activation refers to the activation function of the layer. It can be specified simply by the name of the function, and the layer will use the corresponding activation.
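For example, a hedged sketch of specifying the activation by name (the layer size and input shape are assumptions):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(512, activation = 'linear', input_shape = (784,)))   # activation specified by name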
Activations: relu
• Applies Rectified Linear Unit.
Activations: softmax
• Applies Softmax function.
Activations: tanh
• Applies Hyperbolic tangent function.
Activations: sigmoid
• Applies Sigmoid function.
The course project instruction is available on BlackBoard