NEURAL NETWORKS
Applied Analytics: Frameworks and Methods 2
Outline
■ Introduction to Neural Networks
■ Artificial Neuron
■ Multiple Layer Neural Networks
■ Network Architecture
■ Illustration of Neural Networks on MNIST
■ Types of Networks
■ Applications
■ Using Deep Learning at Scale
Deep Learning
■ Artificial Neural Networks, conceived in the 1950s as a crude approximation of how the human brain works, are the basis of what is now referred to as Deep Learning.
■ Artificial Neural Networks applied to machine learning problems are referred to as Deep Learning. Since our goal is to apply neural networks to solve machine learning problems, we will use the terms Deep Learning and Neural Networks interchangeably.
■ Deep Learning is a form of Artificial Intelligence that uses a type of machine learning called an artificial neural network. A network with multiple hidden layers learns hierarchical representations of the underlying data in order to make predictions given new data.
Deep Learning is a Form of Artificial Intelligence
Deep Learning is a type of Machine Learning
Source: MachineLearningMastery.com
Evaluation of Deep Learning
Pros
■ Excels at tasks such as computer vision, natural language processing, and speech recognition.
■ Powers many recommender and fraud-detection systems.
■ Algorithms are very general and adaptive.
■ Works directly on raw data; little or no manual feature engineering is required.
Cons
■ Computationally resource-intensive.
■ Tricky hyper-parameterization.
■ Non-optimal methods: training is not guaranteed to find the best solution.
Artificial Neural Network
■ Inspired by the biological neuron, but works very differently
■ Consists of a series of neurons connected together in a network
■ Let's first examine an Artificial Neuron.
Artificial Neuron
■ Input (x1, x2, x3)
■ Weights or parameters (ω1, ω2, ω3)
■ Bias (ω0)
■ Neuron
■ Output (y)
■ Activation function (e.g., tanh)
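Putting these pieces together, the neuron computes y = f(ω0 + ω1·x1 + ω2·x2 + ω3·x3). A minimal sketch in base R, with made-up input and weight values:

```r
# A single artificial neuron: weighted sum of inputs plus bias, passed
# through an activation function. Values below are illustrative only.
neuron <- function(x, w, w0, f = tanh) {
  f(w0 + sum(w * x))            # y = f(w0 + w1*x1 + w2*x2 + w3*x3)
}

x <- c(0.5, -1.2, 0.3)          # inputs  (x1, x2, x3)
w <- c(0.8,  0.1, -0.4)         # weights (w1, w2, w3)
neuron(x, w, w0 = 0.2)          # output y
```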
Artificial Neuron
[Diagram, built up in stages across several slides: inputs x1, x2, x3 enter the neuron with weights ω1, ω2, ω3 and bias ω0; the neuron computes the weighted sum Σ and applies the activation function f to produce the output y.]
Artificial Neural Network
■ Network of connected neurons
■ Includes
– Input Layer
– One or more Hidden Layer(s)
– Output Layer
Artificial Neural Network
[Diagram: inputs feed into a hidden layer, which feeds into the outputs.]
Artificial Neural Network
■ In practice, each layer can be a
– vector (one-dimensional)
– matrix (two-dimensional array)
– tensor (n-dimensional array)
Deep Learning Neural Network
■ A neural network with more than one hidden layer
■ Adding more hidden layers enables the network to model progressively more complex functions
Deep Learning Neural Network
[Figure. Source: Communications of the ACM]
Hierarchical Representations
[Figure: successive layers learn representations of progressively greater abstraction. Source: Data Science Central]
MNIST
■ A large database of handwritten digits that is commonly used for training various image processing systems
■ 42,000 images
■ Each image is a 28 × 28 pixel grayscale image
■ Thus, each image has 784 pixel descriptors (28 × 28 = 784)
Source: Wikipedia
MNIST
Basic Neural Network
■ MNIST Data
■ Inputs: 784
■ Hidden Layer(s): 1
■ Hidden Units or Neurons: 5
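A hedged sketch of this network using the nnet package (listed later among the R packages); `mnist` and `label` are assumed names for the data frame and its digit column:

```r
# Single-hidden-layer network on MNIST with nnet. Assumes `mnist` is a data
# frame with 784 pixel columns and a factor column `label` (digits 0-9).
library(nnet)

fit <- nnet(label ~ ., data = mnist,
            size    = 5,        # 5 hidden units
            MaxNWts = 10000,    # (784+1)*5 + (5+1)*10 weights need headroom
            maxit   = 100)      # training iterations
```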
Activation Functions
■ Linear
■ Logistic (sigmoid)
■ Hyperbolic Tangent (tanh)
■ Rectified Linear Unit (ReLU)
■ …
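These functions are simple to define; a minimal sketch in base R (tanh is built in):

```r
# Common activation functions.
linear  <- function(z) z                    # identity
sigmoid <- function(z) 1 / (1 + exp(-z))    # logistic, output in (0, 1)
relu    <- function(z) pmax(0, z)           # rectified linear unit

z <- c(-2, -1, 0, 1, 2)
sigmoid(z); tanh(z); relu(z)
```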
MNIST
Multi-Layer Neural Network
■ MNIST Data
■ Inputs: 784
■ Hidden Layer(s): 2
■ Hidden Units or Neurons: 5 in each layer
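A hedged sketch with the neuralnet package; the data-frame and column names are assumptions, and a real multi-class digit problem would need one output column per class:

```r
# Two hidden layers of 5 neurons each with the neuralnet package.
# `mnist` and `label` are illustrative names.
library(neuralnet)

fit <- neuralnet(label ~ ., data = mnist,
                 hidden        = c(5, 5),   # two hidden layers, 5 units each
                 act.fct       = "tanh",    # activation function
                 linear.output = FALSE)     # apply activation at the output
```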
Mechanics of Neural Networks
■ Specify Neural Network hyper-parameters
– Number and nature of Inputs
– Number of hidden layers
– Number of neurons per hidden layer
– Activation function (tanh, softmax, sigmoid)
■ Determine parameter weights by minimizing a loss function (e.g., mean squared error, cross-entropy, hinge) using stochastic gradient descent (a sketch follows this list)
– Gradient descent is computationally demanding for large datasets. Stochastic gradient descent, or mini-batch gradient descent, searches for minima using small batches of the training set rather than the entire training sample.
■ Specify regularization terms (L1 and L2) to prevent overfitting
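A minimal sketch of mini-batch gradient descent, using a single linear neuron with a mean-squared-error loss on toy data (all names and values invented):

```r
set.seed(1)
X <- matrix(rnorm(100 * 3), ncol = 3)        # 100 examples, 3 inputs
y <- X %*% c(2, -1, 0.5) + rnorm(100, sd = 0.1)

w  <- rep(0, 3); w0 <- 0                     # parameters to learn
lr <- 0.1                                    # learning rate

for (epoch in 1:20) {
  for (i in seq(1, nrow(X), by = 10)) {      # mini-batches of 10
    idx <- i:(i + 9)
    err <- as.vector(X[idx, ] %*% w + w0 - y[idx])  # prediction error
    w   <- w  - lr * colMeans(X[idx, ] * err)       # gradient step on weights
    w0  <- w0 - lr * mean(err)                      # gradient step on bias
  }
}
round(w, 2); round(w0, 2)                    # approaches (2, -1, 0.5) and 0
```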
Network Architectures
■ Various network architectures are possible by changing the
– Number and nature of Inputs
– Number of hidden layers
– Number of neurons per hidden layer
– Activation function
■ Adding more hidden layers makes the network deep
■ Adding more neurons per layer makes the network wide
Network Architecture
■ There is not a solid theoretical basis to determine the best Network Architecture
■ Often a trial-and-error process
■ Some approaches include
– Using a network architecture for a similar problem
– Progressively increasing complexity by adding more hidden units and layers until performance improvements become asymptotic (see the sketch below)
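The second approach can be automated with a simple loop; a hedged sketch with nnet, where `train`, `valid`, and `label` are assumed names:

```r
# Grow the hidden layer until validation accuracy stops improving.
library(nnet)

for (units in c(2, 5, 10, 20, 40)) {
  fit <- nnet(label ~ ., data = train, size = units,
              MaxNWts = 1e5, maxit = 100, trace = FALSE)
  acc <- mean(predict(fit, valid, type = "class") == valid$label)
  cat(units, "hidden units -> validation accuracy", round(acc, 3), "\n")
}
```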
Tuning Hyper-Parameters
■ The term Hyper-Parameters is used to distinguish them from standard model parameters
– They define higher level concepts about the model like complexity or capacity to learn
– They cannot be learned directly from the data in the standard model training process and need to be predefined
– They are critical to model performance
Tuning Hyper-Parameters
■ There are many hyper-parameters to tune
– Number of hidden layers
– Number of hidden units
– Number of training iterations
– Learning rate
– Regularization
Tuning Hyper-Parameters
■ Two approaches to tuning
– Manual:
■ Quite a common approach
■ Without domain knowledge or experience with similar problems, this can take a long time
– Automatic:
■ Every candidate hyper-parameter setting requires training the model from scratch, which is computationally expensive and only practical for smaller models. Two common approaches (sketched below):
– Grid Search
– Random Search
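A hedged sketch of random search with nnet, sampling the number of hidden units and the L2 weight-decay penalty; dataset and column names are assumptions:

```r
library(nnet)
set.seed(1)

results <- data.frame()
for (trial in 1:20) {
  units <- sample(2:40, 1)               # random architecture
  decay <- 10^runif(1, -5, -1)           # random L2 penalty (weight decay)
  fit   <- nnet(label ~ ., data = train, size = units, decay = decay,
                MaxNWts = 1e5, maxit = 100, trace = FALSE)
  acc   <- mean(predict(fit, valid, type = "class") == valid$label)
  results <- rbind(results, data.frame(units, decay, acc))
}
results[which.max(results$acc), ]        # best configuration found
```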
MNIST
Random Search Neural Network
■ MNIST Data
■ Use Random Search
■ Number of values explored for each hyper-parameter (an h2o-based sketch follows):
– Activation functions: 6
– Network architectures: 6
– L1 penalties: 101
– L2 penalties: 101
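A search of this shape could be expressed with h2o's random grid search; a hedged sketch in which the frame name, column names, and model budget are assumptions:

```r
# Random search over activations, architectures, and L1/L2 penalties with h2o.
# `train_h2o` and `pixel_cols` are illustrative names.
library(h2o)
h2o.init()

grid <- h2o.grid("deeplearning",
  x = pixel_cols, y = "label",
  training_frame = train_h2o,
  hyper_params = list(
    activation = c("Tanh", "TanhWithDropout", "Rectifier",
                   "RectifierWithDropout", "Maxout", "MaxoutWithDropout"),
    hidden     = list(5, c(5, 5), c(10, 10), 20, c(20, 20), c(40, 40)),
    l1         = seq(0, 1e-3, length.out = 101),   # 101 L1 values
    l2         = seq(0, 1e-3, length.out = 101)),  # 101 L2 values
  search_criteria = list(strategy = "RandomDiscrete", max_models = 50))
```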
Deep Learning in R
■ R Packages
– nnet
– neuralnet
– RSNNS
– deepnet
– darch
– caret
– RNN
– Autoencoder
– RcppDL
– h2o
– MXNetR
– …
■ Other General Frameworks
– Caffe
– Tensorflow
– Theano
– Torch
– Deeplearning4j
– CNTK
TYPES OF NEURAL NETWORKS
Types
■ Fully Connected Networks
■ Convolutional Networks
■ Recurrent Networks
■ Generative Adversarial Networks
■ Deep Reinforcement Learning
Fully Connected Feed Forward Network
■ Fully Connected: each neuron is connected to every neuron in the subsequent layer
■ Feed Forward: neurons are only connected to neurons in a subsequent layer; there are no feedback loops (a forward-pass sketch follows)
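A minimal sketch of a forward pass through such a network, where each layer computes f(Wx + b); the layer sizes and values are illustrative:

```r
# Forward pass through a stack of fully connected layers.
forward <- function(x, layers, f = tanh) {
  for (layer in layers) x <- f(layer$W %*% x + layer$b)  # f(Wx + b) per layer
  x
}

set.seed(1)
layers <- list(
  list(W = matrix(rnorm(5 * 3), 5, 3), b = rnorm(5)),  # 3 inputs -> 5 hidden
  list(W = matrix(rnorm(2 * 5), 2, 5), b = rnorm(2)))  # 5 hidden -> 2 outputs
forward(c(0.5, -1.0, 0.2), layers)
```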
Fully Connected Feed-Forward Network
[Diagram of a fully connected feed-forward network.]
Convolutional Neural Network
■ Adding hidden layers and neurons to a fully connected network causes the number of weights, and the system resources required, to grow very rapidly
Convolutional Neural Network
[Diagram of a convolutional neural network.]
Convolutional Neural Network
■ Convolution
– A technique that allows us to extract visual features from an image in small chunks. Each neuron in a convolution layer is responsible for a small cluster of neurons in the preceding layer.
■ Filter
– The bounding box that defines the cluster of neurons; also known as a kernel.
– Filters pick out particular features of an image, for example sharpening it or detecting edges (see the sketch below).
■ Pooling (also known as subsampling or downsampling)
– Reduces the number of neurons in the previous layer while still retaining the most important information
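A minimal base-R sketch of these two operations on a toy image; the kernel and sizes are invented for illustration:

```r
# 2-D convolution: slide a small kernel over the image.
convolve2d <- function(img, kernel) {
  k   <- nrow(kernel)
  out <- matrix(0, nrow(img) - k + 1, ncol(img) - k + 1)
  for (i in seq_len(nrow(out)))
    for (j in seq_len(ncol(out)))
      out[i, j] <- sum(img[i:(i + k - 1), j:(j + k - 1)] * kernel)
  out
}

# 2x2 max pooling: keep the strongest response in each 2x2 block.
maxpool2x2 <- function(x) {
  rows <- seq(1, nrow(x) - 1, by = 2)
  cols <- seq(1, ncol(x) - 1, by = 2)
  outer(rows, cols, Vectorize(function(i, j) max(x[i:(i + 1), j:(j + 1)])))
}

img  <- matrix(runif(36), 6, 6)                     # toy 6x6 image
edge <- matrix(c(-1, 0, 1, -1, 0, 1, -1, 0, 1), 3)  # simple edge kernel
maxpool2x2(convolve2d(img, edge))                   # 4x4 map pooled to 2x2
```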
Convolutional Neural Networks
■ Find application in
– Image recognition
– Image processing
– Image segmentation
– Video analysis
– Language processing
Recurrent Neural Network
[Diagrams of recurrent neural networks.]
Generative Adversarial Network (GAN)
[Diagram of a generative adversarial network.]
Deep Reinforcement Learning
[Diagram of deep reinforcement learning.]
APPLICATIONS
[Application examples, shown as images in the original slides:]
■ Sentence Completion
■ Translation
■ Summarizing
■ Auto-Tagging
■ Summarizing Pictures
■ Filling in Pictures
■ Completing Pictures
■ Do you recognize them?
■ Editing Audio
■ Sign Language
■ Lip Reading
USING DEEP LEARNING WITH LARGE DATASETS
Using Deep Learning with Large Datasets
■ Deep Learning Services
■ Deep Learning Platforms
■ Deep Learning Frameworks
Deep Learning as a Service
■ Provider supplies the training data
■ Provider trains the model
■ Model is hosted on the provider's server
Deep Learning Platforms
■ You provide the data
■ The platform trains the model
■ Model is hosted on the platform's server
Deep Learning Frameworks
■ Your own data
■ Your own algorithm
■ Your own hosting
Summary
■ This module addressed the following topics
– Introduction to Neural Networks
– Artificial Neuron
– Multiple Layer Neural Networks
– Network Architecture
– Illustration of Neural Networks on MNIST
– Types of Networks
– Applications
– Using Deep Learning at Scale