Ve492: Introduction to Artificial Intelligence
Neural Nets
UM-SJTU Joint Institute
Some slides adapted from http://ai.berkeley.edu, AIMA, UM
Learning Objectives
❖ What is statistical machine learning?
❖ What is an artificial neural network?
❖ What makes artificial neural network powerful?
❖ How to train an artificial neural network?
❖ Overview of statistical machine learning
❖ (Deep) neural network
❖ Applications
Artificial Neuron
❖ Perceptron: 𝑔(𝑧) = 𝑠𝑖𝑔𝑛(𝑧)
❖ Logistic regression: 𝑔(𝑧) = 1/(1 + 𝑒^(−𝑧))
❖ Linear regression: 𝑔(𝑧) = 𝑧
❖ 𝑔 is called the activation function; it is usually non-linear and (sub)differentiable
How to Learn the Parameters of the Model?
❖ Iterative method that updates unknown weights
❖ For perceptron:
𝑤 ← 𝑤 + 𝑦∗ 𝜑(𝑥) 𝟏(𝑤 ⋅ 𝜑(𝑥) ⋅ 𝑦∗ < 0), i.e., update only when the prediction disagrees with the true label 𝑦∗
❖ For logistic regression:
❖ Write likelihood function: 𝑃(𝑋, 𝑌|𝑤)
❖ Maximize log𝑃(𝑋, 𝑌|𝑤) with gradient descent
𝑤 ← 𝑤 + 𝛼 𝛻_𝑤 log 𝑃(𝑋, 𝑌|𝑤) (batch)
𝑤 ← 𝑤 + 𝛼 𝛻_𝑤 log 𝑃(𝑥, 𝑦|𝑤) (stochastic)
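As an illustration (not from the slides), a minimal NumPy sketch of both update rules; the names `phi_x` and `lr` are ours:

```python
import numpy as np

def perceptron_update(w, phi_x, y_star):
    """Perceptron: update only when the example is misclassified,
    i.e., when w . phi(x) . y* < 0 (labels y* in {-1, +1})."""
    if np.dot(w, phi_x) * y_star < 0:
        w = w + y_star * phi_x
    return w

def logistic_sgd_update(w, phi_x, y, lr=0.1):
    """One stochastic gradient ascent step on log P(y | x; w).
    For labels y in {0, 1}, grad_w log P = (y - sigmoid(w . phi(x))) phi(x)."""
    p = 1.0 / (1.0 + np.exp(-np.dot(w, phi_x)))  # sigmoid
    return w + lr * (y - p) * phi_x
```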
General Framework: Statistical ML
❖ Minimize empirical risk: min_𝑤 ∑_𝑖 l(h_𝑤(𝑥^(𝑖)), 𝑦^(𝑖))
❖ For perceptron:
l(𝑦̂, 𝑦) = max(0, −𝑦𝑦̂)
❖ For logistic regression:
l(𝑦̂, 𝑦) = −𝑦log(𝑦̂) − (1 − 𝑦)log(1 − 𝑦̂)
❖ For linear regression:
l(𝑦̂, 𝑦) = (𝑦̂ − 𝑦)²
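A minimal NumPy sketch of the three losses (assuming 𝑦 ∈ {−1, +1} for the perceptron and 𝑦 ∈ {0, 1} for logistic regression; the names are ours):

```python
import numpy as np

def perceptron_loss(y_hat, y):
    """Zero when y_hat agrees in sign with y, linear penalty otherwise."""
    return max(0.0, -y * y_hat)

def logistic_loss(y_hat, y):
    """Cross-entropy between label y and predicted probability y_hat."""
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

def squared_loss(y_hat, y):
    """Squared error for linear regression."""
    return (y_hat - y) ** 2
```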
Mini-Batch Gradient Descent
min_𝑤 ∑_𝑖 l(h_𝑤(𝑥^(𝑖)), 𝑦^(𝑖))
Observation: the gradient over a small set of training examples (= mini-batch) can be computed in parallel, so we might as well use a mini-batch instead of a single example
❖ for iter = 1, 2, ...
❖ pick a random subset 𝐽 of training examples
𝑤 ← 𝑤 − 𝛼 ∑_{𝑗∈𝐽} 𝛻_𝑤 l(h_𝑤(𝑥^(𝑗)), 𝑦^(𝑗))
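A minimal NumPy sketch of this loop; the helper `grad_loss` and the hyperparameter values are illustrative, not from the slides:

```python
import numpy as np

def minibatch_gd(w, X, Y, grad_loss, lr=0.01, batch_size=32, iters=1000):
    """Mini-batch gradient descent: each iteration sums the gradient over a
    random subset J of examples (batch_size <= len(X) is assumed).
    grad_loss(w, x, y) must return the gradient of l(h_w(x), y) w.r.t. w."""
    n = len(X)
    for _ in range(iters):
        J = np.random.choice(n, size=batch_size, replace=False)
        grad = sum(grad_loss(w, X[j], Y[j]) for j in J)
        w = w - lr * grad
    return w
```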
Neural Networks
Artificial Neural Networks
❖ Basic idea:
❖ Let’s connect many artificial neurons and ...
❖ Train them to perform a task!
❖ Deep learning:
❖ Usually, large number of artificial neurons
❖ Importance of architecture
❖ Differentiable learning
Multi-class Logistic Regression
[Diagram: features 𝜑_1(𝑥), …, 𝜑_𝑘(𝑥) feed into a softmax output layer]
(Deep) Neural Network
[Diagram: features 𝜑_1(𝑥), …, 𝜑_𝑘(𝑥) pass through several hidden layers before a softmax output layer]
g = nonlinear activation function
(Deep) Neural Network
❖ Directly learns the features from data
[Diagram: raw inputs pass through several hidden layers before a softmax output layer]
g = nonlinear activation function
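As an illustration (not from the slides), a minimal NumPy sketch of the forward pass pictured above: each hidden layer applies the nonlinear activation 𝑔 and the last layer feeds into softmax.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()

def forward(x, layers, g):
    """Forward pass of a feed-forward network. `layers` is a list of (W, b)
    pairs; g is the nonlinear activation used at every hidden layer."""
    h = x
    for W, b in layers[:-1]:
        h = g(W @ h + b)       # hidden layer: linear map + nonlinearity
    W, b = layers[-1]
    return softmax(W @ h + b)  # output layer: class probabilities

# Example usage: probs = forward(x, [(W1, b1), (W2, b2)], g=np.tanh)
```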
Common Activation Functions
[source: MIT 6.S191 introtodeeplearning.com]
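The referenced figure is not reproduced here; as a sketch, three common choices in NumPy:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes to (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes to (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # max(0, z); cheap and widely used
```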
Training a (Deep) Neural Network
Training a deep neural network is just like logistic regression:
𝑤 just tends to be a much, much larger vector ☺ ⇒ just run a gradient method
min_𝑤 −𝑙𝑙(𝑤) = min_𝑤 −∑_𝑖 log 𝑃(𝑦^(𝑖) | 𝑥^(𝑖); 𝑤)
+ stop when the log likelihood of the hold-out data starts to decrease (early stopping)
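A minimal sketch of early stopping; the helpers `step` and `holdout_ll` and the `patience` parameter are illustrative, not from the slides:

```python
def train_with_early_stopping(w, step, holdout_ll, patience=5):
    """Run gradient steps until the hold-out log likelihood stops improving.
    step(w) returns updated weights; holdout_ll(w) scores hold-out data."""
    best_w, best_ll, bad = w, holdout_ll(w), 0
    while bad < patience:
        w = step(w)
        ll = holdout_ll(w)
        if ll > best_ll:
            best_w, best_ll, bad = w, ll, 0  # improvement: keep these weights
        else:
            bad += 1                         # no improvement on hold-out data
    return best_w
```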
Quiz: Hyperparameter Tuning
❖ Can we tune the hyperparameters on the hold-out data used for early stopping?
Neural Networks Properties
❖ Theorem (Universal Function Approximators). A two-layer neural network with a sufficient number of neurons can approximate any continuous function to any desired accuracy.
❖ Practical considerations
❖ Can be seen as learning the features
❖ Large number of neurons
❖ Danger of overfitting
❖ (hence early stopping!)
Neural Net Demo! https://playground.tensorflow.org/
How about computing all the derivatives?
❖ A neural network is just the composition of many functions: h_𝑤(𝑥) = 𝑓_𝐿(𝑓_{𝐿−1}(… 𝑓_1(𝑥) … ))
❖ Apply chain rule:
❖ 𝑓(𝑥) = 𝑔(h(𝑥))
❖ 𝑓′(𝑥) = h′(𝑥) 𝑔′(h(𝑥))
❖ ⇒ Derivatives can be computed by following well- defined procedures
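A small worked instance of the chain rule (the functions here are ours): with 𝑔(𝑢) = 𝑢² and h(𝑥) = sin 𝑥, 𝑓(𝑥) = 𝑔(h(𝑥)) gives 𝑓′(𝑥) = h′(𝑥)𝑔′(h(𝑥)) = cos 𝑥 · 2 sin 𝑥, which a central-difference check confirms:

```python
import math

def f(x):
    return math.sin(x) ** 2               # f = g(h(x)), g(u) = u^2, h(x) = sin x

def f_prime(x):
    return math.cos(x) * 2 * math.sin(x)  # chain rule: h'(x) * g'(h(x))

x, eps = 0.7, 1e-6
numeric = (f(x + eps) - f(x - eps)) / (2 * eps)  # central-difference estimate
print(f_prime(x), numeric)                        # the two values agree closely
```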
Automatic Differentiation
❖ Automatic differentiation software
❖ e.g., Theano, TensorFlow, PyTorch
❖ Only need to program the function 𝑔(𝑥, 𝑤)
❖ Can automatically compute all derivatives w.r.t. all entries in 𝑤
❖ This is typically done by caching info during forward computation pass of g, and then doing a backward pass = “backpropagation”
❖ Autodiff / Backpropagation can often be done at computational cost comparable to the forward pass
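A minimal PyTorch sketch (assuming PyTorch is installed; the particular 𝑔 is ours): the forward pass caches intermediate values, and `backward()` runs backpropagation to populate `w.grad`.

```python
import torch

x = torch.tensor([1.0, 2.0])
w = torch.tensor([0.5, -0.3], requires_grad=True)

g = (w @ x) ** 2   # forward pass: intermediate results are cached
g.backward()       # backward pass = backpropagation

print(w.grad)      # dg/dw = 2 (w @ x) x  ->  tensor([-0.2000, -0.4000])
```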
Deep Neural Networks
Applications
Computer Vision Speech Recognition Machine Translation
Features and Generalization
[HoG: Dalal and Triggs, 2005]
Features and Generalization
Performance
graph credit: Clarifai
MS COCO Image Captioning Challenge
Karpathy & Fei-Fei, 2015; Donahue et al., 2015; Xu et al, 2015; many more
Visual QA Challenge
Jiasen Lu et al.
Speech Recognition
graph credit: Clarifai
Machine Translation
Google Neural Machine Translation (in production)
Concluding Remarks
❖ Bayes’ nets
❖ Representation
❖ (Conditional) independence
❖ Local Search
❖ Hill-climbing, genetic algorithm
❖ For more information:
❖ AIMA, Chapter 18 for Learning from Examples
Concluding Remarks
❖ Statistical machine learning
❖ Minimize empirical risk: min_𝑤 ∑_𝑖 l(h_𝑤(𝑥^(𝑖)), 𝑦^(𝑖))
❖ Gradient descent
❖ Deep neural nets
❖ Universal function approximation theorem
❖ Automatic feature learning
❖ Differentiable learning
❖ Automatic differentiation
❖ Applications
❖ CV, ASR, NLP
❖ For more information:
❖ AIMA, Chapter 23 for Natural Language for Communication