
Ve492: Introduction to Artificial Intelligence
Neural Nets

UM-SJTU Joint Institute


Some slides adapted from http://ai.berkeley.edu, AIMA, UM

Learning Objectives
❖ What is statistical machine learning?
❖ What is an artificial neural network?
❖ What makes artificial neural network powerful?
❖ How to train an artificial neural network?

❖ Overview of statistical machine learning
❖ (Deep) neural network
❖ Applications

Artificial Neuron
❖ Perceptron: 𝑔(𝑧) = 𝑠𝑖𝑔𝑛(𝑧)
❖ Logistic regression: 𝑔(𝑧) = 1 / (1 + 𝑒^(−𝑧))
❖ Linear regression: 𝑔(𝑧) = 𝑧
❖ 𝑔 is called the activation function; it is usually non-linear and (sub)differentiable (a small sketch follows below)
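
To make this concrete, here is a minimal NumPy sketch (not from the slides) of a single artificial neuron evaluated with the three activation functions above; the feature map 𝜑 is taken to be the identity for simplicity.

```python
import numpy as np

def neuron(x, w, b, g):
    """Single artificial neuron: z = w . x + b, output g(z)."""
    z = np.dot(w, x) + b
    return g(z)

# Activation functions from the slide
sign     = lambda z: np.sign(z)                # perceptron
sigmoid  = lambda z: 1.0 / (1.0 + np.exp(-z))  # logistic regression
identity = lambda z: z                         # linear regression

x = np.array([1.0, 2.0])     # input features (phi(x) = x here)
w = np.array([0.5, -0.3])    # weights
b = 0.1                      # bias

for name, g in [("sign", sign), ("sigmoid", sigmoid), ("identity", identity)]:
    print(name, neuron(x, w, b, g))
```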

How to Learn the Parameters of the Model?
❖ Iterative method that updates unknown weights
❖ For perceptron:
𝑤 ← 𝑤 + 𝑦* 𝜑(𝑥) 𝟏(𝑤 ⋅ 𝜑(𝑥) ⋅ 𝑦* < 0)
❖ For logistic regression:
❖ Write the likelihood function: 𝑃(𝑋, 𝑌 | 𝑤)
❖ Maximize log 𝑃(𝑋, 𝑌 | 𝑤) with gradient ascent
𝑤 ← 𝑤 + 𝛼 ∇_𝑤 log 𝑃(𝑋, 𝑌 | 𝑤)   (batch)
𝑤 ← 𝑤 + 𝛼 ∇_𝑤 log 𝑃(𝑥, 𝑦 | 𝑤)   (stochastic)

General Framework: Statistical ML
❖ Minimize empirical risk: min_𝑤 ∑_𝑖 l(h_𝑤(𝑥^(𝑖)), 𝑦^(𝑖))
❖ For perceptron: l(ŷ, 𝑦) = max(0, −𝑦ŷ)
❖ For logistic regression: l(ŷ, 𝑦) = −𝑦 log(ŷ) − (1 − 𝑦) log(1 − ŷ)
❖ For linear regression: l(ŷ, 𝑦) = (ŷ − 𝑦)²

Mini-Batch Gradient Descent
min_𝑤 ∑_𝑖 l(h_𝑤(𝑥^(𝑖)), 𝑦^(𝑖))
Observation: the gradient over a small set of training examples (= mini-batch) can be computed in parallel, so we might as well use a mini-batch instead of a single example
❖ for iter = 1, 2, ...
❖ pick a random subset J of training examples
❖ 𝑤 ← 𝑤 − 𝛼 ∑_{𝑖∈J} ∇_𝑤 l(h_𝑤(𝑥^(𝑖)), 𝑦^(𝑖))

Neural Networks

Artificial Neural Networks
❖ Basic idea:
❖ Let’s connect many artificial neurons and ...
❖ Train them to perform a task!
❖ Deep learning:
❖ Usually, a large number of artificial neurons
❖ Importance of architecture
❖ Differentiable learning

Multi-class Logistic Regression
[diagram: features 𝜑₁(𝑥), 𝜑₂(𝑥), … feeding directly into a softmax output layer]

(Deep) Neural Network
[diagram: features 𝜑₁(𝑥), 𝜑₂(𝑥), … passed through several layers of units with nonlinear activation function 𝑔, then a softmax output layer]

(Deep) Neural Network
❖ Directly learns the features from data
[diagram: raw inputs passed through several layers of units with nonlinear activation function 𝑔, then a softmax output layer]

Common Activation Functions
[source: MIT 6.S191 introtodeeplearning.com]

Training a (Deep) Neural Network
Training a deep neural network is just like logistic regression: 𝑤 just tends to be a much, much larger vector ☺ ⇒ just run a gradient method
min_𝑤 −𝑙𝑙(𝑤) = min_𝑤 −∑_𝑖 log 𝑃(𝑦^(𝑖) | 𝑥^(𝑖); 𝑤)
+ stop when the log likelihood of the hold-out data starts to decrease (early stopping)
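
The training loop sketched above can be written out as follows. This is an illustrative NumPy implementation of mini-batch gradient descent on the logistic-regression loss with early stopping on a hold-out set; the synthetic data, learning rate, batch size, and patience are arbitrary choices, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(w, X, y):
    """Averaged negative log-likelihood (cross-entropy loss) of logistic regression."""
    p = sigmoid(X @ w)
    eps = 1e-12
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))

def grad_nll(w, X, y):
    """Gradient of the averaged negative log-likelihood w.r.t. w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)

# Synthetic data, split into training and hold-out (validation) sets
X = rng.normal(size=(600, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = (sigmoid(X @ true_w) > rng.uniform(size=600)).astype(float)
X_tr, y_tr, X_val, y_val = X[:500], y[:500], X[500:], y[500:]

w = np.zeros(3)
alpha, batch_size, patience = 0.5, 32, 5
best_val, best_w, bad_epochs = np.inf, w.copy(), 0

for epoch in range(200):
    # one pass over shuffled mini-batches
    idx = rng.permutation(len(y_tr))
    for start in range(0, len(idx), batch_size):
        J = idx[start:start + batch_size]
        w -= alpha * grad_nll(w, X_tr[J], y_tr[J])
    # early stopping: watch the hold-out loss
    val = nll(w, X_val, y_val)
    if val < best_val:
        best_val, best_w, bad_epochs = val, w.copy(), 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            break

print("hold-out NLL:", best_val, "weights:", best_w)
```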

Quiz: Hyperparameter Tuning
❖ Can we tune the hyperparameters on the hold-out data used for early stopping?

Neural Networks Properties
❖ Theorem (Universal Function Approximators). A two-layer neural network with a sufficient number of neurons can approximate any continuous function to any desired accuracy.
❖ Practical considerations
❖ Can be seen as learning the features
❖ Large number of neurons
❖ Danger for overfitting
❖ (hence early stopping!)
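
As a small illustration of the approximation claim (an arbitrary toy setup, not from the slides), the following NumPy sketch fits a two-layer tanh network to sin(x) with plain gradient descent; the hidden width, learning rate, and step count are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# 1-D regression target: approximate sin(x) on [-pi, pi]
X = np.linspace(-np.pi, np.pi, 200).reshape(-1, 1)
y = np.sin(X)

H = 20                                   # hidden units ("sufficient number of neurons")
W1 = rng.normal(scale=1.0, size=(1, H))  # input -> hidden weights
b1 = np.zeros(H)
W2 = rng.normal(scale=0.1, size=(H, 1))  # hidden -> output weights
b2 = np.zeros(1)

lr = 0.05
for step in range(5000):
    # forward pass
    hidden = np.tanh(X @ W1 + b1)        # (200, H)
    pred = hidden @ W2 + b2              # (200, 1)
    err = pred - y
    # backward pass (chain rule) for the half mean-squared-error loss
    dW2 = hidden.T @ err / len(X)
    db2 = err.mean(axis=0)
    dhidden = (err @ W2.T) * (1 - hidden ** 2)   # tanh'(z) = 1 - tanh(z)^2
    dW1 = X.T @ dhidden / len(X)
    db1 = dhidden.mean(axis=0)
    # gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print("final MSE:", float(np.mean(err ** 2)))
```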

Neural Net Demo! https://playground.tensorflow.org/

How about computing all the derivatives?
❖ A neural network is just the composition of many functions: h_𝑤(𝑥) = 𝑓_n(𝑓_{n−1}(… 𝑓_1(𝑥) …))
❖ Apply chain rule:
❖ 𝑓(𝑥) = 𝑔(h(𝑥))
❖ 𝑓′(𝑥) = h′(𝑥) 𝑔′(h(𝑥))
❖ ⇒ Derivatives can be computed by following well-defined procedures
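
A quick numerical check of the chain-rule formula above, using an arbitrary choice of g and h (hypothetical, not from the slides): the analytic derivative is compared against a central finite-difference estimate.

```python
import numpy as np

# f(x) = g(h(x)) with g(u) = sin(u) and h(x) = x**2 (illustrative choice)
h  = lambda x: x ** 2
dh = lambda x: 2 * x
g  = lambda u: np.sin(u)
dg = lambda u: np.cos(u)

f  = lambda x: g(h(x))
df = lambda x: dh(x) * dg(h(x))     # chain rule: f'(x) = h'(x) g'(h(x))

x = 1.3
numeric = (f(x + 1e-6) - f(x - 1e-6)) / 2e-6   # finite-difference check
print(df(x), numeric)                           # the two values should agree closely
```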

Automatic Differentiation
❖ Automatic differentiation software
❖ e.g., Theano, TensorFlow, PyTorch
❖ Only need to program the function 𝑔(𝑥, 𝑤)
❖ Can automatically compute all derivatives w.r.t. all entries in 𝑤
❖ This is typically done by caching info during forward computation pass of g, and then doing a backward pass = “backpropagation”
❖ Autodiff / Backpropagation can often be done at computational cost comparable to the forward pass
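
A minimal PyTorch sketch of this workflow on a toy function (the function and values are illustrative assumptions): we only program the forward computation of g(x, w) and let autograd produce the derivatives with respect to w.

```python
import torch

# Program only the function g(x, w); PyTorch records the computation graph
# during the forward pass and backpropagates to get all derivatives w.r.t. w.
x = torch.tensor([1.0, 2.0, 3.0])
w = torch.tensor([0.5, -1.0, 2.0], requires_grad=True)

g = torch.sigmoid(torch.dot(w, x))   # forward pass (info cached for backward)
g.backward()                         # backward pass = backpropagation

print(w.grad)                        # dg/dw, computed automatically
```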

Deep Neural Networks
Applications
❖ Computer Vision
❖ Speech Recognition
❖ Machine Translation

Features and Generalization
[HoG: Dalal and Triggs, 2005]

Performance
[graph credit: Clarifai]

MS COCO Image Captioning Challenge
Karpathy & Fei-Fei, 2015; Donahue et al., 2015; Xu et al., 2015; many more

Visual QA Challenge
Jiasen Lu et al.

Speech Recognition
[graph credit: Clarifai]

Machine Translation
Google Neural Machine Translation (in production)

Concluding Remarks
❖ Bayes’ nets
❖ Representation
❖ (Conditional) independence
❖ Local Search
❖ Hill-climbing, genetic algorithm
❖ For more information:
❖ AIMA, Chapter 18 for Learning from Examples

Concluding Remarks
❖ Statistical machine learning
❖ Minimize empirical risk: min_𝑤 ∑_𝑖 l(h_𝑤(𝑥^(𝑖)), 𝑦^(𝑖))
❖ Gradient descent
❖ Deep neural nets
❖ Universal function approximation theorem
❖ Automatic feature learning
❖ Differentiable learning
❖ Automatic differentiation
❖ Applications
❖ CV, ASR, NLP
❖ For more information:
❖ AIMA, Chapter 23 for Natural Language for Communication
