
Lecture 4: Feedforward Neural Network
Instructor:

Outline of this lecture


} From Neuron to Feedforward Neural Network
} Prediction Function: Network Architecture
} Cost Function
} Optimization
} Case Study

Inspired by Human Brain
• Our brain has many neurons connected together, and the strength of the connections between neurons represents long-term knowledge
• Visual cortex in the brain:
• The first hierarchy of neurons that receives information in the visual cortex is sensitive to specific edges
• Brain regions further down the visual pipeline are sensitive to more complex structures such as faces, textures, etc.

What's a Neural Network? (according to a newspaper reporter)
} Part of the machine learning field; exceptionally effective at learning knowledge/patterns from a large amount of annotated data
} Provided a sufficient amount of training samples, the system begins to understand things like human beings
} Works as the computer brain behind many eye-catching systems: self-driving vehicles, intelligent diagnosis, data science, AlphaGo
} GPUs + tons of human annotations
} Money!

What's a Neural Network? (in this course)
} A machine learning model applicable to regression, classification, unsupervised learning, etc.
} Makes predictions with a composition of functions (a function of functions)
} Multiple layers
} Inputs can be high-dimensional features
} Outputs are typically low-dimensional labels (e.g., binary labels)
} The name reflects the design principles of the prediction function

Nonlinear Transformation
} Linear Regression
$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots$
} Logistic Regression
$h_\theta(x) = g(\theta^T x)$, where $g(z) = \dfrac{1}{1 + e^{-z}}$

Neuron in Brain

Neuron Unit
$g(\cdot)$: activation function
(diagram: a single neuron with inputs $x_0, x_1, x_2, x_3$ weighted by $\theta_0, \theta_1, \theta_2, \theta_3$)
$\theta^T x = \theta_0 x_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$
$h_\theta(x) = g(\theta^T x) = \dfrac{1}{1 + e^{-\theta^T x}}$

Activation functions
} ReLU is popular
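As a quick reference, a minimal NumPy sketch of common activation functions (sigmoid, tanh, ReLU); the sample inputs are illustrative:

```python
import numpy as np

def sigmoid(z):
    # squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # squashes any real input into (-1, 1)
    return np.tanh(z)

def relu(z):
    # rectified linear unit: max(0, z), the popular default
    return np.maximum(0.0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(z))  # approx. [0.12 0.38 0.5 0.62 0.88]
print(relu(z))     # [0. 0. 0. 0.5 2.]
```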

Neural Network: Deeper
(diagram: inputs $x_0, x_1, x_2, x_3$, hidden units $a_0, a_1, a_2, a_3$, and one output unit)
Layer 1: Input
Layer 2: Hidden Layer
Layer 3: Output

Neural Network: Architecture
} Multiple layers

How do neural networks work?

Example 1: neural network for XOR
} XOR: exclusive OR gate
XOR = [x1 AND (NOT x2)] OR [(NOT x1) AND x2]
Nonlinear Boundary

} XOR = [x1 AND (NOT x2)] OR [(NOT x1) AND x2]
} Three basic operators:
} OR
} x1 AND (NOT x2)
} (NOT x1) AND x2
} Idea: construct the basic operators first and then compose XOR

Operator: OR? AND? NOT? Something else?
$h_\theta(x) = g(-30 \times 1 + 20 x_1 + 20 x_2)$
Prediction
Answer: AND gate

Operator: OR? AND? NOT? Something else?
$h_\theta(x) = g(30 \times 1 - 40 x_1)$
Prediction
Answer: NOT gate

Quiz: design a neuron unit for OR
$h_\theta(x) = g(?)$
Prediction

Operator: OR
$h_\theta(x) = g(-30 \times 1 + 40 x_1 + 40 x_2)$
Prediction
Assume the parameters have been learnt (introduced later)

Operator: x1 AND (NOT x2)
$h_\theta(x) = g(-30 \times 1 + 50 x_1 - 30 x_2)$
Prediction

Operator: (NOT x1) AND x2
$h_\theta(x) = g(-30 \times 1 - 30 x_1 + 50 x_2)$
Prediction
Assume the parameters have been learnt (introduced later)

Composite operator: XOR
(diagram: the three units above, each taking inputs 1, x1, x2)
} Hidden unit 1: x1 AND (NOT x2)
} Hidden unit 2: (NOT x1) AND x2
} Output unit: OR of the two hidden units
XOR = [x1 AND (NOT x2)] OR [(NOT x1) AND x2]
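The composition can be checked numerically. Below is a minimal NumPy sketch that wires together the hand-picked weights from the slides; with weights this large the sigmoid saturates to approximately 0 or 1:

```python
import numpy as np

def g(z):
    # sigmoid activation
    return 1.0 / (1.0 + np.exp(-z))

def unit(weights, x):
    # a single neuron; weights[0] multiplies the constant input 1
    return g(weights[0] + weights[1] * x[0] + weights[2] * x[1])

def xor(x):
    # hidden layer: the two basic operators from the slides
    h1 = unit([-30, 50, -30], x)   # x1 AND (NOT x2)
    h2 = unit([-30, -30, 50], x)   # (NOT x1) AND x2
    # output layer: OR of the two hidden units
    return unit([-30, 40, 40], [h1, h2])

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, round(xor(x)))        # prints 0, 1, 1, 0
```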

Quiz: how would you design a neural network for XNOR?
XNOR = [x1 AND x2] OR [(NOT x1) AND (NOT x2)]

Multi-class Neural Network
Training samples: $(x^{(i)}, y^{(i)})$, with one-hot labels $y^{(i)} \in \mathbb{R}^3$:
} $y^{(i)} = [1, 0, 0]$ if the $i$-th sample is sea bass
} $y^{(i)} = [0, 1, 0]$ if the $i$-th sample is salmon
} $y^{(i)} = [0, 0, 1]$ if the $i$-th sample is cyprinoid
(diagram: inputs $x_0, x_1, x_2, x_3$, hidden units $a^{(2)}$, and three output units)

Architecture

Outline of this lecture
} From Neuron to Feedforward Neural Network
} Prediction Function: Network Architecture
} Cost Function
} Optimization
} Case Study

Network-1: two-layer network, binary class
Given the network parameters, how do we get predictions?
$z_1^{(2)} = [\theta_{11}^{(1)}, \theta_{12}^{(1)}, \theta_{13}^{(1)}]\,[x_1, x_2, x_3]^T$, $h_\theta(x) = a_1^{(2)} = g(z_1^{(2)})$

Network-2: two-layer network, multi-class
Network parameters are provided; how do we get predictions?
(diagram: three output units $h_\theta(x)_1$, $h_\theta(x)_2$, $h_\theta(x)_3$)
Make predictions:
} $z_1^{(2)} = [\theta_{11}^{(1)}, \theta_{12}^{(1)}, \theta_{13}^{(1)}]\,[x_1, x_2, x_3]^T$
} $h_\theta(x)_1 = a_1^{(2)} = g(z_1^{(2)})$
} $a_2^{(2)} = ?$
} $a_3^{(2)} = ?$

Network-3: more layers
For one training sample $x$
(diagram: a network with an extra hidden layer)
$a^{(1)} = [x_1, x_2, x_3]$
$z_1^{(2)} = \sum_j \theta_{1j}^{(1)} a_j^{(1)}$
$z_2^{(2)} = \sum_j \theta_{2j}^{(1)} a_j^{(1)}$
$z_3^{(2)} = \sum_j \theta_{3j}^{(1)} a_j^{(1)}$
In vector form: $z^{(2)} = (\theta^{(1)})'\, a^{(1)}$

Make Predictions: more layers
$a^{(2)} = g(z^{(2)})$
$z^{(3)} = (\theta^{(2)})'\, a^{(2)}$
$a^{(3)} = g(z^{(3)})$
$h_\theta(x) = a^{(3)}$
(diagram: hidden units $a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$)

Forward propagation:
(diagram: hidden units $a_1^{(2)}, a_2^{(2)}$)
$z^{(2)} = \theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$
$z^{(3)} = \theta^{(2)} a^{(2)}$
$h_\theta(x) = a^{(3)} = g(z^{(3)})$
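A minimal NumPy sketch of this forward pass for one sample; the layer sizes, the random weights, and the omission of a bias/intercept unit are illustrative assumptions:

```python
import numpy as np

def g(z):
    # sigmoid activation applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
theta1 = rng.normal(size=(3, 3))   # layer-1 weights: 3 inputs -> 3 hidden units
theta2 = rng.normal(size=(1, 3))   # layer-2 weights: 3 hidden units -> 1 output

def forward(x):
    a1 = x                         # a^(1) = x
    z2 = theta1 @ a1               # z^(2) = theta^(1) a^(1)
    a2 = g(z2)                     # a^(2) = g(z^(2))
    z3 = theta2 @ a2               # z^(3) = theta^(2) a^(2)
    a3 = g(z3)                     # h_theta(x) = a^(3)
    return a3

x = np.array([0.5, -1.0, 2.0])
print(forward(x))                  # a prediction in (0, 1)
```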

Outline of this lecture
} From Neuron to Feedforward Neural Network
} Prediction Function: Network Architecture
} Cost Function
} Optimization
} Case Study

Cost Function
} How good is the prediction?
} Review:
} Residual Sum of Squares (RSS)
} Cross-entropy Function

Training set: $(x^{(i)}, y^{(i)})$, $i = 1, 2, 3, \ldots, m$
Binary class: $y^{(i)} = 0$ or $1$; one output unit
Multi-class: $y^{(i)} = [1,0,0]$, $[0,1,0]$, or $[0,0,1]$; $K\,(=3)$ output units

Cost Function
} Network-1: two-layer network, binary class, $h_\theta(x)$
$z_1^{(2)} = [\theta_{10}^{(1)}, \theta_{11}^{(1)}, \theta_{12}^{(1)}, \theta_{13}^{(1)}]\,[x_0, x_1, x_2, x_3]^T$, $h_\theta(x) = a_1^{(2)} = g(z_1^{(2)})$
(diagram: inputs $x_0, x_1, x_2, x_3$)
$F(\theta) = \dfrac{1}{2m}\sum_{i=1}^{m}\big(y^{(i)} - h_\theta(x^{(i)})\big)^2$

Cost Function
} Network 2: two-layer network, multi-class
(diagram: three output units $h_\theta(x)_1$, $h_\theta(x)_2$, $h_\theta(x)_3$; inputs $x_0, x_1, x_2, x_3$)
$h_\theta(x^{(i)}),\, y^{(i)} \in \mathbb{R}^{K}$ with $K = 3$
$F(\theta) = \dfrac{1}{2m}\sum_{i=1}^{m}\sum_{k=1}^{K}\big(y^{(i)}[k] - h_\theta(x^{(i)})[k]\big)^2$
} Network 3: multiple-layer network, multi-class
(diagram: inputs $x_0, x_1, x_2, x_3$ and hidden units $a_0^{(2)}, a_1^{(2)}, a_2^{(2)}, a_3^{(2)}$)
Same cost function as two-layer network

Cost Function
} Example 4: Multiple-layer network, binary class
(diagram: inputs $x_0, x_1, x_2, x_3$)
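A minimal NumPy sketch of the squared-error cost used above, for both the single-output and the K-output case; the labels and predicted values below are illustrative placeholders:

```python
import numpy as np

def cost_binary(y, y_hat):
    # F(theta) = 1/(2m) * sum_i (y_i - h_theta(x_i))^2
    m = len(y)
    return np.sum((y - y_hat) ** 2) / (2 * m)

def cost_multiclass(Y, Y_hat):
    # F(theta) = 1/(2m) * sum_i sum_k (Y[i, k] - Y_hat[i, k])^2
    m = Y.shape[0]
    return np.sum((Y - Y_hat) ** 2) / (2 * m)

y = np.array([1, 0, 1])               # binary labels
y_hat = np.array([0.9, 0.2, 0.7])     # network outputs
print(cost_binary(y, y_hat))

Y = np.eye(3)                         # one-hot labels for 3 samples
Y_hat = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.6, 0.2],
                  [0.1, 0.2, 0.7]])
print(cost_multiclass(Y, Y_hat))
```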

Outline of this lecture
} From Neuron to Feedforward Neural Network
} Prediction Function: Network Architecture
} Cost Function
} Optimization
} Case Study

Optimization: gradient descent
For each parameter,
$\theta_j = \theta_j - \alpha \dfrac{\partial F(\theta)}{\partial \theta_j}$
How do we calculate the gradients?

Network-1: two-layer, binary class
$F(\theta) = \dfrac{1}{2m}\sum_{i=1}^{m}\big(y^{(i)} - h_\theta(x^{(i)})\big)^2$
(diagram: inputs $x_0, x_1, x_2, x_3$)
Derivatives w.r.t. the parameters:
$\dfrac{\partial F(\theta)}{\partial \theta_{10}^{(1)}} = ?\quad \dfrac{\partial F(\theta)}{\partial \theta_{11}^{(1)}} = ?\quad \dfrac{\partial F(\theta)}{\partial \theta_{12}^{(1)}} = ?\quad \dfrac{\partial F(\theta)}{\partial \theta_{13}^{(1)}} = ?$

(diagram: inputs $x_0, x_1, x_2, x_3$ feeding the output unit)
Derivative w.r.t. $\theta_{1j}^{(1)}$:
$\Rightarrow \big(h_\theta(x^{(i)}) - y^{(i)}\big)\; g'\!\big(z_1^{(2)}\big)\; \dfrac{\partial z_1^{(2)}}{\partial \theta_{1j}^{(1)}}$
(the error received from the upper layer) $\times$ (the gradient of the current neuron unit)

} Gradient Descent Method
Repeat:
$\theta_{10}^{(1)} = \theta_{10}^{(1)} - \alpha \dfrac{\partial F(\theta)}{\partial \theta_{10}^{(1)}}$
$\theta_{11}^{(1)} = \theta_{11}^{(1)} - \alpha \dfrac{\partial F(\theta)}{\partial \theta_{11}^{(1)}}$
$\theta_{12}^{(1)} = \theta_{12}^{(1)} - \alpha \dfrac{\partial F(\theta)}{\partial \theta_{12}^{(1)}}$
$\theta_{13}^{(1)} = \theta_{13}^{(1)} - \alpha \dfrac{\partial F(\theta)}{\partial \theta_{13}^{(1)}}$
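A minimal sketch of the corresponding update loop; the objective below is a toy quadratic used only to check the update rule, and for the networks above `grad_fn` would instead return $\partial F/\partial \theta$ obtained by backpropagation:

```python
import numpy as np

def gradient_descent(theta, grad_fn, alpha=0.1, n_iters=200):
    """Generic gradient descent: theta_j <- theta_j - alpha * dF/dtheta_j."""
    for _ in range(n_iters):
        theta = theta - alpha * grad_fn(theta)
    return theta

# Toy check on F(theta) = ||theta - 3||^2, whose gradient is 2 * (theta - 3).
grad_fn = lambda theta: 2.0 * (theta - 3.0)
print(gradient_descent(np.zeros(4), grad_fn))   # approaches [3, 3, 3, 3]
```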

How About Multi-layer Networks?
} How do we calculate gradients for every parameter?

$g\!\left(\sum_p w_p x_p + b\right)$
} g(): activation function

Network Architectures

Network: Prediction
} Forward pass, for one training sample $x$

Layer 1:
$a^{(1)} = [x_0, x_1, x_2, x_3]$

Layer 2:
$z_1^{(2)} = \sum_j \theta_{1j}^{(1)} a_j^{(1)}$, $z_2^{(2)} = \sum_j \theta_{2j}^{(1)} a_j^{(1)}$, ...; in vector form $z^{(2)} = \theta^{(1)} a^{(1)}$
$a^{(2)} = g(z^{(2)})$

Layer 3:
$z^{(3)} = \theta^{(2)} a^{(2)}$, $a^{(3)} = g(z^{(3)})$, $h_\theta(x) = a^{(3)}$

Network: Cost Function
$F(\theta) = \dfrac{1}{2m}\sum_{i=1}^{m}\big(y^{(i)} - f_\theta(x^{(i)})\big)^2$

Gradients: last layer
$F(W) = \dfrac{1}{2m}\sum_{i}\big(h_W(x^{(i)}) - y^{(i)}\big)^2$
$h_W(x^{(i)}) = g(W x^{(i)}) = g(w_1 x^{(i)}_1 + w_2 x^{(i)}_2 + \cdots)$

Gradients: other layers
$a = g\!\left(\sum_p w_p x_p + b\right)$
Local gradients: $\dfrac{\partial F}{\partial a}\dfrac{\partial a}{\partial w_1}$, $\dfrac{\partial F}{\partial a}\dfrac{\partial a}{\partial w_2}$

Example: Gradients for a simple network
Gradient descent on $F(x, y, z)$:
$x = x - \alpha \dfrac{\partial F}{\partial x}$, $y = y - \alpha \dfrac{\partial F}{\partial y}$, $z = z - \alpha \dfrac{\partial F}{\partial z}$
Chain rule (with an intermediate variable $q$): $\dfrac{\partial F}{\partial x} = \dfrac{\partial F}{\partial q}\dfrac{\partial q}{\partial x}$

Example: A Neuron
$f(w, x) = \dfrac{1}{1 + e^{-(w_0 x_0 + w_1 x_1 + w_2)}}$
Set $x = (-1.0, -2.0)$, $y = 1$, $w = (2.0, -3.0, -3.0)$. How do we calculate the gradients for $w$?
Forward pass, gate by gate: $w_0 x_0 + w_1 x_1 + w_2 = 1.0$; negate: $-1.0$; $\exp$: $0.37$; $+1$: $1.37$; $1/x$: $0.73$.
Backward pass (local gradient $\times$ upstream gradient), starting from $\partial f/\partial f = 1.0$:
} $1/x$ gate: $\frac{da}{dx} = -\frac{1}{x^2}$, so $1.0 \times \left(-\frac{1}{1.37^2}\right) = -0.53$
} $+1$ gate: $\frac{da}{dx} = 1$, so $1 \times (-0.53) = -0.53$
} $\exp$ gate: $\frac{da}{dx} = \exp(x)$, so $\exp(-1.0) \times (-0.53) \approx -0.2$
} $\times(-1)$ gate: $\frac{da}{dx} = -1$, so $(-1) \times (-0.2) = 0.2$
} $+$ gate: distributes the gradient to each branch, $1 \times [0.2] = 0.2$
} $\times$ gates: [local gradient] $\times$ [upstream gradient], e.g. $w_0$: $x_0 \times [0.2] = -0.2$, and $x_0$: $w_0 \times [0.2] = 0.4$

Sigmoid gate
$\sigma(z) = \dfrac{1}{1 + e^{-z}} = g(w_0 x_0 + w_1 x_1 + w_2)$, with $\dfrac{d\sigma(z)}{dz} = (1 - \sigma(z))\,\sigma(z)$
Here $0.73 \times (1 - 0.73) = 0.2$, so the whole sigmoid can be treated as a single gate.

Backward Propagation
} Add gate: gradient distributor
} Mul gate: gradient switcher
} Quiz: how is the gradient calculated for a Max gate, e.g. $\max(w_0, x_0)$?
} Max gate: gradient router

Learning: forward & backward
} Forward Propagation
} Start from the first layer
} Propagate predictions through the hidden layers
} Backward Propagation
} Start from the last layer
} Propagate the errors made for a training sample
} Complete training procedure
} Forward + backward

Backpropagation algorithm
} Input: training set
} Output: $\Delta_{ij}^{(l)}$, gradients for $\theta_{ij}^{(l)}$, $l = 1, 2, \ldots, L-1$
} Set $\Delta_{ij}^{(l)} = 0$
} For each sample $x$ with label $y$:
  } Set $a^{(1)} = [x]$
  } Forward propagation to get $a^{(l)} = [a_j^{(l)}]$ for all layers $l$
  } For the last layer, compute $\delta^{(L)} = a^{(L)} - y$
  } Back propagation to get $\delta^{(L-1)}, \delta^{(L-2)}, \ldots$
  } $\Delta_{ij}^{(l)} = \Delta_{ij}^{(l)} + a_j^{(l)}\, g'(z^{(l)})\, \delta_i^{(l+1)}$

} Forward propagation
} Backward propagation
} Graphical representation for gradient calculation
} Computational Graph! An important concept used in most deep learning toolboxes (e.g., TensorFlow)

Outline of this lecture
} From Neuron to Feedforward Neural Network
} Prediction Function: Network Architecture
} Cost Function
} Optimization
} Case Study

Environment
} Jupyter Notebook
} Google Colab
} Free to use (12-hour sessions)
} Access to GPUs
} Loading files from Google Drive
} Full support of Python 3, TensorFlow, etc.
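Returning to the single-neuron worked example above ($x = (-1.0, -2.0)$, $w = (2.0, -3.0, -3.0)$), here is a minimal NumPy sketch of the forward pass and the gate-by-gate backward pass; the variable names are illustrative:

```python
import numpy as np

# inputs and weights from the slide
x0, x1 = -1.0, -2.0
w0, w1, w2 = 2.0, -3.0, -3.0

# forward pass, one gate at a time
s   = w0 * x0 + w1 * x1 + w2      # 1.0
neg = -s                          # -1.0
e   = np.exp(neg)                 # 0.37
den = 1.0 + e                     # 1.37
f   = 1.0 / den                   # 0.73  (= sigmoid(s))

# backward pass: local gradient * upstream gradient, starting from df/df = 1.0
d_den = -1.0 / den**2             # 1/x gate    -> -0.53
d_e   = 1.0 * d_den               # +1 gate     -> -0.53
d_neg = np.exp(neg) * d_e         # exp gate    -> -0.20
d_s   = -1.0 * d_neg              # *(-1) gate  ->  0.20
# the add gate distributes d_s to every branch; the mul gates switch in the other input
d_w0, d_x0 = x0 * d_s, w0 * d_s   # -0.20, 0.39
d_w1, d_x1 = x1 * d_s, w1 * d_s   # -0.39, -0.59
d_w2 = 1.0 * d_s                  #  0.20

# shortcut: treat the whole sigmoid as one gate, d_sigma/ds = f * (1 - f)
print(round(f * (1 - f), 2))                           # 0.2, same as d_s above
print(round(d_w0, 2), round(d_w1, 2), round(d_w2, 2))  # -0.2 -0.39 0.2
```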
} Welcome To Colaboratory - Colaboratory (google.com)
} Machine Learning Examples

Example: Text Classification
} IMDB Database: 50,000 movie reviews
} 50% training samples
} 50% testing samples
} Labels: Positive/Negative
Review: "This was an absolutely terrible movie. Don't be lured in by or . Both are great actors, but this must simply be their worst role in history. Even their great acting could not redeem this movie's ridiculous storyline. This movie is an early nineties US propaganda piece. The most pathetic scenes were those when the Columbian rebels were making their cases for revolutions. Alonso appeared phony, and her pseudo-love affair with Walken was nothing but a pathetic emotional plug in a movie that was devoid of any real meaning. I am disappointed that there are movies like this, ruining actor's like 's good name. I could barely sit through it."
Label: Negative (0)

Example: Text Classification
} IMDB Database: 50,000 movie reviews
} 50% training samples
} 50% testing samples
} Labels: Positive/Negative
Review: "This is the kind of film for a snowy Sunday afternoon when the rest of the world can go ahead with its own business as you descend into a big arm-chair and mellow for a couple of hours. Wonderful performances from Cher and (as always) gently row the plot along. There are no rapids to cross, no dangerous waters, just a warm and witty paddle through life at its best. A family film in every sense and one that deserves the praise it received."
Label: Positive (1)

Google Colab
} https://colab.research.google.com/drive/1VvpND8QXdrLLsAzGP02IKqU1JPm187fH?usp=sharing

TensorFlow
• TensorFlow is a deep learning framework that supports a wide variety of neural networks
• Computational graph for gradient calculation
• Supports GPUs

What is TensorFlow
} Open APIs: graph-based computation
} Consider a = (b + c) * (c + 2); we might break it into the following components:
} d = b + c
} e = c + 2
} a = d * e

Why graph computation
} Parallel computing

Loading the Dataset
} Download the database from the internet
} Dump it into the Google Colab space (limited per account)
} One might upload the data file to Google Drive and locate the file in Colab
} Explore the dataset by printing out samples

Model Summary
• Feature extraction: to represent a chunk of sentences, one needs to use a text embedding
• Transform a sentence into a feature vector
• E.g., bag of words
• Pre-trained embedding models

Model Summary
• One should customize the number of layers and the number of hidden units for each layer
• Specify the following settings
• Loss function
• Optimizer
• Execute the training
• Training data
• Training labels
• Batch size
• Validation data

Training History: Convergence
Training History: Accuracy

Outline of this lecture
} From Neuron to Feedforward Neural Network
} Prediction Function: Network Architecture
} Cost Function
} Optimization
} Case Study

Others Not Covered
} Loss Functions
} Variants of Gradient Descent Methods
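To make the case-study workflow concrete, here is a minimal sketch in the spirit of the standard TensorFlow/Keras IMDB tutorial; the embedding model URL, layer sizes, and training settings are illustrative assumptions and may differ from the Colab notebook linked above:

```python
import tensorflow as tf
import tensorflow_hub as hub          # pre-installed in Colab; else: pip install tensorflow_hub
import tensorflow_datasets as tfds    # pre-installed in Colab; else: pip install tensorflow_datasets

# IMDB: 25,000 training + 25,000 test reviews, labels 0 (negative) / 1 (positive)
train_data, test_data = tfds.load("imdb_reviews",
                                  split=["train", "test"],
                                  as_supervised=True)

# Feature extraction: a pre-trained text-embedding layer maps each review
# (a raw string) to a fixed-length feature vector.
hub_layer = hub.KerasLayer("https://tfhub.dev/google/nnlm-en-dim50/2",
                           input_shape=[], dtype=tf.string, trainable=True)

# A small feedforward network on top of the embedding.
model = tf.keras.Sequential([
    hub_layer,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),                       # one logit for the binary label
])

# Loss function and optimizer.
model.compile(optimizer="adam",
              loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=["accuracy"])

# Execute the training: data, labels, batch size, validation data.
history = model.fit(train_data.shuffle(10000).batch(512),
                    epochs=10,
                    validation_data=test_data.batch(512))
```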