Introduction to Neural Networks
CS918 Natural Language Processing
Elena Kochkina
Table of contents
Basics
  Neuron
  Activation functions
  Feedforward NN
  Backpropagation
Types of Neural Networks
  Recursive NNs
  Convolutional NNs
  Recurrent NNs
Network regularisation
  Regularisation methods
Resources
Neuron
[Figure: a single neuron receiving inputs x1, x2, ..., xn]
Inputs:
X = [x1, x2, ..., xn], X ∈ R^n
Neuron
[Figure: a single neuron with inputs x1, ..., xn and weights ω1, ..., ωn]
Inputs:
X = [x1, x2, ..., xn], X ∈ R^n
Weights:
W = [ω1, ω2, ..., ωn], W ∈ R^{m×n}, where m is the number of neurons
Bias:
B = [b1, ..., bm], B ∈ R^m
The neuron performs the following computation: ∑_{i=1}^{n} ωi xi + b
In matrix form: W · X + B
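A sketch of this computation in NumPy (the sizes and random values are illustrative assumptions, not part of the slides):

import numpy as np

n, m = 4, 3                    # n inputs, m neurons (illustrative sizes)
X = np.random.randn(n)         # input vector, X ∈ R^n
W = np.random.randn(m, n)      # weight matrix, one row of n weights per neuron
B = np.random.randn(m)         # bias vector, one bias per neuron

pre_activation = W @ X + B     # W · X + B, shape (m,)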
Neuron
[Figure: a single neuron with inputs x1, ..., xn, a constant input +1, and weights ω1, ..., ωn, ωn+1]
Inputs:
X = [x1, x2, ..., xn, 1], X ∈ R^{n+1}
Weights:
W = [ω1, ω2, ..., ωn, ωn+1], W ∈ R^{m×(n+1)}, where m is the number of neurons
(the bias is absorbed into the weights as ωn+1, multiplying the constant input 1)
The neuron performs the following computation: ∑_{i=1}^{n+1} ωi xi
In matrix form: W · X
Activation function
[Figure: a single neuron with inputs x1, ..., xn, +1, weights ω1, ..., ωn+1, an activation function f(x) and output y]
Inputs:
X = [x1, x2, ..., xn, 1], X ∈ R^{n+1}
Weights:
W = [ω1, ω2, ..., ωn, ωn+1], W ∈ R^{m×(n+1)}, where m is the number of neurons
Activation function: f(x)
The output is y = f(∑_{i=1}^{n+1} ωi xi)
In matrix form: Y = f(W · X)
Activation functions

Function                  Formula
Linear                    f(X) = X
Sigmoid/Logistic          f(X) = 1 / (1 + e^{−X})
Rectified Linear (ReLU)   f(X) = max(0, X)
Tanh                      f(X) = tanh(X)
Softmax                   f: Yi = e^{Xi} / ∑_j e^{Xj}

Most activations are element-wise and non-linear.
Their derivatives are easy to compute.
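A sketch of these activations in NumPy; all are element-wise except softmax, which normalises over the whole vector:

import numpy as np

def linear(x):   return x
def sigmoid(x):  return 1.0 / (1.0 + np.exp(-x))
def relu(x):     return np.maximum(0.0, x)
def tanh(x):     return np.tanh(x)

def softmax(x):
    # subtract the max for numerical stability; does not change the result
    e = np.exp(x - np.max(x))
    return e / e.sum()

# derivatives are simple, e.g. sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x))
# and tanh'(x) = 1 - tanh(x)**2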
Feedforward Neural Network with One Hidden Layer
[Figure: input layer X, hidden layer H and output layer Y, connected by weights Wx and Wy with activations f and g]
Input: X ∈ R^n
Target output: Ŷ ∈ R^k
Predicted output: Y ∈ R^k
Hidden: H ∈ R^p
Weights: Wx ∈ R^{p×n}, Wy ∈ R^{k×p}
Activations: f, g
Loss function: L
Loss: E ∈ R
Forward Propagation:
H = f(Wx · X)
Y = g(Wy · H)
E = L(Y, Ŷ)
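A sketch of this forward pass in NumPy, assuming f = tanh, g = identity and a squared-error loss (these concrete choices match the assumptions on the next slide; the sizes and data are illustrative):

import numpy as np

n, p, k = 4, 5, 2                        # input, hidden and output sizes (illustrative)
Wx = np.random.randn(p, n) * 0.1         # Wx ∈ R^{p×n}
Wy = np.random.randn(k, p) * 0.1         # Wy ∈ R^{k×p}

def forward(X, Y_hat):
    H = np.tanh(Wx @ X)                  # H = f(Wx · X)
    Y = Wy @ H                           # Y = g(Wy · H), with g = identity
    E = 0.5 * np.sum((Y - Y_hat)**2)     # E = L(Y, Ŷ)
    return H, Y, E

X, Y_hat = np.random.randn(n), np.random.randn(k)
H, Y, E = forward(X, Y_hat)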
Backpropagation
Backpropagation is based on the chain rule.
Let's assume:
g(X) = X
f(X) = tanh(X)
L(Y, Ŷ) = 1/2 ∑_{i=1}^{k} (Yi − Ŷi)²

∂E/∂Wy = ∂E/∂Y · ∂Y/∂Wy = (Y − Ŷ) · H^T
∂E/∂Wx = ∂E/∂Y · ∂Y/∂H · ∂H/∂Wx = (Wy^T · (Y − Ŷ)) ⊙ (1 − tanh²(Wx · X)) · X^T
[Figure: the network X → H → Y with weights Wx, Wy and activations f, g]
Stochastic Gradient Descent
ϵ > 0 – learning rate
W = W − ϵ ∂E/∂W, ∀W ∈ [Wy, Wx]
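A sketch of these gradients and the SGD update in NumPy, under the same assumptions (g identity, f = tanh, squared-error loss); the sizes, data and learning rate are illustrative:

import numpy as np

n, p, k = 4, 5, 2
Wx = np.random.randn(p, n) * 0.1
Wy = np.random.randn(k, p) * 0.1
X, Y_hat = np.random.randn(n), np.random.randn(k)   # one (input, target) pair

eps = 0.01                                   # learning rate ϵ > 0
for step in range(100):                      # plain SGD on a single example
    H = np.tanh(Wx @ X)                      # forward pass
    Y = Wy @ H
    dY = Y - Y_hat                           # ∂E/∂Y for L = 1/2 Σ (Y − Ŷ)²
    dWy = np.outer(dY, H)                    # ∂E/∂Wy = (Y − Ŷ) · Hᵀ
    dZ = (Wy.T @ dY) * (1.0 - H**2)          # chain rule through Wy and f = tanh
    dWx = np.outer(dZ, X)                    # ∂E/∂Wx
    Wy -= eps * dWy                          # W = W − ϵ ∂E/∂W
    Wx -= eps * dWx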
Types of Neural Networks
▶ Feedforward
▶ Recursive
▶ Convolutional
▶ Recurrent
▶ etc
Recursive NNs
Recursive Neural Networks (RNNs) have been successful in learning
sequence and tree structures in natural language processing.
Source: Deep Learning for Natural Language Processing
Recursive NNs
Simplest architecture: nodes are combined into parents using a weight matrix that is shared across the whole network, and a non-linearity such as tanh:
p_{1,2} = tanh(W [c1; c2]),
where W is a learned n × 2n weight matrix.
source: Wikipedia
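A sketch of this composition step in NumPy (the vector dimensionality and the child vectors are illustrative):

import numpy as np

n = 5                                              # dimensionality of node vectors (illustrative)
W = np.random.randn(n, 2 * n) * 0.1                # shared n × 2n weight matrix
c1, c2 = np.random.randn(n), np.random.randn(n)    # child node vectors

p12 = np.tanh(W @ np.concatenate([c1, c2]))        # p_{1,2} = tanh(W [c1; c2])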
Recursive NNs
Typically, Stochastic Gradient Descent (SGD) is used to train the network.
The gradient is computed using Backpropagation Through Structure (BPTS).
In principle this is the same as standard backpropagation.
There are two differences resulting from the tree structure:
▶ Derivatives are split at each node
▶ Derivatives of W are summed over all nodes
Convolution
Continuous convolution:
(f ∗ g)(t) = ∫_{−∞}^{+∞} f(τ) g(t − τ) dτ
Discrete convolution:
(f ∗ g)(c) = ∑_a f(a) · g(c − a)
(f ∗ g)(c) = ∑_{a+b=c} f(a) · g(b)
Properties:
▶ Convolution is commutative: f ∗ g = g ∗ f
▶ Convolution is associative: (f ∗ g) ∗ h = f ∗ (g ∗ h)
Figure: source: Wikipedia
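A sketch of the discrete convolution of two finite sequences, written directly from (f ∗ g)(c) = ∑_{a+b=c} f(a) · g(b); the result matches NumPy's np.convolve (the example sequences are illustrative):

import numpy as np

def discrete_convolve(f, g):
    # (f * g)(c) = sum over a + b = c of f(a) * g(b)
    out = np.zeros(len(f) + len(g) - 1)
    for a, fa in enumerate(f):
        for b, gb in enumerate(g):
            out[a + b] += fa * gb
    return out

f = np.array([1.0, 2.0, 3.0])
g = np.array([0.0, 1.0, 0.5])
print(discrete_convolve(f, g))        # same values as np.convolve(f, g)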
Convolution
source (left): developer.apple.com
source (right): http://docs.gimp.org/en/plug-in-convmatrix.html
Convolutional NN layers. Illustration
A max-pooling layer takes the maximum of features over small
blocks of a previous layer.
source: http://colah.github.io/posts/2014-07-Conv-Nets-Modular/
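A sketch of max-pooling over non-overlapping blocks of a 2D feature map (the block size and input are illustrative, and the input height/width are assumed to be divisible by the block size):

import numpy as np

def max_pool(feature_map, block=2):
    # take the maximum over each non-overlapping block × block region
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // block, block, w // block, block)
    return blocks.max(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
print(max_pool(x))                    # 2×2 output, the maximum of each 2×2 block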
Convolutional NNs
CNNs are trained using backpropagation.
CNNs are widely used in computer vision for image and video recognition: Krizhevsky et al., 2012; Ciresan et al., 2011; etc.
CNNs can also be used in NLP for sentence modelling, classification, semantic parsing, etc.: Collobert et al., 2008; Kalchbrenner et al., 2014; etc.
Recurrent NNs
A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle.
[Figure: a cyclic network with weights WX, WH, WY, unfolded in time into inputs Xt, hidden states Zt, Ht and outputs Yt]
RNNs are often used for processing sequential data / time series.
The standard way of training an RNN is Backpropagation Through Time (BPTT).
Recurrent NNs.
Forward Propagation:
Given H0
Zt = WX · Xt + WH · Ht−1
Ht = f(Zt)
Yt = g(WY · Ht)
Et = L(Yt, Ŷt)
E = ∑_{t=1}^{T} Et

Backpropagation Through Time is analogous to standard backpropagation: we sum up the errors, and we also sum up the gradients of each weight over all time steps for one training example.
[Figure: the unfolded RNN with weights WX, WH, WY]
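A sketch of this forward pass through time in NumPy, assuming f = tanh, g = identity and the squared-error loss used earlier (the sizes, weights and sequence are illustrative):

import numpy as np

n_in, n_h, n_out, T = 3, 5, 2, 4             # sizes and sequence length (illustrative)
WX = np.random.randn(n_h, n_in) * 0.1
WH = np.random.randn(n_h, n_h) * 0.1
WY = np.random.randn(n_out, n_h) * 0.1

X = np.random.randn(T, n_in)                 # input sequence X_1 ... X_T
Y_hat = np.random.randn(T, n_out)            # target sequence Ŷ_1 ... Ŷ_T

H = np.zeros(n_h)                            # given H_0
E = 0.0
for t in range(T):
    Z = WX @ X[t] + WH @ H                   # Z_t = WX · X_t + WH · H_{t−1}
    H = np.tanh(Z)                           # H_t = f(Z_t), here f = tanh
    Y = WY @ H                               # Y_t = g(WY · H_t), here g = identity
    E += 0.5 * np.sum((Y - Y_hat[t])**2)     # E = Σ_t E_t with a squared-error L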
Recurrent NNs. Illustration
source: http://iamtrask.github.io
Long Short-Term Memory (LSTM)
Forward Propagation:
Given H0, C0
Ct = ft ⊙ Ct−1 + it ⊙ f1(WX · Xt + WH · Ht−1)
Ht = ot ⊙ f2(Ct)
it = σ(Wi,X · Xt + Wi,C · Ct−1 + Wi,H · Ht−1)
ot = σ(Wo,X · Xt + Wo,C · Ct + Wo,H · Ht−1)
ft = σ(Wf,X · Xt + Wf,C · Ct−1 + Wf,H · Ht−1)
(Graves, 2006)
See also:
▶ Gated Recurrent Units (GRU)
▶ Bi-directional RNNs
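A sketch of a single LSTM step following these equations, including the peephole-style W·,C terms; all sizes, initial states and the weight layout are illustrative, and the code follows the slide's notation rather than any particular library:

import numpy as np

n_in, n_h = 3, 4                                      # illustrative sizes
sigma = lambda x: 1.0 / (1.0 + np.exp(-x))            # logistic gate activation

# weights on X_t for the cell input and the input/output/forget gates,
# recurrent weights on H_{t−1}, and peephole weights on the cell state C
W = {name: np.random.randn(n_h, n_in) * 0.1 for name in ["c", "i", "o", "f"]}
U = {name: np.random.randn(n_h, n_h) * 0.1 for name in ["c", "i", "o", "f"]}
P = {name: np.random.randn(n_h, n_h) * 0.1 for name in ["i", "o", "f"]}

def lstm_step(x_t, H_prev, C_prev):
    i = sigma(W["i"] @ x_t + P["i"] @ C_prev + U["i"] @ H_prev)   # input gate i_t
    f = sigma(W["f"] @ x_t + P["f"] @ C_prev + U["f"] @ H_prev)   # forget gate f_t
    C = f * C_prev + i * np.tanh(W["c"] @ x_t + U["c"] @ H_prev)  # C_t, with f1 = tanh
    o = sigma(W["o"] @ x_t + P["o"] @ C + U["o"] @ H_prev)        # output gate o_t uses C_t
    H = o * np.tanh(C)                                            # H_t = o_t ⊙ f2(C_t)
    return H, C

H, C = np.zeros(n_h), np.zeros(n_h)                   # given H_0, C_0
H, C = lstm_step(np.random.randn(n_in), H, C)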
Neural Attention
▶ Rocktäschel T. et al. Reasoning about entailment with neural attention
▶ Bahdanau D., Cho K., Bengio Y. Neural machine translation by jointly learning to align and translate
▶ Xu K. et al. Show, attend and tell: Neural image caption generation with visual attention
Multitask Learning
[Figure: task-specific inputs (xa, ya) and (xb, yb) feed shared layer(s), which produce separate outputs Label A and Label B]
▶ Sebastian Ruder (2017). An Overview of Multi-Task Learning in Deep Neural Networks.
▶ Søgaard, Anders, and Yoav Goldberg. "Deep multi-task learning with low level tasks supervised at lower layers."
Regularisation methods
Regularisation helps prevent overfitting.
Common regularisation methods:
▶ L2/L1 weight regularisation
▶ Dropout
▶ Batch Normalisation
source: Wikipedia
Weight regularisation
The idea of weight regularisation is to add an extra term, a regularisation penalty, to the cost function:
E = L(Y, Ŷ) + λ R(W)
L2 regularisation (most common):
E = L(Y, Ŷ) + λ ∑_w w²
L1 regularisation:
E = L(Y, Ŷ) + λ ∑_w |w|
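A sketch of adding the L2 penalty to a cost in NumPy (λ, the weight matrices and the data-loss value are illustrative placeholders):

import numpy as np

lam = 1e-3                                                  # regularisation strength λ
weights = [np.random.randn(5, 4), np.random.randn(2, 5)]    # e.g. [Wx, Wy]

data_loss = 0.42                                            # L(Y, Ŷ) from a forward pass (placeholder)
l2_penalty = lam * sum(np.sum(W**2) for W in weights)
E = data_loss + l2_penalty                                  # E = L(Y, Ŷ) + λ Σ w²

# the penalty simply adds 2 λ W to each weight gradient:
# dW_total = dW_data + 2 * lam * W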
Dropout
source: Srivastava et al., Dropout: A Simple Way to Prevent Neural Networks from Overfitting
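Srivastava et al. describe dropout as randomly dropping units (with their connections) during training. A sketch of the common equivalent "inverted" formulation, which rescales the kept activations at training time so the layer is used unchanged at test time (the keep probability and layer values are illustrative):

import numpy as np

def dropout(h, p_keep=0.5, train=True):
    # randomly zero units during training; rescale so the expected value is unchanged
    if not train:
        return h
    mask = (np.random.rand(*h.shape) < p_keep) / p_keep
    return h * mask

h = np.random.randn(10)          # activations of some hidden layer
print(dropout(h, p_keep=0.8))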
Batch Normalisation
▶ Batch Normalization enables higher learning rates
▶ Batch Normalization regularizes the model
Ioffe, Sergey, and Christian Szegedy. ”Batch normalization:
Accelerating deep network training by reducing internal covariate
shift.”
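A sketch of the batch-normalisation transform described in the cited paper: each feature is normalised using the mini-batch mean and variance, then scaled and shifted by learned parameters γ and β (the batch and shapes here are illustrative):

import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    # x: mini-batch of activations, shape (batch_size, features)
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalise
    return gamma * x_hat + beta              # scale and shift with learned γ, β

x = np.random.randn(32, 8)                   # batch of 32 examples, 8 features
gamma, beta = np.ones(8), np.zeros(8)
y = batch_norm(x, gamma, beta)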
Parameter space of deep learning models
Existing Deep Learning Frameworks
Resources
▶ Deep Learning for Natural Language Processing (without
Magic)
▶ The Unreasonable Effectiveness of Recurrent Neural Networks
▶ Long Short-Term Memory in Recurrent Neural Networks
▶ CS231n: Convolutional Neural Networks for Visual
Recognition
▶ Anyone Can Learn To Code an LSTM-RNN in Python
▶ Blog posts about Neural networks
▶ AI, Deep Learning, NLP Blog – Good explanation of BPTT