
COMP9444
Neural Networks and Deep Learning
5a. Recurrent Networks

Outline
• Processing Temporal Sequences
• Sliding Window
• Recurrent Network Architectures
• Hidden Unit Dynamics
• Long Short Term Memory

Textbook, Chapter 10
Processing Temporal Sequences

There are many tasks which require a sequence of inputs to be processed rather than a single input:
• speech recognition
• time series prediction
• machine translation
• handwriting recognition
How can neural network models be adapted for these tasks?

Sliding Window

The simplest way to feed temporal input to a neural network is the “sliding window” approach, first used in the NetTalk system (Sejnowski & Rosenberg, 1987).
NetTalk Task

Given a sequence of 7 characters, predict the phonetic pronunciation of the middle character.
For this task, we need to know the characters on both sides. For example, how are the vowels in these words pronounced?

    pa   pat   pate   paternal
    mo   mod   mode   modern

NetTalk Architecture
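
To make the sliding-window input concrete, here is a small Python sketch (not the original NetTalk code) that builds a 7-character window around each character of a word and encodes it as one long input vector; the padding symbol '_' and the 27-symbol alphabet are illustrative assumptions.

    # Sketch: build one 7-character sliding window per character of a word.
    # The padding symbol '_' and the one-hot alphabet are illustrative choices,
    # not taken from the original NetTalk system.
    ALPHABET = "_abcdefghijklmnopqrstuvwxyz"          # '_' used as padding
    CHAR_TO_INDEX = {c: i for i, c in enumerate(ALPHABET)}

    def one_hot(ch):
        v = [0.0] * len(ALPHABET)
        v[CHAR_TO_INDEX[ch]] = 1.0
        return v

    def sliding_windows(word, width=7):
        """Yield one window per character: the centre character plus 3 on each side."""
        half = width // 2
        padded = "_" * half + word + "_" * half
        for i in range(len(word)):
            window = padded[i:i + width]
            # concatenate the one-hot codes of the 7 characters into one input vector
            yield window, [x for ch in window for x in one_hot(ch)]

    for window, vec in sliding_windows("paternal"):
        print(window, len(vec))      # each input vector has 7 * 27 = 189 components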
NetTalk

• NETtalk gained a lot of media attention at the time.
• Hooking it up to a speech synthesizer was very cute. In the early stages of training, it sounded like a babbling baby. When fully trained, it pronounced the words mostly correctly (but sounded somewhat robotic).
• Later studies on similar tasks have often found that a decision tree could produce equally good or better accuracy.
• This kind of approach can only learn short term dependencies, not the medium or long term dependencies that are required for some tasks.

Simple Recurrent Network (Elman, 1990)

• at each time step, the hidden layer activations are copied to a “context” layer
• the hidden layer receives connections from the input and context layers
• the inputs are fed one at a time to the network; it uses the context layer to “remember” whatever information is required for it to produce the correct output
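
A minimal numpy sketch of one update step of an Elman-style simple recurrent network; the layer sizes and random weights are illustrative assumptions, and the essential point is that the previous hidden activations are fed back in as the context for the next step.

    import numpy as np

    rng = np.random.default_rng(0)
    INPUT_SIZE, HIDDEN_SIZE, OUTPUT_SIZE = 4, 10, 4
    W_ih = rng.normal(0, 0.1, (HIDDEN_SIZE, INPUT_SIZE))    # input   -> hidden
    W_ch = rng.normal(0, 0.1, (HIDDEN_SIZE, HIDDEN_SIZE))   # context -> hidden
    W_ho = rng.normal(0, 0.1, (OUTPUT_SIZE, HIDDEN_SIZE))   # hidden  -> output
    b_h = np.zeros(HIDDEN_SIZE)
    b_o = np.zeros(OUTPUT_SIZE)

    def srn_step(x, context):
        """One time step: the hidden layer sees the current input plus the previous
        hidden activations (the 'context'); the new hidden state becomes the next context."""
        hidden = np.tanh(W_ih @ x + W_ch @ context + b_h)
        output = W_ho @ hidden + b_o      # leave un-squashed; apply softmax/sigmoid as needed
        return output, hidden

    context = np.zeros(HIDDEN_SIZE)       # context starts at zero for each new sequence
    for x in rng.normal(size=(5, INPUT_SIZE)):   # a dummy sequence of 5 input vectors
        y, context = srn_step(x, context)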
Back Propagation Through Time

• we can “unroll” a recurrent architecture into an equivalent feedforward architecture, with shared weights
• applying backpropagation to the unrolled architecture is referred to as “backpropagation through time”
• we can backpropagate just one timestep, or a fixed number of timesteps, or all the way back to the beginning of the sequence (a minimal truncated-BPTT sketch is given after the next slide)

Other Recurrent Network Architectures

• it is sometimes beneficial to add “shortcut” connections directly from input to output
• connections from output back to hidden have also been explored (sometimes called “Jordan Networks”)
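
As flagged above, here is a minimal PyTorch sketch of truncated backpropagation through time; the built-in nn.RNN (an Elman-style layer), the sequence length, the 5-step truncation window and the mean-squared-error loss are all illustrative choices.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    rnn = nn.RNN(input_size=3, hidden_size=8, batch_first=True)   # Elman-style recurrent layer
    readout = nn.Linear(8, 3)
    optimizer = torch.optim.SGD(list(rnn.parameters()) + list(readout.parameters()), lr=0.1)
    loss_fn = nn.MSELoss()

    seq = torch.randn(1, 20, 3)         # one dummy sequence of length 20
    target = torch.randn(1, 20, 3)

    TRUNC = 5                           # backpropagate through at most 5 time steps
    hidden = torch.zeros(1, 1, 8)
    for t0 in range(0, seq.size(1), TRUNC):
        hidden = hidden.detach()        # cut the graph: gradients stop at this boundary
        out, hidden = rnn(seq[:, t0:t0 + TRUNC], hidden)
        loss = loss_fn(readout(out), target[:, t0:t0 + TRUNC])
        optimizer.zero_grad()
        loss.backward()                 # BPTT within the truncated window only
        optimizer.step()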
Second Order (or Gated) Networks

    x_j^t = tanh( W_{j0}^{σ_t} + Σ_{k=1}^{d} W_{jk}^{σ_t} x_k^{t−1} )

    z = tanh( P_0 + Σ_{j=1}^{d} P_j x_j^n )

Here σ_t is the input symbol at time t, so the current symbol selects (gates) which set of weights maps the previous hidden state x^{t−1} (d units) to the new state x^t; the output z is computed from the final hidden state x^n.

Task: Formal Language Recognition

Scan a sequence of characters one at a time, then classify the sequence as Accept or Reject.

    Accept        Reject
    1             0
    11            10
    111           01
    1111          00
    11111         011
    111111        110
    1111111       11111110
    11111111      10111111
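
A minimal numpy sketch of the second-order update above: the current input symbol selects which weight matrix is applied to the previous hidden state, and the final hidden state is mapped to a single output z whose sign can be read as Accept or Reject. The alphabet size, number of hidden units and random weights are illustrative; a real recognizer would be trained (e.g. by BPTT) rather than left with random weights.

    import numpy as np

    rng = np.random.default_rng(1)
    NUM_SYMBOLS, d = 2, 5                       # binary alphabet {0, 1}, d hidden units
    # one (d x (d+1)) weight matrix per input symbol: column 0 holds the bias W_{j0}
    W = rng.normal(0, 0.5, (NUM_SYMBOLS, d, d + 1))
    P = rng.normal(0, 0.5, d + 1)               # output weights, P[0] is the bias P_0

    def recognise(symbols):
        x = np.zeros(d)                         # hidden state x^0
        for s in symbols:                       # x_j^t = tanh(W_{j0}^s + sum_k W_{jk}^s x_k^{t-1})
            x = np.tanh(W[s, :, 0] + W[s, :, 1:] @ x)
        z = np.tanh(P[0] + P[1:] @ x)           # z = tanh(P_0 + sum_j P_j x_j^n)
        return z                                # threshold at 0 to decide Accept / Reject

    print(recognise([1, 1, 1, 1]), recognise([1, 0, 1]))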

Dynamical Recognizers

• gated network trained by BPTT
• emulates exactly the behaviour of a Finite State Automaton

Task: Formal Language Recognition

Scan a sequence of characters one at a time, then classify the sequence as Accept or Reject.

    Accept          Reject
    1               000
    0               11000
    10              0001
    01              000000000
    00              11111000011
    100100          1101010000010111
    001111110100    1010010001
    0100100100      0000
    11100           00000
    0010
Dynamical Recognizers

• trained network emulates the behaviour of a Finite State Automaton
• training set must include short, medium and long examples

Phase Transition

Chomsky Hierarchy

    Language                  Machine                      Example
    Regular                   Finite State Automaton       a^n (n odd)
    Context Free              Push Down Automaton          a^n b^n
    Context Sensitive         Linear Bounded Automaton     a^n b^n c^n
    Recursively Enumerable    Turing Machine               true QBF

Task: Formal Language Prediction

    abaabbabaaabbbaaaabbbbabaabbaaaaabbbbb ...

Scan a sequence of characters one at a time, and try at each step to predict the next character in the sequence.
In some cases, the prediction is probabilistic.
For the a^n b^n task, the first b is not predictable, but subsequent b's and the initial a in the next subsequence are predictable.
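
A small sketch of how next-character prediction data for the a^n b^n task might be generated (an illustrative assumption, not code from the lecture): random a^n b^n blocks are concatenated into one stream, and each character is paired with the character that follows it as the prediction target.

    import random

    random.seed(0)

    def anbn_stream(num_subsequences, max_n=10):
        """Concatenate random a^n b^n blocks into one character stream."""
        chars = []
        for _ in range(num_subsequences):
            n = random.randint(1, max_n)
            chars.extend("a" * n + "b" * n)
        return "".join(chars)

    stream = anbn_stream(5)
    # next-character prediction pairs: input character -> target (the following character)
    pairs = [(stream[i], stream[i + 1]) for i in range(len(stream) - 1)]
    print(stream)
    print(pairs[:8])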
Elman Network for predicting a^n b^n

Oscillating Solution for a^n b^n

Learning to Predict a^n b^n

• the network does not implement a Finite State Automaton but instead uses two fixed points in activation space – one attracting, the other repelling (Wiles & Elman, 1995)
• networks trained only up to a^10 b^10 could generalize up to a^12 b^12
• training the weights by evolution is more stable than by backpropagation
• networks trained by evolution were sometimes monotonic rather than oscillating

Monotonic Solution for a^n b^n
Hidden Unit Analysis for a^n b^n

[figures: hidden unit trajectory; fixed points and eigenvectors]

Counting by Spiralling

• for this task, a sequence is accepted if the number of a's and b's are equal
• the network counts up by spiralling inwards, and counts down by spiralling outwards
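
To illustrate the spiralling idea, here is a hand-built 2D example (not the trained network's actual weights): reading an 'a' applies a contracting rotation and reading a 'b' applies its expanding inverse, so the state returns to its starting point exactly when the numbers of a's and b's match.

    import numpy as np

    theta, r = 0.7, 0.8                       # illustrative rotation angle and contraction factor
    R = r * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])   # 'a': spiral inwards
    R_inv = np.linalg.inv(R)                               # 'b': spiral outwards

    def final_state(s, start=np.array([1.0, 0.0])):
        x = start
        for ch in s:
            x = (R if ch == "a" else R_inv) @ x
        return x

    print(final_state("aaabbb"))              # equal counts: back near the starting point [1, 0]
    print(final_state("aaabb"))               # unequal counts: ends somewhere else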

Hidden Unit Dynamics for a^n b^n c^n

• SRN with 3 hidden units can learn to predict a^n b^n c^n by counting up and down simultaneously in different directions, thus producing a star shape.

Partly Monotonic Solution for a^n b^n c^n
Long Range Dependencies

• Simple Recurrent Networks (SRNs) can learn medium-range dependencies but have difficulty learning long range dependencies
• Long Short Term Memory (LSTM) and Gated Recurrent Units (GRU) can learn long range dependencies better than SRN

Long Short Term Memory

Two excellent Web resources for LSTM:
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
christianherta.de/lehre/dataScience/machineLearning/neuralNetworks/LSTM.php

Reber Grammar
Embedded Reber Grammar
Simple Recurrent Network

SRN – context layer is combined directly with the input to produce the next hidden layer.
SRN can learn the Reber Grammar, but not the Embedded Reber Grammar.

Long Short Term Memory

LSTM – context layer is modulated by three gating mechanisms: forget gate, input gate and output gate.
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
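
A minimal numpy sketch of one LSTM step in a common formulation, showing the forget, input and output gates acting explicitly on the cell (context) state; the layer sizes, initialisation and the particular variant (no peephole connections) are illustrative choices, not a definitive implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    INPUT_SIZE, HIDDEN_SIZE = 4, 8

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # one weight matrix and bias per gate, acting on [previous hidden ; current input]
    def make_params():
        return (rng.normal(0, 0.1, (HIDDEN_SIZE, HIDDEN_SIZE + INPUT_SIZE)),
                np.zeros(HIDDEN_SIZE))

    (W_f, b_f), (W_i, b_i), (W_o, b_o), (W_c, b_c) = (make_params() for _ in range(4))

    def lstm_step(x, h_prev, c_prev):
        v = np.concatenate([h_prev, x])
        f = sigmoid(W_f @ v + b_f)              # forget gate: how much of the old cell to keep
        i = sigmoid(W_i @ v + b_i)              # input gate: how much new candidate to write
        o = sigmoid(W_o @ v + b_o)              # output gate: how much of the cell to expose
        c_tilde = np.tanh(W_c @ v + b_c)        # candidate cell contents
        c = f * c_prev + i * c_tilde            # new cell state (the long-term "context")
        h = o * np.tanh(c)                      # new hidden state
        return h, c

    h, c = np.zeros(HIDDEN_SIZE), np.zeros(HIDDEN_SIZE)
    for x in rng.normal(size=(6, INPUT_SIZE)):  # a dummy sequence of 6 inputs
        h, c = lstm_step(x, h, c)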

Long Short Term Memory

Gated Recurrent Unit

GRU is similar to LSTM but has only two gates instead of three.
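
For comparison, a minimal numpy sketch of one GRU step in the standard two-gate formulation (a reset gate and an update gate, with no separate cell state); again the sizes and random weights are illustrative, and biases are omitted for brevity.

    import numpy as np

    rng = np.random.default_rng(0)
    INPUT_SIZE, HIDDEN_SIZE = 4, 8

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    W_z = rng.normal(0, 0.1, (HIDDEN_SIZE, HIDDEN_SIZE + INPUT_SIZE))   # update gate
    W_r = rng.normal(0, 0.1, (HIDDEN_SIZE, HIDDEN_SIZE + INPUT_SIZE))   # reset gate
    W_h = rng.normal(0, 0.1, (HIDDEN_SIZE, HIDDEN_SIZE + INPUT_SIZE))   # candidate state

    def gru_step(x, h_prev):
        v = np.concatenate([h_prev, x])
        z = sigmoid(W_z @ v)                                        # update gate
        r = sigmoid(W_r @ v)                                        # reset gate
        h_tilde = np.tanh(W_h @ np.concatenate([r * h_prev, x]))    # candidate hidden state
        return (1 - z) * h_prev + z * h_tilde                       # blend old and candidate state

    h = np.zeros(HIDDEN_SIZE)
    for x in rng.normal(size=(6, INPUT_SIZE)):  # a dummy sequence of 6 inputs
        h = gru_step(x, h)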