Lecture 5: Recurrent Neural Network
Instructor:
Outline of this lecture
Why Recurrent Neural Networks (RNNs)
How RNN works
Why Long Short-term Memory (LSTM) Network?
LSTM and Forecasting
Case Study
Recap: Feedforward Neural Network
Each input sample is represented by a fixed-length vector of features
No time-wise context information is modeled
Motivations
Some machine learning problems have time-wise inputs/outputs, i.e., they work on time-series data
E.g., predicting the daily values of a stock
E.g., forecasting the inflation rate of a country
E.g., forecasting the exchange rate between two currencies
E.g., predicting if one student will show up for a class meeting
E.g., predicting for a city (e.g., Los Angeles) the number of tourists for each month in the next 12 months
Problem: Forecasting the number of passengers on a monthly basis
Method A: learning a supervised learning model from a training set of (x, y) pairs
x: features of a given month, including weather conditions, GDP, interest rates, and whether it is a peak month
y: output labels (regression)
This approach fails to model the strong dependence between monthly observations
Recurrent Neural Network (RNNs)
RNN is designed to model the relationship between a sequence of data points and a sequence of output labels, i.e., it is a sequence-to-sequence model
For a given time t, a neuron or a neural network is employed as the prediction function
The output or hidden states of the network at time t are used as inputs to the network at time t+1.
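A minimal NumPy sketch of this recurrence (not from the slides; the dimensions and the tanh activation are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 3, 4                        # assumed sizes for illustration
W_x = rng.normal(size=(n_hidden, n_in))      # input-to-hidden weights (shared over time)
W_h = rng.normal(size=(n_hidden, n_hidden))  # hidden-to-hidden weights (shared over time)
b = np.zeros(n_hidden)

x_seq = rng.normal(size=(5, n_in))           # a toy input sequence of 5 time steps
h = np.zeros(n_hidden)                       # initial hidden state
for x_t in x_seq:
    # the state at time t+1 depends on the input at t+1 and the state at time t
    h = np.tanh(W_x @ x_t + W_h @ h + b)
print(h)                                     # hidden state after the last time step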
Recurrent Neural Network (RNNs)
Recurrent Neural Network (RNNs)
The network weights are shared over time
An RNN consists of multiple copies of an RNN cell unrolled over time, where each copy processes the input at a different time step
Case 1: RNN for sentiment classification
Task: classify a movie review as positive or negative
Inputs: multiple words or sentences (a sequence of words)
Outputs: binary labels (positive vs. negative)
E.g., ‘The food is really delicious’
Case 1: RNN for sentiment classification
In this case, only the last cell’s output is used for making the prediction
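A minimal Keras sketch of this many-to-one setup (an illustration under assumptions, not the lecture's code; the vocabulary size is made up):

import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 10000                            # assumed vocabulary size
model = tf.keras.Sequential([
    layers.Embedding(input_dim=vocab_size, output_dim=32),  # word indices -> vectors
    layers.SimpleRNN(32),                     # returns only the last cell's output
    layers.Dense(1, activation='sigmoid'),    # positive vs. negative
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()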
RNN Variants: Input-output
Single input, single output: e.g., a feedforward network
Single input, multiple outputs: e.g., forecasting inflation rates
Multiple inputs, single output: e.g., sentiment classification
Multiple inputs, multiple outputs: e.g., speech recognition
Loss for a single output: squared(H(x_i) - y_i)
Loss for k outputs: sum over j = 1..k of squared(H(x_i)[j] - y_i[j])
RNN: Learning
Most cost functions developed for other networks are applicable to RNNs as well
Optimizer (Gradient descent)
Backpropagation through time (BPTT)
The network of RNN cells is considered as a big feed-forward network
A single computational graph is constructed to propagate gradients back through time
The network parameters must be kept identical across all RNN cells: gradients are computed for each copy and then combined (averaged) to update the shared weights
RNN: Limitations
Just as a product of many real numbers can shrink to zero or explode to infinity, the product of many matrices (arising from backpropagation through time) leads to two issues for gradient descent:
Gradient shrinkage (vanishing gradients): gradients become zero or close to zero
Gradient explosion: gradients become extremely large
A plain RNN can only model time-wise relationships over a limited, effectively fixed span
This span might be too short or too long for a given problem
It is not adaptive
From RNN to LSTM
Long Short-Term Memory (LSTM) network is a special design of RNN.
LSTM uses the idea of constant error flow for RNNs to ensure that gradients do not decay or explode
The key component of LSTM is a memory cell that works like an accumulator over time
A new state is obtained through additive operations over the previous state, instead of multiplicative operations, to ensure gradient based methods behave well.
From RNN Cell to LSTM Cell
h(t) is the output or activation of the cell at time t
x(t) is the input data at time t
g() is the activation function
From RNN Cell to LSTM Cell
LSTM cells
Diagram legend: point-wise operation, network layer, vector flow
h(t) is the output or activation of the cell at time t
x(t) is the input data at time t
g() is the tanh() activation function, with output range (-1, 1)
σ() is the sigmoid activation function, with output range (0, 1)
LSTM Cell: Architecture
Diagram labels: cell state, next cell state, hidden state, output (hidden state), forget gate, input gate, output gate
LSTM cells
LSTM Cell: Architecture
Diagram labels: cell state, next cell state
LSTM Cell: Architecture
Forget gate: based on the input data and the previous hidden state, this layer decides which information in the cell state is kept and which is discarded
LSTM Cell: Architecture
Input gate
Information from the input data is selected to be added to the cell state
LSTM Cell: Architecture
Update Cell State
The new state is now determined by the previous state and the information from the input gate!
LSTM Cell: Architecture
Output Gate
The updated cell state is used to generate the hidden state or output state
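A NumPy sketch of one LSTM cell step following the gates described above (dimensions and random weights are illustrative assumptions, not the lecture's code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    # W, U, b hold parameters for the four layers: (f)orget, (i)nput, candidate (g), (o)utput
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])   # forget gate
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])   # input gate
    g = np.tanh(W['g'] @ x_t + U['g'] @ h_prev + b['g'])   # candidate cell update
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])   # output gate
    c = f * c_prev + i * g                                 # additive cell-state update
    h = o * np.tanh(c)                                     # hidden state / output
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4                                         # assumed sizes
W = {k: rng.normal(size=(n_hid, n_in)) for k in 'figo'}
U = {k: rng.normal(size=(n_hid, n_hid)) for k in 'figo'}
b = {k: np.zeros(n_hid) for k in 'figo'}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, U, b)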
LSTM Cell: Architecture
LSTM Cell: A different architecture
Cho et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation. 2014
Outline of this lecture
Why Recurrent Neural Networks (RNNs)
How RNN works
Why Long Short-term Memory (LSTM) Network?
LSTM and Forecasting
Time-series data
Case Study
Objectives of Time-series Analysis
Interpretation
Forecasting
Hypothesis Testing
Simulation
Example: Anti-diabetic Drug Sales
A public dataset
Forecasting: Random walk
Assuming the time-series y(t), t=1,2,3…, is generated from a stochastic model.
The prediction for every horizon is simply set to be the last observed value: y(t+w|t)=y(t) where w is the horizon
A variant is to assume the time-series has a seasonal component with period T, and y(t+w|t)=y(t+w-T).
This model is often used as a benchmark
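A small NumPy sketch of these benchmarks (illustrative only; the toy series and the period are assumptions):

import numpy as np

def naive_forecast(y, horizon):
    # random walk: every future value is the last observed value
    return np.repeat(y[-1], horizon)

def seasonal_naive_forecast(y, horizon, T):
    # y(t+w|t) = y(t+w-T): reuse the last full seasonal cycle
    last_cycle = y[-T:]
    return np.array([last_cycle[w % T] for w in range(horizon)])

y = np.arange(24, dtype=float)                # toy monthly series over two years
print(naive_forecast(y, 3))                   # [23. 23. 23.]
print(seasonal_naive_forecast(y, 3, T=12))    # [12. 13. 14.]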
Forecasting: Seasonal Decomposition
Decompose a time series with seasonality (e.g., weekly, monthly, etc.) into the sum of three components:
y(t) = Season(t) + Trend(t) + Remainder(t)
This method is called additive decomposition
Forecasting: Seasonal Decomposition
Additive decomposition
statsmodels.tsa.seasonal.seasonal_decompose(x, model='additive', filt=None, period=None, two_sided=True, extrapolate_trend=0)
Forecasting: Seasonal Decomposition
Multiplicative decomposition:
y(t) = Season(t) * Trend(t) * Remainder(t)
statsmodels.tsa.seasonal.seasonal_decompose(x, model='multiplicative', filt=None, period=None, two_sided=True, extrapolate_trend=0)
statsmodels.tsa.seasonal.seasonal_decompose — statsmodels
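A short usage sketch (assuming y is a monthly pandas Series with a DatetimeIndex, such as the drug-sales series used in the case study):

from statsmodels.tsa.seasonal import seasonal_decompose

result = seasonal_decompose(y, model='additive', period=12)   # period=12 for monthly data
result.plot()                                                 # observed, trend, seasonal, residual
trend, seasonal, resid = result.trend, result.seasonal, result.resid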
Forecasting: Exponential Smoothing
Classical forecasting method: the forecasts are a weighted average of past observations, with weights that decrease exponentially the further we move back from time t:
y(t+1|t) = a*y(t) + a*(1-a)*y(t-1) + a*(1-a)^2*y(t-2) + ..., where the smoothing parameter a is between 0 and 1
It is possible to extend this basic method to deal with trends/seasonalities.
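A minimal statsmodels sketch (assuming y is a pandas Series of observations; the smoothing level 0.2 and the additive Holt-Winters settings are illustrative choices):

from statsmodels.tsa.holtwinters import SimpleExpSmoothing, ExponentialSmoothing

ses = SimpleExpSmoothing(y).fit(smoothing_level=0.2, optimized=False)
print(ses.forecast(3))                       # forecasts for the next 3 periods

# extension with additive trend and seasonality (Holt-Winters)
hw = ExponentialSmoothing(y, trend='add', seasonal='add', seasonal_periods=12).fit()
print(hw.forecast(12))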
Forecasting: ARIMA
Autoregressive (AR) model: the forecast at time t is a linear combination of past values of the variable.
Moving average (MA) model: the forecast at time t is a linear combination of past forecast errors:
y(t) = c + e(t) + θ_1*e(t-1) + ... + θ_q*e(t-q), where q is the model order and e(t) is white noise.
ARIMA: AutoRegressive Integrated Moving Average
Integrating: differencing the time-series to make it stationary.
Combining the above two models
Three hyperparameters: p (order of the autoregressive part), d (degree of differencing) and q (order of the moving average part)
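A minimal statsmodels sketch (assuming y is a pandas Series; the order (2, 1, 1) is an arbitrary illustration, not a tuned choice):

from statsmodels.tsa.arima.model import ARIMA

model = ARIMA(y, order=(2, 1, 1))            # p=2 AR terms, d=1 difference, q=1 MA term
fit = model.fit()
print(fit.summary())
print(fit.forecast(steps=12))                # forecast the next 12 periods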
Other forecasting models
Dynamic linear models: at each time t the forecast is given by a linear model with time-varying coefficients
TBATS: standing for Trigonometric seasonality, Box-Cox transform, ARMA errors, Trend and Seasonal components;
deals with multiple seasonalities by modeling each seasonality with a trigonometric representation based on Fourier series
Prophet: able to deal with multiple seasonalities (developed by Facebook)
The forecast is represented as a combination of trend, seasonality and holiday effects
Formulated in the Bayesian framework
NNETAR: neural network autoregression
Inputs: the most recent lagged values of the sequence up to time t
Outputs: the forecasted value at time t+1
LSTM for forecasting
An LSTM model consists of multiple LSTM cells unrolled over time steps and can model the time-wise dependencies of a time series.
An LSTM model may need a large number of training samples
Case Study: Two examples
LSTM Example 1: Toy Data
LSTM Example 2 : Monthly Anti-Diabetic Drug Sales
LSTM Example 1: Toy Data
Inputs: a sequence of data [10,20,30,40,50,60,70,80,90]
Objective: forecast the next number
Method: develop an LSTM model with window length n_len = 3
Step 1: sample multiple sub-sequences of length n_len; for each sub-sequence, use the next number in the series as the output label, which yields multiple (X, y) pairs
X=[10 20 30], y=[40]
X=[20 30 40], y=[50]
X=[30 40 50], y=[60]
Step 2: Train an LSTM model
Step 3: Apply the trained model to a test sequence X = [70, 80, 90] to obtain its prediction
LSTM Example 1: Toy Data
Preparing training data for LSTM
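A sketch of this step (based on the toy series from the slides; the variable names are assumptions):

import numpy as np

series = [10, 20, 30, 40, 50, 60, 70, 80, 90]
n_len = 3

X, y = [], []
for i in range(len(series) - n_len):
    X.append(series[i:i + n_len])            # e.g. [10, 20, 30]
    y.append(series[i + n_len])              # e.g. 40
X = np.array(X).reshape(-1, n_len, 1)        # Keras LSTM expects (samples, time steps, features)
y = np.array(y)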
LSTM Example 1: Toy Data
Network architecture for the LSTM model
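A minimal Keras sketch of such a network (the 50 hidden units are an assumed choice, not necessarily the lecture's exact architecture):

import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_len, 1)),        # n_len time steps, one feature per step
    layers.LSTM(50),                         # LSTM layer over the input window
    layers.Dense(1),                         # regression output: the next value
])
model.compile(optimizer='adam', loss='mse')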
LSTM Example 1: Toy Data
Training and Evaluating the LSTM model
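A sketch of training and prediction, assuming X, y and model from the previous sketches (the number of epochs is arbitrary):

model.fit(X, y, epochs=300, verbose=0)

x_test = np.array([70, 80, 90]).reshape(1, n_len, 1)
print(model.predict(x_test))                 # expected to be close to 100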
LSTM Example 2
Data: anti-diabetic drug sales in Australia from 1991 to 2008
LSTM Example 2
Idea: remove the seasonal component and model only the trend and remainder components.
LSTM Example 2
Step 1: Load data
Step 2: Removing the seasonal factors
Step 3: Preparing training and testing samples
Step 4: Create a LSTM model
Step 5: Train and Evaluate the model
LSTM Example 2
Step 1: Load data
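A sketch of this step, assuming the public dataset is stored as a CSV with a date column and a value column (the file and column names are assumptions):

import pandas as pd

df = pd.read_csv('anti_diabetic_drug_sales.csv', parse_dates=['date'], index_col='date')
y = df['value']                              # monthly sales, 1991 to 2008
print(y.head())
y.plot(title='Monthly anti-diabetic drug sales')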
LSTM Example 2
Step 1: Load data
Step 2: Removing the seasonal factors
Decomposition plots of the original sequence (blue: trend; green: seasonality; yellow: remainder; yellow: trend + remainder)
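A sketch of this step, assuming y from Step 1: decompose the series and keep only trend + remainder, so the LSTM does not have to learn the seasonal pattern:

from statsmodels.tsa.seasonal import seasonal_decompose

decomp = seasonal_decompose(y, model='additive', period=12)
seasonal = decomp.seasonal
deseasonalized = y - seasonal                # trend + remainder only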
LSTM Example 2
Step 1: Load data
Step 2: Removing the seasonal factors
Step 3: Preparing training and testing samples
LSTM Example 2
Step 1: Load data
Step 2: Removing the seasonal factors
Step 3: Preparing training and testing samples
LSTM Example 2
Step 1: Load data
Step 2: Removing the seasonal factors
Step 3: Preparing training and testing samples
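A sketch of this step, assuming the deseasonalized series from Step 2; scaling with scikit-learn's MinMaxScaler and the window length n_len = 12 are assumed choices:

import numpy as np
from sklearn.preprocessing import MinMaxScaler

n_len = 12
values = deseasonalized.values.reshape(-1, 1)
scaler = MinMaxScaler()
scaled = scaler.fit_transform(values).flatten()

X, y_out = [], []
for i in range(len(scaled) - n_len):
    X.append(scaled[i:i + n_len])            # one year of history as input
    y_out.append(scaled[i + n_len])          # the following month as the label
X = np.array(X).reshape(-1, n_len, 1)
y_out = np.array(y_out)

split = int(0.8 * len(X))                    # keep the time order: first 80% for training
X_train, X_test = X[:split], X[split:]
y_train, y_test = y_out[:split], y_out[split:]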
LSTM Example 2
Step 1: Load data
Step 2: Removing the seasonal factors
Step 3: Preparing training and testing samples
Step 4: Create a LSTM model
LSTM Example 2
Step 1: Load data
Step 2: Removing the seasonal factors
Step 3: Preparing training and testing samples
Step 4: Create a LSTM model
Step 5: Train and Evaluate the model
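A sketch covering Steps 4 and 5, assuming the arrays, scaler and seasonal component from the previous sketches (architecture and epochs are assumed choices); the seasonal factors are added back to obtain the final forecast:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    tf.keras.Input(shape=(n_len, 1)),
    layers.LSTM(50),
    layers.Dense(1),
])
model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=100, verbose=0)

pred_scaled = model.predict(X_test).flatten()
pred = scaler.inverse_transform(pred_scaled.reshape(-1, 1)).flatten()

test_index = deseasonalized.index[split + n_len:]      # months covered by the test windows
forecast = pred + seasonal.loc[test_index].values      # add the seasonal factors back
mae = np.mean(np.abs(forecast - y.loc[test_index].values))
print('test MAE:', mae)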
Last, but most important:
Visualizing the predictions
Last, but most important:
Visualizing the predictions
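A plotting sketch, assuming y, test_index and forecast from the previous sketches:

import matplotlib.pyplot as plt

plt.figure(figsize=(10, 4))
plt.plot(y, label='actual sales')
plt.plot(test_index, forecast, label='LSTM forecast')
plt.legend()
plt.title('Monthly anti-diabetic drug sales: actual vs. predicted')
plt.show()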
More Layers in TensorFlow
Core layers
Dense layers
Activation layers
Embedding layers
Masking layers
Lambda layers
Convolution Layers
Pooling layers
Recurrent layers
Preprocessing layers
Normalization layers
Regularization layers
Attention layers
Reshaping layers
Merging layers
Locally-connected layers
Activation layers
https://keras.io/api/
More Loss Functions in TensorFlow
Cross-entropy loss
Poisson:
loss = y_pred - y_true * log(y_pred)
KLDivergence:
loss = y_true * log(y_true / y_pred)
MeanSquaredError:
loss = square(y_true - y_pred)
MeanAbsoluteError:
loss = abs(y_true - y_pred)
MeanAbsolutePercentageError:
loss = 100 * abs(y_true - y_pred) / y_true
CosineSimilarity:
loss = -sum(l2_norm(y_true) * l2_norm(y_pred))
Hinge loss:
loss = maximum(1 - y_true * y_pred, 0)
Squared Hinge loss:
loss = square(maximum(1 - y_true * y_pred, 0))
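A short sketch of selecting these losses in Keras, either as classes or by string name (model is assumed to be an already-built tf.keras model):

import tensorflow as tf

model.compile(optimizer='adam', loss=tf.keras.losses.MeanSquaredError())
# equivalently by name, e.g. for mean absolute error:
model.compile(optimizer='adam', loss='mean_absolute_error')
# distribution-style losses work the same way:
model.compile(optimizer='adam', loss=tf.keras.losses.KLDivergence())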
More optimizers in TensorFlow
SGD: Stochastic Gradient Descent
Mini-batch Gradient Descent
Momentum-based Gradient Descent
V = constant * V - learning_rate * gradient, where the constant is the momentum coefficient; the weights are then updated by W = W + V
ADAM: Adaptive Moment Estimation
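A short sketch of configuring these optimizers in Keras (the learning rates and momentum value are illustrative defaults; model is an assumed tf.keras model):

import tensorflow as tf

sgd = tf.keras.optimizers.SGD(learning_rate=0.01)                         # plain (stochastic) gradient descent
sgd_momentum = tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)  # momentum-based variant
adam = tf.keras.optimizers.Adam(learning_rate=0.001)                      # adaptive moment estimation

# mini-batch gradient descent is obtained by choosing batch_size in model.fit(...)
model.compile(optimizer=adam, loss='mse')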
Outline of this lecture
Why Recurrent Neural Networks (RNNs)
How RNN works
Why Long Short-term Memory (LSTM) Network?
LSTM and Forecasting
Case Study
https://arxiv.org/pdf/1412.6980.pdf