
Liyuan Xing, Gleb Sizov, Odd Erik Gundersen
Time Series Forecasting
stand in the present and forecast the future

Lecture 3
Introduction to time series forecasting (Liyuan)
Data exploration by time series graphics (Liyuan)
Statistical methods (Liyuan)
• Time series decomposition
• Exponential smoothing
• ARIMA models
• Time series regression models
Evaluation (Liyuan)
• (Nested) cross validation
• Forecast error types and measures

Lecture 7
Machine learning methods (Gleb)
• Statistical methods vs machine learning, M4 competition
• Recurrent neural networks
• Hybrid models
• Global vs local models
Forecast distribution and uncertainty (Odd Erik)
• Prediction interval, quantile
• Bootstrapping and bagging
Case study: Projects in TrønderEnergi (Odd Erik)
• Wind power production forecasting
• Grid loss forecasting

Introduction
• Forecasting
• Time series forecasting

Introduction
What is forecasting?
• Predicting the future as accurately as possible, given all of the information available, including historical data and knowledge of any future events that might impact the forecasts.
What are often forecasted?
• Short-term (minutes, hours, days): schedule of personnel, production and transportation, related to forecasts of demand
• Medium-term (weeks, months): determine future resource requirements, in order to purchase raw materials, hire personnel, or buy machinery and equipment
• Long-term (years): strategic planning

What can be forecasted?
Some things are easier to forecast than others
• The time of the sunrise tomorrow morning can be forecast precisely
• Tomorrow’s lotto numbers cannot be forecast with any accuracy
Predictability
• How well we understand the factors that contribute to it
• How much data is available
• Whether the forecasts can affect the thing we are trying to forecast

Is predictability satisfied?
Forecasts of electricity demand
• Factors: temperatures, calendar variation, economic conditions
• Data: a sufficient history of data on electricity demand and weather conditions
• Effect: no, the forecasts do not affect demand itself
Forecasting currency exchange rates
• Factors: limited understanding of what drives exchange rates
• Data: plenty of data available
• Effect: yes, forecasts have a direct effect on the rates themselves

Time series forecasting
Time series data
• Anything that is observed sequentially over time
• Historical values
• Impacting factors
Time series forecasting
• Estimate how the sequence of observations will continue into the future

Basic steps
Problem definition
Gathering information
• Get the data: historical values, impacting factors
Preliminary (exploratory) analysis
• Know the data: trend, seasonal, cyclic, correlation, stationarity
Choosing and fitting models
• Build the model: regression models, decomposition, exponential smoothing, ARIMA models, machine learning methods
Using and evaluating a forecasting model
• Forecasting and evaluation: make forecasts, compute error measures

Data exploration
• Time series patterns (trend, seasonal, cyclic, correlation, stationarity)
• Seasonal plots
• Scatter plots
• Lag plots and ACF plots
• White noise
• Stationarity and ACF plots

Data exploration
The first thing to do in any data analysis task is to plot the data. Graphs enable many features of the data to be visualised:
• Patterns (trend, seasonal, cyclic, stationary)
• Unusual observations (outliers)
• Changes over time (autocorrelation)
• Relationships between variables (correlation)
The features that are seen in plots of the data must then be incorporated, as much as possible, into the forecasting methods to be used.

Time series patterns
Non-stationary patterns
• Trend: a long-term increase or decrease
• Seasonal: associated with the calendar, with fixed and known frequency
• Cyclic: rises and falls that are not of a fixed frequency
Cyclic vs seasonal
• Frequency: not fixed vs fixed
• Length: longer vs shorter
• Magnitudes: more variable vs less variable
Stationarity
• Stationary: a series whose properties do not depend on the time at which the series is observed
• Examples: a white noise series; a time series with cyclic behaviour (but with no trend or seasonality)

Time series patterns
The observations are plotted against the time of observation, with consecutive observations joined by straight lines
[Figure: example series showing trend, seasonal + cyclic, trend + seasonal, and stationary patterns]

Seasonal plots
The data are plotted against the individual “seasons” in which the data were observed
Seasonal plots (horizontal or circular)
Seasonal subseries plots

Scatter plots
Explore relationships between time series
Correlation coefficients can measure the strength of the relationship

Lag plots and ACF plots
The horizontal axis shows lagged values of the time series
Autocorrelation measures the linear relationship between lagged values of a time series
[Figure: lag plots and ACF plots for seasonal and trend + seasonal series]

White noise
No autocorrelation
For white noise, we expect 95% of the spikes in the ACF to lie within ±2/√T, where T is the length of the time series
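A quick check of this rule on simulated data (a minimal sketch assuming numpy; the series length and lag count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(size=200)     # simulated white noise, T = 200
bound = 2 / np.sqrt(len(y))  # approximate 95% bounds for the ACF of white noise

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag."""
    x = x - x.mean()
    denom = np.sum(x * x)
    return np.array([np.sum(x[k:] * x[:-k]) / denom for k in range(1, max_lag + 1)])

r = sample_acf(y, 20)
# For white noise, roughly 5% of spikes (about 1 in 20) should fall outside the bounds.
print(np.sum(np.abs(r) > bound), "of 20 spikes outside ±", round(bound, 3))
```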

Stationarity and ACF plots
The ACF drops to zero:
• Quickly for a stationary time series
• Slowly for non-stationary data
[Figure: ACF of a trended series vs a stationary series]

Statistical methods
• Forecasting categories: time series models, explanatory models, mixed models
• Simple forecasting methods
• Time series decomposition (1920s)
• Exponential smoothing (late 1950s)
• ARIMA models
• Time series regression models
• Dynamic regression models

Forecasting categories
Time series models
• Use only historical values of the variable to be forecast
• Simple forecasting methods, decomposition, exponential smoothing, ARIMA models
Explanatory models
• Use only factors that affect the variable to be forecast
• Regression models
Mixed models
• Use both historical values and impacting factors
• Dynamic regression models, machine learning methods

Simple forecasting methods
Average method
• The forecasts of all future values are equal to the average (or “mean”) of the historical data
Naïve method
• All forecasts equal the value of the last observation
Seasonal naïve method
• Each forecast equals the last observed value from the same season of the year
Drift method
• Forecasts are allowed to increase or decrease over time, where the amount of change over time (called the drift) is set to the average change seen in the historical data

These simple methods often serve as benchmarks rather than as the method of choice.
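A minimal sketch of these four benchmark methods (assuming numpy; function names are illustrative):

```python
import numpy as np

def average_forecast(y, h):
    """Average method: every forecast equals the historical mean."""
    return np.full(h, np.mean(y))

def naive_forecast(y, h):
    """Naïve method: every forecast equals the last observation."""
    return np.full(h, y[-1])

def seasonal_naive_forecast(y, h, m):
    """Seasonal naïve method: repeat the last observed value from the same season (period m)."""
    return np.array([y[len(y) - m + (i % m)] for i in range(h)])

def drift_forecast(y, h):
    """Drift method: extrapolate the average historical change."""
    drift = (y[-1] - y[0]) / (len(y) - 1)
    return y[-1] + drift * np.arange(1, h + 1)
```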

Overview of statistical methods
Time series decomposition
• Explore historical changes over time
• Can be used in forecasting
Exponential smoothing
• Describes the trend and seasonality
• Weighted averages of past observations, with the weights decaying exponentially as the observations get older
ARIMA models
• Describe the autocorrelations in the data
• A linear combination of past values of the variable and past forecast errors
Time series regression models
• Capture relationships between the forecast variable and predictor variables
• In ex-ante forecasts, obtaining forecasts of the predictors can be challenging

Time series decomposition
Time series data (trend, seasonal, cyclic) are split into several components, each representing an underlying pattern category
• Methods: classical, X11, SEATS, STL decomposition
• Additive decomposition: yt = St + Tt + Rt
• Multiplicative decomposition: yt = St × Tt × Rt
where St is the seasonal component, Tt is the trend-cycle component, and Rt is the remainder component

Classical decomposition
Additive decomposition
• Step 1: Compute the trend-cycle component using either a 2×m-MA or an m-MA
• Step 2: Calculate the detrended series by subtraction
• Step 3: Estimate the seasonal component by averaging the detrended values for that season
• Step 4: Calculate the remainder component by subtraction
Multiplicative decomposition
• Step 1: Same as additive decomposition
• Step 2: Calculate the detrended series by division
• Step 3: Same as additive decomposition
• Step 4: Calculate the remainder component by division

Estimating the trend-cycle
Moving averages (m-MA)
• m is odd-order; for data without seasonality
Centred moving averages (2×m-MA)
• A moving average of an even order m, followed by another moving average of order 2; for data with seasonality
Weighted moving averages
• A 2×m-MA is equivalent to a weighted moving average of order m+1, where all observations take the weight 1/m, except for the first and last terms which take weights 1/(2m)
Examples
• 2×4-MA: trend-cycle of quarterly data
• 2×12-MA: trend-cycle of monthly data
• 7-MA: trend-cycle of daily data with a weekly seasonality
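A minimal sketch of classical additive decomposition built on the 2×m-MA (assuming numpy and an even seasonal period m; function names are illustrative):

```python
import numpy as np

def centred_ma(y, m):
    """2×m-MA: a weighted MA of order m+1 with end weights 1/(2m), aligned to the centre."""
    w = np.full(m + 1, 1.0 / m)
    w[[0, -1]] = 1.0 / (2 * m)
    trend = np.full(len(y), np.nan)
    half = m // 2
    trend[half:len(y) - half] = np.convolve(y, w, mode="valid")
    return trend

def classical_additive(y, m=12):
    trend = centred_ma(y, m)                                    # step 1: trend-cycle
    detrended = y - trend                                       # step 2: detrend by subtraction
    means = np.array([np.nanmean(detrended[i::m]) for i in range(m)])
    seasonal = np.resize(means - means.mean(), len(y))          # step 3: per-season averages, centred
    remainder = y - trend - seasonal                            # step 4: remainder by subtraction
    return trend, seasonal, remainder
```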

Additive decomposition by STL
The three components in the bottom three panels are decomposed from the data in the top panel, and can be added together to reconstruct it
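The same reconstruction can be checked with STL from statsmodels (a sketch; the monthly series here is a placeholder):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

t = np.arange(96)
idx = pd.date_range("2015-01-01", periods=96, freq="MS")
y = pd.Series(0.1 * t + 2 * np.sin(2 * np.pi * t / 12), index=idx)  # placeholder data

res = STL(y, period=12).fit()
reconstructed = res.trend + res.seasonal + res.resid
print(float((reconstructed - y).abs().max()))  # ≈ 0: the components add back to the data
```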

Forecasting with decomposition
To forecast a decomposed time series, forecast the components separately and then recompose them
• Seasonal component: forecast with the seasonal naïve method
• Seasonally adjusted component: forecast with any non-seasonal forecasting method
• Recomposition: additive or multiplicative

Exponential smoothing
Exponential smoothing methods
• The more recent the observation, the higher the associated weight
• Weights decay exponentially as the observations get older
Naïve method
• The most recent observation is the only important one, and all previous observations provide no information for the future.
Average method
• All observations are of equal importance, and gives them equal weights when generating forecasts.

Simple exponential smoothing
This method is suitable for forecasting data with no clear trend or seasonal pattern
Forecast equation: ŷT+1|T = αyT + α(1−α)yT−1 + α(1−α)²yT−2 + ⋯
Weighted average form: ŷt+1|t = αyt + (1−α)ŷt|t−1
Component form:
• Forecast equation: ŷt+h|t = ℓt
• Smoothing equation: ℓt = αyt + (1−α)ℓt−1
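A minimal sketch of the component form (assuming numpy; initialising the level with the first observation is just one simple choice):

```python
import numpy as np

def ses_forecast(y, alpha, h):
    """Simple exponential smoothing; returns the flat h-step-ahead forecast."""
    level = y[0]                                   # l_0 = y_1 (a simple initialisation)
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level  # l_t = α·y_t + (1−α)·l_{t−1}
    return np.full(h, level)                       # ŷ_{T+h|T} = l_T for every h
```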

Trend methods
Holt’s linear trend method: a constant trend indefinitely into the future
• Forecast equation: ŷt+h|t = ℓt + h·bt
• Level equation: ℓt = αyt + (1−α)(ℓt−1 + bt−1)
• Trend equation: bt = β*(ℓt − ℓt−1) + (1−β*)bt−1
Damped trend method: “dampens” the trend to a flat line some time in the future
• Forecast equation: ŷt+h|t = ℓt + (φ + φ² + ⋯ + φ^h)bt
• Level equation: ℓt = αyt + (1−α)(ℓt−1 + φbt−1)
• Trend equation: bt = β*(ℓt − ℓt−1) + (1−β*)φbt−1

Linear trend vs Damped trend
The damping parameter is set to a relatively low value (φ=0.90) to exaggerate the effect of damping for comparison

Holt-Winters’ seasonal method
Additive method
• Seasonal variations are roughly constant through the series
Multiplicative method
• Seasonal variations change in proportion to the level of the series
Damped method
• Available for both additive and multiplicative seasonality

HW additive vs multiplicative forecasts
The forecasts generated by multiplicative seasonality display larger and increasing seasonal variation as the level of the forecasts increases
Multiplicative seasonality fits the training data best judging from the RMSE
• Additive RMSE: 1.763
• Multiplicative RMSE: 1.576
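A sketch of this comparison with statsmodels (the seasonal series below is a placeholder; the smoothing parameters are estimated by the fit):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

t = np.arange(60)
y = pd.Series(50 + t + (5 + 0.2 * t) * np.sin(2 * np.pi * t / 12))  # placeholder data

rmse = lambda e: float(np.sqrt(np.mean(np.square(e))))
fit_add = ExponentialSmoothing(y, trend="add", seasonal="add", seasonal_periods=12).fit()
fit_mul = ExponentialSmoothing(y, trend="add", seasonal="mul", seasonal_periods=12).fit()
print("additive RMSE:      ", rmse(y - fit_add.fittedvalues))
print("multiplicative RMSE:", rmse(y - fit_mul.fittedvalues))
forecasts = fit_mul.forecast(12)  # forecasts from whichever model fits better
```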


Innovations state space models
A statistical model is a stochastic (or random) data-generating process that can produce an entire forecast distribution, not just point forecasts
The term “innovations” comes from the fact that all equations use the same random error process εt∼NID(0,σ²); NID stands for “normally and independently distributed”
Each state space model is labeled as ETS(⋅,⋅,⋅) for (Error, Trend, Seasonal)
• Error = {A,M}, Trend = {N,A,Ad}, Seasonal = {N,A,M}
A fully specified statistical model
• The measurement equation shows the relationship between the observations and the unobserved states
• The state equation shows the evolution of the state (level, trend, seasonal) through time
• Statistical distribution of the errors, either additive or multiplicative

State space equations
ETS(A,N,N): yt = ℓt−1 + εt; ℓt = ℓt−1 + αεt
ETS(M,N,N): yt = ℓt−1(1 + εt); ℓt = ℓt−1(1 + αεt)
ETS(A,A,N): yt = ℓt−1 + bt−1 + εt; ℓt = ℓt−1 + bt−1 + αεt; bt = bt−1 + βεt
ETS(M,A,N): yt = (ℓt−1 + bt−1)(1 + εt); ℓt = (ℓt−1 + bt−1)(1 + αεt); bt = bt−1 + β(ℓt−1 + bt−1)εt

Model estimation and selection
Estimating ETS models
• Smoothing parameters α, β, γ and φ, and initial states ℓ0, b0, s0, s−1, …, s−m+1
• Traditionally: 0 < α, β*, γ*, φ < 1
• State space model: 0 < α < 1, 0 < β < α, 0 < γ < 1−α, 0.8 < φ < 0.98
• By maximising the “likelihood”, instead of minimising the sum of squared errors
Model selection
• Determines which of the ETS models is most appropriate for a given time series
• Using information criteria such as AIC, AICc and BIC, where L is the likelihood of the data and k is the total number of parameters and initial states that have been estimated

Forecasting with ETS models
Point forecasts are obtained by
• Iterating the equations for t = T+1, …, T+h
• Setting all εt = 0 for t > T
The prediction intervals differ
• Between models with additive and multiplicative methods
ETS point forecasts are equal to the medians of the forecast distributions
• Models with only additive components: medians and means are equal
• Models with multiplicative errors, or with multiplicative seasonality: medians and means are not equal
[Example: iterating the ETS(M,A,N) equations forward gives the point forecasts for T+1, T+2, …]

ARIMA (AutoRegressive Integrated Moving Average)
Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting, and provide complementary approaches to the problem
• Exponential smoothing models are based on a description of the trend and seasonality in the data
• ARIMA models aim to describe the autocorrelations in the data

Regression-like models (ARIMA = AR + MA)
Multiple regression model
• A linear combination of predictors or factors
Autoregressive AR(p) model
• A linear combination of past values of the variable
• Normally for stationary data, ensured by constraints on φ
Moving average MA(q) model
• A weighted moving average of the past few forecast errors
• Changing the parameters θ1,…,θq results in different time series patterns
• The variance of the error term εt will only change the scale of the series, not the patterns

Stationarity
Differencing can help stabilise the mean of a time series
• By removing changes in the level of a time series, and therefore eliminating (or reducing) trend and seasonality
• Unit root tests are one way to determine more objectively whether differencing is required or not
Differencing types
• First-order: the differences between consecutive observations
• Second-order: difference the data a second time
• Seasonal: the difference between an observation and the previous observation from the same season
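In pandas these three types are one-liners (a sketch; `y` is assumed to be a monthly pandas Series):

```python
d1 = y.diff()         # first-order: y_t − y_{t−1}
d2 = y.diff().diff()  # second-order: difference the differenced series
d12 = y.diff(12)      # seasonal: y_t − y_{t−12} for monthly data
```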

ARIMA(p,d,q) models
Non-seasonal models
• Combine differencing with autoregression and a moving average model
• p: order of the autoregressive part
• d: degree of first differencing involved
• q: order of the moving average part
Seasonal models
• Include additional seasonal terms in the non-seasonal ARIMA models by multiplication
• The seasonal part of the model consists of terms that are similar to the non-seasonal components of the model, but involve backshifts of the seasonal period

Model estimation and order selection
Estimating ARIMA models: c, φ1,…,φp, θ1,…,θq
• For given p, d, q, and P, D, Q, estimate the parameters by maximising the “likelihood”, which is similar to minimising the sum of squared errors
• d and D are often decided in advance, e.g. by the auto.arima() function
• p, q and P, Q can sometimes be determined from ACF/PACF plots
Order selection: p, q, and/or P, Q
• Determines which of the ARIMA models is most appropriate for a given time series
• Using information criteria such as AIC, AICc and BIC
• L is the likelihood of the data, k=1 if c≠0 and k=0 if c=0.
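A sketch of order selection by AICc with statsmodels (here d=1 is fixed in advance, a small grid of p and q is searched, and the series `y` is assumed given):

```python
from statsmodels.tsa.arima.model import ARIMA

candidates = [(p, 1, q) for p in range(3) for q in range(3)]  # d = 1 decided in advance
fits = {order: ARIMA(y, order=order).fit() for order in candidates}
best_order = min(fits, key=lambda order: fits[order].aicc)
print("selected order:", best_order, "AICc:", fits[best_order].aicc)
point_forecasts = fits[best_order].forecast(steps=12)
```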

Forecasting – point forecasts
Point forecasts for a fitted model, e.g. ARIMA(3,1,1), are obtained in three steps
• Expand: expand the equation so that yt is on the left hand side and all other terms are on the right
• Rewrite: rewrite the equation by replacing t with T+h
• Replace: on the right hand side of the equation, replace future observations with their forecasts, future errors with zero, and past errors with the corresponding residuals
Starting with h=1, these steps are repeated for h=2, 3, … (T+1, T+2, …) until all forecasts have been calculated

Forecasting – prediction intervals
The prediction intervals for ARIMA models are based on the assumptions that the residuals are uncorrelated and normally distributed
• If either of these assumptions does not hold, the prediction intervals may be incorrect
Prediction intervals from ARIMA models increase as the forecast horizon increases
• For stationary models (d=0), prediction intervals converge, so intervals at long horizons are all essentially the same
• For d≥1, prediction intervals continue to grow into the future
The calculation of ARIMA prediction intervals is more difficult, but the first prediction interval is easy to calculate
• A 95% prediction interval is given by ŷT+1|T ± 1.96σ̂, where σ̂ is the standard deviation of the residuals
• This result is true for all ARIMA models regardless of their parameters and orders

ARIMA vs ETS
It is a commonly held myth that ARIMA models are more general than exponential smoothing
Linear exponential smoothing models are all special cases of ARIMA models, while the non-linear exponential smoothing models have no equivalent ARIMA counterparts
There are also many ARIMA models that have no exponential smoothing counterparts
All ETS models are non-stationary, while some ARIMA models are stationary
The AICc is useful for selecting between models in the same class, but not for choosing between ARIMA and ETS models

Regression model: linear model
Predictor variables represent impacting factors, NOT historical values of the forecast variable
Simple linear regression
• A linear relationship between the forecast variable y and a single predictor variable x
Multiple linear regression
• A linear relationship between the forecast variable y and multiple predictor variables x1, x2, …, xk

Some useful predictors
Trend variable
• A linear trend can be modelled by simply using x1,t=t as a predictor
Dummy variables
• Two categories (e.g., “yes” and “no” for public holiday, outlier and so on)
• One dummy variable which takes value 1 corresponding to “yes” and 0 corresponding to “no”
Seasonal dummy variables
• Six dummy variables are needed to code seven categories (e.g., day of the week)
Example: a regression model with a linear trend and quarterly dummy variables
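A sketch of this example with statsmodels (simulated quarterly data; three dummies code the four quarters, with Q1 as the baseline):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

n = 40
t = np.arange(1, n + 1)
quarter = (np.arange(n) % 4) + 1
rng = np.random.default_rng(0)
y = 10 + 0.5 * t + np.array([0.0, 3.0, -2.0, 1.0])[quarter - 1] + rng.normal(0, 1, n)

X = pd.DataFrame({"trend": t})
dummies = pd.get_dummies(quarter, prefix="q", drop_first=True).astype(float)
X = sm.add_constant(X.join(dummies))
fit = sm.OLS(y, X).fit()
print(fit.params)  # intercept, trend slope, and quarterly effects relative to Q1
```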


Some useful predictors
Intervention variables (e.g. competitor activity)
• ‘Spike’ variable: takes value one in the period of the intervention and zero elsewhere
• ‘Step’ variable: takes value zero before the intervention and one from the time of intervention onward
Trading days (e.g. sales)
• The number of trading days in a month
Distributed lags (e.g. advertising expenditure)
• Several predictors with different lagged values

Model estimation and predictor selection
Estimating regression models: β0,β1,…,βk
• For given predictors, estimate parameters by minimising the sum of squared errors
• With n predictors, there are 2^n possible models
• Best subset regression: try all potential regression models
• Stepwise regression: remove one predictor at a time
Predictor selection:
• Determines which of the regression models is most appropriate for a given time series
• Using measures such as AdjR2, CV, AIC, AICc, BIC, where T is the number of observations used for estimation and k is the number of predictors in the model

Forecasting with regression
Ordinary regression model
• Good at capturing relationships between forecast variable and predictor variables
• Bad at forecasting, and requires future values of each predictor
• In many cases, generating forecasts for the predictor variables can be more challenging than forecasting the target variable directly without using predictors
Predictive regression model
• Use lagged values of the predictors
• Makes the model operational for easily generating forecasts

Nonlinear regression
Transformations: log-linear or linear-log, applied before estimating a regression model
• Transform the forecast variable y and/or the predictor variable x
Nonlinear functions
• Piecewise linear
• Linear regression splines
• Cubic regression splines

Forecasting with a nonlinear trend
• Straightforward when predictors are known into the future (calendar-related variables such as time or day-of-week)
• Difficult when predictors are themselves unknown
Quadratic or higher-order trends
• Not recommended, since the resulting forecasts are often unrealistic when they are extrapolated
Piecewise trends are better
• A nonlinear trend constructed of linear pieces
Log-linear forecasting
• For exponential trends

Evaluation
• (Nested) cross validation
• Forecast error types and measures

Residuals vs Forecast error
Residual
• Checking whether a model has adequately captured the information in the data
• The “residuals” in a time series model are what is left over after fitting a model: the difference between an observed value and its fitted value
• Training set
• One-step forecasts
Forecast error
• Evaluating forecast accuracy using genuine forecasts on new data that were not used when fitting the model
• A forecast “error” is the difference between an observed value and its forecast
• Test set
• Can involve multi-step forecasts

Overfitting
The cubic spline gives the best fit to the historical data but poor forecasts
A model which fits the training data well will not necessarily forecast well.
A perfect fit can always be obtained by using a model with enough parameters.
Over-fitting a model to data is just as bad as failing to identify a systematic pattern in the data.

Cross-Validation
Purpose
• Tune hyperparameters
• Produce robust measurements of model performance
Hold-out cross-validation
• Split the dataset into a subset called the training set and another subset called the test set
• Split the training set into a training subset and a validation set if any parameters need to be tuned
• The model is trained on the training subset, and the parameters that minimise error on the validation set are chosen
• The model is then trained on the full training set using the chosen parameters, and the error on the test set is recorded

K-fold Cross-Validation (less biased model)
• Every observation from the original dataset has the chance of appearing in the training and test sets
• Split the data randomly into K folds; a higher value of K leads to a less biased model, while a lower value of K is similar to the train-test split approach
• Fit the model using K−1 folds and validate it using the remaining fold, noting down the scores/errors
• Repeat this process until every fold has served as the test set, then take the average of the recorded scores as the performance metric for the model

Nested Cross-Validation (Time series)
• Gives an almost unbiased estimate of the true error
Prevent data leakage
• Withhold all data about events that occur chronologically after the events used for fitting the model
Dependent test sets
• An outer loop for error estimation
• An inner loop for parameter tuning

Nested CV with 1 or 3 train/test splits
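A sketch of the outer error-estimation loop with scikit-learn's TimeSeriesSplit, which only ever tests on observations that come chronologically after the training data (a naïve forecast stands in for a real model):

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

y = np.arange(100, dtype=float)  # placeholder series
maes = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(y):
    train, test = y[train_idx], y[test_idx]
    forecast = np.full(len(test), train[-1])    # naïve forecast as a stand-in model
    maes.append(np.mean(np.abs(test - forecast)))
print("cross-validated MAE:", np.mean(maes))
```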

Forecast errors
Scale-dependent errors
• On the same scale as the data
• Cannot be used to make comparisons between series that involve different units
• Accuracy measures: MAE, RMSE
Percentage errors
• Unit free
• Frequently used to compare forecast performances between data sets
• Accuracy measures: MAPE, sMAPE
Scaled errors
• Based on the training MAE from a simple forecast method
• An alternative to using percentage errors
• Accuracy measures: MASE

Scale-dependent measures
Scale-dependent error
• The difference between an observed value and its forecast
Mean absolute error: MAE
• Popular as it is easy to both understand and compute
• A forecast method that minimises the MAE will lead to forecasts of the median
Root mean squared error: RMSE
• The RMSE is also widely used, despite being more difficult to interpret
• A forecast method that minimises the RMSE will lead to forecasts of the mean

Measures for percentage error
Percentage error
• A scale-dependent error divided by the observed value
Mean absolute percentage error: MAPE
• Disadvantages:
• Infinite or undefined when observations are zero
• Puts a heavier penalty on negative errors than on positive errors
• Assumes the unit of measurement has a meaningful zero
Symmetric MAPE: sMAPE
• Still involves division by a number close to zero, making the calculation unstable
• The value of sMAPE can be negative, so it is not really a measure of “absolute percentage errors” at all

Measures for scale errors
Scaled error
• Scaling the errors based on the training MAE from a simple forecast method
Which benchmark for scaling?
• Non-seasonal time series: naïve forecasts
• Seasonal time series: seasonal naïve forecasts
Mean absolute scaled error: MASE
• qj < 1: the forecast is better than the average naïve forecast computed on the training data
• qj > 1: the forecast is worse than the average naïve forecast
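The measures above in a few lines (a sketch assuming numpy; `y` is the test data, `f` the forecasts, and `y_train` the training set used to scale MASE):

```python
import numpy as np

def mae(y, f):   return np.mean(np.abs(y - f))
def rmse(y, f):  return np.sqrt(np.mean((y - f) ** 2))
def mape(y, f):  return 100 * np.mean(np.abs((y - f) / y))  # undefined if any y is 0
def smape(y, f): return 100 * np.mean(2 * np.abs(y - f) / (np.abs(y) + np.abs(f)))

def mase(y, f, y_train, m=1):
    """m = 1 scales by naïve forecasts; m = seasonal period scales by seasonal naïve."""
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))  # training MAE of the naïve method
    return np.mean(np.abs(y - f)) / scale
```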

Consistent results among measures?

Machine learning methods
• Statistical methods vs machine learning, M4 competition
• Recurrent neural networks
• Hybrid models
• Global vs local models

Forecast distribution and uncertainty
• Prediction interval, quantile
• Bootstrapping and bagging

Case study: Projects in TrønderEnergi
• Wind power production forecasting
• Grid loss forecasting