Week-2 The Forecaster’s Toolbox

Some of the slides are adapted from the lecture notes provided by Prof. Antoine Saure and Prof. Rob Hyndman

Business Forecasting Analytics
ADM 4307 – Fall 2021

The Forecaster’s Toolbox

Ahmet Kandakoglu, PhD

20 September, 2021

Outline

• A tidy forecasting workflow

• Some simple forecasting methods

• Naïve method

• Seasonal naïve method

• Average method

• Drift method

• Residual diagnostics

• Evaluating forecasting accuracy

• Prediction intervals

• Questions

Fall 2021 ADM 4307 Business Forecasting Analytics 2

A Tidy Forecasting Workflow

• The process of producing forecasts for time series data can be broken down into a few steps.

• To illustrate the process, we will fit linear trend models to national GDP data stored in global_economy.

Data Preparation (Tidy)

• The first step in forecasting is to prepare data in the correct format.

• This process may involve loading in data, identifying missing values, filtering the time series, and other pre-processing tasks.

# Assumed setup: fpp3 attaches tsibble, tsibbledata (global_economy), fable and feasts
library(fpp3)

gdppc <- global_economy %>%
  mutate(GDP_per_capita = GDP / Population) %>%
  select(Year, Country, GDP, Population, GDP_per_capita)

> gdppc
# A tsibble: 15,150 x 5 [1Y]
# Key: Country [263]
    Year Country             GDP Population GDP_per_capita
 1  1960 Afghanistan  537777811.    8996351           59.8
 2  1961 Afghanistan  548888896.    9166764           59.9
 3  1962 Afghanistan  546666678.    9345868           58.5
 4  1963 Afghanistan  751111191.    9533954           78.8
 5  1964 Afghanistan  800000044.    9731361           82.2
 6  1965 Afghanistan 1006666638.    9938414          101.
 7  1966 Afghanistan 1399999967.   10152331          138.
 8  1967 Afghanistan 1673333418.   10372630          161.
 9  1968 Afghanistan 1373333367.   10604346          130.
10  1969 Afghanistan 1408888922.   10854428          130.
# … with 15,140 more rows

Plot the Data (Visualize)

• An essential step in understanding the data.

• Looking at your data allows you to identify common patterns, and subsequently specify an appropriate model.

gdppc %>%
  filter(Country == "Sweden") %>%
  autoplot(GDP_per_capita) +
  labs(y = "$US", title = "GDP per capita for Sweden")

Define a Model (Specify)

• There are many different time series models that can be used for forecasting.

• In this case the model function is TSLM() (time series linear model), the response variable is GDP_per_capita, and it is being modelled using trend().

TSLM(GDP_per_capita ~ trend())

Train the Model (Estimate)

• Once an appropriate model is specified, we next train the model on some data.

• One or more models can be trained using the model() function.

• A mable is a model table; each cell corresponds to a fitted model.

fit <- gdppc %>%
  model(trend_model = TSLM(GDP_per_capita ~ trend()))

> fit
# A mable: 263 x 2
# Key: Country [263]
   Country             trend_model
 1 Afghanistan              <TSLM>
 2 Albania                  <TSLM>
 3 Algeria                  <TSLM>
 4 American Samoa           <TSLM>
 5 Andorra                  <TSLM>
 6 Angola                   <TSLM>
 7 Antigua and Barbuda      <TSLM>
 8 Arab World               <TSLM>
 9 Argentina                <TSLM>
10 Armenia                  <TSLM>
# … with 253 more rows
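For a single series, a linear trend model of this kind corresponds to an ordinary least-squares regression on a time index. The following is a minimal base-R sketch of that idea, not the TSLM() implementation: the series `y` contains made-up values (not taken from global_economy), and lm() and predict() stand in for model() and forecast().

```r
# Hypothetical annual series (illustrative values only, not real GDP data)
y <- c(10.2, 11.1, 11.9, 13.2, 13.8, 15.1)
trend <- seq_along(y)                 # time index 1, 2, ..., T

# Ordinary least-squares fit of y on the time index
fit_lm <- lm(y ~ trend)

# Forecast the next h periods by extending the time index
h <- 3
new_trend <- data.frame(trend = length(y) + seq_len(h))
fc <- predict(fit_lm, newdata = new_trend)
print(fc)
```

Each additional step ahead changes the forecast by the fitted slope, which is what a linear trend implies.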

Check Model Performance (Evaluate)

• Once a model has been fitted, it is important to check how well it has performed on the data.

• There are several diagnostic tools available to check model behavior.

• There are also accuracy measures that allow one model to be compared against another.

Produce Forecasts (Forecast)

• With an appropriate model specified, estimated and checked, it is time to produce the forecasts using forecast().

• Specify the number of future observations to forecast. For example, forecasts for the next 3 years can be generated using h = 3. We can also use natural language; e.g., h = "3 years".

• A fable is a forecast table with point forecasts and distributions.

> fit %>% forecast(h = "3 years")
# A fable: 789 x 5 [1Y]
# Key: Country, .model [263]
   Country        .model       Year    GDP_per_capita  .mean
 1 Afghanistan    trend_model  2018      N(526, 9653)   526.
 2 Afghanistan    trend_model  2019      N(534, 9689)   534.
 3 Afghanistan    trend_model  2020      N(542, 9727)   542.
 4 Albania        trend_model  2018   N(4716, 476419)  4716.
 5 Albania        trend_model  2019   N(4867, 481086)  4867.
 6 Albania        trend_model  2020   N(5018, 486012)  5018.
 7 Algeria        trend_model  2018   N(4410, 643094)  4410.
 8 Algeria        trend_model  2019   N(4489, 645311)  4489.
 9 Algeria        trend_model  2020   N(4568, 647602)  4568.
10 American Samoa trend_model  2018  N(12491, 652926) 12491.
# … with 779 more rows

Visualizing Forecasts

• The forecasts can be plotted together with the historical data by passing the original data (gdppc) to autoplot().

fit %>%
  forecast(h = "3 years") %>%
  filter(Country == "Sweden") %>%
  autoplot(gdppc) +
  labs(y = "$US", title = "GDP per capita for Sweden")

Some Simple Forecasting Methods

• Some forecasting methods are very simple and surprisingly effective.

• Here are four methods that we will use as benchmarks for other forecasting methods:

• Average method

• Naïve method

• Seasonal naïve method

• Drift method

Average Method

• Forecast of all future values is equal to mean of historical data $\{y_1, \dots, y_T\}$.

$$\hat{y}_{T+h|T} = \bar{y} = (y_1 + \dots + y_T)/T$$

MEAN(y)

# y contains the time series

bricks <- aus_production %>%
  filter_index("1970 Q1" ~ "2004 Q4")

bricks %>% model(MEAN(Bricks))

Naïve Method

• Forecasts equal to last observed value.

• Simple to use and understand, very low cost and low accuracy

$$\hat{y}_{T+h|T} = y_T$$

NAIVE(y)

bricks %>% model(NAIVE(Bricks))

Seasonal Naïve Method

• Forecasts equal to last value from same season.

$$\hat{y}_{T+h|T} = y_{T+h-m(k+1)}$$

($m$ = seasonal period and $k$ is the integer part of $(h-1)/m$)

SNAIVE(y ~ lag(m))

bricks %>% model(SNAIVE(Bricks ~ lag("year")))
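The average, naïve and seasonal naïve formulas can be checked by hand on a small example. The sketch below uses base R only and a made-up quarterly series `y` (an assumption for illustration, not the bricks data); it computes the point forecasts directly from the formulas rather than through MEAN(), NAIVE() or SNAIVE().

```r
# Hypothetical quarterly series: 3 years of made-up values
y <- c(20, 35, 50, 28,  22, 37, 53, 30,  25, 40, 55, 33)
n_obs <- length(y)   # T in the slides
m <- 4               # seasonal period (quarterly data)
h <- 1:6             # forecast horizons

# Average method: every forecast equals the historical mean
fc_mean <- rep(mean(y), length(h))

# Naive method: every forecast equals the last observation y_T
fc_naive <- rep(y[n_obs], length(h))

# Seasonal naive: y_{T+h-m(k+1)}, with k the integer part of (h-1)/m
k <- (h - 1) %/% m
fc_snaive <- y[n_obs + h - m * (k + 1)]

print(rbind(fc_mean, fc_naive, fc_snaive))
```

The seasonal naïve forecasts repeat the last observed year, so horizons 5 and 6 reuse the same values as horizons 1 and 2.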

Drift Method

• A variation on the naïve method is to allow the forecasts to increase or decrease over time, where the amount of change over time (called the drift) is set to be the average change seen in the historical data.

• So the forecast for time $T + h$ is given by:

$$\hat{y}_{T+h|T} = y_T + \frac{h}{T-1} \sum_{t=2}^{T} (y_t - y_{t-1}) = y_T + h \left( \frac{y_T - y_1}{T-1} \right)$$

• This is equivalent to drawing a line between the first and last observation, and extrapolating it into the future.

RW(y ~ drift())

Drift Method

bricks %>% model(RW(Bricks ~ drift()))
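The two forms of the drift forecast (average historical change versus the line through the first and last observations) can be verified numerically. This is a minimal base-R sketch on a made-up series `y`, assumed purely for illustration; it does not use RW() or drift().

```r
# Hypothetical series (illustrative values only)
y <- c(100, 104, 103, 109, 112, 118)
n_obs <- length(y)   # T in the slides
h <- 1:4             # forecast horizons

# Form 1: last value plus h times the average one-step change
drift <- mean(diff(y))
fc_sum <- y[n_obs] + h * drift

# Form 2: extrapolate the line through the first and last observations
fc_line <- y[n_obs] + h * (y[n_obs] - y[1]) / (n_obs - 1)

# The two forms agree because the differences telescope to y_T - y_1
stopifnot(isTRUE(all.equal(fc_sum, fc_line)))
print(fc_line)
```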

Example 2 – Australian Quarterly Beer Production

# Set training data from 1992 to 2006
train <- aus_production %>%
  filter_index("1992 Q1" ~ "2006 Q4")

# Fit the models
beer_fit <- train %>%
  model(
    Mean = MEAN(Beer),
    Drift = RW(Beer ~ drift()),
    `Naïve` = NAIVE(Beer),
    `Seasonal naïve` = SNAIVE(Beer)
  )

# Generate forecasts for 14 quarters
beer_fc <- beer_fit %>% forecast(h = 14)

# Plot forecasts against actual values
beer_fc %>%
  autoplot(train, level = NULL) +
  autolayer(
    filter_index(aus_production, "2007 Q1" ~ .),
    colour = "black"
  ) +
  labs(y = "Megalitres", title = "Forecasts for quarterly beer production") +
  guides(colour = guide_legend(title = "Forecast"))

Fitted Values and Residuals

• A residual in forecasting is the difference between an observed value and its forecast based on other observations:

$$e_i = y_i - \hat{y}_i$$

• For time series forecasting, a residual is based on one-step forecasts; that is, $\hat{y}_{t|t-1}$ is the forecast of $y_t$ based on observations $y_1, \dots, y_{t-1}$.

• The values $\hat{y}_{t|t-1}$ are also called fitted values.

Fitted Values and Residuals

The fitted values and residuals from a model can be obtained using the augment() function.

augment(beer_fit)

> augment(beer_fit)

# A tsibble: 240 x 6 [1Q]

# Key: .model [4]

.model Quarter Beer .fitted .resid .innov

1 Mean 1992 Q1 443 436. 6.55 6.55

2 Mean 1992 Q2 410 436. -26.4 -26.4

3 Mean 1992 Q3 420 436. -16.4 -16.4

4 Mean 1992 Q4 532 436. 95.6 95.6

5 Mean 1993 Q1 433 436. -3.45 -3.45

6 Mean 1993 Q2 421 436. -15.4 -15.4

7 Mean 1993 Q3 410 436. -26.4 -26.4

8 Mean 1993 Q4 512 436. 75.6 75.6

9 Mean 1994 Q1 449 436. 12.6 12.6

10 Mean 1994 Q2 381 436. -55.4 -55.4

# … with 230 more rows

Residual Diagnostics

• A good forecasting method will yield residuals with the following properties:

• The residuals are uncorrelated. If there are correlations between residuals, then there is information left in the residuals which should be used in computing forecasts.

• The residuals have zero mean. If the residuals have a mean other than zero, then the forecasts are biased.

• Any forecasting method that does not satisfy these properties can be improved. That does not mean that forecasting methods that satisfy these properties cannot be improved.

Residual Diagnostics

• It is possible to have several forecasting methods for the same data set, all of which satisfy these properties. Checking these properties is important to see if a method is using all available information well, but it is not a good way to select a forecasting method.

• If either of these two properties is not satisfied, then the forecasting method can be modified to give better forecasts.

• Adjusting for bias is easy: if the residuals have mean $m$, then simply add $m$ to all forecasts and the bias problem is solved.
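The bias adjustment can be illustrated on a toy example. In this base-R sketch the observations `y` and fitted values `fitted` are made-up numbers (assumptions, not output from any model in these slides), and `m_bias` plays the role of the residual mean $m$.

```r
# Hypothetical observations and one-step fitted values from some method
y      <- c(12, 15, 14, 18, 17, 21)
fitted <- c(10, 13, 13, 15, 16, 18)

# Residuals and their mean (the bias)
resid <- y - fitted
m_bias <- mean(resid)

# Adding the residual mean to the fitted values removes the bias
adjusted <- fitted + m_bias
print(m_bias)
print(mean(y - adjusted))
```

The same constant would be added to future forecasts; here it is applied to the fitted values only to show that the adjusted residuals average to zero.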

Residual Diagnostics

• In addition to these essential properties, it is useful (but not necessary) for the residuals to also have the following two properties:

• The residuals have constant variance

• The residuals are normally distributed

• These two properties make the calculation of prediction intervals easier. However, a forecasting method that does not satisfy these properties cannot necessarily be improved.

Example: Google Stock Price

Naïve forecast:

$$\hat{y}_{t|t-1} = y_{t-1}$$

$$e_t = y_t - y_{t-1}$$

Note: the $e_t$ are one-step-forecast residuals.
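For the naïve method, the fitted values are the series shifted by one period and the residuals are the first differences. A minimal base-R sketch, assuming a short made-up price series `y` rather than the actual Google data:

```r
# Hypothetical daily closing prices (illustrative values only)
y <- c(500, 503, 498, 510, 507)

# Fitted values: the previous observation; none is available for t = 1
fitted <- c(NA, y[-length(y)])

# One-step residuals e_t = y_t - y_{t-1}
resid <- y - fitted

# Apart from the missing first value, these are the first differences
stopifnot(isTRUE(all.equal(resid[-1], diff(y))))
print(data.frame(y, fitted, resid))
```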

Example: Google Stock Price

# google_2015 is assumed to be a tsibble of Google's 2015 daily closing prices, created beforehand
autoplot(google_2015, Close) +
  labs(y = "$US", title = "Google daily closing stock prices in 2015")

Example: Google Stock Price

google_2015 %>%

model(NAIVE(Close)) %>%

gg_tsresiduals()

ACF of Residuals

• We assume that the residuals are white noise (uncorrelated, mean zero, constant variance). If they aren’t, then there is information left in the residuals that should be used in computing forecasts.

• So, a standard residual diagnostic is to check the ACF of the residuals of a forecasting method.

• We expect these to look like white noise.
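To see what "looks like white noise" means for an ACF, the sketch below simulates white noise in base R and counts how many sample autocorrelations fall outside the usual approximate 95% bounds of $\pm 1.96/\sqrt{T}$. The simulated series `e` and the choice of 10 lags are assumptions for illustration; acf() is from the stats package, not the feasts tools used elsewhere in these slides.

```r
set.seed(1)

# Simulated white noise standing in for model residuals
e <- rnorm(200)

# Sample autocorrelations at lags 1 to 10 (lag 0 is dropped)
r <- acf(e, lag.max = 10, plot = FALSE)$acf[-1]

# Approximate 95% bounds for the autocorrelations of white noise
bound <- 1.96 / sqrt(length(e))

print(round(r, 3))
print(sum(abs(r) > bound))   # number of lags outside the bounds
```

For genuine white noise, roughly 1 in 20 autocorrelations is expected to fall outside the bounds by chance, so an occasional small spike is not by itself evidence of remaining structure.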

Forecast Errors

• Forecast “error”: the difference between an observed value and its forecast

$$e_i = y_i - \hat{y}_i$$

• Unlike residuals, forecast errors on the test set involve multi-step forecasts.

• These are true forecast errors as the test data is not used in computing the forecast $\hat{y}_i$.

Measures of Forecast Accuracy

• Key measures to evaluate the accuracy:

Mean absolute error: $\text{MAE} = \text{mean}(|e_i|)$

Mean squared error: $\text{MSE} = \text{mean}(e_i^2)$

Mean absolute percentage error: $\text{MAPE} = 100 \, \text{mean}(|e_i| / |y_i|)$

Root mean squared error: $\text{RMSE} = \sqrt{\text{mean}(e_i^2)}$

• MAE, MSE, RMSE are all scale dependent.

• MAPE is scale independent but is only sensible if $y_i \gg 0$ for all $i$, and $y$ has a natural zero.
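The four measures translate directly into one-line R functions. This is a minimal base-R sketch; the helper names (`mae`, `mse`, `rmse`, `mape`) and the test values are assumptions for illustration and are separate from the accuracy() function used in these slides.

```r
# Accuracy measures as functions of the errors e (and observations y for MAPE)
mae  <- function(e) mean(abs(e))
mse  <- function(e) mean(e^2)
rmse <- function(e) sqrt(mean(e^2))
mape <- function(e, y) 100 * mean(abs(e) / abs(y))

# Hypothetical test-set observations and forecasts
y  <- c(100, 200, 400)
fc <- c(110, 190, 380)
e  <- y - fc          # forecast errors: -10, 10, 20

print(c(MAE = mae(e), MSE = mse(e), RMSE = rmse(e), MAPE = mape(e, y)))
```

Multiplying `y` and `fc` by a constant rescales MAE and RMSE by that constant (and MSE by its square) but leaves MAPE unchanged, which is what scale dependence means here.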

Example – Australian Quarterly Beer Production

recent_production <- aus_production %>%
  filter(year(Quarter) >= 1992)

beer_train <- recent_production %>%
  filter(year(Quarter) <= 2007)

beer_fit <- beer_train %>%
  model(
    Mean = MEAN(Beer),
    `Naïve` = NAIVE(Beer),
    `Seasonal naïve` = SNAIVE(Beer),
    Drift = RW(Beer ~ drift())
  )

beer_fc <- beer_fit %>% forecast(h = 10)

beer_fc %>%
  autoplot(
    aus_production %>% filter(year(Quarter) >= 1992),
    level = NULL
  ) +
  labs(
    y = "Megalitres",
    title = "Forecasts for quarterly beer production"
  ) +
  guides(colour = guide_legend(title = "Forecast"))

Example – Australian Quarterly Beer Production

accuracy(beer_fc, recent_production)

• It is obvious from the graph that the seasonal naïve method is best for these data, although it can still be improved.

• Sometimes, different accuracy measures will lead to different results as to which forecast method is best.

Method                  RMSE    MAE   MAPE  MASE
Drift method           64.90  58.88  14.58  4.12
Mean method            38.45  34.83   8.28  2.44
Naïve method           62.69  57.40  14.18  4.01
Seasonal naïve method  14.31  13.40   3.17  0.94

Training and Test Sets

• It is important to evaluate forecast accuracy using genuine forecasts.

• It is not valid to judge forecast accuracy by how well a model fits the historical data. The accuracy of forecasts can only be determined by considering how well a model performs on new data that were not used when fitting the model.

• When choosing models, it is common to use a portion of the available data for fitting and use the rest of the data for testing the model.

Training and Test Sets

• The following points should be noted:

• A model which fits the data well does not necessarily forecast well

• A perfect fit can always be obtained by using a model with enough parameters

• Over-fitting a model to data is as bad as failing to identify the systematic pattern in the data

• The test set must not be used for any aspect of model development or calculation of forecasts.

• Forecast accuracy is based only on the test set
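The training/test discipline can be sketched in a few lines of base R. The quarterly series `y` below is made up for illustration, and the forecasts are computed by hand from the naïve and seasonal naïve formulas rather than with model() and forecast(); the point is only that the forecasts use the training data alone and the errors use the test data alone.

```r
# Hypothetical quarterly series: 4 years of made-up values
y <- c(20, 35, 50, 28,  22, 37, 53, 30,  25, 40, 55, 33,  27, 43, 58, 35)
m <- 4                        # seasonal period
n_test <- 4                   # hold out the last year

# Split: the test set plays no part in computing the forecasts
train <- head(y, -n_test)
test  <- tail(y, n_test)

# Forecasts built from the training data only
fc_naive  <- rep(tail(train, 1), n_test)
fc_snaive <- tail(train, m)[(seq_len(n_test) - 1) %% m + 1]

# Accuracy measured on the test set only
rmse <- function(e) sqrt(mean(e^2))
test_rmse <- c(naive = rmse(test - fc_naive), snaive = rmse(test - fc_snaive))
print(test_rmse)
```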

Prediction Intervals

• A prediction interval gives an interval within which we expect $y_t$ to lie with a specified probability.

• Assuming the forecast errors are uncorrelated and normally distributed, a simple 95% prediction interval for the next observation in a time series is

$$\hat{y}_t \pm 1.96 \, \hat{\sigma}$$

where $\hat{\sigma}$ is an estimate of the standard deviation of the forecast distribution.

• When forecasting one step ahead, the standard deviation of the forecast distribution is almost the same as the standard deviation of the residuals.

Prediction Intervals

• Naive forecast with prediction interval

google_2015 %>%

model(NAIVE(Close)) %>%

forecast(h = 10) %>% hilo()

• The hilo() function converts the forecast distributions into intervals.

• By default, 80% and 95% prediction intervals are returned, although other options are possible via the level argument.

# A tsibble: 10 x 7 [1]

# Key: Symbol, .model [1]

Symbol .model day Close .mean `80%` `95%`

1 GOOG NAIVE(Close) 253 N(759, 125) 759. [745, 773]80 [737, 781]95

2 GOOG NAIVE(Close) 254 N(759, 250) 759. [739, 779]80 [728, 790]95

3 GOOG NAIVE(Close) 255 N(759, 376) 759. [734, 784]80 [721, 797]95

4 GOOG NAIVE(Close) 256 N(759, 501) 759. [730, 788]80 [715, 803]95

5 GOOG NAIVE(Close) 257 N(759, 626) 759. [727, 791]80 [710, 808]95

6 GOOG NAIVE(Close) 258 N(759, 751) 759. [724, 794]80 [705, 813]95

7 GOOG NAIVE(Close) 259 N(759, 876) 759. [721, 797]80 [701, 817]95

8 GOOG NAIVE(Close) 260 N(759, 1002) 759. [718, 799]80 [697, 821]95

9 GOOG NAIVE(Close) 261 N(759, 1127) 759. [716, 802]80 [693, 825]95

10 GOOG NAIVE(Close) 262 N(759, 1252) 759. [714, 804]80 [690, 828]95
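The intervals in this output can be reproduced from the normal forecast distributions with the formula $\hat{y} \pm z\,\hat{\sigma}$. The base-R sketch below takes the rounded mean (759) and one-step variance (125) from the first row of the output, and assumes the variance grows in proportion to the horizon, which matches the printed values 125, 250, 376, and so on; it is an illustration of the arithmetic, not the hilo() implementation.

```r
# Values read off the output above (rounded as printed)
mu <- 759          # point forecast
sigma2 <- 125      # one-step forecast variance
h <- 1:3           # forecast horizons

# Standard deviation at horizon h, assuming variance proportional to h
sd_h <- sqrt(sigma2 * h)

# Normal quantiles for 80% and 95% intervals
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)

intervals <- data.frame(
  h = h,
  lo80 = mu - z80 * sd_h, hi80 = mu + z80 * sd_h,
  lo95 = mu - z95 * sd_h, hi95 = mu + z95 * sd_h
)
print(round(intervals))
```

Because the inputs are rounded, the results can differ from the printed intervals in the last digit at longer horizons.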

Prediction Intervals

google_2015 %>%
  model(NAIVE(Close)) %>%
  forecast(h = 10) %>%
  autoplot(google_2015) +
  labs(title = "Google daily closing stock price", y = "$US")

Prediction Intervals

• Point forecasts are often useless without prediction intervals.

• The value of prediction intervals is that they express the uncertainty in the forecasts.

• If we only produce point forecasts, there is no way of telling how accurate the forecasts are.

• But if we also produce prediction intervals, then it is clear how much uncertainty is associated with each forecast.

Questions (True or False)

• Good forecast methods should have normally distributed residuals.

• A model with small residuals will give good forecasts.

• The best measure of forecast accuracy is MAPE.

• If your model doesn’t forecast well, you should make it more complicated.

• Always choose the model with the best forecast accuracy as measured on the test set.
