Week-2 Time Series Graphics
Some of the slides are adapted from the lecture notes provided by Prof. Antoine Saure and Prof. Rob Hyndman
Business Forecasting Analytics
ADM 4307 – Fall 2021
Time Series Graphics
Ahmet Kandakoglu, PhD
20 September, 2021
Outline
• Review of last lecture
• Overview of forecasting techniques
• Time series
• tsibble objects in R
• Time plots
• Time series patterns
• Seasonal plots
• Scatter plots
Fall 2021 ADM 4307 Business Forecasting Analytics 2
What is Forecasting?
It is the process of making predictions of the future based on past and present data.
I see that you will get a 90 in Forecasting Analytics this semester.
Features Common to All Forecasts
• Assume a causal system: past ==> future
• Forecasts are rarely perfect because of randomness
• Forecasts are more accurate for groups than for individuals
• Forecast accuracy decreases as the time horizon increases
Approaches to Forecasting
• Qualitative: Judgmental methods
• Non-quantitative analysis of subjective inputs
• Considers “soft” information such as human factors, experience, gut instinct
• Quantitative: Analyze “hard” data
• Time series models
• Extends historical patterns of numerical data
• Associative (causal) models
• Create equations with explanatory variables to predict the future
Quantitative Forecasting
• Conditions for their application:
• Information about the past is available
• Information can be quantified in numerical data
• Some aspects of the past pattern will continue into the future (continuity assumption)
• Two extremes:
• Intuitive or ad hoc methods (simple, based on empirical experience, with no information about accuracy)
• Formal quantitative methods based on statistical principles
Quantitative Forecasting
• Time series models:
• Prediction of the future is based on past values of a variable and/or past errors
• The goal is to determine the pattern in the historical data series and extrapolate that pattern into the future
• A black box that makes no attempt to discover the factors affecting the behavior of the forecast variable
• Explanatory models:
• Assume that the variable to be forecast has an explanatory relationship with one or more independent variables
• The goal is to determine the form of the relationship and use it to forecast future values of the forecast variable
Quantitative Forecasting – An Example
Gross Domestic Product (GDP) is a measure of a country’s economic performance
• Time series models:
• $\mathrm{GDP}_{t+1} = f(\mathrm{GDP}_{t}, \mathrm{GDP}_{t-1}, \mathrm{GDP}_{t-2}, \mathrm{GDP}_{t-3}, \ldots, \text{error})$
• Explanatory model:
• GDP = f(monetary and fiscal policies, inflation, capital spending, imports, exports, error)
Time series models can often be used more easily to forecast, whereas explanatory models can be used with greater success for policy and decision making
Qualitative Forecasting
• Do not require data in the same manner as quantitative forecasting methods
• The inputs required are mainly the product of judgment and accumulated knowledge
• Used mainly to provide hints, to aid the planner, and to supplement quantitative forecasts, rather than to provide a specific numerical forecast
• Used almost exclusively for medium- and long-term situations
• Frequently, the only alternative is no forecast at all
Time Series and Cross-Sectional Data
• Time Series:
• Historical data that consist of a sequence of observations over time
• We will assume that the observation times are equally spaced
• Example: monthly Australian beer production (megaliters, ML) from January 1991–August 1995
• Cross-Sectional Data:
• All observations are from the same time
• Example: price ($US), mileage (mpg) and country of origin for 45 automobiles from Consumer Reports, April 1990, pp. 235–255
tsibble Objects in R
• A tsibble allows storage and manipulation of multiple time series in R.
• It contains:
• An index: time information about the observation
• Measured variable(s): numbers of interest
• Key variable(s): optional unique identifiers for each series
• It works with tidyverse functions.
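A minimal sketch of the three components, assuming the tsibble package is installed. The column names `Quarter`, `Region` and `Sales`, the region labels and the values are all hypothetical: `Quarter` is the index, `Region` is the key that separates the two series, and `Sales` is the measured variable.

```r
library(tsibble)

# Hypothetical long-format data: two quarterly series, one per region
quarter_starts <- as.Date(c("2020-01-01", "2020-04-01", "2020-07-01"))
sales <- tibble::tibble(
  Quarter = yearquarter(rep(quarter_starts, times = 2)),  # index
  Region  = rep(c("North", "South"), each = 3),           # key
  Sales   = c(12, 15, 11, 20, 22, 19)                     # measured variable
)

# Each (Region, Quarter) pair must be unique for the tsibble to be valid
sales_ts <- as_tsibble(sales, index = Quarter, key = Region)

sales_ts
n_keys(sales_ts)  # number of separate series stored in the object
```

Storing both regions in one long table, rather than one object per series, is what lets the same tidyverse code operate on every series at once.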
tsibble Objects
An example of a time series stored in a tsibble object in R:
Year  Observation
2012  123
2013  39
2014  78
2015  52
2016  110
library(fpp3)  # attaches tsibble, dplyr, ggplot2, feasts and the example datasets
mydata <- tsibble(
  year = 2012:2016,
  y = c(123, 39, 78, 52, 110),
  index = year
)
# or, equivalently, starting from a tibble:
mydata <- tibble(
  year = 2012:2016,
  y = c(123, 39, 78, 52, 110)
) %>%
  as_tsibble(index = year)
mydata
# A tsibble: 5 x 2 [1Y]
   year     y
1  2012   123
2  2013    39
3  2014    78
4  2015    52
5  2016   110
tsibble Objects
• For observations that are more frequent than once per year, we need to use a time class function on the index.
• For example, suppose we have a monthly dataset z.
• This can be converted to a tsibble object using the following code:
z_ts <- z %>%
mutate(Month = yearmonth(Month)) %>%
as_tsibble(index = Month)
z
# A tibble: 5 x 2
Month Observation
1 2019 Jan 50
2 2019 Feb 23
3 2019 Mar 34
4 2019 Apr 30
5 2019 May 25
z_ts
# A tsibble: 5 x 2 [1M]
Month Observation
1 2019 Jan 50
2 2019 Feb 23
3 2019 Mar 34
4 2019 Apr 30
5 2019 May 25
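To run the conversion end to end, `z` can be rebuilt from the values printed on this slide. A minimal sketch, assuming the tsibble and dplyr packages are installed, with `Month` stored as text as in the printed tibble:

```r
library(tsibble)
library(dplyr)

# Rebuild the monthly dataset z from the printed values (Month is text)
z <- tibble::tibble(
  Month = c("2019 Jan", "2019 Feb", "2019 Mar", "2019 Apr", "2019 May"),
  Observation = c(50, 23, 34, 30, 25)
)

# Parse the text into a yearmonth class, then declare it as the index
z_ts <- z %>%
  mutate(Month = yearmonth(Month)) %>%
  as_tsibble(index = Month)

z_ts
```

Without the `yearmonth()` step, `Month` would remain a character column and could not serve as a regular monthly index.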
The tsibble index
Common time index variables can be created with these functions:
Frequency    Function
Annual       start:end
Quarterly    yearquarter()
Monthly      yearmonth()
Weekly       yearweek()
Daily        as_date(), ymd()
Sub-daily    as_datetime(), ymd_hms()
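A short sketch, assuming the tsibble package is installed, showing how the same hypothetical dates map onto the quarterly, monthly and weekly index classes in the table:

```r
library(tsibble)

# Hypothetical dates, one in each of the first three quarters of 2019
d <- as.Date(c("2019-01-15", "2019-04-15", "2019-07-15"))

q <- yearquarter(d)  # quarterly index
m <- yearmonth(d)    # monthly index
w <- yearweek(d)     # weekly index

format(q)            # "2019 Q1" "2019 Q2" "2019 Q3"
m
w

# Annual data need no special class: a plain integer sequence is enough
years <- 2012:2016
```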
Key tsibble Functions
The most common functions (dplyr verbs that also work on tsibble objects):
• select(): subset columns
• filter(): subset rows on conditions
• arrange(): sort results
• mutate(): create new columns
• group_by(): group data by columns
• summarize(): create summary statistics
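A minimal sketch of these functions applied to a small tsibble, assuming the tsibble and dplyr packages are installed. The store labels, unit counts and the price of 5 per unit are hypothetical. Note that `summarise()` on a tsibble aggregates across the keys within each time period, so its result is still indexed by month; converting to a plain tibble first gives one summary per group over the whole period.

```r
library(tsibble)
library(dplyr)

# Hypothetical monthly unit sales for two stores (Store is the key)
months <- seq(as.Date("2021-01-01"), by = "month", length.out = 3)
sales <- tibble::tibble(
  Month = yearmonth(rep(months, times = 2)),
  Store = rep(c("A", "B"), each = 3),
  Units = c(10, 12, 9, 30, 28, 35)
) %>%
  as_tsibble(index = Month, key = Store)

big_months <- sales %>%
  filter(Units > 10) %>%            # subset rows on a condition
  select(Month, Store, Units) %>%   # subset columns (the index is always kept)
  mutate(Revenue = Units * 5) %>%   # new column using a hypothetical price
  arrange(Store, desc(Revenue))     # sort the results

# Total units across both stores in each month (still a tsibble)
total <- sales %>%
  summarise(TotalUnits = sum(Units))

# Average units per store over the whole period (plain tibble)
by_store <- sales %>%
  as_tibble() %>%
  group_by(Store) %>%
  summarise(MeanUnits = mean(Units))
```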
Graphical Summaries
• The first thing to do is to visualize the data
• Graphs allow us to see basic features of the data such as patterns, unusual
observations, changes over time, and relationships between variables
• These features must then be incorporated, as far as possible, into the forecasting model
• The type of data will determine which type of graph is most appropriate
• Time plots, seasonal plots and scatterplots are routinely used in forecasting
Graphical Summaries
• Time plots
• The data are plotted over time
• Reveal trends over time, regular seasonal behavior and other systematic features of the data
• Seasonal plots
• The data are plotted against the individual “seasons” in which the data were observed
• Enable the underlying seasonal pattern and substantial departures from the seasonal pattern to be seen clearly
• Scatterplots
• Plot the variable that we wish to forecast against an explanatory variable
• Help us to visualize the relationship between two variables
Time Plots
PBS %>%
  filter(ATC2 == "A10") %>%
  select(Month, Concession, Type, Cost) %>%
  summarise(TotalC = sum(Cost)) %>%    # total cost per month across all keys
  mutate(Cost = TotalC / 1e6) -> a10   # convert to $ million
a10 %>%
  autoplot(Cost) +
  ylab("$ million") +
  xlab("Month") +
  ggtitle("Antidiabetic drug sales")
Time Plots
melsyd_economy <- ansett %>%
  filter(Airports == "MEL-SYD", Class == "Economy") %>%
  mutate(Passengers = Passengers / 1000)
autoplot(melsyd_economy, Passengers) +
  labs(title = "Ansett airlines economy class",
       subtitle = "Melbourne-Sydney",
       y = "Passengers ('000)")
Time Plots
The time plot immediately reveals some interesting features:
• Range of the data
• Times at which peaks occur
• Relative size of the peaks compared with the rest of the series
• Randomness in the series (the data pattern is not perfect)
Time Plots
Now, your turn.
• Create plots of the following time series: aus_production, pelt, gafa_stock, vic_elec.
• Use help() to find out about the data in each series.
• For the last plot, modify the axis labels and title.
Time Series Patterns
Horizontal pattern
• The data values fluctuate around a constant mean
• Such a series is called stationary in its mean
Seasonal pattern
• The data values are influenced by seasonal factors such as the month of the year or the day of the week
• Seasonal series are sometimes called periodic, although they do not repeat themselves exactly over time
Cyclical pattern
• The data exhibit rises and falls that are not of a fixed period
Trend pattern
• There is a long-term increase or decrease in the data
Many data series include a combination of the preceding patterns
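To see how these patterns combine, the sketch below simulates four years of monthly data as the sum of a linear trend, a 12-month seasonal pattern and random noise, using base R only. The slope, amplitude, noise level and start date are arbitrary, illustrative choices.

```r
set.seed(1)
n <- 48                                       # four years of monthly observations
time_idx <- 1:n

trend    <- 100 + 2 * time_idx                # long-term increase
seasonal <- 15 * sin(2 * pi * time_idx / 12)  # pattern that repeats every 12 months
noise    <- rnorm(n, sd = 5)                  # random variation

y <- ts(trend + seasonal + noise, start = c(2018, 1), frequency = 12)
plot(y, ylab = "Simulated value",
     main = "Trend + seasonal + random components")
```

Removing any one of the three terms and re-plotting is a quick way to see which visual feature each pattern contributes.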
Time Series Patterns
[Figure: Demand for snowboards over Years 1 to 4, showing the actual line, the trend component, seasonal peaks (winters) and random variation.]
Time Series Patterns
• Differences between seasonal and cyclic patterns:
• A seasonal pattern has a constant length, while a cyclical pattern varies in length
• The average length of a cycle is usually longer than that of seasonality
• The magnitude of a cycle is usually more variable than that of seasonality
• The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data
Time Series Patterns
aus_production %>%
  filter(year(Quarter) >= 1980) %>%
  autoplot(Electricity) +
  labs(y = "GWh", title = "Australian electricity production")
Time Series Patterns
aus_production %>%
  autoplot(Bricks) +
  labs(y = "million units", title = "Australian clay brick production")
Time Series Patterns
us_employment %>%
  filter(Title == "Retail Trade", year(Month) >= 1980) %>%
  autoplot(Employed / 1e3) +
  labs(y = "Million people", title = "Retail employment, USA")
Time Series Patterns
gafa_stock %>%
  filter(Symbol == "AMZN", year(Date) >= 2018) %>%
  autoplot(Close) +
  labs(y = "$US", title = "Amazon closing stock price")
Time Series Patterns
pelt %>%
  autoplot(Lynx) +
  labs(y = "Number trapped", title = "Annual Canadian Lynx Trappings")
Seasonal Plots
a10 %>%
  gg_season(Cost, labels = "both") +
  labs(y = "$ million", title = "Seasonal plot: antidiabetic drug sales")
Seasonal Plots
• Data plotted against the individual “seasons” in which the data were observed.
(In this case a “season” is a month.)
• Something like a time plot except that the data from each season are
overlapped.
• Enables the underlying seasonal pattern to be seen more clearly, and also
allows any substantial departures from the seasonal pattern to be easily
identified.
• In R: gg_season()
Seasonal Polar Plots
gg_season(a10, Cost, polar = TRUE) + ylab("$ million")
Seasonal Subseries Plots
a10 %>%
  gg_subseries(Cost) +
  labs(y = "$ million", title = "Subseries plot: antidiabetic drug sales")
Seasonal Subseries Plots
• Data for each season are collected together in a time plot as separate time series.
• Enables the underlying seasonal pattern to be seen clearly, and changes in
seasonality over time to be visualized.
• In R: gg_subseries()
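For data stored as a base-R `ts` object rather than a tsibble, `stats::monthplot()` draws a comparable seasonal subseries plot. A minimal sketch using the built-in `AirPassengers` dataset (monthly international airline passengers, in thousands, 1949 to 1960); by default each panel's horizontal line marks that month's mean, which the `tapply()` call reproduces:

```r
# Seasonal subseries plot for a monthly ts object (base R)
monthplot(AirPassengers,
          ylab = "Passengers (thousands)",
          main = "Seasonal subseries plot: AirPassengers")

# Mean for each month, i.e. the horizontal reference line in each panel
month_means <- tapply(AirPassengers, cycle(AirPassengers), mean)
round(month_means, 1)
```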
Seasonal Plots
Quarterly Australian Beer Production
beer <- aus_production %>%
select(Quarter, Beer) %>%
filter(year(Quarter) >= 1992)
beer %>% autoplot(Beer)
Seasonal Plots
Quarterly Australian Beer Production
beer %>% gg_season(Beer, labels = "right")
Seasonal Plots
Quarterly Australian Beer Production
beer %>% gg_subseries(Beer)
Seasonal Plots
Now, your turn.
• Look at the quarterly tourism data for the Snowy Mountains
snowy <- tourism %>% filter(Region == "Snowy Mountains")
• Use autoplot(), gg_season() and gg_subseries() to explore the data.
• What do you learn?
Scatterplots
• A scatterplot helps us to visualize the relationship between two variables.
vic_elec %>%
  filter(year(Time) == 2014) %>%
  ggplot(aes(x = Temperature, y = Demand)) +
  geom_point() +
  labs(x = "Temperature (degrees Celsius)",
       y = "Electricity demand (GW)")
• It is clear that high demand occurs when temperatures are high, due to the effect of air-conditioning. But there is also a heating effect, where demand increases for very low temperatures.
Correlation
• It is common to compute correlation coefficients to measure the strength of the linear
relationship between two variables.
• The value always lies between −1 and 1 with negative values indicating a negative
relationship and positive values indicating a positive relationship.
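The correlation coefficient is r = Σ(x_t − x̄)(y_t − ȳ) / √(Σ(x_t − x̄)² Σ(y_t − ȳ)²). A minimal base-R sketch that computes it from this definition for a few hypothetical paired values and checks it against the built-in `cor()`:

```r
# Pearson correlation computed directly from its definition
pearson_r <- function(x, y) {
  dx <- x - mean(x)
  dy <- y - mean(y)
  sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
}

# Hypothetical paired observations
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

r <- pearson_r(x, y)
r                        # 0.7745967: a fairly strong positive linear relationship
all.equal(r, cor(x, y))  # TRUE
```

Because r only measures linear association, a value near zero does not rule out a strong nonlinear relationship such as the U-shaped temperature and demand pattern in the scatterplot.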
Autocorrelation
• Correlation measures the extent of a linear relationship between two variables.
• Autocorrelation measures the linear relationship between lagged values of a
time series.
• The autocorrelation coefficients make up the autocorrelation function or ACF.
• The autocorrelation coefficients for the beer production data can be computed
using the ACF() function.
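The lag-k autocorrelation is r_k = Σ_{t=k+1}^{T} (y_t − ȳ)(y_{t−k} − ȳ) / Σ_{t=1}^{T} (y_t − ȳ)², where T is the length of the series. A base-R sketch, using a short hypothetical series rather than the beer data, that implements this formula and checks it against `stats::acf()`, which uses the same definition:

```r
# Sample autocorrelations r_1, ..., r_max_lag from the definition
acf_manual <- function(y, max_lag) {
  n <- length(y)
  d <- y - mean(y)
  denom <- sum(d^2)
  sapply(1:max_lag, function(k) sum(d[(k + 1):n] * d[1:(n - k)]) / denom)
}

y <- c(3, 5, 4, 6, 5, 7, 6, 8)  # hypothetical short series

manual  <- acf_manual(y, max_lag = 3)
builtin <- as.numeric(acf(y, lag.max = 3, plot = FALSE)$acf)[-1]  # drop lag 0
all.equal(manual, builtin)  # TRUE
```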
Autocorrelation
The plot is sometimes known as a correlogram.
beer %>%
  ACF(Beer) %>%
  autoplot() + labs(title = "Australian beer production")
Trend and Seasonality in ACF Plots
• When data have a trend, the autocorrelations for small lags tend to be large and positive
because observations nearby in time are also nearby in value.
• When data are seasonal, the autocorrelations will be larger for the seasonal lags (at multiples
of the seasonal period) than for other lags.
• When data are both trended and seasonal, you see a combination of these effects.
• The a10 data shows both trend and seasonality.
a10 %>%
  ACF(Cost, lag_max = 48) %>%
  autoplot() + labs(title = "Australian antidiabetic drug sales")
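Both effects can be reproduced with simulated data. A base-R sketch, with an arbitrary slope, seasonal amplitude and a period of 12 chosen for illustration: the trended series has large positive autocorrelations at small lags, while the seasonal series peaks at the seasonal lags 12 and 24 and is strongly negative at lag 6 (half a period).

```r
set.seed(42)
idx <- 1:120

trended  <- 0.5 * idx + rnorm(120)                    # upward trend plus noise
seasonal <- 10 * sin(2 * pi * idx / 12) + rnorm(120)  # period of 12 observations

# acf() returns lags 0..lag.max; drop lag 0 so that element k is lag k
r_trend <- acf(trended,  lag.max = 24, plot = FALSE)$acf[-1]
r_seas  <- acf(seasonal, lag.max = 24, plot = FALSE)$acf[-1]

round(r_trend[1:3], 2)          # small lags: large and positive
round(r_seas[c(6, 12, 24)], 2)  # negative at lag 6, positive at lags 12 and 24
```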
White Noise
• Time series that show no autocorrelation are called white noise.
• White noise data is uncorrelated across time with zero mean and constant variance
# y: the simulated white-noise tsibble created with set.seed(30) on the next slide
y %>%
  ACF(wn) %>%
  autoplot() + labs(title = "White noise")
White Noise
• For white noise, we expect 95% of the spikes in the ACF to lie within ±1.96/√T, where T is the length of the time series. These bounds are drawn as the blue dashed lines on a graph of the ACF.
• If one or more large spikes are outside these bounds, or if substantially more than 5% of spikes are outside these bounds, then the series is probably not white noise.
set.seed(30)
y <- tsibble(sample = 1:50, wn = rnorm(50), index = sample)
y %>% autoplot(wn) + labs(title = "White noise", y = "")
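A base-R sketch of this check, using the same seed and series length as the code above: it computes the first 20 autocorrelations, the approximate 95% limits ±1.96/√T, and the share of spikes that fall outside them. The choice of 20 lags is arbitrary.

```r
set.seed(30)
n_obs <- 50
wn <- rnorm(n_obs)                                # simulated white noise

bound <- 1.96 / sqrt(n_obs)                       # approximate 95% limits
r <- acf(wn, lag.max = 20, plot = FALSE)$acf[-1]  # lags 1 to 20

outside <- sum(abs(r) > bound)
outside / length(r)   # for white noise, roughly 5% is expected
```

With only 20 lags, one spike just outside the bounds is not unusual, which is why the rule looks for large spikes or substantially more than 5%.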