ETC3231/5231 Business forecasting
Ch2. Time series graphics OTexts.org/fpp3/
Outline
1 Time series in R
2 Time plots
3 Time series patterns
4 Seasonal plots
5 Lag plots and autocorrelation
6 White noise
2
Outline
1 Time series in R
2 Time plots
3 Time series patterns
4 Seasonal plots
5 Lag plots and autocorrelation
6 White noise
3
Class packages
library(fpp3) # Associated with the book
# Make sure “install dependencies” is checked.
# Data manipulation and plotting functions
library(tidyverse)
# Time series manipulation library(tsibble)
# Tidy time series data library(tsibbledata)
# Forecasting functions library(fable)
# Time series graphics and statistics library(feasts)
4
tsibble objects
A tsibble allows storage and manipulation of multiple time series in R.
It contains:
An index: time information about the observation
Measured variable(s): numbers of interest
Key variable(s): optional unique identifiers for each
series
It works with tidyverse functions.
5
tsibble objects
global_economy
## # A tsibble: 15,150 x 6 [1Y]
## # Key: Country [263]
## Year Country GDP Imports Exports Population
##
## 1 1960 Afghanistan 537777811. 7.02
## 2 1961 Afghanistan 548888896. 8.10
## 3 1962 Afghanistan 546666678. 9.35
## 4
## 5
## 6
## 7 1966 Afghanistan 1399999967. 18.6
## 8 1967 Afghanistan 1673333418. 14.2
## 9 1968 Afghanistan 1373333367. 15.2
## 10 1969 Afghanistan 1408888922. 15.0 10.1 10854428
## # … with 15,140 more rows
1963 Afghanistan 751111191. 16.9
1964 Afghanistan 800000044. 18.1
1965 Afghanistan 1006666638. 21.4
11.3 9938414
8.57 10152331
6.77 10372630
8.90 10604346
4.13 8996351
4.45 9166764
4.88 9345868
9.17 9533954
8.89 9731361
6
tsibble objects
global_economy
## # A tsibble: 15,150 x 6 [1Y]
## # Key: Country [263]
## Year Country GDP Imports Exports Population
##
## 4
## 5
## 6
## 7 1966 Afghanistan 1399999967. 18.6
## 8 1967 Afghanistan 1673333418. 14.2
## 9 1968 Afghanistan 1373333367. 15.2
## 10 1969 Afghanistan 1408888922. 15.0 10.1 10854428
## # … with 15,140 more rows
1963 Afghanistan 751111191. 16.9
1964 Afghanistan 800000044. 18.1
1965 Afghanistan 1006666638. 21.4
11.3 9938414
8.57 10152331
6.77 10372630
8.90 10604346
4.13 8996351
4.45 9166764
4.88 9345868
9.17 9533954
8.89 9731361
6
tsibble objects
global_economy
## # A tsibble: 15,150 x 6 [1Y]
## # Key: Country [263]
## Year Country GDP Imports Exports Population
##
## 4
## 5
## 6
## 7 1966 Afghanistan 1399999967. 18.6
## 8 1967 Afghanistan 1673333418. 14.2
## 9 1968 Afghanistan 1373333367. 15.2
## 10 1969 Afghanistan 1408888922. 15.0 10.1 10854428
## # … with 15,140 more rows
1963 Afghanistan 751111191. 16.9
1964 Afghanistan 800000044. 18.1
1965 Afghanistan 1006666638. 21.4
11.3 9938414
8.57 10152331
6.77 10372630
8.90 10604346
4.13 8996351
4.45 9166764
4.88 9345868
9.17 9533954
8.89 9731361
6
tsibble objects
global_economy
## # A tsibble: 15,150 x 6 [1Y]
## # Key: Country [263]
## Year Country GDP Imports Exports Population ##
## 1 1960 Afghanistan 537777811. 7.02
## 2 1961 Afghanistan 548888896. 8.10
## 3 1962 Afghanistan 546666678. 9.35
## 4
## 5
## 6
## 7 1966 Afghanistan 1399999967. 18.6
## 8 1967 Afghanistan 1673333418. 14.2
## 9 1968 Afghanistan 1373333367. 15.2
## 10 1969 Afghanistan 1408888922. 15.0 10.1 10854428
## # … with 15,140 more rows
1963 Afghanistan 751111191. 16.9
1964 Afghanistan 800000044. 18.1
1965 Afghanistan 1006666638. 21.4
11.3 9938414
8.57 10152331
6.77 10372630
8.90 10604346
4.13 8996351
4.45 9166764
4.88 9345868
9.17 9533954
8.89 9731361
6
tsibble objects
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
##
## 1 1998 Q1 Adelaide SA
## 2 1998 Q2 Adelaide SA
## 3 1998 Q3 Adelaide SA
## 4 1998 Q4 Adelaide SA
## 5 1999 Q1 Adelaide SA
## 6 1999 Q2 Adelaide SA
## 7 1999 Q3 Adelaide SA
## 8 1999 Q4 Adelaide SA
## 9 2000 Q1 Adelaide SA
## 10 2000 Q2 Adelaide SA
## # … with 24,310 more rows
Business 135.
Business 110.
Business 166.
Business 127.
Business 137.
Business 200.
Business 169.
Business 134.
Business 154.
Business 169.
7
tsibble objects
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## Ind
## 2 1998 Q2 Adelaide SA
## 3 1998 Q3 Adelaide SA
Business 135.
Business 110.
Business 166.
Business 127.
Business 137.
Business 200.
Business 169.
Business 134.
Business 154.
Business 169.
## 4 1998 Q4 Adelaide SA
## 5 1999 Q1 Adelaide SA
## 6 1999 Q2 Adelaide SA
## 7 1999 Q3 Adelaide SA
## 8 1999 Q4 Adelaide SA
## 9 2000 Q1 Adelaide SA
## 10 2000 Q2 Adelaide SA
## # … with 24,310 more rows
7
tsibble objects
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## Ind
## 2 1998 Q2 Adelaide SA
## 3 1998 Q3 Adelaide SA
Business 135.
Business 110.
Business 166.
Business 127.
Business 137.
Business 200.
Business 169.
Business 134.
Business 154.
Business 169.
## 4 1998 Q4 Adelaide SA
## 5 1999 Q1 Adelaide SA
## 6 1999 Q2 Adelaide SA
## 7 1999 Q3 Adelaide SA
## 8 1999 Q4 Adelaide SA
## 9 2000 Q1 Adelaide SA
## 10 2000 Q2 Adelaide SA
## # … with 24,310 more rows
7
tsibble objects
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## Ind
## 2 1998 Q2 Adelaide SA
## 3 1998 Q3 Adelaide SA
## 4 1998 Q4 Adelaide SA
## 5 1999 Q1 Adelaide SA
## 6 1999 Q2 Adelaide SA
## 7 1999 Q3 Adelaide SA
## 8 1999 Q4 Adelaide SA
## 9 2000 Q1 Adelaide SA
## 10 2000 Q2 Adelaide SA
## # … with 24,310 more rows
7
tsibble objects
tourism
## # A tsibble: 24,320 x 5 [1Q]
## # Key: Region, State, Purpose [304]
## Quarter Region State Purpose Trips
## Ind
## 2 1998 Q2 Adelaide SA
## 3
## 4
## 5
## 6
## 7 1999 Q3 Adelaide SA
## 8 1999 Q4 Adelaide SA
## 9 2000 Q1 Adelaide SA
## 10 2000 Q2 Adelaide SA
## # … with 24,310 more rows
Domestic visitor nights in thousands by state/region and purpose.
1998 Q3
1998 Q4
1999 Q1
1999 Q2
Adelaide
Adelaide
Adelaide
Adelaide
SA SA SA SA
7
The tsibble index
Example
mydata <- tsibble(
year = 2012:2016,
y = c(123, 39, 78, 52, 110),
index = year
) mydata
## # A tsibble: 5 x 2 [1Y] ## year y
##
8
The tsibble index
Example
mydata <- tibble(
year = 2012:2016,
y = c(123, 39, 78, 52, 110)
) %>%
as_tsibble(index = year)
mydata
## # A tsibble: 5 x 2 [1Y] ## year y
##
9
The tsibble index
For observations more frequent than once per year, we need to use a time class function on the index.
z
## # A tibble: 5 x 2
## Month Observation
##
## 1 2019 Jan 50
## 2 2019 Feb 23
## 3 2019 Mar 34
## 4 2019 Apr 30
## 5 2019 May 25
10
The tsibble index
For observations more frequent than once per year, we need to use a time class function on the index.
z %>%
mutate(Month = yearmonth(Month)) %>%
as_tsibble(index = Month)
## # A tsibble: 5 x 2 [1M]
## Month Observation
##
## 1 2019 Jan 50
## 2 2019 Feb 23
## 3 2019 Mar 34
## 4 2019 Apr 30
## 5 2019 May 25
11
The tsibble index
Common time index variables can be created with these functions:
Frequency Function
Annual Quarterly Monthly Weekly Daily Sub-daily
start:end yearquarter() yearmonth() yearweek() as_date(), ymd() as_datetime()
12
The key to many time series
Year Length Sex Time
1896 100 men 12.0 1928 100 women 12.2 1900 200 men 22.2 1948 200 women 24.4 1896 400 men 54.2 1964 400 women 52.0
13
The key to many time series
olympic_running %>% as_tsibble(
key = c(Length, Sex), index = Year)
## # A tsibble: 312 x 4 [4Y]
## # Key: Length, Sex [14]
## Year Length Sex Time
##
## 1 1896
## 2 1900
## 3 1904
## 4 1908
100 men 12
100 men 11
100 men 11
100 men 10.8
14
## 5 1912 100 men 10.8
Digression – the pipe %>% operator
15
Example Australian prison population
16
Read a csv file and convert to a tsibble
prison <- readr::read_csv("data/prison_population.csv")
## # A tibble: 3,072 x 6
## date state gender legal indigenous count
##
## 1 2005-03-01 ACT Female Remanded ATSI 0
## 2 2005-03-01 ACT Female Remanded Other 2
## 3 2005-03-01 ACT Female Sentenced ATSI 0
## 4 2005-03-01 ACT Female Sentenced Other 0
## 5 2005-03-01 ACT Male Remanded ATSI 7
## 6 2005-03-01 ACT Male Remanded Other 58
## 7 2005-03-01 ACT Male Sentenced ATSI 0
## 8 2005-03-01 ACT Male Sentenced Other 0
## 9 2005-03-01 NSW Female Remanded ATSI 51
## 10 2005-03-01 NSW Female Remanded Other 131
## # … with 3,062 more rows
17
Read a csv file and convert to a tsibble
prison <- readr::read_csv("data/prison_population.csv") %>%
mutate(Quarter = yearquarter(date))
## # A tibble: 3,072 x 7
## date state gender legal indigenous count Quarter
##
0 2005 Q1
2 2005 Q1
0 2005 Q1
0 2005 Q1
7 2005 Q1
58 2005 Q1
0 2005 Q1
0 2005 Q1
51 2005 Q1
131 2005 Q1
18
## 1 2005-03-01 ACT
## 2 2005-03-01 ACT
## 3 2005-03-01 ACT
## 4 2005-03-01 ACT
## 5 2005-03-01 ACT
## 6 2005-03-01 ACT
## 7 2005-03-01 ACT
## 8 2005-03-01 ACT
## 9 2005-03-01 NSW
## 10 2005-03-01 NSW
## # … with 3,062 more rows
Female Remanded ATSI
Female Remanded Other
Female Sentenc~ ATSI
Female Sentenc~ Other
Male Remanded ATSI
Male Remanded Other
Male Sentenc~ ATSI
Male Sentenc~ Other
Female Remanded ATSI
Female Remanded Other
Read a csv file and convert to a tsibble
prison <- readr::read_csv("data/prison_population.csv") %>%
mutate(Quarter = yearquarter(date)) %>%
select(-date)
## # A tibble: 3,072 x 6
## state gender legal indigenous count Quarter
##
## 1 ACT Female Remanded ATSI
## 2 ACT Female Remanded Other
## 3 ACT Female Sentenced ATSI
## 4 ACT Female Sentenced Other
## 5 ACT Male Remanded ATSI
## 6 ACT Male Remanded Other
## 7 ACT Male Sentenced ATSI
## 8 ACT Male Sentenced Other
## 9 NSW Female Remanded ATSI
## 10 NSW Female Remanded Other
## # … with 3,062 more rows
0 2005 Q1
2 2005 Q1
0 2005 Q1
0 2005 Q1
7 2005 Q1
58 2005 Q1
0 2005 Q1
0 2005 Q1
51 2005 Q1
131 2005 Q1
19
Read a csv file and convert to a tsibble
prison <- readr::read_csv("data/prison_population.csv") %>%
mutate(Quarter = yearquarter(date)) %>%
select(-date) %>%
as_tsibble(
index = Quarter,
key = c(state, gender, legal, indigenous)
)
## # A tsibble: 3,072 x 6 [1Q]
## # Key: state, gender, legal, indigenous [64]
## state gender legal indigenous count Quarter
##
## 1 ACT
## 2 ACT
## 3 ACT
## 4 ACT
## 5 ACT
## 6 ACT
Female Remanded ATSI
Female Remanded ATSI
Female Remanded ATSI
Female Remanded ATSI
Female Remanded ATSI
Female Remanded ATSI
0 2005 Q1
1 2005 Q2
0 2005 Q3
0 2005 Q4
1 2006 Q1
1 2006 Q2
20
Australian Pharmaceutical Benefits Scheme
21
Australian Pharmaceutical Benefits Scheme
The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.
22
Australian Pharmaceutical Benefits Scheme
The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.
Many drugs bought from pharmacies are subsidised to allow more equitable access to modern drugs. The cost to government is determined by the number and types of drugs purchased. Currently nearly 1% of GDP.
The total cost is budgeted based on forecasts of drug usage.
Costs are disaggregated by drug type (ATC1 x15 / ATC2 84), concession category (x2) and patient type (x2), giving 84 × 2 × 2 = 336 time series.
22
Working with tsibble objects
PBS
## # A tsibble: 65,219 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [336]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
##
## 1 1991 Jul Concession~ Co-pa~ A
## 2 1991 Aug Concession~ Co-pa~ A
## 3 1991 Sep Concession~ Co-pa~ A
## 4 1991 Oct Concession~ Co-pa~ A
## 5 1991 Nov Concession~ Co-pa~ A
## 6 1991 Dec Concession~ Co-pa~ A
## 7 1992 Jan Concession~ Co-pa~ A
## 8 1992 Feb Concession~ Co-pa~ A
## 9 1992 Mar Concession~ Co-pa~ A
## 10 1992 Apr Concession~ Co-pa~ A
## # … with 65,209 more rows
Alimentar~ A01 STOMATOLO~ 18228 67877
Alimentar~ A01 STOMATOLO~ 15327 57011
Alimentar~ A01 STOMATOLO~ 14775 55020
Alimentar~ A01 STOMATOLO~ 15380 57222
Alimentar~ A01 STOMATOLO~ 14371 52120
Alimentar~ A01 STOMATOLO~ 15028 54299
Alimentar~ A01 STOMATOLO~ 11040 39753
Alimentar~ A01 STOMATOLO~ 15165 54405
Alimentar~ A01 STOMATOLO~ 16898 61108
Alimentar~ A01 STOMATOLO~ 18141 65356
23
Working with tsibble objects
We can use the filter() function to select rows.
## # A tsibble: 816 x 9 [1M]
## # Key: Concession, Type, ATC1, ATC2 [4]
## Month Concession Type ATC1 ATC1_desc ATC2 ATC2_desc Scripts Cost
##
## 1 1991 Jul Concession~ Co-pa~ A
## 2 1991 Aug Concession~ Co-pa~ A
## 3 1991 Sep Concession~ Co-pa~ A
## 4 1991 Oct Concession~ Co-pa~ A
## 5 1991 Nov Concession~ Co-pa~ A
## 6 1991 Dec Concession~ Co-pa~ A
## 7 1992 Jan Concession~ Co-pa~ A
## 8 1992 Feb Concession~ Co-pa~ A
## 9 1992 Mar Concession~ Co-pa~ A
## 10 1992 Apr Concession~ Co-pa~ A
## # … with 806 more rows
Alimentar~ A10 ANTIDIAB~ 89733 2.09e6
Alimentar~ A10 ANTIDIAB~ 77101 1.80e6
Alimentar~ A10 ANTIDIAB~ 76255 1.78e6
Alimentar~ A10 ANTIDIAB~ 78681 1.85e6
Alimentar~ A10 ANTIDIAB~ 70554 1.69e6
Alimentar~ A10 ANTIDIAB~ 75814 1.84e6
Alimentar~ A10 ANTIDIAB~ 64186 1.56e6
Alimentar~ A10 ANTIDIAB~ 75899 1.73e6
Alimentar~ A10 ANTIDIAB~ 89445 2.05e6
Alimentar~ A10 ANTIDIAB~ 97315 2.23e6
24
PBS %>%
filter(ATC2 == “A10″)
Working with tsibble objects
We can use the select() function to select columns.
PBS %>%
filter(ATC2==”A10”) %>%
select(Cost)
Selecting index: “Month”
Error: The result is not a valid tsibble.
Do you need ‘as_tibble()‘ to work with data frame?
25
Working with tsibble objects
We can use the select() function to select columns.
## # A tsibble: 816 x 4 [1M]
## # Key: Concession, Type [4]
## Month Concession Type Cost
##
## 1 1991 Jul Concessional Co-payments 2092878
## 2 1991 Aug Concessional Co-payments 1795733
## 3 1991 Sep Concessional Co-payments 1777231
## 4 1991 Oct Concessional Co-payments 1848507
## 5 1991 Nov Concessional Co-payments 1686458
## 6 1991 Dec Concessional Co-payments 1843079
## 7 1992 Jan Concessional Co-payments 1564702
## 8 1992 Feb Concessional Co-payments 1732508
## 9 1992 Mar Concessional Co-payments 2046102
## 10 1992 Apr Concessional Co-payments 2225977
26
PBS %>%
filter(ATC2 == “A10”) %>%
select(Month, Concession, Type, Cost)
Working with tsibble objects
We can use the summarise() function to summarise over keys.
## # A tsibble: 204 x 2 [1M]
## Month total_cost
##
## 1 1991 Jul
## 2 1991 Aug
## 3 1991 Sep
## 4 1991 Oct
## 5 1991 Nov
## 6 1991 Dec
## 7 1992 Jan
## 8 1992 Feb
## 9 1992 Mar
## 10 1992 Apr
3526591
3180891
3252221
3611003
3565869
4306371
5088335
2814520
2985811
3204780
27
PBS %>%
filter(ATC2 == “A10”) %>%
select(Month, Concession, Type, Cost) %>%
summarise(total_cost = sum(Cost))
Working with tsibble objects
We can use the mutate() function to create new variables.
## # A tsibble: 204 x 2 [1M]
## Month total_cost
##
## 1 1991 Jul 3.53
## 2 1991 Aug 3.18
## 3 1991 Sep 3.25
## 4 1991 Oct 3.61
## 5 1991 Nov 3.57
## 6 1991 Dec 4.31
## 7 1992 Jan 5.09
## 8 1992 Feb 2.81
## 9 1992 Mar 2.99
28
PBS %>%
filter(ATC2 == “A10”) %>%
select(Month, Concession, Type, Cost) %>%
summarise(total_cost = sum(Cost)) %>%
mutate(total_cost = total_cost / 1e6)
Working with tsibble objects
We can use the mutate() function to create new variables.
## # A tsibble: 204 x 2 [1M]
## Month total_cost
##
## 1 1991 Jul 3.53
## 2 1991 Aug 3.18
## 3 1991 Sep 3.25
## 4 1991 Oct 3.61
## 5 1991 Nov 3.57
## 6 1991 Dec 4.31
## 7 1992 Jan 5.09
## 8 1992 Feb 2.81
## 9 1992 Mar 2.99
29
PBS %>%
filter(ATC2 == “A10”) %>%
select(Month, Concession, Type, Cost) %>%
summarise(total_cost = sum(Cost)) %>%
mutate(total_cost = total_cost / 1e6) -> a10
Your turn
30
Outline
1 Time series in R
2 Time plots
3 Time series patterns
4 Seasonal plots
5 Lag plots and autocorrelation
6 White noise
31
Graphics
First in any modelling/forecasting task
should be to plot your data.
Plots allow us to identify: Patterns;
Unusual observations;
Changes over time;
Relationships between variables.
32
Time plots
a10 %>%
autoplot(total_cost) +
labs(title = “Antidiabetic drug sales”,
y = “$ millions”)
30
20
10
Antidiabetic drug sales
1995 Jan
2000 Jan
2005 Jan
Month [1M]
33
$ millions
Ansett airlines
ansett
## # A tsibble: 7,407 x 4 [1W]
## # Key: Airports, Class [30]
## Week Airports Class Passengers
##
## 1 1989 W28 ADL-PER Business 193
## 2 1989 W29 ADL-PER Business 254
## 3 1989 W30 ADL-PER Business 185
## 4 1989 W31 ADL-PER Business 254
## 5 1989 W32 ADL-PER Business 191
## 6 1989 W33 ADL-PER Business 136
## 7 1989 W34 ADL-PER Business 0
## 8 1989 W35 ADL-PER Business 0
## 9 1989 W36 ADL-PER Business 0
## 10 1989 W37 ADL-PER Business 0
34
## # … with 7,397 more rows
Ansett airlines
ansett %>%
autoplot(Passengers)
30000
20000
10000
0
Airports/Class ADL−PER/Business
MEL−ADL/Business MEL−BNE/Business MEL−OOL/Business MEL−PER/Business MEL−SYD/Business SYD−ADL/Business SYD−BNE/Business SYD−OOL/Business SYD−PER/Business ADL−PER/Economy MEL−ADL/Economy MEL−BNE/Economy MEL−OOL/Economy MEL−PER/Economy
MEL−SYD/Economy SYD−ADL/Economy SYD−BNE/Economy SYD−OOL/Economy SYD−PER/Economy ADL−PER/First MEL−ADL/First MEL−BNE/First MEL−OOL/First MEL−PER/First MEL−SYD/First SYD−ADL/First SYD−BNE/First SYD−OOL/First SYD−PER/First
Passengers
1987 W53
1990 W01
Week [1W]
1992 W01
35
Ansett airlines
ansett %>%
filter(Class == “Economy”) %>%
autoplot(Passengers)
36
30000
20000
10000
0
1987 W53
1990 W01
1992 W01
Week [1W]
Airports/Class ADL−PER/Economy
MEL−ADL/Economy MEL−BNE/Economy MEL−OOL/Economy MEL−PER/Economy MEL−SYD/Economy SYD−ADL/Economy SYD−BNE/Economy SYD−OOL/Economy SYD−PER/Economy
Passengers
Ansett airlines
ansett %>%
filter(Airports == “MEL-SYD”) %>%
autoplot(Passengers)
37
30000
20000
10000
0
1987 W53
1990 W01
1992 W01
Week [1W]
Airports/Class MEL−SYD/Business
MEL−SYD/Economy MEL−SYD/First
Passengers
Ansett airlines
ansett %>%
filter(Airports == “MEL-SYD”) %>%
autoplot(Passengers)
38
30000
20000
10000
0
1987 W53
1990 W01
1992 W01
Week [1W]
Airports/Class MEL−SYD/Business
MEL−SYD/Economy MEL−SYD/First
Passengers
Time plots
ansett %>%
filter(Airports==”MEL-SYD”, Class==”Economy”) %>%
mutate(Passengers = Passengers/1000) %>%
autoplot(Passengers) +
labs(title = “Ansett airlines economy class”,
subtitle = “Melbourne-Sydney”,
y = “Passengers (‘000)”)
39
Time plots
30
20
10
0
Ansett airlines economy class Melbourne−Sydney
1987 W53
1990 W01
Week [1W]
1992 W01
40
Passengers (‘000)
Your turn
Create plots of the following time series: Bricks from aus_production, Lynx from pelt, Close from gafa_stock, Demand from vic_elec. Use help() to find out about the data in each series.
For the last plot, modify the axis labels and title.
41
Outline
1 Time series in R
2 Time plots
3 Time series patterns
4 Seasonal plots
5 Lag plots and autocorrelation
6 White noise
42
Time series patterns
Trend Seasonal
pattern exists when there is a long-term increase or decrease in the data.
pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week). pattern exists when data exhibit rises and falls that are not of fixed period (duration usually of at least 2 years).
Cyclic
43
Time series components
Differences between seasonal and cyclic patterns:
seasonal pattern constant length; cyclic pattern variable length
average length of cycle longer than length of seasonal pattern
magnitude of cycle more variable than magnitude of seasonal pattern
44
Time series patterns
aus_production %>%
filter(year(Quarter) >= 1980) %>%
autoplot(Electricity) +
labs(y=”GWh”, title=”Australian electricity production”)
45
Time series patterns
aus_production %>%
autoplot(Bricks) +
labs(title=”Australian clay brick production”,
y=”Units (millions)”)
46
Time series patterns
us_employment %>%
filter(Title == “Retail Trade”, year(Month) >= 1980) %>%
autoplot(Employed / 1e3) +
labs(title=”Retail employment, USA”, y=”People (millions)”)
47
Time series patterns
gafa_stock %>%
filter(Symbol == “AMZN”, year(Date) >= 2018) %>%
autoplot(Close) +
labs(title=”Amazon closing stock price”, y=”$”)
48
Time series patterns
pelt %>%
autoplot(Lynx) +
ggtitle(“Annual Canadian Lynx Trappings”) +
xlab(“Year”) + ylab(“Number trapped”)
49
Seasonal or cyclic?
Differences between seasonal and cyclic patterns:
seasonal pattern constant length; cyclic pattern variable length
average length of cycle longer than length of seasonal pattern
magnitude of cycle more variable than magnitude of seasonal pattern
50
Seasonal or cyclic?
Differences between seasonal and cyclic patterns:
seasonal pattern constant length; cyclic pattern variable length
average length of cycle longer than length of seasonal pattern
magnitude of cycle more variable than magnitude of seasonal pattern
The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.
50
Outline
1 Time series in R
2 Time plots
3 Time series patterns
4 Seasonal plots
5 Lag plots and autocorrelation
6 White noise
51
Seasonal plots
a10 %>% gg_season(total_cost, labels = “both”) +
labs( title= “Seasonal plot: antidiabetic drug sales”,
y=”$ million”)
Seasonal plot: antidiabetic drug sales
302 2
2
2
20
2
008
007
006
005
2
2
101 1
2
2
004
003
002
001
000
999
9987 996
9954 993
992
2008
1
1 1
1
Jan Feb Mar
Apr May Jun Jul
Aug Sep Oct
Nov Dec
Month
199
1
2007
2006
2005
2004
2003
2002
2001
2000
1999
1998
1997
1996
19953
1994
1992
1991
52
$ million
Seasonal plots
Data plotted against the individual “seasons” in which the data were observed. (In this case a
“season” is a month.)
Something like a time plot except that the data from each season are overlapped.
Enables the underlying seasonal pattern to be seen more clearly, and also allows any substantial departures from the seasonal pattern to be easily identified.
In R: gg_season()
53
Seasonal subseries plots
a10 %>% gg_subseries(total_cost) +
labs(title=”Subseries plot: antidiabetic drug sales”,
y= “$ millions”)
Subseries plot: antidiabetic drug sales
Jan Feb Mar
Apr May Jun Jul Aug Sep Oct
Nov Dec
30
20
10
Month
54
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
1995 2000 2005
$ millions
Seasonal subseries plots
Data for each season collected together in time plot as separate time series.
Enables the underlying seasonal pattern to be seen clearly, and changes in seasonality over time to be visualized.
In R: gg_subseries()
55
Quarterly Australian Beer Production
new_prod <- aus_production %>% select(Quarter, Beer) %>%
filter(year(Quarter) >= 1992)
new_prod %>% autoplot(Beer) +
labs(title = “Australian beer production”,
y = “Megalitres”)
500
450
400
Australian beer production
1995 Q1
2000 Q1
2005 Q1
2010 Q1
Quarter [1Q]
56
Megalitres
Quarterly Australian Beer Production
new_prod %>% gg_season(Beer, labels=”right”) +
labs(title = “Australian beer production”,
y = “Megalitres”)
500
450
400
Australian beer production
2010
19924 1997
19995 1998 1993 12909062 2000
20016 2003 20098
2005
2007
2004
Q1 Q2 Q3 Q4
Quarter
57
Megalitres
Quarterly Australian Beer Production
new_prod %>% gg_subseries(Beer) +
labs(title = “Australian beer production”,
y = “Megalitres”)
Australian beer production
500
450
400
Q1
Q2
Q3
Q4
Quarter
58
1995
2000
2005
2010
1995
2000
2005
2010
1995
2000
2005
2010
1995
2000
2005
2010
Megalitres
Your turn
Look at the quarterly tourism data for the Snowy Mountains
dat <- filter(tourism,
Region %in% c("Snowy Mountains","Gold Co
Purpose == "Holiday")
Use autoplot(), gg_season() and gg_subseries() to explore the data. What do you learn?
59
Multiple seasonal periods
vic_elec
## # A tsibble: 52,608 x 5 [30m]
## Time Demand Temperature Date Holiday
##
## 1 2012-01-01 00:00:00 4383.
## 2 2012-01-01 00:30:00 4263.
## 3 2012-01-01 01:00:00 4049.
## 4 2012-01-01 01:30:00 3878.
## 5 2012-01-01 02:00:00 4036.
## 6 2012-01-01 02:30:00 3866.
## 7 2012-01-01 03:00:00 3694.
## 8 2012-01-01 03:30:00 3562.
## 9 2012-01-01 04:00:00 3433.
## 10 2012-01-01 04:30:00 3359.
## # … with 52,598 more rows
21.4 2012-01-01 TRUE
21.0 2012-01-01 TRUE
20.7 2012-01-01 TRUE
20.6 2012-01-01 TRUE
20.4 2012-01-01 TRUE
20.2 2012-01-01 TRUE
20.1 2012-01-01 TRUE
19.6 2012-01-01 TRUE
19.1 2012-01-01 TRUE
19.0 2012-01-01 TRUE
60
Multiple seasonal periods
vic_elec %>% gg_season(Demand)
61
Multiple seasonal periods
vic_elec %>% gg_season(Demand, period = “week”)
62
Multiple seasonal periods
vic_elec %>% gg_season(Demand, period = “day”)
63
Australian holidays
holidays <- tourism %>%
filter(Purpose == “Holiday”) %>%
group_by(State) %>%
summarise(Trips = sum(Trips))
## # A tsibble: 640 x 3 [1Q]
## # Key: State [8]
## State Quarter Trips
##
## 1 ACT 1998 Q1 196.
## 2 ACT 1998 Q2 127.
## 3 ACT 1998 Q3 111.
## 4 ACT 1998 Q4 170.
## 5 ACT 1999 Q1 108.
## 6 ACT 1999 Q2 125.
## 7 ACT 1999 Q3 178.
## 8 ACT 1999 Q4 218.
## 9 ACT 2000 Q1 158.
## 10 ACT 2000 Q2 155.
64
## # … with 630 more rows
Australian holidays
holidays %>% autoplot(Trips) +
labs(title=”Australian domestic holidays”,
y=”Thousands of overnight trips”)
Australian domestic holidays
4000
3000
2000
1000
0
2000 Q1
2005 Q1
2010 Q1
2015 Q1
Quarter [1Q]
State ACT
NSW NT QLD SA TAS VIC WA
65
Thousands of overnight trips
Seasonal plots
holidays %>% gg_season(Trips) +
labs(title=”Australian domestic holidays”,
y=”Thousands of overnight trips”)
66
Australian domestic holidays
300 250 200 150 100
4000 3500 3000 2500
400 300 200 100
2700 2400 2100 1800 1500
900 800 700 600 500
600 400 200
3500 3000 2500 2000 1500 1300 1100
900 700
Q1 Q2 Q3 Q4
Quarter
2017 2012 2007 2002
Thousands of overnight trips
ACT NSW NT QLD SA TAS VIC WA
Seasonal subseries plots
holidays %>% gg_subseries(Trips) +
labs(title=”Australian domestic holidays”,
y=”Thousands of overnight trips”)
67
Australian domestic holidays
300 250 200 150 100
4000 3500 3000 2500
400 300 200 100
2700 2400 2100 1800 1500
900 800 700 600 500
600 400 200
3500 3000 2500 2000 1500 1300 1100
900 700
Q1 Q2 Q3 Q4
Quarter
ACT NSW NT QLD SA TAS VIC WA
2000 2005 2010 2015
2000 2005 2010 2015
2000 2005 2010 2015
2000 2005 2010 2015
Thousands of overnight trips
Outline
1 Time series in R
2 Time plots
3 Time series patterns
4 Seasonal plots
5 Lag plots and autocorrelation
6 White noise
68
Example: Beer production
new_prod %>% gg_lag(Beer)
69
lag 1
lag 2
lag 3
500
450
400
500
450
400
500
450
400
lag 4
lag 5
lag 6
lag 7
lag 8
lag 9
400 450 500
400 450 500
400 450 500
lag(Beer, n)
season Q1 Q2 Q3 Q4
Beer
Example: Beer production
new_prod %>% gg_lag(Beer, geom=’point’)
70
lag 1
lag 2
lag 3
500
450
400
500
450
400
500
450
400
lag 4
lag 5
lag 6
lag 7
lag 8
lag 9
400 450 500
400 450 500
400 450 500
lag(Beer, n)
season Q1 Q2 Q3 Q4
Beer
Lagged scatterplots
Each graph shows yt plotted against yt−k for different values of k.
The autocorrelations are the correlations associated with these scatterplots.
71
Numerical data summaries
Covariance and correlation: measure extent of linear relationship between two variables (y and x).
T (y −y ̄)(x −x ̄) √t√t
r=t=1
Tt = 1 ( y t − y ̄ ) 2 Tt = 1 ( x t − x ̄ ) 2
Lies between -1 and 1
72
Correlation coefficient
Which one has the highest correlation?
All these have r = 0.82. Hence importance of plots.
73
Autocorrelation
Autocovariance (ck) and autocorrelation (rk): measure linear relationship between lagged values of a time series y.
74
Autocorrelation
Autocovariance (ck) and autocorrelation (rk): measure linear relationship between lagged values of a time series y.
We measure the relationship between:
yt and yt−1 yt and yt−2 yt and yt−3 …
yt and yt−k etc.
74
Autocorrelation
We denote the sample autocovariance at lag k by ck and the sample autocorrelation at lag k by rk. Then define
c T (yt −y ̄ )(yt−k −y ̄ ) k t=k+1
r==
k c 0 Tt = 1 ( y t − y ̄ )
r1 indicates how successive values of y relate to each other r2 indicates how y values two periods apart relate to each other
rk is almost the same as the sample correlation between yt and yt−k.
75
Autocorrelation
Results for first 9 lags for beer data:
new_prod %>% ACF(Beer, lag_max = 9)
## # A tsibble: 9 x 2 [1Q]
## ## ## 1 ## 2 ## 3 ##4 ##5 ##6 ##7 ##8
lag acf
1Q -0.102 2Q -0.657 3Q -0.0603 4Q0.869 5Q -0.0892 6Q -0.635 7Q -0.0542 8Q0.832
76
Autocorrelation
Results for first 9 lags for beer data:
new_prod %>% ACF(Beer, lag_max = 9) %>% autoplot()
0.5
0.0
−0.5
2468
lag [1Q]
77
acf
Autocorrelation
r4 is positive and higher than for the other lags. This is due to the seasonal pattern in the data.
the peaks (troughs) tend to be 4 quarters apart.
the spikes every 4 lags after this (r8, r12, . . .) decrease in
size as the lag number increases.
r2 is more negative than for the other lags because troughs and peaks tend to be 2 quarters apart.
The highest and the lowest productions are 2 quarters apart.
The spikes every 4 lags after this r6, r10, . . . decrease in size as the lag number increases.
Together, the autocorrelations at lags 1, 2, . . . , make up the autocorrelation or ACF.
The plot is known as a correlogram.
78
ACF
new_prod %>% ACF(Beer) %>% autoplot()
0.5
0.0
−0.5
2 4 6 8 10 12 14 16 18
lag [1Q]
79
acf
Trend and seasonality in ACF plots
When data have a trend, the autocorrelations for small lags tend to be large and positive. When data are seasonal, the autocorrelations will be larger at the seasonal lags (i.e., at multiples of the seasonal frequency)
When data are trended and seasonal, you see a combination of these effects.
80
Google stock price
google_2015 <- gafa_stock %>%
filter(Symbol == “GOOG”, year(Date) == 2015) %>%
select(Date, Close)
google_2015
## # A tsibble: 252 x 2 [1D]
## Date Close
##
## 1 2015-01-02 522.
## 2 2015-01-05 511.
## 3 2015-01-06 499.
## 4 2015-01-07 498.
## 5 2015-01-08 500.
## 6 2015-01-09 493.
81
Google stock price
google_2015 %>% autoplot(Close)
700
600
500
Jan 2015 Apr 2015 Jul 2015 Oct 2015 Jan 2016
Date [1D]
82
Close
Google stock price
google_2015 %>%
ACF(Close, lag_max=100)
# Error: Can’t handle tsibble of irregular interval.
83
Google stock price
google_2015 %>%
ACF(Close, lag_max=100)
# Error: Can’t handle tsibble of irregular interval.
google_2015
## # A tsibble: 252 x 2 [1D]
## Date Close
##
## 1 2015-01-02 522.
## 2 2015-01-05 511.
## 3 2015-01-06 499.
## 4 2015-01-07 498.
## 5 2015-01-08 500.
## 6 2015-01-09 493. 83 ## 7 2015-01-12 490.
Google stock price
google_2015 <- google_2015 %>%
mutate(trading_day = row_number()) %>%
update_tsibble(index=trading_day, regular=TRUE)
google_2015
## # A tsibble: 252 x 3 [1]
## Date Close trading_day
##
## 1 2015-01-02 522. 1
## 2 2015-01-05 511. 2
## 3 2015-01-06 499. 3
## 4 2015-01-07 498. 4
## 5 2015-01-08 500. 5
## 6 2015-01-09 493. 6
84
Google stock price
google_2015 %>%
ACF(Close, lag_max=100) %>%
autoplot()
1.00
0.75
0.50
0.25
0.00
25 50 75 100
lag [1]
85
acf
Google stock price
google_longer <- gafa_stock %>%
filter(Symbol == “GOOG”, year(Date) >= 2014) %>%
select(Date, Close) %>%
mutate(trading_day = row_number()) %>%
update_tsibble(index=trading_day, regular=TRUE)
google_longer
## # A tsibble: 1,258 x 3 [1]
## Date Close trading_day
##
## 1 2014-01-02 553. 1
## 2 2014-01-03 549. 2
## 3 2014-01-06 555. 3
## 4 2014-01-07 566. 4
86
Google stock price
google_longer %>%
autoplot(Close)
1250
1000
750
500
0 400
800 1200
trading_day [1]
87
Close
Google stock price
google_longer %>%
ACF(Close, lag_max=100) %>%
autoplot()
1.00
0.75
0.50
0.25
0.00
25 50 75 100
lag [1]
88
acf
Aus monthly electricity production
elec2 <- as_tsibble(fma::elec) %>%
filter(year(index) >= 1980)
elec2 %>% autoplot(value)
14000
12000
10000
8000
1980 Jan
1985 Jan
1990 Jan
1995 Jan
index [1M]
89
value
Aus monthly electricity production
elec2 %>% ACF(value, lag_max=48) %>%
autoplot()
0.75
0.50
0.25
0.00
6 12 18 24 30 36 42 48
lag [1M]
The slow decrease in the ACF as the lags increase is due to the trend, while the “scalloped” shape is due the seasonality.
90
acf
Your turn
We have introduced the following functions:
gg_lag ACF
Explore the following time series using these functions. Can you spot any seasonality, cyclicity and trend? What do you learn about the series?
Bricks from aus_production
Lynx from pelt
Victorian Electricity Demand from aus_elec
91
Which is which?
1. Daily temperature of cow
2. Monthly accidental deaths
3. Monthly air passengers
4. Annual mink trappings
80
60
40
0 20
11000
10000
9000
8000
7000
600
400
200
90000
60000
30000
40 60
1974 Jan
1976 Jan
1978 Jan
1950 Jan
1955 Jan
1960 Jan
1860 1880 1900
1.0
0.5
0.0
1.0
0.5
0.0
1.0
0.5
0.0
1.0
0.5
0.0
ABCD
6 12 18
5 10 15
5 10 15
6 12 18
92
acf
chirps per minute
acf
thousands
acf
thousands
acf
thousands
Outline
1 Time series in R
2 Time plots
3 Time series patterns
4 Seasonal plots
5 Lag plots and autocorrelation
6 White noise
93
Example: White noise
set.seed(1)
wn <- tsibble(t = seq(36), y = rnorm(36), index = t)
wn %>% autoplot(y)
1
0
−1
−2
0 10 20 30
t [1]
94
y
Example: White noise
set.seed(1)
wn <- tsibble(t = seq(36), y = rnorm(36), index = t)
wn %>% autoplot(y)
1
0
−1
−2
White noise data is uncorrelated across time with zero mean and constant variance.
(Technically, we require independence as well.)
Each sample autocorrelation for white noise series is expected to be close to zero.
0 10 20 30
t [1]
94
y
Sampling distribution of autocorrelations
Sampling distribution of rk for white noise data is asymptotically N(0,1/T).
95% of all rk for white noise must lie within √
±1.96/ T.
If this is not the case, the series is probably not
WN.
Common to plot lines at ±1.96/ T when plotting ACF. These are the critical values.
√
95
Autocorrelation
96
Example:
T = 36 and so critical values √
at ±1.96/ 36 = ±0.327.
All autocorrelation coefficients lie within these limits, confirming that the data are white noise. (More precisely, the data cannot be distinguished from white noise.)
Autocorrelation
Note: 5% chance to be outside the critical values (Type I error). You want to see spikes a long way out or many of them. Don’t get too excited for 1 just outside especially at longer lags.
96
Example:
T = 36 and so critical values √
at ±1.96/ 36 = ±0.327.
All autocorrelation coefficients lie within these limits, confirming that the data are white noise. (More precisely, the data cannot be distinguished from white noise.)
Example: Pigs slaughtered
pigs <- aus_livestock %>%
filter(State == “Victoria”, Animal == “Pigs”,
year(Month) >= 2014)
pigs %>% autoplot(Count/1e3) + ylab(“Thousands”) +
ggtitle(“Number of pigs slaughtered in Victoria”)
110
100
90
80
Number of pigs slaughtered in Victoria
2014 Jan
2016 Jan
2018 Jan
Month [1M]
97
Thousands
Example: Pigs slaughtered
pigs %>% ACF(Count) %>% autoplot()
0.2
0.0
−0.2
6 12
lag [1M]
98
acf
Example: Pigs slaughtered
Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 2014 through December 2018 (Source: Australian Bureau of Statistics.)
Difficult to detect pattern in time plot.
ACF shows significant autocorrelation for lag 2 and 12.
Indicate some slight seasonality.
99
Example: Pigs slaughtered
Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 2014 through December 2018 (Source: Australian Bureau of Statistics.)
Difficult to detect pattern in time plot.
ACF shows significant autocorrelation for lag 2 and 12.
Indicate some slight seasonality.
These show the series is not a white noise series.
99
Your turn
You can compute the daily changes in the Google stock price in 2018 using
dgoog <- gafa_stock %>%
filter(Symbol == “GOOG”, year(Date) >= 2018) %>%
mutate(trading_day = row_number()) %>%
update_tsibble(index=trading_day, regular=TRUE) %>%
mutate(diff = difference(Close))
Does diff look like white noise?
100