Set 5 – Periodograms and the Frequency Domain
Semester 2 2021
©Fidelio Statistical Services Set 5 – Periodograms and the Frequency Domain Semester 2 2021 1 / 27
Cycles in Time Series – Why?
Because the Earth is tilted and revolves around the Sun once every 365.25 days, we have a year with seasons, and thus a calendar. Within this calendar, people have divided up the year in various ways: weeks, months, etc. In addition, the Earth rotates on its axis once every 24 hours, which gives us the day and its subdivisions.
Because of this, many activities follow a cycle, e.g. how much studying you do per month.
Problem
Find if these cycles exist in the data and, if they do, describe their properties.
Periodogram
Interesting because, …
Contains important information about the time series. In fact, it could be the MOST important information.
Many time series exhibit regular periodic structure.
Due to natural rhythms of animals/weather/climate/planet! We can see this in the autocorrelation function.
Observation
Cyclic behaviour can often be approximated by one or more sine or cosine functions.
Tourist Arrivals as a ?
Observed cycle reasonably modelled by sinusoidal function
Revision – Sine and Cosine Waves
angles in radians: $\pi/2 = 90^\circ$, $\pi = 180^\circ$, $2\pi = 360^\circ$
cosine is a shifted sine: $\sin(x + \pi/2) = \cos(x)$
Pythagorean theorem: $\sin^2(x) + \cos^2(x) = 1$
maximum value for $\sin(x)$ is 1, minimum is $-1$
$\sin(n\pi) = \cos(n\pi + \pi/2) = 0$ for all integer n
$\cos(2n\pi) = \sin(2n\pi + \pi/2) = 1$ for all integer n
$\sin[(2n + 1)\pi + \pi/2] = \cos[(2n + 1)\pi] = -1$ for all integer n
Amplitude, frequency and phase
Consider general sine wave:
Asin(2πft − φ)
A is the amplitude – graphically half-height of wave.
f is the frequency – number of cycles per unit of time
1/f is the period/wavelength – length of one cycle
φ is the phase – offset for start of cycle
Note
High frequency means small wavelength and vice versa. Phase φ is in radians, so φ/(2πf) is the same offset expressed in units of time.
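The roles of amplitude, frequency and phase can be checked numerically. The following is a minimal sketch; the function name `sine_wave` and the values of A, f and φ are illustrative assumptions, not taken from the slides.

```python
import math

def sine_wave(t, amplitude, freq, phase):
    """Evaluate A*sin(2*pi*f*t - phi) at time t."""
    return amplitude * math.sin(2 * math.pi * freq * t - phase)

# Illustrative values (assumptions, not from the slides).
A, f, phi = 2.0, 1 / 12, math.pi / 3

period = 1 / f                           # length of one cycle
time_offset = phi / (2 * math.pi * f)    # phase converted from radians to time units

# The wave takes the same value one full period later ...
repeat_gap = abs(sine_wave(5.0 + period, A, f, phi) - sine_wave(5.0, A, f, phi))
# ... and the cycle starts (value zero, rising) at t = time_offset.
start_value = sine_wave(time_offset, A, f, phi)

print(period, time_offset, repeat_gap, start_value)
```

With these values the period is 12 time units and a phase of π/3 radians corresponds to a shift of 2 time units.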
Regression with sine waves
Idea
We find a cycle of a certain length if a sinusoidal function with the same wavelength gives a good fit to the data.
Possible regression model
$Y_t = A\sin(2\pi f t - \phi) + \varepsilon_t$
It is not a linear regression model, as we do not know the frequency f or the phase φ.
Assume for a moment that we know the frequency f. Then A is a linear regression coefficient, but the phase φ still enters non-linearly.
Is there a way to find the phase as a linear problem?
Regression with sine and cosine waves
Some useful trigonometric rules
$\sin(x \pm y) = \sin(x)\cos(y) \pm \cos(x)\sin(y)$   (1)
$\cos(x \pm y) = \cos(x)\cos(y) \mp \sin(x)\sin(y)$   (2)
$\sin(2x) = 2\sin(x)\cos(x)$   (3)
$\cos(2x) = \cos^2(x) - \sin^2(x)$   (4)
Regression with sine and cosine waves
Problem
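$Y_t = A\sin(2\pi f t - \phi) + \varepsilon_t$
is not linear in φ.
Solution
Using equation (1) we get:
$A\sin(2\pi f t - \phi) = A\sin(2\pi f t)\cos(\phi) - A\cos(2\pi f t)\sin(\phi)$
So now we have a linear model for $Y_t$, with the known regressors $\sin(2\pi f t)$ and $\cos(2\pi f t)$.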
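The expansion can be matched term by term with a two-coefficient regression. The symbols $\beta_1$ and $\beta_2$ below are the regression coefficients of the sine and cosine terms; this is a short worked derivation rather than a new result.

```latex
\begin{aligned}
Y_t &= \beta_1 \sin(2\pi f t) + \beta_2 \cos(2\pi f t) + \varepsilon_t,
\qquad \beta_1 = A\cos\phi, \quad \beta_2 = -A\sin\phi, \\
\beta_1^2 + \beta_2^2 &= A^2\left(\cos^2\phi + \sin^2\phi\right) = A^2,
\qquad \frac{-\beta_2}{\beta_1} = \frac{A\sin\phi}{A\cos\phi} = \tan\phi .
\end{aligned}
```

So amplitude and phase can both be recovered from the two linear coefficients.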
Regression with sine and cosine waves
Centering data
To apply the linear regression model
$Y_t = \beta_1 \sin(2\pi f t) + \beta_2 \cos(2\pi f t) + \varepsilon_t$
the data need to be centred, $Y_t = X_t - \bar{X}$, because the model has no intercept term.
Amplitude
Our main interest is finding the frequencies f where the amplitude A is large.
Natural estimators for
amplitude: $\hat{A} = \sqrt{\hat\beta_1^2 + \hat\beta_2^2}$
phase: $\hat\phi = \tan^{-1}\left(-\hat\beta_2 / \hat\beta_1\right)$
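The regression at a known frequency can be sketched in a few lines. This is a minimal illustration on a synthetic, noise-free series; the helper name `fit_sinusoid` and all numeric values are assumptions made for the example, and the two-regressor least squares problem is solved directly from its normal equations.

```python
import math

def fit_sinusoid(y, freq):
    """OLS fit of y_t = b1*sin(2*pi*f*t) + b2*cos(2*pi*f*t), t = 1..n, no intercept."""
    n = len(y)
    mean = sum(y) / n
    yc = [v - mean for v in y]  # centre the data first
    s = [math.sin(2 * math.pi * freq * t) for t in range(1, n + 1)]
    c = [math.cos(2 * math.pi * freq * t) for t in range(1, n + 1)]
    # Normal equations for two regressors (a 2x2 linear system).
    ss = sum(a * a for a in s)
    cc = sum(a * a for a in c)
    sc = sum(a * b for a, b in zip(s, c))
    sy = sum(a * b for a, b in zip(s, yc))
    cy = sum(a * b for a, b in zip(c, yc))
    det = ss * cc - sc * sc
    b1 = (cc * sy - sc * cy) / det
    b2 = (ss * cy - sc * sy) / det
    return b1, b2

# Synthetic series (hypothetical values): level 10, amplitude 3, phase 0.5.
n, f = 48, 1 / 12
y = [10 + 3 * math.sin(2 * math.pi * f * t - 0.5) for t in range(1, n + 1)]

b1, b2 = fit_sinusoid(y, f)
amplitude_hat = math.hypot(b1, b2)  # sqrt(b1^2 + b2^2)
phase_hat = math.atan(-b2 / b1)     # plain arctangent, values in (-pi/2, pi/2) only
print(amplitude_hat, phase_hat)
```

On this noise-free series the fit recovers the amplitude and phase used to generate it.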
Example: Tourist Arrivals
Assume frequency f = 1/12
Therefore, the period is 1/f, that is 12 months
$\hat\beta_1 = 87{,}690$, $\hat\beta_2 = 31{,}259$
$\hat{A} = 93{,}094$
$\hat\phi = -0.342$
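The amplitude and phase for this example follow directly from the two fitted coefficients. The sketch below reads the two printed values as the sine coefficient (87,690) and the cosine coefficient (31,259); the variable names are illustrative.

```python
import math

# Fitted coefficients as printed on the slide (sine term, cosine term).
b1, b2 = 87690.0, 31259.0

amplitude = math.hypot(b1, b2)   # sqrt(b1^2 + b2^2)
phase = math.atan(-b2 / b1)      # arctangent of -b2/b1, in radians

print(amplitude, phase)
```

The results agree with the slide values up to rounding of the printed coefficients.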
Example: Tourist Arrivals
Assume frequency f = 1/24
Therefore, the period is 1/f, that is 24 months
$\hat\beta_1 = -911.9$, $\hat\beta_2 = -11{,}976$
$\hat{A} = 12{,}011$
$\hat\phi = 1.495$
Finding Frequencies
So far we assumed the frequency f is known.
But what happens if we do not know f ?
We want to find all frequencies f where the amplitude is large.
Possible Strategy:
Run the regression with sine waves for “all” reasonable frequencies.
If Yt has a cycle of frequency f , the corresponding amplitude will be larger than for other frequencies.
Finding Frequencies
Assume in the following that the sample size n is even and that the time series is regularly spaced.
Use the frequencies f = 0, 1/n, 2/n, …, 1/2
These are called the harmonic, Fourier, or fundamental frequencies.
If n is odd, f = j/(n−1) for j = 0, 1, 2, …, (n−1)/2
Finding Frequencies
Harmonic frequencies are f = 0, 1/n, 2/n, …, 1/2
Including f = 0 means an infinite period, which, as we see later, adds an intercept term.
f = 1/2 is called the Nyquist frequency. It has a period of 2, so it completes a cycle every two observations. It is the shortest cycle that is detectable in the observations. Higher frequencies are indistinguishable or “aliased”.
f = 1/n means period of n. This is the longest cycle considered that is completely contained in the observations.
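The set of harmonic frequencies for an even sample size can be listed explicitly. This is a small sketch using exact fractions; the function name `harmonic_frequencies` is an assumption for illustration, and n = 16 matches the sample size used in the plots of these slides.

```python
from fractions import Fraction

def harmonic_frequencies(n):
    """Harmonic (Fourier) frequencies j/n, j = 0..n/2, for an even sample size n."""
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even sample size")
    return [Fraction(j, n) for j in range(n // 2 + 1)]

freqs = harmonic_frequencies(16)
# Periods 1/f for the non-zero frequencies (f = 0 has an infinite period).
periods = [1 / f for f in freqs if f != 0]

print([str(f) for f in freqs])
print([str(p) for p in periods])
```

The longest finite period equals the sample size n and the shortest, at the Nyquist frequency, equals 2.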
Sampling/Fourier Frequencies
Plots of cos(2πft) and sin(2πft) where n = 16. The waves are only observed at t = 1, 2, …, n.
Sampling/Fourier Frequencies
For f = 1/2, sin(2πft) = 0 for all integer t. By contrast, cos(2πft) alternates between ±1.
Aliasing at higher frequencies
Plot of sin(2πft) where n = 16, but the sine waves are only observed at t = 1, 2, …, n. Blue line is f = 1/16. Red line is f = 17/16 > 1/2, which is indistinguishable from, or “aliased” with, the blue line at the observation times.
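Aliasing can be verified numerically: at integer observation times the two sine waves take identical values, although they differ between observations. The sketch below uses the frequencies 1/16 and 17/16 from the plot; the variable names and the half-step time t = 0.5 are illustrative choices.

```python
import math

n = 16
times = range(1, n + 1)

low = [math.sin(2 * math.pi * (1 / 16) * t) for t in times]    # f = 1/16
high = [math.sin(2 * math.pi * (17 / 16) * t) for t in times]  # f = 17/16

# Largest difference between the two waves at the observation times.
max_gap = max(abs(a - b) for a, b in zip(low, high))

# Between observations (here t = 0.5) the two waves are clearly different.
between_gap = abs(
    math.sin(2 * math.pi * (1 / 16) * 0.5) - math.sin(2 * math.pi * (17 / 16) * 0.5)
)

print(max_gap, between_gap)
```

The difference at the sampled times is zero up to floating point error, which is why the higher frequency cannot be identified from the data.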
Aliasing at higher frequencies
Blue line is sin(2πft) for f = 7/16. Red line is −2sin(2πft) for f = 9/16 and is more representative.
Periodogram as regression
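Regression for every harmonic frequency j/n
$Y_t = \beta_{1,j/n} \sin\left(2\pi \tfrac{j}{n} t\right) + \beta_{2,j/n} \cos\left(2\pi \tfrac{j}{n} t\right) + \varepsilon_t$
with estimators $\hat\beta_{1,j/n}$ and $\hat\beta_{2,j/n}$
Estimated squared amplitude
$P\left(\tfrac{j}{n}\right) = \hat\beta_{1,j/n}^2 + \hat\beta_{2,j/n}^2$
is called the scaled periodogram.
The periodogram is given by
$I\left(\tfrac{j}{n}\right) = \tfrac{n}{4} P\left(\tfrac{j}{n}\right)$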
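The periodogram can be sketched by looping over the harmonic frequencies. The code below assumes an even sample size and restricts itself to 0 < j < n/2, where the regressors are orthogonal with sum of squares n/2, so each OLS coefficient reduces to (2/n) times a simple sum. The function name `periodogram` and the test series are illustrative assumptions.

```python
import math

def periodogram(y):
    """Return a list of (j/n, scaled periodogram P, periodogram I) for 0 < j < n/2."""
    n = len(y)
    mean = sum(y) / n
    yc = [v - mean for v in y]  # centre the data
    rows = []
    for j in range(1, n // 2):
        # OLS coefficients at frequency j/n (closed form due to orthogonality).
        b1 = 2 / n * sum(v * math.sin(2 * math.pi * j * t / n)
                         for t, v in enumerate(yc, start=1))
        b2 = 2 / n * sum(v * math.cos(2 * math.pi * j * t / n)
                         for t, v in enumerate(yc, start=1))
        p = b1 ** 2 + b2 ** 2          # scaled periodogram: squared amplitude
        rows.append((j / n, p, n / 4 * p))
    return rows

# Noise-free cosine with amplitude 3 at frequency 3/20 (hypothetical values).
n = 20
y = [3 * math.cos(2 * math.pi * 3 * t / n + 0.4) for t in range(1, n + 1)]
rows = periodogram(y)
peak = max(rows, key=lambda r: r[1])
print(peak)
```

For this series the scaled periodogram equals the squared amplitude at f = 3/20 and is zero, up to floating point error, at the other harmonic frequencies.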
Example – A sine wave with noise
$Y_t = \cos(2\pi \cdot 0.1\, t + 0.6\pi) + W_t$, with $\sigma_W = 1$
The frequency f = 0.1 is hard to identify in the time series of Yt , but is easily seen in the periodogram.
Example – Two sine waves with noise
$Y_t = \cos(2\pi \cdot 0.1\, t + 0.6\pi) + \cos(2\pi \cdot \tfrac{1}{7}\, t + 0.3\pi) + W_t$, with $\sigma_W = 1$
The frequencies f1 = 0.1 and f2 = 1/7 cannot be identified in the time series plot, but are easily seen in the periodogram.
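A simulation along these lines can be sketched as follows. The sample size n = 700 is an assumption chosen so that both 0.1 and 1/7 are harmonic frequencies; the sample size behind the slide is not stated. The seed, the helper name `periodogram` and the use of Gaussian noise are likewise illustrative choices.

```python
import math
import random

def periodogram(y):
    """Periodogram I(j/n) for 0 < j < n/2 (even n), returned as a dict {j: I}."""
    n = len(y)
    mean = sum(y) / n
    yc = [v - mean for v in y]
    out = {}
    for j in range(1, n // 2):
        b1 = 2 / n * sum(v * math.sin(2 * math.pi * j * t / n)
                         for t, v in enumerate(yc, start=1))
        b2 = 2 / n * sum(v * math.cos(2 * math.pi * j * t / n)
                         for t, v in enumerate(yc, start=1))
        out[j] = n / 4 * (b1 ** 2 + b2 ** 2)
    return out

random.seed(1)   # arbitrary seed, for reproducibility only
n = 700          # assumed sample size: 0.1 = 70/700 and 1/7 = 100/700
y = [
    math.cos(2 * math.pi * 0.1 * t + 0.6 * math.pi)
    + math.cos(2 * math.pi * t / 7 + 0.3 * math.pi)
    + random.gauss(0, 1)
    for t in range(1, n + 1)
]

I = periodogram(y)
# Indices j of the two largest periodogram ordinates, in increasing order.
top_two = sorted(sorted(I, key=I.get, reverse=True)[:2])
peak_freqs = [j / n for j in top_two]
print(peak_freqs)
```

The two largest ordinates sit at the two signal frequencies, even though the noise has the same standard deviation as the amplitude of each wave.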
Computation by Complete Regression
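Instead of fitting (n/2 + 1) separate regressions, we can compute the periodogram with a complete regression
$Y_t = \sum_{j=0}^{n/2} \left[ \beta_{1,j/n} \sin\left(2\pi \tfrac{j}{n} t\right) + \beta_{2,j/n} \cos\left(2\pi \tfrac{j}{n} t\right) \right]$
as the regressors are all orthogonal (property of sin and cos).
Problem: We have 2(n/2 + 1) = n + 2 regression coefficients, but only n datapoints. But
$\sin(2\pi \cdot 0 \cdot t) = \sin\left(2\pi \cdot \tfrac{1}{2} \cdot t\right) = 0 \quad \forall t$
so we can set $\beta_{1,0} = \beta_{1,1/2} = 0$.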
So n regression coefficients remain.
Note: cos(2π0t) = 1, so β2,0 is an intercept term.
What About Error Term?
Compute the periodogram with a complete regression
$Y_t = \sum_{j=0}^{n/2} \left[ \beta_{1,j/n} \sin\left(2\pi \tfrac{j}{n} t\right) + \beta_{2,j/n} \cos\left(2\pi \tfrac{j}{n} t\right) \right]$
so there are n regression coefficients and n datapoints. This will give a perfect fit, and no error term ε is needed.
The OLS estimates $\hat\beta_{i,j/n}$ are the same as for the individual regression models, due to the orthogonality of the regressors (property of sin and cos).
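The perfect fit can be checked on a small series. This sketch assumes an even sample size and computes each coefficient in closed form from the orthogonality of the regressors: the scaling is 2/n for 0 < j < n/2 and 1/n for the cosine terms at j = 0 and j = n/2. The data values and function names are hypothetical.

```python
import math

def fourier_coefficients(y):
    """Coefficients (b1, b2) of sin and cos at each j/n, j = 0..n/2, for even n."""
    n = len(y)
    coef = {}
    for j in range(n // 2 + 1):
        s = sum(v * math.sin(2 * math.pi * j * t / n) for t, v in enumerate(y, start=1))
        c = sum(v * math.cos(2 * math.pi * j * t / n) for t, v in enumerate(y, start=1))
        if j == 0 or 2 * j == n:
            coef[j] = (0.0, c / n)          # sine regressor is identically zero here
        else:
            coef[j] = (2 * s / n, 2 * c / n)
    return coef

def reconstruct(coef, n):
    """Sum the fitted sine and cosine terms over all harmonic frequencies."""
    return [
        sum(b1 * math.sin(2 * math.pi * j * t / n) + b2 * math.cos(2 * math.pi * j * t / n)
            for j, (b1, b2) in coef.items())
        for t in range(1, n + 1)
    ]

y = [3.0, 1.5, 4.0, 1.0, 5.5, 9.0, 2.5, 6.0]   # arbitrary example data
coef = fourier_coefficients(y)
fitted = reconstruct(coef, len(y))
max_error = max(abs(a - b) for a, b in zip(y, fitted))
free_coefficients = 2 * len(coef) - 2          # two sine coefficients are fixed at 0
print(max_error, free_coefficients)
```

The fitted values reproduce the data up to floating point error, and the cosine coefficient at j = 0 is the sample mean, i.e. the intercept.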
Fourier Transforms
The decomposition of a time series into sine waves
$Y_t = \sum_{j=0}^{n/2} \left[ \beta_{1,j/n} \sin\left(2\pi \tfrac{j}{n} t\right) + \beta_{2,j/n} \cos\left(2\pi \tfrac{j}{n} t\right) \right]$
is called the Fourier transform. The complex numbers
$d\left(\tfrac{j}{n}\right) = \tfrac{\sqrt{n}}{2} \left( \beta_{1,j/n} + i\,\beta_{2,j/n} \right)$
are called Fourier coefficients.
The Fourier transform contains the same information as the time series and vice versa.
Periodogram $I(j/n) = |d(j/n)|^2$
The time series is decomposed into the frequency domain.
The “Fast Fourier Transform” algorithm computes the Fourier coefficients much more quickly than OLS; in R it is available through the command fft.
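The link between the regression view and the discrete Fourier transform can be checked numerically. The sketch below uses a naive O(n²) transform written with the standard library as a stand-in for an FFT routine; an FFT, such as R's fft, computes the same unnormalised transform, only faster. The function names and data values are illustrative assumptions, and the comparison covers 0 < j < n/2 for an even sample size.

```python
import cmath
import math

def dft(y):
    """Naive unnormalised discrete Fourier transform, time index 0..n-1."""
    n = len(y)
    return [
        sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(y))
        for k in range(n)
    ]

def periodogram_via_dft(y):
    """I(j/n) = |X_j|^2 / n for 0 < j < n/2."""
    n = len(y)
    X = dft(y)
    return [abs(X[j]) ** 2 / n for j in range(1, n // 2)]

def periodogram_via_regression(y):
    """I(j/n) = (n/4) * (b1^2 + b2^2) with closed-form OLS coefficients, t = 1..n."""
    n = len(y)
    out = []
    for j in range(1, n // 2):
        b1 = 2 / n * sum(v * math.sin(2 * math.pi * j * t / n)
                         for t, v in enumerate(y, start=1))
        b2 = 2 / n * sum(v * math.cos(2 * math.pi * j * t / n)
                         for t, v in enumerate(y, start=1))
        out.append(n / 4 * (b1 ** 2 + b2 ** 2))
    return out

y = [2.0, 4.5, 3.1, 0.7, -1.2, 0.4, 2.8, 5.0, 3.3, 1.1, -0.6, 0.9]  # arbitrary data
via_dft = periodogram_via_dft(y)
via_ols = periodogram_via_regression(y)
max_diff = max(abs(a - b) for a, b in zip(via_dft, via_ols))
print(max_diff)
```

Shifting the time index from 0..n−1 to 1..n only changes the phase of each transform value, not its modulus, so both routes give the same periodogram.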
Arrivals Data
The peak at low frequencies is due to the trend. It is better to de-trend beforehand.
Distinct peak at f = 1/12
Smaller peaks at f = 2/12, 3/12, 4/12, 5/12 are due to spectral leakage into harmonics of the frequency 1/12