CS计算机代考程序代写 Bayesian AI finance Contents

Contents
F70TS: Time Series
1. Introduction,ACF,StationarityandOperators 3
1.1. Objectivesoftimeseriesanalysis…………………… 6 1.2. AutocorrelationFunctionandStationarity ………………. 7 1.3. Operators ……………………………… 9 1.4. UnivariateLinearProcesses……………………… 10
2. Moving Average Processes 12
2.1. PropertiesofMAProcesses……………………… 12
3. Autoregressive Processes 15
3.1. AR(1)Processes…………………………… 17 3.2. AR(2)Processes…………………………… 19 3.3. PartialAutocorrelations……………………….. 22
4. The MA(∞) and AR(∞) Processes 25 4.1. TheMA(∞)Processes ……………………….. 25 4.2. AR(∞)Processes………………………….. 27 4.3. InvertibilityofMA(q)Processes …………………… 27
5. The ARMA and ARIMA Processes 29
5.1. ARMAProcesses………………………….. 29 5.2. ARIMAProcesses …………………………. 31
6. Forecasting for Linear Processes 35
6.1. Box-JenkinsForecasting ………………………. 35 6.2. Forecastingintervalsanderror ……………………. 38
7. Estimation for Univariate Linear Processes 41
7.1. Estimationofμ …………………………… 41 7.2. PropertiesofX ̄ …………………………… 41 7.3. EstimationoftheAutocorrelationFunction . . . . . . . . . . . . . . . . . . 45
8. Estimation of the ARMA Model 49
8.1. EstimationforanAR(1)Model……………………. 49 8.2. TheYule-WalkerEstimatorsofanAR(p)Model . . . . . . . . . . . . . . . . 51 8.3. LeastSquaresEstimatorsofanARMAModel . . . . . . . . . . . . . . . . . 53
1

8.4. SelectionofanARMA(p,q)Model …………………. 55 8.5. SelectionofdinanARIMA(p,d,q)Model………………. 56 8.6. ExamplesofModelSelection…………………….. 56
9. VectorTimeSeries 60
9.1. StationaryMultivariateTimeSeries …………………. 61 9.2. MultivariateARMAProcesses ……………………. 62 9.3. VAR(p) ………………………………. 63
10.Introduction to ARCH and GARCH 65
10.1.PropertiesofFinancialTimeSeries………………….. 65 10.2.ARCHModels …………………………… 67 10.3.GARCHModels ………………………….. 68
11.Further Reading 70
A. Stationarity Triangle for an AR(2) Process 71
B. Proofs of Results in Section 7.2 72
B.1.ProofofTheorem7.1………………………… 73 B.2.ProofofTheorem7.2………………………… 73 B.3.ProofofTheorem7.3………………………… 74
2

1. Introduction, ACF, Stationarity and Operators
A time series (TS) is a set of observations of a particular quantity made one after another in time. These data are usually equally-spaced in time (daily, monthly, quarterly, annually,…), but this is not always the case.
The fields of application of modern time-series analysis are numerous, encompassing finance, economics, geography, demography, management, medicine, meteorology etc. See some ex- amples of time series data in Figures 1–4.
The most important thing to note is that these observations are NOT just an i.i.d. sample – they will be related to one another by a dependence structure. It is this relationship of “serial dependence” that is of interest.
Formally, we have the following definition:
Definition1.1 ATimeSeriesisastochasticprocessdefinedonaprobabilityspace(Ω,F,P). That means, for a given index set T, a time series X = {Xt}t∈T is a collection of random variables defined on a probability space (Ω, F , P).
Remarks:
• In the majority of cases we have T = {0,1,…,∞}, or T = {0,1,…,n}, or T = {−∞,…,−1,0,1,…,∞}. This means X = {Xt}t∈T is a stochastic process in dis- crete time, that is, X = {Xt}t=T is a set of random variables Xt in time order.
• The series
x1,x2,…,xn = {xt}nt=1
iscalledanobservation(orobservedseries)ofthetimeseriesX1,…,Xn ={Xt}nt=1. The term “time series” is used both for the sequence of random variables {Xt} and for the sequence of observations {xt}.
• “Time” could be a more general coordinate system. For example, the length along a river, road, pipeline, or power line, or a vector-valued coordinate system (e.g., spatial data analysis). However, in this course we will focus on temporal series analysis.
• We mainly focus on univariate series Xt ∈ R and discuss multivariate series at the end of the course.
3

1987 1 1989 1 1991 1 1993 1 1995 1 1997 1 1999 1 2001 1 2003 1 2005 1 2007 1
Quarter
Figure 1: Retail Price Index (UK)
0 20 40 60 80
Index
Figure 2: Annual force of inflation (annual log-difference of RPI)
4
rpiLD RPI
0.02 0.04 0.06 0.08 0.10 100 120 140 160 180 200

1950 1960 1970 1980 1990 2000
year
Figure 3: UK annual GDP at current prices (in Million GBP)
1950 1960 1970 1980 1990 2000
gdp$year[2:ng]
Figure 4: Log-difference of UK annual GDP at current prices
5
gdpLD GDP
0.05 0.10 0.15 0.20 0 200000 600000 1000000 1400000

1.1. Objectives of time series analysis
The objectives of time series analysis can be summarised as follows: 1. Description and analysis
a) Plot the series. A data plot shows up the basic properties of the series and may indicate appropriate methods of analysis. It may, for example, suggest that a trans- formation of the data is appropriate (e.g. if standard deviation increases with mean, a log transformation may stabilise the variance). It may also show up the presence of outliers; i.e., points which do not fit the overall data pattern.
b) Calculation of simple descriptive summary measures, such as mean, variance, and autocorrelation plots (see Autocorrelation Function).
c) Modelling: a mathematical model may be proposed and fitted. This is useful, for example, for forecasting future values of the series. Time series models usually describe the observed series {xt} by combining three terms:
• Seasonal model: deterministic periodic patterns (e.g., a restaurant that has more business during the end of the week and the weekend, this would pro- duce a seasonality of 7 days). Some series include several periodic patterns (e.g., increased business during summer months, or during the main sports season, etc.)
• Trend model: underlying deterministic upwards or downwards trend of the observed quantities (e.g., demand for certain commodities might be growing, or shrinking).
• Stochastic model: random fluctuations in the time series that are not due to trend or seasonality.
The seasonality and trend components of a time series are deterministic, and can therefore be estimated and removed from the observed series to isolate the stochas- tic component of the series. Analysing this stochastic component is significantly more challenging.
2. Forecasting: Given an observed series we predict future values, with uncertainty quan- tification on the delivered prediction.
3. Control: In a controlled physical process, analysing an appropriate time series allows adjusting the input variables in order to stabilise the process. (e.g., the temperature con- trol system controlling an oven, the automatic pilot system in an autonomous vehicle).
4. Explanation: In a multivariate context, we may use the variation in one time series to “explain” the behaviour of another time series (e.g., the relationship between the price of petrol and the price of oil, or between demand for certain products and the weather).
6

1.2. Autocorrelation Function and Stationarity
Our main tool for understanding the dependence structure within time series data is the auto- covariance function, defined as follows:
Definition 1.2 The covariance between two elements Xt and Xs (t, s ∈ T) of a time series X , is called the autocovariance function of X, denoted by
γ(t, s) = Cov(Xt, Xs) = E[(Xt − E(Xt))(Xs − E(Xs))]. In particular, γ(t, t) = Var(Xt).
Roughly speaking, a time series is said to be stationary if its properties do not change over time. There are many different ways to make this idea mathematically precise. One common way is the property of weak stationarity, which is defined based on autocovariances:
Definition 1.3 A time series is said to be weakly stationary if (i) E(Xt2)<∞forallt∈T, (ii) E(Xt)=μforallt∈T,and (iii) γ(r,s)=γ(r+t,s+t)forallr,s,t∈T. We note the following properties of a weakly stationary time series: • the mean and variance are constant. • the autocovariances γ(r, s) = Cov(Xr, Xs) only depend on the time lag k = s − r. It is hence a one-dimensional function, γ(k) = Cov(Xt, Xt+k) for k = ... − 1, 0, 1, ... and ∀ t ∈ T, called the autocovariance function of the process Xt. We may also define a normalized version of the autocovariance function, called the autocor- relation function: Definition 1.4 The autocorrelation function (ACF) is the standardised autocovariance func- tion of a (weakly) stationary process Xt: γ(k) Cov(Xt, Xt+k) Cov(Xt, Xt+k) ρ(k) = γ(0) = Var(Xt) = 􏰜Var(Xt)Var(Xt+k) . Example: White Noise This is the simplest stationary series. The elements Xt for all t ∈ T are i.i.d. with mean 0 and variance σ2. So, γ(t, s) = 0 for t ̸= s, and γ(t, t) = σ2. The White Noise process is usually 7 denoted by ε (or ε(t)). It has no “structure” but is used as a fundamental building block for models with interesting and complex structure (see later sections). Properties of the autocovariance function of a (weakly) stationary process (similar properties hold for the ACF): • γ(0)=Var(Xt)=σX2 ≥0andρ(0)≡1. • |γ(k)|≤γ(0)forallk∈T, • γ(−k)=γ(k)forallk∈T. An alternative definition of stationarity that is sometimes useful is strong stationarity: Definition 1.5 A time series is said to be strongly (or strictly) stationary, if P(Xt1 < x1,...,Xtk < xk) = P(Xt1+t < x1,...,Xtk+t < xk) ∀k = 1,2,...; t,t1,...,tk ∈ T and x1,...,xk ∈ R. Strong stationarity is a property of the whole distribution of a process. Weak stationarity is just defined based on the first two moments of the process. It is clear that strong stationarity does not follow from weak stationarity. However, note that, in general, weak stationarity also does not follow from strong stationarity either, because the second moments (or even the mean) of a strongly stationary process may not exist. If the variance of a strongly stationary process exists, then it is also weakly stationary. Note that if the process is Gaussian, i.e. the distribution of Xt for any t ∈ T is normal, then the two stationary properties are equivalent to each other. Examples: 1. Xt = εt, t = 0, 1, ... with i.i.d. Cauchy εt is strongly, but not weakly, stationary (since the variance is not finite here). 2. Let εt be i.i.d. with E(εt) = 0 and E(ε2t ) < ∞, then the moving average process (of order 1) defined by Xt = ψεt−1 + εt, |ψ| < 1 and the autoregressive process (of order 1) defined by Xt = φXt−1 + εt, |φ| < 1 are both strongly and weakly stationary. See later on for more detail. 3. Let εt be i.i.d. N (0, 1) random variables. Then the process 􏱘 εt √ fort=1,3,... Xt = (ε2t −1)/ 2 fort=2,4,..., is weakly, but not strongly, stationary. 8 4. The Random Walk process Xt = Xt−1 + εt with i.i.d. εt is neither weakly nor strongly stationary. To see this, check that the variance of Xt depends on t. 1.3. Operators A time series model is often defined by using mathematical operators and some base stochastic process (such as a white noise process). Operators allow transforming one time series into another, and can be applied to the random variables Xt or the observations xt. 
We will focus on the following widely used operators: Backward shift: BXt := Xt−1, and BrXt := Xt−r Difference: ∆Xt = Xt − Xt−1 Seasonal difference: DrXt = Xt − Xt−r = (1 − Br)Xt Forward Shift: FXt = Xt+1, and FrXt = Xt+r Identity: 1Xt = Xt Summation: SXt = Xt + Xt−1 + Xt−2 + ... Operators can be handled algebraically and admit inverse operators. For example: =⇒ ∆≡1−B. Also, ∆xt :=xt −xt−1 =xt −Bxt =(1−B)xt, S(1−B)xt =S(xt −xt−1)=Sxt −Sxt−1 =(xt +xt−1 +xt−2 +...) =⇒ S(1−B)≡1 S∆ ≡ 1 i.e. S ≡ ∆−1. Alternatively, The Moving Average Operator −(xt−1 +xt−2 +xt−3 +...) i.e.S≡(1−B)−1. Sxt =xt +xt−1 +xt−2 +... =(1+B+B2 +...)xt =(1−B)−1xt. = xt 9 Moving averages are a useful way of building models for stationary time series data, especially by applying a moving average operator to a white noise process. Notation: is called a moving average (MA) operator. We write mt = Txt. T is defined by: T = [w−k,...,w−1,w0,w1,...,wk] k Txt = 􏱜wixt+i =w−kxt−k +···+w0xt +···+wkxt+k i=−k 1.4. Univariate Linear Processes A very wide class of time series models is that of linear processes, including the well known AR (autoregressive), MA (moving average), ARMA (autoregressive moving average) pro- cesses with i.i.d. innovations. The simplest process is the White Noise (WN) model we have already met, defined as follows: Definition 1.6 A White Noise process is an i.i.d. series εt with E(εt) = 0 and Var(εt) = σε2. A WN process is strongly and weakly stationary with 􏱘σε2, k=0, 􏱘1, k=0, γ(k) = 0, k ̸= 0 and ρ(k) = 0, k ̸= 0. (1) Observe that infinite variance i.i.d. series such as i.i.d. Cauchy series, are not included in our definition. A time series is said to be linear, if it can be represented as a linear (possibly infinite) sum (called a linear filter) of εt, where {εt} is a WN process. See the MA(∞) process defined later. Examples of Time Series Models Based on White Noise; 1. First order moving average process - MA(1): Xt = 0.8εt−1 + εt = (1 + 0.8B)εt, εt are, for example, i.i.d. N (0, 1) random variables. 2. Second order moving average process - MA(2): Xt = 0.5εt−1 + 0.3εt−2 + εt = (1 + 0.5B + 0.3B2)εt, Using the MA-operator: Xt = T εt with T = [0.3, 0.5, 1, 0, 0] 10 εt are as above. 3. First order autoregressive process - AR(1): εt as above. 4. Random Walk: εt as above. Xt =Xt−1 +εt ⇐⇒∆Xt =εt, Xt = Sεt t=0,1,..., Xt =0.8Xt−1 +εt ⇐⇒(1−0.8B)Xt =εt, 11 2. Moving Average Processes Definition 2.1 Let εt be a white noise process. The moving average (MA) process of order q with parameter vector (ψ1,...,ψq) is given by q MA(q): Xt =ψ1εt−1 +...+ψqεt−q +εt =􏱜ψiεt−i, (2) i=0 Xt = = 􏰆ψ0B0 + ψ1B + ... + ψqBq􏰇εt = ψ(B)εt (3) q where ψ(B) = 􏰢 ψiBi. i=0 Examples: (i) An MA(1) process: Xt = 0.8εt−1 + εt. (ii) Another MA(1) process: Xt = −0.9εt−1 + εt. (iii) An MA(2) process: Xt = 0.7εt−1 + 0.4εt−2 + εt. See Figure 5 for plots of two sets of data simulated from this process. (iv) Another MA(2) process: Xt = 0.6εt−1 − 0.3εt−2 + εt. 2.1. Properties of MA Processes The MA process defined above does not involve εi with i > t, i.e. does not involve information about the future. A process which does not involve future information is called causal.
An MA model can also be defined with coefficients ψi ̸= 0 for i < 0 (see, e.g., the MA(∞) process given later). In this case the process is not causal. A finite order MA process is always stationary. We can obtain the autocovariances γ(k) and the autocorrelations ρ(k) of an MA process easily by using the fact Var(εt) = σε2, Cov(εt,εt±k) = 0 for all t and k ̸= 0. For example, for the 12 whereψq ̸=0andψ0 =1. Remarks: • E(Xt) = 0 for an MA(q) process • Using the backshift operator B we have ψ0B0εt + ψ1Bεt + ... + ψqBqεt A simulated time series following a MA(2) with psi1 = 0.7 and psi2 = 0.4 0 100 200 300 400 500 Time The second simulated time series following the same model 0 100 200 300 400 500 Figure 5: Two simulated realisations following the MA(2) Model (iii) given above. MA(1) process Xt = 0.8εt−1 + εt we have Var(Xt ) = γ(±1) = γ(±k) = ρ(0) = ρ(±1) = ρ(±k) = Var(0.8εt−1 + εt) = 0.82Var(εt) + Var(εt) = 1.64σε2, Cov(Xt, Xt+1) = Cov(0.8εt−1 + εt, 0.8εt + εt+1) = 0.8σε2, 0for|k|>1,andhence
1,
0.8σε2
1.64σε2 = 0.488, and
0for|k|>1.
ε ii+k i=0
 0,
 γ(−k),
13
k > q, k < 0. (4) Time Let γε(k) = Cov(εt, εt+k). Then γε(0) = σε2 and γε(k) = 0 for k ̸= 0. For an MA(q) process we obtain a) The autocovariances: γ(k) = = qq 􏱜􏱜ψiψjγε(k+i−j) i=0 j=0 􏰗q−k 􏰘   σ 2 􏰢 ψ ψ , 0 ≤ k ≤ q , Xt Xt −3 0 2 −4 0 2 b) In particular, Var(Xt) = σ2 = σε2 c) The autocorrelations: ρ(k) = 􏰗q􏰘 􏰢 ψi2 . i=0  􏰗q−k  􏰘􏰈􏰗 q 􏰘 􏰢 􏰢2  ψψ ψ,0≤k≤q, i i+k i i=0  0,  ρ(−k), i=0 (5) k > q,
k < 0. We see that γ(k) and ρ(k) of an MA(q) process are zero for |k| > q. Examples:
1. For the MA(1) process Xt = −0.9εt−1 + εt we have
a) γ(0) = Var(Xt) = (ψ02 + ψ12)σε2 = 1.81σε2,
b) γ(±1) = ψ1σε2 = −0.9σε2, ρ(±1) = −0.497 and c) γ(±k) = ρ(±k) = 0 for |k| > 1.
2. For the MA(2) process Xt = 0.7εt−1 + 0.4εt−2 + εt we have a) γ(0) = Var(Xt) = (1 + ψ12 + ψ2)σε2 = 1.65σε2,
b) γ(±1) = (ψ0ψ1 + ψ1ψ2)σε2 = 0.98σε2, ρ(±1) = 0.594, c) γ(±2) = ψ0ψ2σε2 = 0.4σε2, ρ(±2) = 0.242 and
d) γ(±k) = ρ(±k) = 0 for |k| > 2.
14

Empirical ACF of second simulated series Theoretical ACF of MA(2)
02468 02468
Lag Lag
Figure 6: The ACFs (estimated and theoretical) for the MA(2) process. The estimated ACF is based on the second realisation shown in Figure 5.
3. Autoregressive Processes
Autoregressive (AR) processes are another important class of time series models that are widely used to analyse data.
Definition 3.1 A pth order autoregressive process AR(p) is defined by
Xt = φ1Xt−1 + … + φpXt−p + εt, (6)
where φp ̸= 0. Equivalently, we have
p
εt =Xt −􏱜φiXt−i, (7)
i=1
Observe that these processes are closely related to regression analysis, with regressors given
by shifted (past) instances of the same time series. Setting φ0 = 1, we have
εt = φ0B0Xt − φ1BXt − … − φpBpXt = φ(B)Xt, (8) 15
ACF
0.0 0.2 0.4 0.6 0.8 1.0
ACF
0.0 0.2 0.4 0.6 0.8 1.0

p
where φ(B) = 1 − 􏰢 φiBi.
i=1
Examples: Given a WN process εt, we can define
1. An AR(1) process: Xt = 0.8Xt−1 + εt,
2. An AR(2) process: Xt = 1.1Xt−1 − 0.3Xt−2 + εt.
Replacing B in φ(B) with a variable z we obtain a pth order polynomial φ(z).
Definition3.2 Thepolynomialφ(x)iscalledthecharacteristicpolynomialofanAR(p)model. Similarly, ψ(z) is the characteristic polynomial of an MA(q) model.
Setting φ(z) = 0 we obtain the characteristic equation of an AR(p) model: φ(z)=1−φ1z−…−φpzp =0. (9)
This equation has exactly p roots z1, …, zp (possibly multiple or complex).
Note that φ(z) and {z1, …, zp} determine each other. Hence, the correlation structure and
some other important properties of Xt are determined by z1, …, zp.
To show this we introduce first the important concept of the unit circle in time series analysis. The unit circle in the complex plane is the set of all complex numbers with norm one:
z = a + ib such that |z| = 1, √√
where i = −1 denotes the imaginary unit and |z| = a2 + b2 is the Euclidean norm of z. We have seen that an MA process with summable coefficients are stationary. But this is not
true for an AR model. For instance, Xt = Xt−1 + εt, the random walk, is non-stationary. Theorem 3.3 An AR(p) process Xt is causal and stationary, iff (if and only if) all of the roots
of φ(z) lie outside the unit circle, i.e., iff
|zk| > 1, ∀1 ≤ k ≤ p.
Proof. See, for example, Theorem 3.1.1 of Brockwell and Davis (1991). ⋄
These results are based on the assumption that the process starts at t = −∞. For an AR process starting at t = 0 or t = 1, these results only hold asymptotically, i.e., Xt will converge (almost surely) to another X ̃t as t → ∞, where X ̃t is defined following the same AR model but starting at t = −∞.
If we can find a factorised version of φ(z), then it is easy to check whether the process is causal stationary. For the AR(3) process
Xt = 1.8Xt−1 − 1.07Xt−2 + 0.21Xt−3 + εt
16

we have with
φ(z) = (1 − 0.5z)(1 − 0.6z)(1 − 0.7z)
z1 =2,z2 =10/6andz3 =10/7. These are all outside the unit circle and X is therefore stationary.
If the conditions of Theorem 3.3 are fulfilled, we have

Xt = φ−1(B)εt = 􏱜 ψiεt−i, (10)
i=0 which is a causal stationary MA(∞) process with

􏱜|ψi| < ∞. i=0 Generally, ψi ̸= 0 for all i, because the reciprocal φ−1(z) of a finite order polynomial φ(z) is an infinite order polynomial. We study MA(∞) process in the following chapter. 3.1. AR(1) Processes The simplest AR process is the AR(1) process Xt = φXt−1 + εt. (11) For this process the condition of Theorem 3.3 reduces to |φ| < 1, and we have seen that ψi = φi and We obtain ∞ Xt =􏱜φiεt−i. (12) i=0 ∞∞ 􏱜ψi =􏱜φi =1/(1−φ)<∞. i=0 i=0 The ACF of an AR(1) model can be calculated using its MA(∞) representation. γ(k) = E(XtXt+k) 􏱚􏰂 ∞ 􏰃􏰂 ∞ 􏰃􏲥 =E􏱜φiε 􏱜φjε = σ ε2 φ | k | φ 2 i = σ ε2 φ|k| , t−i i=0 t+k−j j=0 􏱜∞ i=0 1−φ2 17 and ρ(k) = γ(k)/γ(0) = φ|k|, for k = 0,±1,±2,... . Note that γ(k) ̸= 0 for any k. For φ > 0, ρ(k) are always positive and
decrease monotonically. For φ < 0, |ρ(k)| decrease monotonically but with alternating signs. These results can also be obtained recursively. Note that E(Xt) = 0, γ(k) = E[XtXt+k] and E[Xt−kεt] = 0 for Multiply both sides of (11) with Xt−k and take expectations: k ̸= 0 For k = 0 For k = 1 For k ≥ 2 E(XtXt) γ(0) E(Xt−1Xt) γ(1) E(Xt−kXt) = E(Xt−kφXt−1) + E(Xt−kεt) γ(k) = φγ(k − 1) = = E(XtφXt−1) + E(Xtεt) φγ(1) + σε2. (13) (14) = = φγ(0) E(Xt−1φXt−1) + E(Xt−1εt) (15) By solving this system of equations system we obtain the same results as given above, i.e., 1 φ φ2 γ(0)= 1−φ2σε2, γ(1)= 1−φ2σε2, γ(2)= 1−φ2σε2,..., ρ(0) ≡ 1, ρ(1) = φ, ρ(2) = φ2,... ρ(k)= γ(k) = φγ(k−1) =φρ(k−1)=φk γ(0) γ(0) Example ρ(k) for AR(1) processes with φ = 0.8 and φ = −0.8, respectively. Figure 7 displays two realisations following each of these two AR(1) models with i.i.d. N (0, 1) εt . The realisations following AR models with φ > 0 and φ < 0 look quite different. The former is dominated by positive and the latter by negative correlations. 18 􏱘σε2 for k=0 AR(1) with phi = 0.8 0 20 40 60 80 100 Time AR(1) with phi = −0.8 0 20 40 60 80 100 Time AR(1) with phi = 0.8 0 2 4 6 8 Lag AR(1) with phi = −0.8 0 2 4 6 8 Lag Figure 7: Two AR(1) processes and their autocorrelations 3.2. AR(2) Processes Some properties of the more complex AR(2) processes will be discussed in this section. For a given AR(2) model we have to first check whether it is causal stationary. In some special cases this can be done by means of a factorisation of φ(z). Examples: 1) For the AR(2) process Xt = 1.1Xt−1 − 0.3Xt−2 + εt we have φ(z)=1−1.1z+0.3z2 =(1−0.5z)(1−0.6z) with the roots z1 = 2 and z2 = 1 2 . This process is causal stationary. 3 2) For the AR(2) process Xt = −1.5Xt−1 + Xt−2 + εt we have φ(z)=1+1.5z−z2 =(1−0.5z)(1+2z) with the roots z1 = 2 and z2 = − 1 . This process is not causal stationary (however, there exist 2 a non-causal stationary solution of Xt). Usually we have to check whether the conditions of Theorem 3.3 are fulfilled or not. For an AR(2) model Xt = φ1Xt−1 + φ2Xt−2 + εt 19 Xt Xt −3 −1 1 2 3 −4 −2 0 2 4 ACF ACF −0.5 0.0 0.5 1.0 0.2 0.4 0.6 0.8 1.0 these conditions are equivalent to all of the following simple conditions on the coefficients holding: (i) φ1 + φ2 < 1, (ii) φ2 −φ1 <1, (iii) −1<φ2 <1. See Appendix A for a proof of this. Using these conditions it is easy to check whether an AR(2) model is causal stationary or not. An AR(2) model is stationary if conditions (i)–(iii) all hold. It is not stationary if one (or more) of these conditions does not hold. Examples: 1) Xt = 0.556Xt−1 + 0.322Xt−2 + εt is causal stationary, because φ1 +φ2 =0.878<1,φ2 −φ1 =−0.234<1and−1<φ2 =0.322<1. 2) Xt = 0.424Xt−1 − 1.122Xt−2 + εt is not causal stationary, because φ2 = −1.122 < −1. The above stationary conditions define a triangle in the φ1-φ2 plane, which can be divided into four regions (see Figure 8): RegionA φ1 >0andφ21 +4φ2 >0(tworealrootsz1 andz2)
RegionB φ1 <0andφ21 +4φ2 >0(tworealrootsz1 andz2)
Region C φ1 < 0 and φ21 + 4φ2 < 0 (two conjugate complex roots z1 and z2) Region D φ1 > 0 and φ21 + 4φ2 < 0 (two conjugate complex roots z1 and z2) The ACFs of an AR(2) process with coefficients in different regions have different functional forms. The ACF of an AR(2) model We have seen explicit formulas for the ACFs of MA processes and AR(1) processes. The ACF of an AR(p) process with p > 1 has to be calculated in a recursive way.
Consider an AR(2) process
Xt − φ1Xt−1 − φ2Xt−2 = εt
Multiply both sides with Xt−k and take expectations. We obtain, for k = 0, 1 and 2 respec-
tively,
γ (0) −φ1 γ (1) −φ2 γ (2) = σε2 , γ (0) −φ1 γ (1) −φ2 γ (2) = σε2 ,
γ(1) −φ1γ(0) −φ2γ(1) = 0, or −φ1γ(0) +(1−φ2)γ(1) +0γ(2) = 0, (16) γ(2) −φ1γ(1) −φ2γ(0) = 0, −φ2γ(0) −φ1γ(1) +γ(2) = 0.
20

The four regions of the AR(2) parameters
BA
CD
−2 −1 0 1 2
phi1
Figure 8: AR(2) stationarity triangle
Solving these equations, we have
γ(0) = γ(1) = γ(2) =
(1 − φ2)σε2
(1 + φ2)[(1 − φ2)2 − φ21],
φ 1 σ ε2
(1 + φ2)[(1 − φ2)2 − φ21], (17)
[φ21 +φ2(1−φ2)]σε2
(1 + φ2)[(1 − φ2)2 − φ21],
and, for k ≥ 2,
This recursive formula and the initial solutions given in (17) allow us to calculate the autoco-
γ(k) = φ1γ(k − 1) + φ2γ(k − 2). (18) variances γ(k) of an AR(2) process for any finite k.
This idea can be generalized to a common AR(p) model, for which p + 1 initial values have to be solved.
For ρ(k) we have ρ(0) ≡ 1. For k = 1, dividing the second equation of (16) by γ(0), we have ρ(1) − φ1 − φ2ρ(1) = 0. (19)
Hence
ρ(1)= φ1 . (20) 1−φ2
21
phi2
−1.0 −0.5 0.0 0.5 1.0

Analogously, we obtain recursive formulas for ρ(k) for k ≥ 2:
ρ(k) = φ1ρ(k − 1) + φ2ρ(k − 2). (21)
We see, for an AR(2) model, the calculation of ρ(k) is a little bit easier than that of γ(k). Example.Calculateρ(k),k=0,1,…,50,oftheAR(2)processXt =0.5Xt−1+0.4Xt−2+εt.
Solution: Following the above formulas we have
ρ(0) = 1, ρ(1) = φ1/(1 − φ2) = 0.833, ρ(2) = φ1ρ(1) + φ2ρ(0) = 0.817,
ρ(3) = φ1ρ(2) + φ2ρ(1) = 0.742, ρ(4) = 0.698, ρ(5) = 0.645, ρ(6) = 0.602, … , ρ(48) = 0.029, ρ(49) = 0.027 and ρ(50) = 0.025.
Similarly, we can obtain recursive formulas for computing the coefficients αi in the MA(∞) representation of an AR process given in (10). But this will not be discussed further here. What we need to know are only some properties of αi, such as absolute summability.
The ACF for the four different regions (see Figure 9):
1. Xt =0.5Xt−1 +0.4Xt−2 +εt withφ1 andφ2 inareaA; 2. Xt = −0.5Xt−1 + 0.4Xt−2 + εt with φ1 and φ2 in area B; 3. Xt = −1.8Xt−1 − 0.9Xt−2 + εt with φ1 and φ2 in area C; 4. Xt =1.8Xt−1 −0.9Xt−2 +εt withφ1 andφ2 inareaD.
We see that the ACF of an AR(2) process with coefficients in region A looks like that of an AR(1) with positive coefficient and that of an AR(2) process with coefficients in region B looks like that of an AR(1) with negative coefficient.
If the roots of φ(z) are complex, then ρ(k) appears to be like damped sine waves. If the coefficients are in region C, the sign of ρ(k) changes quite frequently. If the coefficients are in region D, the sign of ρ(k) keeps the same in a half period, as by a sine function.
See Figure 10 for plots of the corresponding time series data.
3.3. Partial Autocorrelations
If we want to fit an AR(p) model to our data, we have to choose a proper order of the model. By means of the ACF, it is not easy to determine which p should be used. The partial auto- correlation function (PACF) was traditionally introduced as a tool for selecting the order of an AR model, because, whereas an AR(p) model has an ACF which is infinite in extent, its partial autocorrelations are only non-zero until lag k = p, similarly to the autocorrelations of an MA model.
The partial autocorrelation of a stationary process at lag k, denoted by α(k) is the corre- lation between Xt and Xt−k conditionally on {Xt−1, …, Xt−k+1}. In other words, the par-
22

Region B
0 5 10 15 20 25 30
Lag
Region C
0 5 10 15 20 25 30
Lag
Region A
0 5 10 15 20 25 30
Lag
Region D
0 5 10 15 20 25 30
Lag
Figure 9: Autocorrelations for four different AR(2) processes
Region B Region A
0 50 100 150 0 50 100 150
Time Time
Region C Region D
0 50 100 150 0 50 100 150
Time Time
Figure 10: Simulations of four different AR(2) processes
23
C$Xt
B$Xt
ACF
ACF
−15 −5 5 15
−4 −2 0 1 2 3
−1.0
0.0 0.5 1.0
−0.5 0.0 0.5 1.0
D$Xt
A$Xt
ACF
ACF
−20 −10 0 5 15
−4 −2 0 2 4
−0.5 0.0 0.5 1.0
0.2 0.6 1.0

tial autocorrelation α(k) is the correlation between the residuals of the regression of Xt on {Xt−1, …, Xt−k+1} and the residuals of the regression of Xt−k on {Xt−1, …, Xt−k+1}.
Assume that we are given a stationary process with autocorrelations ρ(k), k = …, −1, 0, 1, … such that ρ(k) → 0 as k → ∞. The formulas for its PACF are very complex. However, it is easy to calculate the sample PACFs from your data using software, e.g., R.
The PACF for an AR(2) process:
1. Define α(0) = 1 and α(1) = ρ(1) (no observations coming in between Xt and Xt−1). 2. Fork=2wehave
􏰞􏰞 1 ρ(1)􏰞􏰞 􏰞􏰞 ρ(1) ρ(2) 􏰞􏰞
α(2)= 􏰞 1 ρ(1)􏰞 􏰞􏰞
􏰞􏰞ρ(1) 1 􏰞􏰞
ρ(2) − ρ2(1) = 1−ρ2(1) ,
For an AR(p) process it can be shown that α(p) = φp and α(k) = 0 for all k > p. The result α(k) = 0 for all k > p is due to the fact that an AR(p) model is a pth order Markov process. For k > p, the influence of Xt−k on Xt is totally included in Xt−1, …, Xt−p.
Example1:FortheAR(1)processXt =φXt−1+εt,|φ|<1wehaveα(0)=1,α(1)=φ and α(k) = 0 for k > 1.
Example 2: For a causal AR(2) process Xt = φ1Xt−1 + φ2Xt−2 + εt we have α(0) = 1, α(±1) = ρ1 = φ1/(1 − φ2), α(2) = φ2 and α(k) = 0 for k > 2 (following the above results). Note in particular that −1 < α(2) = φ2 < 1, because it is some kind of correlation coefficient. The PACFs of MA processes are very complex and are in general nonzero for all lags, like the ACF of AR models. For the simplest invertible MA(1) process Xt = ψεt−1 + εt with |ψ| < 1, it can be shown that, after lengthy calculation, α(k) = −(−ψ)k(1 − ψ2). (22) 1 − ψ2(k+1) See Box and Jenkins (1976). For ψ < 0, α(k) < 0 ∀ k ̸= 0. For ψ > 0, α(k) has alternating signs.
We see that the ACF of a MA(q) process is zero for lag |k| > q. And the ACF of an AR(p) process is nonzero for all lags. In contrast to this, the PACF of a MA(q) process is nonzero for all lags. And the PACF of an AR(p) process is zero for lag |k| > p.
24

4. The MA(∞) and AR(∞) Processes 4.1. The MA(∞) Processes
In time series analysis we often need to consider infinite order MA processes, denoted by MA(∞), which is an generalisation of the MA(q) processes described above. This will be motivated in the following by the simple AR(1) (first order autoregressive) process.
Example: The AR(1) process Xt = φXt−1 + εt with |φ| < 1 has an MA(∞) representation ∞ Xt =􏱜φiεt−i. (23) i=0 ∞ Proof: Xt = φXt−1 + εt = φ(φXt−2 + εt−1) + εt = ... = 􏰢 φiεt−i. ⋄ i=0 A general MA(∞) process is defined through a linear filter (MA-filter) of εt: ∞ Xt = 􏱜 ψiεt−i. (24) i=−∞ The MA(∞) process is causal if ψi = 0 ∀ i < 0. Observe that the MA(∞) representation of an AR(1) process is causal. Hereafter, we will mainly consider causal MA(∞) processes ∞ Xt =􏱜ψiεt−i. (25) i=0 Without loss of generality, we will often put ψ0 = 1. The MA(∞) process is stationary if ψi, i = 0, 1, ..., are squared summable, i.e. ∞ 􏱜ψi2 <∞. (26) i=0 Provided that (26) holds, by extending the results on the ACF of an MA(q) process, we have 􏰀∞􏰁 1. γ(0)=Var(Xt)= 􏰢ψi2 σε2 <∞; i=0 􏰀∞􏰁 2. γ(k) = Cov(Xt, Xt+k) = 􏰢 ψiψi+|k| σε2, k = 0, ±1, ... . i=0 􏰀∞ 3. ρ(k)= 􏰢ψiψi+|k| i=0 􏰁􏰈􏰀∞ 􏰁 􏰢ψi2 ,k=0,±1,.... i=0 25 Many practically relevant autoregressive processes admit the causal MA(∞) representations defined in (25). The above results can hence be used to calculate (or approximate) the ACFs of autoregressive processes. A stronger condition on the coefficients ψi of an MA(∞) process is absolute summability: It is easy to show that ∞ 􏱜 |ψi| < ∞. (27) i=0 ∞∞ 􏱜|ψi|<∞=⇒􏱜ψi2 <∞, i=0 i=0 but not vice versa. A series that converges faster than i−1/2 is squared summable, and that which converges faster than i−1 is absolutely summable (see examples below). Examples: Let α0 = 1. 1. ψi =i−1/2 fori=1,2,...=⇒􏰢ψi2 =∞,􏰢|ψi|=∞. i=0 i=0 ∞∞ 2. ψi =i−1 fori=1,2,...=⇒􏰢ψi2 <∞,􏰢|ψi|=∞. i=0 i=0 ∞∞ 3. ψi =i−3/2 fori=1,2,...=⇒􏰢ψi2 <∞,􏰢|ψi|<∞. i=0 i=0 Summability conditions play an important role in the theory of time series analysis, which can be seen from the following simple lemma. Lemma 4.1 For an MA(∞) process defined by (25) we have, and ∞∞ 􏱜 |γ(k)|<∞, if 􏱜|ψi|<∞ (28) k=−∞ i=0 ∞∞ 􏱜 γ2(k)<∞, if 􏱜ψi2 <∞. (29) k=−∞ i=0 ∞∞ This also holds for the general MA(∞) given in (24). The proof of Lemma 4.1 is left as an (optional) exercise. The coefficients in (23) are obviously absolutely summable, i.e. AR(1) with |φ| < 1 is sta- tionarity with absolutely summable γ(k). A stationary process with absolutely summable γ(k) is said to have short memory. This means that a stationary AR(1) process has short memory. 26 4.2. AR(∞) Processes The AR(∞) process is given by or, equivalently, ∞ Xt =􏱜ψiXt−i +εt (30) i=1 ∞ εt =􏱜ψiXt−i (31) i=0 with ψ0 = 1. Usually, it is assumed that 􏰢∞i=0 |ψi| < ∞. Example: An MA(1) process Xt = ψεt−1 + εt with |ψ| < 1 has the following AR(∞) representation: ∞ εt = 􏱜(−ψ)iXt−i (32) i=0 with absolutely summable coefficients φi = (−ψ)i, i = 0, 1, ... . Note that Xt are observable but εt are often unobservable. Hence, the above AR(∞) model of the innovations εt is very useful in theory and practice, because now it is possible to estimate εt from the data. The question is whether such an AR(∞) model is well defined for a given MA process. The answer is that this is only possible if the MA process is invertible. 4.3. Invertibility of MA(q) Processes Invertibility: Any process Xt is said to be invertible if it can be represented as an AR(∞) process with (absolutely) summable coefficients. Note that any AR process with summable coefficients is invertible. Example: The RW Xt = Xt−1 + εt is non-stationary but invertible. There are however some simple MA models which are not invertible. Example: The MA(1) process Xt = εt−1 + εt is stationary but not invertible. Hence, we need to discuss when an MA process is invertible. 
For this we have the following theorem, which is closely related to Theorem 3.3 on the causal stationarity of an AR process. The characteristic equation of an MA(q) model is ψ(z)=1+ψ1z+...+ψqzq =0. (33) Again, this equation has q roots z1, ..., zq. And ψ(z) and {z1, ..., zq} determine each other. Hence, the correlation structure of Xt is determined by z1, ..., zq. 27 Theorem 4.2 An MA(q) process Xt is invertible, iff all of the roots of ψ(z) lie outside the unit circle, i.e., iff |zi| > 1, ∀1 ≤ i ≤ q.
Proof. See Brockwell and Davis (1991). ⋄
The conditions |zi| > 1 in Theorem 4.2 imply that ψ−1(B) is well defined with non-negative powers and absolutely summable coefficients. Now we have

εt = ψ−1(B)Xt = 􏱜 ψiXt−i,
i=0

which is a causal stationary AR(∞) process with 􏰢 |φi| < ∞. i=0 We see there is a dual relationship between the MA and AR processes. • A causal invertible MA process can be represented as a causal stationary AR(∞). • A causal stationary AR process can be represented as a causal, stationary (and also invertible) MA(∞). Analogously to the causal stationary conditions for AR(1) and AR(2), we have the following invertible conditions for MA(1) and MA(2) processes: 1) An MA(1) process Xt = ψεt−1 + εt is invertible iff |ψ| < 1. 2) An MA(2) process Xt = ψ1εt−1 + ψ2εt−2 + εt is invertible iff all three of the following conditions hold: (i) ψ1 +ψ2 >−1;
(ii) ψ1 −ψ2 <1; (iii) −1<ψ2 <1. Examples: 1) Xt = 1.5εt−1 + 0.75εt−2 + εt is invertible, because ψ1 +ψ2 =2.25>−1,ψ1 −ψ2 =0.75<1,and−1<ψ2 =0.75<1. 2) Xt = 0.75εt−1 − 0.5εt−2 + εt is not invertible, because ψ1 − ψ2 = 1.25 > 1.
28

5. The ARMA and ARIMA Processes 5.1. ARMA Processes
Definition 5.1 An Autoregressive Moving Average process of order (p, q) (ARMA(p, q)) is defined by
Xt = φ1Xt−1 + … + φpXt−p + ψ1εt−1 + … + ψqεt−q + εt. (34) An ARMA model combines an AR and an MA models. Equivalently, (34) can be represented
in the following way:
Xt − φ1Xt−1 − … − φpXt−p = ψ1εt−1 + … + ψqεt−q + εt, (35)
That is,
where
φ(B)Xt = ψ(B)εt, (36) φ(z) = 1 − φ1z − … − φpzp
is the characteristic polynomial of the AR part and
ψ(z) = 1 + ψ1z + … + ψqzq the characteristic polynomial of the MA part.
An AR(p) model is an ARMA(p, 0) model with ψ(z) ≡ 1, and an MA(q) model is an ARMA(0, q) model with φ(z) ≡ 1 .
Examples: Given a WN process εt, we can define 1. An ARMA(1, 1) process:
Xt = 0.8Xt−1 + 0.6εt−1 + εt
2. An ARMA(2,2) process:
Xt = 0.7Xt−1 + 0.1Xt−2 + 0.8εt−1 + 0.16εt−2 + εt
The following theorem is one of the most important theorems in time series analysis. For an
ARMA(p, q) process we have
Theorem 5.2 Assume that φ(z) and ψ(z) have no common factors. Then the ARMA(p, q)
process is
a) causal (stationary) iff all roots of φ(z) lie outside the unit circle, b) invertible iff all roots of ψ(z) lie outside the unit circle,
29

c) causal (stationary) and invertible iff all roots of φ(z) and ψ(z) lie outside the unit circle.
Theorem 5.2 combines Theorems 3.3 and 4.2.
The causal stationarity and invertibility conditions of an ARMA model do not depend on each other.
By combining the conditions given above we can check whether an ARMA(p, q), for p = 0, 1, 2 and q = 0, 1, 2, process is causal stationary and/or invertible or not.
Remark: φ(z) and ψ(z) have a common factor if there exists a function f(z) such that φ(z) = f(z)φ ̃(z) and ψ(z) = f(z)ψ ̃(z)
If a common factor exists then φ ̃(z) and ψ ̃(z) instead of φ(z) and ψ(z) should be used in Theorem 5.2.
Examples:
1) Xt − 0.6Xt−1 − 0.3Xt−2 = 1.5εt−1 + 0.75εt−2 + εt is both causal stationary and invertible.
2) Xt − 0.6Xt−1 − 0.3Xt−2 = 0.75εt−1 − 0.5εt−2 + εt is causal stationary but not invertible.
3) Xt − 0.6Xt−1 − 0.5Xt−2 = 1.5εt−1 + 0.75εt−2 + εt is invertible but not causal stationary.
4) Xt − 0.6Xt−1 − 0.5Xt−2 = 0.75εt−1 − 0.5εt−2 + εt is neither causal stationary nor invertible.
Remark: Under the assumptions of Theorem 5.2 c), an ARMA(p, q) process has on the one
hand the MA(∞) representation

Xt =􏱜αiεt−i, (37)
i=0
where α(z) = φ(z)−1ψ(z) with 􏰢 |αi| < ∞, and on the other hand the AR(∞) representation ∞ εt =􏱜βiXt−i, (38) i=0 where β(z) = φ(z)ψ(z)−1, with 􏰢 |βi| < ∞. The fact that αi or βi are absolutely summable follows since the convolution of two absolutely summable sequences is absolutely summable. Under the assumptions of Theorem 5.2 c), the autocovariances of an ARMA(p,q) process are always absolutely summable, i.e., ∞ 􏱜 |γ(k)| < ∞. k=−∞ The mean of MA, AR or ARMA processes may be non-zero: 30 Definition 5.3 An ARMA(p, q) with mean μ is defined by Xt −μ=φ1(Xt−1 −μ)+...+φp(Xt−p −μ)+ψ1εt−1 +...+ψqεt−q +εt. (39) Note that, if the mean is known, we can simply assume that μ = 0 as before. If the mean is unknown, it is not difficult to estimate μ from the data and to remove it. Hence, ARMA processes with unknown mean have similar properties as those given above. The ACF We rewrite the ARMA(p, q) process as Xt − φ1Xt−1 − ... − φpXt−p = ψ1εt−1 + ... + ψqεt−q + εt. (40) Multiplying both sides of (40) by Xt−k and taking the expectations, we obtain γ(k)−φ1γ(k−1)−...−φpγ(k−p)=σε2 􏱜 ψjψj−k (41) k≤j≤q for 0 ≤ k < max(p, q + 1), and γ(k) − φ1γ(k − 1) − ... − φpγ(k − p) = 0 (42) for k ≥ max(p, q + 1). The recursive equation (42) is the same as for an AR(p). 5.2. ARIMA Processes The class of ARMA processes is the most important class of stationary time series. However, in practice, in particular in finance and insurance, most time series observed are non-stationary. One important reason for non-stationarity is the effect due to the integration of two stationary observations. Non-stationary processes in this sense are hence called integrated ones. In the sequel, the ARMA processes will be extended to well-known linear non-stationary integrated processes. Processes Xt whose dth differenced series Yt are ARMA processes, where d = 0, 1, ..., are called ARIMA (autoregressive integrated moving average) processes, denoted by ARIMA(p, d, q). Definition 5.4 (The ARIMA(p, d, q) process) If d is a non-negative integer, then {Xt} is said to be an ARIMA(p, d, q) process if Yt := (1 − B)dXt is a causal stationary ARMA process. This definition means that Xt satisfies the difference equation φ(B)(1 − B)dXt = ψ(B)εt or φ(B)Yt = ψ(B)εt with Yt = (1 − B)dXt, (43) 31 (a) ARIMA(2,1,0) 0 50 100 150 200 250 300 Time (c) ARIMA(2,1,0) 0 5 10 15 20 25 30 Lag (b) First Diff. 0 50 100 150 200 250 300 Time (d) First Diff. 0 5 10 15 20 25 30 Lag Figure 11: An ARIMA(2,1,0) process, its first difference, and their correlograms where d = 0, 1, ..., φ(z) and ψ(z) are characteristic polynomials of the AR and MA parts, respectively, and {εt} is white noise. If d = 0 we will simply have an ARMA process. The process is stationary if and only if d = 0. This means that an ARIMA model with d ≥ 1 is non-stationary. If a time series follows an ARIMA(p, 1, q) model, then the series of the first differences, i.e. Yt = ∆Xt = Xt − Xt−1, will follow an ARMA(p, q) model and is stationary. Similarly, if a time series follows an ARIMA(p, 2, q) model, then the second differences follow an ARMA(p, q) model and are stationary. Suppose we are given observations x1, x2, ..., xn of a time series Xt. The graph of the sample autocorrelations ρˆ(k), called the correlogram, can be empirically used to check whether Xt is non-stationary or stationary. In Figure 11, we have a simulation from the following ARIMA(2, 1, 0) model displayed in part (a): (1 − B)Xt = Yt with Yt = 0.4Yt−1 + 0.2Yt−2 + εt, where εt are i.i.d. N (0, 1) random variables and where t = 0, 1, ..., 300. 
The differenced series yt = xt − xt−1, t = 1, 2, ..., 300, is shown in part (b). Note that the original series {xi} is of length 301, but the differenced series {yi} is of length 300. One observation is lost by taking first order differences. 32 0.0 0.4 0.8 −20 0 10 20 ACF X ACF Y 0.0 0.4 0.8 −2 0123 The estimated ACFs (the correlograms) of xt and yt are shown respectively in part (c) and (d). The two dashed lines in a correlogram are the so-called ±2/√n confidence bounds, which will be explained later. Random Walk process We now consider the random walk process, which is a simple example of an ARIMA process that has applications in financial time series analysis. For practical and theoretical reason, a random walk is often assumed to start at the time point t = 0 with known X0 = x0. Without loss of generality it is often assumed X0 = 0. Definition 5.5 (Random Walk) A random walk is the stochastic process defined by 􏱘 X0, t = 0, Xt= Xt−1+εt, t>0, (44) where {εt} is a white noise with mean zero and variance σε2.
A random walk is indeed an ARIMA(0, 1, 0) model starting from t = 1. Obviously, we have t
Xt =􏱜εi +X0. (45) i=1
E(Xt) = X0 and Var(Xt) = tσε2, (not stationary), causal.
Due to the very big variance, a realisation of a random walk often shows a (stochastic) trend,
which is however purely random.
Two realisations of length n = 301 following the same random walk model with i.i.d. N (0, 1) innovations are shown in Figure 12 together with their sample ACFs.
A random walk is also invertible, because we have
εt = Xt − Xt−1, (46)
for t > 0, where εt are i.i.d.
A time series in practice may also have a non-stochastic trend together with a stochastic one. If there is a simple linear trend in a random walk, then it can be modelled by a random walk with drift defined by
􏱘 X0, t = 0,
Xt = Xt−1 +μ+εt, t>0, (47)
where {εt} is white noise with mean zero and variance σε2, and μ ̸= 0 is an unknown constant. Now, we have
t
Xt = tμ + 􏱜 εi + X0 (48)
i=1
with mean E(Xt) = tμ + X0, which forms a linear non-stochastic trend in such a time series.
Moreover, it holds that εt = Xt − Xt−1 − μ.
33

Random Walk
0 50 100 150 200 250 300
Time
Random Walk
0 50 100 150 200 250 300
Time
ACF of RW
0 5 10 15 20 25 30
Lag
ACF of RW
0 5 10 15 20 25 30
Lag
Figure 12: Two realisations of a random walk
34
Xt
Xt
−15 −10 −5 0
−10 0 5 15
ACF
ACF
0.0 0.4 0.8
0.0 0.4 0.8

6. Forecasting for Linear Processes 6.1. Box-Jenkins Forecasting
We want to forecast the value of a future value of the time series Xt+k given the observed values of the series xt,xt−1,…,x1, for some lead time k > 0.
There are many different approaches to forecasting. We concentrate on the Box-Jenkins method to forecasting, which is based on minimising mean square forecast error under fitted ARIMA models. For simplicity, we present the method with ARMA models, the extension to ARIMA models is conceptionally straightforward but requires more calculations.
Consider the MA(∞) representation of an ARMA(p, q) model ∞
Xt = 􏱜ψjεt−j j=0
with ψ0 = 1
So
We focus on linear forecasting techniques that construct estimates of Xt+k by taking a linear
combination of xt, xt−1, xt−2, . . . or, equivalently, of εt, εt−1, εt−2, . . . (of course, the values of
ε ,ε ,… are unknown in practice, so estimates εˆ,εˆ ,… are used instead). Again, the ex- t t−1 t t−1
tension to non-linear forecasts is conceptually simple, but computationally more demanding.
More precisely, we formulate the forecast of Xt+k as ∞
Xt(k)=􏱜ajεt+k−j =akεt +ak+1εt−1 +… j=k
and use the mean square error (MSE) of the forecast E[{Xt+k − Xt(k)}2]
as measure of forecasting accuracy.
We aim to find the coefficients {aj} that produce the forecast Xt(k) with optimal (smallest)
forecasting MSE. By substituting Xt(k) in E[{Xt+k − Xt(k)}2] we obtain
MSE =E[{(εt+k +ψ1εt+k−1 +ψ2εt+k−2 +…)−(akεt +ak+1εt−1 +…)}2],
which is equivalent to
E[{εt+k +ψ1εt+k−1 +···+ψk−1εt+1 +(ψk −ak)εt +(ψk+1 −ak+1)εt−1 +…}2] 􏱚k−1 ∞ 􏲥

􏱜 ψj εt+k−j
Xt+k =
= εt+k +ψ1εt+k−1 +ψ2εt+k−2 +…+ψk−1εt+1 +ψkεt +ψk+1εt−1 +…
j=0
35
= 􏱜ψj2+􏱜(ψj−aj)2 σε2 j=0 j=k

This is a quadratic function w.r.t. the coefficients {aj } that can be minimised analytically. The optimalcoefficientsaregivenbyaj =ψj j=k,k+1,….
We conclude that the minimum MSE forecast is:
∞ Xt(k)=ψkεt +ψk+1εt−1 +···=􏱜ψjεt+k−j
j=k
which corresponds to the conditional expectation of Xt+k given the observed data; that is, the
optimal forecast Xt(k) = E(Xt+k|Xt = xt, Xt−1 = xt−1, . . .). For AR(p) processes Xt =φ1Xt−1 +φ2Xt−2 +…+φpXt−p +εt
which suggests the more natural and equivalent representation
Xt(k) = φ1Xt(k − 1) + φ2Xt(k − 2) + . . . + φpXt(k − p)
that is constructed recursively from Xt(1) = φ1xt + φ2xt−1 + . . . + φpxt−p+1, then Xt(1) = φ1Xt(1) + φ2xt + . . . + φpxt−p+2, which does not explicitly involve estimates of the unob- served process {εt}. More precisely, for an AR(p), the point prediction for Xn+k and given observations x1, …, xn is
xˆn+k =φ1(xˆn+k−1|xn,…,x1)+···+φp(xˆn+k−p|xn,…,x1), (49) where (xˆn+k−i|xn, …, x1), i = 1, …, p, are either the observations xn+k−i, if k − i ≤ 0 or the
prediction xˆn+k−i obtained before, if k − i > 0. The prediction procedure is as follows:
1. The first step optimal linear prediction for k = 1 is
xˆn+1 = φ1xn + … + φpxn+1−p.
2. The second step optimal linear prediction for k = 2 is
xˆn+2 = φ1xˆn+1 + φ2xn + … + φpxn+2−p.
3. The k-step optimal linear prediction for k > p is
xˆn+k = φ1xˆn+k−1 + … + φpxn+k−p.
Remark: Point predictions for a given an AR(p) model depend only on the last p observations (see examples below).
Remark: The prediction Xˆn+k tends to the process mean as k increases. This indicates that the process has finite memory and that the information from observations from the past decreases as we move into the future.
36

k 1 2 3 4 5 6 7 8 9 10 xˆn+k -0.840 -0.588 -0.412 -0.288 -0.202 -0.141 -0.099 -0.069 -0.048 -0.034
Table 1: The first 10 point predictions for our AR(1) model with xn = −1.2
k 1 2 3 4 5 6 7 8 9 10
xˆn+k 0.630 0.528 0.506 0.462 0.429 0.396 0.366 0.338 0.313 0.289 Table 2: The first 10 point predictions for our AR(2) model with xn−1 = 1.1, xn−1 = 0.5.
Example: From a time series x1, …, xn we obtained the following AR(1) model: Xt = 0.7Xt−1 + εt,
where εt are iid WN with mean zero. The last observation is xn = −1.2. Calculate the first 10 point predictions.
Solution: xˆn+1 = φxn = 0.7 ∗ (−1.2) = −0.84, xˆn+k = φxˆn+k−1 = φkxn for k ≥ 2. All predictions are listed in Table 1.
Example: From a time series x1, …, xn we obtained the following AR(2) model: Xt = 0.6Xt−1 + 0.3Xt−2 + εt,
where εt are iid WN with mean zero. The last two observation are xn−1 = 1.1 and xn = 0.5. Calculate the first 10 point predictions.
Solution: xˆn+1 = φ1xn + φ2xn−1 = 0.6 ∗ 0.5 + 0.3 ∗ 1.1 = 0.630, xˆn+2 = φ1xˆn+1 + φ2xn = 0.6 ∗ 630 + 0.3 ∗ 0.5 = 0.528, xˆn+k = φ1xˆn+k−1 + φ2xˆn+k−2 for k ≥ 3. All predictions are listed in Table 2.
If estimates of {εt} are required, e.g., for a general ARMA(p,q) model, we can adopt an AR(∞) representation, or construct estimates εˆ recursively as follows:
εˆ 1 = x 1 ,
εˆ 2 = x 2 − φ 1 x 1 − ψ 1 εˆ 1 , …
εˆ=x−φx …−φx −ψεˆ …−ψεˆ , t t 1 t−1 p t−p 1 t−1 q t−q
byassumingthatεt =Xt =0fort<0. These estimates can then be used in Xt(k) to produce forecasts. 37 t 6.2. Forecasting intervals and error Many time series analysis applications require quantifying the uncertainty in the forecasts and reporting forecasting intervals. Again, we focus on ARMA models and note that the generali- sation to other linear models is conceptually straightforward but requires more calculations. Recall that a causal stationary ARMA(p, q) process can be represented in MA(∞) form: ∞ Xt =􏱜ψiεi. i=0 In a manner akin to the previous section where we studied point prediction of Xˆn+k based on the conditional mean of a future observation Xn+k, here we analyse the uncertainty in our forecasts by analysing the conditional variance of Xn+k given x1, ..., xn: k−1 Var(Xn+k|Xn, ..., X1) = σε2 􏱜 ψi2. i=0 􏰀∞􏰁 Observe that ψ0 = 1 and Var(Xn+k) = σε2 􏰢 ψi2 . We have i=0 1. Var(Xn+k|Xn, ..., X1) increases as k increases. 2. Var(Xn+1|Xn,...,X1)=σε2. 3. σε2 ≤ Var(Xn+k|Xn, ..., X1) ≤ σε2 􏰢 ψi2 = Var(Xn+k) for k ≥ 2. i=0 ∞ 4. lim Var(Xn+k |Xn , ..., X1 ) = σε2 􏰢 ψi2 = Var(Xn+k ). k→∞ i=0 The approximate 95% forecasting interval (FI) for Xn+k is  􏲨􏲧 k − 1 􏲨􏲧 k − 1 ∞  n+kn+kε in+kε i X ∈ x ˆ − 2 σ 􏲧􏲦 􏱜 ψ 2 , x ˆ + 2 σ 􏲧􏲦 􏱜 ψ 2 i=0 i=0 . Remark: The FI here is for one observation, which is not the same as the FI for the sample mean x ̄. Note that lim Xˆn+k = 0 = μX and lim Var(Xn+k|Xn, ..., X1) = Var(Xn+k). This k→∞ k→∞ means that the larger k is, the less information about a future observation Xn+k is contained in the past observations. Example: Figure 13 shows a realisation of length 150 following the AR(1) model Xt = 0.8Xt−1 + εt, where εt are i.i.d. N(0,1) random variables. Note that we have now ψi = 0.8i, i = 0,1,... The first 100 observations are shown in solid lines, whereas the last 50 (see the first row of the table) in points, which are assumed to be unknown future values. 38 An example for interval forecasting by an AR(1) model 0 50 100 150 Figure 13: A realisation of an AR(1) model and the corresponding forecast intervals k 1 2 3 4 5 6 7 8 9 10 xn+k 1.403 3.032 1.687 2.337 2.509 1.888 3.956 4.148 2.917 xˆn+k 2.373 1.898 1.519 1.215 0.972 0.778 0.622 0.498 0.398 lowerbound 0.373 -0.663 -1.345 -1.826 -2.177 -2.439 -2.637 -2.788 -2.905 upperbound 4.373 4.460 4.382 4.256 4.121 3.994 3.881 3.784 3.701 Table 3: The first 10 point predictions and the corresponding 95%-forecasting intervals The point predictions for n = 100 and k = 1, 2, ..., 50, and the 95%-forecasting intervals are shown in long- resp. short-dashed lines, where x100 = 2.966. The first 10 point predictions and the corresponding forecasting intervals are listed in Table 3. To analyse the forecast error, we note that Xt(k) coincides with the MA(∞) representation of Xt+k with the ε’s between t + 1 and t + k set to zero, as this is their expected value. Consequently, Xt+k = (εt+k + ψ1εt+k−1 + ψ2εt+k−2 + . . . + ψk−1εt+1) + (ψkεt + ψk+1εt−1 + . . .) 1.375 0.318 -2.996 3.633 = k−1 􏱜ψjεt+k−j j=0 􏰉 􏰊􏰋 􏰌 ∞ + 􏱜ψjεt+k−j (*) j=k 􏰉 􏰊􏰋 􏰌 ↓↓ the forecast error the forecast Rt(k) Xt(k) Rt(1) = εt+1, i.e., the one-step-ahead forecast error is simply the next noise term. Hence εt+1 = Xt+1 − Xt(1). 39 -2 0 2 4 Note this useful fact! Rt(2) = εt+2 + ψ1εt+1 Rt(3) = εt+3 + ψ1εt+2 + ψ2εt+1 etc. E[Rt(k)] = 0, so the forecast is unbiased. The variance of the forecast error is j=0 j=0 Xt(k) ± 2 s.e.(k) (the 95% probability is calculated under the assumption that the coefficients {φj } are perfectly known, which is not the case in practice). 
􏰂k−1 􏰃 V (k) = Var(Rt(k)) = 􏱜 ψj2 σε2. This gives the standard error of the forecast (the “standard error of prediction”) as 􏰂k−1 􏰃1/2 s.e.(k) = 􏰜V (k) = 􏱜 ψj2 σ􏲢ε Assuming normality of the ε’s, we can derive forecast intervals that contain Xt+k with 95% probability Notice that: 1. V(1) = σε2 V (2) = (1 + ψ12)σε2 V(3) = (1+ψ12 +ψ2)σε2 etc. Clearly V (k) increases with k: this matches our intuition. 2. For stationary ARMA processes 􏰂∞􏰃 V(k)→ 􏱜ψj2 σε2 =σX2 ask→∞. j=0 Soforlargek,V(k)≈σX2 . 3. The above results can be generalised to non-stationary processes i.e. ARIMA(p, d, q) with d = 1,2,.... In that case, the coefficients ψj’s do not tend to zero, and so V(k) diverges as k increases. 4. One-step-ahead forecast errors are independent (being single ε terms). However for other lead times, the errors are correlated (generally positively). The forecast errors tend to be of the same sign and so the forecasts tend to be all too high or all too low. 40 7. Estimation for Univariate Linear Processes We will now discuss the estimation of • the mean E(Xt) = μ, • the autocovariances γ(k) = E[(Xt − μ)(Xt+k − μ)], and • the parameters, under the assumption that Xt are an ARIMA(p, d, q) process. The data will be one realisation x1, ..., xn of Xt, which will be called a time series. Furthermore, the selection of the unknown model and the application of the estimated model for forecasting will also be discussed. We will mainly consider the so-called large sample properties of an estimator, based on the assumption that we have a relatively long time series. 7.1. Estimation of μ The expected value μ = E(Xt) can be estimated by the sample mean 1 σ2 so that nX For an estimator θˆ of an unknown parameter θ, the quantity 􏰆 􏰇􏰄􏰆􏰇􏰅2 ˆ2ˆˆ E [θ−θ] = E θ −θ +Var(θ) ˆ is called the mean squared error (MSE) of θ. ˆˆˆ ̄ 1􏱜n Xi. (50) IfXt =εt arei.i.d.withE(Xt)=μandVar(Xt)=σX2 ,wehaveE(X ̄)=μandVar(X ̄)= μˆ = X = n 􏰕 2􏰖 limE(μˆ−μ) =0. (51) n→∞ If MSE(θ)→ 0 as n → ∞, then θ is said to be consistent, denoted by θ → θ, as n → ∞. In the i.i.d. case X ̄ is consistent. 7.2. Properties of X ̄ If Xt is stationary, then X ̄ defined above is clearly unbiased, i.e., E(X ̄) = μ. Hence E[X ̄ − μ]2 =Var(X ̄). 41 i=1 In this section we will give some important properties of X ̄ as an estimator of μ. Proofs are given in Appendix B. Theorem 7.1 Let {Xt; t = 1, 2, ...} be a time series satisfying lim E(Xt) = μ, t→∞ lim Cov(X ̄,Xt) = 0, t→∞ where X ̄ is as defined before. Then lim E[(X ̄ − μ)2] = 0. t→∞ The conditions of Theorem 7.1 mean that E(Xt) is asymptotically a constant, and any single observation does not dominate the covariance. Note that stationarity is not required. Theorem 7.2 Let {Xt} be a stationary time series with mean μ and autocovariances γ(k) such that γ(k) → 0 as k → ∞. Then X ̄ is a consistent estimator of the mean μ. Example: For any causal stationary ARMA process Xt we have γ(k) → 0. Hence, X ̄ → μ as n → ∞. Example: Let Z be a Bernoulli random variable with distribution P(Z = 1) = P(Z) = 0 = 0.5. Define 􏱘 1, forZ=1, Xt =sign(Z−0.5)= −1, forZ=0, t=1,2,.... It is easy to show that {Xt} is a stationary process with zero mean and γ(k) ≡ 1 for all k, i.e. γ(k) ̸→ 0ask → ∞. ForthisprocesswehaveeitherX ̄ ≡ 1(forz = 1)orX ̄ ≡ −1(for z = 0). None of them is equal to or converges to μ = 0. Theorem 7.3 Assume that {Xt} is a stationary time series with absolutely summable autoco- variances γ(k). Then X ̄ → μ, as n → ∞. Furthermore, ∞ limnVar(X ̄)= 􏱜 γ(k). 
n→∞ k=−∞ The asymptotic variance of X ̄ is larger than that of the sample mean of i.i.d. random variables ∞∞ Yt with the same variance as Xt, if 􏰢 γ(k) > γ(0), and smaller if 􏰢 k=−∞ k=−∞
For a causal stationary ARMA(p, q) process, the above result reduces to ̄ σε2􏰂􏰢qj=0ψj􏰃2
42
γ(k) < γ(0). (52) Var(X)≈ n 1−􏰢p φ . i=1 i Example: Let x1, x2, ..., x400 be an observed time series following the theoretical model Xt −μ=0.5(Xt−1 −μ)+0.3(Xt−2 −μ)+εt, where εt are i.i.d. N(0,σε2) random variables. Calculate the asymptotic variance of the sam- ple mean x ̄. Furthermore, assume that yt, t = 1, 2, ..., 400, are i.i.d. random variables with Var(Yt) = Var(Xt) and unknown mean. Compare Var(y ̄) with the asymptotic variance of x ̄, where y ̄ is the sample mean of yt. Solution: We have .σε2 −2252 2 Var(x ̄) = 400(1 − 0.5 − 0.3) = 400σε = 0.0625σε . (1 + φ2)[(1 − φ2)2 − φ21] (1−0.3)σε2 0.7 2 2 The following theorem provides a CLT for a general linear process, which is a linear filter of an i.i.d. white noise εt with Var(εt) = σε2. Theorem 7.4 Let Xt be a causal stationary process with MA(∞) representation ∞ Xt =􏱜αjεt−j j=0 ∞∞ with 􏰢 |αj| < ∞, 􏰢 αj ̸= 0, and where the εt are i.i.d. random variables with E(εt) = 0 j=0 j=0 Using the formula for γ(0) of an AR(2) model, we obtain Var(Yt) = Var(Xt) = γ(0) = (1 − φ2)σε2 = (1 + 0.3)[(1 − 0.3)2 − 0.52] = 1.3 ∗ 0.24σε = 2.244σε . We have Var(y ̄) = Var(Yt)/400 = 0.0056σε2, i.e., asymptotically, Var(x ̄) > 10Var(y ̄).
and Var(εt) = σε2. Then where
√nX ̄ →D N(0,V),
∞ 􏰂∞􏰃2
V=􏱜γ(k)= 􏱜αj k=−∞ j=0
(53)
The sign →D means convergence in distribution.
Note that Theorem 7.4 holds for all ARMA processes.
We can give a confidence interval for μ of an ARMA model.
Since √nX ̄ tends to a normal distribution, X ̄ is called √n convergent.
43
σε2.

In the general case with E(Xt) = μ ̸= 0, we have
√n(X ̄−μ)→D N(0,V), (54)
where V is the same as in (53).

The assumption 􏰢 αj ̸= 0 is necessary. j=0

Example: LetXt = εt −εt−1. Nowwehave 􏰢αj = 1−1 = 0andhencetheresultsof
j=0
Theorem 7.4 do not hold for this Xt.
Example: Continue the example about the variances of x ̄ and y ̄. Assume that there σε2 = 1, we have the standard deviations of x ̄ and y ̄ are about 0.25 and 0.075, respectively. For simplicity, we can use the standard normal quantile Z0.025 = 1.96 =. 2 to calculate the approximate 95% confidence interval. Now an approximate 95% confidence interval, e.g. for μX, is simply x ̄ ± 2 × SDX ̄ . Assume that we obtained x ̄ = 10.5 and y ̄ = 15.15 from the data. Then the approximate 95% confidence intervals are μX ∈ [10, 11] and μY ∈ [15, 15.30]. The length of the former is more than three times of that of the latter.
Following the CLT √n(X̄ − μ) →D N(0, V) we have, asymptotically,
(X̄ − μ) / √(V/n) ∼ N(0, 1),
where √(V/n) is the asymptotic standard deviation of X̄. For an ARMA model we have simply
V = σε² (Σ_{j=0}^{q} ψj)² / (1 − Σ_{i=1}^{p} φi)².
This means, for any (upper) normal quantile Zα/2 we have, with approximately (1 − α) coverage probability,
−Zα/2 √(V/n) ≤ X̄ − μ ≤ Zα/2 √(V/n),
or equivalently,
μ ∈ [X̄ − Zα/2 √(V/n), X̄ + Zα/2 √(V/n)].
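For a fitted model this interval is easy to compute. The sketch below is illustrative R code (the function name arma_mean_ci and the numerical inputs are not from the notes); it uses the ARMA expression for V given above.

## Approximate (1 - alpha) confidence interval for mu of a causal ARMA(p, q):
## V = sigma_eps^2 * (1 + psi_1 + ... + psi_q)^2 / (1 - phi_1 - ... - phi_p)^2.
arma_mean_ci <- function(xbar, n, phi = numeric(0), psi = numeric(0),
                         sigma2_eps, level = 0.95) {
  V <- sigma2_eps * ((1 + sum(psi)) / (1 - sum(phi)))^2
  z <- qnorm(1 - (1 - level) / 2)
  xbar + c(-1, 1) * z * sqrt(V / n)
}

## AR(2) example: phi = (0.5, 0.3), sigma_eps^2 = 1, n = 400, xbar = 10.5
arma_mean_ci(xbar = 10.5, n = 400, phi = c(0.5, 0.3), sigma2_eps = 1)
## gives approximately [10.01, 10.99], i.e. the interval [10, 11] found above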
Example: Let x1, x2, …, x900 be an observed time series following the theoretical model Xt − μX = −0.8εt−1 + 0.3εt−2 + εt,
where εt are i.i.d. N (0, σε2 ) random variables with σε2 = 9. Furthermore, assume that yt , t = 1, 2, …, 900, are i.i.d. random variables with Var(Yt) = Var(Xt) and unknown mean.

Assume we have x ̄ = 35.25 and y ̄ = 25.75. Calculate Var(x ̄) asymptotically. Calculate Var(y ̄) and compare it with Var(x ̄). And then construct the approximate 95% confidence intervals of μX and μY , and compare them with each other.
Solution: Xt is MA(2). We have
Var(x̄) ≈ (σε²/n)(1 − 0.8 + 0.3)² = 0.25 · 9 / 900 = 0.0025,
and γ(0) = (1 + ψ1² + ψ2²)σε² = 1.73 · 9 = 15.57, hence Var(ȳ) = Var(Yt)/n = γ(0)/900 = 0.0173.
Asymptotically, Var(x̄) ≈ 0.1445 Var(ȳ), with SDx̄ = 0.05 and SDȳ = 0.1315. The approximate 95% confidence intervals are
μX ∈ [x̄ − 2 SDx̄, x̄ + 2 SDx̄] = [35.15, 35.35]
and
μY ∈ [ȳ − 2 SDȳ, ȳ + 2 SDȳ] = [25.487, 26.013].
In this example, the asymptotic variance of x̄ is about 14.5% of that of ȳ, the sample mean of i.i.d. random variables with the same variance. Hence, the asymptotic standard deviation of x̄ is also much smaller than that of ȳ. Consequently, the confidence interval of μX is much shorter than that of μY.
This example provides a case where the estimation of the unknown mean in dependent data is more accurate than the estimation in independent data.
7.3. Estimation of the Autocorrelation Function
Given a stationary process with mean μ and autocovariances γ(k), two reasonable estimators of γ(k) are (sample size n):
γ̂(k) = (1/n) Σ_{t=1}^{n−k} (Xt − X̄)(Xt+k − X̄)   (55)
and
γ̃(k) = (1/(n − k)) Σ_{t=1}^{n−k} (Xt − X̄)(Xt+k − X̄),   (56)
for k = 0, 1, ..., n−1. For k = −(n−1), ..., −1 we define γ̂(k) = γ̂(−k) and γ̃(k) = γ̃(−k). Note that γ̂(0) = γ̃(0).

Remark: If μ is known, we can use μ instead of X̄ in the above definitions. Now the error caused by X̄ is avoided. However, it can be shown that, for fixed k, the asymptotic properties of γ̂(k) or γ̃(k) are the same for cases with known or unknown μ.

The autocorrelations can be estimated by
ρ̂(k) = γ̂(k) / γ̂(0)   (57)
or
ρ̃(k) = γ̃(k) / γ̃(0).   (58)
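The two estimators are easy to compute directly. The following R sketch (illustrative; the simulated AR(1) series is only an example) implements (55) and (56) and compares ρ̂(k) with the output of R's acf(), which uses the divisor n and therefore corresponds to γ̂(k).

## Sample autocovariances with divisor n (gamma-hat) and n - k (gamma-tilde).
gamma_hat <- function(x, k) {
  n <- length(x); xb <- mean(x)
  sum((x[1:(n - k)] - xb) * (x[(1 + k):n] - xb)) / n
}
gamma_tilde <- function(x, k) {
  n <- length(x); xb <- mean(x)
  sum((x[1:(n - k)] - xb) * (x[(1 + k):n] - xb)) / (n - k)
}

set.seed(1)
x <- arima.sim(model = list(ar = 0.6), n = 200)

rho_hat <- sapply(1:5, function(k) gamma_hat(x, k)) / gamma_hat(x, 0)
rho_hat
drop(acf(x, lag.max = 5, plot = FALSE)$acf)[-1]  # should match rho_hat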
For k = 0 we have ρ̂(0) = ρ̃(0) ≡ 1. Hence, for estimating ρ(k) we only need to discuss the properties of these estimators for k ≠ 0. Which of these two estimators should be used?
It might appear that γ̃(k) is more appropriate than γ̂(k), because we only have n − k product terms in the sum. However,

• MSE(γ̂(k)) is typically smaller than MSE(γ̃(k)), so γ̂(k), and hence ρ̂(k), is generally preferred.

For i, j > 0, the following results hold:
E{ρ̂(i)} = ((n − |i|)/n) ρ(i) + O(n⁻¹)
and
Cov{ρ̂(i), ρ̂(j)} = wij / n,   (61)
where the wij are constants depending on ρ(k).
Define ρ(k) = (ρ(1), …, ρ(k))′ and ρˆ(k) = (ρˆ(1), …, ρˆ(k))′. The following theorem shows that ρˆ(h), h = 1, 2, …, k, for fixed k ≥ 1 are asymptotically normal.
Theorem 7.7 Under the assumptions of Theorem 7.5 and for fixed k ≥ 1, we have ρ̂(k) →D N(ρ(k), n⁻¹W),
where W = (wij ) with wij defined above.
Example: Let Xt = εt be white noise with E(εt) = 0 and Var(εt) = σε² (so that ρ(k) = 0 for k ≠ 0). Then
wij = 1 if i = j, and wij = 0 otherwise.
Hence ρ̂(1), ρ̂(2), ..., ρ̂(k) are asymptotically independent with mean zero and variance 1/n. Therefore, 95% of the sample autocorrelations should lie between the bounds
±1.96/√n ≈ ±2/√n.

These are the so-called ±2/√n confidence bands shown on a correlogram in R for the sample ACF; they indicate empirically whether the underlying process could be white noise or not.
Remark: Note that this result is obtained under the i.i.d. assumption. If more than 5% of the ρ̂(k) lie outside these bounds, then we can say that the underlying process is possibly not an independent white noise process. However, if more than 95% of the ρ̂(k) lie between these two bounds, we cannot say that the process is i.i.d., because the second-order properties of an i.i.d. process and an uncorrelated white noise process are the same.
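The following R sketch (illustrative) shows the bands in action for simulated i.i.d. noise: roughly 5% of the sample autocorrelations should fall outside ±2/√n.

## White noise: about 95% of the rho-hat(k) should lie inside +/- 2/sqrt(n).
set.seed(1)
n <- 400
z <- rnorm(n)

r     <- drop(acf(z, lag.max = 40, plot = FALSE)$acf)[-1]  # rho-hat(1), ..., rho-hat(40)
bound <- 2 / sqrt(n)
mean(abs(r) > bound)   # proportion outside the bands; should be around 0.05

acf(z, lag.max = 40)   # correlogram; R draws the approximate 95% bands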

8. Estimation of the ARMA Model
In this section we will discuss the estimation of the unknown parameters of an ARMA model. At first it is assumed that the orders p and q are known. For an ARMA(p, q) model with
φ(B)Xt = ψ(B)εt,   (62)
the unknown parameters are θ = (σε²; φ1, ..., φp; ψ1, ..., ψq). More advanced analysis can be performed to check whether the normal distribution assumption is appropriate or not.
8.1. Estimation for an AR(1) Model
Xt = φXt−1 + εt,   (63)
where |φ| < 1 and εt is a white noise process. We have two unknown parameters, σε² and φ. Note that φ = ρ(1). Hence, an estimator of ρ(1) will also be an estimator of φ. Given observations x1, ..., xn, the simplest estimator of φ is
φ̂ = ρ̂(1) = Σ_{i=1}^{n−1} (xi − x̄)(xi+1 − x̄) / Σ_{i=1}^{n} (xi − x̄)².   (64)
If we assume that E(Xt) = 0, the use of x̄ is not necessary and φ̂ is given by
φ̂ = ρ̂(1) = Σ_{i=1}^{n−1} xi xi+1 / Σ_{i=1}^{n} xi².   (65)
The properties of φ̂ are the same as discussed in the last section. In particular, |φ̂| ≤ 1 (why?), and under additional conditions it is √n-consistent and asymptotically normally distributed.

Example: AR(1) model with zero mean. From a time series x1, ..., x400 we have obtained
Σ_{i=1}^{400} xi² = 1012.74 and Σ_{i=1}^{399} xi xi+1 = 798.15.
Then we have φ̂ = ρ̂(1) = 798.15/1012.74 = 0.788. The fitted model is
Xt = 0.788Xt−1 + εt.
This example is indeed calculated using a realisation following the AR(1) model given by Xt = 0.8Xt−1 + εt, where εt are i.i.d. N(0, 1) random variables.

The estimation of an AR(1) model in the case with unknown mean μX is similar. Now we should first calculate x̄ (from the sum of all xt, Σ_{t=1}^{n} xt), and then calculate the two sums Σ_{t=1}^{n} (xt − x̄)² and Σ_{t=1}^{n−1} (xt − x̄)(xt+1 − x̄). If information about these three sums is given, then it is enough for all further calculations.

Example: Assume that we have a time series x1, ..., x900 following an AR(1) model with unknown mean. From the data we obtained
Σ_{t=1}^{900} xt = 27254.45, Σ_{t=1}^{900} (xt − x̄)² = 13347.46
and
Σ_{t=1}^{899} (xt − x̄)(xt+1 − x̄) = 8385.93.
Then we have x̄ = 27254.45/900 = 30.28 and
φ̂ = ρ̂(1) = Σ(xt − x̄)(xt+1 − x̄) / Σ(xt − x̄)² = 8385.93/13347.46 = 0.6283.
The fitted model is
Xt − 30.28 = 0.6283(Xt−1 − 30.28) + εt.

For an AR(1) model, the unknown innovation variance σε² can also be estimated from the information given in the above examples. Let Yt = Xt − μ; Yt is an AR(1) process with zero mean, unknown parameter φ and Var(Yt) = Var(Xt). We know Yt has the MA(∞) representation
Yt = Σ_{i=0}^{∞} φ^i εt−i,
from which we have
Var(Xt) = Var(Yt) = γ(0) = (Σ_{i=0}^{∞} φ^{2i}) σε² = σε² / (1 − φ²).
By definition, γ̂(0) = (1/n) Σ xt² in the zero-mean case, or γ̂(0) = (1/n) Σ (xt − x̄)² in the unknown-mean case. The unknown innovation variance σε² can hence be estimated from φ̂ and γ̂(0),
σ̂ε² = γ̂(0)(1 − φ̂²).

Example: In the first example above, where it is assumed that μ = 0, we have simply
γ̂(0) = (1/n) Σ xt² = 1012.74/400 = 2.532
and
σ̂ε² = (1 − φ̂²)γ̂(0) = 2.532(1 − 0.788²) = 0.960.

Example: In the second example above, we have
γ̂(0) = (1/n) Σ (xt − x̄)² = 13347.46/900 = 14.831
and
σ̂ε² = (1 − φ̂²)γ̂(0) = 14.831(1 − 0.6283²) = 8.976.
Furthermore, we have Var(x̄) ≈ (8.976/900)(1 − 0.6283)⁻² = 0.0722, and SDx̄ ≈ 0.27. The approximate 95%-CI of μ is μ ∈ [30.28 − 2 · 0.27, 30.28 + 2 · 0.27] = [29.74, 30.82].

Remark: Denote Yt = Xt − μ and X1t = Xt−1 − μ. Then we have, for t = 2, ..., n,
Yt = φX1t + εt,
and E[Yt | X1t = x1t] = φx1t. That is, we obtain a linear regression function f(x1t) = a + bx1t with a = 0 and b = φ, which is called a regression through the origin because f(0) = 0. The proposed φ̂ here is indeed an approximate least squares estimator of φ in the above regression model. This idea can be extended to general AR(p) models.
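These calculations are straightforward to reproduce. The following R sketch (illustrative; it simulates data from the model Xt = 0.8Xt−1 + εt used in the first example) computes φ̂ = ρ̂(1) and σ̂ε², and compares the result with R's Yule-Walker routine ar.yw().

## Estimate an AR(1) model via rho-hat(1) and gamma-hat(0).
set.seed(1)
n <- 400
x <- arima.sim(model = list(ar = 0.8), n = n)   # true model: X_t = 0.8 X_{t-1} + e_t

xb       <- mean(x)
phi_hat  <- sum((x[-n] - xb) * (x[-1] - xb)) / sum((x - xb)^2)  # rho-hat(1), eq. (64)
gamma0   <- sum((x - xb)^2) / n                                  # gamma-hat(0)
sig2_hat <- gamma0 * (1 - phi_hat^2)                             # innovation variance

c(phi_hat = phi_hat, sig2_hat = sig2_hat)
ar.yw(x, order.max = 1, aic = FALSE)$ar   # Yule-Walker estimate, for comparison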
8.2. The Yule-Walker Estimators of an AR(p) Model

For an AR(p) process Y with unknown mean μ we have
Yt − μ = φ1(Yt−1 − μ) + ··· + φp(Yt−p − μ) + εt.   (66)
Let Xt = Yt − μ. We obtain
Xt = φ1Xt−1 + ··· + φpXt−p + εt.   (67)
Remark: This is a linear regression model (through the origin). Although it is not a usual regression model, because the regressors Xt−1, ..., Xt−p depend strongly on each other, the unknown parameters φ1, ..., φp can still be estimated using the method of least squares as in standard regression analysis. A solution based on such a least squares method will lead to the same result as the following Yule-Walker estimation.

Consider again the equations for γ of an AR(p):
γ(0) − φ1γ(1) − φ2γ(2) − ... − φpγ(p) = σε²,
γ(1) − φ1γ(0) − φ2γ(1) − ... − φpγ(p−1) = 0,
γ(2) − φ1γ(1) − φ2γ(0) − ... − φpγ(p−2) = 0,
...
γ(p) − φ1γ(p−1) − φ2γ(p−2) − ... − φpγ(0) = 0.

The last p equations can be rewritten as
γ = Γφ,   (68)
where γ = (γ(1), γ(2), ..., γ(p))′, φ = (φ1, ..., φp)′ and Γ is the p × p symmetric matrix with (i, j) entry γ(|i − j|), i.e. with γ(0) on the diagonal, γ(1) on the first off-diagonals, and so on. Hence
γ = Γφ ⟺ φ = Γ⁻¹γ,
provided the inverse of Γ exists. We now replace γ and Γ by their estimates γ̂ and Γ̂ to obtain our estimates for φ:
φ̂ = Γ̂⁻¹γ̂,   (69)
provided the inverse of Γ̂ exists. Generally, Γ̂⁻¹ exists, because Γ̂ is positive semi-definite (and in practice positive definite). Results based on (68) are called Yule-Walker estimators.

Remark: If all γ̂(k) are replaced by ρ̂(k), the solutions will not change (why?).

Once we have estimates for γ and φ, we obtain an estimate for σε² from
σε² = γ(0) − φ1γ(1) − φ2γ(2) − ... − φpγ(p),
or from the explicit formulas for σε² for AR(1) and AR(2) processes.

Two special cases:
1. For AR(1) with p = 1, there is only one equation in (69), i.e.
φ̂1 = [γ̂(0)]⁻¹ γ̂(1) = γ̂(1)/γ̂(0) = ρ̂(1),
as proposed earlier.
2. For AR(2) with p = 2, replacing all γ̂(k) by ρ̂(k), we have
(φ̂1, φ̂2)′ = [ 1  ρ̂(1); ρ̂(1)  1 ]⁻¹ (ρ̂(1), ρ̂(2))′.
The solutions are
φ̂1 = ρ̂(1)(1 − ρ̂(2)) / (1 − ρ̂²(1))  and  φ̂2 = 1 − (1 − ρ̂(2)) / (1 − ρ̂²(1)).

Example: From a time series with n = 400, x1, ..., x400, we have x̄ = 10.06, γ̂(0) = 1.6378, γ̂(1) = 0.9176 and γ̂(2) = 0.1969. Assume that the theoretical model is an AR(2) with unknown mean μ. Estimate and write down the model. Find a 95%-CI for μ.

Solution: We have ρ̂(1) = 0.9176/1.6378 = 0.5603 and ρ̂(2) = 0.1969/1.6378 = 0.1202. Hence
φ̂1 = ρ̂(1)(1 − ρ̂(2))/(1 − ρ̂²(1)) = 0.5603(1 − 0.1202)/(1 − 0.5603²) = 0.7185
and
φ̂2 = 1 − (1 − ρ̂(2))/(1 − ρ̂²(1)) = 1 − (1 − 0.1202)/(1 − 0.5603²) = −0.2824.
The estimated AR(2) model is
Xt − 10.06 = 0.7185(Xt−1 − 10.06) − 0.2824(Xt−2 − 10.06) + εt.
Now, inserting φ̂1 and φ̂2 into the formula
γ(0) = (1 − φ2)σε² / {(1 + φ2)[(1 − φ2)² − φ1²]},
we have γ̂(0) = 1.5839σε², so that σ̂ε² = 1.6378/1.5839 = 1.034. Furthermore, we have
Var(x̄) ≈ (σ̂ε²/400)(1 − φ̂1 − φ̂2)⁻² = (1.034/400)(1 − 0.7185 + 0.2824)⁻² = 0.0081.
Hence SDx̄ ≈ 0.09. The approximate 95% CI for μ is μ ∈ [10.06 − 2 · 0.09, 10.06 + 2 · 0.09] = [9.88, 10.24].

Remark: The estimation of an AR(p) model is not difficult, because it is related to a regression problem. The first p + 1 estimates of the autocovariances γ̂(k), k = 0, 1, ..., p (together with n), contain all the information we need to estimate an AR(p) model using the above proposal.
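The Yule-Walker system can be solved numerically in a few lines. The R sketch below (illustrative) reproduces the AR(2) example above from γ̂(0), γ̂(1), γ̂(2), and also obtains σ̂ε² from the first Yule-Walker equation.

## Yule-Walker estimation of an AR(2) from the example values
## gamma-hat(0) = 1.6378, gamma-hat(1) = 0.9176, gamma-hat(2) = 0.1969.
g <- c(1.6378, 0.9176, 0.1969)            # gamma-hat(0), gamma-hat(1), gamma-hat(2)
r <- g / g[1]                             # rho-hat(0), rho-hat(1), rho-hat(2)

Gamma   <- matrix(c(1, r[2],
                    r[2], 1), nrow = 2)   # Gamma-hat in correlation form
phi_hat <- solve(Gamma, r[2:3])           # (phi1-hat, phi2-hat)
phi_hat                                   # approximately 0.7185 and -0.2824

## sigma_eps^2 from gamma(0) - phi1 gamma(1) - phi2 gamma(2) = sigma_eps^2
sig2_hat <- g[1] - sum(phi_hat * g[2:3])
sig2_hat                                  # approximately 1.03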
8.3. Least Squares Estimators of an ARMA Model

We now describe the basic approach to estimating the parameters of an ARMA(p, q). Assume that E(Xt) = 0. For an AR(p) we have
εt = Xt − φ1Xt−1 − ... − φpXt−p.
The least squares estimators φ̂1, ..., φ̂p are obtained by minimising the sum of squares
Q := Σ_{t=p+1}^{n} (ε̂t)² = Σ_{t=p+1}^{n} (Xt − φ̂1Xt−1 − ... − φ̂pXt−p)².
This approach does not apply directly if the model has an MA component. In that case we have
εt = Xt − φ1Xt−1 − ... − φpXt−p − ψ1εt−1 − ... − ψqεt−q,   (70)
which involves unobservable ε-values from the past. However, an approximate recursion can be used to carry out a least squares estimation of a general ARMA model. The parameters φ1, ..., φp and ψ1, ..., ψq, and the residuals ε̂t, can be approximated as follows:
1. Calculate μ̂ = x̄ and define yt = xt − x̄, the centralised data.
2. Put ε̂t = 0 and yt = 0 for t ≤ 0, because we do not have information about them. Let
ε̂1 = y1,
ε̂2 = y2 − φ1y1 − ψ1ε̂1,
and so on.
3. Set, for t > max(p, q),
ε̂t = yt − φ1yt−1 − ... − φpyt−p − ψ1ε̂t−1 − ... − ψqε̂t−q.
The effect due to the above approximation is negligible for large n. Now define
Q(φ1, ..., φp; ψ1, ..., ψq) = Σ_{t=1}^{n} ε̂t².   (71)
One approach to estimate φ1, …, φp and ψ1, …, ψq is to minimise Q over all values of φ1, …, φp and ψ1, …, ψq subject to a causality constraint (i.e., the causal stationary regions of these pa- rameters). These are the so-called approximate least squares estimators of an ARMA model. Note that, an estimated ARMA model will always be causal stationary, because parameters outside the causal stationary regions will not be considered. This is a restriction of the esti- mation method, which does not mean that the underlying model should certainly be a causal stationary ARMA model. It is possible to perform a hypothesis test to assess the goodness-of- fit of the ARMA model, but this is beyond the scope of this course.
Remark: Another commonly used method for estimating an ARMA model is the maximum- likelihood (ML) method, which will not be discussed in our lecture. When the distribution of Xt is normal, then the least squares method and the ML method are asymptotically equivalent.
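In practice this optimisation is done numerically; in R it is available through arima(). The sketch below (illustrative, using simulated data) fits an ARMA(1,1) by conditional sum of squares ("CSS", essentially the approximate least squares described above) and by Gaussian maximum likelihood.

## Approximate least squares (CSS) versus maximum likelihood for an ARMA(1, 1).
set.seed(1)
x <- arima.sim(model = list(ar = 0.6, ma = 0.4), n = 500)

fit_css <- arima(x, order = c(1, 0, 1), method = "CSS")  # conditional sum of squares
fit_ml  <- arima(x, order = c(1, 0, 1), method = "ML")   # Gaussian maximum likelihood

coef(fit_css)
coef(fit_ml)   # for large n the two sets of estimates are close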

8.4. Selection of an ARMA(p, q) Model
Assume that the underlying process Xt is an ARMA(p0, q0) model with unknown (p0, q0). The question is how the unknown parameters (p0, q0) can be estimated. This is the so-called model selection problem, which also occurs in other areas of statistics. As mentioned previously, the partial ACF can be used for selecting an AR model. A more general approach useful for ARMA models is to select the orders p0 and q0 by optimising some statistical information criteria that balances model-fit-to-data and model complexity.
Note that, for given (p, q), an ARMA(p, q) model is estimated by minimising the least squares criterion
Q = Σ_{t=1}^{n} ε̂t².   (72)
However, the minimisation of the criterion Q cannot be directly used for selecting p and q, because this would lead to model overfitting (i.e., p → ∞ and q → ∞ as n increases). For model selection it is hence necessary to penalise model complexity.
For given (p, q), denote the minimised value of Q as Q(p, q). Then define
σ̂²p,q = Q(p, q)/n.   (73)
Note that Q is the squared sum of the ε̂t. Hence, σ̂²p,q is in fact an estimator of σε². Two information criteria based on σ̂²p,q that are widely used in the literature are
1. the AIC (Akaike’s Information Criterion, Akaike, 1978)
AIC(p, q) = ln σ̂²p,q + 2(p + q)/n,   (74)
2. the BIC (Bayesian Information Criterion, Akaike, 1977, Schwarz, 1978)
BIC(p, q) = ln σ̂²p,q + ln(n)(p + q)/n.   (75)
The penalty term in the BIC is larger than that in the AIC. Hence, the orders selected following the BIC are not larger than those selected following the AIC.
The main difference between these two methods is that the estimates p̂ and q̂ selected by the BIC are consistent estimators of p0 and q0 (i.e. they tend to p0 and q0, respectively, as n → ∞), while those selected by the AIC are not consistent. For this reason we prefer to use the BIC.
Simulated examples: Three realisations are generated following the AR(4) model Xt = 0.4Xt−1 + 0.2Xt−2 + 0.0Xt−3 − 0.3Xt−4 + εt
with i.i.d. N(0, 1) εt. Table 4 shows the estimated BICs for p = 0, 1, ..., 8, where the p̂ with minimal BIC is marked with a star.
The estimated models obtained from these three time series using pˆ as the AR order are:

p      0      1      2      3      4       5      6      7      8
1.S.   0.446  0.208  0.223  0.224  0.137*  0.158  0.179  0.188  0.201
2.S.   0.233  0.115  0.119  0.133  0.091*  0.122  0.153  0.181  0.207
3.S.   0.548  0.272  0.272  0.293  0.183*  0.211  0.237  0.268  0.297
Table 4: BICs and pˆ for the three simulated time series
1. Xt = 0.407Xt−1 + 0.254Xt−2 − 0.027Xt−3 − 0.323Xt−4 + εt
2. Xt = 0.326Xt−1 + 0.111Xt−2 − 0.029Xt−3 − 0.264Xt−4 + εt
3. Xt = 0.462Xt−1 + 0.142Xt−2 + 0.088Xt−3 − 0.358Xt−4 + εt
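A BIC search of this kind can be coded directly. The following R sketch is illustrative: it simulates one realisation of the AR(4) model above and evaluates criterion (75) with q = 0 for p = 0, ..., 8, using the CSS residual variance as σ̂²p,0.

## Select the AR order p by the BIC criterion (75), with q = 0.
set.seed(1)
n <- 500
x <- arima.sim(model = list(ar = c(0.4, 0.2, 0.0, -0.3)), n = n)

bic <- sapply(0:8, function(p) {
  fit  <- arima(x, order = c(p, 0, 0), method = "CSS")
  sig2 <- mean(residuals(fit)^2)     # Q(p, 0) / n
  log(sig2) + log(n) * p / n         # BIC(p, 0)
})
names(bic) <- 0:8
round(bic, 3)
as.integer(names(which.min(bic)))    # selected order p-hat (typically 4 here)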
8.5. Selection of d in an ARIMA(p, d,q) Model
Assume that now the time series Xt follows an ARIMA(p, d, q) with d ∈ {0, 1}. We have also to determine whether Xt is integrated (with d = 1) or not (with d = 0). This can be solved in the following way.
1. Set d = 0. Estimate ARMA models from the original data x1, ..., xn and select p̂0, q̂0 using the BIC. We obtain the minimised BIC in this case, denoted by BIC(0).
2. Set d = 1. Let yt = ∆xt = xt − xt−1 for t = 2, ..., n. Estimate ARMA models from y2, ..., yn and select p̂1, q̂1 using the BIC. We obtain the minimised BIC in this case, denoted by BIC(1).
3. Compare the two values of the BIC. Select d = 0 if BIC(0) < BIC(1), and d = 1 otherwise (see the sketch below).
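A sketch of this comparison in R (illustrative; it searches only small orders and uses criterion (75) with the CSS residual variance):

## Choose d in {0, 1} by comparing the minimised BIC of ARMA fits to the
## original series and to its first difference.
best_bic <- function(y, pmax = 3, qmax = 3) {
  n <- length(y)
  best <- Inf
  for (p in 0:pmax) for (q in 0:qmax) {
    fit <- try(arima(y, order = c(p, 0, q), method = "CSS"), silent = TRUE)
    if (inherits(fit, "try-error")) next      # skip orders where the fit fails
    bic <- log(mean(residuals(fit)^2)) + log(n) * (p + q) / n
    if (bic < best) best <- bic
  }
  best
}

set.seed(1)
x <- cumsum(arima.sim(model = list(ar = 0.5), n = 300))  # an ARIMA(1, 1, 0) series

BIC0 <- best_bic(x)        # d = 0: ARMA fitted to x itself
BIC1 <- best_bic(diff(x))  # d = 1: ARMA fitted to the differenced series
c(BIC0 = BIC0, BIC1 = BIC1, d_hat = as.numeric(BIC0 >= BIC1))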
10.2. ARCH Models

The ARCH(p) (autoregressive conditional heteroscedasticity) model is defined by
Zt = √(ht) εt, with ht = α0 + α1 Z²t−1 + ... + αp Z²t−p,
where α0 > 0 and αi ≥ 0 for i = 1, ..., p, and where the εt are i.i.d. with mean zero and unit variance. We often make the assumption that εt ∼ N(0, 1).
Here ht > 0 is the conditional variance of Zt, which depends on some past observations. This is what conditional heteroscedasticity means. The larger the squared past observations are, the larger the conditional variance.
Zt can now replace the white noise term in linear models, or it can be used directly to model a time series.
We obtain:
E[Zt | Zt−1, ...] = E[√(ht) εt | Zt−1, ...] = E[√(ht) | Zt−1, ...] E[εt] = 0,
since εt is independent of ht. This is a common property of many financial return series (the so-called martingale difference property). Note that E[Zt] = 0 too. We obtain for the covariance function of Z:
Cov(Zt, Zt+k) = E[Zt Zt+k]
= E[√(ht) εt √(ht+k) εt+k]
= E[√(ht) εt √(ht+k)] E[εt+k]
= 0

since εt+k , for k > 0, is independent of all other three terms and has mean zero. This is another important property of financial returns, that is, they are often uncorrelated with each other.
However, it is clear that Zt and Zt+k are not independent, because the variance of Zt+k depends on the past of Z, and therefore on Zt. An ARCH model is an example of uncorrelated random variables which are not independent.
The process is called a non-linear process because there are non-linear correlations between the observations (and no linear correlation). In particular, here Z²t and Z²t+k are correlated.
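This can be seen in a short simulation. The R sketch below is illustrative (α0 = 0.2 and α1 = 0.7 are arbitrary values satisfying the constraints); it generates an ARCH(1) series and compares the correlograms of Zt and Z²t.

## Simulate an ARCH(1) process: Z_t = sqrt(h_t) e_t, h_t = alpha0 + alpha1 Z_{t-1}^2.
set.seed(1)
n      <- 2000
alpha0 <- 0.2
alpha1 <- 0.7

z <- h <- numeric(n)
z2_prev <- 0                          # Z_0^2, starting value
for (t in 1:n) {
  h[t] <- alpha0 + alpha1 * z2_prev   # conditional variance
  z[t] <- sqrt(h[t]) * rnorm(1)
  z2_prev <- z[t]^2
}

op <- par(mfrow = c(1, 2))
acf(z,   main = "Z: approximately uncorrelated")
acf(z^2, main = "Z^2: clearly correlated")
par(op)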
Furthermore, note that the conditional distribution of an ARCH model is normal if εt ∼
N (0, 1). Hence, the conditional moments of any order exist.
However, this does not mean that the unconditional variance of an ARCH model exists.
If Var(Zt) < ∞, we have Var(Zt) = E(Zt²). Assume further that the process started from the infinite past. It can be shown that an ARCH process with finite variance is weakly (and strongly) stationary. We then obtain
Var(Zt) = E[Zt²] = E[ht εt²] = E[α0 + α1 Z²t−1 + ... + αp Z²t−p] = α0 + α1 Var(Zt) + ... + αp Var(Zt),
since Z is stationary. We solve this equation for Var(Zt) and obtain
Var(Zt) = α0 / (1 − α1 − ... − αp).   (76)
This leads to the condition
α1 + ... + αp < 1
for the unconditional variance of an ARCH(p) process to be finite.

10.3. GARCH Models

The GARCH(p, q) (generalised ARCH) model extends the ARCH model by letting the conditional variance depend also on its own past values:
Zt = √(ht) εt, with ht = α0 + Σ_{i=1}^{p} αi Z²t−i + Σ_{j=1}^{q} βj ht−j,
where α0 > 0, αi ≥ 0 for i = 1, ..., p, and βj ≥ 0 for j = 1, ..., q.
A GARCH model has similar properties as those for an ARCH model, with some key differ-
ences that make the GARCH model more practically relevant.
In particular, in GARCH models the variance of Zt depends on Z²t−1 and on the variance of Zt−1, while it only depends on Z²t−1 in ARCH models.
Figure 20 shows the acf of the squared residuals of a GARCH(1,1) model fitted to the daily returns of the FTSE100.
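A plot of this kind can be produced along the following lines. This is only a sketch: it assumes the tseries package is installed and replaces the FTSE 100 returns (not reproduced here) by a simulated GARCH(1,1) series.

## Fit a GARCH(1, 1) and inspect the ACF of the squared standardised residuals.
library(tseries)   # assumed to be installed; provides garch()

set.seed(1)
n  <- 2000
a0 <- 0.05; a1 <- 0.10; b1 <- 0.85        # alpha0, alpha1, beta1
ret <- h <- numeric(n)
h_prev <- a0 / (1 - a1 - b1)              # start at the unconditional variance
z_prev <- 0
for (t in 1:n) {
  h[t]   <- a0 + a1 * z_prev^2 + b1 * h_prev
  ret[t] <- sqrt(h[t]) * rnorm(1)
  z_prev <- ret[t]
  h_prev <- h[t]
}

fit <- garch(ret, order = c(1, 1))        # prints an optimisation trace
res <- na.omit(residuals(fit))            # standardised residuals

op <- par(mfrow = c(1, 2))
acf(ret^2, main = "squared returns")
acf(res^2, main = "squared GARCH residuals")  # should look like white noise
par(op)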

11. Further Reading
• Box, G.E.P. and Jenkins, G.M. (1976). Time Series Analysis. Holden-Day.
• Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods (2nd Ed.). Springer.
• Chatfield, C. and Xing, H. (2019). The Analysis of Time Series: An Introduction with R (7th Ed.). CRC Press.
• Cowpertwait, P.S.P. and Metcalfe, A.V. (2009). Introductory Time Series with R. Springer.
• Diggle, P.J. (1990). Time Series – A Biological Introduction. Oxford University Press.
• Fuller, W.A. (1996). Introduction to Time Series Analysis. John Wiley.
• Hamilton, J. D. (1994). Time Series Analysis. Princeton University Press.

A. Stationarity Triangle for an AR(2) Process
In this appendix we prove that the stationarity conditions (i)–(iii) for an AR(2) process given in Section 3.2 are equivalent to the conditions in Theorem 3.3.
We have our AR(2) process φ(B)Xt = εt, where φ(z) = 1 − φ1z − φ2z2.
One method of proving the stationarity region for this process uses the quadratic formula to give expressions for the roots of φ(z) in terms of φ1 and φ2; analysis of these roots gives conditions under which they both have modulus larger than 1.
Here we present an alternative proof, representing the characteristic polynomial φ in a different way. We then have to relate this new representation back to the original representation in terms of φ1 and φ2.
We write φ(z) = (1 + a1z)(1 + a2z) = 1 + (a1 + a2)z + a1a2z², so that (by comparing coefficients)
a1 + a2 = −φ1,   a1a2 = −φ2.   (78)
The solutions to (1 + a1z)(1 + a2z) = 0 are z = −1/a1 and z = −1/a2. Hence, for stationarity we need |ai| < 1 for i = 1, 2. This corresponds to a square in the (a1, a2) plane, with corners at (−1, 1), (1, 1), (1, −1) and (−1, −1).
We map this square into the (φ1, φ2) plane using (78), by considering where the corner points of the square map to:
(a1, a2) = (−1, 1) maps to (φ1, φ2) = (0, 1),
(a1, a2) = (1, 1) maps to (φ1, φ2) = (−2, −1),
(a1, a2) = (1, −1) maps to (φ1, φ2) = (0, 1),
(a1, a2) = (−1, −1) maps to (φ1, φ2) = (2, −1).
Hence the square maps to the triangular region in the (φ1, φ2) plane with corners (0, 1), (−2, −1) and (2, −1). Since the processes represented by points inside our square in the (a1, a2) plane are precisely those which are stationary, so are the processes represented by points inside this triangle. It is easy to check that this triangle is defined by the required inequalities:
φ1 + φ2 < 1,   φ2 − φ1 < 1,   |φ2| < 1.
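The equivalence can also be checked numerically: (φ1, φ2) gives a causal stationary AR(2) exactly when both roots of φ(z) lie outside the unit circle. A small R sketch (illustrative):

## Check: roots of phi(z) = 1 - phi1 z - phi2 z^2 outside the unit circle
## <=>  phi1 + phi2 < 1, phi2 - phi1 < 1 and |phi2| < 1.
roots_outside <- function(phi1, phi2) all(Mod(polyroot(c(1, -phi1, -phi2))) > 1)
in_triangle   <- function(phi1, phi2) (phi1 + phi2 < 1) && (phi2 - phi1 < 1) && (abs(phi2) < 1)

set.seed(1)
phi1 <- runif(10000, -3, 3)
phi2 <- runif(10000, -2, 2)
agree <- mapply(function(a, b) roots_outside(a, b) == in_triangle(a, b), phi1, phi2)
mean(agree)   # should be 1 (up to points numerically on the boundary)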
B. Proofs of Results in Section 7.2

The following lemmas will be used for some of the proofs in this section.

Lemma B.1 Let {an} be a sequence of real numbers and a ∈ R. Then
lim_{n→∞} an = a   ⟹   lim_{n→∞} (1/n) Σ_{i=1}^{n} ai = a.

Proof: By assumption, given ε > 0, we may choose an N such that |an − a| < ε/2 for all n > N. For n > N, we have
| (1/n) Σ_{i=1}^{n} ai − a | ≤ (1/n) Σ_{i=1}^{N} |ai − a| + (1/n) Σ_{i=N+1}^{n} |ai − a|
≤ (1/n) Σ_{i=1}^{N} |ai − a| + ε/2.
We can choose n large enough so that the first term is smaller than ε/2. The result follows. ⋄
Lemma B.2 (Kronecker's lemma) If the sequence {aj} is such that
lim_{n→∞} Σ_{j=0}^{n} |aj| < ∞,
then
lim_{n→∞} Σ_{j=0}^{n} (j/n) |aj| = 0.

Proof: Set, for example, N = n^{1/3}. Then we have
Σ_{j=0}^{n} (j/n)|aj| = Σ_{j=0}^{N} (j/n)|aj| + Σ_{j=N+1}^{n} (j/n)|aj|,
where the second term tends to zero by assumption, because N → ∞, and the first term satisfies
Σ_{j=0}^{N} (j/n)|aj| ≤ (N/n) Σ_{j=0}^{N} |aj| → 0 as n → ∞, since N/n = n^{−2/3} → 0. ⋄