
Applied Time Series Analysis Section 1: Characteristics of Time Series

Daily sales of frozen food in an Ecuadorian supermarket


[Figure: daily sales of frozen food, 2013-01-01 to 2014-12-31]

Daily returns of NASDAQ 100
[Figure: Nasdaq 100 daily returns, 2007-01-04 to 2022-12-30]

Global Temperature (difference to 1951-1980 average in Celsius)
[Figure: global temperature anomalies relative to the 1951-1980 average (in Celsius), 1880-2022]

fMRI data – blood oxygenation-level dependent (BOLD) signal intensity
[Figure: BOLD signal intensity over time; panels include Thalamus and Cerebellum]

Seismic recordings
[Figure: seismic recordings over time; panel shown: Earthquake]

Applications in Time Series Analysis
• Forecasting (e.g., sales, volatility)
• Inference:
• Trend and/or periodicity detection (e.g., global temperature)
• Detection of relationships among time series (e.g., brain signals/fMRI data)
• Signal discrimination (e.g., seismic data)

Definition: Time Series
Definition 1.1
A time series is a collection of random variables indexed by time: X1,X2,X3,…, or {Xt}.
In general, let T be an index set. A collection of random variables indexed by t ∈ T is called a stochastic process.
For time series, we usually have T = {0, ±1, ±2, …} = Z or T = {1, 2, …}.

What makes time series analysis special?
• Usually, only one realization of a time series is observed, i.e., we have one realization of X1, one of X2, etc.
E.g., we observe the global temperature for the years 1880–2022 only once.
=⇒ If we think of signal + noise, we cannot average repeated observations of the 2020 global temperature to identify the signal. We need to come up with models that take other time points into account.
• Usually, there is dependency between different time points (in contrast to the classical iid setting). E.g., X1 and X2 are correlated.
Strong dependency enables good forecasts.
Dependency affects inference and often leads to wider confidence intervals.

Definition – White Noise
Definition 1.2
We call a time series white noise, say {εt} with finite variance σε², if the time series is a collection of uncorrelated random variables with mean 0 and variance σε². In short, we write εt ∼ wn(0, σε²).
If the random variables {εt} are not only uncorrelated but also independent and identically distributed (iid), we say {εt} is an iid noise or strong white noise. In short, we write εt ∼ iid(0, σε²).
Remark: εt, et, and wt are the usual letters for white noise.

White Noise
[Figure: simulated paths of a white noise and a strong white noise]
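To make the distinction concrete, here is a minimal simulation sketch in Python (numpy assumed; the specific distributions are illustrative choices, not necessarily the ones behind the figure). The strong white noise is iid N(0, σε²); the weak white noise alternates between a Gaussian and a centred, rescaled exponential distribution, so the variables stay uncorrelated with mean 0 and variance σε² without being identically distributed (the same construction reappears later in the stationarity example).

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 500, 1.0

# Strong white noise: iid N(0, sigma^2).
strong_wn = rng.normal(0.0, sigma, size=n)

# (Weak) white noise: independent, mean 0, variance sigma^2 at every t,
# but the distribution alternates between Gaussian and a centred,
# rescaled exponential -- uncorrelated, yet not identically distributed.
weak_wn = np.where(
    np.arange(n) % 2 == 0,
    rng.normal(0.0, sigma, size=n),
    sigma * (rng.exponential(1.0, size=n) - 1.0),  # mean 0, variance sigma^2
)

print(strong_wn.mean(), strong_wn.var())
print(weak_wn.mean(), weak_wn.var())
```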

Definition – Filtering
Suppose we define
Xt = (1/3)εt+1 + (1/3)εt + (1/3)εt−1.
That is, we filtered our white noise {εt}.
Definition 1.3
Let cj ∈ R, j ∈ Z, be coefficients satisfying Σ_{j∈Z} |cj| < ∞ and let {Xt} be a time series. If
Yt = Σ_{j∈Z} cj Xt−j,
we call {Yt} a filtered time series, and the linear operator defined by the coefficients {cj, j ∈ Z} is called a linear filter.
If {et} is an iid noise and we have
Yt = Σ_{j∈Z} cj et−j,
then we call {Yt} a linear time series.

Definition – Moving Average
Definition 1.4
Let θ0, θ1, …, θq ∈ R, θq ≠ 0, be coefficients and {εt} a white noise. If we have
Xt = Σ_{j=0}^{q} θj εt−j,
we call {Xt} a moving average process of order q, in short MA(q).
The case q = ∞ is possible as long as the coefficients are summable.
Note that a moving average process is a special case of a filtered time series. It is a one-sided filter. The requirement θq ≠ 0 is there to keep the order as small as possible.

Filtered Time Series and Moving Average Process
[Figure: simulated paths of Yt = (1/3)(εt−1 + εt + εt+1) and Xt = Σ_{j=0}^{9} (1/10) εt−j]

Definition – Autoregressive
Definition 1.5
Let φ1, φ2, …, φp ∈ R, φp ≠ 0, be coefficients and {εt} a white noise. We call
Xt = Σ_{j=1}^{p} φj Xt−j + εt
an autoregressive process of order p, in short AR(p).
The requirement φp ≠ 0 is there to keep the order as small as possible.

Autoregressive Process
[Figure: simulated paths of Xt = 0.9Xt−1 + εt and Xt = −0.9Xt−1 + εt]

Definition – Random Walk with Drift
Definition 1.6
Let δ ∈ R and {εt} be a white noise. If we have for t ≥ 1
Xt = δ + Xt−1 + εt,
where X0 is some starting value, we call {Xt} a random walk with drift δ. If δ = 0, we simply call it a random walk.
We can also write this model as
Xt = δt + Σ_{j=1}^{t} εj + X0.
Quite often we have X0 = 0.

Random Walk with Drift
[Figure: simulated paths of Xt = δ + Xt−1 + εt, t = 1, …, 200]

Definition – Periodic Signal plus Noise
Definition 1.7
Let A ≥ 0, ω, φ ∈ R and {εt} be a white noise. If
Xt = A cos(2πωt + φ) + εt,
we call {Xt} a periodic signal plus noise. A is the amplitude, ω the frequency of oscillation, and φ the phase shift.
E.g., ω = 1/50 corresponds to one cycle every 50 time points.

Periodic Signal plus Noise
[Figure: 2cos(2πt/50 + 0.6π) without noise, plus N(0, 1) noise, and plus N(0, 25) noise]

Questions?
• How to fit these models to data?
• Which model is good in which situation?
• Can we combine them?
• How to use them in applications like forecasting, detection of relationships, etc.?
• …

Joint Distribution Function
A complete description of a time series is provided by the joint distribution function. For any n, time points t1, …, tn, and constants c1, c2, …, cn ∈ R, the joint distribution function is given by
Ft1,t2,…,tn(c1, c2, …, cn) = P(Xt1 ≤ c1, Xt2 ≤ c2, …, Xtn ≤ cn).
Only in special cases, e.g., when the time series is Gaussian, can it be evaluated analytically. Without strong assumptions, it is nearly impossible to learn the joint distribution function from data.

Definition – Mean Function
Definition 1.8
The mean function for the time series {Xt} is defined for time point t as
μX,t = E(Xt),
provided the mean exists.
If it is clear to which time series we are referring, we will often drop X in the subscript and simply write μt.

Mean Function – Filtered Time Series
Let us look at a filtered time series {Yt} given by
Yt = Σ_{j∈Z} cj Xt−j.
Since the expectation is a linear operator, we have for a filtered time series
μY,t = E(Yt) = E(Σ_{j∈Z} cj Xt−j) = Σ_{j∈Z} cj μX,t−j.
Hence, a filtered white noise {εt} (E(εt) = 0 by definition) has a constant mean of zero.
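As a quick numerical check of the last statement, the following Python sketch (numpy assumed; the filter is the running three-point example (1/3)(εt−1 + εt + εt+1) from above) filters a simulated white noise and verifies that the sample mean of the filtered series is close to zero.

```python
import numpy as np

rng = np.random.default_rng(1)
eps = rng.normal(0.0, 1.0, size=10_000)      # white noise eps_t ~ wn(0, 1)

# Symmetric three-point filter c_{-1} = c_0 = c_1 = 1/3:
# Y_t = (eps_{t-1} + eps_t + eps_{t+1}) / 3
coeffs = np.array([1/3, 1/3, 1/3])
Y = np.convolve(eps, coeffs, mode="valid")   # drops the two boundary points

print(Y.mean())   # close to 0, since mu_{Y,t} = sum_j c_j * mu_{eps,t-j} = 0
```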
Mean Function – Random Walk with Drift
For a random walk with drift, Zt = δt + Σ_{j=1}^{t} εj, we have
μZ,t = E(Zt) = E(δt + Σ_{j=1}^{t} εj) = δt.

Definition – Autocovariance Function
Definition 1.9
The autocovariance function (ACF) for the time series {Xt}, provided E(Xt²) < ∞ for all t, is defined for time points s and t as
γX(s, t) = cov(Xs, Xt) = E[(Xs − μX,s)(Xt − μX,t)].
If it is clear to which time series we are referring, we will often drop X in the subscript and simply write γ(s, t).
Note that
γX(s, t) = E[(Xs − μX,s)(Xt − μX,t)] = E[(Xt − μX,t)(Xs − μX,s)] = γX(t, s).
The ACF measures linear dependence between different time points (as covariance and correlation do). If γX(s, t) = 0, Xs and Xt are not linearly related, but not necessarily independent. There may still be some dependence structure between them.

Remark – Autocovariance Function
The autocovariance function is a non-negative definite (or positive semi-definite) function. That is, for any time points t1, …, tn the matrix

γ(t1, t1)  γ(t1, t2)  …  γ(t1, tn)
γ(t1, t2)  γ(t2, t2)  …  γ(t2, tn)
    ⋮          ⋮      ⋱      ⋮
γ(t1, tn)      …      …  γ(tn, tn)

is non-negative definite.
To see this, let a1, …, an ∈ R be arbitrary coefficients. We have var(Σ_{j=1}^{n} aj Xtj) ≥ 0, since variances are non-negative. Additionally, we have by the bilinearity¹ of the covariance
0 ≤ var(Σ_{j=1}^{n} aj Xtj) = cov(Σ_{j=1}^{n} aj Xtj, Σ_{k=1}^{n} ak Xtk) = Σ_{j,k=1}^{n} aj ak γX(tj, tk).
¹The covariance is a bilinear mapping, i.e., for a, b ∈ R we have cov(aX, bY) = a cov(X, Y) b.

Definition – Autocorrelation Function
Definition 1.10
The autocorrelation function for the time series {Xt}, provided E(Xt²) < ∞ for all t, is defined for time points s and t as
ρX(s, t) = γX(s, t) / √(γX(s, s) γX(t, t)).
ρX(s, t) ∈ [−1, 1] due to Cauchy-Schwarz.
The abbreviation ACF is also used for the autocorrelation function. To avoid confusion, sometimes ACF(Cov) is used to indicate the autocovariance function.

Autocovariance Function – White Noise
Let {εt} be a white noise with variance σε². By definition², we have
γε(s, t) = cov(εs, εt) = 1{s = t} σε², i.e., σε² for s = t and 0 for s ≠ t.
²Recall, white noise is defined as a collection of uncorrelated random variables.

Autocovariance Function – Filtered Time Series
For a filtered time series Yt = Σ_{j∈Z} cj Xt−j we have
γY(s, t) = cov(Σ_{j∈Z} cj Xs−j, Σ_{j∈Z} cj Xt−j) = Σ_{j1,j2∈Z} cj1 cj2 cov(Xs−j1, Xt−j2) = Σ_{j1,j2∈Z} cj1 cj2 γX(s − j1, t − j2).
If Xt ∼ wn(0, σε²), we have γX(s − j1, t − j2) = 1{s − j1 = t − j2} σε² and we obtain
γY(s, t) = Σ_{j1,j2∈Z} cj1 cj2 1{s − j1 = t − j2} σε² = Σ_{j∈Z} cj cj+t−s σε².
If {Yt} is a moving average process of order q, i.e., cj = 0 for j < 0 and j > q, we have for t ≥ s (recall γY(s, t) = γY(t, s))
γY(s, t) = Σ_{j∈Z} cj cj+t−s σε² = Σ_{j=0}^{q} cj cj+t−s σε².
Hence, γY(s, t) = 0 for |t − s| > q.
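To make the cutoff concrete, here is a small Monte Carlo sketch in Python (numpy assumed; the MA(2) coefficients are illustrative and θj plays the role of cj above). Many independent realizations of the process are generated and the covariance between fixed time points s and t is estimated across realizations; it should be close to σε² Σ_j cj cj+t−s for t − s ≤ q and close to 0 for t − s > q = 2.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = np.array([1.0, 0.5, -0.3])      # MA(2): X_t = eps_t + 0.5 eps_{t-1} - 0.3 eps_{t-2}
q, sigma2 = len(theta) - 1, 1.0
n_rep, n = 50_000, 30                   # many independent realizations, short series

eps = rng.normal(0.0, np.sqrt(sigma2), size=(n_rep, n + q))
# Build X_t = sum_{j=0}^{q} theta_j eps_{t-j} for t = 0, ..., n-1 in each replication.
X = sum(theta[j] * eps[:, q - j : q - j + n] for j in range(q + 1))

s = 10
for t in range(s, s + 5):
    emp = np.cov(X[:, s], X[:, t])[0, 1]                       # ensemble covariance
    theo = sigma2 * sum(theta[j] * theta[j + (t - s)]
                        for j in range(q + 1 - (t - s)))        # 0 once t - s > q
    print(f"lag {t - s}: empirical {emp:+.3f}, theoretical {theo:+.3f}")
```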

Definition – Cross-Covariance and Cross-Correlation Function
If we have not only one but several time series, we can quantify the linear dependency between them by the following measures:
Definition 1.11
The cross-covariance function for two time series {Xt} and {Yt}, provided E(Xt²) < ∞ and E(Yt²) < ∞ for all t, is defined for time points s and t as
γXY(s, t) = cov(Xs, Yt) = E[(Xs − μX,s)(Yt − μY,t)].

Definition 1.12
The cross-correlation function for two time series {Xt} and {Yt}, provided E(Xt²) < ∞ and E(Yt²) < ∞ for all t, is defined for time points s and t as
ρXY(s, t) = γXY(s, t) / √(γX(s, s) γY(t, t)).
ρXY(s, t) ∈ [−1, 1] due to Cauchy-Schwarz.

Summary
• Basic mathematical models to describe time series: white noise {εt} as uncorrelated random variables with mean zero. Building blocks for other models: linear processes, MA(q), AR(p), random walk with drift.
• The joint distribution function is too general. We quantify dependency by the mean function μX,t and the autocovariance function γX(s, t).
• White noise, linear processes, and MA(q) processes have mean zero. A random walk with drift has a time-dependent mean.
• For white noise, γε(s, t) = 1{s = t} σε². For linear processes, the ACF is usually nonzero. For MA(q) processes, γY(s, t) = 0 for |t − s| > q.
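The building blocks recapped above can be simulated in a few lines. The following Python sketch (numpy assumed; all parameter values are illustrative choices) generates one realization each of an MA(2), an AR(1), a random walk with drift, and a periodic signal plus noise.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
eps = rng.normal(0.0, 1.0, size=n + 2)        # white noise, two extra values for the MA start

# MA(2): X_t = eps_t + 0.5 eps_{t-1} + 0.25 eps_{t-2}
ma2 = eps[2:] + 0.5 * eps[1:-1] + 0.25 * eps[:-2]

# AR(1): X_t = 0.9 X_{t-1} + eps_t, started at X_0 = 0
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.9 * ar1[t - 1] + eps[t]

# Random walk with drift delta = 0.2, X_0 = 0: X_t = delta * t + sum of past noise
delta = 0.2
rw = delta * np.arange(1, n + 1) + np.cumsum(eps[:n])

# Periodic signal plus noise: X_t = 2 cos(2 pi t / 50 + 0.6 pi) + eps_t
t = np.arange(n)
periodic = 2.0 * np.cos(2.0 * np.pi * t / 50.0 + 0.6 * np.pi) + eps[:n]
```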

Example – Estimation of Mean
[Figure: global temperature anomalies relative to the 1951-1980 average (in Celsius), 1880-2022]

Example – Estimation of Mean
• Let our time series {Xt} be the global temperature, i.e. we have data X1880, X1881, . . . , X2022.
• We want to estimate μX,2000 = E(X2000), i.e., the mean global temperature in the year 2000.
• What data can we use to construct an average?
• If we use
μ̂X,2000 = (1/(2L + 1)) Σ_{t=2000−L}^{2000+L} Xt,
where, say, L = 20, what is our underlying assumption?
• If we use
μ̂X,2000 = (1/143) Σ_{t=1880}^{2022} Xt,
what is our underlying assumption?
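In code, the two candidate estimators look as follows (a Python sketch assuming numpy; `temp` is a hypothetical array holding the 143 anomaly values for 1880-2022 and `L` is the window half-width, both illustrative names not used on the slides).

```python
import numpy as np

years = np.arange(1880, 2023)                  # 143 years of data
temp = np.zeros(len(years))                    # placeholder for the observed anomalies

L = 20
window = (years >= 2000 - L) & (years <= 2000 + L)

mu_local = temp[window].sum() / (2 * L + 1)    # local average around the year 2000
mu_global = temp.sum() / len(years)            # average over the full record 1880-2022
```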

Example – Estimation of Mean
[Figure: global temperature anomalies relative to the 1951-1980 average (in Celsius), 1880-2022]

Example – Estimation of Mean – Conclusion
• Without assuming some form of regularity, it is impossible to estimate μX,t (or other quantities like γX(s, t)).
• Regularity, e.g. a constant mean, might only be realistic after some preprocessing.

Definition – Strictly Stationary
Definition 1.13
We call a time series {Xt} strictly stationary if for any n, time points t1, …, tn, and time shift h the joint distributions of (Xt1, Xt2, …, Xtn) and (Xt1+h, Xt2+h, …, Xtn+h) coincide. That is, for any constants c1, c2, …, cn we have
Ft1,t2,…,tn (c1, c2, . . . , cn) = Ft1+h,t2+h,…,tn+h(c1, c2, . . . , cn).
If the mean function of {Xt} exists, strict stationarity of {Xt} implies μX,t = μX,t+h
for any h, i.e., μX,t ≡ μ for some μ.
If the ACF of {Xt} exists, we also have γX(s, t) = γX(s − t, 0) =: γX(s − t). Hence, the ACF depends only on the time shift between two time points, not on the time points themselves.

Remark – Strictly Stationary
• Strict stationarity is a very strong assumption and might be too strong for most applications.
• It is very difficult to assess strict stationarity from a single data set.
In our global temperature example, we may argue that we have a constant mean and a shift-invariant ACF (γX(s, t) = γX(s − t, 0)), but that the temperature distribution in the 19th century behaves the same as in the 21st century is much harder to justify.

Definition – Weakly Stationary
Definition 1.14
We call a time series {Xt} weakly stationary if
• the mean function is constant over time, i.e., μX,t ≡ μ for some μ,
• the ACF is shift-invariant, i.e., γX(s, t) = γX(s − t, 0) =: γX(s − t).
Weak stationarity implies that the variance of {Xt} exists for all t. The autocorrelation function simplifies to
ρX(s − t) = γX(s − t) / √(γX(s − s) γX(t − t)) = γX(s − t) / γX(0).
Usually, weak stationarity is simply called stationarity.
If the variance of {Xt} exists for all t, strict stationarity implies weak stationarity.

Definition – Jointly Stationary
Definition 1.15
Two time series, say {Xt} and {Yt}, are said to be jointly stationary if they are stationary, and the cross-covariance function satisfies for all time points s,t
γX,Y (s,t) = cov(Xs,Yt) = cov(Xs−t,Y0) = γX,Y (s − t,0) =: γX,Y (s − t).

Remark – Stationarity
• Very often, the time series we observe is not stationary. However, after preprocessing the data (detrending, removing periodic patterns, …), it might be.
• When a time series spans a long time horizon, assuming stationarity over the whole horizon might be unrealistic even after preprocessing. The dependency might be slowly changing over time, e.g., in stock market or economic series spanning many decades.
One approach to dealing with this is to split the data into smaller chunks and analyze them separately. Each chunk might then be stationary.
Tools for analyzing such data jointly can be found in the literature under locally stationary time series. (Not covered in this course).

Example – Stationarity
Are the models we defined stationary or even strictly stationary?
Let {εt } ∼ wn(0, σε2). We know E(εt ) ≡ 0 and we already computed the ACF
γε(s,t)=1{s=t}σε2 =1{s−t=0}σε2 =γε(s−t).
=⇒ A white noise is stationary.
We have no information about the distribution. It could be that εt ∼ N(0, σε²) and εt+1 ∼ σε²(Exp(1) − 1), so in general a white noise does not need to be strictly stationary.
Let {εt} ∼ iid(0, σε²) be a strong white noise. Since it is identically distributed, we have P(εt ≤ c) = P(εs ≤ c) for any s, t.
Additionally, since it is independent, we have for any time points t1, …, tn
P(εt1 ≤ c1, …, εtn ≤ cn) = Π_{j=1}^{n} P(εtj ≤ cj) = Π_{j=1}^{n} P(ε0 ≤ cj) = P(εt1+h ≤ c1, …, εtn+h ≤ cn).
=⇒ A strong white noise is strictly stationary.

Example – Stationarity
Note that the expectation is linear and the covariance is bilinear, see also Slide 31. =⇒ Filtering a stationary time series yields again a stationary time series.
=⇒ A moving average process is stationary. A linear time series is even strictly stationary.
For a random walk (take X0 = 0) we have the ACF γ(s, t) = cov(Σ_{j=1}^{s} εj, Σ_{k=1}^{t} εk) = min(s, t) σε².
=⇒ A random walk is neither stationary nor strictly stationary.
For the autoregressive process we cannot make a general statement. There exist sets of coefficients which define stationary processes (e.g., p = 1, |φ1| < 1) and others which define non-stationary processes (e.g., p = 1, φ1 = 1, such that we obtain a random walk again).

Estimation Strategy
Given data X1, …, Xn, we would like to estimate μX,t and γX(s, t) (and later use these to fit time series models to our data).
As discussed in the 'Example – Estimation of Mean', we need stationarity so that we can link μX,t and γX(s, t) meaningfully to averages.

Definition – Sample Mean
Definition 1.16
Given data X1, …, Xn, we call
X̄n = μ̂X = (1/n) Σ_{t=1}^{n} Xt
the sample mean. If we have stationary data, we have
E(X̄n) = (1/n) Σ_{t=1}^{n} E(Xt) = (1/n) Σ_{t=1}^{n} μX = μX.

Sample Mean – Variance
If we have stationary data, we have
var(X̄n) = (1/n²) cov(Σ_{t=1}^{n} Xt, Σ_{t=1}^{n} Xt) = (1/n²) Σ_{t1,t2=1}^{n} γX(t1, t2) = (1/n²) Σ_{t1,t2=1}^{n} γX(t1 − t2)
= (1/n²) Σ_{h=−n+1}^{n−1} γX(h) Σ_{t1,t2=1}^{n} 1{t1 − t2 = h} = (1/n²) Σ_{h=−n+1}^{n−1} γX(h)(n − |h|) = (1/n) Σ_{h=−n+1}^{n−1} (1 − |h|/n) γX(h).

Sample Mean – Variance
Explanation for Σ_{t1,t2=1}^{n} γ(t1 − t2) = Σ_{h=−n+1}^{n−1} (n − |h|) γ(h):
Let 1 = (1, …, 1)ᵀ be a vector of n ones. Then
cov(Σ_{t=1}^{n} Xt, Σ_{t=1}^{n} Xt) = cov(1ᵀ(X1, …, Xn)ᵀ, 1ᵀ(X1, …, Xn)ᵀ) = 1ᵀ Γn 1,
where Γn is the n × n matrix with entries γ(t1, t2), which under stationarity is the Toeplitz matrix

γ(0)      γ(1)   …  γ(n−1)
γ(1)      γ(0)   …  γ(n−2)
  ⋮         ⋮    ⋱     ⋮
γ(n−1)     …     …   γ(0)

Summing all entries of this matrix, the lag-h diagonal contributes (n − |h|) γ(h) for h = −(n−1), …, n−1, which gives the stated identity.

Sample Mean – Variance – Remark
We obtain the variance
var(√n X̄n) = Σ_{h=−n+1}^{n−1} (1 − |h|/n) γX(h).
For comparison, in the white noise case with variance σε², we would obtain var(√n X̄n) = γ(0) = σε².
Depending on the dependence structure, Σ_{h=−n+1}^{n−1} (1 − |h|/n) γX(h) can be larger or smaller than σε².³ In any case, we need to take this into account if we construct confidence intervals for the sample mean.
Under stationarity, X̄n is approximately normally distributed with standard deviation
σX̄n = ((1/n) Σ_{h=−n+1}^{n−1} (1 − |h|/n) γX(h))^{1/2}.
³We will discuss later how to estimate Σ_{h=−n+1}^{n−1} (1 − |h|/n) γX(h), since it is not straightforward.

Definition – Sample Autocovariance Function
Definition 1.17
Given data X1, …, Xn, the sample autocovariance function is defined as
γ̂X(h) = (1/n) Σ_{t=1}^{n−h} (Xt+h − X̄n)(Xt − X̄n),
for h = 0, …, n − 1 and with γ̂X(−h) = γ̂X(h). For h > n − 1, we set γ̂X(h) = 0.
The sample autocorrelation function is defined as
ρ̂X(h) = γ̂X(h) / γ̂X(0).
Note that the sum has n − h summands, but we divide by n. If we divided by n − h, the sample ACF would not necessarily be non-negative definite, a property the true ACF has.
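A direct implementation of Definition 1.17 could look as follows (Python sketch, numpy assumed; note the division by n rather than by n − h).

```python
import numpy as np

def sample_acov(x, h):
    """Sample autocovariance at lag h (divides by n, as in Definition 1.17)."""
    x = np.asarray(x, dtype=float)
    n, h = len(x), abs(h)
    xbar = x.mean()
    return np.sum((x[h:] - xbar) * (x[: n - h] - xbar)) / n

def sample_acf(x, h):
    """Sample autocorrelation at lag h."""
    return sample_acov(x, h) / sample_acov(x, 0)

rng = np.random.default_rng(4)
x = rng.normal(size=200)
print([round(sample_acf(x, h), 3) for h in range(4)])
```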

Property – Sample Autocovariance Function
If our data is stationary, we have4
E[γ̂X(h)] = ((n − |h|)/n) γX(h) + O(1/n).
Under stronger conditions, e.g., {Xt} is a linear process and the white noise possesses finite fourth moments, we have5
γ̂X(h) = γX(h) + OP(1/√n).
If {Xt} is an iid noise with finite fourth moments, the sample autocorrelation function ρ̂X(h), for h = 1, …, H and any fixed H, is approximately normally distributed with zero mean and standard deviation given by
σρ̂X(h) = 1/√n.
4If we write xn = O(g(n)) for some (non-random) sequence {xn}, it means that there exists some constant C > 0 such that |xn| ≤ C g(n) for all n, i.e., xn/g(n) is bounded.
5If we write Xn = OP(g(n)) for some random sequence {Xn}, it means that for any ε > 0 we can find a constant M > 0 such that P(|Xn| ≤ M g(n)) ≥ 1 − ε for all n, i.e., Xn/g(n) is bounded in probability.
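This approximation is what lies behind the usual ±1.96/√n reference bands in sample ACF plots. A small Monte Carlo check (Python sketch, numpy assumed; the sample autocorrelation is re-implemented so the snippet is self-contained) estimates how often ρ̂X(h) of an iid noise falls inside these bands; the answer should be roughly 95% for each lag.

```python
import numpy as np

def sample_acf(x, h):
    x = np.asarray(x, dtype=float)
    n, xbar = len(x), x.mean()
    g = lambda k: np.sum((x[k:] - xbar) * (x[: n - k] - xbar)) / n
    return g(h) / g(0)

rng = np.random.default_rng(5)
n, H, n_rep = 500, 5, 2000
band = 1.96 / np.sqrt(n)

# Fraction of sample autocorrelations of iid noise inside +-1.96/sqrt(n),
# for each lag h = 1, ..., H.
inside = np.zeros(H)
for _ in range(n_rep):
    x = rng.normal(size=n)
    for h in range(1, H + 1):
        inside[h - 1] += abs(sample_acf(x, h)) <= band
print(inside / n_rep)
```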

Definition – Sample Cross-Covariance Function
Definition 1.18
Given data X1, …, Xn and Y1, …, Yn, the sample cross-covariance function is defined as
γ̂XY(h) = (1/n) Σ_{t=1}^{n−h} (Xt+h − X̄n)(Yt − Ȳn),
for h = 0, . . . , n − 1 and with γˆXY (h) = γˆYX (−h). For h > n − 1, we set γˆXY (h) = 0.
The sample cross-correlation function is defined as
ρ̂XY(h) = γ̂XY(h) / √(γ̂X(0) γ̂Y(0)).
If at least one of the two processes is an iid noise, then ρ̂XY(h) is approximately normally distributed with mean zero and standard deviation σρ̂XY(h) = 1/√n.
To use this result, at least one of the two series thus needs to be (close to) an iid noise.
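As an illustration of how the sample cross-correlation can reveal a lead-lag relationship, here is a Python sketch (numpy assumed; the lag-two relationship is an invented toy example, not from the slides). With Yt built from Xt−2, the sample cross-correlation ρ̂XY(h), under the convention of Definition 1.18, should stand out around h = −2.

```python
import numpy as np

def sample_ccov(x, y, h):
    """Sample cross-covariance gamma_hat_XY(h); negative h via gamma_hat_XY(-h) = gamma_hat_YX(h)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if h < 0:
        return sample_ccov(y, x, -h)
    n = len(x)
    return np.sum((x[h:] - x.mean()) * (y[: n - h] - y.mean())) / n

def sample_ccf(x, y, h):
    return sample_ccov(x, y, h) / np.sqrt(sample_ccov(x, x, 0) * sample_ccov(y, y, 0))

rng = np.random.default_rng(6)
n = 500
x = rng.normal(size=n + 2)
y = 0.8 * x[:-2] + rng.normal(size=n)     # Y_t depends on X_{t-2}: X leads Y by two steps
x = x[2:]

# The sample cross-correlation should stand out around h = -2.
print({h: round(sample_ccf(x, y, h), 2) for h in range(-4, 5)})
```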
