
Applied Time Series Analysis Section 2: Exploratory Data Analysis

Linear Regression – Setup in Time Series Context
Let {Xt} be our target time series and {Zt,1}, …, {Zt,q} be our covariates. We regress Xt onto Zt,1, …, Zt,q; that is, we have the regression equation


Xt = Σ_{j=1}^{q} βj Zt,j + εt,
where {εt} is some (hopefully well-behaved) error process.
Examples for covariates:
• Indicators for distinct time periods, e.g. for daily data starting on Monday we can
use Zt,1 = 1{(t + 4)/7 ∈ Z} to regress on Wednesdays
• A time sequence, e.g., Zt,2 = t or Zt,3 = t2 (often with intercept Zt,4 = 1);
• Nonlinear function of t, e.g., Zt,5 = sin(2πt + φ);
• Lagged values of Xt, e.g. Zt,6 = Xt−1;
• Other time series and/or lagged values of other time series.
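
As a small illustration, such regressors can be built as columns of a design matrix; in this sketch the sample size, the period and phase of the sinusoid, and the simulated Xt used for the lag are illustrative choices.

```python
import numpy as np

n = 200                              # illustrative sample size
t = np.arange(1, n + 1)              # time index t = 1, ..., n

# Indicator for Wednesdays, daily data starting on Monday: (t + 4)/7 is an integer
z1 = ((t + 4) % 7 == 0).astype(float)

z2 = t.astype(float)                 # linear time
z3 = z2 ** 2                         # quadratic time
z4 = np.ones(n)                      # intercept

# Nonlinear function of t; period 7 and phase 0.5 are illustrative choices
z5 = np.sin(2 * np.pi * t / 7 + 0.5)

# Lagged value of X_t (X_t simulated here only to have something to lag)
rng = np.random.default_rng(0)
x = rng.normal(size=n)
z6 = np.concatenate(([np.nan], x[:-1]))    # X_{t-1}; unavailable for t = 1

Z = np.column_stack([z4, z2, z3, z1, z5])  # an n x q design matrix (lag column omitted)
```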

Linear Regression – Setup in Time Series Context
Let us say we have n observations and let Zt = (Zt,1,…,Zt,q)⊤, Z = (Z1,…,Zn)⊤, X = (X1, . . . , Xn)⊤, and β = (β1, . . . , βq)⊤. Note that Z is an n × q matrix.
We can estimate β by using Ordinary Least Squares (OLS). That is, we minimize the residual sum of squares (RSS)
RSS(β) = Σ_{t=1}^{n} (Xt − Σ_{j=1}^{q} βj Zt,j)² = Σ_{t=1}^{n} (Xt − β⊤Zt)² = ∥X − Zβ∥².

The minimizer is
βˆ = argmin_β RSS(β) = (Z⊤Z)⁻¹ Z⊤X,
where the last equality holds under the condition that Z⊤Z is of full rank q.
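
A minimal numpy sketch of the OLS estimator above on simulated data (the trend coefficients and noise level are illustrative); in practice one avoids forming (Z⊤Z)⁻¹ explicitly.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
t = np.arange(1, n + 1)

# Design matrix Z (n x q, q = 3): intercept, linear and quadratic time
Z = np.column_stack([np.ones(n), t, t ** 2])

# Simulated target series X_t = 1 + 0.05 t + noise (coefficients are illustrative)
x = 1.0 + 0.05 * t + rng.normal(scale=0.5, size=n)

# Closed-form OLS: beta_hat = (Z'Z)^{-1} Z'X, valid when Z'Z has full rank q
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ x)

# Numerically safer alternative that avoids forming (Z'Z)^{-1}
beta_ls, *_ = np.linalg.lstsq(Z, x, rcond=None)

rss = np.sum((x - Z @ beta_hat) ** 2)    # residual sum of squares RSS(beta_hat)
print(beta_hat, rss)
```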

Linear Regression – Setup in Time Series Context
What does the condition ’Z⊤Z is of full rank q’ and constructing (Z⊤Z)−1 imply?
• q < n; we cannot include too many regressors. If the sample size is not really large, including a polynomial trend and also several seasonal indicators could already be too much. We need to be selective there.
• Inverting the matrix Z⊤Z is easiest and numerically most stable if the matrix is close to being diagonal. Keep that in mind when constructing regressors.

Example – Regressing onto Quadratic Time
Suppose we want to regress on quadratic time, t = 1, …, n. That means we can use
Zt,1 = 1, Zt,2 = t, Zt,3 = t².
This leads to (here for n = 100)
Z⊤Z/100 =
[ 1.00       50.50       3383.50     ]
[ 50.50      3383.50     255025.00   ]
[ 3383.50    255025.00   20503333.30 ]
We can also use the centered regressors
Zt,1 = 1, Zt,2 = t − t̄, Zt,3 = (t − t̄)² − (1/n) Σ_{t=1}^{n} (t − t̄)².
This leads to
Z⊤Z/100 =
[ 1.00   0.00     0.00      ]
[ 0.00   833.25   0.00      ]
[ 0.00   0.00     555277.80 ]

Linear Regression – Properties
If we assume Z⊤Z is of full rank q and
• E(εt) ≡ 0 and cov(εt, Zt) = 0, then βˆ is unbiased, i.e., E(βˆ) = β.
• εt iid N(0, σε²) and Z deterministic, then
  – βˆ is also the maximum likelihood estimator of β;
  – βˆ ∼ N(β, σε²(Z⊤Z)⁻¹);
  – εˆ := X − Zβˆ ∼ N(0, σε²(In − Z(Z⊤Z)⁻¹Z⊤));
  – βˆ and εˆ are independent.
We will discuss the case with autocorrelated errors later. In most cases, given E(εt) ≡ 0 and cov(εt, Zt) = 0, OLS is still consistent but no longer optimal. The asymptotic variance is also affected by the autocorrelated errors.

Example – Global Temperature
We model the global temperature as
Yt = β0 + β1 t + Xt,
where {Xt} is a stationary time series, and we test β1 > 0 in this model. We can do this by regressing Yt onto Zt,1 = 1 and Zt,2 = t.
The variance for βˆ on the previous slide was under the assumption of independent and normally distributed errors.
The independence assumption might be too strong here. We may rather assume that {Xt} is a stationary time series: {Xt} might be normally distributed but not independent, i.e., ρX(h) ≠ 0 for some |h| > 0.
One way of taking the dependence in the variance estimation of βˆ into account is using heteroscedasticity and autocorrelation consistent (HAC) covariance matrix estimators, also known as Newey–West estimator. We discuss this later in more detail.
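
A minimal sketch of this comparison with statsmodels on a simulated linear trend with AR(1) errors; the data-generating values and the choice maxlags = 5 are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 140
t = np.arange(1, n + 1).astype(float)

# Stationary but autocorrelated errors: AR(1) with coefficient 0.6
eps = np.zeros(n)
for i in range(1, n):
    eps[i] = 0.6 * eps[i - 1] + rng.normal(scale=0.1)

y = -0.3 + 0.007 * t + eps           # Y_t = beta0 + beta1 t + X_t
Z = sm.add_constant(t)

fit_iid = sm.OLS(y, Z).fit()                                          # iid-error covariance
fit_hac = sm.OLS(y, Z).fit(cov_type="HAC", cov_kwds={"maxlags": 5})   # Newey-West (HAC)

# Under positive autocorrelation the HAC standard error of the slope is larger,
# so the lower confidence bound for the trend moves down.
print(fit_iid.bse[1], fit_hac.bse[1])
print(fit_hac.conf_int(alpha=0.10)[1])   # 90% CI for beta1; lower end = 5% lower bound
```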

Example – Global Temperature
[Figure: global temperature series with the estimated linear trend and 5% lower confidence intervals (CI) for the trend, one assuming iid data and one assuming a stationary time series; x-axis: Time (year), 1880–2020; y-axis: Global Temperature.]

Example – Global Temperature
[Figure: ACF of the detrended (linear) global temperature series, lags 0–20.]

Example – Global Temperature
[Figure: global temperature series with the estimated quadratic trend and 5% lower confidence intervals (CI), one assuming iid data and one assuming a stationary time series; x-axis: Time (year), 1880–2020; y-axis: Global Temperature.]

Example – Global Temperature
[Figure: ACF of the detrended (quadratic) global temperature series, lags 0–20.]

Linear Regression – Model Selection
We often face the question of which time series to include as covariates; e.g., how many lags of a time series do we need to include, do we regress on each weekday, etc.
A solution is to fit many models and compare them: which gives the best fit, i.e., the lowest RSS?
Problem: Including more covariates always decreases the RSS. We need to take the number of covariates into account.

Linear Regression – Model Selection
We focus here on information criteria (IC)¹:
Let l(βˆML) be the log-likelihood for β and let p denote the number of parameters. In general, an information criterion has the form
IC = −l(βˆML) + fn(p),
where fn is some function which determines the impact of the number of parameters.
The idea is to find a function fn which balances the error of the fit against the number of parameters in the model such that it leads to consistent model choices.
The best model is defined by the lowest IC. That also means the value itself is not important. In implementations, constants and scaling can differ.
1For nested models, Analysis of Variance (ANOVA) with F-tests is another approach.

Definition – Akaike's Information Criterion (AIC)
Let l(βˆML) be the log-likelihood for β and let p denote the number of parameters.

Definition 2.1
Akaike's Information Criterion (AIC) is defined as
AIC = −l(βˆML) + 2p.
In the linear regression context with normal errors, this simplifies to
AIC = log(RSS/n) + (n + 2p)/n.

Definition – Bayesian Information Criterion (BIC)
Let l(βˆML) be the log-likelihood for β and let p denote the number of parameters.

Definition 2.2
The Bayesian Information Criterion (BIC) is defined as
BIC = −l(βˆML) + log(n) p.
In the linear regression context with normal errors, this simplifies to
BIC = log(RSS/n) + p log(n)/n.
The BIC is also known as Schwarz's Information Criterion (SIC).
For larger n, the BIC penalizes the number of parameters more than the AIC. In general, this leads to smaller models.
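
Using the regression forms above, both criteria can be computed directly from the RSS of each candidate model. A sketch on simulated data (the candidate models and trend coefficients are illustrative):

```python
import numpy as np

def aic_bic(x, Z):
    """AIC and BIC in the form used on these slides:
    log(RSS/n) + (n + 2p)/n  and  log(RSS/n) + p*log(n)/n.
    Other implementations may differ by constants and scaling."""
    n, p = Z.shape
    beta = np.linalg.lstsq(Z, x, rcond=None)[0]
    rss = np.sum((x - Z @ beta) ** 2)
    return np.log(rss / n) + (n + 2 * p) / n, np.log(rss / n) + p * np.log(n) / n

rng = np.random.default_rng(3)
n = 200
t = np.arange(1, n + 1)
x = 0.5 + 0.02 * t + rng.normal(size=n)        # data generated with a linear trend

candidates = {
    "intercept only":  np.ones((n, 1)),
    "linear trend":    np.column_stack([np.ones(n), t]),
    "quadratic trend": np.column_stack([np.ones(n), t, t ** 2]),
}
for name, Z in candidates.items():
    print(name, aic_bic(x, Z))                 # pick the model with the smallest values
```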

Example – Pollution, Temperature and Mortality
Weekly data for Los Angeles County
[Figure: weekly time series of Cardiovascular Mortality, Temperature, and Particulates, 1970–1980.]

[Figure: pairwise scatterplots of Cardiovascular Mortality, Temperature, and Particulates.]

Example – Pollution, Temperature and Mortality
We are interested in the effect of pollution (Particulates, Pt) on cardiovascular mortality (Mortality, Mt).
We see an overall downward trend.
The scatterplots show some correlation between Particulates and Mortality.
We also see a nonlinear connection between Temperature (Tt) and Mortality. A parabolic relationship centered around the mean temperature (T̄) can be a good approximation.
This leads to the following model:
Mt = β0 + β1 t + β2 (Tt − T̄) + β3 (Tt − T̄)² + β4 Pt + εt.
We compare the full model with the submodels β3 = 0, β4 = 0; β4 = 0; and β3 = 0.
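
A sketch of how the four candidate models can be fit and compared with statsmodels; the three series below are synthetic placeholders standing in for the weekly Los Angeles data, so the criterion values will not match the table on the following slide.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic placeholders; substitute the actual weekly Los Angeles County series
rng = np.random.default_rng(4)
n = 508
t = np.arange(1, n + 1)
tempr = 74 + 10 * np.sin(2 * np.pi * t / 52) + rng.normal(scale=3, size=n)      # T_t
part = 48 + 15 * np.sin(2 * np.pi * t / 52 + 1) + rng.normal(scale=5, size=n)   # P_t
temp_c = tempr - tempr.mean()                                                    # T_t - T_bar
cmort = 90 - 0.02 * t + 0.02 * temp_c ** 2 + 0.2 * part + rng.normal(scale=4, size=n)  # M_t

designs = {
    "beta3 = beta4 = 0": [np.ones(n), t, temp_c],
    "beta4 = 0":         [np.ones(n), t, temp_c, temp_c ** 2],
    "beta3 = 0":         [np.ones(n), t, temp_c, part],
    "full model":        [np.ones(n), t, temp_c, temp_c ** 2, part],
}
for name, cols in designs.items():
    fit = sm.OLS(cmort, np.column_stack(cols)).fit()
    # statsmodels scales AIC/BIC differently than the slides, but the ranking is the same
    print(name, round(fit.aic, 1), round(fit.bic, 1))
```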

Example – Pollution, Temperature and Mortality
model               AIC    BIC
β3 = 0, β4 = 0      5.14   5.17
β4 = 0              5.03   5.07
β3 = 0              4.84   4.88
full model          4.72   4.77

The full model is the best model according to AIC and BIC.
β4 is highly significant (p-value < 0.001; under the normality assumption).

Definition – Moving Average Smoother
Let {Xt} be a time series.

Definition 2.3
We define a linear filter by aj ≥ 0 and Σ_{j=−k}^{k} aj = 1. We call the series {Mt} obtained by
Mt = Σ_{j=−k}^{k} a−j Xt−j
a moving average smoother.

Usually, the coefficients are symmetric, i.e., aj = a−j.
Example: Suppose we have daily data. We might be interested in a weekly (7 days), monthly (31 days), or annual (365 days) average. For this, take a window of length 2k + 1 and set aj = 1/(2k + 1), j = −k, …, k. Then k = 3 gives us a weekly, k = 15 a monthly, and k = 182 an annual average.

Example – NASDAQ 100 – 200-day Moving Average (MA)
[Figure: Nasdaq 100 Returns with a 200-day moving average, 2007-01-03 to 2023-01-11.]

Definition – Kernel Smoothing
A special case of the moving average smoother is kernel smoothing.
In general, a kernel K : R → R is a function which is normalized to 1, i.e., ∫_{−∞}^{∞} K(x) dx = 1. Often, it is also required that the kernel is non-negative, symmetric, and has bounded support on [−1, 1] (i.e., the kernel is zero on (−∞, −1) and (1, ∞)).

Definition 2.4
A kernel smoother {Kt} with bandwidth b ≥ 0 is defined by
Kt = Σ_{i=1}^{n} wi(t, b) Xi,
where wi(t, b) = K((t − i)/b) / Σ_{j=1}^{n} K((t − j)/b).

Remark – Kernel Smoothing
The amount of smoothing is controlled by the bandwidth: the larger b, the smoother the result. Usually, the choice of the bandwidth has more impact than the choice of the kernel.
To see that it is a special case of the moving average smoother, note that we can also write it as
Kt = Σ_{i=1}^{n} [ K((t − i)/b) / Σ_{j=1}^{n} K((t − j)/b) ] Xi.
Examples for kernels: [figure of common kernel functions]

Example – Trend or not?
[Figure: three simulated series Xt, t = 1, …, 500.]

Example – Trend or not?
[Figure: sample ACFs (covariance scale) of the three series Xt, t = 1, …, 500, with mean 0.49 (sd 0.61), mean 0.99 (sd 0.72), and mean −0.01 (sd 0.84); lags 0–20.]

Example – Trend or not?
Yt = Xt − Kt, Kt kernel smoothing
[Figure: sample ACFs (covariance scale) of the kernel-detrended series Yt, t = 1, …, 500, each with mean −0.00 (sd 0.19); lags 0–20.]
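
A small sketch of the two smoothers defined above, using a Gaussian kernel; the simulated series, the window k = 15, and the bandwidth b = 20 are illustrative choices.

```python
import numpy as np

def moving_average(x, k):
    """Symmetric moving average smoother with a_j = 1/(2k + 1), j = -k, ..., k."""
    return np.convolve(x, np.ones(2 * k + 1) / (2 * k + 1), mode="same")

def kernel_smooth(x, b):
    """Kernel smoother K_t = sum_i w_i(t, b) X_i with Gaussian kernel weights
    w_i(t, b) = K((t - i)/b) / sum_j K((t - j)/b), as in Definition 2.4."""
    n = len(x)
    t = np.arange(1, n + 1)
    u = (t[:, None] - t[None, :]) / b          # (t - i)/b for all pairs (t, i)
    K = np.exp(-0.5 * u ** 2)                  # Gaussian kernel values
    w = K / K.sum(axis=1, keepdims=True)       # rows of weights w_i(t, b)
    return w @ x

rng = np.random.default_rng(5)
n = 500
x = 0.05 * np.cumsum(rng.normal(size=n)) + rng.normal(size=n)   # noisy series

m = moving_average(x, k=15)     # roughly a "monthly" window for daily data
k_hat = kernel_smooth(x, b=20)  # smooth trend estimate
y = x - k_hat                   # Y_t = X_t - K_t, as in the detrending example above
print(y.mean(), y.std())
```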
Remark – Kernel Regression
The kernel smoothing can also be written as
K(t) = Kt = Σ_{i=1}^{n} K((t − i)/b) Xi / Σ_{j=1}^{n} K((t − j)/b).
We can generalize kernel smoothing to kernel regression. In kernel smoothing, we regress on time, 1, …, n. For point t, i.e., K(t) = Kt, we average all Xi for which the corresponding time points are close (measured by the kernel) to t. In general, we can regress onto another time series, say Zt. Then for point z, i.e., K(z), we average all Xt for which the corresponding Zt are close (according to the kernel) to z:
K(z) = Σ_{i=1}^{n} K((z − Zi)/b) Xi / Σ_{j=1}^{n} K((z − Zj)/b).
This is also known as Nadaraya–Watson kernel regression.

Example – Temperature and Mortality
Weekly data for Los Angeles County
[Figure: weekly Cardiovascular Mortality and Temperature series, 1970–1980.]

Example – Temperature and Mortality – Kernel (exponential/Gaussian) Regression
[Figure: Cardiovascular Mortality vs. Temperature (50–100 °F, roughly 10–38 °C) with a Gaussian-kernel regression fit.]

Example – Temperature and Mortality – Kernel (uniform) Regression
[Figure: Cardiovascular Mortality vs. Temperature with a uniform-kernel regression fit.]

Example – Temperature and Particulates – Kernel Regression
Weekly data for Los Angeles County
[Figure: weekly Particulates and Temperature series, 1970–1980.]

Example – Temperature and Particulates – Kernel Regression
[Figure: Particulates vs. Temperature with a kernel regression fit.]

Example – Temperature and Particulates – Kernel Regression – Lags
[Figure: Particulates vs. lagged Temperature, tempr(t) through tempr(t−8), each panel with a kernel regression fit.]

Removing a Deterministic Trend by Regression
Suppose our time series is of the form
Xt = μt + Yt,
where we observe Xt, Yt is a zero-mean stationary time series, and μt is some deterministic trend.
If μt = δ1 + t δ2 is a linear trend, we can remove μt by regressing Xt onto t, see the previous section.
More generally, if μt = Σ_{j=0}^{q} δj t^j is a polynomial trend, we can estimate δj, j = 0, …, q, by regressing onto t^j, j = 0, …, q.
We obtain Yˆt = Xt − Σ_{j=0}^{q} δˆj t^j, which requires estimation of δj, j = 0, …, q.

Removing a Deterministic Trend by Taking Differences
Suppose our time series is of the form
Xt = μt + Yt,
where we observe Xt, Yt is a zero-mean stationary time series, and μt = δ0 + t δ1 is a linear trend.
Note that
μt − μt−1 = δ0 + t δ1 − (δ0 + (t − 1) δ1) = δ1.
Hence,
Xt − Xt−1 = δ1 + Yt − Yt−1.
Let Zt = Yt − Yt−1. {Zt} is obtained by filtering a stationary time series, hence it is stationary.
⟹ Xt − Xt−1 removes the linear trend and we obtain a stationary time series.

Definition – Backshift Operator
Let {Xt} be a time series.

Definition 2.5
We define the backshift operator B by
BXt = Xt−1.
It extends to powers, e.g., B²Xt = B(BXt) = BXt−1 = Xt−2. Thus B^k Xt = Xt−k.
Additionally, we set B⁻¹Xt = Xt+1, which leads to B^{−k} Xt = Xt+k.

Definition – Difference Operator
Let {Xt} be a time series.

Definition 2.6
We define the difference operator ∇ by
∇Xt = (1 − B)Xt = Xt − Xt−1.
We denote ∇Xt as the first difference. The difference of order d is given by
∇^d Xt = (1 − B)^d Xt.

Removing a Deterministic Trend by Taking Differences
Suppose our time series is of the form
Xt = μt + Yt,
where we observe Xt, Yt is a zero-mean stationary time series, and μt = Σ_{j=0}^{q} δj t^j is a polynomial trend.
We can remove the polynomial trend and obtain a stationary time series by taking q differences:
Zt := ∇^q Xt = q! δq + ∇^q Yt.
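
A short sketch of removing a polynomial trend by differencing versus by regression; the simulated series and trend coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
t = np.arange(1, n + 1)
y = rng.normal(scale=0.05, size=n)                # stationary Y_t
x = 1.0 + 0.05 * t + 0.01 * t ** 2 + y            # X_t = mu_t + Y_t, polynomial trend, q = 2

# Differencing: nabla^2 X_t = 2! * delta_2 + nabla^2 Y_t; we lose q = 2 observations
d2 = np.diff(x, n=2)
print(len(d2), d2.mean())                         # length n - 2; mean close to 2! * 0.01 = 0.02

# Regression: estimate delta_0, delta_1, delta_2 and keep Y_hat_t for all t = 1, ..., n
Z = np.column_stack([np.ones(n), t, t ** 2])
delta_hat = np.linalg.lstsq(Z, x, rcond=None)[0]
y_hat = x - Z @ delta_hat
print(delta_hat, len(y_hat))
```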
Detrending – Taking Differences vs. Regression
For Xt = μt + Yt:

               Differences                                      Regression
Parameters     none                                             δj, j = 0, …, q, need to be estimated
Output         ∇^q Xt; ∇^q Yt is only available for             estimate Yˆt of Yt
               t = q + 1, …, n, i.e., we lose q observations

If we are interested in the trend parameters themselves and the underlying time series is stationary, then for larger sample sizes doing a regression is the better approach.
However, for nonstationary time series like a random walk, the regression approach might not work at all. Recall that if Yt is a random walk, ∇Yt becomes a stationary series.
⟹ Taking differences is the more robust approach. If in doubt about a random walk, it is better to take differences for detrending.

Example – Linear Trend vs. Stochastic Trend
[Figure: two simulated series, one with a linear trend (Yt) and one with a stochastic trend (Zt), t = 1, …, 100.]

Example – Linear Trend vs. Stochastic Trend
[Figure: differenced series diff Y_t and its sample ACF.]

Example – Linear Trend vs. Stochastic Trend
[Figure: differenced series diff Z_t and its sample ACF.]

Example – Linear Trend vs. Stochastic Trend
[Figure: detrended (by regression) series Y_t and its sample ACF.]

Example – Linear Trend vs. Stochastic Trend
[Figure: detrended (by regression) series Z_t and its sample ACF.]

Example – Global Temperature – Taking Differences vs. Regression
[Figure: global temperature detrended by differencing, 1880–2020, and the ACF of the differenced series, lags 0–20.]

Example – Global Temperature – Taking Differences vs. Regression
[Figure: global temperature detrended by regression, 1880–2020, and the ACF of the detrended series, lags 0–20.]

Example – Weekly Oil Price
[Figure: Weekly Cushing, OK WTI Spot Price FOB (Dollars per Barrel), 1986-01-03 to 2023-01-06.]

Returns and Log Returns
Let {Pt} be some price time series. The return series {rt} is defined as
rt = Pt/Pt−1 − 1 = (Pt − Pt−1)/Pt−1.
The log return series {lrt} is defined as
lrt = log(Pt/Pt−1) = log(Pt) − log(Pt−1).

                   log-returns                        returns
Symmetry           symmetric around zero              non-symmetric
Possible values    (−∞, ∞)                            [−1, ∞)
Aggregation        Σ_{t=1}^{T} lrt = log(PT/P0)       Π_{t=1}^{T} (rt + 1) = PT/P0

⟹ Linear statistical methods are more reasonable for log-returns.

Returns and Log Returns
[Figure: plot of log(x + 1) against x over [−0.2, 0.2].]

Example – Weekly Oil Price
[Figure: Weekly Cushing, OK WTI Spot Price FOB, returns and log returns, 1986-01-03 to 2023-01-06.]

Definition – Power Transformation
Let {Xt} be a time series.

Definition 2.7
We call the nonlinear function g : R → R,
g(Xt) = (Xt^λ − 1)/λ for λ ≠ 0, and g(Xt) = log(Xt) for λ = 0,
a power transformation, and the series {Yt} given by Yt = g(Xt) a power transformation of the series {Xt}.

Such nonlinear transformations can help to improve the approximation to normality or to equalize variability over the length of a single series.
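
A sketch of computing returns, log returns, and the power transformation of Definition 2.7 for a positive price series; the simulated prices and the choice λ = 0.5 are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
lr = rng.normal(loc=0.001, scale=0.03, size=n)   # simulated weekly log returns
p = 25 * np.exp(np.cumsum(lr))                   # positive price series P_t

returns = p[1:] / p[:-1] - 1                     # r_t  = P_t / P_{t-1} - 1
log_ret = np.diff(np.log(p))                     # lr_t = log(P_t) - log(P_{t-1})

# Aggregation properties from the table above
print(np.sum(log_ret), np.log(p[-1] / p[0]))     # both equal log(P_T / P_0)
print(np.prod(returns + 1), p[-1] / p[0])        # both equal P_T / P_0

def power_transform(x, lam):
    """Power transformation of Definition 2.7 (requires positive x)."""
    return np.log(x) if lam == 0 else (x ** lam - 1) / lam

y = power_transform(p, lam=0.5)                  # lambda = 0.5 is an illustrative choice
print(y[:3])
```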