
Modeling and Forecasting Trend
Zhenhao Gong University of Connecticut

Welcome
This course is designed to be:
1. Introductory
2. Led by interesting questions and applications
3. Less math, useful, and fun!
Most important:
Feel free to ask any questions!
Enjoy!

Recall 3
􏰀 Unobserved Components: trend, seasonal, cycle, noise. yt = Tt + St + Ct + εt.
Or
􏰀 We focus on trend on this session.
yt = Tt × St × Ct × εt.

Trend
In business, finance, and economics, for example, trend is produced by slowly evolving preferences, technologies, institutions, and demographics.
Deterministic trend vs. stochastic trend? We focus on the former.

• Example: Labor Force Participation Rate, Females

• Example: Labor Force Participation Rate, Males

Simple linear trend
A simple linear trend is a linear function of time,
$$T_t = \beta_0 + \beta_1 \text{TIME}_t.$$
• The variable $\text{TIME}_t = t$ is constructed artificially and is called a "time trend" or "time dummy."
• $\beta_0$ is the value of the trend at time $t = 0$, and $\beta_1$ is the slope of the time trend line.
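As a quick illustration, here is a minimal sketch in Python (with made-up coefficients, not taken from these slides) of constructing the time dummy and the implied trend line:

```python
# A minimal sketch: the artificial time dummy TIME_t = t and the
# linear trend T_t = beta0 + beta1 * TIME_t (illustrative coefficients).
import numpy as np

T = 100                       # sample size
time = np.arange(1, T + 1)    # TIME_t = t, the "time trend" or "time dummy"

beta0, beta1 = 50.0, 0.25     # intercept (trend at t = 0) and slope
trend = beta0 + beta1 * time  # a straight line in t

print(trend[:5])              # [50.25 50.5  50.75 51.   51.25]
```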

• Example: Increasing and Decreasing Linear Trends

Adequacy of the linear trends
How well do the linear trends fit the actual participation rate series? Are the linear trends adequate? How can we measure their adequacy?

• Residual plot for the linear trend, Female Labor Force Participation Rate

• Residual plot for the linear trend, Male Labor Force Participation Rate

Nonlinear trend
Some trends appear nonlinear, or curved. So we don't require that the trends be linear, only that they be smooth.

• Monthly Volume of Shares Traded on the New York Stock Exchange

Quadratic Trend Models
Quadratic trends are quadratic functions of time,
$$T_t = \beta_0 + \beta_1 \text{TIME}_t + \beta_2 \text{TIME}_t^2.$$
• The linear trend is a special case with $\beta_2 = 0$ (which can be tested).
• We use low-order polynomials to maintain smoothness.
• The shapes a quadratic trend can take depend on the signs and sizes of the coefficients; see the sketch below.
• Quadratic trends are used to provide local approximations.
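To make the dependence on coefficient signs concrete, here is a minimal numerical sketch (the coefficients are illustrative, not estimated from any series):

```python
# A minimal sketch: how the signs of beta1 and beta2 shape a quadratic
# trend T_t = beta0 + beta1 * t + beta2 * t^2 (illustrative values).
import numpy as np

time = np.arange(1, 101)

shapes = {
    "U-shape (min inside sample)":    (10.0, -0.8,  0.006),
    "inverted U (max inside sample)": (10.0,  0.8, -0.006),
    "monotone increasing, convex":    (10.0,  0.1,  0.004),
    "monotone increasing, concave":   (10.0,  0.9, -0.004),
}

for label, (b0, b1, b2) in shapes.items():
    trend = b0 + b1 * time + b2 * time**2
    print(f"{label:32s} start={trend[0]:7.2f}  end={trend[-1]:7.2f}")
```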

• Various Shapes of Quadratic Trends

• Residual plot for the quadratic trend, Volume on the New York Stock Exchange

Log linear trend
Log linear trend (exponential trend): a trend that appears nonlinear in levels but linear in logarithms. If the trend is characterized by constant growth at rate $\beta_1$, then we can write
$$T_t = \beta_0 e^{\beta_1 \text{TIME}_t}.$$
The trend is a nonlinear (exponential) function of time in levels, but in logarithms we have
$$\ln(T_t) = \ln(\beta_0) + \beta_1 \text{TIME}_t.$$
Thus, $\ln(T_t)$ is a linear function of time.
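A minimal sketch verifying this numerically (with made-up parameters): second differences of a linear sequence are zero, so they distinguish the log scale from the level scale.

```python
# A minimal sketch: an exponential trend is curved in levels but exactly
# linear after taking logs (illustrative parameters).
import numpy as np

time = np.arange(1, 101)
beta0, beta1 = 2.0, 0.03                 # constant growth at 3% per period

trend = beta0 * np.exp(beta1 * time)     # nonlinear in levels
log_trend = np.log(trend)                # ln(beta0) + beta1 * TIME_t

# Second differences vanish for a straight line.
print(np.allclose(np.diff(log_trend, 2), 0.0))  # True  -> linear in logs
print(np.allclose(np.diff(trend, 2), 0.0))      # False -> curved in levels
```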

• Various Shapes of Exponential Trends

• Log Volume on the New York Stock Exchange

• Residual plot for the log linear trend, Log Volume on the New York Stock Exchange

Comparison
It's important to note that, although the same sorts of qualitative trend shapes can be achieved with quadratic and exponential trends, there are subtle differences between them.
• The nonlinear trends in some series are well approximated by quadratic trends, while the trends in other series are better approximated by exponential trends.
• In the residual plot for the NYSE volume data, the exponential trend looks much better than the quadratic did.

Estimating Trend Models
We fit our various trend models to data on a time series y using least-squares regression. That is, we use a computer to find
$$\hat{\theta} = \arg\min_{\theta} \sum_{t=1}^{T} \left[ y_t - T_t(\theta) \right]^2,$$
where $\theta$ denotes the set of parameters to be estimated.

Estimating Trend Models
For a linear trend, we have $T_t(\theta) = \beta_0 + \beta_1 \text{TIME}_t$ and $\theta = (\beta_0, \beta_1)$, in which case the computer finds
$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} \left[ y_t - \beta_0 - \beta_1 \text{TIME}_t \right]^2.$$
Similarly, in the quadratic trend case the computer finds
$$(\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2) = \arg\min_{\beta_0, \beta_1, \beta_2} \sum_{t=1}^{T} \left[ y_t - \beta_0 - \beta_1 \text{TIME}_t - \beta_2 \text{TIME}_t^2 \right]^2.$$
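A minimal sketch of both estimations in Python (on simulated data, since the slides' series isn't included here); np.polyfit solves exactly these least-squares problems:

```python
# A minimal sketch: least-squares estimation of linear and quadratic
# trends on simulated data.
import numpy as np

rng = np.random.default_rng(0)
T = 200
time = np.arange(1, T + 1)
y = 10 + 0.05 * time + 0.001 * time**2 + rng.normal(0, 2, T)  # true trend: quadratic

# Linear: minimize sum_t (y_t - b0 - b1*TIME_t)^2.
b1, b0 = np.polyfit(time, y, deg=1)    # polyfit returns highest order first

# Quadratic: minimize sum_t (y_t - b0 - b1*TIME_t - b2*TIME_t^2)^2.
b2q, b1q, b0q = np.polyfit(time, y, deg=2)

print(f"linear:    b0={b0:.3f}, b1={b1:.4f}")
print(f"quadratic: b0={b0q:.3f}, b1={b1q:.4f}, b2={b2q:.5f}")
```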

Estimating Trend Models
We can estimate an exponential trend in two ways. First, we can proceed directly from the exponential representation and let the computer find
$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} \left[ y_t - \beta_0 e^{\beta_1 \text{TIME}_t} \right]^2.$$
Alternatively, we can let the computer find
$$(\hat{\beta}_0, \hat{\beta}_1) = \arg\min_{\beta_0, \beta_1} \sum_{t=1}^{T} \left[ \ln y_t - \ln \beta_0 - \beta_1 \text{TIME}_t \right]^2.$$
Note that the fitted values from this regression are fitted values of ln y, so they must be exponentiated to get fitted values of y.
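A minimal sketch of both routes (simulated data; scipy.optimize.curve_fit handles the nonlinear least squares, and np.polyfit on ln y handles the log-linear regression):

```python
# A minimal sketch: two ways to estimate an exponential trend.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)
T = 200
time = np.arange(1, T + 1)
y = 2.0 * np.exp(0.02 * time) * np.exp(rng.normal(0, 0.05, T))

# Way 1: nonlinear least squares directly in levels.
def exp_trend(t, b0, b1):
    return b0 * np.exp(b1 * t)

(b0_nls, b1_nls), _ = curve_fit(exp_trend, time, y, p0=(1.0, 0.01))

# Way 2: OLS on logs, then exponentiate the fitted values.
b1_log, ln_b0 = np.polyfit(time, np.log(y), deg=1)
fitted_y = np.exp(ln_b0 + b1_log * time)   # back on the level scale

print(f"NLS:     b0={b0_nls:.3f}, b1={b1_nls:.4f}")
print(f"log-OLS: b0={np.exp(ln_b0):.3f}, b1={b1_log:.4f}")
```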

Forecasting Trend
Consider first the construction of point forecasts. Suppose we're presently at time T and we want to use a trend model to forecast the h-step-ahead value of a series y. Under the linear trend model, at the future time of interest T + h,
$$y_{T+h} = \beta_0 + \beta_1 \text{TIME}_{T+h} + \varepsilon_{T+h}.$$
If $\text{TIME}_{T+h}$ and $\varepsilon_{T+h}$ were known at time T, we could immediately compute the value of $y_{T+h}$.

Forecasting Trend
In fact, $\text{TIME}_{T+h}$ is known at time T (why?). Specifically, $\text{TIME}_{T+h} = T + h$.
Under the assumption that ε is simply independent zero-mean random noise, the optimal forecast of $\varepsilon_{T+h}$ for any future period is 0, yielding the point forecast
$$y_{T+h,T} = \beta_0 + \beta_1 \text{TIME}_{T+h}.$$
The subscript "T + h, T" means that the forecast is for time T + h and is made at time T.

Forecasting Trend
To make the point forecast operational, we just replace the unknown parameters with their least-squares estimates, yielding
$$\hat{y}_{T+h,T} = \hat{\beta}_0 + \hat{\beta}_1 \text{TIME}_{T+h}.$$
To form an interval forecast, we further assume that $\varepsilon_t \sim N(0, \sigma^2)$, yielding the 95% interval forecast $y_{T+h,T} \pm 1.96\sigma$. To make this operational, we use
$$\hat{y}_{T+h,T} \pm 1.96\hat{\sigma},$$
where $\hat{\sigma}$ is the standard error of the trend regression, an estimate of σ.
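A minimal sketch of the operational point and interval forecasts (simulated linear-trend data; names like sigma_hat are ours, not from the slides):

```python
# A minimal sketch: h-step-ahead point and 95% interval forecasts
# from a fitted linear trend.
import numpy as np

rng = np.random.default_rng(2)
T, h = 200, 12
time = np.arange(1, T + 1)
y = 10 + 0.05 * time + rng.normal(0, 2, T)

b1_hat, b0_hat = np.polyfit(time, y, deg=1)
resid = y - (b0_hat + b1_hat * time)
k = 2                                          # parameters estimated
sigma_hat = np.sqrt(resid @ resid / (T - k))   # regression standard error

point = b0_hat + b1_hat * (T + h)              # TIME_{T+h} = T + h is known
lo, hi = point - 1.96 * sigma_hat, point + 1.96 * sigma_hat

print(f"point: {point:.2f}, 95% interval: [{lo:.2f}, {hi:.2f}]")
```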

Forecasting Trend
To form a density forecast, we again assume that the trend regression disturbance is normally distributed. The density forecast is
$$N(y_{T+h,T}, \sigma^2).$$
To make this operational, we use the density forecast
$$N(\hat{y}_{T+h,T}, \hat{\sigma}^2).$$
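A minimal sketch of the operational density forecast (placeholder values for the point forecast and standard error, which would come from the regression above):

```python
# A minimal sketch: the operational density forecast N(yhat, sigma_hat^2).
from scipy.stats import norm

point, sigma_hat = 20.6, 2.0                 # placeholder values
density = norm(loc=point, scale=sigma_hat)   # forecast density for y_{T+h}

print(density.pdf(point))                    # density height at the point forecast
print(density.interval(0.95))                # recovers the 95% interval forecast
```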

Selecting Forecasting Models
We've introduced a number of trend models, but how do we select among them when fitting a trend to a specific series? Can we simply select the model with the highest R²?
A number of powerful modern tools exist to assist with model selection, such as MSE, adjusted R², AIC, and SIC.

Selecting Forecasting Models
Most model selection criteria attempt to find the model with the smallest out-of-sample 1-step-ahead mean squared prediction error. First consider the mean squared error,
$$\text{MSE} = \frac{\sum_{t=1}^{T} e_t^2}{T},$$
where T is the sample size, $e_t = y_t - \hat{y}_t$, and $\hat{y}_t = \hat{\beta}_0 + \hat{\beta}_1 \text{TIME}_t$.
Remark: ranking models by MSE is equivalent to ranking them by the sum of squared residuals.
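A minimal sketch (e is a placeholder residual vector; in practice it comes from the trend regression):

```python
# A minimal sketch: MSE from least-squares residuals.
import numpy as np

e = np.array([0.5, -1.2, 0.3, 0.9, -0.6])  # placeholder residuals e_t
T = e.size
mse = np.sum(e**2) / T                     # equals SSR / T
print(mse)
```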

R²
Recall the formula for R²,
$$R^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2}{\sum_{t=1}^{T} (y_t - \bar{y})^2}.$$
• The denominator is the sum of squared deviations of y from its sample mean (the total sum of squares) and depends only on the data.
• Selecting the model that minimizes MSE is therefore equivalent to selecting the model that maximizes R²; see the sketch below.
• Neither performs well for out-of-sample forecasting.
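A minimal sketch of the equivalence (simulated data): since the total sum of squares is fixed, the R² and MSE rankings coincide, and both improve mechanically as the trend polynomial grows.

```python
# A minimal sketch: MSE (min) and R^2 (max) rank trend models identically.
import numpy as np

rng = np.random.default_rng(4)
T = 200
time = np.arange(1, T + 1)
y = 10 + 0.05 * time + 0.001 * time**2 + rng.normal(0, 2, T)
tss = np.sum((y - y.mean())**2)            # total sum of squares (data only)

for deg in (1, 2, 3):                      # linear, quadratic, cubic trends
    e = y - np.polyval(np.polyfit(time, y, deg), time)
    mse = np.sum(e**2) / T
    r2 = 1 - np.sum(e**2) / tss
    print(f"degree {deg}: MSE={mse:.4f}, R^2={r2:.5f}")
```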

Performance of MSE and R²
Selecting forecasting models on the basis of MSE, R², or any of their equivalent forms turns out to be a bad idea:
• As more variables are added to a model, in-sample MSE typically falls continuously.
• The reduction in MSE occurs even if the added variables are in fact of no use in forecasting the variable of interest.
• In-sample overfitting: including more variables in a forecasting model won't necessarily improve its out-of-sample forecasting performance, although it will improve the model's fit on historical data.
Remark: MSE is a biased estimator of the out-of-sample 1-step-ahead prediction error variance.

Adjusted R²
Recall the formula for adjusted R²,
$$\bar{R}^2 = 1 - \frac{\sum_{t=1}^{T} e_t^2 / (T - k)}{\sum_{t=1}^{T} (y_t - \bar{y})^2 / (T - 1)}.$$
• The numerator is the mean squared error corrected for degrees of freedom (s²), which is the usual unbiased estimate of σ².
• Selecting the model that maximizes adjusted R² is equivalent to selecting the model that minimizes the standard error of the regression (or s²).
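A minimal sketch (simulated data; k counts the estimated trend parameters):

```python
# A minimal sketch: s^2 and adjusted R^2 for trend models of
# increasing polynomial order.
import numpy as np

rng = np.random.default_rng(5)
T = 200
time = np.arange(1, T + 1)
y = 10 + 0.05 * time + rng.normal(0, 2, T)     # true trend: linear
tss = np.sum((y - y.mean())**2)

for deg in (1, 2, 3):
    k = deg + 1                                # intercept + polynomial terms
    e = y - np.polyval(np.polyfit(time, y, deg), time)
    s2 = np.sum(e**2) / (T - k)                # unbiased estimate of sigma^2
    adj_r2 = 1 - s2 / (tss / (T - 1))
    print(f"degree {deg}: s^2={s2:.4f}, adj R^2={adj_r2:.5f}")
```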

Degrees-of-freedom penalty
To highlight the degrees-of-freedom penalty, we can rewrite s² as a penalty factor times the MSE,
$$s^2 = \left( \frac{T}{T - k} \right) \frac{\sum_{t=1}^{T} e_t^2}{T}.$$
• Including more variables in a regression will not necessarily lower s² or raise adjusted R².
• The MSE will fall, but the degrees-of-freedom penalty will rise, so the product could go either way.
• As with s², many of the most important forecast model selection criteria are of the form "penalty factor times MSE."

AIC and SIC
Recall their formulas:
$$\text{AIC} = e^{2k/T} \, \frac{\sum_{t=1}^{T} e_t^2}{T} \quad \text{and} \quad \text{SIC} = T^{k/T} \, \frac{\sum_{t=1}^{T} e_t^2}{T}.$$
• The idea is simply that if we want an accurate estimate of the 1-step-ahead out-of-sample prediction error variance, we need to penalize the in-sample residual variance (the MSE) to reflect the degrees of freedom used.
• How do the penalty factors associated with MSE, s², AIC, and SIC compare in terms of severity? (See the sketch below, and the figure that follows.)
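A minimal sketch computing AIC and SIC in exactly this "penalty factor times MSE" form (simulated data):

```python
# A minimal sketch: AIC and SIC as penalty factors times the MSE.
import numpy as np

rng = np.random.default_rng(6)
T = 200
time = np.arange(1, T + 1)
y = 10 + 0.05 * time + rng.normal(0, 2, T)

for deg in (1, 2, 3):
    k = deg + 1
    e = y - np.polyval(np.polyfit(time, y, deg), time)
    mse = np.sum(e**2) / T
    aic = np.exp(2 * k / T) * mse     # penalty e^(2k/T), heavier than s^2's
    sic = T ** (k / T) * mse          # penalty T^(k/T), heaviest of all
    print(f"degree {deg}: MSE={mse:.4f}, AIC={aic:.4f}, SIC={sic:.4f}")
```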

• Degrees-of-Freedom Penalties, Various Model Selection Criteria

Model selection criteria
It's clear that the different criteria penalize degrees of freedom differently. In addition, we could propose many other criteria by altering the penalty.
• How, then, do we select among the criteria?
• What properties might we expect a "good" model selection criterion to have?
• Are s², AIC, and SIC "good" model selection criteria?

Consistency
We evaluate model selection criteria in terms of a key property called consistency. A model selection criterion is consistent if:
• when the true model is among the models considered, the probability of selecting the true DGP approaches one as the sample size gets large; and
• when the true model is not among those considered, so that it's impossible to select the true DGP, the probability of selecting the best approximation to the true DGP approaches one as the sample size gets large.

Comparison
• MSE is inconsistent because it doesn't penalize for degrees of freedom; that's why it's unattractive.
• s² does penalize for degrees of freedom, but not enough to render it a consistent model selection procedure.
• The AIC penalizes degrees of freedom more heavily than s², but it too remains inconsistent.
• The SIC, which penalizes degrees of freedom most heavily, is consistent.
Remark: SIC is a superior model selection criterion when the set of models is fixed, but not when the set of models expands as the sample size grows.

Asymptotic efficiency
An asymptotically efficient model selection criterion chooses a sequence of models, as the sample size gets large, whose 1-step-ahead forecast error variances approach the one that would be obtained using the true model with known parameters, at a rate at least as fast as that of any other model selection criterion.
• In practical forecasting we usually report and examine both AIC and SIC. Most often they select the same model.
• When they don't, this author recommends using SIC. A worked comparison appears in the sketch below.