
Collinearity

Australian National University

(James Taylor) 1 / 5


Perfect Multicollinearity

Perfect multicollinearity occurs in OLS problems when there is an

exact linear relation among the regressors/explanatory data

Mathematically, this is a huge problem, as the data matrix X is not ‘full rank’,

which means $X'X$ is not invertible,

which means $\hat{b} = (X'X)^{-1}X'y$ doesn't exist.

The problem is really one of uniqueness rather than existence: lots of different $\hat{b}$ vectors minimise the squared error.


Perfect Multicollinearity – Example

Consider the model

$$ y = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 1 \\ 1 & 0.5 & 0.5 \\ 0 & 1 & -1 \end{pmatrix} \begin{pmatrix} b_1 \\ b_2 \\ b_3 \end{pmatrix} + e $$

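Suppose $(b_1, b_2, b_3) = (1, 2, 3)$ is a least-squares estimator for this problem.

Then $(a_1, a_2, a_3) = (0, 3, 4)$ is also a least-squares estimator,

because in both cases $X\hat{b}$ is the same: the first column of X is the sum of the other two, so moving 1 from the first coefficient onto each of the other two leaves the fitted values unchanged.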
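A quick numerical check of this example, as a MATLAB sketch (the variable names are illustrative): `rank` confirms that X has only two linearly independent columns, and both coefficient vectors produce exactly the same fitted values.

```matlab
% Design matrix from the example: column 1 = column 2 + column 3
X = [1 1   0;
     1 0   1;
     1 0.5 0.5;
     0 1  -1];

r = rank(X);          % 2, not 3: X is not full column rank
b = [1; 2; 3];        % one least-squares solution
a = [0; 3; 4];        % b + c*[-1; 1; 1] with c = 1
fitB = X*b;           % [3; 4; 3.5; -1]
fitA = X*a;           % the same fitted values
disp([fitB fitA])
```

Any vector of the form $b + c(-1, 1, 1)'$ fits equally well, since $X(-1, 1, 1)' = 0$; this is exactly the non-uniqueness described above.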


Perfect Multicollinearity – Solution

Very easy solution

Identify a column of X which is a linear combination of the other

columns (usually many choices, any choice is fine)

Remove that column from X, along with its associated b term

Repeat until the columns are independent
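The "remove a column and repeat" recipe could be automated roughly as follows. This is a sketch, not a standard routine: it drops any column whose removal leaves `rank` unchanged (meaning that column is a combination of the others), scanning from the last column, which is an arbitrary choice. `rank` uses a numerical tolerance, so on real data this only catches exact or almost-exact dependence.

```matlab
X = [1 1 0; 1 0 1; 1 0.5 0.5; 0 1 -1];  % example design matrix from earlier
keep = 1:size(X, 2);                    % columns still in the model
while rank(X(:, keep)) < numel(keep)
    r = rank(X(:, keep));
    dropped = false;
    for j = numel(keep):-1:1            % scan from the last column (arbitrary)
        trial = keep;
        trial(j) = [];
        % Removing column j loses no rank, so it is a combination of the others
        if rank(X(:, trial)) == r
            keep = trial;
            dropped = true;
            break
        end
    end
    if ~dropped
        error('No removable column found (numerical tolerance issue).');
    end
end
Xreduced = X(:, keep);                  % now has full column rank
disp(keep)                              % [1 2] for this example
```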


(Near) Multicollinearity

Near multicollinearity occurs when the true values of the regressors are collinear, but we observe them with error, so they appear to be (just) not collinear.

This means that $X'X$ is nearly singular, so numerical inversions will be inaccurate.

This is sometimes a problem, but not always.

The model can still give acceptable forecasts, but the individual regression coefficients cannot be reliably interpreted.
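A small illustration of these claims, using hypothetical data: x2 is x1 plus a tiny deterministic "measurement error", and the coefficients are computed with the normal equations, as in the course's MATLAB code. A change of at most 0.01 in y moves the coefficients on x1 and x2 by 10 each, while the fitted values move by at most 0.01.

```matlab
% Near multicollinearity: x2 is x1 "observed with error"
% (a tiny deterministic perturbation, so the example is reproducible)
t  = (1:20)';
x1 = t;
x2 = t + 1e-3*sin(t);
X  = [ones(20,1) x1 x2];
condXX = cond(X'*X);                 % very large: X'X is nearly singular

y1 = 2 + 3*x1;                       % hypothetical data
y2 = y1 + 0.01*sin(t);               % a tiny change to the data

b1 = (X'*X) \ (X'*y1);               % approximately [2; 3; 0]
b2 = (X'*X) \ (X'*y2);               % approximately [2; -7; 10]
coefChange = max(abs(b2 - b1));      % about 10
fitChange  = max(abs(X*b2 - X*b1));  % at most 0.01
fprintf('cond(X''X) = %.2e\n', condXX);
fprintf('largest coefficient change %.3f, largest fitted change %.4f\n', ...
        coefChange, fitChange);
```

The data change is exactly 10 times $(x_2 - x_1)$, so OLS shifts weight between the two nearly identical regressors instead of changing the fit.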


Seasonality

Using Seasonal Dummies


Seasonality

A seasonal pattern is a regular intra-year pattern which repeats every

year

Seasonality can arise from many sources: preferences, technology, social institutions, etc.

For example:

any consumption or product which involves the weather (energy,

agriculture, medicine)

retail sales (peaks around Christmas)

personal tax advice (peaks around June/July)


Modelling Seasonality: An Example

The easiest way is to introduce seasonal dummy variables.

Suppose $y_t$ is observed quarterly and has no trend.

We could specify $y_t$ as fluctuating around $\mu$:

$$ y_t = \mu + e_t, \quad E[e_t \mid \mu] = 0 $$


Under this specification, $E[y_t \mid \mu] = \mu$ regardless of which quarter we're in.

There is no seasonal variation.

Suppose our historical data regularly peaks in the fourth quarter (Christmas).

How could we model that?


Seasonal Dummies

Ceteris paribus, we expect a larger $y_t$ in the fourth quarter.

Consider the specification

$$ y_t = \mu + a D_t + e_t $$

where $D_t$ is a dummy variable which is 1 in the fourth quarter and 0 otherwise.

What does this dummy variable do?


The dummy variable allows the expectation of $y_t$ to differ between the fourth quarter and the other quarters:

$$ E[y_t \mid \mu, a, D_t = 0] = \mu $$

$$ E[y_t \mid \mu, a, D_t = 1] = \mu + a $$
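To see what the dummy does in practice, here is a MATLAB sketch on made-up quarterly data (three years, no trend, with a higher fourth quarter). The OLS estimate $\hat{\mu}$ equals the average of the observations outside the fourth quarter, and $\hat{\mu} + \hat{a}$ equals the fourth-quarter average, matching the two conditional expectations.

```matlab
% Hypothetical quarterly data: 3 years, no trend, Q4 higher
Q = repmat((1:4)', 3, 1);                       % quarter indicator 1,2,3,4,1,...
y = [10; 11; 9; 15; 10; 12; 10; 16; 11; 10; 9; 17];
D = double(Q == 4);                             % D_t = 1 in the fourth quarter
X = [ones(size(y)) D];                          % y_t = mu + a*D_t + e_t
bhat  = (X'*X) \ (X'*y);                        % bhat = [muhat; ahat]
muhat = bhat(1);
ahat  = bhat(2);
% muhat is the non-Q4 mean; muhat + ahat is the Q4 mean
disp([muhat, mean(y(D == 0)); muhat + ahat, mean(y(D == 1))])
```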


More Seasonal Dummies

Similarly, we could include more dummy variables to allow for a more

complex seasonal pattern.

$$ y_t = \mu + a_1 D_{1t} + a_2 D_{2t} + a_3 D_{3t} + a_4 D_{4t} + e_t $$

In matrix form this is

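$$ y = \begin{pmatrix} 1 & D_{11} & D_{21} & D_{31} & D_{41} \\ 1 & D_{12} & D_{22} & D_{32} & D_{42} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & D_{1T} & D_{2T} & D_{3T} & D_{4T} \end{pmatrix} \begin{pmatrix} \mu \\ a_1 \\ a_2 \\ a_3 \\ a_4 \end{pmatrix} + e $$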


That is, X is given by

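$$ X = \begin{pmatrix} 1 & D_{11} & D_{21} & D_{31} & D_{41} \\ 1 & D_{12} & D_{22} & D_{32} & D_{42} \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ 1 & D_{1T} & D_{2T} & D_{3T} & D_{4T} \end{pmatrix} $$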

As $D_{1t} + D_{2t} + D_{3t} + D_{4t} = 1$ for every t, the first (intercept) column of X equals the sum of the four dummy columns, so the rank of X is less than 5.

Therefore $X'X$ is not invertible.

Therefore we cannot uniquely solve OLS.
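The rank deficiency can be checked numerically. In this sketch (a hypothetical sample of T = 12 quarters; `Q == 1:4` relies on implicit expansion, available in MATLAB R2016b and later), `rank` reports 4 rather than 5, and the vector (1, -1, -1, -1, -1)' is mapped to zero because the intercept column equals the sum of the dummies.

```matlab
T = 12;                              % hypothetical: 3 years of quarterly data
Q = repmat((1:4)', T/4, 1);          % quarter of each observation
D = double(Q == 1:4);                % T-by-4 matrix of dummies D1 ... D4
X = [ones(T,1) D];                   % intercept plus all four dummies
disp(rank(X))                        % 4, not 5
disp(rank(X'*X))                     % 4: X'X (5-by-5) is singular
disp(X*[1; -1; -1; -1; -1])          % all zeros: 1 - (D1 + D2 + D3 + D4) = 0
```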


Dealing with Collinearity

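Solving this problem is very easy: just drop the intercept (or, more often, one of the dummies).

Consider instead

$$ y_t = a_1 D_{1t} + a_2 D_{2t} + a_3 D_{3t} + a_4 D_{4t} + e_t $$

Then $X'X$ is invertible: X now has only the 4 dummy columns, and these are linearly independent (provided every quarter appears in the sample).

Additionally, $\hat{a}_i$ gives the mean of y in the i-th quarter.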
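A sketch checking the last claim on made-up quarterly data: with the intercept dropped, $X'X$ is a diagonal matrix of quarter counts, so the OLS estimates are the quarterly sample means. `accumarray` is used here only to compute those means directly for comparison; the data are hypothetical.

```matlab
% Hypothetical quarterly data (3 years) and quarter indicator Q
Q = repmat((1:4)', 3, 1);
y = [10; 11; 9; 15; 10; 12; 10; 16; 11; 10; 9; 17];
D = double(Q == 1:4);                       % T-by-4 dummies (R2016b+ expansion)
ahat = (D'*D) \ (D'*y);                     % OLS without an intercept
qmeans = accumarray(Q, y) ./ accumarray(Q, 1);  % quarterly sample means
disp([ahat qmeans])                         % the two columns agree
```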


Australian Retail Sales

Seasonality Example


Forecasting Australian Retail Sales

Figure: AUS non-seasonally adjusted retail sales from 2009 Q1 to 2019 Q1, from

FRED


A Brief Look at the Data

Two prominent features: trend and seasonality

There is a clear trend pattern: overall, the series appears to follow a linear upward trend

There is a clear seasonal pattern – sales figures jump in the fourth

quarter

So our specification should allow for trending and seasonal variation


Allowing for Seasonal Variation

We focus on modelling seasonality, and assume a linear trend in all specifications

We will consider three different specifications for seasonal variation

Let $D_{it}$ be a dummy variable for the i-th quarter, i.e.

$$ D_{it} = \begin{cases} 1 & \text{if } t \text{ is in the } i\text{-th quarter} \\ 0 & \text{if } t \text{ is not in the } i\text{-th quarter} \end{cases} $$

Competing Specifications

Consider the following three specifications:

S1: $y_t = a_0 + a_1 t + \alpha_4 D_{4t} + e_t$

S2: $y_t = a_0 + a_1 t + \alpha_1 D_{1t} + \alpha_4 D_{4t} + e_t$

S3: $y_t = a_1 t + \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + e_t$

In S1 we only include a dummy variable for the fourth quarter.

In S3 we allow all quarters to be different (and remove the intercept).

Specification S2 is a compromise.

Data

Australian retail sales from 2009Q1 to 2019Q1, not seasonally adjusted, in real terms.

The dataset AUSRetail.csv has two columns: the first contains the retail sales values, the second is a quarter indicator.

Setting up the problem

Consider the third specification:

S3: $y_t = a_1 t + \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + e_t$

We need this in the form y = Xb + e:

$$ \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_T \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 & 0 & 0 \\ 2 & 0 & 1 & 0 & 0 \\ 3 & 0 & 0 & 1 & 0 \\ \vdots & \vdots & \vdots & \vdots & \vdots \\ T & 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} a_1 \\ \alpha_1 \\ \alpha_2 \\ \alpha_3 \\ \alpha_4 \end{pmatrix} + \begin{pmatrix} e_1 \\ e_2 \\ e_3 \\ \vdots \\ e_T \end{pmatrix} $$
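All three design matrices can be built from the quarter indicator alone. This sketch is an illustration only: it generates the indicator for T = 41 quarters starting in Q1 (matching 2009Q1 to 2019Q1) instead of reading it from AUSRetail.csv, then checks that each matrix has full column rank and that S1 and S2 lie in the column space of S3, i.e. the specifications are nested.

```matlab
% Hypothetical setup: T = 41 quarters starting in Q1 (2009Q1 to 2019Q1)
T  = 41;
Q  = mod((0:T-1)', 4) + 1;          % quarter indicator 1,2,3,4,1,... (last is 1)
t  = (1:T)';
D  = double(Q == 1:4);              % columns are D1 ... D4 (R2016b+ expansion)
X1 = [ones(T,1) t D(:,4)];          % S1: intercept, trend, D4
X2 = [ones(T,1) t D(:,1) D(:,4)];   % S2: intercept, trend, D1, D4
X3 = [t D];                         % S3: trend and all dummies, no intercept
disp([rank(X1) rank(X2) rank(X3)])  % 3 4 5: each has full column rank
disp(rank([X3 X1 X2]))              % still 5: S1 and S2 are nested in S3
```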
MATLAB Code for the Third Specification

    load 'AUSRetail.csv';      % creates the variable AUSRetail
    y = AUSRetail(:,1);        % retail sales
    Q = AUSRetail(:,2);        % quarter indicator (1 to 4)
    T = length(y); t = (1:T)';
    %% construct 4 dummy variables
    D1 = (Q == 1); D2 = (Q == 2); D3 = (Q == 3); D4 = (Q == 4);
    %% 3rd spec: linear trend + all dummies
    X = [t D1 D2 D3 D4];
    betahat = (X'*X)\(X'*y);
    yhat = X*betahat;
    MSE3 = mean((y-yhat).^2);
    AIC = T*MSE3 + 5*2;        % 5 parameters
    BIC = T*MSE3 + 5*log(T);

In-sample Fitting Measures

Table: MSE, AIC and BIC under the three seasonality specifications

                        S1        S2        S3
MSE                     1.0358    0.8310    0.7194
AIC                     52.467    44.072    39.497
BIC                     61.035    52.6397   48.065
Number of parameters    3         4         5

As the specifications are nested, the MSEs are decreasing.

Both AIC and BIC also prefer the most complex specification, S3.

Fitted Lines

Figure: Fitted values for AUS non-seasonally adjusted retail sales from 2009 Q1 to 2019 Q1, from FRED

The fitted values from all three specifications look essentially identical.

To fit the data better, we would need seasonal fluctuations that increase in magnitude over time.

Out-of-Sample Forecasting

Table: MSFE under various specifications

        S1        S2        S3
h=1     1.5458    1.4374    1.3405
h=2     1.8277    1.7048    1.6084

The recursive forecasting exercises start from T0 = 15.

The third specification still gives the best forecasts.

MATLAB Code for MSFE for the Second Specification

    T0 = 15; h = 2;                    % h-step-ahead forecast
    syhat = zeros(T-h-T0+1,1);         % forecasts of y_{t+h}
    ytph = y(T0+h:end);                % observed y_{t+h}
    for t = T0:T-h
        yt = y(1:t);                   % data available at time t
        D1t = D1(1:t); D4t = D4(1:t);
        Xt = [ones(t,1) (1:t)' D1t D4t];
        beta2 = (Xt'*Xt)\(Xt'*yt);     % re-estimate S2 using data up to t
        yhat2 = [1 t+h D1(t+h) D4(t+h)]*beta2;
        syhat(t-T0+1) = yhat2;
    end
    MSFE2 = mean((ytph-syhat).^2);

Exponential Change

Case Study - Australian Retail Data
Exponential Change

Previous models have had linear and polynomial growth. These are nice, as they have simple OLS implementations.

What if we think the data has an exponential growth pattern?

Any data with a (roughly) fixed percentage growth rate will exhibit exponential growth.

Consider the model

$$ y_t = \exp(a_0 + a_1 t)\exp(x_{1,t} b_1 + x_{2,t} b_2 + e_t) $$

This model cannot be written as y = Xb + e, so we cannot use OLS (directly).

However, by taking the log of both sides, we have

$$ \ln(y_t) = a_0 + a_1 t + x_{1,t} b_1 + x_{2,t} b_2 + e_t $$

We can solve this using OLS methods.

Sometimes we also want to take logs of the x terms, but it depends. Use your judgement.

Comparing Logged Models to Raw Models

We need to be very careful when comparing logged models to non-logged models.

We CANNOT just work out the MSFE of a logged model immediately, as the scale will be all wrong ($\ln(y_t)$ is much smaller than $y_t$).

We need to:
1. Log the y value (maybe also some x values)
2. Estimate the b parameters
3. Forecast $\ln(y_{t+h})$
4. Take the exponential of this forecast to get $\hat{y}_{t+h}$
5. Use $\hat{y}_{t+h}$ to calculate the MSFE

Australian Retail Data

Australian retail data seems like a strong candidate for exponential growth.

Growth in retail sales is driven by population growth, real per capita economic growth, and inflation.

All of these are percentage changes, so retail sales probably increase exponentially.

We will compare the MSFE for logged and non-logged models.

Australian Retail Data - Exponential Model

We don't want to take logs of the dummy variables.

So really all we do is follow the list:
1. Log the y value
2. Estimate the b parameters
3. Forecast $\ln(y_{t+h})$
4. Take the exponential of this forecast to get $\hat{y}_{t+h}$
5. Use $\hat{y}_{t+h}$ to calculate the MSFE

Competing Specifications

Consider the following three specifications:

lnS1: $\ln(y_t) = a_0 + a_1 t + \alpha_4 D_{4t} + e_t$

lnS2: $\ln(y_t) = a_0 + a_1 t + \alpha_1 D_{1t} + \alpha_4 D_{4t} + e_t$

lnS3: $\ln(y_t) = a_1 t + \alpha_1 D_{1t} + \alpha_2 D_{2t} + \alpha_3 D_{3t} + \alpha_4 D_{4t} + e_t$

These are our previous specifications, just with ln(y) in place of y.

MATLAB Code for MSFE for the Logged Third Specification

    ly = log(y);                       % log of the data (y itself stays in levels)
    T0 = 15; h = 1;                    % h-step-ahead forecast
    syhat = zeros(T-h-T0+1,1);         % forecasts of y_{t+h}, in levels
    ytph = y(T0+h:end);                % observed y_{t+h}, in levels
    for t = T0:T-h
        lyt = ly(1:t);                 % logged data available at time t
        Xt = [(1:t)' D1(1:t) D2(1:t) D3(1:t) D4(1:t)];
        beta3 = (Xt'*Xt)\(Xt'*lyt);    % estimate lnS3 using data up to t
        lyhat = [t+h D1(t+h) D2(t+h) D3(t+h) D4(t+h)]*beta3;  % forecast of ln(y_{t+h})
        syhat(t-T0+1) = exp(lyhat);    % un-log the forecast
    end
    lnMSFE3 = mean((ytph-syhat).^2)

Out-of-Sample Forecasting

Table: MSFE under logged and non-logged specifications

        S1        lnS1      S2        lnS2      S3        lnS3
h=1     1.5458    1.2077    1.4374    1.0807    1.3405    0.9839
h=2     1.8277    1.4387    1.7048    1.2676    1.6084    1.2041

In every case, the logged model performed better than its non-logged version.

Density Forecasting with OLS

Density Forecast

Point forecasts are great, but we can do more.

We want to determine the conditional density $f(y_{T+1} \mid I_T, q)$.

Finding this for OLS has some (fixable) problems:

We don't know the distribution of $e_{T+1}$

We don't know $q = (b, s^2)'$

We don't (yet) even have an estimate for $s^2$

Density Forecast: Problem 1

OLS technically only requires E(e) = 0.

Suppose we are willing to assume $e_t \sim N(0, s^2)$.

The model specification is $y_{T+1} = x_{T+1} b + e_{T+1}$.

We then have $y_{T+1} \sim N(x_{T+1} b, s^2)$.

Density Forecast: Problem 2

We still don't know q.

A reasonable approximation might be the OLS estimate $\hat{q} = (\hat{b}, \hat{s}^2)'$.

That is, we calculate $f(y_{T+1} \mid I_T, \hat{q})$.

We are ignoring parameter uncertainty here. This may be ill-advised.

Density Forecast: Problem 3

While we know $\hat{b}$, we still need an estimate $\hat{s}^2$ of $s^2$.

For now, we can use the unbiased estimator

$$ \hat{s}^2 = \frac{1}{T-k} \sum_{t=1}^{T} (y_t - \hat{y}_t)^2 $$

where k is the number of b parameters.

We will use a (potentially) different estimator when we get to ‘Maximising Log-Likelihood’.

Density Forecast - Australian Retail Data

Our previous sample ends at 2019Q1.

Suppose we wish to produce a density forecast for sales in 2019Q2 under S3.

Find the OLS parameter estimates, find $\hat{s}^2$, and compute $x_{T+1}\hat{b}$.

As $y_{T+1}$ is normal, the 95% confidence interval is $x_{T+1}\hat{b} \pm 1.96\hat{s}$.

The OLS estimates are

$$ \hat{b} = (0.4822, 57.5498, 58.1587, 59.1157, 69.1578)' $$

We also have $\hat{s}^2 = 0.8194$.

Lastly, $x_{T+1} = (T+1, 0, 1, 0, 0)$.

Density Forecast

Putting it all together:

Compute $x_{T+1}\hat{b} = 78.4104$.

Hence $y_{T+1} \sim N(78.4104, 0.8194)$.

In particular, a 95% confidence interval is (76.6362, 80.1845).
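The arithmetic of this density forecast can be reproduced from the reported estimates. This is a minimal MATLAB sketch, assuming T = 41 (2009Q1 to 2019Q1); because the coefficients are rounded to four decimals, the point forecast comes out at about 78.411 rather than 78.4104, and the interval endpoints shift slightly. The function handle `f` is the normal density $N(x_{T+1}\hat{b}, \hat{s}^2)$ written out by hand, so no toolbox is needed.

```matlab
% Reported S3 estimates: coefficients on t, D1, D2, D3, D4
bhat   = [0.4822; 57.5498; 58.1587; 59.1157; 69.1578];
sigma2 = 0.8194;                  % unbiased estimate of the error variance
T      = 41;                      % 2009Q1 to 2019Q1
xT1    = [T+1 0 1 0 0];           % 2019Q2 is a second-quarter observation
mu     = xT1*bhat;                % point forecast (mean of the density)
halfW  = 1.96*sqrt(sigma2);       % 1.96 estimated standard deviations
ci     = [mu - halfW, mu + halfW];
% Density forecast f(y_{T+1} | I_T, qhat) as a function handle
f = @(yv) exp(-(yv - mu).^2 ./ (2*sigma2)) ./ sqrt(2*pi*sigma2);
fprintf('N(%.4f, %.4f), 95%% interval (%.4f, %.4f)\n', mu, sigma2, ci);
```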
