Homework 1
Homework 1
MAS 640
Due Thursday, January 25 by 1:15pm
You may work in groups of up to 3. The assignment must be completed in R Markdown. One
person per group should submit .Rmd and PDF files to Blackboard. Late submissions will be
penalized 10% per day.
All plots should be properly formatted (axis labels, units, title, etc..).
Chapter 1 Problems
1. Problem 1.1 from text – Use software to produce the time series plot of the data shown in Exhibit 1.2
on page 2. The data are in the file larain of the TSA package (?larain to read more. . . ). Comment
on any noticeable patterns in the time series.
data(larain)
plot(larain, main=’Annual rainfall (in inches) in Los Angeles, 1878-1992′, ylab=’Annual Rainfall (inches)’ )
Annual rainfall (in inches) in Los Angeles, 1878−1992
Time
A
n
n
u
a
l R
a
in
fa
ll
(i
n
ch
e
s)
1880 1900 1920 1940 1960 1980
1
0
2
0
3
0
4
0
2. Problem 1.2 from text – Reproduce the time series plot displayed in Exhibit 1.3 on page 3. The data
are in the file color of the TSA package. Comment on any noticeable patterns in the time series.
data(color)
plot(color, ylab=’Color Property’, main=’Color Property of 35 Consecutive Batches’, type=’o’)
1
Color Property of 35 Consecutive Batches
Time
C
o
lo
r
P
ro
p
e
rt
y
0 5 10 15 20 25 30 35
6
5
7
0
7
5
8
0
8
5
3. Problem 1.6 from text – Construct a time series plot with monthly plotting symbols for the Dubuque
temperature series as in Exhibit 1.7 on page 6. The data are in the file tempdub of the TSA package.
Comment on any noticeable patterns in the time series. Does the weather appear to be changing in
Dubuque?
data(tempdub)
plot(tempdub, ylab=’Monthly Average Temperage (Degrees Fahrenheit)’, main=’Monthly Average Temperature in Dubuque, 1/1964-12/1975′ )
points(y=tempdub, x=time(tempdub), pch=as.vector(season(tempdub)), cex=.75)
2
Monthly Average Temperature in Dubuque, 1/1964−12/1975
Time
M
o
n
th
ly
A
ve
ra
g
e
T
e
m
p
e
ra
g
e
(
D
e
g
re
e
s
F
a
h
re
n
h
e
it)
1964 1966 1968 1970 1972 1974 1976
1
0
2
0
3
0
4
0
5
0
6
0
7
0
JF
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
JJ
A
S
O
N
D
JF
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
JF
M
A
M
J
JA
S
O
N
D
4. The TSA package contains the data set beersales, which lists monthly beer sales (in millions of barrels)
from January 1975 to December 1990. Note that you are using the full time series here. In class, we
only used data from 1980 and later.
• Construct the time series plot of this data.
• Describe any patterns.
• Enhance the usefulness of this plot by adding monthly plotting symbols.
• Which months are associated with higher beer sales? Lower beer sales? Are these findings what
you expected?
data(beersales)
plot(beersales)
points(y=beersales, x=time(beersales), pch=as.vector(season(beersales)), cex=.75)
3
Time
b
e
e
rs
a
le
s
1975 1980 1985 1990
1
0
1
2
1
4
1
6
J
F
M
A
M
J
J
A
S
O
N
D
JFM
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
OND
J
F
M
A
M
J
J
A
S
O
N
D
J
F
MA
M
J
J
A
SO
N
D
JF
M
A
M
J
J
A
S
O
ND
J
F
M
A
M
JJ
A
S
O
N
D
J
F
MA
M
J
JA
S
O
N
D
JF
M
A
MJJ
A
S
O
N
D
JF
M
A
M
JJ
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
JF
M
A
M
J
J
A
SO
N
D
JF
MA
M
J
JA
S
O
N
D
J
F
M
A
MJ
J
A
S
O
N
D
J
F
M
A
MJJ
A
S
O
N
D
Chapter 3 Problems
5. Problem 3.4 from text – The data file hours in the TSA package contains monthly values of the average
hours worked per week in teh U.S. manufacturing sector for July 1982 through June 1987.
a. Display and interpret the time series plot for these data.
b. Reconstruct the time series plot but include plotting symbols for various months. Does your
interpretation change?
c. If you were to model this data using regression techniques, what would you include in the model?
data(hours)
plot(hours, ylab=’Average Hours’, main=’Average Hours per Week in U.S. Manufacturing, July 1982 – June 1987′)
points(y=hours, x=time(hours), pch=as.vector(season(hours)))
4
Average Hours per Week in U.S. Manufacturing, July 1982 − June 1987
Time
A
ve
ra
g
e
H
o
u
rs
1983 1984 1985 1986 1987
3
9
.0
4
0
.0
4
1
.0
JASO
N
D
J
F
M
AM
J
J
A
SON
D
JFM
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
J
A
SON
D
J
F
M
AM
J
J
A
S
O
N
D
JFM
A
M
J
6. Problem 3.5 from text – The data file wages in the TSA package contains monthly values of the average
hourly wages (in dollars) for workers in the U.S. apparely and textile products industrry for July 1981
through June 1987.
a. Display and interpret the time series plot for these data.
b. Use least squares to fit a linear time trend to this time series. Interpret the regression output.
Save the standardized residuals from the fit for further analysis.
c. Construct and interpret the time series plot of the standardized residuals.
d. Use least squares to fit a quadratic time trend to the wages time series. Interpret the regression
output. Save the standardized residuals form the fit for future analysis.
e. Construct and interpret the time series plot of the standardized residuals from part d.
f. Plot the sample autocorrelations for the standardized residuals from part d and interpret.
data(wages)
t <- time(wages)
t.squared <- t^2
#a
plot(wages)
5
Time
W
a
g
e
s
1982 1983 1984 1985 1986 1987
8
.0
8
.5
9
.0
9
.5
#b
fit1 <- lm(wages ~ t)
summary(fit1)
Call:
lm(formula = wages ~ t)
Residuals:
Min 1Q Median 3Q Max
-0.23828 -0.04981 0.01942 0.05845 0.13136
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.490e+02 1.115e+01 -49.24 <2e-16 *** t 2.811e-01 5.618e-03 50.03 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.08257 on 70 degrees of freedom Multiple R-squared: 0.9728, Adjusted R-squared: 0.9724 F-statistic: 2503 on 1 and 70 DF, p-value: < 2.2e-16 # Linear trend fits data well (Rsquared = .97) # Average wages are estimated to increase by $0.28 per month, on average (increasing trend) resids <- rstudent(fit1) 6 #c plot(ts(resids), type='o') Time ts (r e si d s) 0 10 20 30 40 50 60 70 − 3 − 2 − 1 0 1 #d fit2 <- lm(wages ~ t + t.squared) summary(fit2) Call: lm(formula = wages ~ t + t.squared) Residuals: Min 1Q Median 3Q Max -0.148318 -0.041440 0.001563 0.050089 0.139839 Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.495e+04 1.019e+04 -8.336 4.87e-12 ***
t 8.534e+01 1.027e+01 8.309 5.44e-12 ***
t.squared -2.143e-02 2.588e-03 -8.282 6.10e-12 ***
—
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ‘ 1
Residual standard error: 0.05889 on 69 degrees of freedom
Multiple R-squared: 0.9864, Adjusted R-squared: 0.986
F-statistic: 2494 on 2 and 69 DF, p-value: < 2.2e-16
7
# new model fits slightly better (Rsquared = .986)
resids2 <- rstudent(fit2)
#e
plot(ts(resids2), type='o')
Time
ts
(r
e
si
d
s2
)
0 10 20 30 40 50 60 70
−
2
−
1
0
1
2
# This looks better than before, still not PERFECT
#f
acf(resids2)
8
5 10 15
−
0
.4
0
.0
0
.2
0
.4
0
.6
Series resids2
Lag
A
C
F
9
7. Problem 3.6 from text - Using the beersales time series from Problem 4. . .
a. Look back at the time series plot and remember what patterns exist..
b. Use least squares to fit a seasonal-means model to the data. Interpret the regression output. Save
the standardized residuals for future analysis.
c. Construct and interpret the time series plot of the standardized residuals from part b. Include
monthly plotting symbols to check for seasonality.
d. Use least squares to fit a season-means model plus a quadratic time trend to the data. Does this
model fit better? Save the standardized residuals for future analysis.
e. Construct and interpret the time series plot of the standardized residuals from part d. Again use
monthly plotting symbols to check for seasonality.
f. Calculate and interpret the sample autocorrelations for the standardized residuals from part d.
data(beersales)
t <- time(beersales)
t.squared <- t^2
month <- season(beersales)
#a
plot(beersales)
points(y=beersales, x=time(beersales), pch=as.vector(season(beersales)), cex=.75)
Time
b
e
e
rs
a
le
s
1975 1980 1985 1990
1
0
1
2
1
4
1
6
J
F
M
A
M
J
J
A
S
O
N
D
JFM
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
OND
J
F
M
A
M
J
J
A
S
O
N
D
J
F
MA
M
J
J
A
SO
N
D
JF
M
A
M
J
J
A
S
O
ND
J
F
M
A
M
JJ
A
S
O
N
D
J
F
MA
M
J
JA
S
O
N
D
JF
M
A
MJJ
A
S
O
N
D
JF
M
A
M
JJ
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
JF
M
A
M
J
J
A
SO
N
D
JF
MA
M
J
JA
S
O
N
D
J
F
M
A
MJ
J
A
S
O
N
D
J
F
M
A
MJJ
A
S
O
N
D
#b
fit1 <- lm(beersales ~ month)
summary(fit1)
Call:
10
lm(formula = beersales ~ month)
Residuals:
Min 1Q Median 3Q Max
-3.5745 -0.4772 0.1759 0.7312 2.1023
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 12.48568 0.26392 47.309 < 2e-16 *** monthFebruary -0.14259 0.37324 -0.382 0.702879 monthMarch 2.08219 0.37324 5.579 8.77e-08 *** monthApril 2.39760 0.37324 6.424 1.15e-09 *** monthMay 3.59896 0.37324 9.643 < 2e-16 *** monthJune 3.84976 0.37324 10.314 < 2e-16 *** monthJuly 3.76866 0.37324 10.097 < 2e-16 *** monthAugust 3.60877 0.37324 9.669 < 2e-16 *** monthSeptember 1.57282 0.37324 4.214 3.96e-05 *** monthOctober 1.25444 0.37324 3.361 0.000948 *** monthNovember -0.04797 0.37324 -0.129 0.897881 monthDecember -0.42309 0.37324 -1.134 0.258487 --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 1.056 on 180 degrees of freedom Multiple R-squared: 0.7103, Adjusted R-squared: 0.6926 F-statistic: 40.12 on 11 and 180 DF, p-value: < 2.2e-16 resids1 <- rstudent(fit1) #c plot(y=resids1, x=as.vector(time(beersales)), type='l') points(y=resids1, x=as.vector(time(beersales)), pch=as.vector(season(beersales)) ,cex=.75) 11 1975 1980 1985 1990 − 3 − 2 − 1 0 1 2 as.vector(time(beersales)) re si d s1 J F M A M J J A S ON D J F M A M J JA S O N D J F M A M J JA S O N D J F M A M J J A S OND J F M A M J J A S O N D J F MA M J J A SO N D J F MA MJ J A S O N D J F M A M J J A S O N DJ F M AMJJ A S O N DJ F M A MJ J A S O N D J F M A M J J A S O ND J F M AM J J A S O N D J F M A M J J A S O ND J F M AM J JA S O N D J FM A MJ J A SO N D J F M A M J J A S ON D #d fit2 <- lm(beersales ~ month + t + t.squared) summary(fit2) Call: lm(formula = beersales ~ month + t + t.squared) Residuals: Min 1Q Median 3Q Max -2.03203 -0.43118 0.04977 0.34509 1.57572 Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) -7.150e+04 8.791e+03 -8.133 6.93e-14 ***
monthFebruary -1.579e-01 2.090e-01 -0.755 0.45099
monthMarch 2.052e+00 2.090e-01 9.818 < 2e-16 ***
monthApril 2.353e+00 2.090e-01 11.256 < 2e-16 ***
monthMay 3.539e+00 2.090e-01 16.934 < 2e-16 ***
monthJune 3.776e+00 2.090e-01 18.065 < 2e-16 ***
monthJuly 3.681e+00 2.090e-01 17.608 < 2e-16 ***
monthAugust 3.507e+00 2.091e-01 16.776 < 2e-16 ***
monthSeptember 1.458e+00 2.091e-01 6.972 5.89e-11 ***
monthOctober 1.126e+00 2.091e-01 5.385 2.27e-07 ***
monthNovember -1.894e-01 2.091e-01 -0.906 0.36622
monthDecember -5.773e-01 2.092e-01 -2.760 0.00638 **
12
t 7.196e+01 8.867e+00 8.115 7.70e-14 ***
t.squared -1.810e-02 2.236e-03 -8.096 8.63e-14 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5911 on 178 degrees of freedom
Multiple R-squared: 0.9102, Adjusted R-squared: 0.9036
F-statistic: 138.8 on 13 and 178 DF, p-value: < 2.2e-16
resids2 <- rstudent(fit2)
#e
plot(y=resids2, x=as.vector(time(beersales)), type='l')
points(y=resids2, x=as.vector(time(beersales)), pch=as.vector(season(beersales)), cex=.75 )
1975 1980 1985 1990
−
3
−
2
−
1
0
1
2
3
as.vector(time(beersales))
re
si
d
s2
J
F
M
A
M
J
J
A
S
ON
D
J
F
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
JA
S
O
N
D
J
F
M
A
M
J
J
A
S
ON
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
MA
M
J
J
A
SO
N
D
J
F
M
A
MJ
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
AMJJ
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
J
F
M
AM
J
JA
S
O
N
D
J
FM
A
MJ
J
A
S
O
N
D
J
F
M
A
M
J
J
A
S
O
N
D
#f
acf(resids2)
13
5 10 15 20
−
0
.1
0
.0
0
.1
0
.2
0
.3
Series resids2
Lag
A
C
F
8. Problem 3.8 from text - The dat afile retail in the TSA package lists total U.K. retail sales (in billions
of pounds) from January 1986 through March 2007.
a. Display and interpret the time series plot for these data. Be sure to use plotting symbols that
permit you to look for seasonality.
b. Use least squares to fit a season-means plus linear tiem trend to this time series. Interpret the
regression output. Save the standardized residuals for future analysis.
c. Construct and interpret the time series plot of the standardized residuals from part b.
data(retail)
#a
plot(retail)
points(y=retail, x=time(retail), pch=as.vector(season(retail)), cex=.75)
14
Time
S
a
le
s
1990 1995 2000 2005
5
0
1
0
0
1
5
0
JFM
AMJJA
SO
N
D
JFM
AMJ
JASO
N
D
J
FM
AMJ
JAS
O
N
D
J
FMA
MJJAS
O
N
D
JFM
AMJ
JASO
N
D
J
F
MAMJ
JAS
O
N
D
JFM
AMJJAS
O
N
D
JFM
AMJJAS
O
N
D
J
F
MAMJ
JAS
O
N
D
JFM
AMJ
J
AS
O
N
D
JFM
AMJ
JAS
O
N
D
JF
MAM
JJAS
O
N
D
JFM
AMJ
J
AS
O
N
D
JFM
AMJ
JAS
O
N
D
J
FM
AMJ
JAS
O
N
D
JFM
AMJ
JAS
O
N
D
JF
MAMJ
J
AS
O
N
D
JFM
AMJ
J
AS
O
N
D
JF
M
AMJJAS
O
N
D
JF
MAMJ
JAS
O
N
D
JF
M
AMJJAS
O
N
D
JF
M
#b
t <- time(retail)
t.squared <- t^2
month <- season(retail)
fit <- lm(retail ~ t + t.squared + month)
summary(fit)
Call:
lm(formula = retail ~ t + t.squared + month)
Residuals:
Min 1Q Median 3Q Max
-21.023 -2.067 -0.014 1.960 14.811
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 7.210e+04 3.142e+04 2.295 0.02261 *
t -7.581e+01 3.147e+01 -2.409 0.01675 *
t.squared 1.990e-02 7.881e-03 2.526 0.01219 *
monthFebruary -3.015e+00 1.276e+00 -2.363 0.01892 *
monthMarch 7.469e-02 1.276e+00 0.059 0.95336
monthApril 3.516e+00 1.291e+00 2.723 0.00694 **
monthMay 3.178e+00 1.291e+00 2.461 0.01454 *
monthJune 3.144e+00 1.291e+00 2.435 0.01560 *
15
monthJuly 6.125e+00 1.291e+00 4.744 3.60e-06 ***
monthAugust 3.210e+00 1.291e+00 2.486 0.01361 *
monthSeptember 3.499e+00 1.291e+00 2.710 0.00722 **
monthOctober 8.626e+00 1.291e+00 6.680 1.63e-10 ***
monthNovember 2.089e+01 1.291e+00 16.175 < 2e-16 ***
monthDecember 5.261e+01 1.291e+00 40.744 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.231 on 241 degrees of freedom
Multiple R-squared: 0.9773, Adjusted R-squared: 0.9761
F-statistic: 797.8 on 13 and 241 DF, p-value: < 2.2e-16
resids <- rstudent(fit)
#c
plot(y=resids, x=as.vector(time(retail)), type='l')
points(y=resids, x=as.vector(time(retail)), pch=as.vector(season(retail)), cex=.75)
1990 1995 2000 2005
−
4
−
2
0
2
4
as.vector(time(retail))
re
si
d
s
JFM
AM
J
J
AS
O
N
D
JF
MA
M
JJ
AS
O
N
D
J
F
M
AMJ
J
AS
O
N
D
J
FM
A
M
JJ
AS
O
N
D
JF
M
A
M
JJ
AS
O
N
D
J
F
M
AMJ
JAS
O
N
D
JF
MAMJ
J
AS
O
N
D
JFMA
M
J
J
ASO
N
D
J
FM
AMJ
J
A
S
O
N
DJFMAMJJA
S
O
N
D
J
FM
A
M
J
JA
SO
N
D
J
F
M
AM
JJ
AS
ON
D
JFMA
M
J
J
ASO
ND
J
FMA
MJ
JA
S
O
NDJ
FMAMJ
JA
S
ON
D
JF
M
A
M
J
JASO
N
D
J
F
MAM
J
J
AS
O
N
D
J
FM
A
MJ
J
A
S
O
N
D
JFM
AMJJ
AS
O
N
D
JF
M
A
M
J
J
AS
O
N
D
JFM
AMJJA
S
O
N
D
J
F
M
16
Chapter 1 Problems
Chapter 3 Problems