UNSW ECON2209 Assessment Problem Set 1: Solutions
At the start of an R session for this course, remember to type library(fpp3) in the RStudio Console. This will then load (most of) the R packages you will need, including some data sets.
• Total value: 10 marks.
• Submission is due on Friday of Week 3 (4 March), 5pm.
• A submission link is on the Moodle site under Assessments.
• Submit your answer document in PDF format. Your file name should follow this naming convention:
PS1_your first name_zID_your last name_ECON2209.pdf
For example: PS1_John_z1234567_Smith_ECON2209.pdf
• You get one opportunity to submit your file. Make sure that you submit the file that you intend to submit.
• Your submitted answers should include the R code that you used and any figures produced. Note that in the bottom right quadrant of RStudio, under the Plots tab there is an export button. This can be used to export figures for inserting into your answer document; e.g. select “Copy to Clipboard” and paste into a Word document. (Other methods are also possible.) You do not need to use Word. You could use e.g. R Markdown, in which case the figures would be automatically included in your PDF file.
• Problems are not all of equal value.
Problem 1 [3 marks]:
The USgas package contains data on the demand for natural gas in the US.
a. Install the USgas package: install.packages("USgas")
b. Create a tsibble from us_residential with date as the index and state as the key.
c. Plot (in one figure) the monthly residential natural gas consumption by state for Florida, Texas, and Minnesota from the start of 2010.
Submit your code, figure and any observations on the plotted series. You are expected to comment on any interesting aspects of each series and to compare them.
Solution 1
library(USgas)
us_tsibble_res <- us_residential %>%
as_tsibble(index=date, key=state)
us_tsibble_res %>%
  filter(state %in% c("Florida", "Texas", "Minnesota")) %>%
  filter(year(date) >= 2010) %>%
  autoplot(y/1e3) +
  labs(y = "billion cubic feet") + xlab("Month")
[Figure: monthly residential natural gas consumption (billions of cubic feet) for Florida, Texas and Minnesota, 2010 onwards.]
This solution puts gas into units of billions of cubic feet (by using ‘y/1e3’). This is not required, but it makes the figure a bit tidier than otherwise.
Any legitimate comments on the data series are fine, but can include:
• Pronounced seasonality, especially noticeable in Texas and Minnesota, less so in Florida. Demand goes up in the winter months, which makes sense if natural gas is being used for heating.
• Florida has the smallest consumption of natural gas for residential use, possibly because it is warmer than the other states.
• Texas is warmer than Minnesota in the winter yet has higher consumption. This could reflect that natural gas is a more readily available option for household heating in Texas than it is in Minnesota. It could also reflect that there are cheaper sources of energy than natural gas for heating in Minnesota, such as hydroelectric power. But perhaps it simply reflects that the population of Minnesota (5.75 million) is much smaller than the population of Texas (29.1 million). Texas has a population around five times the size of Minnesota's, yet its consumption of gas is not five times as large, so adjusting for population would show that per capita consumption is higher in Minnesota than in Texas (a sketch of this per capita adjustment is given after this list).
• Similar comparisons between the other states can also be made; e.g. Florida has a population of around 21.5 million, which is approximately 75% of the population of Texas, yet its natural gas consumption in the winter months is much less than 75% of Texas's. This is likely to be due to climate differences, as the lack of a discernible seasonal pattern in the series indicates that climate (i.e. heating demand) is not a big driver of natural gas consumption in Florida.
• Some data points seem to be missing near the end of the period covered by the data. (Notice how R still makes us a nice plot even with missing data.) Given their timing, the missing observations may be due to problems with data collection during the early period of the pandemic.
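The per capita point above can be illustrated with a short calculation. The following is a minimal sketch only, not part of the required answer: the population figures are the approximate values quoted in the bullets (hard-coded rather than taken from a data set), pop_tbl is a hypothetical helper table, and y is assumed to be in millions of cubic feet, as implied by the y/1e3 rescaling used earlier.
# Rough per capita comparison (illustrative sketch; populations are approximate
# and hard-coded, and the units of y are assumed to be millions of cubic feet).
pop_tbl <- tibble(
  state      = c("Florida", "Texas", "Minnesota"),
  population = c(21.5e6, 29.1e6, 5.75e6)
)
us_tsibble_res %>%
  filter(state %in% pop_tbl$state, year(date) >= 2010) %>%
  left_join(pop_tbl, by = "state") %>%
  mutate(per_capita = y * 1e6 / population) %>%   # cubic feet per person
  autoplot(per_capita) +
  labs(y = "cubic feet per person") + xlab("Month")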
Problem 2 [4 marks]:
For this problem you will explore features of a data series provided in us_employment: specifically, the number of people employed in the “Leisure and Hospitality: Food Services and Drinking Places” sector of the US economy.
Use the following graphics functions: autoplot(), gg_season(), gg_subseries(), gg_lag(), ACF().
• Can you spot any seasonality, cyclicity and trend?
• What do you learn about the series?
• What can you say about the seasonal patterns?
• Can you identify any unusual years?
Hint 1: Data are only available for this sector from January 1990, but for other sectors in the data set they are available for a longer period. Hence there are a lot of “NA” (“Not Available”) values in the data set for this sector. You probably do not want to plot these! You can use a filter command to drop observations before January 1990, but a simple way to exclude the NA values is to use the command drop_na():
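A minimal sketch of both options follows (not part of the marked answer; it assumes the Month index and Title column of us_employment, and the January 1990 cut-off mentioned in the hint):
# Option 1: drop the rows that contain missing values
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na()
# Option 2: keep only observations from January 1990 onwards
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  filter(Month >= yearmonth("1990 Jan"))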
Hint 2: For the lag plots, you may want to consider plotting up to 12 lags using gg_lag(Employed, geom = 'point', lags = 1:12). Explain why this may be of interest.
Solution 2
us_employment %>%
filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
drop_na() %>%
autoplot(Employed)
[Figure: time plot (autoplot) of monthly employment in Food Services and Drinking Places.]
There is a strong trend and seasonality. Some cyclic behaviour is seen, with a big drop due to the global financial crisis.
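As an optional sketch (not required by the question), zooming in on the years around the global financial crisis makes the dip easier to see; the 2007-2011 window below is an illustrative choice.
# Zoom in on the period around the global financial crisis (illustrative window).
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na() %>%
  filter(Month >= yearmonth("2007 Jan"), Month <= yearmonth("2011 Dec")) %>%
  autoplot(Employed)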
us_employment %>%
filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
drop_na() %>%
gg_season(Employed)
[Figure: seasonal plot (gg_season) of Employed.]
Employment seems highest around June and lowest in January in this sector, perhaps because fewer people go out to eat and drink following the Christmas-New Year period.
us_employment %>%
filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
drop_na() %>%
gg_subseries(Employed)
[Figure: subseries plot (gg_subseries) of Employed.]
This plot confirms that there is on average a peak in employment in this sector around the warmer months of June, July and August, with employment being on average lowest in January. The downward spike in the plots for each month across the years shows that the global financial crisis affected employment in all months.
us_employment %>%
filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
drop_na() %>%
gg_lag(Employed, geom = 'point', lags = 1:12)
[Figure: lag plots (gg_lag) of Employed against lag(Employed, n) for lags 1 to 12, series CEU7072200001, with points coloured by month.]
The dots do not deviate much from the 45-degree line for any lag length, because of the strong trend. But the relationship is tightest at lag 12, where the dots lie almost completely on the 45-degree line. This is because employment in January is likely to be most similar to employment 12 months earlier (i.e. January of the previous year), and the same holds for the other months. Hence our interest in looking at lags up to 12 to see whether this is true.
us_employment %>%
filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
drop_na() %>%
ACF(Employed) %>%
autoplot()
[Figure: ACF of Employed.]
In this plot, as with the previous one, the trend is so dominant that it is hard to see anything else. We need to remove the trend so we can explore the other features of the data.
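One way to do this (a sketch only, going beyond what the question requires) is to difference the series before computing the ACF; differencing removes most of the trend, making the seasonal pattern easier to see.
# ACF of the differenced series (sketch, not part of the required answer).
us_employment %>%
  filter(Title == "Leisure and Hospitality: Food Services and Drinking Places") %>%
  drop_na() %>%
  ACF(difference(Employed)) %>%
  autoplot()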
Problem 3 [3 marks]:
For the following series, find an appropriate Box-Cox transformation in order to stabilise the variance, if required. Report your observations on each series and the transformations you tried.
• Cement from aus_production
• Business class passengers between Melbourne and Sydney from ansett
Solution 3
Australian cement production (2 marks)
aus_production %>%
autoplot(Cement)
[Figure: time plot of quarterly Australian cement production (Cement).]
Variation in this series appears to change across different levels of the series. This suggests that a quite strong transformation may be appropriate. This can be seen if a strong transformation (such as log) is used.
aus_production %>%
autoplot(log(Cement))
[Figure: time plot of log(Cement).]
Guerrero’s method suggests that something around λ = −0.161 is appropriate. This is a very strong transformation, as it is close to λ = 0 (a log transformation).
aus_production %>%
  features(Cement, features = guerrero)
## # A tibble: 1 x 1
##   lambda_guerrero
##             <dbl>
## 1          -0.161
aus_production %>%
autoplot(box_cox(Cement, -0.161))
[Figure: time plot of box_cox(Cement, -0.161).]
This series appears very similar to the case when logs were taken. Either seems to do a good job of stabilising the variation in the series so that it is not changing with the level of the series.
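To see the similarity noted above side by side, the two transformed series could be plotted together. The following is a minimal sketch only, not part of the required answer; it stacks the two transformed series with pivot_longer and facets them with free vertical scales.
# Compare the Box-Cox and log transformations side by side (sketch only).
aus_production %>%
  transmute(Quarter,
            `box_cox(Cement, -0.161)` = box_cox(Cement, -0.161),
            `log(Cement)` = log(Cement)) %>%
  pivot_longer(-Quarter, names_to = "Transformation", values_to = "Value") %>%
  ggplot(aes(x = Quarter, y = Value)) +
  geom_line() +
  facet_grid(Transformation ~ ., scales = "free_y")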
Business class passengers between Melbourne and Sydney (1 mark)
ansett %>%
  filter(Airports == "MEL-SYD", Class == "Business") %>%
  autoplot(Passengers) +
  labs(title = "Business passengers", subtitle = "MEL-SYD")
[Figure: time plot of Business passengers, MEL-SYD.]
The variation in this series does not appear to change with the level of the series, so no transformation is required. There are some periods in this time series that may need further attention, such as the very large and temporary increase in business passengers in the first half of 1992, but these are probably better handled through modelling rather than through a transformation.
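As an optional cross-check (not part of the marked answer), Guerrero's method could be applied to this series in the same way as it was for Cement; a selected λ close to 1 would be consistent with no transformation being needed.
# Optional check: Guerrero's suggested lambda for the MEL-SYD Business series.
ansett %>%
  filter(Airports == "MEL-SYD", Class == "Business") %>%
  features(Passengers, features = guerrero)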