Homework Assignment #4

Due Date: 11/17/21

Notes: Assignments are due at the start of class. No late assignments will be accepted unless arranged in advance for an appropriate reason (medical, conference, etc.).

1. COVID-19 Data

Analyze the data at https://github.com/nytimes/covid-19-data, a series of data files from The New York Times with cumulative counts of coronavirus cases in the United States over time.

(1) Use an ARIMA model to analyze the national-level data (i.e., the us.csv file). Using the most up-to-date data, what is the order of the model? Visualize the data and report the fitted model.

(2) As in the first question, but now we want to test the forecasting performance of the ARIMA model. Hold out the last 1 day, the last 5 days, and the last 10 days of data, respectively, and use the remaining data to fit an ARIMA model (3 models are fitted in total). Then perform 1-step, 5-step, and 10-step forecasting with the corresponding fitted model. Report the forecasted values and the forecasting root mean squared errors.

(3) Describe your observations from the above analysis. Is ARIMA a good model for these data? If not, discuss how you could better model the data (there is no need to implement the approaches you discuss).
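The hold-out procedure in (2) can be sketched in base R roughly as follows. This is a minimal illustration under stated assumptions, not a solution: the series `y` is synthetic (a stand-in for the `cases` column of a local copy of us.csv), the helper names `rmse` and `holdout_eval` are hypothetical, and the order `c(1, 2, 0)` is chosen only because it matches how the synthetic series is generated. For the real data the order still has to be identified.

```r
# Hold-out forecast evaluation with base R's arima(); a sketch, not a solution.
# ASSUMPTION: `y` is a synthetic cumulative series standing in for the `cases`
# column of a local copy of us.csv (e.g. y <- read.csv("us.csv")$cases).
set.seed(1)
d2    <- as.numeric(arima.sim(list(ar = 0.5), n = 300))  # second differences
daily <- 1000 + cumsum(d2)                               # daily new counts
y     <- cumsum(daily)                                   # cumulative counts

# Root mean squared error between held-out values and forecasts.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Fit on all but the last h points, then forecast those h points.
# ASSUMPTION: order = c(1, 2, 0) matches the synthetic series only; for real
# data the order must be chosen, e.g. from ACF/PACF plots or by comparing AIC.
holdout_eval <- function(y, h, order = c(1, 2, 0)) {
  n     <- length(y)
  train <- y[seq_len(n - h)]
  test  <- y[(n - h + 1):n]
  fit   <- arima(train, order = order)
  pred  <- as.numeric(predict(fit, n.ahead = h)$pred)
  list(h = h, forecast = pred, rmse = rmse(test, pred))
}

# Three separate fits, one per hold-out length.
results <- lapply(c(1, 5, 10), function(h) holdout_eval(y, h))
for (r in results) cat(sprintf("h = %2d  RMSE = %.2f\n", r$h, r$rmse))
```

Each element of `results` keeps the forecasts alongside the RMSE, so both quantities requested in (2) can be reported from the same object. With a single held-out day the RMSE reduces to the absolute forecast error.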

2. Motor Cycle Data

Analyze the “motor cycle data” (use “library(MASS)”, then load “data(mcycle)”; the data are x=times, y=accel). Use smoothing splines to fit the data (see the help page for smooth.spline). Try different degrees of freedom (df) in [5, 20]. Find the optimal df in [5, 10] according to the cross-validation criterion (in the function “smooth.spline”, specify “cv=T”). What are the λ and the cross-validation error of the best fit?

Please answer the following questions:

(1) Plot the observation points and the optimal smoothing spline fit.

(2) Plot the observation points and the three smoothing splines with df=5, 10, 15 (three different colored curves). Add a legend identifying the curves.

(3) Plot the cross-validation errors against df from 5 to 20 in steps of 0.5 (show both points and lines). (Hint: from this plot you can find the optimal df.)

(4) Using the “wd” function in the “wavethresh” library, perform a wavelet analysis with any wavelet basis and resolution on the first 128 points of “accel”. Then soft-threshold the wavelet coefficients with the function “threshold”. Finally, reconstruct the signal from the thresholded coefficients with “wr”. Compare the reconstructed profile with the spline profile in (1).
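The workflow described in this problem might be organized along the following lines. This is a sketch under assumptions rather than a reference answer: the helper `fit_df`, the colors, and the wavelet settings (`filter.number = 4`, `family = "DaubExPhase"`, `policy = "universal"`) are illustrative picks, and the wavelet part runs only if the wavethresh package is installed. Note that a wavelet transform treats the 128 points as equally spaced, which is only an approximation for `times`.

```r
library(MASS)   # provides the mcycle data (times, accel)
data(mcycle)
x <- mcycle$times
y <- mcycle$accel

# Fit a smoothing spline with a fixed df, scored by leave-one-out CV.
# mcycle has tied x values, so smooth.spline warns that CV is doubtful;
# the warning is suppressed here only to keep the output short.
fit_df <- function(d) suppressWarnings(smooth.spline(x, y, df = d, cv = TRUE))

dfs    <- seq(5, 20, by = 0.5)
cv_err <- sapply(dfs, function(d) fit_df(d)$cv.crit)
best   <- fit_df(dfs[which.min(cv_err)])
print(c(df = best$df, lambda = best$lambda, cv = best$cv.crit))

# CV error against df, drawn with both points and lines.
plot(dfs, cv_err, type = "b", xlab = "df", ylab = "CV error")

# Observations with three fits of differing smoothness and a legend.
grid     <- seq(min(x), max(x), length.out = 200)
show_dfs <- c(5, 10, 15)
cols     <- c("red", "blue", "darkgreen")
plot(x, y, xlab = "times", ylab = "accel")
for (i in seq_along(show_dfs)) {
  lines(predict(fit_df(show_dfs[i]), grid), col = cols[i])
}
legend("bottomright", legend = paste("df =", show_dfs), col = cols, lty = 1)

# Wavelet denoising of the first 128 points (wd needs a power-of-two length).
# ASSUMPTION: the filter and threshold policy are arbitrary illustrative picks.
if (requireNamespace("wavethresh", quietly = TRUE)) {
  w   <- wavethresh::wd(y[1:128], filter.number = 4, family = "DaubExPhase")
  wt  <- wavethresh::threshold(w, type = "soft", policy = "universal")
  rec <- wavethresh::wr(wt)
  plot(x[1:128], y[1:128], xlab = "times", ylab = "accel")
  lines(x[1:128], rec, col = "purple")
}
```

Passing both `df` and `cv = TRUE` fixes the smoothness through `df` while still returning the leave-one-out score in `cv.crit`, which is what makes the df grid comparable.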

3. Functional Regression

An electrocardiogram (ECG) signal can reflect heart rate condition, and ECG sensors are often worn during labor-intensive tasks. In this problem, treat the ECG signals as functional data: extract relevant features, then use a classification method to discriminate between normal and abnormal ECGs.

We will use a public dataset available at the UCR Time Series Classification Archive (https://www.cs.ucr.edu/~eamonn/time_series_data_2018/). The training data set is “ECG200TRAIN” and the testing data set is “ECG200TEST”. In the data, each row is an observation. The first column is the class label, and the remaining columns are the measured ECG data.

Extract functional features of the ECG signals with B-splines, then classify the signals as normal or abnormal with logistic regression. Report the classification model and results.
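One possible shape for this pipeline is sketched below: project each curve onto a B-spline basis by least squares, use the coefficients as features, and fit a logistic regression. Everything specific here is an assumption made for illustration: the curves are simulated so the sketch runs offline, `simulate` and `features` are hypothetical helpers, and the basis size `K = 8` is arbitrary. With the real files, `curves` would be the matrix of all columns after the first and `label` the first column recoded to 0/1.

```r
library(splines)   # bs() builds the B-spline basis

# ASSUMPTION: synthetic curves stand in for the ECG200 files so the sketch
# runs offline; the two classes differ by a small bump near the midpoint.
set.seed(42)
grid <- seq(0, 1, length.out = 96)
bump <- exp(-(grid - 0.5)^2 / (2 * 0.07^2))
simulate <- function(n) {
  label  <- rbinom(n, 1, 0.5)
  curves <- t(sapply(label, function(l) {
    sin(2 * pi * grid) + 0.3 * l * bump + rnorm(length(grid), sd = 0.5)
  }))
  list(label = label, curves = curves)   # curves is n x 96, one row per signal
}
train <- simulate(100)
test  <- simulate(100)

# Project each curve onto K cubic B-spline basis functions by least squares;
# the K coefficients per curve are the extracted functional features.
K <- 8                                   # ASSUMPTION: arbitrary basis size
B <- bs(grid, df = K, intercept = TRUE)  # 96 x K basis matrix
features <- function(curves) {
  coefs <- t(solve(crossprod(B), crossprod(B, t(curves))))   # n x K
  colnames(coefs) <- paste0("b", seq_len(K))
  data.frame(coefs)
}

train_df <- data.frame(label = train$label, features(train$curves))
test_df  <- features(test$curves)

# Logistic regression on the spline coefficients, evaluated on the test set.
fit      <- glm(label ~ ., data = train_df, family = binomial)
prob     <- predict(fit, newdata = test_df, type = "response")
pred     <- as.integer(prob > 0.5)
accuracy <- mean(pred == test$label)
print(table(predicted = pred, actual = test$label))
cat("test accuracy:", accuracy, "\n")
```

The same basis matrix `B` is reused for training and test curves so that the coefficients are comparable features. `summary(fit)` gives the fitted classification model for reporting.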
