01:960:486 COMPUTING AND GRAPHICS IN APPLIED STATISTICS Final Examination December 15, 2020
Please submit your answers into Canvas today by 10:45 pm EST. I will be online for emergency situations (for students who cannot access the test, cannot submit the test, etc.). Assume I am proctoring the test and I have no knowledge regarding the questions on the test. Just state any assumption(s) you make if you have difficulty understanding a question and/or parts of a question. You do not need to carry through calculations.
There are 11 questions on 3 pages. Based on experience gained from administering other course exams, the strongly suggested method for completing the exam is to write your answers on your own paper and take pictures or scans of the pages to submit into Canvas. You only need to submit pages with answers. Just be sure to clearly label your answers. Also, do not wait until 10:45 pm to assemble your answers; Canvas will stop accepting submissions promptly at 10:45 pm. Partial credit is given, so be sure to answer all questions. Do your best; exams are graded on a curve (if needed).
Reminder: The test is open book, open notes, open online resources. You are to work alone. The Rutgers Honor Pledge is in effect.
1. Rutgers Honor Pledge is in effect for this Final Examination. Please include this statement at the beginning of your answer sheet: “On my honor, I have neither received nor given any unauthorized assistance on this final examination.”
2. (10 pts) A linear spline smoothing function with three knots is fitted to a data set with one response variable and one explanatory variable. Indicate whether each of the following statements is True or False. Briefly justify your answer.
a) The estimated smoothing function minimizes SSE among all linear functions fit through all the data.
b) The estimated smoothing function minimizes SSE among all cubic functions fit through all the data.
3. (10 pts) Suppose you have five observations on 2 predictors x1 and x2 and a response y: (xi1, xi2, yi), where i = 1, 2, 3, 4, 5, and the following R statements:
nn=length(y)
yy=y[2:nn]
xx1=x1[1:nn-1]
xx2=x2[1:nn-1]
lmod=lm(yy~xx1+xx2)
Write the y-vector and the X-matrix for the model being estimated.
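To see what these statements build, the following minimal sketch runs them on made-up data. The numeric values of x1, x2, and y are hypothetical and chosen only for illustration; the five statements themselves are taken from the question. Note that R parses `1:nn-1` as `(1:nn)-1`, because `:` binds more tightly than the binary minus.

```r
# Hypothetical data: the values below are illustrative only.
x1 <- c(2, 4, 6, 8, 10)
x2 <- c(1, 3, 2, 5, 4)
y  <- c(5, 7, 9, 6, 8)

nn  <- length(y)
yy  <- y[2:nn]        # response: y_2, ..., y_5
xx1 <- x1[1:nn-1]     # (1:nn)-1 = 0:4; index 0 is dropped, leaving x1[1:4]
xx2 <- x2[1:nn-1]     # same indexing, leaving x2[1:4]
lmod <- lm(yy ~ xx1 + xx2)

X <- model.matrix(lmod)   # columns: (Intercept), xx1, xx2
print(yy)
print(X)
```

The printed response and design matrix show each response value lined up with the predictor values from the previous observation.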
4. (10 pts) Suppose you have a time series of closing stock market prices for company A over each trading day in a year. You decide to use loess as a smoother but you are not sure what span to use and what degree polynomial you would use for the local smoothing. You are requested to use a polynomial of degree 1, 2, 3, or 4 together with a span that minimizes SSE. Briefly describe the next steps you would take to select an optimal span and an optimal degree polynomial in the loess smoothing.
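One possible approach is a grid search, sketched below under several assumptions. The prices are simulated, the candidate spans are arbitrary, and held-out SSE from 5-fold cross-validation is used in place of in-sample SSE, since in-sample SSE tends to favor the smallest span. Base R's `loess` accepts only degree 0, 1, or 2, so degrees 3 and 4 would need a different local-regression tool and are left out here.

```r
set.seed(486)
n     <- 250
day   <- 1:n
price <- 100 + cumsum(rnorm(n))   # hypothetical random-walk prices

spans   <- c(0.10, 0.25, 0.50, 0.75)   # assumed candidate spans
degrees <- 1:2                          # stats::loess allows degree 0, 1 or 2
fold    <- sample(rep(1:5, length.out = n))

# Held-out SSE for one (span, degree) pair, summed over the 5 folds
cv_sse <- function(span, degree) {
  sse <- 0
  for (k in 1:5) {
    train <- data.frame(day = day[fold != k], price = price[fold != k])
    test  <- data.frame(day = day[fold == k], price = price[fold == k])
    # surface = "direct" lets predict() work at days outside the training range
    fit <- loess(price ~ day, data = train, span = span, degree = degree,
                 control = loess.control(surface = "direct"))
    sse <- sse + sum((test$price - predict(fit, newdata = test))^2)
  }
  sse
}

grid <- expand.grid(span = spans, degree = degrees)
grid$cv_sse <- mapply(cv_sse, grid$span, grid$degree)
best <- grid[which.min(grid$cv_sse), ]
print(grid)
print(best)
```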
5. (10 pts) You are presented with a time series of closing stock market prices for company B. For this analysis you are asked to perform loess smoothing and to use either a span of 0.25 or a span of 0.50.
a) Which choice of span would most likely lead to a larger SSE? Explain your answer.
b) Which choice of span would most likely lead to a larger bias? Explain your answer.
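The two spans can be compared empirically, as in the minimal sketch below. The price series is simulated and purely hypothetical, and the local polynomial degree of 2 is an assumption, since the question does not specify one.

```r
set.seed(2020)
day   <- 1:250
price <- 50 + cumsum(rnorm(250))   # hypothetical prices for company B

# In-sample SSE of a loess fit with the given span
sse_for_span <- function(s) {
  fit <- loess(price ~ day, span = s, degree = 2)   # degree = 2 is assumed
  sum(residuals(fit)^2)
}

sse <- sapply(c(0.25, 0.50), sse_for_span)
names(sse) <- c("span_0.25", "span_0.50")
print(sse)
```

A wider span averages over more neighboring days, so the fitted curve is smoother and follows local movements less closely.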
6. (10 pts) The scree method is a popular method for determining the number of Principal Components to use in a Principal Component Regression.
Based on the following eigenvalues: 183.0 18.6 12.7 5.8 1.9 0.1
that resulted from performing Principal Component Analysis on six features, can you suggest another method to determine the number of Principal Components to use in a Principal Component Regression?
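One common alternative is to look at the cumulative proportion of variance explained. The sketch below computes it for the eigenvalues given in the question; the 90% threshold is an assumption made for illustration and is not part of the question.

```r
eig <- c(183.0, 18.6, 12.7, 5.8, 1.9, 0.1)   # eigenvalues from the question

pve     <- eig / sum(eig)   # proportion of variance explained by each component
cum_pve <- cumsum(pve)      # cumulative proportion
print(round(cum_pve, 3))

threshold <- 0.90           # assumed cutoff, not given in the question
k <- which(cum_pve >= threshold)[1]
print(k)                    # 2: the first two components explain about 90.8%
```

Cross-validating the prediction error of the regression over the number of components is another option.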

7. (10 pts) State whether you agree or disagree with the following statement. Explain your reasoning.
The term ‘curse of dimensionality’ reflects the fact that nonparametric regression models would require a large sample size to estimate the underlying nonlinear functions accurately when the number of explanatory variables is large.
8. (10 pts) The area under the ROC curve for a Principal Components logistic regression model was computed to be equal to 0.5 when 0.3 was used as the cutoff for classifying an event.
a) Does a value of 0.5 for the area under the ROC curve indicate a reasonable model for prediction? Explain.
b) What cutoff value leads to a sensitivity of 1.0?
9. (10 pts) In this clustering exercise p (# features) = 2 and the 8 data points (x = feature 1, y = feature 2) defined by x=c(1,1,7,4,6,5,0,3), y=c(9,4,3,7,4,3,1,8) are plotted below. Clustering with k=3 and Manhattan distance was selected. The initial cluster assignment was made and the centroids of the 3 clusters were computed to be (1, 9), (4, 7), and (0, 1).
a) Show the clustering results for the next iteration and their members and compute the new centroids.
b) Use the five data points (1,9), (1,4), (7,3), (6,4), (3,8) from Question 9a and Manhattan distance to perform “bottom-up” hierarchical clustering. Show each step of the clustering. Specify the final clusters and their members.
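Both parts can be checked numerically, as in the minimal sketch below. Two choices here are assumptions not stated in the question: new centroids are computed as coordinate-wise means, and the hierarchical clustering uses complete linkage (the default in `hclust`). The cut into two clusters at the end is also only illustrative.

```r
# Part a): one reassignment step with Manhattan distance
x <- c(1, 1, 7, 4, 6, 5, 0, 3)
y <- c(9, 4, 3, 7, 4, 3, 1, 8)
pts <- cbind(x, y)
centroids <- rbind(c(1, 9), c(4, 7), c(0, 1))

# 8 x 3 matrix of Manhattan distances from each point to each centroid
d <- apply(centroids, 1, function(ctr) rowSums(abs(sweep(pts, 2, ctr))))
cluster <- apply(d, 1, which.min)   # index of the nearest centroid
print(cluster)                      # 1 3 2 2 2 2 3 2

# New centroids as coordinate-wise means (assumed definition of centroid)
new_centroids <- t(sapply(1:3, function(k)
  colMeans(pts[cluster == k, , drop = FALSE])))
print(new_centroids)                # (1, 9), (5, 5), (0.5, 2.5)

# Part b): bottom-up clustering of the five listed points
pts5 <- rbind(c(1, 9), c(1, 4), c(7, 3), c(6, 4), c(3, 8))
hc <- hclust(dist(pts5, method = "manhattan"), method = "complete")
print(hc$merge)                     # negative entries are single points
print(hc$height)                    # merge distances: 2 3 6 12
print(cutree(hc, k = 2))            # illustrative cut into two clusters
```

With a different linkage (single or average), the merge heights and possibly the merge order would change.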

10. (10 pts) The figure below shows the classification tree rules for a binary response variable Y, based on the value of two explanatory variables (X1, X2). Draw its corresponding decision tree diagram.
11. (10 pts) Suppose you have data points X1, X2, ..., X1000 from a sample of experimental units from country A that represent observations on a continuous variable such as weight. Suppose you have data points X1*, X2*, ..., X1000* from a sample of experimental units from country B that represent observations on the same continuous variable.
The first stage of the analysis requires the forming of pairs of experimental units (X, X*) that minimize the sum of (X - X*)^2 over all 1000 paired units. Every unit from country A will be paired with a unit from country B and every unit from country B will be paired with a unit from country A. Calculus is not needed to answer the questions below.
a) Outline a greedy procedure you would use to form the pairs by minimizing (X - X*)^2 for each pairing.
b) Outline a procedure you would use to minimize the sum of (X - X*)^2 over all paired units.
c) Would the two procedures likely lead to the same pairing? Explain your answer.
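The two kinds of procedure can be compared on simulated data, as in the minimal sketch below. The weights are hypothetical, and the greedy rule shown (take country A units in their given order and match each to the nearest still-unmatched country B unit) is only one of several reasonable greedy rules. The second procedure pairs the sorted values in order, which minimizes the sum of squared differences in one dimension.

```r
set.seed(1)
a <- rnorm(1000, mean = 70, sd = 10)   # hypothetical weights, country A
b <- rnorm(1000, mean = 72, sd = 12)   # hypothetical weights, country B

# Greedy: each A unit, in order, takes the closest B unit not yet used
greedy_pairs <- function(a, b) {
  avail     <- rep(TRUE, length(b))
  match_idx <- integer(length(a))
  for (i in seq_along(a)) {
    d <- (a[i] - b)^2
    d[!avail] <- Inf          # exclude B units that are already paired
    j <- which.min(d)
    match_idx[i] <- j
    avail[j] <- FALSE
  }
  match_idx
}

g <- greedy_pairs(a, b)
sse_greedy <- sum((a - b[g])^2)

# Sorted matching: i-th smallest in A paired with i-th smallest in B
sse_sorted <- sum((sort(a) - sort(b))^2)

print(c(greedy = sse_greedy, sorted = sse_sorted))
```

Under the greedy rule, units processed late must take whatever country B units remain, which is why the two totals can differ.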
ENJOY YOUR HOLIDAY AND WINTER BREAK. TAKE CARE OF YOURSELF.