SS9850B Assignment #2 (Due April 3 2020)
Please carefully read the following instructions:
• There are 3 questions in this assignment. Each subquestion is worth 4 marks.
• Show all your work in detail and provide the code used for your computations. Also, clearly summarize your numerical results for each question.
• This assignment is INDIVIDUAL work. Plagiarism will earn a mark of ZERO.
• ONLY R functions/packages mentioned in this course are allowed. Using functions/packages that are not used in this course will lose marks.
• Submit your solutions to the Drop Box on the course site. Late submissions will lose marks.
Instructor: L.-P. Chen
1. (Wine Data Set) These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents (Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline) found in each of the three types of wines. The sample size is 178. The dataset is available on the course site. The main interest of this dataset is the multiclass classification of the three types of wines. Let $\hat{y}$ denote the predicted class of the observations.
(a) Instead of estimating $\Sigma$ (or $\Sigma_i$) empirically and then inverting the estimate, here we use glasso to estimate $\Theta = \Sigma^{-1}$ (or $\Theta_i = \Sigma_i^{-1}$) directly. Substituting the estimator $\hat{\Theta}$ (or $\hat{\Theta}_i$) into the linear discriminant function (or quadratic discriminant function) gives graphical-based linear discriminant analysis (or quadratic discriminant analysis). In this question, use graphical-based linear discriminant analysis and quadratic discriminant analysis to obtain $\hat{y}$. In addition, summarize the confusion table for $y$ and $\hat{y}$, use macro-averaged metrics to evaluate recall, precision, and F-measure, and then assess the classification performance.
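The macro-averaged metrics requested in this question can be sketched in base R as follows. This is only an illustration under stated assumptions, not the course's reference code: the helper name `macro_metrics` and the toy label vectors are hypothetical, each metric is computed per class and then averaged with equal weight, and the F-measure is taken as the mean of the per-class F1 scores (the course notes may define it differently).

```r
# Hypothetical helper: macro-averaged recall, precision and F-measure
# from true labels y and predicted labels yhat (not from the course notes).
macro_metrics <- function(y, yhat) {
  classes <- sort(unique(y))
  # Confusion table: rows = true class, columns = predicted class
  tab <- table(factor(y, levels = classes), factor(yhat, levels = classes))
  tp <- diag(tab)
  recall <- tp / rowSums(tab)       # per-class TP / (TP + FN)
  precision <- tp / colSums(tab)    # per-class TP / (TP + FP)
  f1 <- 2 * precision * recall / (precision + recall)
  list(table = tab,
       recall = mean(recall),
       precision = mean(precision),
       F = mean(f1))
}

# Toy labels for illustration only (not the wine data)
y    <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
yhat <- c(1, 1, 2, 2, 2, 2, 3, 3, 1)
res <- macro_metrics(y, yhat)
res$table
```

If a class is never predicted, its precision is undefined (0/0) in this sketch, so that case would need separate handling.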
(b) Use the support vector machine method to obtain $\hat{y}$. In addition, summarize the confusion table for $y$ and $\hat{y}$, use macro-averaged metrics to evaluate recall, precision, and F-measure, and then assess the classification performance.
(c) Suppose that we only consider the predictor Proline. Use the nonparametric density estimation method in Section 4.1 to carry out the multiclass classification. For the kernel function, examine the Gaussian kernel and the biweight kernel; for bandwidth selection, use the cross-validation (CV) method.
After obtaining $\hat{y}$, summarize the confusion table for $y$ and $\hat{y}$, use macro-averaged metrics to evaluate recall, precision, and F-measure, and then assess the classification performance.
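## Hint: R code for the CV function (X is the numeric data vector):
J = function(h) {
  # kernel density estimate at a single point x, using all of X
  fhat = Vectorize(function(x) density(X, from = x, to = x, n = 1, bw = h)$y)
  # leave-one-out estimate at X[i], computed without the i-th observation
  fhati = Vectorize(function(i) density(X[-i], from = X[i], to = X[i], n = 1, bw = h)$y)
  F = fhati(1:length(X))
  # CV criterion: integral of fhat^2 minus twice the mean leave-one-out density
  return(integrate(function(x) fhat(x)^2, -Inf, Inf)$value - 2 * mean(F))
}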
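One possible way to turn the CV criterion into a bandwidth choice is sketched below. It is an assumption-laden variant rather than the hint itself: the function `J_grid`, its `kernel` argument, the toy data, and the search interval are all hypothetical, and the integral of the squared estimate is approximated by a Riemann sum on a grid instead of `integrate()`. The criterion being minimized is the same one, the integral of the squared density estimate minus twice the mean leave-one-out density.

```r
set.seed(1)
X <- rnorm(50)   # toy data standing in for one class's Proline values

# Grid-based variant of the CV criterion (assumed, not the course's code)
J_grid <- function(h, X, kernel = "gaussian") {
  # Density on a fine grid; the squared estimate is integrated by a Riemann sum
  d <- density(X, bw = h, kernel = kernel, n = 2048, cut = 4)
  int_f2 <- sum(d$y^2) * diff(d$x[1:2])
  # Leave-one-out density at each X[i], computed without observation i
  loo <- sapply(seq_along(X), function(i)
    density(X[-i], bw = h, kernel = kernel,
            from = X[i], to = X[i], n = 1)$y)
  int_f2 - 2 * mean(loo)
}

# Minimise the criterion over a hand-picked interval, once per kernel
h_gauss <- optimize(J_grid, interval = c(0.05, 2), X = X)$minimum
h_biw   <- optimize(J_grid, interval = c(0.05, 2), X = X,
                    kernel = "biweight")$minimum
c(gaussian = h_gauss, biweight = h_biw)
```

Because `optimize()` only finds a local minimum, plotting the criterion over a grid of bandwidths is a reasonable sanity check before trusting the result.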
(d) Summarize your findings in (a)-(c).
2. (Simulation studies) Consider the following linear model:
$$y = X_1\beta_1 + X_2\beta_2 + X_3\beta_3 + X_5\beta_5 - 5\sqrt{\rho}\,X_6\beta_6 + X_7\beta_7 + \varepsilon, \tag{1}$$
where $X = (X_1, \dots, X_p)$ is a $p$-dimensional vector of covariates and each $X_k$ is generated from $N(0,1)$. The correlation between any two covariates other than $X_6$ is $\rho$, while $X_6$ has correlation $\sqrt{\rho}$ with each of the other $p - 1$ covariates. Suppose that the sample size is $n = 200$.
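One way to generate covariates with this correlation structure is sketched below. The shared-factor construction, the standard normal error, and the values of `p` and `rho` (borrowed from part (b)) are assumptions made for illustration; the assignment does not prescribe how the data should be simulated.

```r
set.seed(2020)
n <- 200; p <- 1000; rho <- 0.8   # p and rho borrowed from part (b)

# Assumed construction: a shared factor Z0 induces the correlations.
# X6 = Z0, and X_k = sqrt(rho) * Z0 + sqrt(1 - rho) * Z_k for k != 6,
# so Var(X_k) = 1, Cor(X_j, X_k) = rho and Cor(X_k, X6) = sqrt(rho).
Z0 <- rnorm(n)
Z  <- matrix(rnorm(n * p), n, p)
X  <- sqrt(rho) * Z0 + sqrt(1 - rho) * Z   # Z0 is recycled down each column
X[, 6] <- Z0

# Response from model (1) with all coefficients equal to 1;
# the N(0, 1) error is an assumption
y <- X[, 1] + X[, 2] + X[, 3] + X[, 5] -
  5 * sqrt(rho) * X[, 6] + X[, 7] + rnorm(n)

# Empirical check of the design
round(c(cor_12 = cor(X[, 1], X[, 2]),
        cor_16 = cor(X[, 1], X[, 6]),
        cor_6y = cor(X[, 6], y)), 2)
```

The sample correlations should sit near $\rho$, $\sqrt{\rho}$, and 0 respectively, up to sampling noise at $n = 200$.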
(a) Show that X6 is marginally independent of y.
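A possible starting point for (a) is the covariance calculation below. It is a sketch under assumptions that the assignment does not state explicitly: $\varepsilon$ has mean zero and is independent of $X$, the coefficients all equal 1 as in part (b), and $(X, \varepsilon)$ is jointly normal, so that zero covariance implies independence.

```latex
\begin{align*}
\operatorname{Cov}(X_6, y)
 &= \sum_{k \in \{1,2,3,5,7\}} \beta_k \operatorname{Cov}(X_6, X_k)
    - 5\sqrt{\rho}\,\beta_6 \operatorname{Var}(X_6)
    + \operatorname{Cov}(X_6, \varepsilon) \\
 &= \sqrt{\rho}\,(\beta_1 + \beta_2 + \beta_3 + \beta_5 + \beta_7)
    - 5\sqrt{\rho}\,\beta_6 \\
 &= 5\sqrt{\rho} - 5\sqrt{\rho} = 0
    \quad \text{when } \beta_1 = \beta_2 = \beta_3 = \beta_5 = \beta_6 = \beta_7 = 1 .
\end{align*}
```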
(b) Now, consider p = 1000 and generate the artificial data based on model (1) for 1000 repetitions. Specifically, let βi = 1 for every i = 1, · · · , 7 and set ρ = 0.8. After that, use the SIS and iterated SIS methods to do variable selection and estimate the parameters associated with selected covariates. Finally, summarize the estimator in the following table:
Table 1: Simulation results for (b)
Method          ∥∆β∥1    ∥∆β∥2    #S    #FN
SIS
Iterated SIS
(c) Here we consider a scenario different from (b). Let $p = 50$ and $X \sim N(0, \Sigma_X)$, with entry $(j, k)$ of $\Sigma_X$ being $0.6^{|j-k|}$ for $j, k = 1, \dots, p$. We generate the artificial data based on (1) for 1000 repetitions with $\beta_i = 1$ for every $i = 1, \dots, 7$. After that, use the lasso, adaptive lasso, and elastic net (set $\alpha = 0.5$) methods to estimate the parameters. Finally, summarize the numerical results in the following table.
Table 2: Simulation results for (c)
Method                    ∥∆β∥1    ∥∆β∥2    #S    #FN
lasso
adaptive lasso
Elastic net (α = 0.5)
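The covariance structure in (c) can be built and sampled from with base R alone, as in the sketch below. The use of a Cholesky factor is one common choice and an assumption here, not something the assignment requires; the seed is arbitrary.

```r
set.seed(1)
n <- 200; p <- 50

# Covariance with entry (j, k) equal to 0.6^|j - k|
Sigma <- 0.6^abs(outer(1:p, 1:p, "-"))

# If Z has i.i.d. N(0, 1) entries and R'R = Sigma (R from chol()),
# then the rows of Z %*% R are N(0, Sigma)
R <- chol(Sigma)
X <- matrix(rnorm(n * p), n, p) %*% R

round(Sigma[1:3, 1:3], 2)
```

The response can then be generated from model (1) using the columns of `X`, and the penalized fits applied to each simulated data set.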
(d) Summarize your findings for parts (b) and (c), respectively.
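Note: Let $\hat{\beta}$ be the estimator. Then $\Delta\beta$ is defined as $\Delta\beta = \hat{\beta} - \beta$, with the $i$th component being $\hat{\beta}_i - \beta_i$. Therefore, $\|\Delta\beta\|_1$ and $\|\Delta\beta\|_2$ are defined as
• $\|\Delta\beta\|_1 = \sum_{i=1}^{p} |\hat{\beta}_i - \beta_i|$;
• $\|\Delta\beta\|_2 = \sqrt{\sum_{i=1}^{p} (\hat{\beta}_i - \beta_i)^2}$.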
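The four table columns can be computed from an estimate and the true coefficient vector roughly as follows. The helper `sim_metrics` and the toy vectors are hypothetical. The sketch assumes the Euclidean norm for ∥∆β∥2, and it assumes that #S counts the selected (nonzero) coefficients and #FN counts true nonzero coefficients estimated as zero; the assignment does not define these two columns, so check them against the course notes.

```r
# Hypothetical helper; #S and #FN follow the assumed definitions above
sim_metrics <- function(beta_hat, beta_true) {
  delta <- beta_hat - beta_true
  c(L1 = sum(abs(delta)),                        # l1 norm of the error
    L2 = sqrt(sum(delta^2)),                     # Euclidean norm of the error
    S  = sum(beta_hat != 0),                     # selected covariates
    FN = sum(beta_hat == 0 & beta_true != 0))    # missed true signals
}

# Toy vectors for illustration only
beta_true <- c(1, 1, 0, 0, 1)
beta_hat  <- c(0.9, 0, 0.2, 0, 1.1)
out <- sim_metrics(beta_hat, beta_true)
out
```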
3. In class, we discussed that the estimator of $\Theta$ can be obtained by solving $\Theta^{-1} - S - \lambda\Psi = 0$,
where $S$ and $\Psi$ are defined in the note.
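(a) Let $W$ denote the working version of $\Theta^{-1}$ such that $W\Theta = I$, where $I$ is the identity matrix. Show that
$$W_{11}\beta - s_{12} - \lambda\psi_{12} = 0, \tag{2}$$
where $\beta = -\theta_{12}/\theta_{22}$, and $\theta_{12}$, $\theta_{22}$, $s_{12}$, and $\psi_{12}$ are defined in the note.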
(b) Suppose that $\beta$ is obtained by solving (2). Show that $\theta_{12} = -\beta\,\theta_{22}$ and $\theta_{22} = (w_{22} - w_{12}^{\top} W_{11}^{-1} w_{12})^{-1}$. Also, interpret the meaning of $\theta_{12}$.
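For orientation, the block notation used in this question is presumably the usual graphical-lasso partition, in which the last row and column are separated from the rest. The layout below is an assumption (the note may order the blocks differently); it only writes out what the last column of $W\Theta = I$ says, which is the natural starting point for both parts.

```latex
W = \begin{pmatrix} W_{11} & w_{12} \\ w_{12}^{\top} & w_{22} \end{pmatrix},
\qquad
\Theta = \begin{pmatrix} \Theta_{11} & \theta_{12} \\ \theta_{12}^{\top} & \theta_{22} \end{pmatrix},
\qquad
W\Theta = I \;\Longrightarrow\;
\begin{cases}
W_{11}\theta_{12} + w_{12}\theta_{22} = 0, \\
w_{12}^{\top}\theta_{12} + w_{22}\theta_{22} = 1 .
\end{cases}
```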
Hint: Regarding simulation studies with 1000 repetitions.
In Question 2, you are asked to use simulation studies with 1000 repetitions to estimate the parameters. Specifically, from the $k$th independently generated artificial data set, you obtain an estimator, denoted by $\hat{\beta}^{(k)}$. With 1000 repetitions, the final estimator is then given by $\hat{\beta} = \frac{1}{1000} \sum_{k=1}^{1000} \hat{\beta}^{(k)}$.
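The averaging step can be organized as in the sketch below. Everything specific in it is an assumption for illustration: the toy three-coefficient model, the function `one_rep`, and ordinary least squares standing in for the selection or penalized methods named in Question 2.

```r
set.seed(1)
n_rep <- 1000; n <- 200
beta_true <- c(1, 1, 1)   # toy coefficients, not the model in Question 2

# One repetition: generate data, fit, return the coefficient estimates
one_rep <- function() {
  X <- matrix(rnorm(n * length(beta_true)), n)
  y <- drop(X %*% beta_true) + rnorm(n)
  coef(lm(y ~ X - 1))     # least squares stands in for SIS, lasso, etc.
}

est <- replicate(n_rep, one_rep())   # one column of estimates per repetition
beta_bar <- rowMeans(est)            # average over the repetitions
beta_bar
```

Keeping the full matrix `est` rather than a running mean also makes it easy to average other per-repetition summaries afterwards.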