#########################################################
############# BTRY/STSCI 4520 ########################

############# Final Exam ########################
############# Due: May 20, 2018 ########################
#########################################################

# Instructions: save this file in the format _Final.R.
# Complete each question using code below the question number.
# You need only upload this file to CMS.

# Note, we assume your working directory contains any files
# that accompany this one.

# Further note: 10% will be deducted if your file produces
# an error when run. If your code produces an error and you
# cannot find it, comment out that portion of the code and we
# will give partial credit for it.

# Do not use either the function set.seed() or rm(list=ls())
# in your code.

#### IMPORTANT INSTRUCTIONS FOR THE FINAL

## The final is to be completed as though it were in-class.
## That means
## 1. You must complete this work independently, without
## collaboration or external assistance. Violations will
## be treated under the academic code.
## 2. We will not provide office hours. We will monitor
## Piazza and provide clarifications where questions are
## unclear.
## 3. While we will not, in general, debug your work for you,
## we are happy to try to explain error messages if you can
## isolate the line of code that produces them and, preferably,
## reproduce them with a few-line script.
## [Example: R tries to interpret dimensions, but doesn’t
## always get it right. So the code
## b = matrix(2,1,1)
## t(1:3)/b
## returns an error, but (1:3)/b does not. We think it reasonable
## to explain something like this to you if you get a message
## Error in t(1:3)/b : non-conformable arrays
## along with other instances of “I know where the error is, I just
## don’t know why R has a problem or how to fix it”]

################################################
# Question 1: Control Variates and Functionals #
################################################

# This question covers some recent ideas in Monte Carlo
# integration first published just two years ago in
# https://rss.onlinelibrary.wiley.com/doi/abs/10.1111/rssb.12185

# Recall that if we are interested in estimating
# mu = E g(X) where X ~ f(x)
# a Monte Carlo estimate is given by
# gbar = (1/n) sum g(Xi) where Xi ~ f(x)
# Control Variates use a function h(x) where we
# know that
# E h(X) = 0 if X ~ f(x)
# (If we know Eh(X) = m, then use h*(x) = h(x)-m in place of h(x).)
# Then if h(x) looks like g(x) we can expect that
# hbar = 1/n sum h(Xi)
# (which “should” be zero) might tell us something about
# how far gbar is from mu.

# That is, if hbar is too high, it is likely that gbar is also
# too high. In particular, we can improve gbar by
# gbar2 = gbar - alpha hbar
# Note that since hbar has mean zero, the expectation of gbar2 is
# still mu. We found in class that the optimal alpha was
# alpha = cov(g(X),h(X))/var(h(X))
# which we can calculate from the g(Xi) and h(Xi) values.

# a) Write a function to calculate Vanilla and Control Variate
# estimates of gbar from a sample X. It should take arguments
# X (a sample of data) and functions g and h, and should
# return a 2-vector giving the Vanilla and Control Variate estimates.
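
# For illustration, a minimal sketch of one way such a function could look
# (the ".sketch" suffix marks it as a placeholder, not part of the exam;
# your graded answer still goes in MCfunc1 below):

MCfunc1.sketch = function(X, g, h){
  gX = g(X)
  hX = h(X)
  alpha = cov(gX, hX)/var(hX)     # estimated optimal control variate coefficient
  c(Vanilla = mean(gX), Control = mean(gX) - alpha*mean(hX))
}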

MCfunc1 = function(X,g,h){

# b) i) Use this to write a function to conduct a simulation
# study in which you set g to be sin(x+0.1) and h to be
# -exp(x) + exp(1/2) and generate X from a standard normal
# (the expectation of exp(X) is exp(1/2) when X is N(0,1)).
# You should have length(X) be N = 100 and conduct R = 50
# replications and return the percentage variance reduction
# due to control variates
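
# A hedged sketch, assuming "percentage variance reduction" means
# 100*(1 - var(control)/var(vanilla)) across the R replications, and
# reusing MCfunc1.sketch from above:

MCsim1.sketch = function(N, R){
  g = function(x) sin(x + 0.1)
  h = function(x) -exp(x) + exp(1/2)     # E h(X) = 0 when X ~ N(0,1)
  ests = t(replicate(R, MCfunc1.sketch(rnorm(N), g, h)))
  100*(1 - var(ests[, 2])/var(ests[, 1]))
}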

MCsim1 = function(N,R){

# ii) What variance reduction do you achieve?

# c) We can combine both control variates and antithetic
# sampling by looking at
# (1/n) sum [ (g(Xi) + g(-Xi))/2 - alpha (h(Xi) + h(-Xi))/2 ]
# (note that if f(x) is symmetric about zero, X and -X
# have the same distribution).

# i) Write a function to conduct a simulation as above
# but with antithetic sampling, too. You should return
# the Vanilla Monte Carlo variance and the variance
# reduction due to each of antithetic sampling, control
# variates and both.
# It makes a big difference that you do the antithetic
# average before the control variate calculation.
# In this case use the same number N of random samples
# for antithetic sampling rather than fixing the number
# of function evaluations.
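
# A hedged sketch of one possible approach (placeholder name again;
# reductions are reported as percentages of the Vanilla variance, and the
# antithetic averages are formed before alpha is estimated):

MCsim2.sketch = function(N, R){
  g = function(x) sin(x + 0.1)
  h = function(x) -exp(x) + exp(1/2)
  ests = matrix(0, R, 4)                 # vanilla, antithetic, control, both
  for(r in 1:R){
    X  = rnorm(N)
    gX = g(X);             hX = h(X)
    gA = (g(X) + g(-X))/2; hA = (h(X) + h(-X))/2   # antithetic averages first
    a1 = cov(gX, hX)/var(hX)
    a2 = cov(gA, hA)/var(hA)
    ests[r, ] = c(mean(gX), mean(gA),
                  mean(gX) - a1*mean(hX),
                  mean(gA) - a2*mean(hA))
  }
  v = apply(ests, 2, var)
  list(Vanilla.Var       = v[1],
       Anti.Reduction    = 100*(1 - v[2]/v[1]),
       Control.Reduction = 100*(1 - v[3]/v[1]),
       Both.Reduction    = 100*(1 - v[4]/v[1]))
}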

MCsim2 = function(N,R){

return( list( Vanilla.Var = NA,        # NA placeholders: fill these in
              Anti.Reduction = NA,
              Control.Reduction = NA,
              Both.Reduction = NA ) )

# ii) Which adds most reduction? Does the joint reduction
# exceed what you would expect after observing
# the reduction for each individual strategy?

# Bonus: The original question had you set g(x) = sin(x)
# and h(x) = x (we’ll use this with Control Functionals below).
# Why is this an uninformative choice of functions for the
# simulation you just conducted?

# d) The better the agreement between h and g, the more
# effective the control variates strategy will be. E.g.,
# compare the h we used in parts b and c with using
# h(x) = x.

# Oates et al. (2016) propose to approximate g non-parametrically.
# Specifically, we’ll let
# h(x) = sum_j d_j psi_j(x)
# To do this, we need to know the integral of h(x). To accomplish
# this, the authors propose using a modified form of a kernel.
# Specifically, they use the first X_1,…,X_(N/2) samples and
# psi_j(x) = phi'(x-X_j;s) + phi(x-X_j;s) f'(x)/f(x)
# where phi(x;s) is a normal density with standard deviation s and ‘
# represents a derivative with respect to x.
# The reason for this choice is that
# int psi_j(x)f(x)dx = int [phi'(x-X_j;s) f(x) + phi(x-X_j;s) f'(x)]dx
# = int [phi(x-X_j;s) f(x)]’ dx
# = phi(Inf,s)f(Inf) - phi(-Inf,s)f(-Inf)
# = 0
# With this choice of psi, we get the d_j by simple least-squares regression
# of g(Xi) on psi_j(Xi) for i = 1,…,N/2.
# To carry this out you should
# - evaluate psi_j(X_i) for the X_i in your data
# - estimate d_j by using the matrix of psi_j(X_i) to predict g(X_i)
# using lm. Some coefficients will be returned as NA; set them to 0.
# - then use these d_j to give you sum_j d_j psi_j(x) for any x.

# i) Write a function to evaluate this h(x) for a vector of values x. It
# should have inputs
# x — the vector of values at which to evaluate h(x)
# X — a vector of draws from f(x)
# s — the bandwidth
# g — the function we want to approximate
# In this case you can assume that f(x) is a standard normal. It should
# return a vector of values of length(x). You may use
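# the sketch below as a rough guide. It is only a hedged illustration of
# one reading of the construction above (placeholder name; the lm
# intercept is fit but left out of h so that E h(X) = 0 still holds):

hfunc.sketch = function(x, X, s, g){
  # psi_j(u) = phi'(u - X_j; s) + phi(u - X_j; s)*f'(u)/f(u), and for a
  # standard normal f we have f'(u)/f(u) = -u
  psi = function(u){
    D    = outer(u, X, "-")        # length(u) x length(X) matrix of u - X_j
    phi  = dnorm(D, sd = s)
    dphi = -(D/s^2)*phi            # derivative of the normal kernel in u
    dphi + phi*(-u)                # -u recycles down the columns
  }
  Psi.X = psi(X)
  d = coef(lm(g(X) ~ Psi.X))       # least squares for the d_j
  d[is.na(d)] = 0                  # aliased coefficients set to zero
  as.numeric(psi(x) %*% d[-1])     # h(x) = sum_j d_j psi_j(x), intercept dropped
}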

hfunc = function(x,X,s,g){

# ii) Using g(x) = sin(x), and s = 0.2, set

xvec = seq(-2,2,by = 0.1)

# iii) plot h(x) based on 100 standard normals.
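
# For instance, a plot along these lines (using the sketch above; swap in
# your own hfunc, and note Xh is just an illustrative name):

Xh = rnorm(100)
plot(xvec, hfunc.sketch(xvec, Xh, 0.2, sin), type = "l",
     xlab = "x", ylab = "h(x)")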

# e) i) Write a function to implement
# 1. Vanilla Monte Carlo
# 2. Control Variates, using h(x) = x
# 3. Control Functionals
# – Use the first half of the values X_1,…,X_N
# to produce hfunc as in 1d
# – Use the second half to obtain the control variate
# estimate with this hfunc
# These should be based on X being N(0,1)
# Your function should take arguments N, g and s and return
# the three estimates above as a 3-vector
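
# A hedged sketch of one possible implementation. It leans on the .sketch
# helpers above; whether to re-estimate alpha for the control-functional
# step is a judgement call:

ControlFunctionals.sketch = function(N, g, s){
  X  = rnorm(N)
  X1 = X[1:(N/2)]
  X2 = X[(N/2 + 1):N]
  vanilla = mean(g(X))
  cv = MCfunc1.sketch(X, g, function(x) x)[2]    # control variates with h(x) = x
  hX2 = hfunc.sketch(X2, X1, s, g)               # h fitted on the first half
  gX2 = g(X2)
  alpha = cov(gX2, hX2)/var(hX2)
  cf = mean(gX2) - alpha*mean(hX2)               # control-variate step on second half
  c(Vanilla = vanilla, ControlVariate = unname(cv), ControlFunctional = cf)
}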

ControlFunctionals = function(N,g,s){

# ii) When g(x) = sin(x) we know that E g(X) = 0. Based on
# 50 replicates at N = 100, using s = 0.2, what is the
# expected squared error of each of your estimates?

# f) One of the claims of the paper is that we can
# achieve a faster convergence rate than (1/sqrt(N))
# because as we get more data, h(x) gets closer to g(x).
# i) Write a function to conduct a simulation study so that
# for each value of

Ns = c(100,200,500,1000)

# you estimate the root-average-squared-error (RMSE) of
# each of the three estimates in 1e based on R = 20
# simulations using values of s in

svec = 2/sqrt(Ns)

# that correspond to each value in Ns. (Here we have
# smaller s so that we can approximate g better when
# we have more data points).

# You should return the matrix of RMSE values.

# CAUTION: this may take a few minutes to run.
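
# A compact sketch under the same assumptions as above (rows of the
# returned matrix are indexed by Ns, columns by estimator):

ControlFuncSim.sketch = function(Ns, R, s){
  out = matrix(0, length(Ns), 3,
               dimnames = list(Ns, c("Vanilla", "Control", "Functional")))
  for(k in seq_along(Ns)){
    E = replicate(R, ControlFunctionals.sketch(Ns[k], sin, s[k]))   # 3 x R
    out[k, ] = sqrt(rowMeans(E^2))    # E g(X) = 0, so RMSE = sqrt(mean(estimate^2))
  }
  out
}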

ControlFuncSim = function(Ns,R,s){

# ii) Plot log(RMSE) versus log(N) for the three estimates.
# This should give you approximately a straight line with
# the order of convergence given by the slope.

# iii) Confirm that vanilla and control variates approaches have
# MSE = O(1/N); what is the order for Control Functionals?

####################################################
# Question 2: Kernel Density Estimation and #
# Minimum Hellinger Distance Estimates #
####################################################

# In this question we will use Kernel Density Estimation
# as an intermediate step in statistical procedures.
# But we will first need to deal with KDE’s. Recall that
# we can estimate a nonparametric density by
# fhat(x,s) = (1/n) sum phi(x-Xi;s)
# (using the same notation as in Question 1).

# a) i) Write a function to evaluate fhat for a vector
# of evaluation points x, using data X with bandwidth s.
# You should not need to use either for loops or apply
# statements.
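
# A minimal vectorized sketch (placeholder name as before):

kde.sketch = function(x, X, s){
  rowMeans(dnorm(outer(x, X, "-"), sd = s))
}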

kde = function(x,X,s){

# The data we will use for this question comes from a study of
# the effectiveness of a drug pyrantel used to treat parasites
# in horses. These data record the logit of the ratio of
# the number of parasite eggs found in horse feces before
# and after treatment.

eggrate = c(-1.1653, -0.7538, -1.3218, -2.3394, -1.9766, -1.8718, -1.5041)

# ii) Using s = 0.2 produce a plot of the kernel density estimate
# on values of x from -2.5 to -0.5.
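
# For instance (swap your own kde in for the sketch; xgrid is just an
# illustrative name):

xgrid = seq(-2.5, -0.5, length.out = 200)
plot(xgrid, kde.sketch(xgrid, eggrate, 0.2), type = "l",
     xlab = "logit egg-rate ratio", ylab = "density")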

# b) In order to apply this well, we need to choose a
# bandwidth. To do this we’ll use cross-validation.
# Specifically we'll define a score for each s as
# CV(s) = sum log fhat^(-i)(Xi;s)
# That is, for each Xi, we leave Xi out of the data
# and estimate fhat^(-i) without Xi, then we see how high
# fhat^(-i) thinks the density is at Xi. The higher
# the density, the better we are at predicting where future
# data might be. Hence the optimal s is the one that maximizes CV(s).
# i) Write a function to evaluate CV(s) for each s. Bonus 2 points
# for avoiding for loops and apply statements.
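
# One loop-free sketch of the leave-one-out score:

kde.cv.sketch = function(X, s){
  n = length(X)
  K = dnorm(outer(X, X, "-"), sd = s)   # K[i,j] = phi(Xi - Xj; s)
  diag(K) = 0                           # leave Xi out of its own estimate
  sum(log(rowSums(K)/(n - 1)))
}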

kde.cv = function(X,s){

# ii) Use this function to decide on the optimal s for the eggrate
# data above with possible bandwidths given in

ss = seq(0.1,1,by=0.1)

# to give the value sbest that maximizes the cross-validated
# likelihood.
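
# For example (substitute your own kde.cv; sbest.sketch is just an
# illustrative name):

cv.scores = sapply(ss, kde.cv.sketch, X = eggrate)
sbest.sketch = ss[which.max(cv.scores)]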

# c) We will also need to be able to simulate data from fhat.
# To do this, we observe that we can express fhat as being exactly
# the density that corresponds to the following
# 1. Choose one of the Xi at random
# 2. Simulate a new z from a N(Xi,s) distribution.
# i) Write a function to simulate N observations from fhat given
# X and s. 2 bonus points for avoiding for loops
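
# A loop-free sketch (rnorm is vectorized over its mean argument):

kde.sim.sketch = function(N, X, s){
  rnorm(N, mean = sample(X, N, replace = TRUE), sd = s)
}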

kde.sim = function(N,X,s){

# ii) Use this to simulate 1000 data points from your estimate
# above (with optimal bandwidth). Draw a histogram of these points
# and add the original data as vertical red lines.

# d) One of the advantages of having a density estimate is
# that there are more ways that you can compare your data to
# a parametric family of models.
# In particular, one measure of how different two distributions
# are is Hellinger distance
# HD(f,g) = int ( sqrt(f(x)) - sqrt(g(x)) )^2 dx
# = 2 - 2 int sqrt(f(x)*g(x)) dx
# For a family of densities f(x,theta) and an estimate
# fhat(x), the minimum Hellinger distance estimate (MHDE)
# is obtained by maximizing
# A(theta) = \int sqrt( f(x,theta)*fhat(x) ) dx
# In our case, we will use a normal family for the mean
# f(x,theta) = dnorm(x,mean=theta)
# and our KDE.
# We still need to approximate the integral. Here we will
# sample from fhat and use the approximation
# A(theta,fhat) = int sqrt( f(x,theta)/fhat(x) ) fhat(x) dx
# ~= (1/N) sum sqrt( f(Xi,theta)/fhat(Xi) )
# where the Xi are sampled from fhat as in part c.

# i) Write a function to calculate A(theta), for f(x,theta) given
# by dnorm(x,mean=theta) using the original data X and a sample
# Xsamp generated from fhat with bandwidth s.
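
# A hedged sketch of the Monte Carlo approximation to A(theta), reusing
# kde.sketch from part a for fhat:

Afn.sketch = function(theta, X, Xsamp, s){
  mean(sqrt(dnorm(Xsamp, mean = theta)/kde.sketch(Xsamp, X, s)))
}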

Afn = function(theta,X,Xsamp,s){

# ii) And a further function to find the optimum value based on
# Data X, bandwidth s and sample size N.
# You may use any optimization function you like (Giles
# used ‘optimize’) and can assume the maximum lies in
# (-10,10).
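
# One possible sketch using optimize over (-10,10):

HD.opt.sketch = function(X, s, N){
  Xsamp = kde.sim.sketch(N, X, s)
  optimize(Afn.sketch, interval = c(-10, 10), maximum = TRUE,
           X = X, Xsamp = Xsamp, s = s)$maximum
}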

HD.opt = function(X,s,N){

# iii) Use this to obtain the value that minimizes Hellinger
# distance for the data using the optimal bandwidth
# above based on 1000 Monte Carlo samples.

hellinger.opt =

# e) i) One reason to look for the MHDE is robustness
# to outliers. To see this add an additional data point
# to your data set with values in

O = seq(-1,10,by=0.2)

# plot the value of the MHDE and the average
# value in your data versus the values in O above. Keep
# the optimal bandwidth you calculated in part b.

# ii) Which of these (average or MHDE) do you think is
# more reasonable?

# f) One of the reasons that Hellinger distance has
# received some attention is that it is supposed to be
# as precise as the mean, at least asymptotically.
# However, for a given sample, it can be difficult to
# work out how precise it is. One suggestion was
# to use it within a Bayesian analysis. See
# https://link.springer.com/article/10.1007%2Fs11749-014-0360-z
# The idea is to replace the posterior with
# exp( n*A(theta) ) pi(theta)
# where pi(theta) is the prior.
# i) Using a N(0,10) prior, write a function to conduct a random
# walk MCMC algorithm. The function should take in
# X — your data
# s — bandwidth for the KDE
# N — number of Monte Carlo samples to evaluate A
# nmc — length of the MCMC chain
# sig — random walk variance
# and return a set of posterior samples and the acceptance
# rate. You should keep the same Monte Carlo samples for
# each evaluation.
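
# A hedged random-walk sketch. It reads "N(0,10)" as a normal prior with
# variance 10 and treats sig as the proposal variance; both readings could
# reasonably differ, so adjust to your own interpretation:

HD.MCMC.sketch = function(X, s, N, nmc, sig){
  Xsamp = kde.sim.sketch(N, X, s)   # one fixed Monte Carlo sample, reused throughout
  n = length(X)
  logpost = function(theta)
    n*Afn.sketch(theta, X, Xsamp, s) + dnorm(theta, 0, sqrt(10), log = TRUE)
  samples = numeric(nmc)
  theta = mean(X)
  lp = logpost(theta)
  acc = 0
  for(k in 1:nmc){
    prop    = theta + rnorm(1, 0, sqrt(sig))
    lp.prop = logpost(prop)
    if(log(runif(1)) < lp.prop - lp){
      theta = prop; lp = lp.prop; acc = acc + 1
    }
    samples[k] = theta
  }
  list(samples = samples, acc = acc/nmc)
}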

HD.MCMC = function(X,s,N,nmc,sig){

return( list( samples = NA, acc = NA ) )   # NA placeholders: fill these in

# ii) Experimentally choose sig to give you between 30% and 40%
# acceptance rate and produce a histogram of the posterior
# based on 1000 samples after 1000 samples burn-in and thinning
# to every 5th sample.

# Bonus: in fact, MCMC can be used with a stochastic likelihood.
# That is, it still works if you use new Monte Carlo samples
# each time. Apply this using N = 100, but draw new samples
# for each step in the MCMC. How high can you get your
# acceptance rate?

HD.MCMC2 = function(X,s,nsamples,nmc,sig){

return( list( samples = NA, acc = NA ) )   # NA placeholders: fill these in

######################################
# Question 3: Random Effects Methods #
######################################

# In the additional exercises after Lecture 18, we
# saw how Poisson random effects models can be fit
# with MCMC. Here we’ll look at extensions of
# this and Frequentist alternatives.

# In this question we will consider a logistic model
# with random effects. That is, we observe outcomes that
# are either 0 or 1 and write out
# P(Y=1|X,Z) = plogis( beta0 + beta1*x + z)
# where x is a covariate and z is a subject-specific
# random effect.
# We can turn this into an over-all probability by
# P(Y|X,z) = P(Y=1|X,Z)^Y*(1-P(Y=1|X,Z))^(1-Y)
# which evaluates to P(Y=1|X,Z) when Y = 1 and
# (1-P(Y=1|X,Z)) when Y = 0.

# Here we regard Z as a latent normal random variable that
# applies to several data points. By way of example, the

toenail = read.table('toenail.txt', header = TRUE)
toenail$Subject = as.factor(toenail$Subject)

# gives an indicator of nail health (good or bad) over the course of
# treatment for foot fungus. This is recorded at 7 visits for each of
# 12 Subjects, at times given by the number of Months.
# Here we will write Y_ij for the i-th measurement from the j-th
# patient and say
# P(Y_ij=1|X_ij,Z_j,beta0,beta1) = plogis( beta0 + beta1*X_ij + Z_j)
# where $X_ij$ is the time of the i-th visit of subject j. Z_j
# is a different value for each subject so that some subjects
# have consistently worse feet than others depending on the size
# We don’t get to see the Z_j, so we will specify that
# Z_j ~ N(0, \sigma^2)
# The likelihood of the observed data in group j is
# P(Y_1j,..,Y_7j|X_j) = \int prod( P(Y_ij|X_ij,Z_j,beta0,beta1) ) phi(Z_j;0,sigma)dZ_j
# and the likelihood for the parameters beta0, beta1, sigma is
# l(beta0,beta1,sigma) = sum_j log P(Y_1j,…,Y_7j|X_j)

# a) When only one random effect applies to each observation we can
# evaluate the integral above numerically. To do this, we will
# use a Gauss-Hermite approximation. This can be obtained through
# the ecoreg package (you will need to install it).

library(ecoreg)

# The function gauss.hermite produces Gauss-Hermite points and weights

gh = gauss.hermite(21)

# Here we can approximate \int f(x) phi(x) dx by
# sum(gh[,'Weights']*f(gh[,'Points']))
# Note that this is for a standard N(0,1); to make this N(0,s^2)
# you need to multiply gh[,'Points'] by s.

# i) Write a function to evaluate the negative of the log likelihood
# of the toenail data for any beta0, beta1 and sigma given
# the vector theta = c(beta0,beta1,sigma)
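
# A hedged sketch. It assumes the 0/1 outcome column of toenail is called
# "Response" (check names(toenail) and rename below if it differs);
# Subject and Months are as described above:

logistic.nll.sketch = function(theta, data){
  beta0 = theta[1]; beta1 = theta[2]; sigma = theta[3]
  gh = gauss.hermite(21)
  z  = sigma*gh[, 'Points']          # scale the nodes to N(0, sigma^2)
  w  = gh[, 'Weights']
  ll = 0
  for(subj in unique(data$Subject)){
    d = data[data$Subject == subj, ]
    Lz = sapply(z, function(zz){     # subject's likelihood at each quadrature node
      p = plogis(beta0 + beta1*d$Months + zz)
      prod(p^d$Response*(1 - p)^(1 - d$Response))
    })
    ll = ll + log(sum(w*Lz))
  }
  -ll
}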

logistic.nll = function(theta,data){

# ii) Hence find the values of beta0, beta1 and sigma that
# maximize the log likelihood. You may use optim or any other
# optimizer that you wish.

# Good starting values are

theta = c(-0.2,-0.2,1.2)

# b) As an alternative, we can use Monte Carlo integration for
# each random effect. To do this, we’ll replace the Gauss-Hermite
# points and weights with a sample of 1000 N(0,1) random values
# and weights 1/1000. To change this to integrating with
# respect to a N(0,s^2) you can make the same changes as for
# Gauss-Hermite quadrature. Use the same 1000 points for each evaluation of the likelihood.
# i) Write a function to calculate the negative log likelihood based
# on this approximation. You may hard-code the use of N = 1000
# Monte Carlo points.
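
# A hedged sketch along the same lines (same assumed "Response" column),
# drawing one fixed set of 1000 standard normal points that every call
# then reuses:

Z.mc = rnorm(1000)

logistic.nll.mc.sketch = function(theta, data){
  beta0 = theta[1]; beta1 = theta[2]; sigma = theta[3]
  z = sigma*Z.mc                     # scale the fixed points to N(0, sigma^2)
  ll = 0
  for(subj in unique(data$Subject)){
    d = data[data$Subject == subj, ]
    Lz = sapply(z, function(zz){
      p = plogis(beta0 + beta1*d$Months + zz)
      prod(p^d$Response*(1 - p)^(1 - d$Response))
    })
    ll = ll + log(mean(Lz))
  }
  -ll
}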

logistic.nll.mc = function(theta,data){

# ii) How would you go about optimizing this likelihood? Do so.

# c) Given a likelihood, we can always carry out an MCMC
# analysis to find a posterior. In this case we will
# fix sigma to be 1.2 to make the computation simpler.
# Setting your prior on beta0 and beta1 to be
# P(beta0,beta1) = N(0,10)*N(0,10)
# i) Carry out a random walk MCMC for 5000 steps using the
# likelihood you found in part a using a standard deviation

sds = c(1,0.2)

# for the steps in each dimension of your random walk. Use the same
# initial value for beta0 and beta1 as in part a.
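
# A hedged sketch of such a sampler (sigma fixed at 1.2, the prior
# variance read as 10, both coordinates updated jointly, and the
# Gauss-Hermite likelihood sketch from part a reused):

logistic.MCMC.sketch = function(data, nmc = 5000, sds = c(1, 0.2)){
  beta = c(-0.2, -0.2)
  logpost = function(b) -logistic.nll.sketch(c(b, 1.2), data) +
                          sum(dnorm(b, 0, sqrt(10), log = TRUE))
  out = matrix(0, nmc, 2)
  lp  = logpost(beta)
  acc = 0
  for(k in 1:nmc){
    prop    = beta + rnorm(2, 0, sds)
    lp.prop = logpost(prop)
    if(log(runif(1)) < lp.prop - lp){ beta = prop; lp = lp.prop; acc = acc + 1 }
    out[k, ] = beta
  }
  list(samples = out, acc = acc/nmc)
}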

# ii) How far is your expected a-posteriori estimator from the maximum
# likelihood estimate in part a?

# iii) Produce 95% credible intervals for beta0 and beta1

# d) Rather than doing the integration numerically, MCMC also
# allows you to simply include the random effects as additional
# parameters. That is, we can write the joint distribution of
# all of the data, the Z's and beta0 and beta1 (still keeping
# sigma fixed at 1.2) as
# P(Y_ij|X,Z_j,beta0,beta1)P(Z_j)P(beta0)P(beta1)
# and conduct a random walk on beta0, beta1, Z_1,...,Z_K.

# i) Write a function to calculate this log posterior using the same
# prior as in part c.
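
# A hedged sketch of the log joint posterior (sigma fixed at 1.2, the same
# prior reading as in part c, the same assumed "Response" column, and Z
# assumed to be ordered to match levels(toenail$Subject)):

joint.posterior.sketch = function(theta, Z, data){
  beta0 = theta[1]; beta1 = theta[2]
  z = Z[as.numeric(data$Subject)]   # each row gets its subject's random effect
  p = plogis(beta0 + beta1*data$Months + z)
  sum(data$Response*log(p) + (1 - data$Response)*log(1 - p)) +
    sum(dnorm(Z, 0, 1.2, log = TRUE)) +
    sum(dnorm(c(beta0, beta1), 0, sqrt(10), log = TRUE))
}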

joint.posterior = function(theta,Z,data){

# and carry out a random walk MCMC using step sizes with standard deviations

sds2 = sds/4

# for theta and steps of standard deviation 0.5 for Z.
# Initialize Z
