#########################################################

#########################################################
################### BTRY 4520 ###########################
################## Homework 5 ###########################
############## Due: May 3, 2019 #########################
#########################################################

# Instructions: save this file in the format _HW5Q1.R.
# Complete each question using code below the question number.
# You need only upload this file to CMS.

# Note, we assume your working directory contains any files
# that accompany this one.

# Further note: 10% will be deducted if your file produces
# an error when run. If your code produces an error and you
# cannot find it, comment out that portion of the code and we
# will give partial credit for it.

# Do not use either the function set.seed() or rm(list=ls())
# in your code.

####################################
# Question 1: Logistic Regression #
####################################

# Logistic regression is used to model a binary outcome in terms
# of covariates. That is, we have data (Y_i,X_i) where Y_i is
# either 0 or 1, and we need to model P(Y = 1|X).

# Here we observe the constraints that this is a probability
# and must be between 0 and 1. A common way to achieve this is
# to write
#
# P(Y = 1|X,b) = g( b0 + b1 X1 + … + bp Xp )
#
# for some coefficients b = (b0,b1,…,bp).

# The function g now makes sure that the input is between 0 and 1.
# A classical choice of this is
#
# g(z) = exp(z)/(1+exp(z))
#
# Below you may also find it useful to observe that
#
# g'(z) = g(z)(1-g(z))
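
# As an aside (not part of the assignment), a quick numerical check of the
# identity g'(z) = g(z)(1-g(z)); the name g.example is only used for this
# illustration.

g.example = function(z){ exp(z)/(1+exp(z)) }
z0 = 0.7
(g.example(z0 + 1e-6) - g.example(z0))/1e-6   # finite-difference estimate of g'(z0)
g.example(z0)*(1 - g.example(z0))             # the identity gives essentially the same number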

# For this question we will use the data in

grains = read.table('SpiderBeach.txt', head=TRUE)

# which records the size of sand grains on 28 Japanese beaches
# and whether the burrowing wolf spider was found there. If
# we plot

plot( spiders ~ grain.size, data = grains)

# we can see an apparent increase in the likelihood of coming
# across spiders as grain sizes increase.

# A) Implement a function to take a matrix X, and a vector b and
# return the vector of probabilities P(Y=1|X)

LogistProb = function(X,b)
{

}

# B) Given data, we would like to estimate b. To do this we will
# maximize the likelihood of our observed outcomes. That is
#
# L(b) = prod P(Y=1|X,b)^Y * (1- P(Y=1|X,b))^(1-Y)
#
# (if you think about this, the terms in the product are
# P(Y=1|X,b) if Y is 1, and P(Y=0|X,b) if Y is 0).
#
# However, we generally use the log of the likelihood
#
# l(b) = sum Y log P(Y=1|X,b) + (1-Y) log( 1-P(Y=1|X,b) )

# Write functions to evaluate l(b) (here we assume that X is
# NOT given with a column of 1's for the b0 term, but you can
# put it in if you like).

Logist.loglik = function(b,X,Y)
{

}

# its gradient with respect to b.

Logist.dloglik = function(b,X,Y)
{

}

# And its Hessian with respect to b.

Logist.d2loglik = function(b,X,Y)
{

}

# Note, it may help to represent these as matrix multiplications.
# You will likely get an expression of the form
#
# H_jk = sum_i w_i X_ij X_ik
#
# which is also obtained by t(X)%*%W%*%X if W is diagonal.
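
# As a generic illustration of this matrix form (toy objects only, not the
# functions you are asked to write): with weights w on the diagonal of W,
# t(X)%*%W%*%X can be computed without forming W explicitly.

X.toy = cbind(1, c(0.4, 0.6, 0.8))       # small design matrix with an intercept column
w.toy = c(0.20, 0.25, 0.10)              # example weights, e.g. g(z)*(1-g(z)) terms
H.toy = t(X.toy) %*% diag(w.toy) %*% X.toy
all.equal(H.toy, crossprod(X.toy * w.toy, X.toy))   # TRUE: same matrix, no diag() needed
# A Newton-Raphson update then has the generic form  b.new = b - solve(H, gradient).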

# C) Write a function to maximize the log likelihood
# using a Newton-Raphson method. Assume that you start
# from b = 0. You need only return the maximizing value of b.

Logist.NR = function(X,Y,tol=1e-8,maxit=1000)
{

}

# Provide an estimate of the parameters for the SpiderBeach
# data. You can check this with the glm function (make sure to
# specify family='binomial' for logistic regression).
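
# For reference, the comparison fit suggested above can be obtained with:

glm(spiders ~ grain.size, data = grains, family = 'binomial')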

# Plot the data and add a line giving the predicted probability
# as grain size changes, based on 101 equally spaced points over
# the range of x values. Upload this as HW5Q1C.png.
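
# A sketch of the intended plot (b.hat is a placeholder for your Logist.NR
# estimate; left commented out so the file still runs before the functions
# above are filled in):
#
# plot(spiders ~ grain.size, data = grains)
# grid.x = seq(min(grains$grain.size), max(grains$grain.size), length.out = 101)
# lines(grid.x, LogistProb(matrix(grid.x, 101, 1), b.hat))
# # calling png('HW5Q1C.png') before plotting and dev.off() after is one way to save it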

# D) Write a function to start with an estimated b and
# conduct a parametric bootstrap. That is, construct new
# data by simulating new Y's from P(Y=1|X,b), then re-estimate
# b's and construct percentile bootstrap confidence intervals
# for b, using the input b's as your estimate.

# In this case you should also produce a bias correction. Your
# function should return a 2-by-p matrix with columns indicating
# the different entries in b.
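
# The resampling step itself is simple: given fitted probabilities
# p = P(Y=1|X,b), new outcomes can be simulated with rbinom. A sketch of one
# bootstrap replicate, using the functions you write above (commented out
# until they exist):
#
# p.hat  = LogistProb(X, b)
# Y.star = rbinom(length(p.hat), size = 1, prob = p.hat)
# b.star = Logist.NR(X, Y.star)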

Logist.boot = function(X,b,nboot = 1000)
{

}

# Provide confidence intervals for b0 and b1 in the SpiderBeach
# data.

# BONUS The lm function in R will give estimates that minimize
# a weighted sum of squared errors:
#
# sum W_i (Y_i - X_i b)^2
#
# with estimate
#
# bhat = (t(X)%*%W%*%X)^{-1} t(X)%*%W%*%Y
#
# where W is a diagonal matrix. Describe how you could use lm
# in place of explicit Newton-Raphson calculations.
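
# As a small self-contained check of the weighted least-squares formula above
# (toy data only; this is not the bonus answer itself):

x.wls = c(1, 2, 3, 4)
y.wls = c(1.2, 1.9, 3.2, 3.9)
w.wls = c(1, 2, 1, 2)
X.wls = cbind(1, x.wls)
solve(t(X.wls) %*% diag(w.wls) %*% X.wls, t(X.wls) %*% (w.wls * y.wls))  # closed form
coef(lm(y.wls ~ x.wls, weights = w.wls))                                  # the same two numbers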

source('HW5Q1.R')

# Part A

all.equal( as.vector(LogistProb(matrix(c(0.4,0.6,0.8),3,1),c(1,2))),
c(0.8581489,0.9002495,0.9308616),tol=1e-6)

# Part B

all.equal( Logist.loglik(c(0,0.25),matrix(grains$grain.size,28,1),grains$spiders), -18.56395,tol=1e-6)

all.equal( Logist.dloglik(c(0,0.25),matrix(grains$grain.size,28,1),grains$spiders),
matrix(c(4.097711, 3.099247),2,1),tol=1e-6)

all.equal( Logist.d2loglik(c(0,0.25),matrix(grains$grain.size,28,1),grains$spiders)[1,],
c(-6.965371, -3.592547),tol=1e-6)

# Part C

all.equal( Logist.NR(matrix(grains$grain.size[-c(5,10)],26,1),grains$spiders[-c(5,10)]),
matrix(c(-2.205532,5.951406),2,1), tol=1e-6)

# Part D

set.seed(3042019)
all.equal( Logist.boot(matrix(grains$grain.size[-c(5,10)],26,1),c(-2,5))[,1],
c(-3.113813, 3.842143),tol=1e-2)
#########################################################
################### BTRY 4520 ###########################
################## Homework 5 ###########################
############## Due: May 3, 2019 #########################
#########################################################

# Instructions: save this file in the format _HW5Q2.R.
# Complete each question using code below the question number.
# You need only upload this file to CMS.

# Note, we assume your working directory contains any files
# that accompany this one.

# Further note: 10% will be deducted if your file produces
# an error when run. If your code produces an error and you
# cannot find it, comment out that portion of the code and we
# will give partial credit for it.

# Do not use either the function set.seed() or rm(list=ls())
# in your code.

################################################
# Question 2: Constrained Levenberg-Marquardt #
################################################

# So far, our optimization methods have all assumed that the maximum is inside
# the region that we are looking at. However, it’s possible that we want to define
# a boundary. If the function is still increasing at the boundary, you would use
# the boundary (or some point along it) as your maximum.
#
# Note that it is important that you be able to return a point that is exactly on
# the boundary.
#
# In statistics, it is often natural to assume that at least some parameters are
# positive (variances, for example, are always non-negative).
#
# Here we will examine the problem for beta = (beta_1,…,beta_p)
#
# beta = argmax f(beta) such that beta_j >= 0 for some particular j
#
# and we will use a modification of the Levenberg-Marquardt algorithm (see code in
# Lecture 10) to do this. Here we will use a three-dimensional
#
# beta = (beta_0,beta_1,beta_2)
#
# where we want beta_2 > 0. Note that the indices in R shift this so that
# beta_2 is given by beta[3] in a vector in R.

# However, we will start by simply producing a Levenberg-Marquardt modification
# to the Gauss-Newton algorithm without any constraints.

# Our motivation for this comes from the data

troutdata = read.table('troutpcb.txt', header=T)

# which contains records of PCB levels of trout in Cayuga Lake in 1972. There
# are 28 trout with their PCB levels and age in the data.

# We will want to minimize the squared error for the model
#
# log(PCB) = beta_0 + beta_1 * Age/( 1 + beta_2 + Age) + e
#
# in other words
#
# y = f(beta,Age) + e
#
# where beta_2 > 0. We will fit this with a Levenberg-Marquardt modification
# of Gauss-Newton methods, so we will need functions to calculate the vector
# of f(beta,Age) and J(beta,Age) = df/dbeta. These will be the arguments of
# fn and J in the functions below.
#
# As with Gauss-Newton methods, you can approximate the Hessian of squared error:
# H = t(J)%*%J
# The (negative) gradient of squared error is then, up to a factor of 2, t(J)%*%(y - f)
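
# For orientation, a single Levenberg-Marquardt step built from these pieces
# (a sketch only, using the fn and J you write below; commented out so the
# file runs before they exist; here Y = log(PCB) and X = Age):
#
# r        = Y - fn(beta, X)                      # residuals
# Jmat     = J(beta, X)
# step     = solve(t(Jmat) %*% Jmat + lambda * diag(ncol(Jmat)), t(Jmat) %*% r)
# beta.new = beta + step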

# Ai. Write functions fn and J to calculate f(beta,Age) and J(beta,Age),
# implementing the model above.

# fn should return a vector of values, one for each element of the vector Age,
# J should return an n-by-3 matrix, with one row for each element of the vector Age.

fn = function(beta,Age){

}

J = function(beta,Age){

}

# Aii. Implement the Levenberg-Marquardt modification of the Gauss-Newton algorithm.
# That is, we use
#
# t(J(beta,Age))%*%(log(PCB)-fn(beta,Age))
#
# for the negative gradient of squared error and
#
# t(J(beta,Age))%*%J(beta,Age)
#
# for the Hessian. You may evaluate
#
# sum( (log(PCB) - fn(beta,Age))^2 )
#
# within your algorithm without coding a function to calculate this. Remember that
# in this case you are minimizing, so the Hessian should be positive definite,
# not negative definite. Your Levenberg-Marquardt algorithm should be modified to
# make sure this is the case. (Yes, you need to work out what that modification is.)

# Your function should return the solution and iteration number as in implementations
# of GaussNewton or LevenbergMarquardt in Lecture10_code.r

GNLevMarq = function(beta,X,Y,fn,J,tol=1e-8,maxit=100){

return(list(beta=beta,iter=iter,iterhist=iterhist))

}

# Try this on the troutpcb data using a starting point

beta0 = c(-1,5,0.1)

result1 = GNLevMarq(beta0,troutdata$Age,log(troutdata$PCB),fn,J)

# Now we will consider adding a boundary condition.
#
# To enforce the boundary conditions, we modify the algorithm in the following manner
# for the current estimate beta:
#
# - If beta is _not_ on the boundary (ie beta_j > 0):
#     - Calculate the Levenberg-Marquardt step with the current value of lambda.
#       That is, calculate beta* = beta + g(lambda) where g is the step to be taken.
#
#     - If this step takes you over the boundary (beta^*_j < 0), shrink the
#       step until it hits the boundary. Ie, choose
#           beta* = beta + alpha*g(lambda)
#       where alpha is chosen so that beta^*_j >= 0 (and at least one entry is
#       exactly 0). A small sketch of this shrinking step follows this description.
#
# - If beta is on the boundary already:
#     - Calculate the Levenberg-Marquardt step with the current value of lambda.
#       That is, calculate beta* = beta + g(lambda) where g is the step to be taken.
#
#     - If beta* moves back inside the boundary, take the step.
#
#     - If beta* crosses the boundary, calculate the Levenberg-Marquardt step dropping
#       the dimensions of beta that are on the boundary. That is, if beta_2 = 0, keep
#       beta_2 = 0 and proceed as though you were only optimizing beta_0 and beta_1.
#
# Try the appropriate one of these with the current value of lambda. If it improves squared
# error, make this move and divide lambda by 2. If it does not, multiply lambda by 2 and try
# the appropriate move (from the old value of beta). Keep doubling lambda until you do
# improve squared error.
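
# The "shrink to the boundary" idea referenced above, as a sketch with generic
# objects (beta is the current estimate, g the proposed step; commented out
# since these are not defined at this point in the file):
#
# alpha     = if (beta[3] + g[3] < 0) -beta[3]/g[3] else 1
# beta.star = beta + alpha * g        # the constrained coordinate lands exactly at 0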

# Bi. Write a function to implement this step when beta is in the interior

LMStepInterior = function(beta,X,Y,fn,J,lambda)
{

}

# You can test this with value

beta0 = c(-1,1,0.02)

# At lambda = 0 you should stay in the interior of the space

LMStepInterior(beta0,troutdata$Age,log(troutdata$PCB),fn,J,0)

# At lambda = 50 you should try to cross the boundary and pull back to it

LMStepInterior(beta0,troutdata$Age,log(troutdata$PCB),fn,J,50)

# Bii. Write a function to implement the step when beta is on the boundary

LMStepBoundary = function(beta,X,Y,fn,J,lambda)
{

return( betastar )
}

# You can test this by considering

beta1 = c(-1,1,0)

# With lambda = 0 you should move back into the interior of the space,

LMStepBoundary(beta1,troutdata$Age,log(troutdata$PCB),fn,J,0)

# with lambda = 50 you should stay on the boundary

LMStepBoundary(beta1,troutdata$Age,log(troutdata$PCB),fn,J,50)

# C. Put these together in a constrained Levenberg-Marquardt algorithm. For
# our purposes, specify the starting value of lambda rather than choosing it
# based on the Hessian.

ConstrainedLevenbergMarquardt = function(beta,X,Y,fn,J,lambda=50,tol=1e-8,maxit=100){

return( list(sol=beta, iter=iter) )
}

# D. Apply this to the trout data with starting values given by
# beta0 above and lambda = 50

source('HW5Q2.R')

# Part A

all.equal(fn(c(1,1,1),c(1,4,7)), c(4/3, 5/3, 16/9) )

all.equal( J(c(1,1,1),c(1,4,7))[2,], c(1,2/3,-1/9) )

all.equal(GNLevMarq(beta0,troutdata$Age,log(troutdata$PCB),fn,J,maxit=3)$beta,
matrix(c(-0.9452889,5.4935518,4.7699064),3,1),tol=1e-6)

# Part B

all.equal( LMStepInterior(beta0,troutdata$Age,log(troutdata$PCB),fn,J,0),
matrix(c(3.1723159,0.5947732,14.5796836),3,1),tol=1e-6)

all.equal( LMStepInterior(beta0,troutdata$Age,log(troutdata$PCB),fn,J,50),
matrix(c(-0.7943829,1.1842198,0),3,1),tol=1e-6)

all.equal( LMStepBoundary(beta1,troutdata$Age,log(troutdata$PCB),fn,J,0),
matrix(c(3.3586664,0.3999969,14.6399720),3,1),tol=1e-6)

all.equal( LMStepBoundary(beta1,troutdata$Age,log(troutdata$PCB),fn,J,50),
c(-0.5176988,1.4328901,0),tol=1e-6)

# Part C

# NOTE: here I check the first few steps of the algorithm. You, of course, should run it to convergence
# in your solutions

all.equal( ConstrainedLevenbergMarquardt(beta0,troutdata$Age,log(troutdata$PCB),fn,J,lambda=100,maxit=1)$sol,
matrix(c(-0.4248085 , 1.5205827 , 0.0000000),3,1),tol=1e-6)

all.equal( ConstrainedLevenbergMarquardt(beta0,troutdata$Age,log(troutdata$PCB),fn,J,lambda=100,maxit=2)$sol,
matrix(c(-0.17607800 , 1.81721084 , 0.01521779),3,1),tol=1e-6)

all.equal( ConstrainedLevenbergMarquardt(beta0,troutdata$Age,log(troutdata$PCB),fn,J,lambda=100,maxit=3)$sol,
matrix(c(-0.1211046 , 2.0503818 , 0.1583751),3,1),tol=1e-6)

all.equal( ConstrainedLevenbergMarquardt(beta0,troutdata$Age,log(troutdata$PCB),fn,J,lambda=100,maxit=4)$sol,
matrix(c(-0.2096945,2.3396091,0.4523016),3,1),tol=1e-6)

#########################################################
################### BTRY 4520 ###########################
################## Homework 5 ###########################
############## Due: May 3, 2019 #########################
#########################################################

# Instructions: save this file in the format _HW5Q3.R.
# Complete each question using code below the question number.
# You need only upload this file to CMS.

# Note, we assume your working directory contains any files
# that accompany this one.

# Further note: 10% will be deducted if your file produces
# an error when run. If your code produces an error and you
# cannot find it, comment out that portion of the code and we
# will give partial credit for it.

# Do not use either the function set.seed() or rm(list=ls())
# in your code.

###########################################
# Question 3: Varying Coefficient Models #
###########################################

# This question considers a variation on local linear regression.
# Here we will instead look at what is described as a varying
# coefficient model.

# To describe this model, we use the data in

ethanol = read.csv('ethanol.csv')

# which contains readings of nitrogen oxides (NOx) produced by a
# gasoline engine as well as the compression of the gas (C) in
# the engine and E, a measure of the air/ethanol mix.

# For these data, the relationship between NOx and C changes
# depending on E. If we examine

plot(ethanol$C[ethanol$E < 0.8], ethanol$NOx[ethanol$E < 0.8])
plot(ethanol$C[ethanol$E > 1],   ethanol$NOx[ethanol$E > 1])

# We see somewhat different relationships. For that reason
# we consider a model of the form
#
# NOx = beta0(E) + beta1(E) C
#
# That is, NOx has a linear relationship with C, but that
# relationship depends in a non-parametric way on E.

# a) We will be interested in being able to find beta0(e) and
# beta1(e) for any value e in the range of E. To do this, we
# can perform a linear regression for Y~C (using the lm()
# function) with weights given by dnorm(e-E,sd=h) for bandwidth h.
#
# Write a function to compute a varying coefficient estimate
# of beta0(e) and beta1(e) at a given e and return the two-vector
# of coefficients.
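
# The core of this is a single kernel-weighted call to lm, as described above;
# for instance, at one illustrative value e0 (your function should wrap this
# for an arbitrary e):

e0 = 0.9
h0 = 0.1
fit.e0 = lm(NOx ~ C, data = ethanol, weights = dnorm(e0 - ethanol$E, sd = h0))
coef(fit.e0)    # the two-vector (beta0(e0), beta1(e0)) at e0 = 0.9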

VarCoef = function(e,data,h)
{

}

# Use this to plot the estimated varying coefficient
# estimates for the ethanol data over the range of
# E, using h = 0.1.

# b) To choose a bandwidth, we can again apply cross
# validation. It’s possible to represent our predicted
# values as a linear smoother yhat = S(h)Y, but here
# you can just do it manually with a for loop.
#
# Write a function to calculate the cross-validated
# squared error for predicting Y from these data.
#
# In this case, unlike in local linear regression,
# we haven’t centered by C and do need to use the whole
# model to predict each Y, rather than just an intercept.

VarCoefCV = function(h,data){

}

# use this to choose an optimal bandwidth from among those in

hs = seq(0.01,0.2,by = 0.01)
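
# Once VarCoefCV is written, the bandwidth search is just the following
# (commented out until the function above is filled in):
#
# cv.scores = sapply(hs, VarCoefCV, data = ethanol)
# h.opt     = hs[which.min(cv.scores)]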

# c) Fixing the bandwidth at the optimum you found in b,
# perform a residual bootstrap for the values of
# beta0(e) and beta1(e) for each of the 61 values of e
# in

e = seq(0.6,1.2,by=0.01)

# (you should calculate all 61 values for each bootstrap
# re-sample).
#
# Write a function that uses this sequence e, the data,
# and the bandwidth h, and produces length(e)-by-3
# matrices called conf.band.b0 and conf.band.b1 with
# columns giving
#
# estimate - 1.96*bootstrap.se, estimate, estimate + 1.96*bootstrap.se
#
# for each value in e where bootstrap.se is the standard
# deviation over nboot = 200 bootstrap samples.
#
# You should also calculate the percentage of bootstrap
# samples for which the bootstrap beta0(e) is inside the
# confidence interval for ALL of the values of e (this is
# like a multiple testing problem).
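
# The band construction itself is elementwise; a sketch for one coefficient,
# with generic objects just to fix notation (b0.boot a length(e)-by-nboot
# matrix of bootstrap estimates, b0.est the original estimates):
#
# b0.se        = apply(b0.boot, 1, sd)     # sd over bootstrap replicates at each e
# conf.band.b0 = cbind(b0.est - 1.96*b0.se, b0.est, b0.est + 1.96*b0.se)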

VarCoefBoot = function(e,data,h,nboot)
{

  return( list(conf.band.b0 = conf.band.b0, conf.band.b1 = conf.band.b1,
               numinside.b0 = numinside.b0, numinside.b1 = numinside.b1) )

}

# Use the result to plot our estimated beta0(e) and beta1(e) with
# confidence bands around them. Are these a good representation
# of the variability of your estimates? Upload your plot as HW5Q3c.png.

# d) The intervals in Part c are described as being POINTWISE
# confidence intervals because they cover the value of beta0(e)
# (say) at each e. But it’s possible that every individual
# bootstrap curve goes outside the bands at some value of e, even
# if at any given e, 95% of the bootstrap curves are contained
# in the interval. (eg, bootstrap 1 is outside at e[45], bootstrap 2
# at e[22] etc).

# Instead, UNIFORM confidence bands are intended to ensure that
# 95% of bootstrap curves are completely contained within the band.
# To produce these, we take an idea from Lab 5 and do the following.
#
# 1. From your bootstrap, calculate the t-statistics
#
# beta0.t(e) = (beta0.boot(e) - beta0.est(e))/sd(beta0.boot(e))
#
# where beta0.est is the original estimate, and beta0.boot is
# the estimate for each bootstrap sample. That is you should have
# length(e)-by-nboot t statistics.
#
# 2. Find the maximum value
#
# max.beta0.t = max |beta0.t(e)|
#
# over e for each bootstrap to give nboot values of max.beta0.t.
#
# 3. Obtain Q as the 95th quantile of max.beta0.t; that is,
# at most 5% of the curves are ever more than Q*sd(beta0.boot(e))
# away from beta0.est(e).
#
# 4. Return the bands
#
# [beta0.est(e) - Q*sd(beta0.boot(e)), beta0.est(e) + Q*sd(beta0.boot(e))]
#
#
# Write a function to repeat your calculations in part c, but
# replace 1.96 with Q calculated as above.
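
# A sketch of computing Q from the bootstrap t-statistics in steps 1-3 above
# (generic objects: b0.boot is length(e)-by-nboot, b0.est and b0.se as in the
# part c sketch; commented out since they are placeholders):
#
# t.stats = (b0.boot - b0.est)/b0.se          # recycles b0.est and b0.se over columns
# max.t   = apply(abs(t.stats), 2, max)       # one maximum per bootstrap sample
# Q       = quantile(max.t, 0.95)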

VarCoefBoot2 = function(e,data,h,nboot)
{

  return( list(conf.band.b0 = conf.band.b0, conf.band.b1 = conf.band.b1,
               numinside.b0 = numinside.b0, numinside.b1 = numinside.b1) )

}

# Do 95% of your bootstrap curves fall inside these bands?
# Produce a plot of your estimate and uniform confidence bands.
# Upload this plot as HW5Q3d.png

# Small bonus: Why do we bootstrap t-statistics instead of just looking
# at a band of fixed width C (ie [beta0(e) - C, beta0(e) + C]),
# with C chosen so that 95% of curves lie in this band?

# Big Bonus: Describe how you could estimate this model with splines
# instead of kernels. What would you do to estimate the
# partially linear model
#
# Y = beta0(E) + beta1*C?

source('HW5Q3.R')

# Part a

all.equal(VarCoef(1,ethanol,0.1),c(2.51603346,0.00683997),tol=1e-6)
all.equal(VarCoef(0.5,ethanol,0.3),c(1.908030111,0.003452755), tol=1e-6)

# Part b

all.equal(VarCoefCV(hs[5],ethanol),6.395744,tol=1e-6)

# Part c

set.seed(19532706)
results = VarCoefBoot(c(0.6,1,1.2),ethanol,0.3,100)

all.equal(results$conf.band.b0[2,],c(1.400634,2.146129,2.891624),tol=1e-2)
all.equal(results$conf.band.b1[3,],c(-0.08096679,-0.017721659,0.04552347),tol=1e-2)
results$numinside.b0 == 89

# Part d

set.seed(28071929)
results = VarCoefBoot2(c(0.6,1,1.2),ethanol,0.3,100)

all.equal(results$conf.band.b0[3,],c(1.2337180,2.147877,3.062035),tol=1e-2)
all.equal(results$conf.band.b1[1,],c(-0.07719028,0.007852045,0.09289437),tol=1e-2)
results$numinside.b1 == 95

#########################################################
################### BTRY 4520 ###########################
################## Homework 5 ###########################
############## Due: May 3, 2019 #########################
#########################################################

# Instructions: save this file in the format _HW5Q4.R.
# Complete each question using code below the question number.
# You need only upload this file to CMS.

# Note, we assume your working directory contains any files
# that accompany this one.

# Further note: 10% will be deducted if your file produces
# an error when run. If your code produces an error and you
# cannot find it, comment out that portion of the code and we
# will give partial credit for it.

# Do not use either the function set.seed() or rm(list=ls())
# in your code.

#########################################
# Question 4: 2 Dimensional Quadrature #
#########################################

# A) From JMR Exercise 11.4.4:

# Consider
#
# I = integral_0^1 3/2*sqrt(x) dx = 1
#
# Let Tn be the approximation to I given by the trapezoid rule with a partition
# of size n and let Sn be the approximation given by Simpson’s rule
# with a partition of size n.
# Let nT(epsilon) be the smallest value of n for which |Tn-I| < epsilon and let
# nS(epsilon) be the smallest value of n for which |Sn-I| < epsilon.
#
# Write functions that calculate nT(epsilon) and nS(epsilon)

nT = function(epsilon){

}

nS = function(epsilon){

}

# Use these to plot nT(epsilon) and nS(epsilon) against log(epsilon) for
# epsilon = 2^(-k), k = 2, ..., 16.
# Upload this plot as HW5Q4A.png.

# B) Now consider a two-dimensional integral. We know that
#
# integral_(-1)^1 integral_(-1)^1 sqrt(1 - x^2 - y^2) dx dy = 2*pi/3
#
# where the square root is taken to be positive and set to zero when it isn't defined.
# This is because it solves the equation for the unit sphere, x^2 + y^2 + z^2 = 1, for
# z, so we are calculating the volume of a half sphere.

# i. Write a function to implement Simpson's method in two dimensions
# (hint: do the inner integral first, then the outer integral). It should take as inputs
# n, the number of grid points in each dimension, and a function to calculate f(x,y),
# and return the integral. Use the same number, n, of points in x and y.

Simpson2d = function(fn,n){

}

# ii. You can evaluate 2*pi/3 in R; how many grid points do you need in your partition so that
# your numerical integral is accurate to within 10^(-4)?

N.Needed =

# iii. Now consider extending this scheme to integrate over p dimensions. For a fixed
# partition size, how does the numerical cost of your scheme grow with p? You should answer
# this in comments.

# BONUS: Provide an analysis of the approximation accuracy of your two-
# dimensional scheme in terms of the size of the partition.

source('HW5Q4.R')

# Part A

nT(5e-3) == 17
nS(5e-3) == 11

# Part B

all.equal( Simpson2d( function(x,y){ exp(x)*cos(x+y) }, 21 ), 3.253831, tol=1e-6)
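
# For reference only (not one of the required answers): a one-dimensional
# composite Simpson's rule, whose 1,4,2,...,4,1 weight pattern is the building
# block that gets nested over x and y in a two-dimensional scheme like
# Simpson2d above. Note that here n counts subintervals, which may differ from
# the grid-point convention used in the question.

simpson1d.example = function(f, a, b, n){
  # composite Simpson's rule with n (even) subintervals on [a, b]
  if (n %% 2 != 0) stop("n must be even")
  h = (b - a)/n
  x = seq(a, b, length.out = n + 1)
  w = rep(c(2, 4), length.out = n + 1)   # alternating interior weights
  w[1] = 1
  w[n + 1] = 1
  sum(w * f(x)) * h/3
}

simpson1d.example(sin, 0, pi, 10)        # close to the exact value, 2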