
BTRY/STSCI 4520 Review of the Final

Reminders/Comments
This is a final exam. That means:


You should work on your own.
We will answer clarification questions but will not debug your code.
No extensions, but it is open book.
You may use any written resource, but only packages that we specifically refer to.
We are happy to explain what error messages mean, but you should isolate the line of code that generates them.
On the final
Structured like a homework; 3 questions, 6 parts each
Sub-parts are structured as (i): do the coding, (ii) and (iii): plot or comment on results.
We are happy to tell you what code should do; written responses are up to you.

Question 1: Control Functionals
Reminder: control variates
Monte Carlo estimate if $X_i \sim f(\cdot)$:
$$E g(X) = \int g(x) f(x)\,dx \approx \frac{1}{N} \sum_{i=1}^{N} g(X_i)$$
If we know $E h(X) = 0$ and $\mathrm{cor}(g(X), h(X)) \neq 0$, then
$$\frac{1}{N} \sum g(X_i) - \alpha \frac{1}{N} \sum h(X_i)$$
has smaller variance if $\alpha = \mathrm{cov}(g(X), h(X)) / \mathrm{var}(h(X))$.
How do we get $\alpha$? Estimate it from the values of $g(X_i)$ and $h(X_i)$.

Example on Final
Part a: $X \sim N(0, 1)$, $g(x) = \sin(x + 0.1)$.
Use $h(x) = -e^x + e^{1/2}$ because $E[-e^X] = -e^{1/2}$.
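A minimal sketch of Part a in R, under the setup above (the variable names and the value of N are illustrative):

```r
# Control variate estimate of E[sin(X + 0.1)] with h(x) = -e^x + e^(1/2)
set.seed(1)
N <- 1000
X <- rnorm(N)                    # X_i ~ N(0, 1)
g <- sin(X + 0.1)                # g(X_i)
h <- -exp(X) + exp(1/2)          # h(X_i); E[h(X)] = 0
alpha <- cov(g, h) / var(h)      # alpha estimated from the sampled values
mean(g) - alpha * mean(h)        # control variate estimate of E[g(X)]
```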

Comparisons
Parts b/c add antithetic sampling.
Note: for $X \sim N(0, 1)$, $(X, -X)$ are antithetic. We want to look at (compared in the sketch below):
1 Vanilla: $\frac{1}{N} \sum g(X_i)$
2 Control variates: $\frac{1}{N} \sum g(X_i) - \alpha \frac{1}{N} \sum h(X_i)$
3 Antithetic: $\frac{1}{N} \sum (g(X_i) + g(-X_i))/2$
4 Both: $\frac{1}{N} \sum (g(X_i) + g(-X_i))/2 - \alpha \frac{1}{N} \sum (h(X_i) + h(-X_i))/2$
For "both", calculate $\alpha$ from the values of $(g(X_i) + g(-X_i))/2$ and $(h(X_i) + h(-X_i))/2$.
Questions about the relative improvement are for you to think about. There is also a bonus on using $g(x) = \sin(x)$ or $h(x) = x$.
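A sketch of the four estimators in R, assuming the Part a setup (names are illustrative; the same draws are reused so the estimators are comparable):

```r
set.seed(1)
N <- 1000
X <- rnorm(N)
g <- function(x) sin(x + 0.1)
h <- function(x) -exp(x) + exp(1/2)
# 1 Vanilla
est1 <- mean(g(X))
# 2 Control variates
a2 <- cov(g(X), h(X)) / var(h(X))
est2 <- mean(g(X)) - a2 * mean(h(X))
# 3 Antithetic
gA <- (g(X) + g(-X)) / 2
est3 <- mean(gA)
# 4 Both: alpha computed from the antithetic averages
hA <- (h(X) + h(-X)) / 2
a4 <- cov(gA, hA) / var(hA)
est4 <- mean(gA) - a4 * mean(hA)
c(est1, est2, est3, est4)
```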

Control Functionals
Part d is taken from Oates, Girolami, and Chopin (2016).
Idea: if we can make $h_N(x)$ closer to $g(x)$ as $N \to \infty$, we will do better, faster.
Use $X_1, \dots, X_{N/2}$ to get a better $h_N(x)$, and $X_{N/2+1}, \dots, X_N$ for the control variate integral.
How? Nonparametric smoothing!
$$h_N(x) = \sum_j d_j k_j(x)$$
The $d_j$ make $h_N(x)$ approximate $g(x)$, but we know $\int k_j(x) f(x)\,dx = 0$.

Control Functional Details
Need $E k_j(X) = 0$: modify a kernel function
$$k_j(x) = \varphi'(x - X_j; s) + \varphi(x - X_j; s) \frac{f'(x)}{f(x)}$$
where $\varphi(z; s)$ is the normal density with standard deviation $s$ (dnorm(z, sd = s)):
$$\varphi(z; s) = \frac{1}{\sqrt{2\pi}\, s} e^{-z^2 / 2s^2}.$$
Then, by the product rule,
$$\int k_j(x) f(x)\,dx = \int \left( \varphi'(x - X_j; s) f(x) + \varphi(x - X_j; s) f'(x) \right) dx = \int \frac{d}{dx} \left[ \varphi(x - X_j; s) f(x) \right] dx$$
$$= \varphi(\infty - X_j; s) f(\infty) - \varphi(-\infty - X_j; s) f(-\infty) = 0.$$

Control Functional Implementation
1 Calculate a matrix to store, for $i, j = 1, \dots, N/2$,
$$K_{ij} = k_j(X_i) = \varphi'(X_i - X_j; s) + \varphi(X_i - X_j; s) \frac{f'(X_i)}{f(X_i)}$$
Remember $\varphi(z; s)$ is the $N(0, s)$ density and $f(x)$ is the $N(0, 1)$ density.
2 Obtain the $d_j$ by regressing $g(X_i)$ on the matrix $K_{ij}$. You may use the function lm. If a $d_j$ is NA, set it to zero.
3 Plug $X_{N/2+1}, \dots, X_N$ into
$$h_N(X_l) = \sum_j d_j k_j(X_l)$$
(note that the $X_j$ used to define the $k_j$ stay the same).
The $k_j$ are sort of kernel, sort of basis. There is no smoothing penalty because $g(X_i)$ has no error. A sketch follows below.
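A sketch of these steps in R, assuming g is a function and X, s, and N are defined as above. For $f$ the $N(0,1)$ density, $f'(x)/f(x) = -x$, and $\varphi'(z; s) = -(z/s^2)\varphi(z; s)$:

```r
kfun <- function(x, Xj, s) {
  # k_j(x) = phi'(x - Xj; s) + phi(x - Xj; s) * f'(x)/f(x)
  z <- x - Xj
  dnorm(z, sd = s) * (-z / s^2 - x)
}
n1 <- N / 2
K <- outer(X[1:n1], X[1:n1], kfun, s = s)    # K[i, j] = k_j(X_i)
d <- coef(lm(g(X[1:n1]) ~ K - 1))            # regress g(X_i) on K (no intercept here)
d[is.na(d)] <- 0                             # NA coefficients set to zero
# h_N at the held-out points X_{N/2+1}, ..., X_N; the X_j in k_j stay the same
K2 <- outer(X[(n1 + 1):N], X[1:n1], kfun, s = s)
hN <- K2 %*% d
```

Whether to include an intercept in the regression is a judgment call; dropping it keeps $E h_N(X) = 0$ exactly.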

Using g(x) = sin(x)
[Figure: $h_N(x)$ approximating $g(x) = \sin(x)$; panels for N = 5 and N = 50.]

Convergence Rates
Parts e/f: a simulation based on control functionals. The example is based on $g(x) = \sin(x)$, with control variate $h(x) = x$ (more effective than in parts a-c). Note that $s$ changes with $N$.
End of Part f:
$$\mathrm{MSE} \approx C N^{\alpha} \;\Rightarrow\; \log(\mathrm{MSE}) \approx \alpha \log(N) + \log(C),$$
which looks like a linear model (see the sketch below).
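The Part f fit, as a sketch (assumes vectors Ns and mse collected from the simulation):

```r
# Slope estimates alpha (the convergence rate); intercept estimates log(C)
fit <- lm(log(mse) ~ log(Ns))
coef(fit)
```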

Question 2: Kernel Density Estimates
Kernel Density Estimation, given $X_1, \dots, X_n$:
$$\hat{f}_s(x) = \frac{1}{n} \sum_{i=1}^{n} \varphi(x - X_i; s)$$
($\varphi(z; s)$ is still the $N(0, s)$ density).
Topics: efficient calculation, choice of $s$, and simulation.
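A sketch of an efficient (vectorized) evaluation of the KDE at a vector of points x, given data X and bandwidth s:

```r
# One outer() call builds the length(x)-by-length(X) matrix of kernel values;
# rowMeans averages over the data points to give f-hat at each x
kde <- function(x, X, s) {
  rowMeans(outer(x, X, function(a, b) dnorm(a - b, sd = s)))
}
```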

Cross Validation
Part b: to choose $s$, we want to make the density at a new point as high as possible.
$\hat{f}_s^{-i}(X_i)$: the density with bandwidth $s$, evaluated at $X_i$, estimated without using $X_i$.
$s$ small ⇒ peaks will miss new points; $s$ large ⇒ low density everywhere.
Use $\sum_i \log \hat{f}_s^{-i}(X_i)$, as in a log likelihood.
[Figure: possible densities and their CV scores.]
CV can be calculated efficiently, as in the sketch below.
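One efficient approach, as a sketch: build the full kernel matrix once, then remove each point's own contribution (its dnorm(0, sd = s) term) rather than refitting n times:

```r
cv.score <- function(X, s) {
  n <- length(X)
  K <- outer(X, X, function(a, b) dnorm(a - b, sd = s))
  loo <- (rowSums(K) - dnorm(0, sd = s)) / (n - 1)  # f-hat_s^{-i}(X_i)
  sum(log(loo))
}
# e.g. pick s maximizing the score over a grid (grid limits are illustrative):
# svals <- seq(0.05, 2, length.out = 50)
# svals[which.max(sapply(svals, cv.score, X = X))]
```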

Simulation
Part c: the form of the density is like
$$Z \mid k \sim N(X_k, s), \qquad k \sim \mathrm{Unif}(1, \dots, n)$$
1 Choose an observation at random
2 Simulate from $N(X_k, s)$
This can also be vectorized effectively.
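A vectorized sketch (rnorm accepts a vector of means, so no loop is needed):

```r
rkde <- function(N, X, s) {
  k <- sample(length(X), N, replace = TRUE)  # 1: random observation index
  rnorm(N, mean = X[k], sd = s)              # 2: draw from N(X_k, s)
}
```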

Part d: a measure of distance between densities,
$$\int \left( \sqrt{g(x)} - \sqrt{f(x)} \right)^2 dx = 2 - 2 \int \sqrt{g(x)} \sqrt{f(x)}\,dx = 2 - 2 A(g, f),$$
or use the affinity directly:
$$A(g, f) = \int \sqrt{g(x)} \sqrt{f(x)}\,dx = \int \sqrt{\frac{g(x)}{f(x)}}\, f(x)\,dx \approx \frac{1}{N} \sum_j \sqrt{\frac{g(Z_j)}{f(Z_j)}}, \qquad Z_j \sim f.$$
Statistical use: find $\theta$ to maximize the affinity between $\hat{f}(x)$ and $f(x; \theta)$.
Here we have $\hat{f}(x)$ from the KDE and $f(x; \theta)$ is $N(\theta, 1)$:
$$A(\theta) = \frac{1}{N} \sum_j \sqrt{\frac{\varphi(Z_j - \theta; 1)}{\hat{f}(Z_j)}}, \qquad Z_j \sim \hat{f}.$$

Optimization and Robustness
To find $\max_\theta A(\theta)$, use the optimize function between -10 and 10.
Part e: the reason for using Hellinger is robustness: adding a large value to the data doesn't change the estimate much.
Add an outlier O to the data (8 points instead of 7).
Don't change $s$ with the value of O (yes, this is cheating). Do simulate new Monte Carlo points with each value. A sketch follows.
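A sketch of Parts d/e, reusing the kde and rkde sketches above (the Monte Carlo size 1000 and the seed are illustrative):

```r
set.seed(1)
Z    <- rkde(1000, X, s)     # Monte Carlo points from f-hat
fhat <- kde(Z, X, s)         # f-hat(Z_j)
A <- function(theta) mean(sqrt(dnorm(Z - theta) / fhat))  # affinity estimate
optimize(A, interval = c(-10, 10), maximum = TRUE)$maximum
```

This version keeps Z fixed across $\theta$; for Part e, regenerate Z for each value of O.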

Hellinger’s Posterior
Hooker and Vidyashankar (2014) suggested finding a posterior based on
$$P(\theta) = e^{2 n A(\theta)} \pi(\theta)$$
Here we let $\pi(\theta)$ be $N(0, 10)$ and sample from $P(\theta)$ by MCMC.
For this question:
Keep the same MC samples within $A(\theta)$ for each $\theta$. Experiment to get acceptance to around 30%; decide on thinning from visual inspection (don't work too hard).
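A minimal random-walk Metropolis sketch (assumptions: A() and the data size n are defined as above; sd.prop is a tuning value you adjust toward ~30% acceptance; the N(0, 10) prior is read here as variance 10):

```r
log.post <- function(theta) {
  2 * n * A(theta) + dnorm(theta, 0, sqrt(10), log = TRUE)
}
sd.prop <- 0.5                         # assumed starting step size; tune it
theta <- 0; chain <- numeric(5000); acc <- 0
for (m in seq_along(chain)) {
  prop <- theta + rnorm(1, sd = sd.prop)
  if (log(runif(1)) < log.post(prop) - log.post(theta)) {
    theta <- prop; acc <- acc + 1      # accept the proposal
  }
  chain[m] <- theta
}
acc / length(chain)                    # acceptance rate; then thin by eye
```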

Stochastic Objective Functions
Monte Carlo integration ⇒ evaluation of $A(\theta)$ is random. Using the same samples makes the problem smooth, but still random.
[Figure: evaluations of $A(\theta)$ with new samples for each $\theta$ vs. 10 evaluations using the same samples.]
In MCMC, stochastic posteriors are OK, but they decrease the acceptance rate.

Question 3: Mixed Effects Logistic Models
Example data
12 Subjects, each measured 7 times
Record toenail health + time from treatment
Logistic model (can use plogis):
$$P(Y_{ij} = 1 \mid Z_j) = \frac{e^{\beta_0 + \beta_1 X_{ij} + Z_j}}{1 + e^{\beta_0 + \beta_1 X_{ij} + Z_j}}$$
$X_{ij}$ = time of visit, $Z_j$ = effect of subject $j$.

Generative Model
We don't get to see the subject effects $Z_j$; the model for a new data set is:
1 Generate new $Z_1, \dots, Z_{12}$ from $N(0, \sigma^2)$
2 Then generate $Y_{ij}$ from $P(Y_{ij} = 1 \mid Z_j)$
Since we don't see $Z_j$, the probability of the observed data is
$$P(Y_{1j}, \dots, Y_{7j}) = \int P(Y_{1j}, \dots, Y_{7j} \mid z)\, \varphi(z; \sigma)\, dz$$
where we can write
$$P(Y_{1j}, \dots, Y_{7j} \mid z) = \prod_{i=1}^{7} P(Y_{ij} \mid z) = \prod_{i=1}^{7} P(Y_{ij} = 1 \mid z)^{Y_{ij}} \left(1 - P(Y_{ij} = 1 \mid z)\right)^{1 - Y_{ij}}$$
$$= \prod_{i=1}^{7} \left[ Y_{ij} P(Y_{ij} = 1 \mid z) + (1 - Y_{ij}) \left(1 - P(Y_{ij} = 1 \mid z)\right) \right]$$
Use the most convenient form, or dbinom.

Gauss-Hermite Approximation
Part a: the ecoreg package's function GaussHermite(21) produces $z_q$, $w_q$ in columns Points and Weights, so that
$$\int P(Y_{1j}, \dots, Y_{7j} \mid z)\, \varphi(z; \sigma)\, dz \approx \sum_q w_q\, P(Y_{1j}, \dots, Y_{7j} \mid \sigma z_q)$$
[Figure: one subject's likelihood contributions at the $z_q$; circle size = $w_q^{0.05}$.]

log Likelihood
The parameter vector $\theta = (\beta_0, \beta_1, \sigma)$ has negative log likelihood
$$l(\theta) = -\sum_j \log P(Y_{1j}, \dots, Y_{7j} \mid \theta)$$
(Note that $\theta$ changes $P(Y_{1j}, \dots, Y_{7j})$ above; it was dropped from the notation to keep things cleaner.)
Write a function logistic.nll to evaluate it:
Obtain $P(Y_{ij} \mid z_q, X_{ij}, \theta)$ from plogis (including the $\sigma$ change to $z_q$).
Evaluate $P(Y_{1j}, \dots, Y_{7j} \mid z_q, X_{\cdot j}, \theta)$ from the formulas, or with dbinom.
Obtain
$$P(Y_{1j}, \dots, Y_{7j} \mid X_{1j}, \dots, X_{7j}, \theta) \approx \sum_q w_q\, P(Y_{1j}, \dots, Y_{7j} \mid z_q, X_{\cdot j}, \theta)$$
The negative log likelihood is minus the sum of the logs. A sketch follows.
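A sketch of logistic.nll (assumptions: the data frame columns y, time, and subject are illustrative names; GaussHermite comes from the ecoreg package as described above):

```r
library(ecoreg)
gh <- GaussHermite(21)   # columns Points (z_q) and Weights (w_q)

logistic.nll <- function(theta, data) {
  b0 <- theta[1]; b1 <- theta[2]; sigma <- theta[3]
  nll <- 0
  for (j in unique(data$subject)) {
    dj <- data[data$subject == j, ]
    # P(Y_1j, ..., Y_7j | sigma * z_q, X_.j, theta) at each quadrature point
    lik <- sapply(gh$Points, function(z) {
      p <- plogis(b0 + b1 * dj$time + sigma * z)
      prod(dbinom(dj$y, 1, p))
    })
    nll <- nll - log(sum(gh$Weights * lik))
  }
  nll
}
```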

Maximizing and Alternatives
The maximum likelihood estimator minimizes the negative log likelihood. Use
optim(par=theta, fn=logistic.nll, data=toenail)
to get the minimum.
Part b: a Monte Carlo alternative. Replace the Gauss-Hermite approximation with
$$\int P(Y_{1j}, \dots, Y_{7j} \mid z)\, \varphi(z; \sigma)\, dz \approx \frac{1}{N} \sum_{k=1}^{N} P(Y_{1j}, \dots, Y_{7j} \mid Z_k)$$
with $Z_1, \dots, Z_N \sim N(0, \sigma^2)$.
It is possible to code this so that only minimal changes from Part a are needed.
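As a sketch: if logistic.nll reads its points and weights from gh (as above), the Part b swap is two lines, since standard normal draws with equal weights give $\sigma \cdot \mathrm{Points} \sim N(0, \sigma^2)$ inside the function:

```r
Nmc <- 500   # illustrative Monte Carlo size
gh <- data.frame(Points = rnorm(Nmc), Weights = rep(1 / Nmc, Nmc))
```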

Part c: MCMC
Fix $\sigma = 1.2$ (MCMC techniques for variances are more fiddly).
Set priors $\beta_0 \sim N(0, 10)$ and $\beta_1 \sim N(0, 10)$.
It is easiest to work with the log posterior: log likelihood + log prior (remember log=TRUE in density functions); a sketch follows.
Start at $\theta = (-0.2, -0.2)$.
Use random walk proposals $N(\beta_0, 1)$ and $N(\beta_1, 0.2)$.
Obtain mean values and quantiles from the chain.
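A sketch of the Part c log posterior, reusing logistic.nll from above (the N(0, 10) priors are read here as variance 10, an assumption):

```r
logistic.lpost <- function(beta, data) {
  -logistic.nll(c(beta, 1.2), data) +            # sigma fixed at 1.2
    sum(dnorm(beta, 0, sqrt(10), log = TRUE))    # log prior on (beta0, beta1)
}
```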

Part d: Extended MCMC
What if we also include $Z_1, \dots, Z_{12}$ as values to be sampled? The whole likelihood is
$$P(Y, Z, \theta) = \prod_j \left[ \prod_i P(Y_{ij} \mid X_{ij}, Z_j, \theta) \right] P(Z_j \mid \sigma),$$
which is easier to work with in logs; keep $\sigma = 1.2$.
For MCMC, the proposal is
1 $\beta_0 \sim N(\beta_0, 0.25)$, $\beta_1 \sim N(\beta_1, 0.05)$
2 $Z_j \sim N(0, 0.5)$
The $Z_1$ chain gives $P(Z_1 \mid Y_{11}, \dots, Y_{71})$ (remember to thin).

Part e: SMC
An alternative way to generate random numbers from $f(z)$:
1 Generate $Z_1, \dots, Z_N \sim g(z)$
2 Resample (with replacement) using probabilities $W_i = f(Z_i)/g(Z_i)$ to get $Z_i^*$
sample has the input prob=W to specify sampling probabilities.
Our example:
$Z_i \sim N(0, \sigma^2)$
$W_i = P(Y_{\cdot 1} \mid Z_i, X_{\cdot 1}, \theta)$
$Y_{\cdot 1}$, $X_{\cdot 1}$ are the data from Subject 1, and $\theta$ is the estimate from Part a. A sketch follows.
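A sketch of the resampling step (assumptions: theta.hat = (β0, β1, σ) from Part a; y1 and x1 hold Subject 1's responses and visit times):

```r
N <- 1000
Z <- rnorm(N, mean = 0, sd = theta.hat[3])        # Z_i ~ N(0, sigma^2)
W <- sapply(Z, function(z) {
  p <- plogis(theta.hat[1] + theta.hat[2] * x1 + z)
  prod(dbinom(y1, 1, p))                          # W_i = P(Y.1 | Z_i, X.1, theta)
})
Zstar <- sample(Z, N, replace = TRUE, prob = W)   # resampled draws Z_i*
```

sample normalizes prob internally, so the $W_i$ do not need to sum to one.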

For a hard optimization problem:
1 Write a squared error function
2 Use the try function to avoid errors (see the sketch below)
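A sketch of the try pattern (sq.error and the fallback value 1e10 are illustrative):

```r
safe.obj <- function(par) {
  val <- try(sq.error(par), silent = TRUE)        # your squared error function
  if (inherits(val, "try-error")) 1e10 else val   # large value if it errors
}
```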

BTRY/STSCI 4520: Many Topics Covered Briefly
Computational considerations: programming strategies + computational complexity; numerical stability.
Numerics: optimization; random number generation; integration (numerical/Monte Carlo).
Statistics: simulation; bootstraps and permutation; nonparametric smoothing; maximum likelihood and LASSO penalties; MCMC.
Many tools enable practical, modern statistics.
