BTRY/STSCI 4520 Review of the Final
Reminders/Comments
This is a final exam; that means:
You should work on your own
We will answer clarification questions but not de-bug your code.
No extensions, but it is open book.
You may use any written resource, but only packages that we specifically refer to.
We are happy to explain what error messages mean, but you should isolate the line of code that generates them.
On the final
Structured like a homework; 3 questions, 6 parts each
Sub-parts are structured as (i): do the coding, (ii) and (iii): plot or comment on results.
Happy to tell you what code should do; written responses are up to you.
Question 1: Control Functionals
Reminder: control variates
Monte Carlo estimate if X_i ∼ f(·):
E g(X) = ∫ g(x) f(x) dx ≈ (1/N) Σ_i g(X_i)
If we know E h(X) = 0 and cor(g(X), h(X)) ≠ 0, then (1/N) Σ_i g(X_i) − α (1/N) Σ_i h(X_i)
has smaller variance if α = cov(g(X), h(X))/var(h(X)). α? Estimate it from the values of g(X_i) and h(X_i).
Example on Final
Part a: X ∼ N (0, 1), g (x ) = sin(x + 0.1)
Use h(x) = −e^x + e^{1/2} because E[−e^X] = −e^{1/2}.
Comparisons
Parts b/c with antithetic sampling
Note: for X ∼ N (0, 1), (X , −X ) are antithetic. We want to look at
1 Vanilla: (1/N) Σ_i g(X_i)
2 Control Variates: (1/N) Σ_i g(X_i) − α (1/N) Σ_i h(X_i)
3 Antithetic: (1/N) Σ_i (g(X_i) + g(−X_i))/2
4 Both: (1/N) Σ_i (g(X_i) + g(−X_i))/2 − α (1/N) Σ_i (h(X_i) + h(−X_i))/2
For Both, calculate α from the values of (g(X_i) + g(−X_i))/2 and (h(X_i) + h(−X_i))/2.
Questions about relative improvement are for you to think about. There is also a bonus on using g(x) = sin(x) or h(x) = x.
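A minimal R sketch of the four estimators above (variable names and the value of N are illustrative; g and h as defined in part a):

# Sketch: compare the four estimators of E g(X), X ~ N(0,1)
N <- 10000
g <- function(x) sin(x + 0.1)
h <- function(x) -exp(x) + exp(1/2)    # E h(X) = 0 for X ~ N(0,1)
X  <- rnorm(N)
gX <- g(X); hX <- h(X)
# 1 Vanilla Monte Carlo
est.vanilla <- mean(gX)
# 2 Control variates, with alpha estimated from the samples
alpha <- cov(gX, hX) / var(hX)
est.cv <- mean(gX) - alpha * mean(hX)
# 3 Antithetic: average g over the pair (X, -X)
gA <- (g(X) + g(-X)) / 2
est.anti <- mean(gA)
# 4 Both: alpha computed from the antithetic averages
hA <- (h(X) + h(-X)) / 2
alphaA <- cov(gA, hA) / var(hA)
est.both <- mean(gA) - alphaA * mean(hA)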
Control Functionals
Part d taken from Oates, Girolami, Chopin, 2016
Idea: if we can make h_N(x) closer to g(x) as N → ∞, we will do better faster.
Use X_1, …, X_{N/2} to get a better h_N(x), and X_{N/2+1}, …, X_N for the control variate integral.
How? Nonparametric smoothing!
h_N(x) = Σ_j d_j k_j(x)
The d_j make h_N(x) approximate g(x), but we know
∫ k_j(x) f(x) dx = 0
Control Functional Details
Need E k_j(X) = 0: modify a kernel function
k_j(x) = φ′(x − X_j; s) + φ(x − X_j; s) f′(x)/f(x)
where φ(z; s) is the normal density with standard deviation s (dnorm(z, sd=s)):
φ(z; s) = (1/(s√(2π))) e^{−z²/(2s²)}.
∫ k_j(x) f(x) dx = ∫ (φ′(x − X_j; s) f(x) + φ(x − X_j; s) f′(x)) dx
= ∫ d/dx [φ(x − X_j; s) f(x)] dx
= φ(∞ − X_j; s) f(∞) − φ(−∞ − X_j; s) f(−∞) = 0
Control Functional Implementation
1 Calculate a matrix to store, for i, j = 1, …, N/2,
K_ij = k_j(X_i) = φ′(X_i − X_j; s) + φ(X_i − X_j; s) f′(X_i)/f(X_i)
remembering that φ(z; s) is N(0, s) and f(x) is N(0, 1).
2 Obtain dj by regressing g(Xi) on the matrix Kij
You may use the function lm. If a d_j is NA, set it to zero.
3 Plug X_{N/2+1}, …, X_N into
h_N(X_l) = Σ_j d_j k_j(X_l)
(note that the X_j used to define k_j stay the same).
k_j: sort of a kernel, sort of a basis. No smoothing penalty is needed because g(X_i) is observed without error.
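A sketch of these steps in R, under the assumptions used throughout this question (X ∼ N(0, 1), so f′(x)/f(x) = −x and φ′(z; s) = −(z/s²) φ(z; s)); the names and the values of N and s are illustrative:

# Sketch: control functional h_N built from the first half of the sample
N <- 100; s <- 0.5
g <- function(x) sin(x)
X <- rnorm(N)
X1 <- X[1:(N/2)]; X2 <- X[(N/2 + 1):N]
# k_j(x) = phi'(x - X_j; s) + phi(x - X_j; s) f'(x)/f(x), with f = N(0,1) so f'(x)/f(x) = -x
kfun <- function(x, Xj, s) {
  z <- x - Xj
  -(z / s^2) * dnorm(z, sd = s) + dnorm(z, sd = s) * (-x)
}
# 1 Matrix K_ij = k_j(X_i) over the first half
K1 <- outer(X1, X1, kfun, s = s)
# 2 Regress g(X_i) on the columns of K1; NA coefficients set to zero
fit <- lm(g(X1) ~ K1)
d <- coef(fit)[-1]        # drop the intercept
d[is.na(d)] <- 0
# 3 Evaluate h_N on the second half (the X_j defining k_j stay the same)
K2 <- outer(X2, X1, kfun, s = s)
hN <- drop(K2 %*% d)
# Control variate estimate with alpha estimated from the second half
gX2 <- g(X2)
alpha <- cov(gX2, hN) / var(hN)
mean(gX2) - alpha * mean(hN)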
Using g(x) = sin(x)
[Figure: panels for N = 5 and N = 50]
Convergence Rates
Parts e/f: a simulation based on control functionals. The example uses g(x) = sin(x) with control variate h(x) = x (more effective
than in parts a-c). Note that s changes with N.
End of Part f:
MSE ≈ C N^α ⇒ log(MSE) ≈ α log(N) + log(C)
which looks like a linear model.
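In R the rate can then be estimated with a linear model on the log scale; a sketch, assuming vectors mse and Ns hold the simulated MSEs and the corresponding sample sizes:

# Sketch: estimate the rate alpha from simulated MSEs at several N (assumed objects mse, Ns)
fit <- lm(log(mse) ~ log(Ns))
coef(fit)[2]   # estimated alpha (slope on log N)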
Question 2: Kernel Density Estimates
Kernel Density Estimation: given X_1, …, X_n,
f̂_s(x) = (1/n) Σ_{i=1}^n φ(x − X_i; s)
(φ(z; s) is still the normal density with standard deviation s, as above).
Topics: efficient calculation, choice of s, simulation.
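For the efficient-calculation part, the whole estimate can be evaluated without loops using outer; a sketch (the function name kde is illustrative):

# Sketch: evaluate the KDE at a vector of points x without explicit loops
kde <- function(x, X, s) {
  # rows index evaluation points, columns index data points; average over columns
  rowMeans(outer(x, X, function(a, b) dnorm(a - b, sd = s)))
}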
Cross Validation
Part b: to choose s, we want to make the density at a new point as high as possible.
f̂_s^{−i}(X_i): the density with bandwidth s, evaluated at X_i, estimated without using X_i.
s small ⇒ peaks will miss new points; s large ⇒ low density everywhere.
Use Σ_i log f̂_s^{−i}(X_i), as in a log likelihood.
[Figure: possible densities and CV scores]
CV can be calculated efficiently; a sketch follows.
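One way to do this, assuming the KDE form above: build the n × n kernel matrix once and zero its diagonal to get every leave-one-out density at the same time (the name cv.score is illustrative).

# Sketch: leave-one-out CV score for bandwidth s (larger is better)
cv.score <- function(s, X) {
  n <- length(X)
  K <- outer(X, X, function(a, b) dnorm(a - b, sd = s))
  diag(K) <- 0                   # drop the i-th point from its own estimate
  loo <- rowSums(K) / (n - 1)    # hat f_s^{-i}(X_i)
  sum(log(loo))
}
# e.g. maximize over a range of bandwidths
# optimize(cv.score, interval = c(0.01, 2), X = X, maximum = TRUE)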
Simulation
Part c: The form of the density is like
Z|k ∼ N(Xk,s),k ∼ Unif(1,…,n)
1 Choose an observation at random
2 Simulate from N(Xk,s)
Can also be vectorized effectively, as in the sketch below.
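A vectorized sketch of that scheme (the name r.kde is illustrative):

# Sketch: draw N points from the KDE by picking an observation, then adding N(0, s) noise
r.kde <- function(N, X, s) {
  k <- sample(length(X), N, replace = TRUE)   # step 1: choose observations at random
  rnorm(N, mean = X[k], sd = s)               # step 2: simulate from N(X_k, s)
}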
Part d: A measure of distance between densities:
∫ (√g(x) − √f(x))² dx = 2 − 2 ∫ √(g(x) f(x)) dx = 2 − 2 A(g, f)
or use the affinity
A(g, f) = ∫ √(g(x) f(x)) dx = ∫ √(g(x)/f(x)) f(x) dx ≈ (1/N) Σ_j √(g(Z_j)/f(Z_j)), with Z_j ∼ f
Statistical use: find θ to maximize the affinity between f̂(x) and f(x; θ).
Here we have f̂(x) from the KDE and f(x; θ) is N(θ, 1):
A(θ) = (1/N) Σ_j √( φ(Z_j − θ; 1) / f̂(Z_j) )
with the Z_j simulated from f̂.
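A sketch of the Monte Carlo affinity and its maximization over [−10, 10] (used in the next part), assuming the kde and r.kde sketches above and Monte Carlo points Z drawn from f̂:

# Sketch: Monte Carlo affinity between the KDE and a N(theta, 1) model
affinity <- function(theta, Z, X, s) {
  mean(sqrt(dnorm(Z - theta, sd = 1) / kde(Z, X, s)))
}
# Keep the same Z for every theta, then maximize over [-10, 10]
# Z <- r.kde(1000, X, s)
# optimize(affinity, interval = c(-10, 10), Z = Z, X = X, s = s, maximum = TRUE)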
Optimization and Robustness
To find max_θ A(θ), use the optimize function between −10 and 10.
Part e: The reason for using Hellinger distance is robustness: adding a large value to the data doesn't change the estimate much.
Add O to data (8 points instead of 7)
Don’t change s with value of O (yes this is cheating). Do simulate new Monte Carlo points with each value.
Hellinger’s Posterior
Hooker and Vidyashankar, 2014 suggested finding a posterior based on
P(θ) = e^{2n A(θ)} π(θ)
Here we let π(θ) be N(0,10) and sample from P(θ) by MCMC
For this question:
Keep the same MC samples in A(θ) for each θ. Experiment to get the acceptance rate to around 30%; decide on thinning from visual inspection (don't work too hard).
Stochastic Objective Functions
Monte Carlo integration ⇒ the evaluation of A(θ) is random. Using the same samples makes the problem smooth, but still random.
[Figure: A(θ) with new samples for each θ vs. the same samples for each θ (10 samples)]
In MCMC, stochastic posteriors are ok, but decrease acceptance rate.
Mixed Effects Logistic Models
Example data
12 Subjects, each measured 7 times
Record toenail health + time from treatment
Logistic model (can use plogis)
P(Y_ij = 1 | Z_j) = e^{β0 + β1 X_ij + Z_j} / (1 + e^{β0 + β1 X_ij + Z_j})
X_ij = time of visit, Z_j = effect of subject j.
Generative Model
Don't get to see subject effects Z_j; the model for a new data set is
1 Generate new Z_1, …, Z_12 from N(0, σ²)
2 Then generate Y_ij from P(Y_ij = 1 | Z_j)
Since we don't see Z_j, the probability of the observed data for subject j is
P(Y_1j, …, Y_7j) = ∫ P(Y_1j, …, Y_7j | z) φ(z; σ) dz
where we can write
P(Y_1j, …, Y_7j | z) = Π_{i=1}^7 P(Y_ij | z)
= Π_{i=1}^7 P(Y_ij = 1 | z)^{Y_ij} (1 − P(Y_ij = 1 | z))^{1 − Y_ij}
= Π_{i=1}^7 [Y_ij P(Y_ij = 1 | z) + (1 − Y_ij)(1 − P(Y_ij = 1 | z))]
Use the most convenient form, or dbinom.
Gauss Hermite Approximation
Part a: The ecoreg package function GaussHermite(21) produces z_q and
w_q in columns Points and Weights so that
∫ P(Y_1j, …, Y_7j | z) φ(z; σ) dz ≈ Σ_q w_q P(Y_1j, …, Y_7j | σ z_q)
[Figure: one subject evaluated at the z_q; circle size = w_q^0.05]
log Likelihood
The parameter vector θ = (β0, β1, σ) has negative log likelihood
l(θ) = − Σ_j log P(Y_1j, …, Y_7j | θ)
(Note that θ changes P(Y_1j, …, Y_7j); it was dropped above to make the notation cleaner.)
Write function logistic.nll to evaluate:
1 Obtain P(Y_ij | z_q, X_ij, θ) from plogis (including the σ change to z_q, i.e., evaluate at σ z_q).
2 Evaluate P(Y_1j, …, Y_7j | z_q, X_·j, θ) from the formulas or dbinom.
3 Obtain P(Y_1j, …, Y_7j | X_1j, …, X_7j, θ) ≈ Σ_q w_q P(Y_1j, …, Y_7j | z_q, X_·j, θ).
The negative log likelihood is minus the sum of the logs.
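A sketch of logistic.nll following these steps. The data layout (columns y, time, subject) and passing zq, wq as extra arguments are illustrative assumptions; the exam version may organize this differently.

# Sketch: negative log likelihood via Gauss-Hermite, assuming zq, wq from the quadrature rule above
# and data with columns y (0/1), time (X_ij) and subject (j = 1, ..., 12)
logistic.nll <- function(theta, data, zq, wq) {
  b0 <- theta[1]; b1 <- theta[2]; sigma <- theta[3]
  nll <- 0
  for (j in unique(data$subject)) {
    dj <- data[data$subject == j, ]
    # likelihood of subject j's 7 observations at each quadrature point sigma * z_q
    Lq <- sapply(sigma * zq, function(z) {
      p <- plogis(b0 + b1 * dj$time + z)
      prod(dbinom(dj$y, size = 1, prob = p))
    })
    nll <- nll - log(sum(wq * Lq))
  }
  nll
}
# e.g. optim(par = c(-0.2, -0.2, 1), fn = logistic.nll, data = toenail, zq = zq, wq = wq)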
Maximizing and Alternatives
Maximum likelihood estimator minimizes negative log likelihood. Use
optim(par=theta,fn=logistic.nll,data=toenail)
to get minimum.
Part b: Monte Carlo alternative
Replace the Gauss-Hermite approximation with
∫ P(Y_1j, …, Y_7j | z) φ(z; σ) dz ≈ (1/N) Σ_{k=1}^N P(Y_1j, …, Y_7j | Z_k)
with Z_1, …, Z_N ∼ N(0, σ²).
Possible to code this so that there are minimal changes from Part a.
Part c: MCMC
Fix σ = 1.2 (MCMC techniques for variances are more fiddly). Set priors β0 ∼ N(0, 10), β1 ∼ N(0, 10).
Easiest to work with log posterior: log likelihood + log prior (remember log=TRUE in density functions)
Start θ = (−0.2, −0.2)
Use random walk proposal N(β0, 1) and N(β1, 0.2)
Obtain mean values and quantiles from chain.
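A random-walk Metropolis sketch for this part, assuming a function logistic.lpost(theta) returning log likelihood + log prior (illustrative name), and reading the proposal scales on the slide as standard deviations:

# Sketch: random walk Metropolis for (beta0, beta1) with sigma fixed at 1.2
n.iter <- 5000
chain <- matrix(NA, n.iter, 2)
theta <- c(-0.2, -0.2)
lp <- logistic.lpost(theta)
for (t in 1:n.iter) {
  prop <- theta + rnorm(2, sd = c(1, 0.2))    # N(beta0, 1) and N(beta1, 0.2) proposals
  lp.prop <- logistic.lpost(prop)
  if (log(runif(1)) < lp.prop - lp) { theta <- prop; lp <- lp.prop }
  chain[t, ] <- theta
}
colMeans(chain)                               # posterior means
apply(chain, 2, quantile, c(0.025, 0.975))    # posterior quantiles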
Part d: Extended MCMC
What if we also include Z_1, …, Z_12 as values to be sampled? The whole likelihood is
P(Y, Z | θ) = Π_j [ Π_i P(Y_ij | X_ij, Z_j, θ) ] P(Z_j | σ)
which is easier to work with in logs; keep σ = 1.2.
For MCMC the proposal is
1 β0 ∼ N(β0, 0.25), β1 ∼ N(β1, 0.05)
2 Z_j ∼ N(0, 0.5)
The Z_1 chain gives P(Z_1 | Y_11, …, Y_71) (remember to thin).
Part e: SMC
Alternative random number generation from f (z):
1 Generate Z_1, …, Z_N ∼ g(z)
2 Resample (with replacement) using probabilities W_i = f(Z_i)/g(Z_i) to get Z_i*
The R function sample has an input prob=W to specify the sampling probabilities.
Our example
Z_i ∼ N(0, σ²)
W_i = P(Y_·1 | Z_i, X_·1, θ)
Y_·1, X_·1 are the data from Subject 1; θ is the estimate from Part a.
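A sketch of the resampling step for Subject 1, assuming a function subj1.lik(z, theta) that returns P(Y_·1 | z, X_·1, θ) (e.g. a product of dbinom terms as in part a) and theta.hat from Part a:

# Sketch: importance resampling for Subject 1's random effect (assumed subj1.lik, theta.hat)
N <- 5000
sigma <- theta.hat[3]
Z <- rnorm(N, sd = sigma)                        # proposal g(z) = N(0, sigma^2)
W <- sapply(Z, subj1.lik, theta = theta.hat)     # weights W_i = P(Y.1 | Z_i, X.1, theta)
Zstar <- sample(Z, N, replace = TRUE, prob = W)  # resample with replacement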
For a hard optimization problem:
1 Write a squared error function
2 Use the try function to avoid errors stopping the run (sketch below)
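A small sketch of the try pattern (names illustrative), so that one failing evaluation does not stop the whole optimization:

# Sketch: wrap an error-prone objective so failures return a large value instead of stopping
safe.sse <- function(theta) {
  out <- try(sse(theta), silent = TRUE)          # sse() is the squared error function
  if (inherits(out, "try-error")) return(1e10)   # penalize parameter values that error
  out
}
# optim(par = start, fn = safe.sse)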
BTRY/STSCI 4520: Many Topics Covered Briefly
Computational considerations: programming strategies + computational complexity, numerical stability
Numerics: optimization, random number generation, integration (numerical/Monte Carlo)
Statistics: simulation, bootstraps and permutation, nonparametric smoothing, maximum likelihood and LASSO penalties, MCMC
Many tools enable practical, modern statistics.