We will consider a series of questions relating to an application of Bayesian inference to numerical analysis, specifically quadrature.
We are going to consider the function
\[ f(x) = \exp(-x^2) \]
and its definite integral
\[ Z = \int_{-\infty}^{\infty} f(x) \, dx. \]
The function f has no elementary antiderivative, so the calculation of Z is not straightforward. There is a famous method for computing Z with the trick of considering \( Z^2 \) instead, rewriting the resulting double integral in polar coordinates, and making a convenient substitution. If you haven't seen this, it's beautiful and worth checking out. The result is
\[ Z = \sqrt{\pi}. \]
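For reference, a sketch of the polar-coordinates argument mentioned above (this is the standard derivation; the notation here is mine):

```latex
Z^2
  = \left( \int_{-\infty}^{\infty} e^{-x^2} \, dx \right)
    \left( \int_{-\infty}^{\infty} e^{-y^2} \, dy \right)
  = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} e^{-(x^2 + y^2)} \, dx \, dy
  = \int_{0}^{2\pi} \int_{0}^{\infty} e^{-r^2} \, r \, dr \, d\theta
  = 2\pi \cdot \tfrac{1}{2} = \pi,
```

so \( Z = \sqrt{\pi} \); the convenient substitution is \( u = r^2 \) in the radial integral.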
We will consider modeling f with a Gaussian process prior distribution:
\[ p(f) = \mathcal{GP}(f; \mu, K), \]
and conditioning on the following set of data \( \mathcal{D} = (\mathbf{x}, \mathbf{y}) \):
\[ \mathbf{x} = (-2.5, -1.5, -0.5, 0.5, 1.5, 2.5); \qquad \mathbf{y} = \exp(-\mathbf{x}^2) = (0.0019305, 0.1054, 0.7788, 0.7788, 0.1054, 0.0019305). \]
We will fix the prior mean function to be identically zero: \( \mu(x) \equiv 0 \).
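As a quick sanity check, the quoted values of \( \mathbf{y} \) can be reproduced in a couple of lines (a sketch; the array names are mine):

```python
import numpy as np

# Observation locations and values for D = (x, y), with y = exp(-x^2).
x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
y = np.exp(-x**2)

print(np.round(y, 5))  # matches the values quoted above
```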
1. First, let us consider the question of model, specifically kernel, selection. Consider the following three choices for the covariance function K:
\[ K_1(x, x') = \exp\bigl(-(x - x')^2\bigr) \]
\[ K_2(x, x') = \exp\bigl(-\lvert x - x' \rvert\bigr) \]
\[ K_3(x, x') = \bigl(1 + \sqrt{3}\,\lvert x - x' \rvert\bigr) \exp\bigl(-\sqrt{3}\,\lvert x - x' \rvert\bigr) \]
Note that I am not parameterizing any of these kernels; please consider them to be fixed as given.
Each kernel defines a Gaussian process model for the data in a natural way:
\[ p(f \mid \mathcal{M}_i) = \mathcal{GP}(f; \mu, K_i). \]
Consider a uniform prior distribution over these models:
\[ \Pr(\mathcal{M}_i) = \tfrac{1}{3}, \qquad i \in \{1, 2, 3\}. \]
(a) Compute the log model evidence for each model given the data \( \mathcal{D} \) above.
(Footnote to the polar-coordinates trick above: this is commonly credited to Gauss, but the idea goes back at least to Poisson.)
(b) Compute the model posterior \( \Pr(\mathcal{M} \mid \mathcal{D}) \).
(c) Can you find a kernel with higher model evidence given the data above? I will award an extra-credit point to the person who provides the kernel with the highest evidence.
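A minimal sketch of how (a) and (b) could be computed, assuming noise-free observations and a small jitter term for numerical stability (the function and variable names here are mine, not part of the assignment):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def log_evidence(K, y, jitter=1e-10):
    """Log marginal likelihood of noise-free data y under a zero-mean GP:
    -1/2 y' K^{-1} y - 1/2 log|K| - n/2 log(2 pi)."""
    n = len(y)
    c, lower = cho_factor(K + jitter * np.eye(n), lower=True)
    alpha = cho_solve((c, lower), y)
    logdet = 2.0 * np.sum(np.log(np.diag(c)))
    return -0.5 * y @ alpha - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)

x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
y = np.exp(-x**2)
r = np.abs(x[:, None] - x[None, :])

# The three fixed kernels from the problem statement, as Gram matrices.
kernels = {
    "K1 (squared exponential)": np.exp(-r**2),
    "K2 (exponential)":         np.exp(-r),
    "K3 (Matern 3/2)":          (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r),
}

log_Z = np.array([log_evidence(K, y) for K in kernels.values()])
# (b) Model posterior under the uniform prior: normalized exponentiated
# log evidences, computed stably by subtracting the maximum first.
posterior = np.exp(log_Z - log_Z.max())
posterior /= posterior.sum()
```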
2. Now let's turn to prediction.
(a) For each kernel above, plot the predictive distribution over the interval \( x_* \in [-6, 6] \). For each model \( \mathcal{M}_i \), please plot, in a separate figure, the predictive mean of \( p(y_* \mid x_*, \mathcal{D}, \mathcal{M}_i) \) and a credible interval. These plots should be the result of a computer program. Please add legends, axes labels, etc., and plot the true function on the same interval for reference. You can take a look through the course materials to get an idea of the sort of plots I am looking for.
(b) In addition, please write out the predictive mean and standard deviation at \( x_* = 0 \) for each of the kernels, \( p(y_* \mid x_* = 0, \mathcal{D}, \mathcal{M}_i) \).
(c) What is the model-marginal predictive distribution \( p(y_* \mid x_*, \mathcal{D}) \)? Write this in terms of the model-conditional predictive distributions and the model posterior.
(d) Assume that the model posterior is uniform, \( \Pr(\mathcal{M}_i \mid \mathcal{D}) = \tfrac{1}{3} \) for all models i (this is not actually the case, if you are worried about your answer to (b)). Plot the model-marginal predictive mean \( \mathbb{E}[y_* \mid x_*, \mathcal{D}] \) over the interval \( x_* \in [-6, 6] \).
3. Let us consider a simple numerical estimate of the integral using the midpoint rule. Let \( \mathbf{x} \) be an evenly spaced grid of n points in the interval \( [-6, 6] \) with spacing \( \Delta \), starting with \( -6 + \tfrac{\Delta}{2} \) and ending with \( 6 - \tfrac{\Delta}{2} \), and let \( \mathbf{f} = f(\mathbf{x}) \). Then a midpoint-rule estimate of the integral is
\[ \hat{Z} = \sum_{i=1}^{n} f(x_i) \, \Delta. \]
(a) Show that \( \hat{Z} \) has a Gaussian distribution. What are its mean and variance?
(b) Take the limit of our belief about \( \hat{Z} \) as \( \Delta \to 0 \), assuming K can be integrated. Interpret the result in a broader context.
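The key fact behind (a) can be checked numerically: \( \hat{Z} \) is a linear functional of f, so under the GP posterior it is Gaussian with mean \( \Delta \sum_i m(x_i) \) and variance \( \Delta^2 \sum_{ij} C(x_i, x_j) \), where m and C are the posterior mean and covariance on the grid. A sketch under the \( K_1 \) model (names and the grid size are mine):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def midpoint_belief(x, y, kernel, n=200, lo=-6.0, hi=6.0, jitter=1e-10):
    """Gaussian belief over the midpoint-rule sum Z_hat = sum_i f(x_i) * Delta:
    mean = Delta * sum_i m(x_i), var = Delta^2 * sum_ij C(x_i, x_j), with m, C
    the GP posterior mean and covariance evaluated on the midpoint grid."""
    delta = (hi - lo) / n
    grid = lo + delta * (np.arange(n) + 0.5)   # the n cell midpoints
    K = kernel(x[:, None], x[None, :]) + jitter * np.eye(len(x))
    Ks = kernel(grid[:, None], x[None, :])
    Kss = kernel(grid[:, None], grid[None, :])
    L = cho_factor(K, lower=True)
    m = Ks @ cho_solve(L, y)
    C = Kss - Ks @ cho_solve(L, Ks.T)
    return delta * m.sum(), delta**2 * C.sum()

k1 = lambda a, b: np.exp(-(a - b)**2)
x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
y = np.exp(-x**2)
mean, var = midpoint_belief(x, y, k1)
# As Delta -> 0 this belief converges to the Bayesian-quadrature answer of
# question 4 (assuming, as in part (b), that K can be integrated).
```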
4. Now we will consider integration. Perform Bayesian quadrature to estimate the definite integral \( \int_{-6}^{6} f(x) \, dx \), using the model \( \mathcal{M}_1 \) from question 1. What are the predictive mean and standard deviation of \( p(Z \mid \mathcal{D}, \mathcal{M}_1) \)? Please give a numeric answer. How does this compare with the true answer?
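One way to compute this is in closed form: for \( K_1 \), both \( z_i = \int_{-6}^{6} K_1(s, x_i) \, ds \) and \( \Gamma = \iint K_1(s, t) \, ds \, dt \) can be written with the error function, and then \( Z \mid \mathcal{D} \) is Gaussian with mean \( \mathbf{z}^\top K^{-1} \mathbf{y} \) and variance \( \Gamma - \mathbf{z}^\top K^{-1} \mathbf{z} \). The closed forms below are my own derivation, so treat this as a sketch to check against your answer:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.special import erf

x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
y = np.exp(-x**2)
r = np.abs(x[:, None] - x[None, :])
K = np.exp(-r**2) + 1e-10 * np.eye(len(x))   # K1 Gram matrix + jitter

sqrt_pi = np.sqrt(np.pi)
# z_i = int_{-6}^{6} exp(-(s - x_i)^2) ds, via a substitution u = s - x_i.
z = 0.5 * sqrt_pi * (erf(6 - x) + erf(6 + x))
# Gamma = iint_{[-6,6]^2} exp(-(s - t)^2) ds dt, via the density of u = s - t.
w = 12.0                                     # width of the domain [-6, 6]
Gamma = sqrt_pi * w * erf(w) + np.exp(-w**2) - 1

L = cho_factor(K, lower=True)
mean = z @ cho_solve(L, y)                   # predictive mean of Z
var = Gamma - z @ cho_solve(L, z)            # predictive variance of Z
# Compare mean +/- sqrt(var) with the true value sqrt(pi) ~ 1.7725.
```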
5. Finally, we will consider a decision problem. Suppose we have already made some observations \( \mathcal{D} \). How can we select the most informative next observation \( (x, f(x)) \) to make? This is a decision problem where the action space, parametrizing the next observation location, is the domain: \( x \in \mathcal{X} \).
Suppose that we are to estimate Z with a point estimate \( \hat{Z} \), and that we have selected the squared loss
\[ \ell(\hat{Z}, Z) = (\hat{Z} - Z)^2. \]
(a) Given a set of observations \( \mathcal{D} \), what is the Bayesian optimal action? What is the expected loss of that action?
(b) Compute the expected loss of the Bayesian optimal action after adding a new observation to \( \mathcal{D} \) located at a point x. Plot this result as a function of \( x \in [-6, 6] \). What is the optimal location to measure the function next? (By symmetry there may be multiple equivalent answers.)
(c) Condition the function on an observation at the chosen location and plot the predictive distribution as in part (a). Recompute the predictive distribution for Z. Did our estimate improve?
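A sketch of the search in (b) under the \( K_1 \) model. Under squared loss the Bayes-optimal estimate is the posterior mean of Z and its expected loss is the posterior variance; for noise-free GP observations the updated variance does not depend on the value that will be observed, so the expected loss of measuring at each candidate location can be evaluated directly. The kernel-integral closed forms and all names here are my own, not the assignment's:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.special import erf

def posterior_var_Z(x, jitter=1e-8):
    """Posterior variance of Z = int_{-6}^{6} f(s) ds under the K1 model,
    using closed-form integrals of K1 (my derivation):
    var = Gamma - z' K^{-1} z, independent of the observed values y."""
    r = np.abs(x[:, None] - x[None, :])
    K = np.exp(-r**2) + jitter * np.eye(len(x))
    z = 0.5 * np.sqrt(np.pi) * (erf(6 - x) + erf(6 + x))
    w = 12.0
    Gamma = np.sqrt(np.pi) * w * erf(w) + np.exp(-w**2) - 1
    return Gamma - z @ cho_solve(cho_factor(K, lower=True), z)

x = np.array([-2.5, -1.5, -0.5, 0.5, 1.5, 2.5])
candidates = np.linspace(-6, 6, 241)
# Expected loss after adding one observation at each candidate location.
losses = np.array([posterior_var_Z(np.append(x, c)) for c in candidates])
best = candidates[np.argmin(losses)]   # one optimal next location (by symmetry
                                       # its mirror image is equally good)
```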