Stochastic Search and Optimisation
Assignment 4: Fitting a Gaussian Process using Stochastic Gradients REVISED VERSION
You will need to submit two files: a programme file titled YOUR NAME assign4.r or .py and a report as a pdf file titled YOUR NAME assign4.pdf.
The report should be typed in a 12 point font and should be no more than four pages long (ex- cluding figures). You must express yourself in your own words and acknowledge your sources: see the university rules on academic misconduct
Copyright By PowCoder代写 加微信 powcoder
Your code must be submitted as a single file, and must be properly documented, including clear instructions on how to run the code. Data files may be included separately. See the module website for restrictions on the external libraries you can use, and note that jupyter notebooks are not accepted.
Note that your assessment is based on the report! You must submit your code, and I will look at it to check how well you have documented it, but I expect you to explain what you have done and present your results in the report. I will usually only run your code if it is not clear what is going on from the report, in which case you will probably be losing marks. Of course, should I choose to run your code, it must be capable of reproducing the results presented in the report.
“The Tropical Rainfall Measuring Mission (TRMM) is a joint mission between NASA and the Japan Aerospace Exploration (JAXA) Agency to study rainfall for weather and climate research. The TRMM satellite ended collecting data on April 15, 2015 (see http://pmm.nasa.gov/ trmm/mission-end). Launched in late November 1997, with a design lifetime of 3 years, the TRMM satellite produced over 17 years of valuable scientific data. TRMM carried 5 instruments: a 3-sensor rainfall suite (PR, TMI, VIRS) and 2 related instruments (LIS and CERES). TRMM delivered a unique 17-year dataset of global tropical rainfall and lightning. The TRMM dataset became the space standard for measuring precipitation, and led to research that improved our understanding of tropical cyclone structure and evolution, convective system properties, lightning-storm relationships, climate and weather modeling, and human impacts on rainfall. The data also supported operational applications such as flood and drought monitoring and weather forecasting.”1
In this assignment we will fit a Gaussian process to TRMM rainfall data for August 2016, given in the file TRMM 3B43M 2016-08-01 rgb 1440×720.csv, available on the course website. This file is exactly as provided on the TRMM website (accessed 2022-03-18) https://neo.gsfc.nasa.gov/view.php?datasetId=TRMM_3B43M A colour-map of the data is given in Figure 1.
The TRMM data are from a Mercator projection of the globe, however we will ignore this fact and treat it as R2 data. That is, we will suppose that the (x, y)-co-ordinate for the observation in row i column j is just (j, i) (ignoring the unmeasured parts of the globe).
Important: to avoid numerical overflow you should scale down the data by a factor of 100. 1 https://trmm.gsfc.nasa.gov/
Figure 1: Monthly rainfall for August 2016 from the TRMM satellite.
1. (1 mark) Figure 2 gives an estimate of the semivariogram for our data, in the x-co-ordinate direction. Let z(a) be the observation at a and put Ar = {(a, b) : a = b ± (r, 0)} for r = 1,2,…, then the estimate for γ((r,0)) is
γˆ((r,0))=1 1 (z(a)−z(b))2 2 |Ar| (a,b)∈Ar
Based on this figure, which of the following semivariograms is most appropriate for our data?
spherical
exponential
rational quadratic Gaussian
2. (2 marks) Estimate the semivariogram in the y-co-ordinate direction, that is at points (0,r) for r = 1,2,…,200. Just give a plot of your estimate in the report.
Explain why this indicates the semivariogram is anisotropic.
We will fit an anisotropic version of the semivariogram you identified in Question 1. Using the notation from the lectures, to create an anisotropic version of your semivariogram do the following:
For h = (x, y), replace ∥h∥ with
Replace the scaling constant c with
x2 + φy2. x2 + c2y2
c1 x2+y2 .
So your semivariogram now has four parameters, which we collect together into a vector θ = (c1, c2, a, φ).
0 100 200 300 400 500 600 700
Figure 2: Estimated semivariogram in the x-co-ordinate direction.
This approach assumes that we have range and sill anisotropy that are aligned with our existing co-ordinate system. For a more general approach to anisotropy see for example Eriksson & Siska (2000)2
3 (3 marks) Use least squares to estimate a and c1 from the estimated semivariogram in the x-co-ordinate direction.
Fixing these parameters, use least squares to further estimate φ and c2 from the estimated semivariogram in the y-co-ordinate direction.
The following code may be useful:
loss1 <- function(theta, x, z) {
fx <- theta[1]*(1 - exp(-x/theta[2]))
sum((z - fx)^2)
theta1 <- optim(c(1.5, 100), loss1, x = 0:720, z = gamma_x)$par
Here gamma_x is a vector of length 721 giving the estimated semivariogram at points (0, 0), (1, 0), (2, 0), . . . , (720, 0) and fx calculates the exponential semivariogram at those points. A plot of the fitted function over the estimated semivariogram is given in Figure 3.
When writing this up you should give the form of the semivariogram restricted to each co-ordinate direction, and give plots of the estimated and fitted semivariograms in each co-ordinate direction.
2Eriksson, M. and Siska, P., Understanding Anisotropy Computations. Mathematical Geology 32, 2000.
gamma(||h||)
0.0 0.5 1.0 1.5
0 100 200 300 400 500 600 700
Figure 3: Estimated semivariogram in the x-co-ordinate direction, with least squares fit using the exponential variogram.
NEW CONTENT STARTS HERE
We now look at the problem of finding a Gaussian process to match our data. Unfortunately the sill anisotropy in the semivariogram—expressed by the direction dependence of the scaling constant c—means that our process can not be stationary. (Moreover it also seems that the presence of sill anisotropy also rules out a process with stationary increments, which we consid- ered in the previous version of the assignment.) Sill anisotropy can be caused by a non-constant mean, which turns out to be the case here.
Inspecting the data we fit a mean of the form EZ(x,y) = f(y) where Z(x,y) is the value of the process at (x,y) and f is a smooth function of y-co-ordinate. We then subtract the fitted mean to give a process W, which we hope will be stationary. We also scaled the process down by a factor of 100 to avoid numerical overflow later.
z <- as.matrix(read.csv(file = "TRMM_3B43M_2016-08-01_rgb_1440x720.csv",
header = FALSE, skip = 160, nrows = 400))
image(0:1440, 0:400, t(log(z)), col = hcl.colors(10, palette="Blues", rev=TRUE),
useRaster=TRUE, asp=1, xlab="", ylab="")
z.smooth <- smooth.spline(rowMeans(z), nknots = 10)$y
plot(1:400, rowMeans(z), xlab="y co-ord", ylab="mean", col="blue")
lines(1:400, z.smooth, col="red", lwd=2)
w <- (z - z.smooth)/100
image(0:1440, 0:400, t(w), col = hcl.colors(10, palette="Reds", rev=TRUE),
useRaster=TRUE, asp=1, xlab="", ylab="")
gamma(||h||)
0.0 0.5 1.0 1.5
0 500 1000 1500
0 100 200 300 400
Note that in the plots above the y-co-ordinate has been flipped (recall that we labelled (0, 0) the top-left corner).
0 200 400 0 200 400
50 100 200 300
If we look at the semivariogram of the de-meaned observations w in the x and y co-ordinate directions we see that the sill anisotropy has gone, though the range anisotropy—expressed by the direction dependence of the constant a—remains.
gamma_x <- semivariogram_x(w)
gamma_y <- semivariogram_y(w)
plot(c(0, max(length(gamma_x), length(gamma_y))),
c(0, max(gamma_x, gamma_y)), type = "n", xlab = "", ylab = "")
lines(0:(length(gamma_x)-1), gamma_x, col = "red")
lines(0:(length(gamma_y)-1), gamma_y, col = "blue")
0 200 400 600
We will model W as a stationary Gaussian process with mean 0 and covariance function Σ(u, v) = Σ0 − γA(u − v)
where Σ0 = Σ(u, u) and γA is an anisotropic semivariogram obtained by taking an isotropic semivariogram (in our case either the spherical, exponential, rational quadratic, or Gaussian)
and replacing the ∥h∥/a term with a term of the form h2x /a2x + h2y /a2y , where h = (hx , hy )T . Note that this is a special case of replacing ∥h∥2 with hT Ah where A is a positive semi-definite
matrix. In this case a = ax and
10 A= 0 a2/a2 .
In addition we will assume that there is no nugget (this is reasonable from looking at the data plot, which is smooth), which means that the semivariogram scaling constant c will be equal to Σ0. We will estimate Σ0 directly using the sample variance of w
Si00 <- var(as.vector(w))
Given Σ0 and thus c, our covariance function now depends only on the parameters ax and ay. 6
0.0 0.5 1.0 1.5
4 (2 marks) Write a function CovMat(x, theta) that takes an n × 2 matrix x with row i representing a point xi ∈ R2, and a parameter vector θ, and returns the covariance matrix Σ with (i,j)-th entry Σ(xi,xj).
Give your code in the report. Your function will also need the fitted value of Σ0; you can pass this as a parameter or hard-code it.
5 (2 marks) Here is some code for calculating the minus log likelihood of W observed at points x
minuslogL <- function(theta, Si00, x, y) {
# returns the essential bit of the minus log likelihood for a Gaussian process
# where we have observations y at points x (one point per row)
# the mean is 0 and the covariance matrix is given by CovMat(x, theta, Si00)
Si <- CovMat(x, theta, Si00)
return(as.vector(log(det(Si)) + t(y)%*%solve(Si, y)))
Use this to write a function minuslogLest that randomly samples k points in {1, . . . , 1440}× {1, . . . , 400}, and returns the minus log likelihood obtained using the observations at these points. Our function is a (shifted and scaled) approximation of the minus log likelihood for the whole sample.
Give your code in the report.
6 (8 marks) Use stochastic gradient descent to find θ that minimises the expected value of the function constructed in Question 5.
Don’t give your code in the report, but do explain your choices of
Sample size k for estimating the minus log likelihood. Starting point.
The sequences an and cn.
Stopping condition.
You should also plot traces of your estimated parameters. Hint: use common random numbers.
Remark: my final estimates for ax and ay were quite different to those obtained from least squares fitting with the estimated semivariogram (as in Question 3).
2 marks are reserved for presentation, clarity of expression, and documentation of code.
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com