Bayesian
In [ ]:

In [ ]:
using LinearAlgebra
using Statistics
using Random
using PyPlot
using Distributions
In [ ]:
seed = rand(UInt64)
Random.seed!(seed)
@show seed

Prior and data likelihood model ¶

The data $y = (y_1,\ldots, y_n)^T \in \mathbb R^n$ is modeled as $$ y = X\beta + \epsilon $$ where $\epsilon \sim \mathcal N(0,N)$ and $\beta \sim \mathcal N(0,\Sigma)$ are independent.

Suppose that $X$, $\Sigma$ and $N$ are known, where the columns of $X$ correspond to basis functions that explain some of the variation in $y$ (with $\epsilon$ representing the remaining variation in $y$).

Motivation¶

The model we explore here is just an artificial simulation that allows you to explore some aspects of Bayesian inference, regression and signal-plus-noise models. However, to really get a feel for what is going on, it is useful to have a mental toy model or storyline that you can imagine as you do the analysis.

A good (for me at least) storyline for this $y$ is that it represents temperature readings from a remote sensing device which takes readings at hourly intervals and sends the data via satellite at weekly intervals. Imagine that you're the meteorologist who invented this remote sensing device and air dropped it somewhere over Antarctica a week ago. So it has been running for the past week ($n = 7 \times 24$, i.e. $7$ days x $24$ hours) and just uploaded its data. Further suppose this device is extremely small, light and uses little power. To get the device this small and efficient you needed to give up some measurement accuracy. In this test run you're trying to demonstrate to the scientific community that one can use statistics to post-process the noisy weekly temperature readings to make accurate hindcasts and forecasts of temperature.

Set up parameters of the model¶

Dimension of $y$ and $\beta$
In [ ]:
n = 7*24
m = 5

Covariance matrix $N = var(\epsilon)$¶

From lab experiments it was determined that the detector noise is approximately Normal with expected value $0$ and variance $0.2$. The noise also appears independent across hourly intervals.
In [ ]:
diagN = fill(0.2, n)

However, on the last day of recording something happened to the remote sensing equipment which caused it to send back temperature readings that are extra noisy.
In [ ]:
diagN[6*24+1:end] .*= 50

Here is a plot of the noise variance profile over the past week.
In [ ]:
fig, ax = plt.subplots(1, figsize=(10, 8))
ax.plot(1:n, diagN, label="noise variance")
ax.plot(1:n, zeros(n), "k--")
ax.set_xlabel("hour")
ax.set_ylabel("variance")
ax.legend()

Now we put diagN on the diagonal of an $n \times n$ matrix
In [ ]:
N = Diagonal(diagN)
N½ = sqrt(N) # == Diagonal(sqrt.(diagN))

Construct $X$¶

The way we will estimate the temperature is to assume a linear model on hourly temperature. We need terms that represent the daily cycle of highs and lows, the weekly mean temperature and finally two terms that capture a drift of temperature (in an attempt to model heat waves and cold fronts moving in/out of the area). All this is put in the columns of $X$ with $\beta$ representing unknown loadings on these column basis vectors.
In [ ]:
Xcol1 = ones(n) # overall mean
Xcol2 = [i/n for i in 1:n] # drift term 1
Xcol3 = [(i/n)^2 for i in 1:n] # drift term 2
Xcol4 = [sin(2π*7*(i/n)) for i in 1:n] # daily cycle term 1
Xcol5 = [cos(2π*7*(i/n)) for i in 1:n] # daily cycle term 2

Now put them all together as columns of $X$.
In [ ]:
X = hcat(Xcol1, Xcol2, Xcol3, Xcol4, Xcol5)

Covariance matrix $\Sigma = var(\beta)$¶

This is the tricky part (which we will gloss over in this exercise). To do a Bayesian analysis of the coefficients $\beta$ on the columns of $X$ given a week's worth of data, we need a prior $p(\beta)$. Ideally $p(\beta)$ would describe the ensemble of possible $\beta$ which generate physically realistic weekly temperature trends (given what we know about the environment of the area). This takes a lot of thought. So suppose you just decide to model $\beta$ as Normal, mean $0$, with a diagonal covariance matrix $\Sigma$ as follows.
In [ ]:
diagΣ = fill(5.0,m)
diagΣ[2] = 0.5
diagΣ[3] = 0.25
Σ = Diagonal(diagΣ)
Σ½ = sqrt(Σ)

Two stage forward simulation ¶

Forward simulation step 1: simulate β¶
In [ ]:
β = Σ½ * randn(m)

Forward simulation step 2: given β simulate y¶
In [ ]:
ϵ = N½ * randn(n)
y = X * β + ϵ

Note: the data you get is just $y$, not $\beta$ or $\epsilon$¶
Both $\beta$ and $\epsilon$ are unknown to you. We have reserved $\beta$ so we can check how well our estimates and posterior samples infer it, but keep in mind that with real data you can never look at the true $\beta$, since only nature knows what it is/was.
In [ ]:
noise_free_y = X * β
day_cycle = X[:,4:5] * β[4:5]
week_drift = X[:,1:3] * β[1:3]
In [ ]:
fig, ax = plt.subplots(1, figsize=(10, 8))
t = [i/n for i in 1:n]
ax.plot(t, y, ".", label=L"observed $y$")
ax.plot(t, noise_free_y, label=L"noise free $y$")
#plot(t, day_cycle, label="day_cycle")
ax.plot(t, week_drift, label="week_drift")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature")
ax.legend()

Fisher MLE estimate of β¶
In [ ]:
#📌##############❓
β̂ =
In [ ]:
#📌##############❓
varβ̂ =
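
One possible completion of the two cells above (a sketch): the generalized least squares MLE $\hat\beta = (X^T N^{-1} X)^{-1} X^T N^{-1} y$ with sampling variance $(X^T N^{-1} X)^{-1}$, matching the formula used below for the fiducial simulations.
In [ ]:
# sketch: GLS/weighted MLE and its sampling variance
β̂ = (X' * (N \ X)) \ (X' * (N \ y))
varβ̂ = inv(X' * (N \ X))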

The way Frequentists model uncertainty is to think about the cloud of possible $\hat\beta$ one gets for different noise simulations $\epsilon \sim \mathcal N(0,N)$ with $\beta$ fixed. In a way, this is something like a stability analysis. You got one $\hat\beta$ and want to test how stable/unstable/sensitive it is to different realizations of the noise vector $\epsilon$.

However, to run a stability analysis with in-lab simulations you need to stipulate some $\beta$ to generate new hypothetical $y$ simulations. Suppose you just guess some fiducial $\beta$, call it $\beta_{fiducial}$, run the simulations with that, and measure the cloud of possible errors.
In [ ]:
βfid = [1.0, -0.4, 0.2, 1.0, 1.0]

Here is one simulation
In [ ]:
sim_yfid = X * βfid + N½ * randn(n)
In [ ]:
fig, ax = plt.subplots(1, figsize=(10, 8))
t = [i/n for i in 1:n]
ax.plot(t, sim_yfid, ".", label=L"simulated $y_{fiducial}$")
ax.set_title(L"A simulated data set (in-lab) with fiducial $\beta$")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature")
ax.legend()

Here are multiple simulations
In [ ]:
sims_yfid = X * βfid .+ N½ * randn(n, 1000)

Here are the MLE estimates of $\beta_{fiducial}$ for each simulation
In [ ]:
sims_β̂fid = (X' * (N \ X)) \ (X' * (N \ sims_yfid))

Here is the cloud of errors $\hat\beta_{fiducial} - \beta_{fiducial}$
In [ ]:
errors_β̂fid = sims_β̂fid .- βfid

Now we imagine that errors_β̂fid models the cloud of possible errors for the data we observed, i.e. that for our $\hat\beta$ the error $\hat\beta - \beta$ behaves like a draw from errors_β̂fid.

Let's look at pairwise plots of the cloud β̂[i] + errors_β̂fid
In [ ]:
i = 2 # choose β coordinate
j = 3 # choose β coordinate

β̂fidistr = latexstring("\$\\hat\\beta_$i +\$ error sim")
truβfidistr = latexstring("true \$\\beta_$i\$")
β̂fidjstr = latexstring("\$\\hat\\beta_$j + \$ error sim")
truβfidjstr = latexstring("true \$\\beta_$j\$")

fig, ax = plt.subplots(2, 2, figsize=(10, 8), sharex="col") # , sharey=true)
# column 1: βi on the xaxis
ax[1,1].set_xlabel(β̂fidistr)
ax[2,1].set_xlabel(β̂fidistr)
ax[2,1].set_ylabel(β̂fidjstr)
ax[1,1].hist(β̂[i] .+ errors_β̂fid[i,:], bins = 20, alpha = 0.5, label=β̂fidistr, density=true)
ax[1,1].axvline(β[i], color="r", label=truβfidistr)
ax[2,1].scatter(β̂[i] .+ errors_β̂fid[i,:], β̂[j] .+ errors_β̂fid[j,:], alpha = 0.1, label="MLE estimate + error sim")
ax[2,1].scatter(β̂[i], β̂[j], color="k", label="MLE estimate")
ax[2,1].scatter(β[i], β[j], color="r", label=L"true $\beta$")
# column 2: βj on the xaxis
ax[2,2].set_xlabel(β̂fidjstr)
ax[1,2].set_xlabel(β̂fidjstr)
ax[1,2].set_ylabel(β̂fidistr)
ax[2,2].hist(β̂[j] .+ errors_β̂fid[j,:], bins = 20, alpha = 0.5, label=β̂fidjstr, density=true)
ax[2,2].axvline(β[j], color="r", label=truβfidjstr)
ax[1,2].scatter(β̂[j] .+ errors_β̂fid[j,:], β̂[i] .+ errors_β̂fid[i,:], alpha = 0.1, label="MLE estimate + error sim")
ax[1,2].scatter(β̂[j], β̂[i], color="k", label="MLE estimate")
ax[1,2].scatter(β[j], β[i], color="r", label=L"true $\beta$")

ax[1,1].legend()
ax[1,2].legend()
ax[2,2].legend()
ax[2,1].legend()
fig.tight_layout()

Bayesian posterior inference for $\beta$¶

Posterior on $\beta|y \sim \mathcal N(E(\beta|y), var(\beta|y))$ ¶
In [ ]:
#📌 ################❓
Eβ𝕀y =
In [ ]:
#📌 ################❓
varβ𝕀y =
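
One possible completion of the two cells above (a sketch): for this conjugate Gaussian model, $var(\beta|y) = (\Sigma^{-1} + X^T N^{-1} X)^{-1}$ and $E(\beta|y) = var(\beta|y)\, X^T N^{-1} y$.
In [ ]:
# sketch: conjugate Gaussian posterior moments
varβ𝕀y = inv(inv(Σ) + X' * (N \ X))
Eβ𝕀y = varβ𝕀y * (X' * (N \ y))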

The way Bayesians think about $p(\beta|y)$ is that simulations from it represent the subset of possible true $\beta$ values (among all possible $\beta$ values quantified by $p(\beta)$) which are consistent with the observation $y$.

Here is one possible value for the true $\beta$… a single simulation from $p(\beta|y)$.
In [ ]:
#📌 ################❓
sim_pβ𝕀y =
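
A sketch of one way to complete this cell, drawing from $\mathcal N(E(\beta|y), var(\beta|y))$ via a Cholesky factor of the posterior covariance (Lβ𝕀y is a helper name introduced here):
In [ ]:
# sketch: sim = E(β|y) + Lβ𝕀y * z where Lβ𝕀y * Lβ𝕀y' = var(β|y)
Lβ𝕀y = cholesky(Symmetric(varβ𝕀y)).L
sim_pβ𝕀y = Eβ𝕀y + Lβ𝕀y * randn(m)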

Multiple independent simulations from $p(\beta|y)$, stored in the columns
In [ ]:
#📌 ################❓
nβsims = 1000
sims_β𝕀y =
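
A sketch of one way to complete this cell, reusing the Cholesky factor Lβ𝕀y from the sketch above:
In [ ]:
# sketch: each column is an independent draw from p(β|y)
sims_β𝕀y = Eβ𝕀y .+ Lβ𝕀y * randn(m, nβsims)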

Marginal histograms
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:m
ax.hist(sims_β𝕀y[i,:], alpha = 0.5, label="β_$i sims")
ax.axvline(β[i], color="red", alpha = 0.5)
end
ax.set_ylabel(“count”)
legend()

Marginal density estimates
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:m
ax.hist(sims_β𝕀y[i,:], alpha = 0.5, density=true, label="β_$i sims")
ax.axvline(β[i], color="red", alpha = 0.5)
end
ax.set_ylabel(“density”)
legend()

Pairwise plots
In [ ]:
i = 2 # choose β coordinate
j = 3 # choose β coordinate

βistr = latexstring("\$\\beta_$i\$")
truβistr = latexstring("true \$\\beta_$i\$")
βjstr = latexstring("\$\\beta_$j\$")
truβjstr = latexstring("true \$\\beta_$j\$")

fig, ax = plt.subplots(2, 2, figsize=(10, 8), sharex="col") # , sharey=true)
# column 1: βi on the xaxis
ax[1,1].set_xlabel(βistr)
ax[2,1].set_xlabel(βistr)
ax[2,1].set_ylabel(βjstr)
ax[1,1].hist(sims_β𝕀y[i,:], bins = 20, alpha = 0.5, label=βistr, density=true)
ax[1,1].axvline(β[i], color="r", label=truβistr)
ax[2,1].scatter(sims_β𝕀y[i,:], sims_β𝕀y[j,:], alpha = 0.1, label="posterior sample")
ax[2,1].scatter(β[i], β[j], color="r", label=L"true $\beta$")
# column 2: βj on the xaxis
ax[2,2].set_xlabel(βjstr)
ax[1,2].set_xlabel(βjstr)
ax[1,2].set_ylabel(βistr)
ax[2,2].hist(sims_β𝕀y[j,:], bins = 20, alpha = 0.5, label=βjstr, density=true)
ax[2,2].axvline(β[j], color="r", label=truβjstr)
ax[1,2].scatter(sims_β𝕀y[j,:], sims_β𝕀y[i,:], alpha = 0.1, label="posterior sample")
ax[1,2].scatter(β[j], β[i], color="r", label=L"true $\beta$")

ax[1,1].legend()
ax[1,2].legend()
ax[2,2].legend()
ax[2,1].legend()
fig.tight_layout()

Sample the posterior for $\beta_2 + 2\beta_3$ ¶
(which is the derivative of the drift $\beta_1 + \beta_2 t + \beta_3 t^2$ at $t = 1$, the last hour of the week)
In [ ]:
#📌 ################❓
sims_∂t1𝕀y =
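
A sketch of one way to complete this cell: apply the linear map $\beta \mapsto \beta_2 + 2\beta_3$ to each posterior draw.
In [ ]:
# sketch: push each posterior draw through the linear functional
sims_∂t1𝕀y = sims_β𝕀y[2,:] .+ 2 .* sims_β𝕀y[3,:]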
In [ ]:
fig, ax = plt.subplots(1)
ax.hist(sims_∂t1𝕀y, alpha = 0.5, label="posterior sims", density=true)
ax.axvline(β[2] + 2*β[3], color="r", alpha=0.5, label=L"true $\beta_2 + 2 \beta_3$")
ax.set_xlabel(L"$\beta_2 + 2 \beta_3$")
ax.set_ylabel("density")
legend()

Approximate $Pr(|\beta_2 + 2\beta_3| > 1)$ with samples¶
In [ ]:
sum(abs.(sims_∂t1𝕀y) .> 1) / length(sims_∂t1𝕀y)

Find the exact value of $Pr(|\beta_2 + 2\beta_3| > 1)$ via Gaussianity¶
In [ ]:
#📌 ################❓
μ = # <-- E(β2 + 2 * β3 | y)
In [ ]:
#📌 ################❓
v = # <-- var(β2 + 2 * β3 | y)
In [ ]:
Z = Normal(0,1)
ccdf(Z, 1/√v - μ/√v) + cdf(Z, -1/√v - μ/√v)

Sample the posterior for $|\beta_2| + |\beta_3|$¶
In [ ]:
#📌 ################❓
sims_abs𝕀y =
In [ ]:
fig, ax = plt.subplots(1)
ax.hist(sims_abs𝕀y, alpha = 0.5, label="posterior sims", density=true)
ax.axvline(abs(β[2]) + abs(β[3]), color="r", alpha=0.5, label=L"true $|\beta_2| + |\beta_3|$")
ax.set_ylabel("density")
ax.set_xlabel(L"|\beta_2| + |\beta_3|")
legend()

Approximate $Pr(|\beta_2| + |\beta_3| > 1)$ with samples¶
In [ ]:
#📌 ################❓
# compute the estimated probability here
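
One possible set of completions for the preceding cells (a sketch; the coefficient vector a is a helper introduced here so that $\beta_2 + 2\beta_3 = a^T\beta$):
In [ ]:
a = [0.0, 1.0, 2.0, 0.0, 0.0]
μ = a' * Eβ𝕀y          # E(β2 + 2β3 | y)
v = a' * varβ𝕀y * a    # var(β2 + 2β3 | y)
sims_abs𝕀y = abs.(sims_β𝕀y[2,:]) .+ abs.(sims_β𝕀y[3,:])
sum(sims_abs𝕀y .> 1) / length(sims_abs𝕀y)  # sample estimate of Pr(|β2| + |β3| > 1)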

Hindcast $X\beta$¶
In [ ]:
#📌 ################❓
sims_Xβ𝕀y =
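
A sketch of one way to complete this cell: push each posterior draw of $\beta$ through the design matrix.
In [ ]:
# sketch: columns are posterior draws of Xβ (an n × nβsims matrix)
sims_Xβ𝕀y = X * sims_β𝕀y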

Plot the posterior sims of $X\beta$
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:min(100,nβsims)
ax.plot(t, sims_Xβ𝕀y[:,i], "k", alpha=0.1)
end
ax.plot(t, y, ".", label="observed y")
ax.plot(t, X * β, label=L"true $X \beta$")
ax.plot(t, sims_Xβ𝕀y[:,1], "k", alpha=0.1, label=L"$X \beta$ posterior sim")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature")
ax.legend()

Here is another plot showing the differences $X\beta - X\beta_{true}$ where $\beta$ is simulated from the posterior.
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:min(500,nβsims)
ax.plot(t, sims_Xβ𝕀y[:,i] - X * β, "k", alpha=0.1)
end
ax.plot(t, sims_Xβ𝕀y[:,1] - X * β, "k", alpha=0.1, label=L"(X \beta_{sim}-X\beta_{true})")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature difference")
ax.legend()

Hindcast drift term X[:,1:3] * β[1:3]¶
In [ ]:
#📌 ################❓
sims_drift𝕀y =
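
A sketch of one way to complete this cell: the drift uses only the first three columns of $X$ and coordinates of $\beta$.
In [ ]:
# sketch: posterior draws of the drift term
sims_drift𝕀y = X[:,1:3] * sims_β𝕀y[1:3,:]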

Plot the posterior sims of X[:,1:3] * β[1:3]
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:min(100,nβsims)
ax.plot(t, sims_drift𝕀y[:,i], "k", alpha=0.1)
end
ax.plot(t, y, ".", label="observed y")
ax.plot(t, X[:,1:3] * β[1:3], label="true drift")
ax.plot(t, sims_drift𝕀y[:,1], "k", alpha=0.1, label="drift posterior sim")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature")
ax.legend()

Plot the posterior sims of X[:,1:3] * β[1:3] minus the true value
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:min(100,nβsims)
ax.plot(t, sims_drift𝕀y[:,i] - X[:,1:3] * β[1:3], "k", alpha=0.1)
end
ax.plot(t, sims_drift𝕀y[:,1] - X[:,1:3] * β[1:3], "k", alpha=0.1, label="drift posterior sim")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature difference")
ax.legend()

Forecast $X\beta$ into next week¶

The extended design matrix $X^\prime$ for predicting 1 week into the future is given by
In [ ]:
Ycol1 = ones(n)
Ycol2 = [1 + i/n for i in 1:n]
Ycol3 = [(1 + i/n)^2 for i in 1:n]
Ycol4 = [sin(2π*7*(1 + i/n)) for i in 1:n]
Ycol5 = [cos(2π*7*(1 + i/n)) for i in 1:n]
Y = hcat(Ycol1, Ycol2, Ycol3, Ycol4, Ycol5)
In [ ]:
t′ = [i/n for i in 1:2n]
X′ = [ X
Y ]

Plot the posterior sims of $X^\prime\beta$
In [ ]:
#📌 ################❓
sims_X′β𝕀y =
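
A sketch of one way to complete this cell: extrapolate each posterior draw with the extended design matrix $X^\prime$.
In [ ]:
# sketch: columns are posterior draws of X′β over two weeks (2n × nβsims)
sims_X′β𝕀y = X′ * sims_β𝕀y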
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:min(100,nβsims)
ax.plot(t′, sims_X′β𝕀y[:,i], "k", alpha=0.1)
end
ax.plot(t, y, ".", label="observed y")
#ax.plot(t′, X′ * β, label=L"true $X^\prime \beta$")
ax.plot(t′, sims_X′β𝕀y[:,1], "k", alpha=0.1, label=L"extrapolated $X \beta$ posterior sim")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature")
ax.legend()

Forecast drift term X′[:,1:3] * β[1:3]¶

Plot the posterior sims of the extrapolated drift X′[:,1:3] * β[1:3]
In [ ]:
#📌 ################❓
sims_drift′β𝕀y =
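
A sketch of one way to complete this cell: same as the drift hindcast, but with the extended design matrix.
In [ ]:
# sketch: posterior draws of the extrapolated drift term
sims_drift′β𝕀y = X′[:,1:3] * sims_β𝕀y[1:3,:]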
In [ ]:
fig, ax = plt.subplots(1, figsize=(14, 8))
for i ∈ 1:min(100,nβsims)
ax.plot(t′, sims_drift′β𝕀y[:,i], "k", alpha=0.1)
end
ax.plot(t, y, ".", label="observed y")
#ax.plot(t′, X′[:,1:3] * β[1:3], label="true extended drift")
ax.plot(t′, sims_drift′β𝕀y[:,1], "k", alpha=0.1, label="extrapolated drift posterior sim")
ax.set_xlabel(L"time $t$ ($t=1$ means one week)")
ax.set_ylabel("temperature")
ax.legend()