Lecture 24 (April 16, 2019)¶
In today’s lecture, we discuss the general idea of variance reduction. We will focus on the antithetic method.
In [0]:
import numpy as np
import matplotlib.pyplot as plt
import math
import scipy.special as spsp
import scipy.stats as spst
Why is variance reduction important?¶
When we simulate a system, we generate system outputs, which are nothing but samples from a distribution. For example, we might generate samples from a uniform distribution; each sample is then considered a system output. Some more complicated examples are the average time a customer stays in the system (queueing system), the peak number of customers waiting in the system (queueing system), or the time it takes for us to run out of functional machines (inventory system). In all cases, the system outputs follow a specific distribution, whose distribution function we may or may not know.
One reason to draw these samples is to use the mean of the system outputs to approximate the population mean of the distribution of the system output.
In addition, when we have a lot of system outputs, we can rely on the central limit theorem to construct a confidence interval for the population mean:
$(\bar{x} -z_{1-\alpha/2}\frac{s}{\sqrt{n}}, \bar{x} + z_{1-\alpha/2}\frac{s}{\sqrt{n}})$
where $\bar{x}$ is the mean of all the system outputs,
$z_{1-\alpha/2}$ is the critical value corresponding to $\alpha$ (for a 95% confidence interval, $z_{1-\alpha/2}$ equals 1.96),
$s$ is the sample standard deviation of all the system outputs we have, and
$n$ is the number of system outputs we have.
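The critical value can be computed directly from the standard normal distribution using scipy.stats (imported above as spst). As a quick check for the 95% case:
In [0]:
#critical value z_{1-alpha/2} for a 95% confidence interval
alpha = 0.05
z = spst.norm.ppf(1-alpha/2)
print(z) #approximately 1.96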
Our goal is to find a narrower confidence interval, so that we will have more trust in our simulation result.
Strategies to have a narrow confidence interval¶
In order to have a narrower confidence interval, we have two options:
• Use a higher $n$. This requires more rounds of simulation.
• Have a lower $s$. This requires us to generate system outputs from another distribution that has the same expected value as the distribution of the current system output but a lower standard deviation.
Practice [first strategy]¶
For example, we might be interested in constructing a 95% confidence interval for the mean of Unif(0,1) based on sampling.
Let’s try both n=100 and n=1000.
In [0]:
#draw samples
size1=100
samples1=np.random.rand(size1)
size2=1000
samples2=np.random.rand(size2)
#generate confidence interval
mean1= np.mean(samples1)
var1= np.var(samples1,ddof=1)
lower1= mean1-1.96*var1**0.5/size1**0.5
upper1= mean1+1.96*var1**0.5/size1**0.5
mean2= np.mean(samples2)
var2= np.var(samples2,ddof=1)
lower2=mean2-1.96*var2**0.5/size2**0.5
upper2=mean2+1.96*var2**0.5/size2**0.5
print("Confidence interval based on 100 samples:(", lower1, "," , upper1,") ")
print("Confidence interval based on 1000 samples:(", lower2, "," , upper2,") ")
print("width of the first CI: ", upper1-lower1)
print("width of the second CI: ", upper2-lower2)
Confidence interval based on 100 samples:( 0.47489922410544366 , 0.5908748032085488 )
Confidence interval based on 1000 samples:( 0.4798765302929112 , 0.5159948676196072 )
width of the first CI: 0.11597557910310513
width of the second CI: 0.036118337326696004
Practice [second strategy]¶
Again, let’s assume that we are interested in constructing a 95% confidence interval for the mean of Unif(0,1) based on sampling.
Assume that we know that Unif(0.4, 0.6) has the same population mean as Unif(0,1) and we know how to sample from this distribution. We can simply sample from Unif(0.4, 0.6) to construct the confidence interval.
In [0]:
#draw samples
size=100
samples1=np.random.rand(size)
a=0.4
b=0.6
samples2= a+(b-a)*np.random.rand(size)
#generate confidence interval
mean1= np.mean(samples1)
var1= np.var(samples1,ddof=1)
lower1= mean1-1.96*var1**0.5/size**0.5
upper1= mean1+1.96*var1**0.5/size**0.5
mean2= np.mean(samples2)
var2= np.var(samples2,ddof=1)
lower2=mean2-1.96*var2**0.5/size**0.5
upper2=mean2+1.96*var2**0.5/size**0.5
print("Confidence interval based on 100 samples from Unif(0,1):( ", lower1, "," , upper1,") ")
print("Confidence interval based on 100 samples from Unif(0.4,0.6):( ", lower2, "," , upper2,") ")
print("width of the first CI: ", upper1-lower1)
print("width of the second CI: ", upper2-lower2)
Confidence interval based on 100 samples from Unif(0,1):( 0.4488362076012605 , 0.5598997596671492 )
Confidence interval based on 100 samples from Unif(0.4,0.6):( 0.4908713048983199 , 0.5120962455972926 )
width of the first CI: 0.11106355206588875
width of the second CI: 0.021224940698972683
Antithetic method¶
The antithetic method is one of the methods we can use to find a distribution that has the same mean but a much smaller variance compared to the distribution of our original system output.
For the regular method, we generate $x_1$, $x_2$, $x_3$, $\dots$ , $x_n$ independently.
The expected value and the variance for the sample mean are:
$E(\frac{x_1+x_2+\dots + x_n}{n})=\frac{1}{n}(E(x_1)+E(x_2)+ \dots +E(x_n) )=\frac{1}{n} (\underbrace{\mu+\mu+\dots+\mu}_\text{n elements})=\mu$
$Var(\frac{x_1+x_2+\dots + x_n}{n})=\frac{1}{n^2}(Var(x_1)+Var(x_2)+\dots+Var(x_n))=\frac{1}{n^2}(\underbrace{\sigma^2 + \sigma^2 + \dots + \sigma^2}_\text{n elements})=\frac{\sigma^2}{n}$
The antithetic method suggests that we split all the outputs into pairs: $(x_1,x_2), (x_3,x_4), \dots, (x_{n-1}, x_n)$. We still assume that all the system outputs are generated from the same distribution, but we generate $x_1$, $x_2$, $x_3$, $\dots$ , $x_n$ in a way such that the two elements within a pair are negatively correlated, while elements in different pairs are independent. We can then do the following:
Step 1:
generate $y_1=\frac{x_1+x_2}{2}, y_2=\frac{x_3+x_4}{2}, y_3=\frac{x_5+x_6}{2}, \dots, y_{\frac{n}{2}}=\frac{x_{n-1}+x_n}{2}$
Step 2:
The population mean can be estimated using $\bar{y}$.
The variance of the sample mean can be approximated using $\frac{\text{sample variance of } y}{n/2}$ (we divide by $n/2$ because there are only $n/2$ values of $y$). The variance of the sample mean in this case is smaller than $\frac{\sigma^2}{n}$.
Practice¶
In [0]:
#draw samples (this array contains x1, x3, x5, ..., x99)
size=50
samples= np.random.rand(size)
## this array contains x2, x4, x6, ..., x100
samples_anti= 1-samples
## generate y
y= (samples+samples_anti)/2
#generate confidence interval
mean1= np.mean(y)
var1= np.var(y, ddof=1)
lower1= mean1-1.96*var1**0.5/size**0.5
upper1= mean1+1.96*var1**0.5/size**0.5
print("Confidence interval using antithetic method:( ", lower1, "," , upper1,") ")
Confidence interval using antithetic method:( 0.5 , 0.5 )
Prove that this is a valid method¶
In order for this method to be valid, we need the following to be true:
(1) The expected value of the sample mean of $y$ is the same as $\mu$
(2) $\frac{\text{ variance of } y}{n/2}$ is smaller than $\frac{\sigma^2}{n}$.
Proof that the expected value of $\bar{y}$ is still $\mu$:
$E(\frac{y_1+y_2+\dots+y_{n/2}}{n/2})=\frac{2}{n}(\underbrace{E(y_1)+E(y_2)+ \dots + E(y_{n/2})}_{\frac{n}{2} \text{ terms}})$
Since $E(y_i)=E(\frac{x_{2i-1}+x_{2i}}{2})=\mu$, we have
$$E(\frac{y_1+y_2+\dots+y_{n/2}}{n/2})=\frac{2}{n}(\underbrace{\mu+\mu+ \dots + \mu}_{\frac{n}{2} \text{ terms}})=\mu$$
Proof that the variance of $\bar{y}$ (with $n/2$ samples) is smaller than the variance of $\bar{x}$ (with $n$ samples):
$Var(\frac{y_1+y_2+\dots+y_{n/2}}{n/2})=\frac{4}{n^2}(var(\frac{x_1+x_2}{2})+var(\frac{x_3+x_4}{2})+\dots+var(\frac{x_{n-1}+x_n}{2}))=\frac{1}{n^2}(var(x_1+x_2)+var(x_3+x_4)+\dots+var(x_{n-1}+x_n))$
If the elements within pairs were independent, we would have $Var(\frac{y_1+y_2+\dots+y_{n/2}}{n/2})=\frac{n\sigma^2}{n^2}=\frac{\sigma^2}{n}$.
However, since the elements within pairs are negatively correlated (so the covariances are negative), we have $var(x_1+x_2)=2\sigma^2+2cov(x_1,x_2)<2\sigma^2$, $var(x_3+x_4)=2\sigma^2+2cov(x_3,x_4)<2\sigma^2, \dots$
Thus, $Var(\frac{y_1+y_2+\dots+y_{n/2}}{n/2})<\frac{\sigma^2}{n}$
For example, in the previous practice, samples and samples_anti have perfect negative correlation, so $Cov(x_1,x_2)=-\sigma^2$.
This means $Var(x_1+x_2)=Var(x_1)+Var(x_2)+2\,Cov(x_1,x_2)=\sigma^2+\sigma^2-2\sigma^2=0$. In other words, each $y$ becomes a fixed value, which is simply the population mean of the distribution.
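We can verify this negative covariance numerically (a quick sketch; the samples are regenerated here so the cell is self-contained):
In [0]:
#verify that cov(x1,x2) = -Var(x1) for the uniform antithetic pairs
samples = np.random.rand(50)
samples_anti = 1-samples
c = np.cov(samples, samples_anti) #2x2 sample covariance matrix
print(c[0,1], -c[0,0]) #the off-diagonal entry equals -Var(samples)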
Summary¶
To recap: we still generate $x_1, x_2, x_3, \dots$ from the same distribution, but we find a way to generate $x_2$ based on $x_1$, $x_4$ based on $x_3$, and so on, such that the two elements of each pair are negatively correlated. We then compute $y_1=\frac{x_1+x_2}{2}, y_2=\frac{x_3+x_4}{2}, y_3=\frac{x_5+x_6}{2}, \dots, y_{\frac{n}{2}}=\frac{x_{n-1}+x_n}{2}$, and use the sample mean of $y$ and $\frac{\text{sample variance}(y)}{n/2}$ as our approximations for the expected value of $\bar{x}$ and the variance of $\bar{x}$.
The reason this method works is that the distribution of $y$ has the same (unknown) population mean but a much smaller variance. Thus, estimating the variance of the sample mean using $y$ gives us a much smaller value (and thus a much narrower confidence interval).
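As a small sketch, the whole recipe can be wrapped in a single helper function (antithetic_ci is a hypothetical name, not part of the lecture code; it assumes the caller supplies negatively correlated pairs):
In [0]:
#sketch of the antithetic recipe as a reusable helper
def antithetic_ci(samples, samples_anti, alpha=0.05):
    #samples and samples_anti are assumed to be negatively correlated
    #draws from the same distribution
    y = (samples+samples_anti)/2 #pair averages y_i
    m = len(y) #number of pairs, n/2
    z = spst.norm.ppf(1-alpha/2) #critical value, 1.96 for alpha=0.05
    mean_y = np.mean(y)
    se = (np.var(y, ddof=1)/m)**0.5 #standard error of the mean of y
    return mean_y-z*se, mean_y+z*se
For instance, antithetic_ci(samples_original, samples_anti) would reproduce the antithetic confidence interval for the exponential example below.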
Putting this into application¶
Useful Case 1:
If samples are drawn using the inverse transform method, we can generate $u\sim Unif(0,1)$ and use this $u$ to generate $x_1, x_3, \dots$. We then use $1-u$ to generate $x_2, x_4, \dots$. The two sets of samples will be from the same target distribution but negatively correlated, so we can use the antithetic method.
Practice¶
• Generate 100 samples of $x$ from Exp(0.1). Construct the 95% confidence interval for the mean of the distribution.
• Generate 50 samples of $x$ from Exp(0.1). Also generate the antithetic counterparts of these 50 samples. Construct the 95% confidence interval for the mean of the distribution.
In [0]:
size = 50
lmbda = 0.1
u = np.random.rand(size)
#inverse transform for Exp(0.1): x = -ln(u)/lambda
samples = -1/lmbda*np.log(u)
#antithetic counterparts generated from 1-u
anti_samples = -1/lmbda*np.log(1-u)
#pair averages
y = (samples+anti_samples)/2
sample_mean = np.mean(y)
sample_var = np.var(y,ddof=1)
upper = sample_mean+1.96*sample_var**0.5/size**0.5
lower = sample_mean-1.96*sample_var**0.5/size**0.5
print(upper,lower)
print(upper-lower)
11.420185555883695 8.60764727239377
2.812538283489925
In [0]:
#original method
lmbda= 0.1
size=100
u=np.random.rand(size)
samples= -1/lmbda*np.log(u)
#antithetic method
size1=50
u= np.random.rand(size1)
samples_original=-1/lmbda*np.log(u)
samples_anti=-1/lmbda*np.log(1-u)
y=(samples_original+samples_anti)/2
#generate confidence interval
mean1= np.mean(samples)
var1= np.var(samples, ddof=1)
lower1= mean1-1.96*var1**0.5/size**0.5
upper1= mean1+1.96*var1**0.5/size**0.5
mean2= np.mean(y)
var2= np.var(y, ddof=1)
lower2= mean2-1.96*var2**0.5/size1**0.5
upper2= mean2+1.96*var2**0.5/size1**0.5
print("Confidence interval based on regular method:( ", lower1, "," , upper1,") ")
print("Confidence interval based on anti-thetic method:( ", lower2, "," , upper2,") ")
print("width of the first CI: ", upper1-lower1)
print("width of the second CI: ", upper2-lower2)
Confidence interval based on regular method:( 9.209767804532513 , 14.323741346783626 )
Confidence interval based on antithetic method:( 8.985863763014528 , 10.443547273732204 )
width of the first CI: 5.113973542251113
width of the second CI: 1.4576835107176755
Useful Case 2:
When $x$ is generated from a normal distribution $N(\mu, \sigma^2)$, $2\mu-x$ will follow the same normal distribution and be negatively correlated with $x$. Thus, we can use $x$ and $2\mu-x$ as pairs.
Practice¶
Prove that $x$ and $2\mu-x$ form a valid antithetic pair.
In [0]:
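One possible sketch of an answer (not the official solution): if $x\sim N(\mu,\sigma^2)$, then $2\mu-x\sim N(\mu,\sigma^2)$ because the normal density is symmetric around $\mu$. In particular, $E(2\mu-x)=2\mu-\mu=\mu$, $Var(2\mu-x)=Var(x)=\sigma^2$, and $Cov(x,2\mu-x)=Cov(x,-x)=-Var(x)=-\sigma^2<0$, so the two elements of each pair are negatively correlated. A quick numerical check with assumed values $\mu=5$ and $\sigma=2$:
In [0]:
#numerical check with assumed values mu=5, sigma=2
mu = 5
sigma = 2
size = 50
x = np.random.normal(mu, sigma, size)
x_anti = 2*mu-x #antithetic partner, also N(mu, sigma^2)
print(np.corrcoef(x, x_anti)[0,1]) #perfect negative correlation, -1.0 up to floating point
print(np.mean((x+x_anti)/2)) #each pair average equals mu (up to floating point)
Note that, as in the uniform practice above, the pair averages collapse exactly to $\mu$; the pairing becomes more useful when $x$ feeds into a more complicated system output.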
Efficiency of the method¶
In order for this method to work, the cost to generate the correlated part should not be very high. Otherwise, we might as well just increase our sample size to achieve variance reduction.
This method also requires negative correlation between the two elements inside a pair. Positive correlation inside the pair will lead to a variance increase.
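As an illustrative sketch of this last point (with assumed parameters), reusing the same $u$ for both elements of a pair makes them perfectly positively correlated, and the resulting interval is typically wider than the one based on independent pairs:
In [0]:
#positively correlated pairs: both elements built from the same u
lmbda = 0.1
size = 50
u = np.random.rand(size)
x1 = -1/lmbda*np.log(u)
x2 = -1/lmbda*np.log(u) #same u, so correlation is +1
y_pos = (x1+x2)/2
#independent pairs for comparison
v = np.random.rand(size)
x2_ind = -1/lmbda*np.log(v)
y_ind = (x1+x2_ind)/2
for y in (y_pos, y_ind):
    se = (np.var(y, ddof=1)/size)**0.5
    print(2*1.96*se) #CI width; positively correlated pairs typically give the wider one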