Numerical Methods & Scientific Computing: lecture notes
Stochastic simulation
Statistical errors
Statistical error of Monte Carlo estimates
Now that you’ve had some exposure to Monte Carlo simulation, we turn to the issue of the size of the variability in estimates from one run to another and how this varies with the number of repetitions of the random experiment in each run. This variability is called the statistical error in the estimate.
It is different from other kinds of error we shall meet shortly, which are those relevant for deterministic problems. In those cases, the answer is the same every time we run the calculation — but it's not the correct answer! Such errors are also present in stochastic estimates but they are dominated by the statistical errors (otherwise we wouldn't regard them as stochastic).
We call each simulated random experiment an instance or realization. The whole simulation consists of repetitions of such instances. Finally, we can run the simulation many times.
Variability of random samples
To explain how Monte Carlo estimates vary, we model the estimates from each repetition as observations of some quantity derived from the same random experiment, with each observation being independent of the rest.
We now treat the observations from each repetition in the simulation just as one would treat observations of an actual experiment, using the tools of statistics. Since typically it is easy to generate many repetitions in a simulation, it will be sufficient to use statistical methods suitable for large sample sizes. This will allow us to construct confidence intervals, i.e. error bars, by using the power of the Central Limit Theorem.
We refer to the document ‘StatsNotes.pdf’ for details and proofs, but describe the key ideas here.
Random variables
In any simulation, we could examine many possible properties of interest. A random variable is a way of summarizing the outcome of a random experiment so we can focus on some particular aspect of it.
Example: Assume we throw two dice. There are 36 outcomes in the sample space but we may only be interested in the total of the two faces showing, or the maximum value showing, or just in whether we threw a 'double' or not. So we can define 3 different random variables acting on the same sample space:
$X_1(\omega)$ = total of the two faces showing
$X_2(\omega)$ = maximum face showing
$X_3(\omega)$ = 1 if $\omega$ is a double, 0 otherwise.
$X_3$ is called an indicator random variable because it just indicates whether a 'double' has occurred.
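As a quick illustration (a minimal sketch; the function names are ours, not from the notes), the three random variables can be written as ordinary functions acting on one outcome $\omega$ of the experiment:

```python
import random

def throw_two_dice():
    """One instance of the random experiment: throw two fair dice."""
    return (random.randint(1, 6), random.randint(1, 6))

def X1(omega):  # total of the two faces showing
    return omega[0] + omega[1]

def X2(omega):  # maximum face showing
    return max(omega)

def X3(omega):  # indicator: 1 if omega is a 'double', 0 otherwise
    return 1 if omega[0] == omega[1] else 0

omega = throw_two_dice()
print(omega, X1(omega), X2(omega), X3(omega))
```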
Definition of a random variable
A random variable is a function from the sample space $\Omega$ of a random experiment to the real numbers $\mathbb{R}$. Random variables are usually denoted by upper case letters $X, Y, Z, \ldots$. So if $X : \Omega \to \mathbb{R}$ is a random variable, X assigns a real number $X(\omega)$ to each outcome $\omega \in \Omega$.
Example: Consider the random experiment of tossing a coin 3 times and observing the sequence of results. If Y = the number of heads obtained, then Y is a random variable.
ω      Y(ω)
hhh    3
thh    2
hth    2
hht    2
htt    1
tht    1
tth    1
ttt    0
Discrete versus continuous
In this example, the range of Y is $Y(\Omega) = \{0, 1, 2, 3\}$. This is an example of a discrete random variable, which means that its set of values is finite or countably infinite. Usually this will happen when the sample space itself is a finite or countably infinite set.
If the sample space is uncountably large and the set of values of X is an interval $(a, b) \subseteq \mathbb{R}$, then X is a continuous random variable.
Example: Some common discrete random variables: Bernoulli, binomial, geometric, Poisson
Example: The most common continuous random variable: normal.
Probability mass function
Since a random variable X assigns to each outcome in $\Omega$ a number, and each outcome has some probability, it follows that each value in the range of X has a probability. The probability distribution of a discrete random variable X is defined by its probability mass function (pmf)
$$p_X(x) = \Pr(X = x).$$
It has the following properties:
- $p_X(x)$ is non-zero at only a finite or countably infinite set of x values, say either $x_1, x_2, \ldots, x_n$ or $x_1, x_2, \ldots$;
- $p_X(x) \ge 0$;
- $\sum_x p_X(x) = 1$, where the sum is over all possible values of x.
Any function satisfying these conditions is said to be a probability mass function.
Continuous random variables have a similar function, called the probability density function, but where we integrate over a range of possible values.
Independent random variables
We can extend the idea of independence from events to random variables.
Random variables X and Y are independent if any event defined using X is independent of any event defined using Y, i.e. for any sets A and B, the events $\{X \in A\}$ and $\{Y \in B\}$ are independent.
Example: for independent experiments, such as tossing a coin and rolling a die at the same time, any random variable describing the coin toss will be independent of any random variable describing the die roll.
Now consider repeated random experiments of the same kind that are independent of each other.
Example: sampling from a large population with replacement (so the population is identical at each sampling).
Random sample from a distribution
We say a random sample from a distribution is a sequence of mutually independent random variables $X_1, X_2, \ldots, X_n$ with the same distribution. Also called an independent identically distributed (iid) sequence.
Given a random sample, we can estimate the pmf by
$$\hat{p}(x) = \frac{|\{i : X_i = x\}|}{n}.$$
We hope that $\hat{p}(x) \to p_X(x)$ as the sample size $n \to \infty$, in accordance with our notion of probability as a long-run frequency.
Example: When you simulate your random experiment n times (n repetitions), you are generating a random sample of size n from a population of possible simulation runs whose size is the period of the random number generator. So, unless you run your simulation an awful lot or your random number generator is poor, you can safely regard your repetitions as independent observations.
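A minimal sketch of this empirical pmf, using the total of two dice as the random variable (our illustrative choice; the helper name estimate_pmf is hypothetical):

```python
import random
from collections import Counter

def estimate_pmf(sample):
    """Empirical pmf: the fraction of observations equal to each value x."""
    n = len(sample)
    return {x: count / n for x, count in sorted(Counter(sample).items())}

n = 100_000
sample = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n)]
print(estimate_pmf(sample))  # should approach 1/36, 2/36, ..., 6/36, ..., 1/36
```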
Expectation
Faced with the varying results from many instances of a random experiment, we often want to know the average behaviour. This is given by the expectation or expected value of a random variable.
Let X be a discrete random variable with set of possible values D and pmf $p_X(x)$. Then the expected value or mean value of X, denoted by $E(X) = \mu_X$, is
$$E(X) = \mu_X = \sum_{x \in D} x \cdot p_X(x).$$
Thus $E(X)$ is the weighted sum of the values of X, where the weight of x is its probability $p_X(x)$. It represents the centre of mass of the probability distribution.
Example: If Y is the number of heads in three coin tosses,
$$E(Y) = 0 \cdot \tfrac{1}{8} + 1 \cdot \tfrac{3}{8} + 2 \cdot \tfrac{3}{8} + 3 \cdot \tfrac{1}{8} = \tfrac{12}{8} = \tfrac{3}{2}.$$
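The same weighted sum as a one-line check of the arithmetic:

```python
# pmf of Y = number of heads in three fair coin tosses
pmf = {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
EY = sum(x * p for x, p in pmf.items())
print(EY)  # 1.5
```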
Expectation is linear
Generally, if X is a discrete random variable with pmf $p_X(x)$ and $h(x)$ is any function, then the expected value of the discrete random variable $h(X)$ is
$$E[h(X)] = \sum_{x \in D} h(x) \cdot p_X(x).$$
If the function h is linear, i.e. $h(x) = ax + b$, then we have the special property that $E(h(X)) = h(E(X))$ (for a general h this is usually not true!).
If $a, b \in \mathbb{R}$ are constants,
$$E(aX + b) = aE(X) + b.$$
Variance is not linear
We also want to know how much spread there is about the mean μ.
The variance of X, denoted $\mathrm{var}(X)$ or $\sigma_X^2$ or just $\sigma^2$, provides a measure of spread or variability or dispersion. It is defined by
$$\mathrm{var}(X) = \sum_{x \in D} (x - \mu)^2 \cdot p_X(x) = E\big((X - \mu)^2\big)$$
and measures how close the distribution is to its mean.
If all possible values of X are near μ then var(X) is small, while if the spread is large so is var(X).
$$\mathrm{var}(aX + b) = a^2 \cdot \sigma_X^2 = a^2\,\mathrm{var}(X)$$
Notice that $\mathrm{var}(X + b) = \mathrm{var}(X)$, reflecting the fact that the variance is unchanged by a simple translation.
So $\mathrm{var}(aX + b)$ is quadratic in a but not a function of b.
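An empirical check of these variance rules, as a sketch with our own choices of a, b and a single die roll for X:

```python
import random

def var(sample):
    """Population-style variance of a list of observed values."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / len(sample)

n = 200_000
xs = [random.randint(1, 6) for _ in range(n)]  # X = one fair die, var(X) = 35/12
a, b = 3, 7
print(var(xs))                       # ≈ 2.9167
print(var([a * x + b for x in xs]))  # ≈ 9 * 2.9167 = 26.25, independent of b
```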
Standard deviation
The standard deviation of X is
$$\sigma = \sigma_X = \sqrt{\mathrm{var}(X)} = \sqrt{\sigma_X^2}.$$
We often use the standard deviation as a scale for the variation in X since it has the same units as X.
From the properties of the variance, it follows that
$$\sigma_{aX+b} = |a| \cdot \sigma_X.$$
We will use a suitable standard deviation to measure the statistical error of our Monte Carlo estimates.
Statistics
Consider taking a random sample of size n from a population. Before the sample we are uncertain about what the value of each of the n observations will be. The first observation must be considered as a random variable X1, the second observation another random variable X2 and so on.
After taking a sample, each observation is a number
$x_1, x_2, \ldots, x_n$
These numbers are the observed values of the $X_i$. We assume the population distribution is not known and we want to make inferences about this distribution.
We next have to decide what information about the population we want and how to find it from some function of the random variables $X_1, \ldots, X_n$ — called a statistic. A statistic that is used to estimate a parameter or characteristic of a population is called an estimator. We use the notation $\hat{\theta}$ to represent a statistic which estimates a parameter $\theta$.
Estimators
For a statistic $\hat{\theta}$ estimating $\theta$ to be useful, it should be:
1. accurate: the expected value of the statistic, $E(\hat{\theta})$, should be close to the parameter $\theta$;
2. precise: $\mathrm{var}(\hat{\theta})$ should not be too big.
In particular, an estimator $\hat{\theta}$ is unbiased if $E(\hat{\theta}) = \theta$.
Usually we prefer unbiased estimators, unless this causes a big price to be paid in terms of $\mathrm{var}(\hat{\theta})$.
In the context of simulations, each random experiment is an observation and the relevant property of each instance is an observed value.
Sample mean
An important statistic in simulations is the sample mean. The sample mean is the random variable defined by
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,$$
which is a natural estimator $\hat{\mu}$ of the population mean $\mu$.
After the sample is taken, the corresponding point estimate for the population mean is the single number
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,$$
which is the observed value of $\bar{X}$ in the particular sample.
The sample mean is generally how we compute the average behaviour of a random experiment, by combining the observed values from each instance.
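A minimal sketch of a Monte Carlo estimate by the sample mean (the experiment, the total of two dice, is our illustrative choice):

```python
import random

n = 100_000  # number of repetitions
observations = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n)]
xbar = sum(observations) / n
print(xbar)  # ≈ 7, the exact expected total of two dice
```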
The sample mean is unbiased
The important result
$$E(X + Y) = E(X) + E(Y)$$
is true for any random variables, discrete or continuous, provided the expected values exist. Combining this with the linearity result shows that
$$E(a_1 X_1 + \cdots + a_n X_n) = a_1 E(X_1) + \cdots + a_n E(X_n).$$
If each $E(X_i) = \mu$ (as is the case for a random sample), then
$$E(\bar{X}) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n}(n\mu) = \mu.$$
So the expected value of the sample mean (an estimator of the parameter $\mu$) is just $\mu$ itself.
The sample mean is an unbiased estimator of the population mean.
But how precise is it?
Variance of the sample mean
For independent random variables X, Y:
$$\mathrm{var}(X + Y) = \mathrm{var}(X) + \mathrm{var}(Y)$$
Now we can find $\mathrm{var}(\bar{X})$:
$$\mathrm{var}(\bar{X}) = \mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(X_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma_X^2 = \frac{1}{n}\sigma_X^2$$
If $X_1, \ldots, X_n$ is a random sample (independent and identically distributed) from a distribution with variance $\sigma_X^2$, then
$$\mathrm{var}(\bar{X}) = \frac{\sigma_X^2}{n}.$$
This means that the variability of the sample mean falls as the sample size increases. A large sample size gives a more precise estimate (as you would expect).
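A sketch demonstrating this decay, using Uniform(0,1) observations (an assumption of ours, chosen because $\sigma_X^2 = 1/12$ is known exactly):

```python
import random
import statistics

def sample_mean(n):
    return sum(random.random() for _ in range(n)) / n  # X_i ~ Uniform(0,1)

for n in (10, 100, 1000):
    means = [sample_mean(n) for _ in range(2000)]
    # standard deviation of the sample mean should be ≈ sqrt(1/(12n))
    print(n, statistics.stdev(means), (1 / (12 * n)) ** 0.5)
```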
Sample proportion
An important special case is where the random variables $X_i$ are indicator variables of some property of the population. Then $E(X_i) = p$, the probability of observing that property in the population. The sample mean of these indicator variables is just the sample proportion $\hat{P}$ (the proportion of trials with the property in question).
As above, we get
$$E(\hat{P}) = E\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} p = \frac{1}{n}(np) = p.$$
The sample proportion is an unbiased estimator of the population proportion (probability).
Variance of sample proportion
The sample proportion is special because we know everything about its variance, just by knowing its mean!
Since $\hat{P} = \frac{1}{n}\sum_{i=1}^{n} X_i$ where the $X_i$ are Bernoulli random variables:
$$\mathrm{var}(\hat{P}) = \mathrm{var}\Big(\frac{1}{n}\sum_{i=1}^{n} X_i\Big) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{var}(X_i) = \frac{1}{n^2} \cdot np(1-p) = \frac{p(1-p)}{n}$$
If $X_1, \ldots, X_n$ is a random sample of some property with probability p, then
$$\mathrm{var}(\hat{P}) = \frac{p(1-p)}{n}.$$
Again, the variability of the sample proportion falls as the sample size increases. This explains qualitatively the observations from Lecture 5.
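A sketch checking this formula, with 'throwing a double' (p = 1/6) as the property, our illustrative choice:

```python
import random
import statistics

def sample_proportion(n):
    """Proportion of 'doubles' in n throws of two dice (p = 1/6)."""
    return sum(random.randint(1, 6) == random.randint(1, 6)
               for _ in range(n)) / n

n = 1000
props = [sample_proportion(n) for _ in range(5000)]
print(statistics.variance(props))  # empirical variance of the proportion
print((1/6) * (5/6) / n)           # theoretical p(1-p)/n ≈ 1.39e-4
```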
Standard errors
The standard deviation of an estimator is also called the standard error. It is the standard error that gives a measure of the statistical error.
Example: The sample mean has standard error
$$\frac{\sigma_X}{\sqrt{n}}.$$
Note that we don't know $\sigma_X$ without further work.
Example: The sample proportion has standard error
$$\sqrt{\frac{p(1-p)}{n}}.$$
We know this once we have an estimate for p.
To get a more quantitative description, we must know the distribution of the estimator.
Central Limit Theorem
This is where we use large-sample statistics — if the random sample is large enough, then a sum of many independent random variables is well-approximated by a normal random variable. The sample mean (and sample proportion) are special cases.
Let $X_1, \ldots, X_n$ be a random sample from a distribution with mean $\mu$ and variance $\sigma^2$. If n is sufficiently large, then $\bar{X}$ has approximately a normal distribution with mean $\mu$ and variance $\sigma^2/n$.
Thus approximately $\bar{X} \overset{d}{=} N(\mu, \sigma^2/n)$ for large n, no matter what the distribution of the $X_i$!
This remarkable result explains the widespread occurrence of the normal distribution in many different circumstances — any random variable that arises as the sum of many independent random variables is approximately normally distributed.
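A sketch of the CLT in action: sample means of a decidedly non-normal (exponential) distribution, an example of our choosing:

```python
import random
import statistics

def mean_of_exponentials(n, lam=1.0):
    return sum(random.expovariate(lam) for _ in range(n)) / n

n = 50
means = [mean_of_exponentials(n) for _ in range(5000)]
print(statistics.mean(means))   # ≈ 1, the exponential mean
print(statistics.stdev(means))  # ≈ 1/sqrt(50) ≈ 0.1414, as the CLT predicts
```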
Confidence intervals
Suppose we have an estimator $\hat{\theta}$ of a parameter $\theta$ with a known sampling distribution. Then we can find a random interval $(L, U)$ which has a fixed probability $1 - \alpha$ of including the fixed but unknown parameter $\theta$:
$$\Pr(\theta \in [L, U]) = 1 - \alpha$$
We call $(L, U)$ the $100(1-\alpha)\%$ confidence interval (CI).
We will take $\alpha = 0.05$, i.e. a 95% (two-sided) confidence interval.
Example: For a 95% CI, we want $\Pr(\hat{\theta} < L) = 0.025$ and $\Pr(\hat{\theta} < U) = 0.975$. L and U are each random variables.
Coverage
Example: For a normal RV with variance $\sigma^2$, $L = \mu - 1.96\sigma$ and $U = \mu + 1.96\sigma$.
Now we take a sample $\{x_1, \ldots, x_n\}$ to find an observed value of $\hat{\theta}$ and hence observed values of L and U — call them l and u.
We also call the interval $(l, u)$ a $100(1-\alpha)\%$ confidence interval (CI) for $\theta$, BUT IT IS NOT TRUE that
$$\Pr(l < \theta < u) = 1 - \alpha$$
since $\theta$ is fixed — either $\theta$ is in $(l, u)$ or it isn't!
But if you repeated this process many times, the set of intervals $(l_i, u_i)$ would contain $\theta$ about $100(1-\alpha)\%$ of the time.
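A sketch of this repeated-coverage interpretation, assuming Uniform(0,1) data (true mean 0.5) and the large-sample CI for the mean from the following slides:

```python
import random
import statistics

def ci_for_mean(n):
    """Approximate 95% CI for the mean, using the sample std dev."""
    xs = [random.random() for _ in range(n)]
    xbar, s = statistics.mean(xs), statistics.stdev(xs)
    half = 1.96 * s / n ** 0.5
    return xbar - half, xbar + half

mu, runs, n = 0.5, 10_000, 100
hits = sum(l < mu < u for l, u in (ci_for_mean(n) for _ in range(runs)))
print(hits / runs)  # ≈ 0.95: about 95% of the intervals contain the true mean
```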
Confidence interval for the sample proportion
For n large enough we have
$$\hat{P} \overset{d}{\sim} N\Big(p, \frac{p(1-p)}{n}\Big)$$
Then the approximate 95% confidence interval for p is
$$\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right),$$
where $\hat{p}$ is the observed value of $\hat{P}$.
Example: Confidence intervals for de Méré's bet.
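A sketch for that example, assuming the classic version of de Méré's bet (at least one six in four rolls of a die, with exact $p = 1 - (5/6)^4 \approx 0.5177$):

```python
import random

n = 10_000
wins = sum(any(random.randint(1, 6) == 6 for _ in range(4))
           for _ in range(n))
phat = wins / n
half = 1.96 * (phat * (1 - phat) / n) ** 0.5
print(f"p_hat = {phat:.4f}, 95% CI = ({phat - half:.4f}, {phat + half:.4f})")
```

The interval should comfortably contain the exact value 0.5177 in about 95% of runs.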
Confidence interval for the mean
The CLT tells us that approximately $\bar{X} \overset{d}{=} N(\mu, \sigma^2/n)$ for large n.
If we don't know the variance $\sigma^2$, we will have to estimate the variance of X using an estimator $S^2$, the sample variance. Then for large n,
$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}} \overset{d}{\sim} N(0, 1).$$
In this circumstance we can use the observed values $\bar{x}$ and $s$ of $\bar{X}$ and $S$ to get confidence intervals. For example, for a two-sided 95% confidence interval we would have
$$\left(\bar{x} - 1.96 \cdot \frac{s}{\sqrt{n}},\ \bar{x} + 1.96 \cdot \frac{s}{\sqrt{n}}\right).$$
So to construct a CI for the mean, we need the sample variance $S^2$ as an estimate of $\sigma^2$.
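A small helper (the name mean_ci is hypothetical) that computes this interval from any list of observed values; z = 1.96 gives the two-sided 95% level:

```python
import statistics

def mean_ci(xs, z=1.96):
    """Approximate two-sided 95% CI for the mean (large n)."""
    n = len(xs)
    xbar, s = statistics.mean(xs), statistics.stdev(xs)  # stdev uses n-1
    half = z * s / n ** 0.5
    return xbar - half, xbar + half

print(mean_ci([2.1, 1.8, 2.4, 2.0, 1.9, 2.2, 2.3, 2.1]))  # made-up data
```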
Sample variance
The sample variance is defined as
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right].$$
The sample variance $S^2$ is an estimator $\hat{\sigma}^2$ for the population variance $\sigma^2$. Again, after the sample is taken the observed value of $S^2$ is the single number
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right],$$
which is the point estimate of $\sigma^2$.
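Both formulas above in code, as a check that they agree (the data values are made up):

```python
def sample_variance(xs):
    """Sample variance with the n-1 divisor."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

xs = [2.0, 3.0, 5.0, 6.0]
n, xbar = len(xs), sum(xs) / len(xs)
print(sample_variance(xs))                                 # 3.333...
print((sum(x * x for x in xs) - n * xbar ** 2) / (n - 1))  # same value
```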
Sample variance
$$E(S^2) = \frac{1}{n-1}\left[n\sigma_X^2 - n\,\mathrm{var}(\bar{X})\right] = \frac{1}{n-1}\left[n\sigma_X^2 - n \cdot \frac{1}{n}\sigma_X^2\right] = \sigma_X^2$$
Hence $S^2$ is an unbiased estimator of $\sigma^2$.
Note: the divisor $n-1$ is used in calculating the sample variance since it gives an unbiased estimate of $\sigma^2$. We have "used up" one of the members of the sample in calculating the sample mean, which replaced the population mean in the formula for variance.
The sample standard deviation $S = \sqrt{S^2}$ is an estimator for the population standard deviation $\sigma$. Note: S is not an unbiased estimator of $\sigma$, but often the bias is small.
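A sketch verifying the unbiasedness numerically, assuming standard normal data (so $\sigma^2 = 1$):

```python
import random

n, runs = 5, 50_000
mean_unbiased, mean_biased = 0.0, 0.0
for _ in range(runs):
    xs = [random.gauss(0, 1) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    mean_unbiased += ss / (n - 1) / runs  # divisor n-1
    mean_biased += ss / n / runs          # divisor n
print(mean_unbiased)  # ≈ 1.0 = sigma^2 (unbiased)
print(mean_biased)    # ≈ 0.8 = (n-1)/n * sigma^2 (biased low)
```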
End of Lecture 6