Chapter 3
Methods for Generating Random Variables
3.1 Introduction
One of the fundamental tools required in computational statistics is the ability to simulate random variables from specified probability distributions. Many excellent references are available on this topic. On the general subject of methods for generating random variates from specified probability distributions, readers are referred to [72, 97, 117, 119, 120, 160, 234, 240, 251, 255]. On specific topics, also see [3, 4, 33, 43, 71, 101, 161, 165, 195].
In the simplest case, to simulate drawing an observation at random from a finite population, a method of generating random observations from the discrete uniform distribution is required. Therefore, a suitable generator of uniform pseudo-random numbers is essential. Methods for generating random variates from other probability distributions all depend on the uniform random number generator.
In this text we assume that a suitable uniform pseudo-random number generator is available. Refer to the help topic for .Random.seed or RNGkind for details about the default random number generator in R. For reference about different types of random number generators and their properties see Gentle [117, 119] and Knuth [169].
The uniform pseudo-random number generator in R is runif. To generate a vector of n (pseudo) random numbers between 0 and 1, use runif(n). Throughout this text, whenever computer generated random numbers are mentioned, it is understood that these are pseudo-random numbers. To generate n random Uniform(a, b) numbers, use runif(n, a, b). To generate an n by m matrix of random numbers between 0 and 1, use matrix(runif(n*m), nrow=n, ncol=m) or matrix(runif(n*m), n, m).
In the examples of this chapter, several functions are given for generating random variates from continuous and discrete probability distributions. Generators for many of these distributions are available in R (e.g., rbeta, rgeom, rchisq, etc.), but the methods presented below are general and apply to many other types of distributions. These methods are also applicable for external libraries, stand-alone programs, or nonstandard simulation problems.
Most of the examples include a comparison of the generated sample with
the theoretical distribution of the sampled population. In some examples, histograms, density curves, or QQ plots are constructed. In other examples summary statistics such as sample moments, sample percentiles, or the empirical distribution are compared with the corresponding theoretical values. These are informal approaches to check the implementation of an algorithm for simulating a random variable.
Example 3.1 (Sampling from a finite population). The sample function can be used to sample from a finite population, with or without replacement.
> #toss some coins
> sample(0:1, size = 10, replace = TRUE)
[1] 0 1 1 1 0 1 1 1 1 0
> #choose some lottery numbers
> sample(1:100, size = 6, replace = FALSE)
[1] 51 89 26 99 74 73
> #permutation of letters a-z
> sample(letters)
 [1] "d" "n" "k" "x" "s" "p" "j" "t" "e" "b" "g" "a" "m" "y" "i" "v" "l" "r" "w" "q" "z"
[22] "u" "h" "c" "f" "o"
> #sample from a multinomial distribution
> x <- sample(1:3, size = 100, replace = TRUE,
    prob = c(.2, .3, .5))
> table(x)
x
 1  2  3
17 35 48
⋄
In the sections that follow, various methods of generating random variates from specified probability distributions are presented. Before discussing those methods, however, it is useful to summarize some of the probability functions available in R. The probability mass function (pmf) or density (pdf), cumulative distribution function (cdf), quantile function, and random generator of many commonly used probability distributions are available. For example, four functions are documented in the help topic Binomial:
dbinom(x, size, prob, log = FALSE)
pbinom(q, size, prob, lower.tail = TRUE, log.p = FALSE)
qbinom(p, size, prob, lower.tail = TRUE, log.p = FALSE)
rbinom(n, size, prob)
The same pattern is applied to other probability distributions. In each case, the abbreviated name of the distribution is prefixed with the letter d for the density or pmf, p for the cdf, q for the quantile function, or r for random generation from the distribution.
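For example, for the normal distribution the four functions documented in the help topic Normal are dnorm, pnorm, qnorm, and rnorm; a few illustrative calls:

dnorm(0)       #standard normal density at 0
pnorm(1.96)    #cdf: P(Z <= 1.96), approximately 0.975
qnorm(0.975)   #quantile: approximately 1.96
rnorm(5)       #five random N(0, 1) variates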
Random Generators of Common Probability Distributions in R
A partial list of available probability distributions and parameters is given in Table 3.1. For a complete list, refer to the R documentation [294, Ch. 8]. In addition to the parameters listed, some of the functions take optional log, lower.tail, or log.p arguments, and some take an optional ncp (noncentrality) parameter.
TABLE 3.1: Selected Univariate Probability Functions Available in R

Distribution        cdf       Generator   Parameters
beta                pbeta     rbeta       shape1, shape2
binomial            pbinom    rbinom      size, prob
chi-squared         pchisq    rchisq      df
exponential         pexp      rexp        rate
F                   pf        rf          df1, df2
gamma               pgamma    rgamma      shape, rate or scale
geometric           pgeom     rgeom       prob
lognormal           plnorm    rlnorm      meanlog, sdlog
negative binomial   pnbinom   rnbinom     size, prob
normal              pnorm     rnorm       mean, sd
Poisson             ppois     rpois       lambda
Student's t         pt        rt          df
uniform             punif     runif       min, max
3.2 The Inverse Transform Method
The inverse transform method of generating random variables is based on the following well-known result (see [21, p. 201] or [244, p. 203]).
Theorem 3.1 (Probability Integral Transformation). If X is a continuous random variable with cdf FX(x), then U = FX(X) ∼ Uniform(0, 1).
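Theorem 3.1 is easy to check empirically. A minimal sketch (the normal distribution here is an arbitrary choice; any continuous distribution would do):

x <- rnorm(1000)      #sample from a continuous distribution
u <- pnorm(x)         #apply the cdf: U = F_X(X)
hist(u, prob = TRUE)  #histogram should be approximately Uniform(0,1)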
The inverse transform method of generating random variables applies the probability integral transformation. Define the inverse transformation
FX^{-1}(u) = inf{x : FX(x) = u},  0 < u < 1.

If U ∼ Uniform(0,1), then FX^{-1}(U) has the same distribution as X. Thus, to generate a random observation of X, generate u from Uniform(0,1) and deliver x = FX^{-1}(u).

3.2.1 Inverse Transform Method, Continuous Case

Example 3.2 (Exponential distribution). If X ∼ Exp(λ), then for x > 0 the cdf of X is FX(x) = 1 − e^{−λx}. The inverse transformation is FX^{-1}(u) = −(1/λ) log(1 − u). Note that U and 1 − U have the same distribution, so it is simpler to generate a random sample of size n with parameter lambda: -log(runif(n)) / lambda
A generator rexp is available in R. However, this algorithm is very useful for implementation in other situations, such as a C program. ⋄
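As a quick informal check of the generator above, one can compare sample quantiles with the Exp(λ) quantiles from qexp; a minimal sketch with λ = 2 (the value is illustrative):

n <- 1000
lambda <- 2
x <- -log(runif(n)) / lambda   #inverse transform sample
p <- seq(.1, .9, .1)
round(rbind(quantile(x, p), qexp(p, rate = lambda)), 3)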
3.2.2 Inverse Transform Method, Discrete Case
The inverse transform method can also be applied to discrete distributions. If X is a discrete random variable and x1 < x2 < ··· are the points of discontinuity of FX(x), then the inverse transformation is FX^{-1}(u) = xi, where FX(x_{i-1}) < u ≤ FX(xi).
…
x takes the numerical value of the logical expression u > 0.6.
n <- 1000
p <- 0.4
u <- runif(n)
x <- as.integer(u > 0.6)   #(u > 0.6) is a logical vector

> mean(x)
[1] 0.41
> var(x)
[1] 0.2421421
Compare the sample statistics with the theoretical moments. The sample mean of a generated sample should be approximately p = 0.4 and the sample variance should be approximately p(1 − p) = 0.24. Our sample statistics are x̄ = 0.41 (se = √(0.24/1000) ≈ 0.0155) and s² ≈ 0.242. ⋄
R Note 3.2
In R one can use the rbinom (random binomial) function with size=1 to generate a Bernoulli sample. Another method is to sample from the vector (0,1) with probabilities (1 − p, p).
rbinom(n, size = 1, prob = p)
sample(c(0,1), size = n, replace = TRUE, prob = c(.6,.4))
Also see Example 3.1.
Example 3.5 (Geometric distribution). Use the inverse transform method to generate a random geometric sample with parameter p = 1/4.
The pmf is f(x) = pq^x, x = 0, 1, 2, ..., where q = 1 − p. At the points of discontinuity x = 0, 1, 2, ..., the cdf is F(x) = 1 − q^{x+1}. For each sample element we need to generate a random uniform u and solve
1 − q^x < u ≤ 1 − q^{x+1}. The solution is x + 1 = ⌈log(1 − u)/log(q)⌉, so the sample can be generated in R by

u <- runif(n)
x <- ceiling(log(1 - u) / log(1 - p)) - 1

Example 3.6 (Logarithmic distribution). The Logarithmic(θ) pmf is f(x) = a θ^x / x, x = 1, 2, ..., where a = −1/log(1 − θ). In the function rlogarithmic below, x = F^{-1}(u) is computed by counting the cdf values Fk = F(k) that are less than u; this count, sum(u > Fk), is exactly x − 1.
rlogarithmic <- function(n, theta) {
    #returns a random logarithmic(theta) sample size n
    u <- runif(n)
    #set the initial length of cdf vector
    N <- ceiling(-16 / log10(theta))
    k <- 1:N
    a <- -1/log(1-theta)
    fk <- exp(log(a) + k * log(theta) - log(k))
    Fk <- cumsum(fk)
    x <- integer(n)
    for (i in 1:n) {
        x[i] <- as.integer(sum(u[i] > Fk))  #F^{-1}(u)-1
        while (x[i] == N) {
            #if x==N we need to extend the cdf
            #very unlikely because N is large
            logf <- log(a) + (N+1)*log(theta) - log(N+1)
            fk <- c(fk, exp(logf))
            Fk <- c(Fk, Fk[N] + fk[N+1])
            N <- N + 1
            x[i] <- as.integer(sum(u[i] > Fk))
        }
    }
    x + 1
}
Generate random samples from a Logarithmic(0.5) distribution.
n <- 1000
theta <- 0.5
x <- rlogarithmic(n, theta)
#compute density of logarithmic(theta) for comparison
k <- sort(unique(x))
p <- -1 / log(1 - theta) * theta^k / k
se <- sqrt(p*(1-p)/n) #standard error
In the following results, the relative frequencies of the sample (first line) match
the theoretical distribution (second line) of the Logarithmic(0.5) distribution
within two standard errors.
> round(rbind(table(x)/n, p, se), 3)
       1     2     3     4     5     6     7
   0.741 0.169 0.049 0.026 0.008 0.003 0.004
p  0.721 0.180 0.060 0.023 0.009 0.004 0.002
se 0.014 0.012 0.008 0.005 0.003 0.002 0.001
Remark 3.1. A more efficient generator for the Logarithmic(θ) distribution is implemented in Example 3.9 of Section 3.4.
3.3 The Acceptance-Rejection Method
Suppose that X and Y are random variables with density or pmf f and g, respectively, and there exists a constant c such that
f(t) ≤ c g(t)
for all t such that f(t) > 0. Then the acceptance-rejection method (or rejection method) can be applied to generate the random variable X.
The Acceptance-Rejection Method
1. Find a random variable Y with density g satisfying f(t)/g(t) ≤ c, for all t such that f(t) > 0. Provide a method to generate random Y.
2. For each random variate required:
(a) Generate a random y from the distribution with density g.
(b) Generate a random u from the Uniform(0, 1) distribution.
(c) If u < f(y)/(cg(y)), accept y and deliver x = y; otherwise reject y and repeat from Step 2a.
Note that in Step 2c,

P(accept | Y) = P(U < f(Y)/(c g(Y)) | Y) = f(Y)/(c g(Y)).

The last equality is simply evaluating the cdf of U. The total probability of acceptance for any iteration is therefore

∑_y P(accept | y) P(Y = y) = ∑_y [f(y)/(c g(y))] g(y) = 1/c,
and the number of iterations until acceptance has the geometric distribution with mean c. Hence, on average each sample value of X requires c iterations. For efficiency, Y should be easy to simulate and c small.
To see that the accepted sample has the same distribution as X, apply Bayes’ Theorem. In the discrete case, for each k such that f(k) > 0,
P(k | accepted) = P(accepted | k) g(k) / P(accepted) = [f(k)/(c g(k))] g(k) / (1/c) = f(k).
The continuous case is similar.
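The algorithm can be packaged as a generic R function. The following is a minimal sketch (the function name rejection_sample and its arguments are illustrative: rg draws variates from the proposal density g, and ratio(y) computes f(y)/(c g(y))):

rejection_sample <- function(n, rg, ratio) {
    x <- numeric(n)
    k <- 0   #counter for accepted variates
    while (k < n) {
        y <- rg(1)                  #candidate from g
        if (runif(1) < ratio(y)) {  #accept with probability f(y)/(c g(y))
            k <- k + 1
            x[k] <- y
        }
    }
    x
}
#e.g., the Beta(2,2) target of Example 3.7 with Uniform(0,1) proposal, c = 6:
#x <- rejection_sample(1000, runif, function(y) y * (1 - y))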
Example 3.7 (Acceptance-rejection method). This example illustrates the acceptance-rejection method for the beta distribution. On average, how many random numbers must be simulated to generate 1000 variates from the Beta(α = 2, β = 2) distribution by this method? It depends on the upper bound c of f(x)/g(x), which depends on the choice of the function g(x).
The Beta(2,2) density is f(x) = 6x(1 − x), 0 < x < 1. Let g(x) be the Uniform(0,1) density. Then f(x)/g(x) ≤ 6 for all 0 < x < 1, so c = 6. A random x from g(x) is accepted if
f(x)/(c g(x)) = 6x(1 − x)/(6 · 1) = x(1 − x) > u.
On average, cn = 6000 iterations (12000 random numbers) will be required for a sample size 1000. In the following simulation, the counter j for iterations is not necessary, but included to record how many iterations were actually needed to generate the 1000 beta variates.
n <- 1000
k <- 0 #counter for accepted
j <- 0 #iterations
y <- numeric(n)
while (k < n) {
    u <- runif(1)
    j <- j + 1
    x <- runif(1)   #random variate from g
    if (x * (1-x) > u) {
        #we accept x
        k <- k + 1
        y[k] <- x
    }
}
> j
[1] 5873
In this simulation, 5873 iterations (11746 random numbers) were required to generate the 1000 beta variates. Compare the empirical and theoretical percentiles.
#compare empirical and theoretical percentiles
p <- seq(.1, .9, .1)
Qhat <- quantile(y, p) #quantiles of sample
Q <- qbeta(p, 2, 2) #theoretical quantiles
se <- sqrt(p * (1-p) / (n * dbeta(Q, 2, 2)^2)) #see Ch. 2
The sample percentiles (first line) approximately match the Beta(2,2) percentiles computed by qbeta (second line), most closely near the center of the distribution. Larger numbers of replicates are required for estimation of percentiles where the density is close to zero.
> round(rbind(Qhat, Q, se), 3)
10% 20% 30% 40% 50% 60% 70% 80% 90%
Qhat 0.189 0.293 0.365 0.449 0.519 0.589 0.665 0.741 0.830
Q 0.196 0.287 0.363 0.433 0.500 0.567 0.637 0.713 0.804
se 0.010 0.010 0.010 0.011 0.011 0.011 0.010 0.010 0.010
Repeating the simulation with n = 10000 produces more precise estimates.
> round(rbind(Qhat, Q, se), 3)
10% 20% 30% 40% 50% 60% 70% 80% 90%
Qhat 0.194 0.292 0.368 0.436 0.504 0.572 0.643 0.716 0.804
Q 0.196 0.287 0.363 0.433 0.500 0.567 0.637 0.713 0.804
se 0.003 0.004 0.004 0.004 0.004 0.004 0.004 0.004 0.003
Remark 3.2. See Example 3.8 for a more efficient beta generator based on the ratio of gammas method.
3.4 Transformation Methods
Many types of transformations other than the probability inverse transformation can be applied to simulate random variables. Some examples are

1. If Z ∼ N(0,1), then V = Z² ∼ χ²(1).

2. If U ∼ χ²(m) and V ∼ χ²(n) are independent, then F = (U/m)/(V/n) has the F distribution with (m, n) degrees of freedom.

3. If Z ∼ N(0,1) and V ∼ χ²(n) are independent, then T = Z/√(V/n) has the Student t distribution with n degrees of freedom.

4. If U, V ∼ Unif(0,1) are independent, then

Z1 = √(−2 log U) cos(2πV),  Z2 = √(−2 log U) sin(2πV)

are independent standard normal variables [255, p. 86].

5. If U ∼ Gamma(r, λ) and V ∼ Gamma(s, λ) are independent, then X = U/(U + V) has the Beta(r, s) distribution.

6. If U, V ∼ Unif(0,1) are independent, then

X = ⌊1 + log(V)/log(1 − (1 − θ)^U)⌋

has the Logarithmic(θ) distribution, where ⌊x⌋ denotes the integer part of x.

Generators based on transformations (5) and (6) are implemented in Examples 3.8 and 3.9. Sums and mixtures are special types of transformations that are discussed in Section 3.5. Example 3.21 uses a multivariate transformation to generate points uniformly distributed on the unit sphere. A quick numerical check of the Box-Muller transformation (4) is sketched below.
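A minimal sketch of the check (sample sizes and plotting choices are illustrative):

n <- 1000
u <- runif(n)
v <- runif(n)
z1 <- sqrt(-2 * log(u)) * cos(2 * pi * v)
z2 <- sqrt(-2 * log(u)) * sin(2 * pi * v)
qqnorm(c(z1, z2), cex = 0.25)  #should be nearly linear
qqline(c(z1, z2))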
Example 3.8 (Beta distribution). The following relation between beta and gamma distributions provides another beta generator.
If U ∼ Gamma(r, λ) and V ∼ Gamma(s, λ) are independent, then

X = U/(U + V)

has the Beta(r, s) distribution [255, p. 64]. This transformation determines an algorithm for generating random Beta(a, b) variates.

1. Generate a random u from Gamma(a, 1).
2. Generate a random v from Gamma(b, 1).
3. Deliver x = u/(u + v).
This method is applied below to generate a random Beta(3, 2) sample.
n <- 1000
a <- 3
b <- 2
u <- rgamma(n, shape=a, rate=1)
v <- rgamma(n, shape=b, rate=1)
x <- u / (u + v)
The sample data can be compared with the Beta(3, 2) distribution using a quantile-quantile (QQ) plot. If the sampled distribution is Beta(3, 2), the QQ plot should be nearly linear.
q <- qbeta(ppoints(n), a, b)
qqplot(q, x, cex=0.25, xlab="Beta(3, 2)", ylab="Sample")
abline(0, 1)
FIGURE 3.2: QQ Plot comparing the Beta(3, 2) distribution with a simulated random sample generated by the ratio of gammas method in Example 3.8.
The line x = q is added for reference. The QQ plot of the ordered sample vs. the Beta(3, 2) quantiles in Figure 3.2 is very nearly linear, as it should be if the generated sample is in fact a Beta(3, 2) sample. ⋄
Example 3.9 (Logarithmic distribution, version 2). This example provides another, more efficient generator for the logarithmic distribution (see Example 3.6). If U, V are independent Uniform(0,1) random variables, then
X = ⌊1 + log(V)/log(1 − (1 − θ)^U)⌋   (3.3)
has the Logarithmic(θ) distribution ([72, pp. 546-8], [165]). This transformation provides a simple and efficient generator for the logarithmic distribution.
1. Generate u from Unif(0,1).
2. Generate v from Unif(0,1).
3. Deliver x = ⌊1 + log(v)/log(1 − (1 − θ)^u)⌋.
Below is a comparison of the Logarithmic(0.5) distribution with a sample generated using transformation (3.3). The empirical probabilities p.hat are within two standard errors of the theoretical probabilities p.
n <- 1000
theta <- 0.5
u <- runif(n) #generate logarithmic sample
v <- runif(n)
x <- floor(1 + log(v) / log(1 - (1 - theta)^u))
k <- 1:max(x) #calc. logarithmic probs.
p <- -1 / log(1 - theta) * theta^k / k
se <- sqrt(p*(1-p)/n)
p.hat <- tabulate(x)/n
> print(round(rbind(p.hat, p, se), 3))
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
p.hat 0.740 0.171 0.052 0.018 0.010 0.006 0.003
p 0.721 0.180 0.060 0.023 0.009 0.004 0.002
se 0.014 0.012 0.008 0.005 0.003 0.002 0.001
The following function is a simple replacement for rlogarithmic in Example 3.6.
rlogarithmic <- function(n, theta) {
stopifnot(all(theta > 0 & theta < 1))
th <- rep(theta, length=n)
u <- runif(n)
v <- runif(n)
x <- floor(1 + log(v) / log(1 - (1 - th)^u))
return(x)
}
R Note 3.3
The tabulate function bins positive integers, so it can be used on the logarithmic sample. For other types of data, recode the data to positive integers or use table. If the data are not positive integers, tabulate truncates real numbers to integers and silently ignores integers less than 1.
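A small illustration of this behavior (the input vectors are arbitrary):

tabulate(c(1, 2, 2, 5))    #returns 1 2 0 0 1: counts for bins 1 through 5
tabulate(c(0.5, 1.9, 3))   #returns 1 0 1: 0.5 truncates to 0 (ignored), 1.9 to 1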
⋄
R Note 3.4
In the rlogarithmic function above, notice the & operator in stopifnot(all(theta > 0 & theta < 1)). Here we must use & rather than &&. The & operator performs an elementwise AND comparison, returning a logical vector. The && operator, however, evaluates from left to right until a single logical result is obtained. For example
> x <- 1:5
> x > 0 & x < 5
[1]  TRUE  TRUE  TRUE  TRUE FALSE
…distribution. See Bean [26] for an introductory level presentation of these and many other interesting relationships between families of distributions.
In R it is of course easier to use the functions rchisq, rgeom and rnbinom to generate chisquare, geometric and negative binomial random samples. The following example is presented to illustrate a general method that can be applied whenever distributions are related by convolutions.
Example 3.10 (Chisquare). This example generates a chisquare χ²(ν) random variable as the convolution of ν squared normals. If Z1, ..., Zν are iid N(0,1) random variables, then V = Z1² + ··· + Zν² has the χ²(ν) distribution. Steps to generate a random sample of size n from χ²(ν) are as follows:
1. Fill an n × ν matrix with nν random N(0,1) variates.
2. Square each entry of the matrix from step 1.
3. Compute the row sums of the squared normals. Each row sum is one random observation from the χ2(ν) distribution.
4. Deliver the vector of row sums.
An example with n = 1000 and ν = 2 is shown below.
n <- 1000
nu <- 2
X <- matrix(rnorm(n*nu), n, nu)^2 #matrix of sq. normals
#sum the squared normals across each row: method 1
y <- rowSums(X)
#method 2
y <- apply(X, MARGIN=1, FUN=sum) #a vector length n
> mean(y)
[1] 2.027334
> mean(y^2)
[1] 7.835872
A χ²(ν) random variable has mean ν and variance 2ν. Our sample statistics above agree very closely with the theoretical moments E[Y] = ν = 2 and E[Y²] = 2ν + ν² = 8. Here the standard errors of the sample moments are 0.063 and 0.089, respectively. ⋄
R Note 3.5
This example introduces the apply function. The apply function applies a function to the margins of an array. To sum across the rows of matrix X, the function (FUN=sum) is applied to the rows (MARGIN=1). Notice that a loop is not used to compute the row sums. In general for efficient programming in R, avoid unnecessary loops. (For row and column sums it is easier to use rowSums and colSums.)
Mixtures
A random variable X is a discrete mixture if the distribution of X is a weighted sum FX(x) = ∑ θi FXi(x) for some sequence of random variables X1, X2, ... and θi > 0 such that ∑i θi = 1. The constants θi are called the mixing weights or mixing probabilities. Although the notation is similar for sums and mixtures, the distributions represented are different.

A random variable X is a continuous mixture if the distribution of X is FX(x) = ∫_{−∞}^{∞} FX|Y=y(x) fY(y) dy for a family X|Y = y indexed by the real numbers y and weighting function fY such that ∫_{−∞}^{∞} fY(y) dy = 1.
Compare the methods for simulation of a convolution and a mixture of normal variables. Suppose X1 ∼ N(0,1) and X2 ∼ N(3,1) are independent. The notation S = X1 + X2 denotes the convolution of X1 and X2. The distribution of S is normal with mean μ1 + μ2 = 3 and variance σ1² + σ2² = 2. To
simulate the convolution:
1. Generate x1 ∼ N(0, 1).
2. Generate x2 ∼ N(3, 1).
3. Deliver s = x1 + x2.
We can also define a 50% normal mixture X, denoted FX(x) = 0.5FX1(x) + 0.5FX2(x). Unlike the convolution above, the distribution of the mixture X is distinctly non-normal; it is bimodal.
To simulate the mixture:
1. Generate an integer k ∈ {1, 2}, where P (1) = P (2) = 0.5.
2. If k = 1 deliver random x from N(0, 1); if k = 2 deliver random x from N(3, 1).
In the following example we will compare simulated distributions of a convolution and a mixture of gamma random variables.
Example 3.11 (Convolutions and mixtures). Let X1 ∼ Gamma(2, 2) and X2 ∼ Gamma(2, 4) be independent. Compare the histograms of the samples generated by the convolution S = X1 + X2 and the mixture FX = 0.5FX1 + 0.5FX2.
n <- 1000
x1 <- rgamma(n, 2, 2)
x2 <- rgamma(n, 2, 4)
s <- x1 + x2                 #the convolution
u <- runif(n)
k <- as.integer(u > 0.5)     #vector of 0's and 1's
x <- k * x1 + (1-k) * x2     #the mixture
par(mfcol=c(1,2)) #two graphs per page
hist(s, prob=TRUE, xlim=c(0,5), ylim=c(0,1))
hist(x, prob=TRUE, xlim=c(0,5), ylim=c(0,1))
par(mfcol=c(1,1)) #restore display
The histograms shown in Figure 3.3, of the convolution S and mixture X, are clearly different. ⋄
R Note 3.6
The par function can be used to set (or query) certain graphical pa- rameters. A list of all graphical parameters is returned by par(). The command par(mfcol=c(n,m)) configures the graphical device to dis- play nm graphs per screen, in n rows and m columns.
"
c
c
Histogram of s 00
ro 0~0
Histogram of x
[1] 5 1 4 2 1 3 2 3
> rate[1:8]
[1] 3.0 1.0 2.5 1.5 1.0 2.0 1.5 2.0
⋄
Example 3.14 (Plot density of mixture). Plot the densities (not density estimates) of the gamma distributions and the mixture in Example 3.13. (This example is a programming exercise that involves vectors of parameters and repeated use of the apply function.)
The density of the mixture is

f(x) = ∑_{j=1}^{5} θj fj(x),   x > 0,   (3.4)
where fj is the Gamma(3, λj ) density. To produce the plot, we need a function to compute the density f(x) of the mixture.
f <- function(x, lambda, theta) {
#density of the mixture at the point x
sum(dgamma(x, 3, lambda) * theta)
}
The function f computes the density of the mixture (3.4) for a single value of x. If x has length 1, dgamma(x, 3, lambda) is a vector the same length as lambda; in this case (f1(x), ..., f5(x)). Then dgamma(x, 3, lambda)*theta is the vector (θ1 f1(x), ..., θ5 f5(x)). The sum of this vector is the density of the mixture (3.4) evaluated at the point x.
x <- seq(0, 8, length=200)
dim(x) <- length(x) #need for apply
#compute density of the mixture f(x) along x
y <- apply(x, 1, f, lambda=lambda, theta=p)
The density of the mixture is computed by the function f applied to the vector x. The function f takes several arguments, so the additional arguments lambda=lambda, theta=p are supplied after the name of the function, f.
A plot of the five densities with the mixture is shown in Figure 3.5. The code to produce the plot is listed below. The densities fk can be computed by the dgamma function. A sequence of points x is defined and each of the densities is computed along x.
#plot the density of the mixture
plot(x, y, type="l", ylim=c(0,.85), lwd=3, ylab="Density")
for (j in 1:5) {
#add the j-th gamma density to the plot
y <- apply(x, 1, dgamma, shape=3, rate=lambda[j])
lines(x, y)
}
⋄
R Note 3.7
The apply function requires a dimension attribute for x. Since x is a vector, it does not have a dimension attribute by default. The dimension of x is assigned by dim(x) <- length(x). Alternately, x <- as.matrix(x) converts x to a matrix (a column vector), which has a dimension attribute.
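Alternatively (a minimal sketch, using the function f and the parameters of Example 3.14), sapply or vapply iterate over the elements of a plain vector directly, so no dimension attribute is needed:

x <- seq(0, 8, length = 200)
y <- sapply(x, f, lambda = lambda, theta = p)
#or, specifying the type of the result explicitly:
y <- vapply(x, f, numeric(1), lambda = lambda, theta = p)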
FIGURE 3.5: Densities from Example 3.14: A mixture (thick line) of several gamma densities (thin lines).
Example 3.15 (Poisson-Gamma mixture). This is an example of a continuous mixture. The negative binomial distribution is a mixture of Poisson(Λ) distributions, where Λ has a gamma distribution. Specifically, if (X|Λ = λ) ∼ Poisson(λ) and Λ ∼ Gamma(r, β), then X has the negative binomial distribution with parameters r and p = β/(1 + β) (see, e.g., [26]). This example illustrates a method of sampling from a Poisson-Gamma mixture and compares the sample with the negative binomial distribution.
#generate a Poisson-Gamma mixture
n <- 1000
r <- 4
beta <- 3
lambda <- rgamma(n, r, beta) #lambda is random
#now supply the sample of lambda’s as the Poisson mean
x <- rpois(n, lambda) #the mixture
#compare with negative binomial
mix <- tabulate(x+1) / n
negbin <- round(dnbinom(0:max(x), r, beta/(1+beta)), 3)
se <- sqrt(negbin * (1 - negbin) / n)
The empirical distribution (first line below) of the mixture agrees very closely with the pmf of NegBin(4, 3/4) (second line).
> round(rbind(mix, negbin, se), 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
mix 0.334 0.305 0.201 0.091 0.042 0.018 0.005 0.003 0.001
negbin 0.316 0.316 0.198 0.099 0.043 0.017 0.006 0.002 0.001
se 0.015 0.015 0.013 0.009 0.006 0.004 0.002 0.001 0.001
⋄

3.6 Multivariate Distributions
Generators for the multivariate normal distribution, multivariate normal mixtures, Wishart distribution, and uniform distribution on the sphere in Rd are presented in this section.
3.6.1 Multivariate Normal Distribution
A random vector X = (X1, ..., Xd) has a d-dimensional multivariate normal (MVN) distribution denoted Nd(μ, Σ) if the density of X is
f(x) = (2π)^{-d/2} |Σ|^{-1/2} exp{−(1/2)(x − μ)^T Σ^{-1}(x − μ)},  x ∈ R^d,   (3.5)

where μ = (μ1, ..., μd)^T is the mean vector and Σ is a d × d symmetric positive definite matrix

        [ σ11  σ12  ...  σ1d ]
    Σ = [ σ21  σ22  ...  σ2d ]
        [  .    .    .    .  ]
        [ σd1  σd2  ...  σdd ]

with entries σij = Cov(Xi, Xj). Here Σ^{-1} is the inverse of Σ, and |Σ| is the determinant of Σ. The bivariate normal distribution is the special case N2(μ, Σ).
A random Nd(μ,Σ) variate can be generated in two steps. First generate Z = (Z1,…,Zd), where Z1,…,Zd are iid standard normal variates. Then
transform the random vector Z so that it has the desired mean vector μ and covariance structure Σ. The transformation requires factoring the covariance matrix Σ.
Recall that if Z ∼ Nd(μ, Σ), then the linear transformation CZ + b is multivariate normal with mean Cμ + b and covariance CΣC^T. If Z is Nd(0, Id), then

CZ + b ∼ Nd(b, CC^T).

Suppose that Σ can be factored so that Σ = CC^T for some matrix C. Then

CZ + μ ∼ Nd(μ, Σ),

and CZ + μ is the required transformation.
The required factorization of Σ can be obtained by the spectral decomposition method (eigenvector decomposition), Choleski factorization, or singular value decomposition (svd). The corresponding R functions are eigen, chol, and svd.
Usually, one does not apply a linear transformation to the random vectors of a sample one at a time. Typically, one applies the transformation to a data matrix and transforms the entire sample. Suppose that Z = (Zij) is an n × d matrix where Zij are iid N(0,1). Then the rows of Z are n random observations from the d-dimensional standard MVN distribution. The required transformation applied to the data matrix is
X = ZQ + Jμ^T,   (3.6)

where Q^T Q = Σ and J is a column vector of ones. The rows of X are n random observations from the d-dimensional MVN distribution with mean vector μ and covariance matrix Σ.
Method for generating multivariate normal samples
To generate a random sample of size n from the Nd(μ,Σ) distribution:
1. Generate an n × d matrix Z containing nd random N (0, 1) variates
(n random vectors in Rd).
2. Compute a factorization Σ = Q^T Q.
3. Apply the transformation X = ZQ + Jμ^T.
4. Deliver the n × d matrix X.
Each row of X is a random variate from the Nd(μ,Σ) distribution.
The X = ZQ + Jμ^T transformation can be coded in R as follows. Recall that the matrix multiplication operator is %*%.
Z <- matrix(rnorm(n*d), nrow = n, ncol = d)
X <- Z %*% Q + matrix(mu, n, d, byrow = TRUE)
The matrix product Jμ^T is equal to matrix(mu, n, d, byrow = TRUE). This saves a matrix multiplication. The argument byrow = TRUE is necessary here; the default is byrow = FALSE. The matrix is filled row by row with the entries of the mean vector mu.
In this section, each method of generating MVN random samples is illustrated with examples. Also note that there are functions provided in R packages for generating multivariate normal samples. See the mvrnorm function in the MASS package [293], and rmvnorm in the mvtnorm package [121]. In all of the examples below, the rnorm function is used to generate standard normal random variates.
Spectral decomposition method for generating Nd(μ,Σ) samples
The square root of the covariance is Σ^{1/2} = PΛ^{1/2}P^{-1}, where Λ is the diagonal matrix with the eigenvalues of Σ along the diagonal and P is the matrix whose columns are the eigenvectors of Σ corresponding to the eigenvalues in Λ. This method can also be called the eigen-decomposition method. In the eigen-decomposition we have P^{-1} = P^T and therefore Σ^{1/2} = PΛ^{1/2}P^T. The matrix Q = Σ^{1/2} is a factorization of Σ such that Q^T Q = Σ.
Example 3.16 (Spectral decomposition method). This example provides a function rmvn.eigen to generate a multivariate normal random sample. It is applied to generate a bivariate normal sample with zero mean vector and covariance matrix

    Σ = [ 1.0  0.9
          0.9  1.0 ].
# mean and covariance parameters
mu <- c(0, 0)
Sigma <- matrix(c(1, .9, .9, 1), nrow = 2, ncol = 2)
The eigen function returns the eigenvalues and eigenvectors of a matrix.
rmvn.eigen <-
function(n, mu, Sigma) {
# generate n random vectors from MVN(mu, Sigma)
# dimension is inferred from mu and Sigma
d <- length(mu)
ev <- eigen(Sigma, symmetric = TRUE)
lambda <- ev$values
V <- ev$vectors
R <- V %*% diag(sqrt(lambda)) %*% t(V)
Z <- matrix(rnorm(n*d), nrow = n, ncol = d)
X <- Z %*% R + matrix(mu, n, d, byrow = TRUE)
X
}
Print summary statistics and display a scatterplot as a check on the results of the simulation.
# generate the sample
X <- rmvn.eigen(1000, mu, Sigma)
plot(X, xlab = "x", ylab = "y", pch = 20)
> print(colMeans(X))
[1] -0.001628189 0.023474775
> print(cor(X))
[,1] [,2]
[1,] 1.0000000 0.8931007
[2,] 0.8931007 1.0000000
Output from Example 3.16 shows the sample mean vector is (−0.002, 0.023) and sample correlation is 0.893, which agree closely with the specified param- eters. The scatter plot of the sample data shown in Figure 3.6 exhibits the elliptical symmetry of multivariate normal distributions. ⋄
FIGURE 3.6: Scatterplot of a random bivariate normal sample with mean vector zero, variances σ1² = σ2² = 1, and correlation ρ = 0.9, from Example 3.16.
SVD Method of generating Nd(μ,Σ) samples
The singular value decomposition (svd) generalizes the idea of eigenvectors to rectangular matrices. The svd of a matrix X is X = UDV^T, where D is a vector containing the singular values of X, U is a matrix whose columns contain the left singular vectors of X, and V is a matrix whose columns contain the right singular vectors of X. The matrix X in this case is the population
covariance matrix Σ, and UV^T = I. The svd of a symmetric positive definite matrix Σ gives U = V = P and Σ^{1/2} = UD^{1/2}V^T. Thus the svd method for this application is equivalent to the spectral decomposition method, but is less efficient because the svd method does not take advantage of the fact that the matrix Σ is square symmetric.
Example 3.17 (SVD method). This example provides a function rmvn.svd to generate a multivariate normal sample, using the svd method to factor Σ.
rmvn.svd <-
function(n, mu, Sigma) {
# generate n random vectors from MVN(mu, Sigma)
# dimension is inferred from mu and Sigma
d <- length(mu)
S <- svd(Sigma)
R <- S$u %*% diag(sqrt(S$d)) %*% t(S$v) #sq. root Sigma
Z <- matrix(rnorm(n*d), nrow=n, ncol=d)
X <- Z %*% R + matrix(mu, n, d, byrow=TRUE)
X
}
This function is applied in Example 3.19. ⋄

Choleski factorization method of generating Nd(μ, Σ) samples

The Choleski factorization of a real symmetric positive-definite matrix is X = Q^T Q, where Q is an upper triangular matrix. The Choleski factorization is implemented in the R function chol. The basic syntax is chol(X) and the return value is an upper triangular matrix R such that R^T R = X.
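A quick check of the factorization (the matrix is illustrative):

S <- matrix(c(1, .9, .9, 1), 2, 2)
R <- chol(S)    #upper triangular factor
crossprod(R)    #t(R) %*% R recovers S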
Example 3.18 (Choleski factorization method). The Choleski factoriza- tion method is applied to generate 200 random observations from a four- dimensional multivariate normal distribution.
rmvn.Choleski <-
function(n, mu, Sigma) {
# generate n random vectors from MVN(mu, Sigma)
# dimension is inferred from mu and Sigma
d <- length(mu)
Q <- chol(Sigma) # Choleski factorization of Sigma
Z <- matrix(rnorm(n*d), nrow=n, ncol=d)
X <- Z %*% Q + matrix(mu, n, d, byrow=TRUE)
X
}
In this example, we will generate the samples according to the same mean and covariance structure as the four-dimensional iris virginica data.
y <- subset(x=iris, Species=="virginica")[, 1:4]
mu <- colMeans(y)
Sigma <- cov(y)
> mu
Sepal.Length Sepal.Width Petal.Length Petal.Width
6.588 2.974 5.552 2.026
> Sigma
Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length 0.40434286 0.09376327 0.30328980 0.04909388
Sepal.Width 0.09376327 0.10400408 0.07137959 0.04762857
Petal.Length 0.30328980 0.07137959 0.30458776 0.04882449
Petal.Width 0.04909388 0.04762857 0.04882449 0.07543265
#now generate MVN data with this mean and covariance
X <- rmvn.Choleski(200, mu, Sigma)
pairs(X)
The pairs plot of the data in Figure 3.7 gives a 2-D view of the bivariate distribution of each pair of marginal distributions. The joint distribution of each pair of marginal distributions is theoretically bivariate normal. The plot can be compared with Figure 5.1, which displays the iris virginica data. (The iris virginica data are not multivariate normal, but means and correlation for each pair of variables should be similar to the simulated data.) ⋄
FIGURE 3.7: Pairs plot of the bivariate marginal distributions of a simulated multivariate normal random sample in Example 3.18. The parameters match the mean and covariance of the iris virginica data.
Remark 3.3. To standardize a multivariate normal sample, we invert the procedure above, substituting the sample mean vector and sample covariance matrix if the parameters are unknown. The transformed d-dimensional sample then has zero mean vector and covariance Id. This is not the same as scaling the columns of the data matrix. ⋄

Comparing Performance of Generators
We have discussed several methods for generating random samples from specified probability distributions. When several methods are available, which method is preferred? One consideration may be the computational time required (the time complexity). Another important consideration, if the purpose of the simulation is to estimate one or more parameters, is the variance of the estimator. The latter topic is considered in Chapter 6. To compare the empirical performance with respect to computing time, we can time each procedure.
R provides the system.time function, which times the evaluation of its argument. This function can be used as a rough benchmark to compare the performance of different algorithms. In the next example, the system.time function is used to compare the CPU time required for several different methods of generating multivariate normal samples.
Example 3.19 (Comparing performance of MVN generators). This example generates multivariate normal samples in a higher dimension (d = 30) and compares the timing of each of the methods presented in Section 3.6.1 and two generators available in R packages. This example uses a function rmvnorm in the package mvtnorm [121]. This package is not part of the standard R distribution, but can be installed from CRAN. The MASS package [293] is one of the recommended packages included with the R distribution.
library(MASS)
library(mvtnorm)
n <- 100            #sample size
d <- 30             #dimension
N <- 2000           #iterations
mu <- numeric(d)
set.seed(100)
system.time(for (i in 1:N)
rmvn.eigen(n, mu, cov(matrix(rnorm(n*d), n, d))))
set.seed(100)
system.time(for (i in 1:N)
rmvn.svd(n, mu, cov(matrix(rnorm(n*d), n, d))))
set.seed(100)
system.time(for (i in 1:N)
rmvn.Choleski(n, mu, cov(matrix(rnorm(n*d), n, d))))
set.seed(100)
system.time(for (i in 1:N)
mvrnorm(n, mu, cov(matrix(rnorm(n*d), n, d))))
set.seed(100)
system.time(for (i in 1:N)
rmvnorm(n, mu, cov(matrix(rnorm(n*d), n, d))))
set.seed(100)
system.time(for (i in 1:N)
cov(matrix(rnorm(n*d), n, d)))
Most of the work involved in generating a multivariate normal sample is the factorization of the covariance matrix. The covariances used for this example are actually the sample covariances of standard multivariate normal samples. Thus, the randomly generated Σ varies with each iteration, but Σ is close to an identity matrix. In order to time each method on the same covariance matrices, the random number seed is restored before each run. The last run simply generates the covariances, for comparison with the total time.
The results below (summarized from the console output) suggest that there are differences in performance among these five methods when the covariance matrix is close to identity. The Choleski method is somewhat faster, while rmvn.eigen and mvrnorm (MASS) [293] appear to perform about equally well. The similar performance of rmvn.eigen and mvrnorm is not surprising, be- cause according to the documentation for mvrnorm, the method of matrix decomposition is the eigendecomposition. Documentation for mvrnorm states that “although a Choleski decomposition might be faster, the eigendecompo- sition is stabler.”
Timings of MVN generators

                 user  system  elapsed
rmvn.eigen       1.78   0.00     1.77
rmvn.svd         2      0        2
rmvn.choleski    1.22   0.02     1.27
mvrnorm          1.77   0.00     1.77
rmvnorm          2.17   0.00     2.17
generate Sigma   0.64   0.00     0.65
The system.time function was also used to compare the methods in Ex- amples 4.1 and 4.2. The code (not shown) is similar to the examples above.
See Chapter 15, on the topic of benchmarking, for more convenient methods of comparing running times of functions. Example 15.3 shows a simpler way to compare the timings above that automatically produces a table and relative timings. ⋄
3.6.2 Mixtures of Multivariate Normals
A multivariate normal mixture is denoted
pNd(μ1, Σ1) + (1 − p)Nd(μ2, Σ2) (3.7)
where the sampled population is Nd(μ1, Σ1) with probability p, and Nd(μ2, Σ2) with probability 1 − p. As the mixing parameter p and other parameters are
varied, the multivariate normal mixtures have a wide variety of types of departures from normality. For example, a 50% normal location mixture is symmetric with light tails, and a 90% normal location mixture is skewed with heavy tails. A normal location mixture with p = 1 − (1/2)(1 − √3/3) = 0.7887 provides an example of a skewed distribution with normal kurtosis [147]. Parameters can be varied to generate a wide variety of distributional shapes. Johnson [160] gives many examples for the bivariate normal mixtures. Many commonly applied statistical procedures do not perform well under this type of departure from normality, so normal mixtures are often chosen to compare the properties of competing robust methods of analysis.
If X has the distribution (3.7), then a random observation from the dis- tribution of X can be generated as follows.
To generate a random sample from pNd(μ1, Σ1) + (1 − p)Nd(μ2, Σ2)
1. Generate U ∼ Uniform(0,1).
2. If U ≤ p generate X from Nd(μ1, Σ1); otherwise generate X from Nd(μ2, Σ2).
The following procedure is equivalent.
1. Generate N ∼ Bernoulli(p).
2. If N = 1 generate X from Nd(μ1, Σ1); otherwise generate X from Nd(μ2, Σ2).
Example 3.20 (Multivariate normal mixture). Write a function to generate a multivariate normal mixture with two components. The components of a location mixture differ in location only. Use the mvrnorm(MASS) function [293] to generate the multivariate normal observations.
First we write this generator in an inefficient loop to clearly illustrate the steps outlined above. (We will eliminate the loop later.)
library(MASS) #for mvrnorm
#inefficient version loc.mix.0 with loops
loc.mix.0 <- function(n, p, mu1, mu2, Sigma) {
#generate sample from BVN location mixture
X <- matrix(0, n, 2)
for (i in 1:n) {
k <- rbinom(1, size = 1, prob = p)
if (k)
X[i,] <- mvrnorm(1, mu = mu1, Sigma) else
X[i,] <- mvrnorm(1, mu = mu2, Sigma)
}
return(X)
}
Although the code above will generate the required mixture, the loop is rather inefficient. Generate n1, the number of observations realized from the first component, from Binomial(n, p). Generate n1 variates from component 1 and n2 = n − n1 from component 2 of the mixture. Generate a random permutation of the indices 1:n to indicate the order in which the sample observations appear in the data matrix.
#more efficient version
loc.mix <- function(n, p, mu1, mu2, Sigma) {
#generate sample from BVN location mixture
n1 <- rbinom(1, size = n, prob = p)
n2 <- n - n1
x1 <- mvrnorm(n1, mu = mu1, Sigma)
x2 <- mvrnorm(n2, mu = mu2, Sigma)
X <- rbind(x1, x2) #combine the samples
return(X[sample(1:n), ]) #mix them
}
To illustrate the normal mixture generator, we apply loc.mix to generate a random sample of n = 1000 observations from a 50% 4-dimensional normal location mixture with μ1 = (0, 0, 0, 0) and μ2 = (2, 3, 4, 5) and covariance I4.
x <- loc.mix(1000, .5, rep(0, 4), 2:5, Sigma = diag(4))
r <- range(x) * 1.2
par(mfrow = c(2, 2))
for (i in 1:4)
hist(x[ , i], xlim = r, ylim = c(0, .3), freq = FALSE,
main = "", breaks = seq(-5, 10, .5))
par(mfrow = c(1, 1))
It is difficult to visualize data in R^4, so we display only the histograms of the marginal distributions in Figure 3.8. All of the one-dimensional marginal distributions are univariate normal location mixtures. Methods for visualization of multivariate data are covered in Chapter 5. Also, an interesting view of a bivariate normal mixture with three components is shown in Figure 12.13. ⋄

FIGURE 3.8: Histograms of the marginal distributions of multivariate normal location mixture data generated in Example 3.20.
3.6.3 Wishart Distribution
Suppose M = X^T X, where X is an n × d data matrix of a random sample from a Nd(μ, Σ) distribution. Then M has a Wishart distribution with scale matrix Σ and n degrees of freedom, denoted M ∼ Wd(Σ, n) (see, e.g., [13, 194]). Note that when d = 1, the elements of X are a univariate random sample from N(μ, σ²), so W1(σ², n) is equal in distribution to σ²χ²(n).
An obvious, but inefficient, approach to generating random variates from a Wishart distribution is to generate multivariate normal random samples and compute the matrix product X^T X. This method is computationally expensive
because nd random normal variates must be generated to determine the d(d + 1)/2 distinct entries in M.
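A minimal sketch of this naive approach, for the case μ = 0 (the function name is illustrative; base R also provides the rWishart generator):

rwishart.naive <- function(Sigma, n) {
    #M = t(X) %*% X, where the rows of X are iid N_d(0, Sigma)
    d <- ncol(Sigma)
    X <- matrix(rnorm(n * d), n, d) %*% chol(Sigma)
    crossprod(X)
}
#compare with the generator in base R:
#M <- rWishart(1, df = n, Sigma = Sigma)[, , 1]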
A more efficient method based on Bartlett's decomposition [25] is summarized by Johnson [160, p. 204] as follows. Let T = (Tij) be a lower triangular d × d random matrix with independent entries satisfying

1. Tij ∼ N(0, 1) iid, i > j.
2. Tii ∼ √(χ²(n − i + 1)), i = 1, ..., d.

Then the matrix A = TT^T has a Wd(Id, n) distribution. To generate Wd(Σ, n) random variates, obtain the Choleski factorization Σ = LL^T, where L is lower triangular. Then LAL^T ∼ Wd(Σ, n) [25, 140, 218]. Implementation is left as an exercise.
3.6.4 Uniform Distribution on the d-Sphere
The d-sphere is the set of all points x ∈ R^d such that ∥x∥ = (x^T x)^{1/2} = 1. Random vectors uniformly distributed on the d-sphere have equally likely directions. A method of generating this distribution uses a property of the
multivariate normal distribution (see [97, 160]). If X1, ..., Xd are iid N(0,1), then U = (U1, ..., Ud) is uniformly distributed on the unit sphere in R^d, where

Uj = Xj / (X1² + ··· + Xd²)^{1/2},   j = 1, ..., d.   (3.8)
Algorithm to generate uniform variates on the d-Sphere

1. For each variate ui, i = 1, ..., n repeat

(a) Generate a random sample xi1, ..., xid from N(0,1).
(b) Compute the Euclidean norm ∥xi∥ = (xi1² + ··· + xid²)^{1/2}.
(c) Set uij = xij/∥xi∥, j = 1, ..., d.
(d) Deliver ui = (ui1, ..., uid).
To implement these steps efficiently in R for a sample size n,
1. Generate nd univariate normals in n × d matrix M. The ith row of M
corresponds to the ith random vector ui.
2. Compute the denominator of (3.8) for each row, storing the n norms in
vector L.
3. Divide each number M[i,j] by the norm L[i], to get the matrix U,
where U[i,] = ui = (ui1,…,uid).
4. Deliver matrix U containing n random observations in rows.
Example 3.21 (Generating variates on a sphere). This example provides a function to generate random variates uniformly distributed on the unit d-sphere.
runif.sphere <- function(n, d) {
# return a random sample uniformly distributed
# on the unit sphere in R ^d
M <- matrix(rnorm(n*d), nrow = n, ncol = d)
L <- apply(M, MARGIN = 1,
FUN = function(x){sqrt(sum(x*x))})
D <- diag(1 / L)
U <- D %*% M
U
}
The function runif.sphere is used to generate a sample of 200 points uni- formly distributed on the circle.
#generate a sample in d=2 and plot
X <- runif.sphere(200, 2)
par(pty = "s")
plot(X, xlab = bquote(x[1]), ylab = bquote(x[2]))
par(pty = "m")
The circle of points is shown in Figure 3.9. ⋄
R Note 3.8
The apply function in runif.sphere returns a vector containing the n norms ∥x1∥, ∥x2∥, . . . , ∥xn∥ of the sample vectors in matrix M.
R Note 3.9
The command par(pty = "s") sets the square plot type so the circle is round rather than elliptical; par(pty = "m") restores the type to maximal plotting region. See the help topic ?par for other plot param- eters.
FIGURE 3.9: A random sample of 200 points from the bivariate distribution (X1,X2) that is uniformly distributed on the unit circle in Example 3.21.
Uniformly distributed points on a hyperellipsoid can be generated by applying a suitable linear transformation to a uniform sample on the d-sphere. Fishman [97, 3.28] gives an algorithm for generating points in and on a simplex.
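A sketch of the linear transformation idea in two dimensions, mapping the circle sample of Example 3.21 onto an ellipse (the transformation matrix is illustrative):

X <- runif.sphere(200, 2)
A <- matrix(c(2, 0, 0, 1), 2, 2)   #scale the first coordinate by 2
Y <- X %*% A                       #points on the ellipse (y1/2)^2 + y2^2 = 1
plot(Y, asp = 1, xlab = bquote(y[1]), ylab = bquote(y[2]))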
Exercises
3.1 Write a function that will generate and return a random sample of size n from the two-parameter exponential distribution Exp(λ, η) for arbitrary n, λ, and η. (See Examples 2.3 and 2.6.) Generate a large sample from Exp(λ, η) and compare the sample quantiles with the theoretical quantiles.
3.2 The standard Laplace distribution has density f(x) = (1/2)e^{−|x|}, x ∈ R. Use the inverse transform method to generate a random sample of size 1000 from this distribution. Use one of the methods shown in this chapter to compare the generated sample to the target distribution.
3.3 The Pareto(a, b) distribution has cdf

F(x) = 1 − (b/x)^a,   x ≥ b > 0, a > 0.

Derive the probability inverse transformation F^{-1}(U) and use the inverse transform method to simulate a random sample from the Pareto(2, 2) distribution. Graph the density histogram of the sample with the Pareto(2, 2) density superimposed for comparison.
3.4 The Rayleigh density [162, Ch. 18] is

f(x) = (x/σ²) e^{−x²/(2σ²)},   x ≥ 0, σ > 0.

Develop an algorithm to generate random samples from a Rayleigh(σ) distribution. Generate Rayleigh(σ) samples for several choices of σ > 0 and check that the mode of the generated samples is close to the theoretical mode σ (check the histogram).
3.5 A discrete random variable X has probability mass function

    x     0    1    2    3    4
    p(x)  0.1  0.2  0.2  0.2  0.3
Use the inverse transform method to generate a random sample of size 1000 from the distribution of X. Construct a relative frequency table and compare the empirical with the theoretical probabilities. Repeat using the R sample function.
3.6 Prove that the accepted variates generated by the acceptance-rejection sampling algorithm are a random sample from the target density fX .
3.7 Write a function to generate a random sample of size n from the Beta(a, b) distribution by the acceptance-rejection method. Generate a random sample of size 1000 from the Beta(3,2) distribution. Graph the histogram of the sample with the theoretical Beta(3,2) density superimposed.
3.8 Write a function to generate random variates from a Lognormal(μ,σ) distribution using a transformation method, and generate a random sample of size 1000. Compare the histogram with the lognormal density curve given by the dlnorm function in R.
3.9 The rescaled Epanechnikov kernel [92] is a symmetric density function

fe(x) = (3/4)(1 − x²),   |x| ≤ 1.   (3.9)

Devroye and Györfi [74, p. 236] give the following algorithm for simulation from this distribution. Generate iid U1, U2, U3 ∼ Uniform(−1, 1). If |U3| ≥ |U2| and |U3| ≥ |U1|, deliver U2; otherwise deliver U3. Write a function to generate random variates from fe, and construct the histogram density estimate of a large simulated random sample.
3.10 Prove that the algorithm given in Exercise 3.9 generates variates from the density fe (3.9).
3.11 Generate a random sample of size 1000 from a normal location mixture. The components of the mixture have N(0,1) and N(3,1) distributions with mixing probabilities p1 and p2 = 1 − p1. Graph the histogram of the sample with density superimposed, for p1 = 0.75. Repeat with different values for p1 and observe whether the empirical distribution of the mixture appears to be bimodal. Make a conjecture about the values of p1 that produce bimodal mixtures.
3.12 Simulate a continuous Exponential-Gamma mixture. Suppose that the rate parameter Λ has Gamma(r, β) distribution and Y has Exp(Λ) distribution. That is, (Y|Λ = λ) ∼ fY(y|λ) = λe^{−λy}. Generate 1000 random observations from this mixture with r = 4 and β = 2.
3.13 It can be shown that the mixture in Exercise 3.12 has a Pareto distribution with cdf

F(y) = 1 − (β/(β + y))^r,   y ≥ 0.

(This is an alternative parameterization of the Pareto cdf given in Exercise 3.3.) Generate 1000 random observations from the mixture with r = 4 and β = 2. Compare the empirical and theoretical (Pareto) distributions by graphing the density histogram of the sample and superimposing the Pareto density curve.
3.14 Generate 200 random observations from the 3-dimensional multivariate normal distribution having mean vector μ = (0,1,2) and covariance
matrix
1.0−0.5 0.5 Σ= − 0.5 1.0 − 0.5
0.5−0.5 1.0
using the Choleski factorization method. Use the R pairs plot to graph
an array of scatter plots for each pair of variables. For each pair of variables, (visually) check that the location and correlation approximately agree with the theoretical parameters of the corresponding bivariate normal distribution.
3.15 Write a function that will standardize a multivariate normal sample for arbitrary n and d. That is, transform the sample so that the sample mean vector is zero and sample covariance is the identity matrix. To check your results, generate multivariate normal samples and print the sample mean vector and covariance matrix before and after standardization.
3.16 Efron and Tibshirani discuss the scor (bootstrap) test score data on 88 students who took examinations in five subjects [91, Table 7.1], [194, Table 1.2.1]. Each row of the data frame is a set of scores (xi1, . . . , xi5) for the ith student. Standardize the scores by type of exam. That is, standardize the bivariate samples (X1,X2) (closed book) and the trivariate samples (X3,X4,X5) (open book). Compute the covariance matrix of the transformed sample of test scores.
3.17 Compare the performance of the Beta generator of Exercise 3.7, Example 3.8, and the R generator rbeta. Fix the parameters a = 2, b = 2 and time each generator on 1000 iterations with sample size 5000. (See Example 3.19.) Are the results different for different choices of a and b?
3.18 Write a function to generate a random sample from a Wd(Σ, n) (Wishart) distribution for n > d + 1 ≥ 1, based on Bartlett's decomposition.