
STAT 513/413: Lecture 6 Random numbers: distributions
(Transforming uniformly distributed numbers to something else)

Almost all of them are somewhere out there
Naming convention in R
ddistrib(x, ...) – density function
pdistrib(q, ...) – cumulative distribution function
qdistrib(p, ...) – quantile function
rdistrib(n, ...) – random numbers
If no other parameter is specified, it assumes the “standard” version of the distribution
For instance, rnorm(100) generates 100 random numbers from the normal distribution with mean 0 and standard deviation 1
And runif(50) generates 50 random numbers from the “standard random generator”: random numbers with uniform distribution on (0, 1)
And almost all of them are somewhere out there, but beware: not all packages are made equal!
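A minimal sketch of the naming convention on the normal distribution (the numeric comments are standard normal facts, not output from the slides):

```r
# The four prefixes, illustrated on the normal distribution:
dnorm(0)           # density at 0: 1/sqrt(2*pi), about 0.3989
pnorm(1.96)        # cumulative distribution function, about 0.975
qnorm(0.975)       # quantile function, the inverse of pnorm: about 1.96
rnorm(5)           # five random draws from N(0, 1)

# Non-default parameters work the same way for all four:
x <- rnorm(1000, mean = 10, sd = 2)
```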

Also useful
For discrete distributions with finitely many outcomes, sample() is also useful – but beware whether it is sampling with replacement or without!
> sample(1:20,10)
[1]  9 18 13 15 19  4 20  3  6 16
> sample(1:5,10)
Error in sample.int(length(x), size, replace, prob) :
cannot take a sample larger than the population when 'replace = FALSE'
> sample(1:5,10,replace=TRUE)
[1] 2 3 3 5 4 3 1 4 3 3
> sample(1:20,10,replace=TRUE)
[1]  7  1 15  2  7 17 10 19 15 19

Inversion
Method: apply the quantile function to the standard uniform generator
Example: exponential distribution, f(x) = λe^(−λx)
Works well, but the quantile function sometimes may be hard to come by
Example: normal distribution, f(x) = (1/(σ√(2π))) e^(−(x−μ)²/(2σ²))
> y=1+2*qnorm(runif(10000))
And now, check it on a qqplot(); a version for general distributions:
> qqplot(qnorm(ppoints(length(y))),y)
> abline(0,1) # fitting line by trial and error
> abline(0,2)
> abline(1,2)
For the special case of the normal distribution there is something a bit more tailored
> qqnorm(y)
> qqline(y)
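For the exponential example, F(x) = 1 − e^(−λx) inverts in closed form to F⁻¹(p) = −log(1 − p)/λ; a minimal sketch (λ = 2 and the seed are chosen arbitrarily):

```r
set.seed(1)
lambda <- 2                    # rate parameter, chosen arbitrarily
u <- runif(10000)              # standard uniform generator
x <- -log(1 - u) / lambda      # F^{-1}(u): inversion in closed form
# identical to the built-in quantile function:
# x <- qexp(u, rate = lambda)
# check: qqplot(qexp(ppoints(length(x)), rate = lambda), x); abline(0, 1)
```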

A useful evaluation tool: qqplot()
[Figure: left panel, qqplot(qnorm(ppoints(length(y))), y) with the trial lines; right panel, Normal Q-Q Plot of Theoretical Quantiles against Sample Quantiles, both axes roughly −5 to 5]
Everything is fine if the points are approximately on a line; the distributions may still be rescaled, that is, y = a + bx – differing by mean and variance, say. (All trial lines are shown in the left-hand plot; finally, only the last one, going through the data, should remain.)
If x’s and y’s correspond to exactly the same distribution, the line must be y = x (that is, abline(0,1))

Why does inversion work?
P[F⁻¹(U) ≤ x]
= P[F(F⁻¹(U)) ≤ F(x)]   applying the nondecreasing F on both sides leaves the inequality unchanged; F(F⁻¹(U)) = U
= P[U ≤ F(x)] = F(x)   as U is uniform on (0,1)
Right? Do you understand why?
The derived equality above means that the cumulative distribution function, P[F⁻¹(U) ≤ x], of the variable X = F⁻¹(U) is F(x)
Which is exactly what we wanted
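An empirical check of the identity above, for the Exp(1) distribution (seed and sample size are arbitrary):

```r
set.seed(513)                  # seed chosen arbitrarily
u <- runif(100000)
x <- qexp(u, rate = 1)         # X = F^{-1}(U) for the Exp(1) distribution
mean(x <= 1)                   # empirical P[X <= 1]
pexp(1)                        # F(1) = 1 - exp(-1), about 0.632
```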

The default for rnorm(): inversion
To learn more about random number generators in R:
> ?RNG
> RNGkind()
[1] "Mersenne-Twister" "Inversion"
These are the defaults. The second one specifies the way of generating random numbers with the normal distribution – we will discuss that later. From the help page:
... normal.kind can be "Kinderman-Ramage", "Buggy Kinderman-Ramage" (not for set.seed), "Ahrens-Dieter", "Box-Muller", "Inversion" (the default), or "user-supplied". (For inversion, see the reference in qnorm.) The Kinderman-Ramage generator used in versions prior to 1.7.0 (now called "Buggy") had several approximation errors and should only be used for reproduction of old results. The "Box-Muller" generator is stateful as pairs of normals are generated and returned sequentially. The state is reset whenever it is selected (even if it is the current normal generator) and when kind is changed.

Another method: acceptance-rejection
Generate random numbers with a density f(x) = 3x(2−x)/4, x ∈ [0, 2]
x is between 0 and a = 2, y between 0 and b = 0.75
[Figure: the rectangle [0, a] × [0, b] with the density curve over it; pairs (U, V) under the curve are labeled "X = U accepted", pairs above it "U rejected"]
1. generate U uniform on [0, a] and V uniform on [0, b]
2. if V ≤ f(U), take X = U (accept)
3. if not, repeat (reject)
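A vectorized sketch of steps 1–3 for this density (the batch-and-filter loop is one possible implementation, not necessarily the one used in the lecture):

```r
set.seed(2)
f <- function(x) 3 * x * (2 - x) / 4      # target density on [0, 2]
a <- 2                                    # support is [0, a]
b <- 0.75                                 # f is bounded by b = f(1)

n <- 10000
x <- numeric(0)
while (length(x) < n) {
  U <- runif(n, 0, a)                     # step 1: candidates
  V <- runif(n, 0, b)                     # step 1: vertical coordinates
  x <- c(x, U[V <= f(U)])                 # step 2: accept U where (U, V) lies under f
}                                         # step 3: the loop retries the rejects
x <- x[1:n]
# hist(x, prob = TRUE); curve(f, 0, 2, add = TRUE)   # visual check
```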

It works!
[Figure: histograms over [0, 2] of samples generated by inversion (left) and by acceptance/rejection (right), both matching the density]

Theoretical justification
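One way to see why acceptance-rejection produces the right distribution – a sketch, using the notation of the previous slide:

```latex
% The pair (U, V) is uniform on the rectangle [0, a] \times [0, b].
% Conditioning on acceptance, V \le f(U), makes (U, V) uniform on the
% region under the graph of f, whose total area is \int_0^a f(t)\,dt = 1.
P[X \le x] \;=\; P\left[\,U \le x \mid V \le f(U)\,\right]
          \;=\; \frac{\int_0^x f(t)\,dt}{\int_0^a f(t)\,dt}
          \;=\; \int_0^x f(t)\,dt \;=\; F(x)
% Each (U, V) pair is accepted with probability 1/(ab), the ratio of the
% area under f to the area of the rectangle; here 1/1.5 = 2/3.
```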

Transformation methods
When the desired distribution is the distribution of a transformed variable with another distribution, and the transformation is not difficult – then why not go this way
Example: Chi-square – sum of k squares of standard normal variables
Example: Box-Muller transform. If U, V are independent random variables with uniform distribution on (0,1), then
√(−2 log U) cos(2πV) and √(−2 log U) sin(2πV)
are independent random variables with standard normal distribution
As a special case, one can also consider convolutions (that is, sums), and mixtures
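A minimal sketch of the Box-Muller transform (seed and sample size are arbitrary):

```r
set.seed(3)                              # seed and sample size arbitrary
n <- 5000
U <- runif(n); V <- runif(n)
X <- sqrt(-2 * log(U)) * cos(2 * pi * V)
Y <- sqrt(-2 * log(U)) * sin(2 * pi * V)
# X and Y are independent N(0, 1); visual check: qqnorm(X); qqline(X)
```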

Example: Poisson-Gamma mixture
> lambda = rgamma(6,4,3)
> lambda
[1] 1.4796001 3.4630581 0.4897232 0.5519801 0.4214378 2.1400598
> set.seed(1)
> rpois(6,lambda)
[1] 1 3 0 2 0 4
> set.seed(1)
> rpois(6,lambda[1])
[1] 1 1 2 3 0 3
> set.seed(1)
> x=NULL
> for (k in (1:length(lambda))) x[k] = rpois(1,lambda[k])
> x
[1] 1 3 0 2 0 4
Beware: it usually works, but better check…
Finally, Textbook: 3.1, 3.2, 3.3, 3.4, 3.5
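The loop above draws one Poisson value per λ; since rpois() recycles a vector of means elementwise, the whole mixture can be generated in one vectorized call (shape 4 and rate 3 as on the slide; the larger n is arbitrary):

```r
set.seed(1)
n <- 10000                               # larger than the slide's 6, arbitrary
lambda <- rgamma(n, shape = 4, rate = 3) # one Gamma mean per draw
x <- rpois(n, lambda)                    # one Poisson draw per element of lambda
mean(x)                                  # E[X] = E[lambda] = shape/rate = 4/3
```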