代写 R algorithm graph STAT 513/413: Lecture 5 Random numbers: basics

STAT 513/413: Lecture 5 Random numbers: basics
(What, after all, is “completely random”?)

Computers: random or deterministic?
R version 3.5.3 (2019-03-11) — “Great Truth”
…
> runif(6)
[1] 0.7811872 0.2256249 0.1029852 0.9418297 0.1088203 0.7156732
Now, quit R and start it again:
R version 3.5.3 (2019-03-11) — “Great Truth”
…
> runif(6)
[1] 0.7370521 0.4887825 0.2355647 0.1671979 0.9691297 0.2913628
1

How about this though
R version 3.5.3 (2019-03-11) — “Great Truth”
…
> set.seed(257)
> runif(6)
[1] 0.6854492 0.8406895 0.8107704 0.9535350 0.3566798 0.6871050
Again, quit and restart R:
R version 3.5.3 (2019-03-11) — “Great Truth”
…
> set.seed(257)
> runif(6)
[1] 0.6854492 0.8406895 0.8107704 0.9535350 0.3566798 0.6871050
2

Coin tossing
Some features are easier seen in the discrete setting: random 0’s and 1’s that correspond to “coin tossing”
> floor(2*runif(30))
[1] 1 0 0 1 1 0 1 1 0 0 1 1 0 0 1 1 0 1 0 0 1 1 1 0 1 0 1 1 0 1
In fact, those are supposed to look like outcomes of the Bernoulli distribution with p = 0.5 – which is the special case of the binomial distribution, with n = 1 and p = 0.5
> rbinom(30,1,0.5)
[1] 1 1 1 0 1 1 0 0 1 1 1 0 1 1 0 1 1 0 1 1 0 1 0 0 0 0 0 1 1 1
And yet another way to get them is
> sample(c(0,1),30,replace=TRUE)
[1] 0 1 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 1 1 1 0 1 1 1 1 1
3

Are all those the same?
> set.seed(1)
> table(floor(2*runif(1000000)))
01 499630 500370 > set.seed(1)
> table(rbinom(1000000,1,0.5)) 01
499630 500370
> set.seed(1)
> table(sample(c(0,1),1000000,replace=TRUE))
01 499630 500370
Seems like they are… Really?
> sum(abs(floor(2*runif(1000000))-rbinom(1000000,1,0.5)))
[1] 499707
Or not?
4

Yes, they really are
> set.seed(7); a=floor(2*runif(1000000))
> set.seed(7); b=rbinom(1000000,1,0.5)
> set.seed(7); c=sample(c(0,1),1000000,replace=TRUE)
> sum(abs(a-b))
[1] 0
> sum(abs(a-c))
[1] 0
5

Random numbers in computers: why?
• computer experiments
• graphics
• to mitigate “Buridan ass” situations
Jean Buridan – 14th century French philosopher; the paradox of a donkey dying of hunger between two equally distant piles of hay was consider before by Aristotle (4th century BC, a man equally hungry and thirsty) and Al-Ghazali (11th century, two equally good dates)
6

Random number generation in computers: how?
• ”Hardware generators”
would be truly random
but they are too slow
and sometimes we want reproducibility too (debugging!)
• So they are generated via deterministic algorithms
“in the state of sin” (von Neumman)
recursively: each random number depends on the previous ones they are thus not random, but “appear random” (Knuth)
If we initialize the generator in a certain way, we obtain each time the same sequence: this is good, for instance, when we still work on the algorithm (“debugging”)
If we then want to run the experiment in an impartial fashion, we can initialize the generator in a “random” fashion: in R, there is no randomize command or similar, but we can do a workaround: either quit and reenter R. If we do not want to exit R, there is still a way:
7

Random seed
The state of the random number generator is kept in an invisible vector called .Random.seed
> ls()
[1] “a” “b” “x”
> ls(all=TRUE)
[1] “.Random.seed” “a” “b” “x”
> .Random.seed
6 1384307324 -971116865 2006898581
[622] 1925606336 -709985239 -2145007461 -1987340948 -427022882
[1] 403 …
It is recommended that users do not manipulate .Random.seed directly, but rather use the command set.seed(). This command has one argument, which is either an integer – to set the seed – or NULL, to “make the seed random”, to “randomize”.
Randomize: once we consider the algorithm correct, and want then to run the computer experiment in an “impartial fashion”, we would like to initialize the generator in a “random” fashion. A workaround will be to quit and reenter R, but we can do it without quitting
8

Randomization
> set.seed(1)
> runif(6)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819 0.8983897
> runif(6)
[1] 0.94467527 0.66079779 0.62911404 0.06178627 0.20597457 0.17655675
> set.seed(1)
> runif(6)
[1] 0.2655087 0.3721239 0.5728534 0.9082078 0.2016819 0.8983897
> set.seed(NULL)
> runif(6)
[1] 0.43856362 0.63981600 0.34235782 0.04603239 0.99229755 0.44843181
Instead of set.seed(NULL), rm(.Random.seed) works to the same effect: once .Random.seed is deleted, it is immediately replaced by a new one “created from the current time and process ID”.
9

Random number generators in R
To learn more about random number generators in R:
> ?RNG
> RNGkind()
[1] “Mersenne-Twister” “Inversion”
These are defaults. The second one specifies the way of generating random numbers with normal distribution – we will discuss that later. The first one gives you the type of the basic – uniform distribution on (0,1) – random generator: the options are ”Wichmann-Hill”, ”Marsaglia-Multicarry”: ”Super- Duper”, ”Mersenne-Twister”, ”Knuth-TAOCP-2002”, ”Knuth- TAOCP”, ”L’Ecuyer-CMRG”, and ”user-supplied” – but that is something you really need not to know…
10

So, what “appear completely random” means here?
Well, that is a question… and we are not going to solve it here.
The best way to think about it is this way: you think of random numbers as outcomes of independent and identically distributed random variables with certain distribution. The computer ones are good, if they practically exhibit the same properties as the true ones would; and those “completely random” will have some kind of uniform distribution.
It is best to think about this in terms of coin tosses, sequences of 0’s and 1’s. What would be expect of those?
For instance, that both of them come about equally likely… Or that pairs of them come equally likely…
Or… or… or… or…
The second volume, first part of The Art of Computer Programming by D. E. Knuth has a nice discussion for those interested
11

Do they come equally likely?
> table(floor(2*runif(100)))
01
57 43
> table(floor(2*runif(1000000)))
01 499767 500233
> table(floor(2*runif(1000000)))
01 500058 499942
> table(floor(2*runif(1000000)))
01 499813 500187
12

How about the pairs?
> x=floor(2*runif(1000000))
> table(x[seq(1,length(x),2)]*10+x[seq(2,length(x),2)])
0 1 10 11
124911 124923 125089 125077
> x=floor(2*runif(1000000))
> table(x[seq(1,length(x),2)]*10+x[seq(2,length(x),2)])
0 1 10 11
125242 124509 124878 125371
> x=floor(2*runif(1000000))
> table(x[seq(1,length(x),2)]*10+x[seq(2,length(x),2)])
0 1 10 11
124931 125273 124405 125391
13

How can I say they really are equally likely?
In chisq.test(x) : Chi-squared approximation may be incorrect
> x=floor(2*runif(1000000))
> z=table(x[seq(1,length(x),2)]*10+x[seq(2,length(x),2)])
> chisq.test(z)
Chi-squared test for given probabilities
data: z
X-squared = 1.6493, df = 3, p-value = 0.6483
> x=floor(2*runif(1000000))
> z=table(x[seq(1,length(x),2)]*10+x[seq(2,length(x),2)])
> chisq.test(z)
Chi-squared test for given probabilities
data: z
X-squared = 7.2551, df = 3, p-value = 0.0642
%%% but I didn’t set or keep the seed!!!
14

An important remark concerning your assignments
… and also your research practice…
ALWAYS KEEP – that is, CONTROL – THE SEEDS!!
And also… for the final run
everybody should set a seed differently from the others
15

Related Posts