Statistics Probability for Data Science and Laboratory I
Test01 Solutions, 2017 November 15
Question A v1
Pier and Fra play a game using independent tosses of an unfair coin.
A head comes up on any toss with probability $p \in (0, 1)$. The coin is tossed repeatedly until either the 2nd time head comes up, in which case Pier wins and the game stops; or the 2nd time tail comes up, in which case Fra wins and the game stops. Note that a full game involves 2 or 3 tosses at most.
1. Consider a probabilistic model for the game in which the outcomes are the sequences of heads and tails in a full game. Provide a list of the outcomes and their probabilities of occurring.
Because of the independence of the coin tosses, the outcomes and their probabilities are as follows:

Outcome   Probability
HH        $p^2$
HTH       $p^2(1-p)$
HTT       $p(1-p)^2$
THH       $p^2(1-p)$
THT       $p(1-p)^2$
TT        $(1-p)^2$

2. What is the probability that Pier wins the game?
The event of Pier winning is {HH, HTH, THH}. Adding the probabilities of the outcomes in this event gives
$$P(\text{Pier wins}) = p^2 + p^2(1-p) + p^2(1-p) = p^2(3-2p).$$
3. What is the conditional probability that Pier wins the game given that head comes up on the 1st toss?
$$P(\text{Pier wins} \mid \text{1st toss H}) = \frac{P(\text{Pier wins} \cap \text{1st toss H})}{P(\text{1st toss H})} = \frac{P(\{HH, HTH\})}{P(\text{1st toss H})} = \frac{p^2 + p^2(1-p)}{p} = p(2-p).$$
4. What is the conditional probability that head comes up on the 1st toss given that Pier wins the game?
$$P(\text{1st toss H} \mid \text{Pier wins}) = \frac{P(\text{Pier wins} \cap \text{1st toss H})}{P(\text{Pier wins})} = \frac{P(\{HH, HTH\})}{P(\text{Pier wins})} = \frac{p^2 + p^2(1-p)}{p^2(3-2p)} = \frac{2-p}{3-2p}.$$
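The three answers to Question A can be checked numerically by enumerating the six outcomes. The sketch below is only a sanity check under an arbitrary illustrative value of `p` (not taken from the test); all variable names are hypothetical.

```r
# Enumerate the six outcomes of a full game for an illustrative p (assumed value).
p <- 0.3
outcomes <- c("HH", "HTH", "HTT", "THH", "THT", "TT")

# Probability of each sequence: product of p for every H and (1 - p) for every T.
probs <- sapply(strsplit(outcomes, ""), function(tosses) {
  prod(ifelse(tosses == "H", p, 1 - p))
})
names(probs) <- outcomes

pier_wins  <- outcomes %in% c("HH", "HTH", "THH")   # second head before second tail
first_head <- substr(outcomes, 1, 1) == "H"

p_total        <- sum(probs)
p_pier         <- sum(probs[pier_wins])
p_pier_given_h <- sum(probs[pier_wins & first_head]) / sum(probs[first_head])
p_h_given_pier <- sum(probs[pier_wins & first_head]) / p_pier

# Compare with the closed-form answers derived above.
all.equal(p_total, 1)
all.equal(p_pier, p^2 * (3 - 2 * p))
all.equal(p_pier_given_h, p * (2 - p))
all.equal(p_h_given_pier, (2 - p) / (3 - 2 * p))
```

Changing `p` to any other value in (0, 1) should leave all four comparisons `TRUE`.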
Question B v1
The pdf of the continuous random variable $X$ is defined as follows
$$f_X(x) = \begin{cases} \dfrac{c}{(1-x)^{1/2}} & \text{for } x \in (0, 1), \\ 0 & \text{otherwise.} \end{cases}$$
1. Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function characteristic of the random variable's probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of the random variable's probability distribution.
Definition: Given a probability space $(\Omega, P)$, a random vector, or $p$-variate random variable, is any measurable mapping
$$X : \Omega \to \mathbb{R}^p$$
that assigns a $p$-dimensional vector $X(\omega)$ to each individual outcome $\omega \in \Omega$.
2. Find the value of the constant $c$ that makes $f_X(x)$ a true density function.
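By definition, we know that a pdf must integrate to 1, which in this case means that
$$1 = \int_0^1 \frac{c}{(1-x)^{1/2}}\,dx = c \int_0^1 (1-x)^{-1/2}\,dx = c\left[-2(1-x)^{1/2}\right]_{x=0}^{1} = 2c \iff c = \frac{1}{2}.$$
Hence
$$f_X(x) = \begin{cases} \dfrac{1}{2(1-x)^{1/2}} & x \in (0, 1), \\ 0 & \text{otherwise} \end{cases}$$
is a well-defined density function.
3. Find the median of $X$.
Recall that the median is the value $t$ such that
$$\frac{1}{2} = \int_0^t f_X(x)\,dx = \int_0^t \frac{1}{2(1-x)^{1/2}}\,dx.$$
So, as before:
$$\frac{1}{2} = \int_0^t \frac{1}{2(1-x)^{1/2}}\,dx = \left[-(1-x)^{1/2}\right]_0^t = 1 - (1-t)^{1/2} \iff (1-t)^{1/2} = \frac{1}{2} \iff t = \frac{3}{4}.$$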
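4. Find the probability that $X = \frac{1}{2}$ and then the probability that $X > \frac{1}{2}$.
Once we have the density, it is straightforward to compute probabilities by integration: $P(X = \frac{1}{2}) = 0$ because $X$ is a continuous random variable, and
$$P\left(X > \tfrac{1}{2}\right) = 1 - P\left(X \le \tfrac{1}{2}\right) = 1 - \int_0^{1/2} \frac{1}{2(1-x)^{1/2}}\,dx = 1 - \left[-(1-x)^{1/2}\right]_0^{1/2} = 1 - \left(1 - \frac{1}{\sqrt{2}}\right) = \frac{1}{\sqrt{2}} \approx 0.71.$$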
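The median $t = 3/4$ and the tail probability $P(X > 1/2) \approx 0.71$ can be cross-checked numerically. The following is a minimal sketch that uses only base R (`integrate` and `uniroot`); the helper names `f_X`, `F_X`, `median_X` and `tail_half` are introduced here for illustration.

```r
# Density with c = 1/2, as derived above.
f_X <- function(x) 1 / (2 * sqrt(1 - x))

# Numerical cdf: integrate the density from 0 to t (for 0 < t < 1).
F_X <- function(t) integrate(f_X, lower = 0, upper = t)$value

# Median: the root of F_X(t) - 1/2 inside (0, 1).
median_X <- uniroot(function(t) F_X(t) - 0.5,
                    interval = c(0.01, 0.99), tol = 1e-10)$root

# Tail probability P(X > 1/2) = 1 - F_X(1/2).
tail_half <- 1 - F_X(0.5)

c(median = median_X, tail = tail_half)
```

The two values should agree with $3/4$ and $1/\sqrt{2}$ up to the numerical tolerance of `integrate`.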
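5. Let $Y = e^{1+X}$, find the density of $Y$.
Remember that if $X$ is a continuous random variable with probability density function $f_X$, and if we consider an invertible transformation $Y = g(X)$, then the pdf of $Y$ is given by
$$f_Y(y) = f_X\left(g^{-1}(y)\right) \left|\frac{d}{dy} g^{-1}(y)\right|.$$
Here we have $Y = g(X) = e^{1+X}$, hence $X = g^{-1}(Y) = \ln(Y) - 1$ and also
$$\left|\frac{d}{dy} g^{-1}(y)\right| = \left|\frac{d}{dy} (\ln y - 1)\right| = \frac{1}{y}.$$
In addition, when $X$ varies between 0 and 1 (its support), $Y$ goes from $e$ to $e^2$. So, finally,
$$f_Y(y) = \frac{1}{2y\sqrt{2 - \ln y}} \quad \text{for } y \in (e, e^2).$$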
# Check it's a density (not requested)
integrate(function(x) 1/(2*sqrt(1 - x)), 0, 1)                    # original
integrate(function(x) 1/(2*x*sqrt(2 - log(x))), exp(1), exp(2))   # transformed

# Take a look
par(mfrow = c(1, 2))
curve(1/(2*sqrt(1 - x)), 0, 1,
      col = "purple", lwd = 4,
      main = "Density before", xlab = "x", ylab = expression(f[X](x)))
curve(1/(2*x*sqrt(2 - log(x))), exp(1), exp(2),
      col = "lavender", lwd = 4,
      main = "Density after", xlab = "y", ylab = expression(f[Y](y)))

1 with absolute error < 2.2e-13
1 with absolute error < 1.8e-06
[Figure: two panels. Left, "Density before": f_X(x) against x on (0, 1). Right, "Density after": f_Y(y) against y on (e, e^2).]
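Beyond checking that the transformed density integrates to 1, the change of variable itself can be checked by simulation. The sketch below is an illustration, not part of the original solution: it draws $X$ by inverse-cdf sampling (using $F_X(x) = 1 - \sqrt{1-x}$, which follows from the density above), sets $Y = e^{1+X}$, and compares the empirical cdf of $Y$ with $F_Y(y) = 1 - \sqrt{2 - \ln y}$, the cdf implied by the derived $f_Y$. The seed, sample size and evaluation points are arbitrary choices.

```r
set.seed(2017)                 # arbitrary seed, for reproducibility
u <- runif(1e5)
x <- 1 - (1 - u)^2             # inverse of F_X(x) = 1 - sqrt(1 - x)
y <- exp(1 + x)                # the transformation of Question 5

# cdf implied by the derived density f_Y on (e, e^2)
F_Y <- function(y) 1 - sqrt(2 - log(y))

# Compare empirical and theoretical cdf at a few points inside (e, e^2)
q <- c(3, 4, 5, 6, 7)
max_gap <- max(abs(ecdf(y)(q) - F_Y(q)))
cbind(y = q, empirical = ecdf(y)(q), theoretical = F_Y(q))
```

With this many draws the two columns should differ only in the third decimal place or so.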
Question B v2
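$X$ and $Y$ have the following discrete joint pmf:
$$p_{X,Y}(x, y) = \begin{cases} c\,|x - y| & \text{for } x \in \{-1, 0, 1\} \text{ and } y \in \{-1, 0, 1\}, \\ 0 & \text{otherwise.} \end{cases}$$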
To start with, for the sake of simplicity let's unfold the joint pmf compactly defined above:

p_{X,Y}(x, y)   Y = -1   Y = 0   Y = 1
X = -1          0        c       2c
X = 0           c        0       c
X = 1           2c       c       0
1. Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function characteristic of the random variable's probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of the random variable's probability distribution.
Definition: Given a probability space $(\Omega, P)$, a random vector, or $p$-variate random variable, is any measurable mapping
$$X : \Omega \to \mathbb{R}^p$$
that assigns a $p$-dimensional vector $X(\omega)$ to each individual outcome $\omega \in \Omega$.
2. Determine the value of the constant c that makes the previous function a true joint pmf.
Summing up all the joint masses we must get 1, so
$$\sum_{x,y} p_{X,Y}(x, y) = 1 \iff 0 + c + 2c + \cdots + c + 0 = 1 \iff 8c = 1 \iff c = \frac{1}{8}.$$
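3. Determine the marginal distribution of $X$ and its expectation.
Decorating the previous table with the marginals (summing up by rows and columns) we get

p_{X,Y}(x, y)   Y = -1   Y = 0   Y = 1   p_X(x)
X = -1          0        1/8     2/8     3/8
X = 0           1/8      0       1/8     2/8
X = 1           2/8      1/8     0       3/8
p_Y(y)          3/8      2/8     3/8     1

So the required marginal is

x        -1     0     1
p_X(x)   3/8    1/4   3/8

having expected value equal to:
$$E(X) = (-1) \cdot \frac{3}{8} + 0 \cdot \frac{1}{4} + 1 \cdot \frac{3}{8} = 0.$$
4. Determine the conditional distribution of $X \mid Y = 0$, its expectation and variance.
By simply normalizing the second column of the table we get
$$P(X = -1 \mid Y = 0) = \frac{p_{X,Y}(-1, 0)}{p_Y(0)} = \frac{1/8}{2/8} = \frac{1}{2},$$
$$P(X = 0 \mid Y = 0) = \frac{p_{X,Y}(0, 0)}{p_Y(0)} = \frac{0}{2/8} = 0,$$
$$P(X = 1 \mid Y = 0) = \frac{p_{X,Y}(1, 0)}{p_Y(0)} = \frac{1/8}{2/8} = \frac{1}{2}.$$
Hence
$$E(X \mid Y = 0) = (-1) \cdot \frac{1}{2} + 0 \cdot 0 + 1 \cdot \frac{1}{2} = 0$$
and, since this conditional expectation is zero,
$$\mathrm{Var}(X \mid Y = 0) = E(X^2 \mid Y = 0) = (-1)^2 \cdot \frac{1}{2} + 0^2 \cdot 0 + 1^2 \cdot \frac{1}{2} = 1.$$
5. Are $X$ and $Y$ independent? Explain your answer.
Definitely not, $X$ and $Y$ are dependent. In fact the joint does not factorize into its marginals. For example:
$$p_{X,Y}(-1, -1) = 0 \ne p_X(-1)\,p_Y(-1) = \left(\frac{3}{8}\right)^2.$$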
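The whole of Question B v2 can be reproduced in a few lines of base R. This is a sketch under the assumption that the joint pmf is $c\,|x - y|$ on $\{-1, 0, 1\}^2$, which is the form the unfolded table implies; the object names are hypothetical.

```r
vals <- c(-1, 0, 1)

# Unnormalised masses |x - y| (assumed form of the pmf), rows = x, columns = y.
joint <- outer(vals, vals, function(x, y) abs(x - y))
c_const <- 1 / sum(joint)                      # normalising constant
joint <- c_const * joint
dimnames(joint) <- list(x = as.character(vals), y = as.character(vals))

# Marginals and E(X)
p_X <- rowSums(joint)
p_Y <- colSums(joint)
E_X <- sum(vals * p_X)

# Conditional distribution of X given Y = 0: normalise the middle column.
cond   <- joint[, "0"] / p_Y["0"]
E_cond <- sum(vals * cond)
V_cond <- sum(vals^2 * cond) - E_cond^2

# Independence would require joint == outer(p_X, p_Y) in every cell.
independent <- isTRUE(all.equal(joint, outer(p_X, p_Y), check.attributes = FALSE))

list(c = c_const, p_X = p_X, E_X = E_X, cond = cond,
     E_cond = E_cond, V_cond = V_cond, independent = independent)
```

The `independent` flag compares the joint table with the product of its marginals cell by cell, which is the factorization check used in the answer to part 5.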
Question C v1
Consider $n$ independent tosses of a $k$-sided fair die. Denote by $S_i$ the number of tosses out of $n$ that result in face $i \in \{1, \ldots, k\}$.
1. Briefly but meaningfully comment on the possible use and misuse of the correlation coefficient $\rho(X, Y)$.
It detects only linear relationships: it is unable to detect the existence of nonlinear dependency between pairs of random variables. In general, we cannot expect any causal effect between correlated random variables.
2. Are $S_1$ and $S_2$ uncorrelated, positively correlated, or negatively correlated? Give a one-line intuitive justification.
$S_1$ and $S_2$ are negatively correlated. Intuitively, a large number of tosses that result in a 1 suggests a smaller number of tosses that result in a 2.
3. Compute the covariance $\mathrm{Cov}(S_1, S_2)$ of $S_1$ and $S_2$. Hint: It may be useful to express $S_i$ in terms of Bernoulli trials, work on single trials and then combine.
Let $\mathrm{One}_t$ and $\mathrm{Two}_t$ be Bernoulli random variables equal to 1 (success) if and only if the $t$-th toss resulted in a 1 or in a 2 respectively. Since $\mathrm{One}_t \ne 0$ implies $\mathrm{Two}_t = 0$, we have
$$E(\mathrm{One}_t\,\mathrm{Two}_t) = 0$$
and, by independence of distinct tosses and because each indicator is $\mathrm{Ber}(1/k)$,
$$E(\mathrm{One}_t\,\mathrm{Two}_s) = E(\mathrm{One}_t)\,E(\mathrm{Two}_s) = \frac{1}{k} \cdot \frac{1}{k} \quad \text{for } t \ne s.$$
Thus
$$E(S_1 S_2) = E\big[(\mathrm{One}_1 + \cdots + \mathrm{One}_n)(\mathrm{Two}_1 + \cdots + \mathrm{Two}_n)\big] = n\,E\big[\mathrm{One}_1(\mathrm{Two}_1 + \cdots + \mathrm{Two}_n)\big] = n(n-1) \cdot \frac{1}{k} \cdot \frac{1}{k},$$
and
$$\mathrm{Cov}(S_1, S_2) = E(S_1 S_2) - E(S_1)\,E(S_2) = \frac{n(n-1)}{k^2} - \frac{n^2}{k^2} = -\frac{n}{k^2}.$$
The covariance of $S_1$ and $S_2$ is negative as expected.
4. Write a simulation in R to check (or get) the answer to part 3. for $n = 100$ and $k \in \{6, 12\}$.

# Exact result
exactcov <- function(n, k) -n / k^2

# The game
game <- function(n, k) sample(1:k, size = n, replace = TRUE)

# The Bernoulli variables of interest
# For each game I want the pair (s1, s2)
svars <- function(game) c(sum(game == 1), sum(game == 2))

# Play M times the game (n = 100, k = 6)
M <- 10^4
n <- 100; k <- 6
set.seed(1234) # for reproducibility
out1 <- replicate(M, svars(game(n = n, k = k))) # 2 x M
# Compare
round(c(exact = exactcov(n = n, k = k), approx = cov(out1[1, ], out1[2, ])), 3)

 exact approx
-2.778 -2.687

# Play M times the game (n = 100, k = 12)
M <- 10^4
n <- 100; k <- 12
set.seed(1234) # for reproducibility
out2 <- replicate(M, svars(game(n = n, k = k))) # 2 x M
# Compare
round(c(exact = exactcov(n = n, k = k), approx = cov(out2[1, ], out2[2, ])), 3)

 exact approx
-0.694 -0.651
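As an alternative cross-check of $\mathrm{Cov}(S_1, S_2) = -n/k^2$, one can note that the vector of face counts $(S_1, \ldots, S_k)$ is multinomial with $n$ trials and equal cell probabilities $1/k$, so the counts can be drawn directly with `rmultinom` instead of simulating individual tosses. This is a sketch of a different route to the same check, not the solution's own code; the seed and the names `counts`, `approx_cov` and `exact_cov` are arbitrary.

```r
set.seed(1234)                       # arbitrary seed, for reproducibility
n <- 100; k <- 6; M <- 10^4

# Each column is one game: the k face counts of n tosses of a fair k-sided die.
counts <- rmultinom(M, size = n, prob = rep(1 / k, k))   # k x M matrix

approx_cov <- cov(counts[1, ], counts[2, ])   # sample covariance of S1 and S2
exact_cov  <- -n / k^2                        # value derived in part 3

round(c(exact = exact_cov, approx = approx_cov), 3)
```

Drawing the counts in one call avoids the `replicate` loop, at the cost of hiding the single-toss Bernoulli structure that the hint in part 3 asks for.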