Information
The Datasaurus Dozen dataset contains 12 sets of bivariate (x-y) data. Five summary statistics are (almost) the same across all sub-datasets: the mean of x, the mean of y, the standard deviation of x, the standard deviation of y, and the Pearson correlation between x and y. However, scatter plots reveal that each sub-dataset looks very different. We will use this dataset to illustrate that it is important to plot the data, and that we should not rely only on summary statistics.
Question 6
Marked out of 2.00
Use R to read in the file datasaurus_dozen.txt correctly and save it as a data frame called datasaurus_dozen.
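A minimal sketch of one way to read the file. The layout of datasaurus_dozen.txt is not shown here, so header = TRUE and sep = "\t" are assumptions that should be checked against the actual file (for example by opening it in a text editor first).

```r
# Sketch only: header = TRUE and sep = "\t" are ASSUMPTIONS about the file layout.
path <- "datasaurus_dozen.txt"

if (file.exists(path)) {
  datasaurus_dozen <- read.table(path, header = TRUE, sep = "\t",
                                 stringsAsFactors = FALSE)
  str(datasaurus_dozen)          # check column names and types
  print(head(datasaurus_dozen))  # inspect the first rows
}
```

If str() shows a single column, or the first data row has been used as column names, the separator or header setting does not match the file and needs adjusting.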
Question 7
Marked out of 3.00
Create a vector called data.types that contains the names of the 12 unique sub-datasets in the datasaurus_dozen data frame.
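As an illustration, unique() returns each distinct value once, in order of first appearance. The sketch below uses a small made-up data frame in place of the real data; the column name dataset is an assumption about how the sub-dataset label is stored.

```r
# Toy stand-in for datasaurus_dozen (made-up values).
# ASSUMPTION: the sub-dataset label lives in a column called `dataset`.
datasaurus_dozen <- data.frame(
  dataset = c("dino", "dino", "star", "star", "circle"),
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, 1, 3, 5),
  stringsAsFactors = FALSE
)

data.types <- unique(datasaurus_dozen$dataset)
print(data.types)
print(length(data.types))  # should be 12 for the real data
```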
Question 8
Marked out of 5.00
Use a loop to compute the five statistics previously mentioned for each type of sub-dataset contained in the data.types vector you created. Summarise your results in a data frame called five.stats that contains the following 6 columns:
data.type: sub-dataset type.
mean.x: the mean of the x component for each sub-dataset.
mean.y: the mean of the y component for each sub-dataset.
sd.x: the standard deviation of the x component for each sub-dataset.
sd.y: the standard deviation of the y component for each sub-dataset.
corr.xy: the Pearson correlation between components x and y for each sub-dataset.
If your computations are correct, you should see that the 5 statistics are very similar across all 12 sub-datasets.
Note: You can use the built-in functions sd(x) and cor(x,y) to compute the standard deviation of a vector x and the Pearson correlation between vectors x and y, respectively.
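The loop described above could be sketched as follows. The data here are simulated toy values, not the real Datasaurus data, and the column names dataset, x and y are assumptions about the data frame's layout.

```r
set.seed(1)  # toy data only; NOT the real Datasaurus values
datasaurus_dozen <- data.frame(
  dataset = rep(c("A", "B", "C"), each = 20),  # ASSUMED column name
  x = rnorm(60, mean = 50, sd = 10),
  y = rnorm(60, mean = 50, sd = 10),
  stringsAsFactors = FALSE
)
data.types <- unique(datasaurus_dozen$dataset)

# Pre-allocate the result: one row per sub-dataset, six columns
five.stats <- data.frame(
  data.type = data.types,
  mean.x = NA_real_, mean.y = NA_real_,
  sd.x = NA_real_, sd.y = NA_real_,
  corr.xy = NA_real_,
  stringsAsFactors = FALSE
)

for (i in seq_along(data.types)) {
  # Rows belonging to the i-th sub-dataset
  sub <- datasaurus_dozen[datasaurus_dozen$dataset == data.types[i], ]
  five.stats$mean.x[i]  <- mean(sub$x)
  five.stats$mean.y[i]  <- mean(sub$y)
  five.stats$sd.x[i]    <- sd(sub$x)
  five.stats$sd.y[i]    <- sd(sub$y)
  five.stats$corr.xy[i] <- cor(sub$x, sub$y)
}

print(five.stats)
```

Pre-allocating five.stats and filling it row by row avoids growing a data frame inside the loop.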
Question 9
Marked out of 4.00
Use a loop to produce 12 plots containing scatterplots of each sub-dataset. Use different colours for each sub-dataset. Display all 12 plots in the same plotting window using, for example,
par(mfrow = c(3,4), pty = 's', mar = c(2,1,1,1)). Your plot should look similar to the one shown below.
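A minimal sketch of such a plotting loop, using simulated toy data rather than the real file. The column names dataset, x and y and the use of rainbow() for the colours are assumptions; any vector of 12 distinct colours would do.

```r
set.seed(1)  # toy data only; NOT the real Datasaurus values
datasaurus_dozen <- data.frame(
  dataset = rep(paste0("set", 1:12), each = 15),  # ASSUMED column name
  x = runif(180, 0, 100),
  y = runif(180, 0, 100),
  stringsAsFactors = FALSE
)
data.types <- unique(datasaurus_dozen$dataset)
cols <- rainbow(length(data.types))  # one colour per sub-dataset

# 3 x 4 grid of square panels; keep the old settings to restore them later
old.par <- par(mfrow = c(3, 4), pty = "s", mar = c(2, 1, 1, 1))
for (i in seq_along(data.types)) {
  sub <- datasaurus_dozen[datasaurus_dozen$dataset == data.types[i], ]
  plot(sub$x, sub$y, col = cols[i], pch = 19, xlab = "", ylab = "")
}
par(old.par)  # restore the previous graphical parameters
```

Note that the quotes around s must be plain ASCII quotes; typographic quotes copied from a document cause a syntax error in R.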
Question 10
Marked out of 11.00
In this question, we will generate a random variable $X$ following a Gamma distribution with mean $\mu$ and variance $\sigma^2$. To this end, implement the following:
(a) [2 marks] Create a function called f that takes x, a and b as arguments and computes
$$f(x) = \frac{b^a}{\Gamma(a)} x^{a-1} e^{-bx},$$
where $\Gamma(\cdot)$ is the Gamma function. The function $f$ is the Gamma density evaluated at x with shape parameter a and rate parameter b.
Note: in R, you can compute $\Gamma(a)$ using gamma(a).
(b) [2 marks] Create a function called g that takes x and lambda ($\lambda$) as arguments and computes
$$g(x) = \lambda e^{-\lambda x}.$$
The function $g$ is the Exponential density evaluated at x with rate parameter lambda ($\lambda$).
Note: in R, you can compute $e^z$ using exp(z).
(c) [5 marks] If $X$ has a Gamma distribution with mean $\mu$ and standard deviation $\sigma$, its corresponding shape ($a$) and rate ($b$) parameters can be computed as follows:
$$a = \frac{\mu^2}{\sigma^2}, \qquad b = \frac{\mu}{\sigma^2}. \tag{1}$$
Create a function called gen.gamma that takes n, mu ($\mu$) and sigma ($\sigma$) as arguments. Your function should do the following:
Step 1: Compute $a$ and $b$ using the formulae in (1).
Step 2: Compute [...]
Step 3: Compute [...]
Step 4:
– Step 4.1: Generate a value $y$ with an Exponential distribution with rate $\lambda$. To this end, use the R function rexp(1, rate = lambda).
– Step 4.2: Generate a value $u$ with a Uniform distribution in the interval [0,1]. To this end, use the R function runif(1).
– Step 4.3: Compute [...]
– Step 4.4: If [...], then set $x = y$. Otherwise, return to Step 4.1.
Repeat Step 4 until you generate n values of $x$. The function gen.gamma should return the n values of $x$.
(d) [2 marks] Using the gen.gamma function, generate 1,000 Gamma distributed numbers with mean 47 and standard deviation 26, i.e., mean and standard deviation very close to data y in Datasaurus Dozen. Save your results in a vector called x. Create a density histogram of the generated values using hist(x, freq = F).
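The procedure in part (c) has the shape of a rejection (accept-reject) sampler with an Exponential proposal. The sketch below is one possible implementation under stated assumptions, since the expressions for Steps 2, 3, 4.3 and 4.4 are not available here. It assumes a proposal rate of lambda = b / a (so the proposal has mean $\mu$), an envelope constant M equal to the maximum of $f(x)/g(x)$, which for this choice of lambda is attained at $x = a/b$, and acceptance when $u \le f(y) / (M g(y))$. These are standard textbook choices and may differ from the intended ones; the names M and ratio are illustrative.

```r
# (a) Gamma density with shape a and rate b
f <- function(x, a, b) {
  b^a / gamma(a) * x^(a - 1) * exp(-b * x)
}

# (b) Exponential density with rate lambda
g <- function(x, lambda) {
  lambda * exp(-lambda * x)
}

# (c) Rejection sampler; lambda, M and ratio below are ASSUMED choices
gen.gamma <- function(n, mu, sigma) {
  # Step 1: shape and rate from formulae (1)
  a <- mu^2 / sigma^2
  b <- mu / sigma^2
  stopifnot(a >= 1)  # f/g is unbounded near 0 when a < 1

  # ASSUMPTION: proposal rate giving the proposal the same mean mu
  lambda <- b / a
  # ASSUMPTION: envelope constant = max of f/g, attained at x = a / b
  M <- f(a / b, a, b) / g(a / b, lambda)

  x <- numeric(n)
  i <- 0
  while (i < n) {
    y <- rexp(1, rate = lambda)               # Step 4.1
    u <- runif(1)                             # Step 4.2
    ratio <- f(y, a, b) / (M * g(y, lambda))  # assumed acceptance probability
    if (u <= ratio) {                         # accept y, otherwise try again
      i <- i + 1
      x[i] <- y
    }
  }
  x
}

# (d) 1,000 draws with mean 47 and standard deviation 26
set.seed(123)  # for reproducibility only
x <- gen.gamma(1000, mu = 47, sigma = 26)
print(c(mean = mean(x), sd = sd(x)))
hist(x, freq = FALSE)
# Overlay the target density as a visual check
curve(f(x, a = 47^2 / 26^2, b = 47 / 26^2), add = TRUE)
```

Comparing f and g against the built-in dgamma(x, shape = a, rate = b) and dexp(x, rate = lambda) is a quick way to check parts (a) and (b).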