MAST20005/MAST90058: Week 2 Problems
1. Let X1, X2, …, X9 be a random sample. We are told that E(X3) = 7 and var(X4) = 4.
(a) What is sd(X2)?
(b) What is var(X7 + X8)?
(c) What is cov(X3, X4)?
(d) What is an approximate distribution of the sample mean, X̄?
2. Let Y = X1 + ··· + X15 be the sum of iid rvs, each with pdf f(x) = (3/2)x^2 for −1 < x < 1.
(a) What is E(X1)?
(b) Calculate E(Y ) and var(Y ).
(c) We would like to calculate Pr(−0.3 < Y < 1.5). Use the Central Limit Theorem to approximate this probability. Hint: Φ(−0.1) = 0.4602 and Φ(0.5) = 0.6915, where Φ(·) is the standard normal cdf.
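The normal approximation used in part (c) can be sketched in code. This is a minimal illustration, not part of the original problem sheet: the function names `std_normal_cdf` and `clt_interval_prob` are hypothetical, and the usage example uses a sum of 12 Uniform(0, 1) variables (an assumed example, not the distribution in this question). The first two printed values can be compared with the hint above.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal cdf, Phi(z), written via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def clt_interval_prob(a: float, b: float, n: int, mu: float, sigma2: float) -> float:
    """CLT approximation to Pr(a < Y < b), where Y is the sum of n iid
    random variables, each with mean mu and variance sigma2."""
    mean_y = n * mu                # E(Y) = n * mu
    sd_y = math.sqrt(n * sigma2)   # sd(Y) = sqrt(n * sigma^2)
    return std_normal_cdf((b - mean_y) / sd_y) - std_normal_cdf((a - mean_y) / sd_y)


# Compare with the values quoted in the hint.
print(round(std_normal_cdf(-0.1), 4))
print(round(std_normal_cdf(0.5), 4))

# Assumed example: sum of 12 Uniform(0, 1) rvs (mean 1/2, variance 1/12 each).
print(round(clt_interval_prob(5.0, 7.0, 12, 0.5, 1.0 / 12.0), 4))
```

The same two steps (standardise the endpoints using E(Y) and sd(Y), then take a difference of Φ values) apply to any sum of iid variables with finite variance.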
3. In each of the following scenarios, is the sample that is described a random sample? What assumptions are being made? Are they realistic? What is the ‘population’ in each case?
(a) Tingjin runs a plant experiment. He creates ten pots, plants an identical seed in each one, and leaves them in the same spot in the sun. After 6 weeks he measures the height of each plant, giving measurements x1, x2, …, x10.
(b) Damjan measures the height of all of his immediate family members, giving measurements y1, y2, y3, y4.
(c) Every day, Robert counts the number of people sitting down on South Lawn. He does this for 100 days in a row, giving counts z1, z2, …, z100.
4. Consider the following realisations from X: 43.1 48.9 42.6 43.7 41.0
Note that: $\sum_{i=1}^{5} x_i = 219.3$, $\sum_{i=1}^{5} x_i^2 = 9654.27$.
(a) Reminder from the lectures: the sample mean, $\bar{x} = n^{-1} \sum_{i=1}^{n} x_i$, is a measure of location; the sample variance, $s^2 = (n - 1)^{-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$, is a measure of spread; the sample standard deviation, $s = \sqrt{s^2}$, is another measure of spread.
i. Show that $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$ and $s^2 = (n - 1)^{-1} \left( \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \right)$.
ii. Compute the mean and standard deviation for the dataset above.
(b) How would your answers to part (a) change if we multiplied each data point by 2?
(c) Reminder: the default (‘Type 7’) sample quantiles in R are defined as $\hat{\pi}_p = x_{(k)}$, where $k = 1 + (n - 1)p$. The set of statistics {minimum, $\hat{q}_1$, $\hat{q}_2$, $\hat{q}_3$, maximum} is often called the ‘five-number summary’, where the $\hat{q}_i$ are the sample quartiles.
i. Calculate the five-number summary of the above dataset.
ii. The interquartile range (IQR) is $\hat{q}_3 - \hat{q}_1$. Calculate the IQR of the above dataset.
iii. Draw a box plot for this dataset using these quantiles.
(d) An alternative definition of sample quantiles is given by: $\pi_p = x_{(i)} + r \cdot (x_{(i+1)} - x_{(i)})$, where $(n + 1)p = i + r$ such that $i$ is an integer and $0 \le r < 1$.
i. Re-compute the five-number summary using this definition.
ii. Show that the $\pi_p$ are the ‘Type 6’ quantiles (as defined in the lectures).
(e) Outliers are observations that don’t seem to belong with the rest of the data. They can occur through data entry errors or problems with an experiment. Extreme observations are not necessarily errors, but there is a crude convention for when to identify and label them as ‘outliers’: an observation $x$ is an outlier if $x < \hat{\pi}_{0.25} - k \times \mathrm{IQR}$ or $x > \hat{\pi}_{0.75} + k \times \mathrm{IQR}$, where typically $k = 1.5$. These are often depicted graphically on a box plot, by only extending the whiskers up to $k \times \mathrm{IQR}$ from each quartile and plotting each outlier as an individual point. (This is the default way that R will draw box plots.) According to this convention, are there any outliers in the sample above?
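The two quantile definitions in parts (c) and (d) and the outlier convention in part (e) can be sketched as follows. This is an illustrative sketch only: the function names and the small dataset are hypothetical (not from the problem sheet), the Type 7 function assumes linear interpolation between adjacent order statistics when k is not an integer (as R does), and clamping to the minimum or maximum when the index falls outside 1, …, n is an assumption made here to keep the second definition well defined.

```python
import math


def _interpolate(xs, k):
    """Value at 1-based fractional position k in the sorted list xs,
    interpolating linearly between adjacent order statistics.
    Positions outside [1, n] are clamped (an assumption of this sketch)."""
    n = len(xs)
    i = math.floor(k)
    r = k - i
    if i < 1:
        return xs[0]
    if i >= n:
        return xs[-1]
    return xs[i - 1] + r * (xs[i] - xs[i - 1])


def quantile_type7(data, p):
    """Sample quantile with position k = 1 + (n - 1) p."""
    xs = sorted(data)
    return _interpolate(xs, 1 + (len(xs) - 1) * p)


def quantile_n_plus_1(data, p):
    """Sample quantile with position (n + 1) p = i + r."""
    xs = sorted(data)
    return _interpolate(xs, (len(xs) + 1) * p)


def find_outliers(data, k=1.5, quantile=quantile_type7):
    """Observations below q1 - k*IQR or above q3 + k*IQR."""
    q1 = quantile(data, 0.25)
    q3 = quantile(data, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]


# Hypothetical data, chosen only to illustrate the definitions.
demo = [2, 4, 4, 5, 7, 9, 30]
for name, q in [("k = 1 + (n - 1)p", quantile_type7), ("(n + 1)p", quantile_n_plus_1)]:
    summary = [min(demo)] + [q(demo, p) for p in (0.25, 0.5, 0.75)] + [max(demo)]
    print(name, summary, "outliers:", find_outliers(demo, quantile=q))
```

Passing the quantile function as an argument makes it easy to see how the choice of definition changes the quartiles, and therefore the fences used to flag outliers.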
5. Create a sample of 4 numbers from {1, 2, 3, 4}, with repeats allowed, that maximises the sample variance.
6. The following are prostaglandin-endoperoxide synthase 2 (COX2) measurements on tissue samples from 10 mice (COX2 is a protein involved in inflammatory processes related to cancer):
10.39 10.43 9.99 11.17 8.91 11.20 11.38 7.74 10.61 11.11
(a) Calculate the five-number summary (you may use either Type 6 or Type 7 quantiles).
(b) Are there any outliers in this sample (using R’s default convention)?
(c) Draw a box plot for this dataset.
7. The following are observations on maximum rainfall (cm/day) in a year measured by a weather station in Tasmania ( Airport) in 10 consecutive years:
9.9 4.7 20.5 1.8 4.7 9.8 20.5 20.2 6.5 3.0
(a) Draw a histogram of these data.
(b) Suppose that the random variable Y follows the extreme value (EV) distribution, which depends on a location parameter, θ, and a scale parameter, ξ. This distribution arises as the maximum of a set of values (maximum wind speed, precipitation, peak flow, etc.). It has the property that we can write Y as
$$Y = \theta + \xi Z,$$
where Z has the standard EV distribution with cdf $F(z) = e^{-e^{-z}}$. Thus, the inverse cdf is $F^{-1}(p) = -\ln(-\ln p)$, so we expect that
$$x_{(k)} \approx \theta + \xi F^{-1}\left(\frac{k}{n+1}\right).$$
Use a QQ plot to assess whether the EV model looks correct for these data.
How could you estimate the parameters θ and ξ based on your plot?
(Drawing the plot will be easier with a computer in the lab class, but you
can discuss this problem in the tutorial beforehand.)
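One possible way to build the QQ plot coordinates for part (b) is sketched below. It is an illustration under stated assumptions, not the intended solution: the function names are hypothetical, the sample is synthetic (constructed to lie exactly on a line with assumed values θ = 10 and ξ = 3), and fitting a least-squares line through the points is just one of several reasonable ways to read an intercept and slope off the plot.

```python
import math


def ev_quantile(p):
    """Inverse cdf of the standard EV distribution: F^{-1}(p) = -ln(-ln p)."""
    return -math.log(-math.log(p))


def ev_qq_points(data):
    """Pairs (theoretical quantile, order statistic) for an EV QQ plot,
    using plotting positions k / (n + 1) for k = 1, ..., n."""
    xs = sorted(data)
    n = len(xs)
    return [(ev_quantile(k / (n + 1)), x) for k, x in enumerate(xs, start=1)]


def fit_line(points):
    """Least-squares line through the QQ points.
    Returns (intercept, slope), read here as rough estimates of (theta, xi)."""
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_x = sum(x for _, x in points) / n
    stt = sum((t - mean_t) ** 2 for t, _ in points)
    stx = sum((t - mean_t) * (x - mean_x) for t, x in points)
    slope = stx / stt
    return mean_x - slope * mean_t, slope


# Synthetic sample lying exactly on the line x = 10 + 3 * F^{-1}(k / 11).
synthetic = [10 + 3 * ev_quantile(k / 11) for k in range(1, 11)]
theta_hat, xi_hat = fit_line(ev_qq_points(synthetic))
print(theta_hat, xi_hat)
```

If the EV model is reasonable, the points returned by `ev_qq_points` should fall close to a straight line; marked curvature suggests the model does not fit.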