STA255: Assignment
Shahriar Shams Winter 2021
Submission deadline: April 07, 2021, 10:00 am (local Toronto time). Late submissions will not be accepted.
Submitting this assignment is mandatory for every student in order to pass the course. There is no minimum score that you have to achieve, but submission is mandatory.
Instructions on creating documents for submission
• We will use Crowdmark for submission and grading, which only accepts PDF, JPG and PNG files.
• I recommend using R Markdown (if you are familiar with it). If you are not familiar with R Markdown, you can write your answers in Microsoft Word and save them as PDFs at the end. Handwritten answers will not be accepted.
• The Crowdmark link to upload your documents will be emailed to you later.
• The numerical calculations involved in this assignment are simple and you are already familiar with them (hopefully). Calculations are mostly repetitive in nature! I suggest using R.
• If you are a Python user, feel free to use Python in place of R to answer any of the questions. You can also use Microsoft Excel (for Q1-Q3) if you want.
• For each answer, make sure you have provided your code and output. If you use Excel, take screenshots of the worksheet showing the formulas used and the outputs, and submit them as part of your answer (as an appendix, for example).
• Make sure your answers are easy to read and nicely presented.
Academic Integrity
Each student will work alone. You are not allowed to ask anyone for help on any platform. Do not ask anyone for solutions. Do not share your code or answers. If you need clarification on any of these questions, you are allowed to ask questions on Ed or during office hours (please do not email us). Please do not post your solution on Ed and ask “does it look ok?”.
When submitting your assignment on Crowdmark, there will be a space for an academic integrity statement. Write the following statement on paper/iPad/Surface and upload a screenshot of it.
Statement:
I am attesting to the fact that I, [name] (write your full name here), [stnum] (write your student number here), have abided fully to the Code of Behaviour on Academic Matters. I have not committed academic misconduct, and am aware of the penalties that may be imposed if I have committed an academic offence.
Question 1 (5 points)
Suppose you have a population of size 7 [i.e. N=7]. You measure some quantity (X) and the corresponding numbers are:
11, 12, 13, 14, 15, 16, 17
a) Calculate the population mean (μ) and print/show the value.
b) Calculate the population variance ($\sigma^2$) using the formula $\sigma^2 = \frac{\sum_{j=1}^{N}(X_j - \mu)^2}{N}$ and print/show the value.
c) Imagine you are taking samples (of size n = 4) from this population with replacement. Imagine every possible way that you could have a sample of size 4 with replacement from this population. (hint: there will be 7 ∗ 7 ∗ 7 ∗ 7 = 2401 possible combinations)
R code to get all possible combinations is given at the end of this question.
d) For each of these samples of size 4, calculate the sample mean and record it (either as a new object in R or as a new column if you are using Excel). Let's call this new column X_bar. You should have 2401 values in this column.
e) You should have noticed that the values in the X_bar column are repetitive. Construct a frequency table based on the column X_bar [i.e. write down which values showed up how many times]. Now, using the frequencies (also known as counts), calculate the proportion of each of those repeated values.
f) Plot these proportions against the values and connect the points to form a curve. Does the shape of this plot look like any known distribution? Name the distribution.
g) Using the table of proportions [from part (e)] or otherwise, calculate the mean of these 2401 numbers (values under X_bar) and compare it to your answer to 1(a).
h) Using the table of proportions [from part (e)] or otherwise, calculate the variance of these 2401 numbers. Use the population variance formula (i.e. divide by 2401, not 2400). What is the relationship of this answer to your answer to 1(b)?
i) Which theorem did you demonstrate empirically in parts (f), (g) and (h)?
X=c(11, 12, 13, 14, 15, 16, 17)
d=expand.grid(X,X,X,X) # You can continue your calculations using this "d"
# For Excel users, the following line will create a csv file for you.
write.csv(d,file="Question1.csv",row.names = FALSE)
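The steps in parts (a) to (h) can be rehearsed on a much smaller case first. The following is only a sketch on a hypothetical toy population (`pop`, with samples of size 2), not the assignment data; all object names are illustrative assumptions. It shows one way to enumerate samples, tabulate the sample means, and compute a mean and a divide-by-count variance from the table of proportions.

```r
# Toy population (hypothetical, NOT the assignment data)
pop <- c(1, 2, 3)
N <- length(pop)

# Population mean and population variance (divide by N).
# R's built-in var() divides by N - 1, so it is not used here.
mu <- mean(pop)
sigma2 <- sum((pop - mu)^2) / N

# All ordered samples of size 2 drawn with replacement: 3 * 3 = 9 rows
samples <- expand.grid(pop, pop)

# Sample mean of each row
x_bar <- rowMeans(samples)

# Frequency table and proportions of the distinct sample means
freq <- table(x_bar)
prop <- prop.table(freq)
print(prop)

# Mean and divide-by-count variance of the sample means,
# computed from the table of proportions
values <- as.numeric(names(prop))
mean_x_bar <- sum(values * prop)
var_x_bar <- sum((values - mean_x_bar)^2 * prop)

# Proportions plotted against values, with the points joined by lines
plot(values, as.numeric(prop), type = "b",
     xlab = "Sample mean", ylab = "Proportion")
```

Working from the proportions rather than the raw column gives the same mean and variance as averaging over all rows, because each row is equally likely.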
Question 2 (4 points)
This question continues from Question 1(c). For each of these samples of size 4, calculate the sample variance using the following two formulas:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
and
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^2$$
Assume the population variance is $\sigma^2 = 4$. (You should get 2401 values of $S^2$ and 2401 values of $\hat{\sigma}^2$.)
a. By calculating (numerically, using the 2401 values) $\mathrm{Bias}[S^2]$ and $\mathrm{Bias}[\hat{\sigma}^2]$, check the unbiasedness of these two estimators.
b. By calculating all three components separately, show that the following identity is true:
$$\mathrm{MSE}[\hat{\sigma}^2] = \mathrm{var}[\hat{\sigma}^2] + (\mathrm{Bias}[\hat{\sigma}^2])^2$$
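As a hedged illustration of how bias, variance and MSE can be computed by averaging over every equally likely sample, here is a sketch on a hypothetical toy population with samples of size 2 (not the assignment data). The object names are assumptions made for this example only.

```r
# Toy population (hypothetical, NOT the assignment data)
pop <- c(1, 2, 3)
sigma2 <- sum((pop - mean(pop))^2) / length(pop)  # population variance

# All 9 ordered samples of size n = 2, one sample per row
m <- as.matrix(expand.grid(pop, pop))
n <- ncol(m)

# Sum of squared deviations of each sample from its own mean
ss <- rowSums((m - rowMeans(m))^2)

s2 <- ss / (n - 1)      # estimator with divisor n - 1
sigma2_hat <- ss / n    # estimator with divisor n

# Averages over all equally likely samples play the role of expectations
bias_s2 <- mean(s2) - sigma2
bias_hat <- mean(sigma2_hat) - sigma2

# Variance and MSE of sigma2_hat, both dividing by the number of samples
var_hat <- mean((sigma2_hat - mean(sigma2_hat))^2)
mse_hat <- mean((sigma2_hat - sigma2)^2)

print(c(bias_s2 = bias_s2, bias_hat = bias_hat,
        mse = mse_hat, var_plus_bias2 = var_hat + bias_hat^2))
```

The variance here uses the divide-by-count form so that the three components are all averages over the same set of samples.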
Question 3 (2 points)
Even though we need the sample size n to be large to apply the central limit theorem, let’s apply it anyway. Suppose you know that the population variance is $\sigma^2 = 4$.
a. For each of these 2401 cases, calculate a 90% confidence interval for μ, and then calculate the proportion of the intervals that include μ = 14.
b. What proportion of the intervals do you expect to include μ = 14? Why do you see a difference between your calculation [in part (a)] and your expectation? Under what condition would you expect these two numbers to be similar?
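One possible reading of part (a) is the usual known-variance interval, $\bar{X} \pm z_{0.95}\,\sigma/\sqrt{n}$. The sketch below assumes that form and applies it to a hypothetical toy population with samples of size 2 (not the assignment data), only to show how the coverage proportion can be counted.

```r
# Toy population (hypothetical, NOT the assignment data)
pop <- c(1, 2, 3)
mu <- mean(pop)
sigma <- sqrt(sum((pop - mu)^2) / length(pop))  # known population sd

# All 9 ordered samples of size n = 2 and their means
m <- as.matrix(expand.grid(pop, pop))
n <- ncol(m)
x_bar <- rowMeans(m)

# Known-variance interval: x_bar +/- z * sigma / sqrt(n)  (assumed form)
conf_level <- 0.90
z <- qnorm(1 - (1 - conf_level) / 2)
half_width <- z * sigma / sqrt(n)
lower <- x_bar - half_width
upper <- x_bar + half_width

# Proportion of intervals that contain the true mean
coverage <- mean(lower <= mu & mu <= upper)
print(coverage)
```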
Question 4 (3 points)
In a lecture, we demonstrated R code that replicates the sampling distribution of $\bar{X}$. Here is the code that was used in the lecture.
sample_4m_normal=function(){
  s=rnorm(30,mean=10,sd=2)
  return(mean(s))
}
X_bar=replicate(10000,sample_4m_normal())
plot(density(X_bar))
Simply change the distribution and the sample size (n) in this code to do this question. Produce the density of $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$
a) when n = 2, $X \sim \mathrm{Unif}[0,2]$
b) when n = 7, $X \sim \mathrm{Unif}[0,2]$
c) when n = 7, $X \sim \chi^2_{df=2}$
d) when n = 50, $X \sim \chi^2_{df=2}$
e) when n = 7, $X \sim \chi^2_{df=50}$
f) The CLT says that for large n, $\bar{X}$ converges (in distribution) to a Normal distribution. By comparing your graphs from parts (a) to (e), can you comment on how large n has to be in order for $\bar{X}$ to converge to a Normal distribution? What role does the skewness of the original distributions (Unif[0,2], $\chi^2_{df=2}$ and $\chi^2_{df=50}$) play here?
Presentation of your answers for the entire assignment is worth 1 point.
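For Question 4, one way to adapt the lecture code is to pass the sample size and the random generator as arguments. The sketch below is an assumption about how that could look, shown on an Exp(1) distribution with n = 5, which is deliberately not one of the cases asked for. The helper name `sample_mean` and the seed are hypothetical; base R also provides `runif()` and `rchisq()` for other distributions.

```r
# Draw one sample of size n using the generator `rdist` and return its mean
sample_mean <- function(n, rdist) {
  mean(rdist(n))
}

set.seed(255)  # arbitrary seed, only for reproducibility
n <- 5

# 10000 replicated sample means from Exp(1), a skewed distribution
X_bar <- replicate(10000, sample_mean(n, function(k) rexp(k, rate = 1)))

plot(density(X_bar), main = "Density of sample means, Exp(1), n = 5")

# Overlay the Normal curve suggested by the CLT for Exp(1):
# mean 1 and standard deviation 1 / sqrt(n)
curve(dnorm(x, mean = 1, sd = 1 / sqrt(n)), add = TRUE, lty = 2)
```

Overlaying the Normal curve with the matching mean and standard deviation makes any remaining skewness in the simulated density easier to see.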