CS计算机代考程序代写 Stat 466/866, Fall 2021 Due: Friday, Nov 5th

Stat 466/866, Fall 2021 Due: Friday, Nov 5th

Assignment 2

1 of 4

Note: For this assignment, you again only need to submit your SAS program file. Use internal comments

to identify the question numbers and respond to questions. Please ensure your SAS program file is

executable. Use variable lists and arrays to reduce your code were practical.

1. Put your name in a title so that if the entire assignment is executed, your name will appear above all
output. For each question, add a secondary title providing the overall question number.

2. This question requires you to use the SAS data step with probability functions to calculate all
probabilities. Use proc print with observation numbers suppressed to display all results.

a. Suppose this Halloween a child who loves potato chips goes trick-or-treating to 10 random homes
in Kingston. Suppose that 1 in 4 homes gives out a bag of chips. Assume that the number of bags of

chips the child gets follows a binomial distribution with 10 trials and a success probability of 0.25.

i. Calculate and display the probability that the child gets 2 or more bags of chips.
ii. Use a do loop to generate a dataset with 11 rows and 2 columns (variables) named Chips and

Prob . Chips should be 0 for the first row, 1 for the 2nd row, 2 for the 3rd row and so on. Prob

is the probability that the child gets the exact number bags of chips as indicated by the Chips

column.

b. Suppose that the number of trick-or-treaters that visit your house follows a Poisson distribution with
a mean of 10.

i. Calculate and display the probability that more than 12 trick or treaters will visit your house
this year.

ii. If you give each trick-or-treater one chocolate bar, what’s the minimum number of chocolate
bars you need to buy to be at least 95% certain that you won’t run out of chocolate bars this

Halloween. Use a function that returns this number.

3. Fun with Fibonacci

The Fibonacci sequence is one of the most famous sequences in nature and mathematics. If you are

not familiar with it look it up. The first and second elements of the Fibonacci sequence are 0 and 1 (some

references consider 1 the first element). Each additional element is the sum of the prior two elements, so

element 3 is 0+1=1, element 4 is 1+1=2, element 5 is 1+2=3 and so on.

The ratio of an element in the Fibonacci sequence divided by the prior element in the sequence

converges to a number called the ‘Golden Ratio’. The Golden Ratio is quite remarkable in mathematics,

nature, art and architecture (again look it up if you are not familiar with it). The golden ratio is equal to
1+√5

2
(approximately 1.61803).

In a single data step, generate a data set containing one row and variables F1 through F20 containing

the first 20 elements of the Fibonacci sequence. Also create variables GR1 though GR20 where GRi is

equal to Fi/Fi-1. GR1 and GR2 are undefined so they will be left as missing. Do this question by using

arrays. You will assign F1 and F2 the values of 0 and 1 respectively, but the remaining elements of the

arrays can be calculating by using arrays to apply the Fibonacci and Golden ratio formulae

Famous Mathematical Sequences and Series

Stat 466/866, Fall 2021 Due: Friday, Nov 5th

Assignment 2

2 of 4

The remainder is for STAT 866 only:

The dataset you just created contains only 1 row with different variables containing the elements of the

sequences. Create another dataset that contains only 3 variables named i, Fibonacci and GoldenRatio and

has 1 row for each of the first 20 elements of Fibonacci sequence. Use this dataset to create the figure

shown below. Note that only the first 7 elements are displayed, and the y-axis ranges from 0 to 8 in

increments of 1. Also notice the labelled reference line at 1.61803 and the dashed Golden Ratio Estimate.

Stat 466/866, Fall 2021 Due: Friday, Nov 5th

Assignment 2

3 of 4

4. Simulating and Assessing Distributions.

a) In a single data step generate a data set named samples containing 200, 000 rows with 8 variables
containing random numbers from the 8 distributions described in table 1 below. Generate these

rows using a double do loop with an outer index variable named sample that increments from 1 to

10,000 and an inner index variable named n which increments from 1 to 20. You will need to look

up the rand function to generate the 8 random variables. Set the seed to 1234 so we all generate

the same “random” sample.

Table 1:

Distribution Distribution Parameters and Notes Mean of

distribution

Normal =0, 2=3 0

Continuous Uniform Min=0 Max=6 3

Discreate Uniform Takes on values 1, 2, 3, 4, 5, 6 with equal probability 3.5

Bernoulli Probability of success is 0.5 0.5

Poison ==3 3

Chi-Squared With 2 degrees for freedom 2

T (with 1 and 3

degrees of freedom)

Generate three separate variables named t1 and t3 following t-

distributions with 1 and 3 degrees of freedom respectively.

0 if defined,

median=0

b) Run each of these 8 variables through a single procedure that will provide detail univariate
descriptive statistics including histograms and other plots of the distributions. Have the procedure

add a kernel smooth and normal density reference curve to the histograms.

c) Now run each of these random variables through PROC MEANS to create a new dataset named
averages that contains the average, variance and 95% confidence limits for each variable in each

of the 10,000 samples of size 20. Tell the procedure not to display any results (or this will take

forever to run). This procedure will read in the 200,000 rows of the 8 random variables and will

generate a dataset with 10,000 rows (one for each sample). For each distribution, the output data

set should contain one variable containing the mean, one variable containing the variance, one

variable containing the lower confidence limit and another variable containing the upper

confidence limit. I don’t care if the dataset contains some additional variables. If possible, use an

option so that SAS will automatically name the output variables by combining the input variable

name with the statistic name.

d) Run the 10,000 sample averages generated in c through the same procedure with the same options
as use used in part b. Look at the results including the histogram. Do any of these averages appear

approximately normal? Which one(s)? In a comment, refer to at least two results from the

procedure output to support your response for each random variable.

e) Now read the dataset you created in c) into a new data step to create 8 new variables (one for each
distribution) containing the value 1 if the true mean is contained within the normal approximate

95% sample confidence limits and 0 otherwise. Create 8 additional variables that contain the

width of the 95% confidence intervals for each distribution.

f) Use PROC MEANS step to display the coverage rates of the 95% confidence intervals and the
widths of each confidence interval for each distribution. Have this PROC MEANS only display

two decimals. If the coverage rate rounds to between 0.94 and 0.96 inclusive consider the

Stat 466/866, Fall 2021 Due: Friday, Nov 5th

Assignment 2

4 of 4

coverage rate nominal. If the coverage rate rounds to >0.96 consider it conservative and if it is

<0.94 consider it anti-conservative. In a comment, tell me which distributions had a nominal coverage rate with a sample size of 20. g) Don’t provide the code for this, but re-estimate the coverage rates for a sample size of 3 instead of 20. In a comment, list the distributions that maintained their nominal coverage rate with a sample size of 3. When running the coverage simulation with n=3, did you notice any sign of a problem in the log? Don’t worry about fixing anything, just comment if you log suggested any issues. h) For STAT 866 only: Do you think the sample means would approach normality, and the coverage rates would become nominal for all 8 distributions, if we made the sample size large enough? Why or why not? i) For STAT 466 only. The figure below shows the mean (circle) and 95% confidence interval for the first 100 simulated samples of size 20 from the normal distribution with mean 0 and variance=3. Try to generate the exact same figure with the same axis labels and legend. Notice the dashed reference line at 0 and the different color of the of the 95% confidence intervals not containing the true mean. j) For STAT 866 only. Try to generate the 95% CI coverage indicator variables for the 8 distributions listed in table 1 in a single data step. That is, you are redoing most of the work done in parts a, c and e in s single data step. Have the resulting dataset contain 10, 000 rows (simulations) each providing the coverage rate for a sample size of 100. Hint: this should make extensive use of arrays and variable lists. Figure required for STAT866 only: