MT4113: Assignment 1
Richard Glennie
Set: Monday 21 Sept. Due: Mon 5 Oct 23:59.
In this assignment, you will write functions in R that use control structures such as if-statements and iteration loops. The functions will produce samples from different statistical distributions: the normal distribution, the χ2 distribution, and Student’s t-distribution.
The assigment is marked out of 50 and will count towards 10% of your final grade. A detailed marking scheme is given at the end of this document. Overall, this assignment will assess how you translate algorithms into code and how well you write clear, modular code that adheres to a consistent style.
Background
As we will see in this course (and you may see in other statistics courses), many methods rely on the ability of computers to produce samples from statistical distributions.
The random numbers generated by computers are not really random – they use deterministic algorithms that produce a set of numbers that have the proper distributional properties (e.g., the right mean and variance) and no discernible pattern. Because the algorithms are deterministic, they always produce numbers in the same order, and so to be useful they need to be set off at a random start point – the seed.
There are many distributions from which we might want to generate random values. Computer algorithms begin by generating numbers from the uniform distribution and then use various tricks that use these uniform deviates to generate values from the distribution of interest. For this assignment, assume that the random uniform number generator in R, runif(), does a good job of creating uniformly distributed deviates.
Using runif(), you will write functions that produce random deviates conforming to three distributions (normal, χ2 and Student’s t).
The algorithms are as follows:
Algorithm for normally-distributed samples The Box-Muller method (Maindonald 1984) generates a
pair of independent normally-distributed samples, X1 and X2. Algorithm: Box-Muller method.
• 1. Generate A ∼ uniform(0, 1) and B ∼ uniform(0, 1)
• 2. Let X1 = sin(2πA)−2logeB and X2 = cos(2πA)−2logeB
• 3. Deliver X1 and X2
Generating χ2-distributed samples Larsen and Marx (1981:284) give the following theorem, from which an algorithm immediately follows:
Let Z1, Z2, …, Zm be m independent standard normal random variables. Then
m
Z i2 ∼ χ 2m i=1
1
Generating Student’s t-distributed samples Mood, Graybill and Boes (1974:250) give the following theorem, from which an algorithm immediately follows. If Z has a standard normal distribution (∼ N (0, 1)) and U has a chi-squared distribution with m degrees of freedom (∼ χ2m), and if Z and U are independent, then
Z
t = U/m
is distributed with a Student’s t-distribution with m degrees of freedom. Your assignment
Problem statement
• Write an R function, my_rnorm() that returns a vector of pseudo-random values from a normal distribution using the Box-Muller algorithm.
– The algorithm delivers values with mean 0 and sd 1. You can transform them into values with mean μ and sd σ by multiplying the values by σ and then adding μ.
– The algorithm describes how to generate pairs of normally distributed deviates; think about how your code should behave to produce an odd number of deviates.
– rnorm() accepts vector arguments, but your functions should be designed to accept only single numbers. This makes it easier for you to code.
– Just like the R function rnorm(), the function should have the following arguments:
Argument name
n mean sd
Description Default
number of values to return none mean of values to return 0 standard deviation of values to return 1
• Write an R function my_rchisq() returning a vector of pseudo-random χ2-distributed deviates.
Argument name
n df
Description Default
number of values to return none degrees of freedom of the distribution 1
• Write an R function my_rt() returning a vector of pseudo-random t-distributed deviates.
Argument name
n df
Description Default
number of values to return none degrees of freedom of the distribution 1
• These last two functions differ from their R counterparts rchisq() and rt(), in that your functions will not use the non-centrality parameter argument.
Make sure each of your functions are clearly written, with a useful amount of commenting. Pay attention to the tips on good programming style given in the tidyverse style guide, lectures, and practicals, including things like spacing between blocks of code, indenting and spaces within lines of code and use of clear variable names. Use vectorization where appropriate. Include appropriate checks on inputs within your functions; when the function detects an incorrect input it should stop program execution with exactly the error message invalid arguments (use the stop function). This is similar to the behaviour or rnorm:
rnorm(-5)
## Error in rnorm(-5): invalid arguments
2
Your functions should be able to handle common types of incorrect inputs, such as the example given above. Please do not use any additional R libraries in your solution beyond those loaded in a standard R installation.
Your solution therefore should not include the library() or require() functions.
What to hand in
Upload the following to MMS:
• A single text file named asmt1.r containing at least the three functions (suitably commented) you have written. Please be careful to exactly follow the instructions about function names, arguments, returned values, etc. – the functions must be drop-in replacements for rnorm(), rchisq() and rt(). I will be using automated testing to check the functions, if you do not follow the instructions exactly, my tests will fail on your code, and you will not get credit for your work.
Here is an example of an automated test – checking that the function my_rnorm() produces a vector of 10 numbers when asked:
• Also within the text file named asmt1.r add one or more additional functions (not open code) used to test the my_rnorm(), my_rchisq() and my_rt() functions. The test function(s) will not be part of the automated testing. This part of the assignment is open-ended; I wish to see what creativity you can use to test your own code. However the test function(s) must be functions and contain sufficient documentation so that I know what it is the test(s) are trying to accomplish. If your test code is not written in the form of functions, they will interfere with my automated testing procedure, causing possible loss of marks.
If you hand work in late, it will be subject to the late work penalty described on the School’s web page [http://www.st-andrews.ac.uk/maths/current/ug/information/latepenalties/] and summarised as A late piece of work is penalised with an initial penalty of 15% of the maximum available mark, and then a further 5% per 8-hour period, of part thereof. Do not wait until the morning of the due date to try uploading your work to MMS – if the upload fails at that point, your work will be discounted as late.
Because this assignment counts towards your final grade, it is important that you do not collaborate with others in completing the work. You should be comfortable with the following statement, which you must include as a comment at the beginning of your code file
# I confirm that the attached is my own work, except where clearly indicated in the text.
If you got stuck and needed to ask your peers, please use comments to tell me in your code file where you got help from others. For more information, see the Academic misconduct section of the university web site [http://www.st-andrews.ac.uk/students/rules/academicpractice/]. This holds for all assessed project work on this course. Plagiarism cannot be tolerated on this or any other assignment for this module.
Marking scheme
There are 50 marks on offer for the assignment.
• Up to 30 marks for producing working functions my_rnorm(), my_rchisq() and my_rt(). As stated above, I will use automated testing to check your functions work, including trapping input errors. This means I will run a battery of tests on your functions and compare the results with what they should produce.
– If you are unable to create my_rnorm() according to the Box-Muller algorithm provided, you may receive partial credit for the assignment by calling the R function, rnorm() in your χ2 and
source(‘asmt1.r’)
x <- my_rnorm(n=10)
pass.test <- (length(x)==10 & is.numeric(x))
# pass.test takes on the value TRUE if the object 'x' contains 10 numeric values
3
Student’s t-distribution random variate generation. Marks you would have earned in the testing of
my_rnorm() will be lost by adopting this strategy.
• Up to 12 marks for good programming style in each function (presence and usefulness of comments, use
of indenting, presence of error checks, etc.). I will assess this by making a visual check of your code,
and I will assign marks for style out of 12.
• Up to 8 marks are available for the testing functions you produce. I will check them for accuracy,
creativity and ambitiousness. Note the marks available for this portion of the assignment is less than the number of marks available for good programming style; allocate your effort accordingly.
4