MT4113: Assignment 1
MT4113: Assignment 1
Richard Glennie
Set: Monday 20 Sept. Due: Mon 4 Oct 23:59.
In this assignment, you will write functions in R that use control structures such as if-statements and iteration
loops. The functions will produce samples from different statistical distributions: the normal distribution,
the χ2 distribution, and Student’s t-distribution.
The assignment is marked out of 50 and will count towards 10% of your final grade. A detailed marking
scheme is given at the end of this document. Overall, this assignment will assess how you translate algorithms
into code and how well you write clear, modular code that adheres to a consistent style.
Background
As we will see in this course (and you may see in other statistics courses), many methods rely on the ability
of computers to produce samples from statistical distributions.
The random numbers generated by computers are not really random – they use deterministic algorithms that
produce a set of numbers that have the proper distributional properties (e.g., the right mean and variance)
and no discernible pattern. Because the algorithms are deterministic, they always produce numbers in the
same order, and so to be useful they need to be set off at a random start point – the seed.
There are many distributions from which we might want to generate random values. Computer algorithms
begin by generating numbers from the uniform distribution and then use various tricks that use these uniform
deviates to generate values from the distribution of interest. For this assignment, assume that the random
uniform number generator in R, runif(), does a good job of creating uniformly distributed deviates.
Using runif(), you will write functions that produce random deviates conforming to three distributions
(normal, χ2 and Student’s t).
The algorithms are as follows:
Algorithm for normally-distributed samples The Box-Muller method (Maindonald 1984) generates a
pair of independent normally-distributed samples, X1 and X2.
Algorithm: Box-Muller method.
• 1. Generate A ∼ uniform(0, 1) and B ∼ uniform(0, 1)
• 2. Let X1 = sin(2πA)
√
−2logeB and X2 = cos(2πA)
√
−2logeB
• 3. Deliver X1 and X2
Generating χ2-distributed samples Larsen and Marx (1981:284) give the following theorem, from which
an algorithm immediately follows:
Let Z1, Z2, …, Zm be m independent standard normal random variables. Then
m∑
i=1
Z2i ∼ χ
2
m
1
Generating Student’s t-distributed samples Mood, Graybill and Boes (1974:250) give the following
theorem, from which an algorithm immediately follows. If Z has a standard normal distribution (∼ N(0, 1))
and U has a chi-squared distribution with m degrees of freedom (∼ χ2m), and if Z and U are independent,
then
t =
Z√
U/m
is distributed with a Student’s t-distribution with m degrees of freedom.
Your assignment
Problem statement
• Write an R function, my_rnorm() that returns a vector of pseudo-random values from a normal
distribution using the Box-Muller algorithm.
– The algorithm delivers values with mean 0 and sd 1. You can transform them into values with
mean µ and sd σ by multiplying the values by σ and then adding µ.
– The algorithm describes how to generate pairs of normally distributed deviates; think about how
your code should behave to produce an odd number of deviates.
– rnorm() accepts vector arguments, but your functions should be designed to accept only single
numbers. This makes it easier for you to code.
– Just like the R function rnorm(), the function should have the following arguments:
Argument name Description Default
n number of values to return none
mean mean of values to return 0
sd standard deviation of values to return 1
• Write an R function my_rchisq() returning a vector of pseudo-random χ2-distributed deviates.
Argument name Description Default
n number of values to return none
df degrees of freedom of the distribution 1
• Write an R function my_rt() returning a vector of pseudo-random t-distributed deviates.
Argument name Description Default
n number of values to return none
df degrees of freedom of the distribution 1
• These last two functions differ from their R counterparts rchisq() and rt(), in that your functions
will not use the non-centrality parameter argument.
Make sure each of your functions are clearly written, with a useful amount of commenting. Pay attention to
the tips on good programming style given in the tidyverse style guide, lectures, and practicals, including
things like spacing between blocks of code, indenting and spaces within lines of code and use of clear variable
names. Use vectorization where appropriate. Include appropriate checks on inputs within your functions;
when the function detects an incorrect input it should stop program execution with exactly the error message
invalid arguments (use the stop function). This is similar to the behaviour or rnorm:
rnorm(-5)
## Error in rnorm(-5): invalid arguments
2
Your functions should be able to handle common types of incorrect inputs, such as the example given above.
Please do not use any additional R libraries in your solution beyond those loaded in a standard R installation.
Your solution therefore should not include the library() or require() functions.
What to hand in
Upload the following to MMS:
• A single text file named asmt1.r containing at least the three functions (suitably commented) you have
written. Please be careful to exactly follow the instructions about function names, arguments, returned
values, etc. – the functions must be drop-in replacements for rnorm(), rchisq() and rt(). I will be
using automated testing to check the functions, if you do not follow the instructions exactly, my tests
will fail on your code, and you will not get credit for your work.
Here is an example of an automated test – checking that the function my_rnorm() produces a vector of 10
numbers when asked:
source(‘asmt1.r’)
x <- my_rnorm(n=10)
pass.test <- (length(x)==10 & is.numeric(x))
# pass.test takes on the value TRUE if the object 'x' contains 10 numeric values
• Also within the text file named asmt1.r add one or more additional functions (not open code) used
to test the my_rnorm(), my_rchisq() and my_rt() functions. The test function(s) will not be part of
the automated testing. This part of the assignment is open-ended; I wish to see what creativity you
can use to test your own code. However the test function(s) must be functions and contain sufficient
documentation so that I know what it is the test(s) are trying to accomplish. If your test code is not
written in the form of functions, they will interfere with my automated testing procedure, causing
possible loss of marks.
If you hand work in late, it will be subject to the late work penalty described on the School’s web page
[http://www.st-andrews.ac.uk/maths/current/ug/information/latepenalties/] and summarised as A late
piece of work is penalised with an initial penalty of 15% of the maximum available mark, and then a further
5% per 8-hour period, of part thereof. Do not wait until the morning of the due date to try uploading your
work to MMS – if the upload fails at that point, your work will be discounted as late.
Because this assignment counts towards your final grade, it is important that you do not collaborate with
others in completing the work. You should be comfortable with the following statement, which you must
include as a comment at the beginning of your code file
# I confirm that the attached is my own work, except where clearly indicated in the text.
If you got stuck and needed to ask your peers, please use comments to tell me in your code file where you
got help from others. For more information, see the Academic misconduct section of the university web site
[http://www.st-andrews.ac.uk/students/rules/academicpractice/]. This holds for all assessed project work on
this course. Plagiarism cannot be tolerated on this or any other assignment for this module.
Marking scheme
There are 50 marks on offer for the assignment.
• Up to 30 marks for producing working functions my_rnorm(), my_rchisq() and my_rt(). As stated
above, I will use automated testing to check your functions work, including trapping input errors. This
means I will run a battery of tests on your functions and compare the results with what they should
produce.
– If you are unable to create my_rnorm() according to the Box-Muller algorithm provided, you
may receive partial credit for the assignment by calling the R function, rnorm() in your χ2 and
3
http://www.st-andrews.ac.uk/maths/current/ug/information/latepenalties/
http://www.st-andrews.ac.uk/students/rules/academicpractice/
Student’s t-distribution random variate generation. Marks you would have earned in the testing of
my_rnorm() will be lost by adopting this strategy.
• Up to 12 marks for good programming style in each function (presence and usefulness of comments, use
of indenting, presence of error checks, etc.). I will assess this by making a visual check of your code,
and I will assign marks for style out of 12.
• Up to 8 marks are available for the testing functions you produce. I will check them for accuracy,
creativity and ambitiousness. Note the marks available for this portion of the assignment is less than
the number of marks available for good programming style; allocate your effort accordingly.
4
Background
Your assignment
Problem statement
What to hand in
Marking scheme