Midterm Project STAT 280, Fall 2020
Instructor: Dr. Simone Brugiapaglia
Due date: Friday, November 6, 2020 at 23:59 EST
Instructions
• Students shall upload their solutions to Moodle before the due date. Late submissions will not be accepted.
• Solutions to the proposed problems should be uploaded as a single R script file (.R) or as a single PDF file created using R Markdown (preferred option).
• In both cases, make sure to clearly state the problem numbers in your solutions. Moreover, always show the code used to solve an exercise (not only the output, unless explicitly stated otherwise) and comment your code using # where appropriate.
• Readability of code and clarity of presentation will be taken into account when marking.
• For any questions, you can contact the instructor at simone.brugiapaglia@concordia.ca
• Important note about graphics: When producing figures, use appropriate additional features such as titles, axis labels, legends, suitable markers, colors, etc. to make your plots clearly understandable. Clarity of visualization will be taken into account during marking.
Problem 1: Student Survey Data [15 points]
Load the MASS library using the command
library(MASS)
and consider the dataset survey.
a. We want to find the mean height of the students in the dataset. However, using the mean command, we obtain:
mean(survey$Height)
## [1] NA
Explain why this is happening. How can we compute the mean height?
b. Find the median heights of male students and of female students.
c. Compute the mean pulse rate for male students younger than 20.

d. Suppose we want to extract the data of students who are taller than 190 cm. Try to execute the command:
survey[survey$Height > 190, ]
What happens? How can we extract the desired data?
e. Extract a dataset showing only the pulse rate, exercise frequency, and age of students who are younger than 17 or older than 40.
f. The ages are recorded as numeric values representing a number of years. Modify the data frame so that the age is measured using an integer number of months (for example, 20.167 years should be converted to 242 months).
g. What is the percentage of left-handers who do not clap with their left hand on top?
h. Create a histogram showing the distribution of students’ heights and use a QQ plot to compare it to a standard normal sample (these two plots should appear side by side in the same figure). Are students’ heights close to or far from being normally distributed?
i. Create a box plot showing pulse rates (y-axis) for students with different exercise frequencies (x-axis). What do you observe?
j. Consider the following variables: span of writing hand, span of non-writing hand, pulse rate, height, and age. Are there any pairs of variables in this set that exhibit a linear dependence relation?
Problem 2: Convergence speed of fixed-point iterations [15 points]
Part a) Implementation
Create a function with header:
fixed.point(g, x0, TOL, Nmax)
The function takes as inputs:
• a scalar function g
• an initial guess x0
• a tolerance parameter TOL (default value $10^{-3}$)
• a maximum number of iterations Nmax (default value 100)
The function should implement the fixed-point iteration $x_n = g(x_{n-1})$ and stop as soon as either $|g(x_n) - x_n| < \mathrm{TOL}$ or $n \geq \mathrm{Nmax}$. The function should return:
• a vector, called iter, containing the sequence of approximations $(x_0, \ldots, x_n)$ generated by the method.
Part b) Testing
We want to apply the function fixed.point() to approximate the number $p = 7^{1/5}$. We consider the following functions:
$$g_1(x) = \frac{6x + 7/x^4}{7}, \qquad g_2(x) = x - \frac{x^5 - 7}{5x^4}, \qquad g_3(x) = \left(\frac{7}{x}\right)^{1/4}.$$
i. Using a suitable visualization strategy, show that each of the functions $g_1, g_2, g_3$ has a unique fixed point on the interval $[1, 2]$.
ii. Apply the function fixed.point() to $g_1$, $g_2$ and $g_3$ with x0 = 1, TOL = $10^{-10}$ and Nmax = 100. Print the number of iterations employed in each case and the absolute error associated with the last computed approximation.
iii. Show the convergence plots associated with the test in part ii (i.e., plot the absolute error as a function of the iteration number), using a log scale for the y-axis. If you can, show the three convergence plots in the same plot region (if you are not able to do so, show three plots side by side using the same y range for a clear comparison). What do you conclude about the speed of convergence of the proposed methods?
Problem 3: Sorting vectors with pivoting [15 points]
Part a) Implementation
We want to implement an algorithm for sorting the elements of a vector based on a pivoting strategy. The corresponding function should have the following header:
midtermsort(x)
Assume the input to be a vector x of length n. The output of midtermsort(x) should be a vector containing the entries of x sorted in increasing order. The algorithm should do the following:
1. if n ≤ 1, return x; otherwise:
2. pick a random index pivot, with 1 ≤ pivot ≤ n;
3. define two vectors:
• a vector y, containing all entries of x strictly smaller than x[pivot];
• a vector z, containing all entries of x strictly larger than x[pivot];
4. create a vector x.pivot containing the entries of x used in neither y nor z;
5. sort the vectors y and z by recursively applying the function midtermsort() to them. Call the corresponding sorted vectors y.sorted and z.sorted, respectively;
6. return the vector built by concatenating y.sorted, x.pivot, and z.sorted (in this order).
Part b) Testing
Note: If you were not able to implement midtermsort(), perform the following tests using the built-in function sort() instead.
i. Apply midtermsort() to a random normal vector of dimension 20. Produce a scatter plot of the entries of the sorted vector. Compare the result with the output produced by the built-in function sort() (if you implemented midtermsort()).
ii. For $n = 10^k$ with k = 2, 2.25, 2.5, 2.75, 3, ..., 4, measure the user time employed to sort a random normal vector of dimension n using midtermsort(). Plot the corresponding computing times as a function of the vector length n, using a log scale for both the x- and y-axes.
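The fixed-point iteration specified in Problem 2, Part a) could be sketched in R as follows. This is a minimal sketch under one stated assumption, not an official solution: the stopping test $|g(x_n) - x_n| < \mathrm{TOL}$ is checked before every step, including $n = 0$, so x0 alone is returned if it already passes. The usage lines define the three functions of Part b) as g1, g2, g3 and print the iteration count and the final absolute error for each.

```r
# Minimal sketch of fixed.point() for Problem 2, Part a).
# Assumption: the test |g(x_n) - x_n| < TOL is checked before each step,
# including n = 0, so x0 itself may be accepted immediately.
fixed.point <- function(g, x0, TOL = 1e-3, Nmax = 100) {
  iter <- x0   # iter[n + 1] holds x_n
  x <- x0
  gx <- g(x)   # g(x_n), shared by the stopping test and the update
  n <- 0
  while (abs(gx - x) >= TOL && n < Nmax) {
    x <- gx              # x_n = g(x_{n-1})
    n <- n + 1
    iter <- c(iter, x)   # append the new approximation
    gx <- g(x)           # evaluate g at the new approximation
  }
  iter
}

# Usage sketch with the functions of Part b), targeting p = 7^(1/5)
p  <- 7^(1/5)
g1 <- function(x) (6 * x + 7 / x^4) / 7
g2 <- function(x) x - (x^5 - 7) / (5 * x^4)
g3 <- function(x) (7 / x)^(1/4)

for (g in list(g1, g2, g3)) {
  iter <- fixed.point(g, x0 = 1, TOL = 1e-10, Nmax = 100)
  cat("iterations:", length(iter) - 1,
      " absolute error:", abs(tail(iter, 1) - p), "\n")
}
```

Because iter keeps the whole sequence, the absolute errors needed for the convergence plots are simply abs(iter - p).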
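As a short algebraic complement to the functions of Problem 2, Part b), one can check directly that each $g_i$ is constructed so that $p = 7^{1/5}$ is a fixed point (assuming $x \neq 0$, and $x > 0$ for $g_3$):

$$
\begin{aligned}
g_1(x) = x \;&\Longleftrightarrow\; 6x + \frac{7}{x^4} = 7x \;\Longleftrightarrow\; x^5 = 7,\\
g_2(x) = x \;&\Longleftrightarrow\; \frac{x^5 - 7}{5x^4} = 0 \;\Longleftrightarrow\; x^5 = 7,\\
g_3(x) = x \;&\Longleftrightarrow\; \frac{7}{x} = x^4 \;\Longleftrightarrow\; x^5 = 7.
\end{aligned}
$$

Since $x^5 = 7$ has the single real solution $p = 7^{1/5} \approx 1.4758$, this fixed point lies in $[1, 2]$.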
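The six steps of midtermsort() in Problem 3, Part a) translate almost line by line into R; a minimal sketch is given below. It assumes x is a numeric vector without missing values, because comparisons with NA would propagate NA into y, z and x.pivot. The seed in the usage lines is an arbitrary, hypothetical choice made only for reproducibility.

```r
# Minimal sketch of midtermsort() following steps 1-6 of Problem 3, Part a).
# Assumption: x is numeric and contains no NA values.
midtermsort <- function(x) {
  n <- length(x)
  if (n <= 1) return(x)             # step 1: nothing to sort
  pivot <- sample.int(n, 1)         # step 2: random index in 1, ..., n
  p <- x[pivot]
  y <- x[x < p]                     # step 3: entries strictly smaller
  z <- x[x > p]                     #         entries strictly larger
  x.pivot <- x[x == p]              # step 4: entries in neither y nor z
  y.sorted <- midtermsort(y)        # step 5: recursive calls
  z.sorted <- midtermsort(z)
  c(y.sorted, x.pivot, z.sorted)    # step 6: concatenate in this order
}

# Usage sketch for Part b) i: compare with the built-in sort()
set.seed(280)                       # hypothetical seed, for reproducibility
v <- rnorm(20)
v.sorted <- midtermsort(v)
identical(v.sorted, sort(v))        # expected: TRUE
plot(v.sorted, pch = 19, col = "steelblue",
     main = "Entries of a sorted random normal vector (n = 20)",
     xlab = "Index", ylab = "Value")
```

Collecting all entries equal to the pivot in x.pivot (step 4) means repeated values never enter a recursive call, so the recursion always works on strictly shorter vectors.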