York University
MATH 4939 – Midterm
Instructor: Georges Monette
February 15, 2019 – 10:30 am to 11:20 am (50 minutes)
WARNING
DO NOT OPEN THIS BOOKLET UNTIL YOU ARE INSTRUCTED TO DO SO
Student number: ___________________________
Family name: (in BLOCK letters)___________________________
Given name: (in BLOCK letters)___________________________
Signature: ___________________________
Information:
Be sure to read questions closely. Some may ask for multiple pieces of information. Make sure to respond completely. If you need more space to answer, write “OVER” and continue the answer on the back of the page.
The marks for each question are shown at the end of the question. The sum of the marks is 80.
Aids allowed: Non-programmable calculator, ruler, pencils, pens, erasers.

MATH 4939 Midterm February 15, 2019
1. Describe how variable selection strategies should be affected by the purpose of an analysis and the way variables were obtained. (10 points)
2. A study of arrests for possession of marijuana in Toronto in the early 2000s recorded data for 5,226 arrests by Toronto police over a period of approximately 2 years. For each arrest we consider the variables: colour of the person arrested (Black or White), sex (Male or Female), employed (Yes or No) and ‘released’ (Yes or No) according to whether the person arrested was released directly on the spot by the police or whether they were taken to jail before being released on bail.
The following is some output from a logistic regression of ‘released’ on the other variables:

A colleague of yours notes that none of the coefficients are significant and concludes that there is no evidence that ‘colour,’ in particular, is related to the probability of release. What would you say to your colleague?
(10 points)
3. (continued from the previous question) Consider the following output:
Describe unambiguously the null and alternative hypotheses tested in the third line of the anova table and in the sixth line of the anova table. What is the ‘real-world’ interpretation of these tests? (10 points)
4. Consider the general linear model with the usual notation. Let $\eta_1 = L_1\beta$ and $\eta_2 = L_2\beta$. Suppose that within both $L_1$ and $L_2$ the rows are linearly independent and that the rows of $L_1$ can be expressed as linear combinations of the rows of $L_2$. Show that the Wald tests for $\eta_1 = 0$ and for $\eta_2 = 0$ come to identical conclusions. (10 points)
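One possible outline for the argument in question 4, given only as a sketch. It rests on an added assumption that the question does not state explicitly: the two matrices have the same number of rows, so that $L_1 = A L_2$ for a square matrix $A$, which must then be invertible because the rows of $L_1$ are linearly independent. The symbols $\hat\beta$ (the usual estimator), $\hat V$ (its estimated variance matrix, assumed positive definite), and $W_1$, $W_2$ (the two Wald statistics) are notation introduced here for illustration.

```latex
\begin{aligned}
W_1 &= (L_1\hat\beta)^\top \left(L_1 \hat V L_1^\top\right)^{-1} (L_1\hat\beta) \\
    &= \hat\beta^\top L_2^\top A^\top \left(A L_2 \hat V L_2^\top A^\top\right)^{-1} A L_2 \hat\beta \\
    &= \hat\beta^\top L_2^\top A^\top (A^\top)^{-1} \left(L_2 \hat V L_2^\top\right)^{-1} A^{-1} A L_2 \hat\beta \\
    &= (L_2\hat\beta)^\top \left(L_2 \hat V L_2^\top\right)^{-1} (L_2\hat\beta) = W_2 .
\end{aligned}
```

Under that assumption the two statistics are equal and share the same numerator degrees of freedom (the common number of rows), so the two tests give the same p-value.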
5. Write a generic function tran and a set of methods so that tran(x, a, b) replaces every instance of the value a in x with b. e.g. tran(c(1,2,2), 2, 3) should return the vector 1, 3, 3. The function should work with numeric, character and factor objects and should return an object of the same type. (10 points)
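A minimal sketch of one way to approach question 5 with S3 dispatch. Several choices here are assumptions of the sketch rather than requirements stated in the question: a single default method covers both numeric and character vectors, `which()` is used so that missing values are left alone, and the factor method works by renaming the level rather than by converting to character.

```r
# Generic: dispatches on the class of x
tran <- function(x, a, b, ...) UseMethod("tran")

# Default method: handles numeric and character vectors.
# which() drops NA comparisons, so missing values are left untouched.
tran.default <- function(x, a, b, ...) {
  x[which(x == a)] <- b
  x
}

# Factor method: rename the level a to b so the result is still a factor.
# If b is already a level, assigning duplicated levels merges the two.
tran.factor <- function(x, a, b, ...) {
  lev <- levels(x)
  lev[lev == a] <- b
  levels(x) <- lev
  x
}

tran(c(1, 2, 2), 2, 3)                  # 1 3 3
tran(c("a", "b", "a"), "a", "z")        # "z" "b" "z"
tran(factor(c("a", "b", "a")), "a", "c") # factor with values c b c
```

Note that the default method does not guard against type coercion: replacing a number with a character string, for instance, would turn a numeric vector into a character vector.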
6. Consider a vector of strings containing names of people. Each string contains one name, which can be in various formats: ‘Mary Ellen Brown’ (i.e. first name, followed by middle name if any, and then by last name), ‘Brown, Mary Ellen’ (last name, followed by first and middle names), ‘Paul Smith’ (if there is no middle name) or ‘Paul, Smith’. Write a function in R that takes two arguments: a vector of such strings and a single character string. The function counts how often the second argument occurs as a last name in the vector that is the first argument. (10 points)
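A minimal sketch of one approach to question 6. The function name `count_last_name` and its argument names are hypothetical. The rule it applies is an assumption drawn from the formats listed in the question: if a string contains a comma, the text before the first comma is taken as the last name; otherwise the final whitespace-separated word is.

```r
# Hypothetical helper: count how often `last` occurs as a last name in `names`
count_last_name <- function(names, last) {
  names <- trimws(names)
  has_comma <- grepl(",", names, fixed = TRUE)
  last_names <- ifelse(
    has_comma,
    trimws(sub(",.*$", "", names)),  # "Last, First Middle": keep text before the comma
    sub("^.*\\s+", "", names)        # "First Middle Last": keep the final word
  )
  sum(last_names == last)
}

x <- c("Mary Ellen Brown", "Brown, Mary Ellen", "Paul Smith", "Paul, Smith")
count_last_name(x, "Brown")  # 2
count_last_name(x, "Smith")  # 1
```

Under the comma rule, ‘Paul, Smith’ is counted with last name ‘Paul’, which is why ‘Smith’ is found only once in this example. Matching is exact and case-sensitive.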
7. Consider the following function in R:
    f <- function(x = {y <- 5; 2}, y = 10) {
      x + y
    }
State what each of the following expressions will return and explain why:
a) f()
b) f(20)
c) f(-10,2)
d) f(y = 20)
e) f(x = 21)
(10 points)
8. The following is some output from a linear regression of life expectancy in a number of countries regressed on HE (health expenditures per capita per year in US dollars), smoke (cigarettes per capita per year), and hiv and special, two indicator variables that identify anomalous countries.
a) Construct a hypothesis matrix to estimate the predicted difference in life expectancy associated with an increase of 1,000 cigarettes per capita per year for a country with a level of health expenditures equal to 2,000 and cigarette consumption equal to 1,000.
b) Construct a hypothesis matrix to estimate the predicted difference in life expectancy associated with an increase of 1,000 dollars in health expenditures per capita per year for a country with a level of health expenditures equal to 2,000 and cigarette consumption equal to 1,000.
(10 points)
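For the function f in question 7, the behaviour can be checked directly in an R session. The sketch below relies on R's lazy evaluation: default arguments are evaluated inside the function's own environment, and only when first used. Because `x + y` evaluates `x` first, a missing `x` triggers the default expression, which assigns 5 to `y` inside the function (overwriting whatever `y` was going to be) and then yields 2.

```r
f <- function(x = {y <- 5; 2}, y = 10) {
  x + y
}

f()         # 7:  default x runs, sets y to 5 and returns 2
f(20)       # 30: x is supplied, so y keeps its default of 10
f(-10, 2)   # -8: both supplied, the default expression never runs
f(y = 20)   # 7:  default x runs and overwrites the supplied y with 5
f(x = 21)   # 31: x is supplied, so y keeps its default of 10
```

The result for `f(y = 20)` is the one most likely to surprise: the supplied value 20 is never used, because evaluating the default for `x` replaces `y` before `y` is read.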