MATH 208 Final Exam December 8th-11th, 2020
Question 1 [50 points]
The data for this question comes from the STAR dataset from the AER library. Below is a summary and five sample
rows of a modified version of that dataset containing information from a study examining the effect of reducing class
size on student performance in primary school.
str(STAR_data)
'data.frame': 3114 obs. of 6 variables:
$ student_ID: int 1 2 3 4 5 6 7 8 9 10 ...
$ stark : Factor w/ 3 levels "regular","small",..: 2 2 1 2 1 1 2 2 1 3 ...
$ star1 : Factor w/ 3 levels "regular","small",..: 2 2 1 2 1 1 2 2 1 3 ...
$ readk : int 447 450 448 447 431 451 478 455 430 437 ...
$ read1 : int 507 579 651 533 558 548 514 530 490 503 ...
$ read2 : int 568 588 614 608 608 596 569 608 622 552 ...
STAR_data %>% slice(sample(1:n(), 5))
student_ID stark star1 readk read1 read2
1 1127 regular regular+aide 455 571 669
2 1556 regular+aide regular 456 483 560
3 856 regular regular+aide 450 512 571
4 611 regular regular 416 553 618
5 2296 regular+aide regular+aide 451 629 643
Besides the Student ID, we will focus on four other measures from the data: stark and star1, which indicate the
type of class in kindergarten and grade 1, respectively (“regular”, “small”, or “regular+aide”); and readk, read1,
and read2 which are reading scores from kindergarten, grade 1 and grade 2 respectively.
(a) [5 pts] Write a line of code that will generate the following tibble (or data.frame) with the total number of
students who were in each type of class in kindergarten:
# A tibble: 3 x 2
# Groups: stark [3]
stark n
1 regular 1067
2 small 987
3 regular+aide 1060
(b) [5 pts] Write a line of code that will generate the following tibble (or data.frame) with the total number of
students who were in each combination of type of class in kindergarten and grade 1, as below:
count_table
# A tibble: 9 x 3
# Groups: stark, star1 [9]
stark star1 n
1 regular regular 518
2 regular small 85
3 regular regular+aide 464
4 small regular 29
5 small small 924
6 small regular+aide 34
7 regular+aide regular 491
8 regular+aide small 85
9 regular+aide regular+aide 484
(c) [5 pts] Assume the tibble from part (b) is called count_table as above. Now write a line of code that
produces a tibble which gives, for each class type in kindergarten, the proportion of students in each class type
in grade 1:
Here is some code which creates an object STAR_what.
STAR_what <- STAR_data %>%
  pivot_longer(cols = readk:read2, names_to = "Test", values_to = "Score") %>%
  select(-student_ID)
(d) [5 pts] What class of object is STAR_what?
In class we used xtabs to create contingency tables of counts of combinations of qualitative variables, as in this
example:
STAR_who_denom <- xtabs(~star1+Test+stark,data=STAR_what)
STAR_who_denom
, , stark = regular
star1 read1 read2 readk
regular 518 518 518
small 85 85 85
regular+aide 464 464 464
, , stark = small
star1 read1 read2 readk
regular 29 29 29
small 924 924 924
regular+aide 34 34 34
, , stark = regular+aide
star1 read1 read2 readk
regular 491 491 491
small 85 85 85
regular+aide 484 484 484
(e) [5 pts] What will the code STAR_who_num[1,3,2] return as output?
xtabs can also be used to sum up values of another variable for different combinations of star1, Test and stark
by putting the variable name in front of the ~. For example, we can find the total of all scores by using
STAR_who_num <- xtabs(Score~star1+Test+stark,data=STAR_what)
STAR_who_num
, , stark = regular
star1 read1 read2 readk
regular 273728 306238 228798
small 45797 50785 37660
regular+aide 249580 276710 205622
, , stark = small
star1 read1 read2 readk
regular 15396 17009 12617
small 500773 552478 413608
regular+aide 18338 20488 14927
, , stark = regular+aide
star1 read1 read2 readk
regular 261220 290488 218272
small 44596 49270 37070
regular+aide 258514 286343 212980
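The two forms of xtabs described above can be compared on a small made-up data frame. This is only a minimal sketch: the data frame toy and its columns group, test and score are hypothetical and are not part of the STAR data.

```r
# Hypothetical toy data (NOT the STAR data), used only to contrast the two forms
toy <- data.frame(
  group = factor(c("a", "a", "b", "b", "b")),
  test  = factor(c("x", "y", "x", "x", "y")),
  score = c(10, 20, 30, 40, 50)
)

# Nothing on the left of ~ : each cell holds the number of rows
toy_counts <- xtabs(~ group + test, data = toy)

# A variable on the left of ~ : each cell holds the sum of that variable
toy_sums <- xtabs(score ~ group + test, data = toy)

print(toy_counts)
print(toy_sums)
```

In this toy example the cell for group "b" and test "x" holds 2 in toy_counts (two rows) and 70 in toy_sums (30 + 40).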
(f) [5 pts] Using STAR_who_num and STAR_who_denom, write a single line of code that assigns the average score
for each star1 by Test by stark combination to an object called STAR_avg as seen below:
, , stark = regular
star1 read1 read2 readk
regular 528.4324 591.1931 441.6950
small 538.7882 597.4706 443.0588
regular+aide 537.8879 596.3578 443.1509
, , stark = small
star1 read1 read2 readk
regular 530.8966 586.5172 435.0690
small 541.9621 597.9199 447.6277
regular+aide 539.3529 602.5882 439.0294
, , stark = regular+aide
star1 read1 read2 readk
regular 532.0163 591.6253 444.5458
small 524.6588 579.6471 436.1176
regular+aide 534.1198 591.6178 440.0413
(g) [10 pts] Write a line of code that creates an array that contains the difference between the average read2 and
readk scores for each stark by star1 combination using STAR_avg above.
star1 regular small regular+aide
regular 149.4981 151.4483 147.0794
small 154.4118 150.2922 143.5294
regular+aide 153.2069 163.5588 151.5764
(h) [10 pts] Write code (possibly multiple lines) using the original STAR_what to produce a tibble containing the
same rows and columns as the object in part (g).
# A tibble: 3 x 4
# Groups: star1 [3]
star1 regular small `regular+aide`
1 regular 149. 151. 147.
2 small 154. 150. 144.
3 regular+aide 153. 164. 152.
END OF QUESTION 1
Question 2 [50 points]
We will re-use the same data that was used in Question 1. The description is repeated below for your convenience.
The data for this question comes from the STAR dataset from the AER library. Below is a summary and five sample
rows of a modified version of that dataset containing information from a study examining the effect of reducing class
size on student performance in primary school.
str(STAR_data)
'data.frame': 3114 obs. of 6 variables:
$ student_ID: int 1 2 3 4 5 6 7 8 9 10 ...
$ stark : Factor w/ 3 levels "regular","small",..: 2 2 1 2 1 1 2 2 1 3 ...
$ star1 : Factor w/ 3 levels "regular","small",..: 2 2 1 2 1 1 2 2 1 3 ...
$ readk : int 447 450 448 447 431 451 478 455 430 437 ...
$ read1 : int 507 579 651 533 558 548 514 530 490 503 ...
$ read2 : int 568 588 614 608 608 596 569 608 622 552 ...
STAR_data %>% slice(sample(1:n(), 5))
student_ID stark star1 readk read1 read2
1 2159 regular regular 465 564 622
2 2171 regular regular+aide 410 494 586
3 187 regular regular+aide 436 521 566
4 1320 small small 443 558 659
5 1946 regular+aide regular 545 519 584
Besides the Student ID, we will focus on four other measures from the data: stark and star1, which indicate the
type of class in kindergarten and grade 1, respectively (“regular”, “small”, or “regular+aide”); and readk, read1,
and read2 which are reading scores from kindergarten, grade 1 and grade 2 respectively.
(a) [6 pts] Below are partially obscured code and two plots of the values of class types for kindergarten and grade 1.
p1<-ggplot(STAR_data,aes(x=star1,fill=stark)) + geom_YYYYYYY() +
scale_fill_viridis_d() + ggtitle("Plot 1") + theme_bw()
p2<-ggplot(STAR_data) + geom_XXXXXXX(aes(x=product(stark,star1),fill=stark))+
scale_fill_viridis_d() + ggtitle("Plot 2")+ theme_bw()
grid.arrange(grobs=list(p1,p2),nrow=2,ncol=1)
[Figure: Plot 1 and Plot 2. Axis and legend labels: regular, small, regular+aide.]
Identify these two plots by name:
Plot 1 Plot 2
(b) [8 pts] Using these plots, describe the association between stark and star1. In particular, what
does knowing the grade 1 class type tell us about the possible kindergarten class type for the students
in this sample?
(c) [6 pts] Although these plots look similar, they are in fact different. There are two important differences in how
these plots were constructed, one of which is more obvious than the other. Explain what those two differences are.
(d) [6 pts] Write a line of code to create new factor variables in STAR_data for stark and star1 named stark_mod
and star1_mod which combine the “regular” and “regular+aide” levels into a single level “not small”.
Below is a figure along with the code (partially obscured) which generated it.
[Figure: Plot e. Axis and legend labels: not small, small.]
ggplot(STAR_data,aes(x=_______,fill=________,y=read2)) +
geom_______() + ggtitle("Plot e") + theme_bw()
(e) [4 pts] What are the missing geometry and aesthetics that generated the figure on the previous page (that is,
what are the words that are missing in the code above for Plot e)?
(f) [5 pts] Based on these plots, do you think there is evidence of an association between the modified type of
class variables and the grade 1 reading test score? Explain your answer in 3 sentences or fewer.
Below is a plot of the reading test scores for kindergarten and grade 1 for the STAR_data by levels of the modified
kindergarten class type.
[Figure: Panel g1 and Panel g2. Axis ticks run from 350 to 600.]
(g) [4 pts] Identify the two kinds of plots in Panel g1 and g2 by name (note that there are two of the same kind
of plot in each panel)
• Panel g1:
• Panel g2:
(h) [6 pts] From Panels g1 and g2, would you conclude that there is an association between readk and read1 in
either group? Does the association between the two reading test scores seem to vary by levels of the modified
kindergarten class type variable? Explain your answers in 4 sentences or fewer.
(i) [5 pts] Which of the following plots could also be used to assess the association between reading scores in
kindergarten and grade 1 (assuming that neither variable is transformed)? Circle all that apply.
A. Line chart B. 2-d density plot C. Treemap D. 2-d histogram
END OF QUESTION
Question 3 [50 points]
The goal of this task is to write functions to identify certain repeated patterns of characters in long character vectors,
a basic form of a more complicated task that is often used in gene sequencing.
For every part of this question, you will assume that the user gives you a vector where each element of the vector
contains a single lower-case letter. For example, the user may specify:
c("b", "c", "b", "d", "c", "a", "b", "b", "d", "c")
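A vector in this format can also be produced from a single string. The following is a minimal sketch using base R's strsplit; the variable names input_string and letters_vec are illustrative assumptions and are not part of the exam.

```r
# Illustrative only: build a vector of single characters from one string
input_string <- "bcbdcabbdc"

# strsplit() returns a list with one element per input string;
# [[1]] extracts the character vector for our single string
letters_vec <- strsplit(input_string, split = "")[[1]]

print(letters_vec)
```

The resulting letters_vec has one lower-case letter per element, matching the example vector above.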
(a) [15 pts] Write a function below using a for loop (and possibly other control statements) which takes a
character vector as an argument and returns the length of the longest sequence of consecutive repeated “b”
values in an arbitrary vector. For the example vector above, the length of the longest sequence of repeated
“b” values is 2. It does not matter if the longest sequence length occurs multiple times; you only need to
report it once.
(b) [15 pts] Now assume that if the user inputs a vector that includes a certain stopping character, then you
should immediately stop analyzing the sequence and return a value of NA. If the input vector does not include
the stopping character, then it proceeds as in part (a) to return the length of the longest sequence of repeated
letter “b” values. For example, if the stopping character is “a”, then in the example above, your function
should return NA. But if the stopping character is “f”, then in the example above your function should return 2 as before.
Modify your function from part (a) to complete this task. Your function should take two arguments: the input
character vector and a stopping character whose default value is “f”.
(c) [10 pts] Now assume that you want to write code to create a data frame or tibble that contains the longest run
in the vector for each letter of the alphabet, except for the single special stopping character specified by the
user. If a non-stopping letter does NOT appear in the vector, it should not appear in the table. In other
words, if the stopping character is “f”, then applying your code to the example vector above would return:
# A tibble: 4 x 2
letter longest
But if the stopping character is “a”, then your function should return NA for all letters, i.e.
# A tibble: 4 x 2
letter longest
Write code below that uses your function from part (b) to produce the desired result. You do not need to write a
separate function for this part, but you can if you think it is helpful.
(d) [10 pts] Finally, use your code from part (c) to obtain a list with 26 elements, where each element is
the tibble from part (c) for one of the 26 possible stopping characters. You do not need to write a
separate function for this part, but you can if you think it is helpful.
END OF QUESTION
Question 4 [30 points]
In this question, you will write code to simulate a board game based on the fable, “The Tortoise and the Hare”.
The idea of the game is as follows:
(a) There are 100 spaces on the board and each piece must travel in order through the board.
(b) Both characters start on space 0.
(c) The Hare always gets to move first. The Hare randomly moves forward 5 spaces (when running) or moves
forward 0 spaces (when sleeping), with equal probability.
(d) Then the Tortoise moves forward either 2 spaces or 4 spaces, with equal probability.
(e) The game ends when one of the characters reaches a total of 100 spaces or greater.
[10 points] Write a function below, one_turn, which simulates a single turn in the game, i.e. steps (c) and (d)
above. The function should take two arguments, the current space of the Hare and the current space of the Tortoise.
The function should return the updated space of the Hare and the updated space of the Tortoise after one turn.
Hint: You can use the sample function in R to choose the number of spaces each player moves forward.
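As a reminder of how sample behaves, here is a minimal sketch that draws one value from two equally likely options. The values 10 and 20 and the seed are arbitrary illustrations and are not taken from the game rules.

```r
# Arbitrary seed, used only so the draw can be reproduced
set.seed(1)

# Draw a single value from two options; with no prob argument,
# each element of the vector is equally likely
step <- sample(c(10, 20), size = 1)

print(step)
```

Note that when the first argument is a single number n of at least 1, sample(n, ...) draws from 1:n rather than returning n itself, so passing an explicit vector of options avoids surprises.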
[20 points] Write a new function which uses your one_turn function from part (a) to simulate one entire game, from steps (a)
to (e) above. Your function should take in one argument: a random seed so that you can replicate the results of the
game. Your function should return a list containing two elements: the name of the winner of the game (i.e. “Hare”
or “Tortoise”) and a tibble containing the history of all spaces travelled by both players.