MATH 208 Midterm Exam October 20th, 2020
Question 1 [50 points]
(a) [15 pts] Consider code chunk Q1 and give the resulting output from the following R commands (or the
resulting error if one is produced):
(i) Toy_Story[[3]][[2]]
(ii) Toy_Story[[2]][3]
(iii) class(Toy_Story[[1]])
(b) [15 pts] Consider code chunk Q1 and answer the following multiple choice questions (please list all that apply
for each question):
(i) Which of the following commands return the result “Frozen 2”?
A. `Fun_Movies$FR$Movie[2]`
B. `Fun_Movies[["FR"]][1,1]`
C. `Fun_Movies[[c(2,1)]]`
D. `Fun_Movies[2][[1]]$Movie[2]`
(ii) The class of the object returned by Fun_Movies[[3]][[1]] is a
A. atomic character vector
(iii) The class of the object returned by Fun_Movies[[3]][1] is a
A. atomic character vector
(c) [20 pts] Consider code chunk Q1 and answer the following questions:
(i) [10 pts] Write a line of code that will create and assign the tibble IJ_only used to create the plot
in Figure 1.
(ii) [5 pts] Write a line of code using Fun_Movies and %>% that will produce Table 1 below.
(iii) [5 pts] Which of the following commands yields Table 2 below? Please list all that apply.
A. `Fun_Movies$TS`
B. `Fun_Movies$TS %>% arrange(Year_of_Release)`
C. `Fun_Movies$TS %>% arrange(desc(Movie))`
Table 1
# A tibble: 3 x 2
  name                     value
1 Number_of_films              2
2 Total_World_Wide_Gross    2730
3 Average_World_Wide_Gross  1365
Table 2
# A tibble: 4 x 3
  Movie       Year_of_Release World_Wide_Gross
1 Toy Story              1995              363
2 Toy Story 2            2000              487
3 Toy Story 3            2010             1066
4 Toy Story 4            2019             1073
Question 2 [50 pts]
(a) [20 marks] Which of the following plots could properly be used to assess the association between one qualitative
characteristic (qual) and one quantitative characteristic (quant)? (Assume that you cannot transform the
values.) Please list all that apply.
A. Scatterplot via ggplot(the_data,aes(x=qual,y=quant)) + geom_point()
B. Comparing barplots via ggplot(the_data,aes(x=quant)) + geom_bar() + facet_wrap(~qual)
C. Mosaic plot via ggplot(the_data) + geom_mosaic(x=aes(product(quant,qual),fill=qual))
D. Comparing histograms via ggplot(the_data,aes(x=quant)) + geom_hist() + facet_wrap(~qual)
E. 2D density plot via ggplot(the_data, aes(x=qual, y=quant)) + geom_density_2d()
F. Treemap via ggplot(the_data) + geom_treemap(aes(fill=qual,area=quant))
G. Boxplots via ggplot(the_data,aes(x=qual,y=quant)) + geom_boxplot()
Answer parts (b) and (c) below based on a dataset on used Ford car prices in 2009 collected by undergraduate
students at Macalester College. The data include each car's year, mileage, price, color, location, model, and
age. Code chunk Q2 gives a description of the data.
(b) [15 marks] Figure 2 shows two different panels examining the relationship between mileage and price.
(i) [5 marks] The plot in panel (b) was generated with the same code used to generate the plot in panel
(a), except that one additional aesthetic was added, i.e. there is one piece of code of the form + ??????.
What was used in the place of ?????? in the code?
(ii) [5 marks] Based on the plot in panel (a), describe the nature of the association between price and
mileage. Be sure to indicate whether you believe the two variables are correlated or not. Explain your
answer in 3 sentences or fewer.
(iii) [5 marks] Based on the plot in panel (b), would you say that the association you described in part (b)
for mileage and price is similar across locations or different? Explain your answer in 3 sentences or fewer.
(c) [15 marks] Figure 3 shows two different panels comparing locations by the distribution of car models.
(i) [5 marks] What other plot would give a similar (but not the same) representation of the data to the
plot in panel (d)? Give the name of the plot (not the R function).
(ii) [5 marks] Which panel do you feel best allows you to compare the distribution of Models across
Location? Explain your answer in 3 sentences or fewer. Hint: Note the code to generate the two panels
are quite similar, except in one respect.
(iii) [5 marks] Based on these plots, do you feel that there is visual evidence that the relative proportions of
Models vary across Locations? Explain your answer in 2 sentences or fewer.
R plots and output
Code chunk Q1
library(tidyverse)  # assumed to be loaded: provides tibble(), ggplot() and %>%

Toy_Story <- list(
  Movie=c("Toy Story","Toy Story 2","Toy Story 3","Toy Story 4"),
  Year_of_Release=c(1995,2000,2010,2019),
  World_Wide_Gross=c(363,487,1066,1073)
)

Fun_Movies <- list(
  TS = tibble(Movie=c("Toy Story","Toy Story 4","Toy Story 3","Toy Story 2"),
              Year_of_Release=c(1995,2019,2010,2000),
              World_Wide_Gross=c(363,1073,1066,487)),
  FR = tibble(Movie=c("Frozen","Frozen 2"),
              Year_of_Release=c(2013,2019),
              World_Wide_Gross=c(1280,1450)),
  IJ = tibble(Movie=c("Raiders of the Lost Ark", "Kingdom of the Crystal Skull",
                      "Last Crusade","Temple of Doom"),
              Year_of_Release = c(1981,2008,1989,1984),
              World_Wide_Gross=c(390,790,474,333))
)

ggplot(IJ_only,aes(x=Year_of_Release,y=World_Wide_Gross)) +
  geom_line() + geom_point(aes(col=Movie),size=4)
[Plot: x-axis Year_of_Release; legend entries include Temple of Doom, Last Crusade, Kingdom of the Crystal Skull]
Figure 1: Indiana Jones Franchise
Code chunk Q2
glimpse(Used_Fords)
Columns: 7
$ Year
$ Mileage
$ Price
$ Color
$ Location
$ Model
$ Age
Used_Fords %>% count(Location)
  Location         n
1 Cambridge      141
2 Dallas         129
3 Fresno          20
4 Philadelphia   137
5 Phoenix         65
6 St Paul        107
[Plot: panel (a) shows all cars; panel (b) shows one panel per location (Cambridge, Dallas, Fresno, Philadelphia, Phoenix, St Paul); axis ticks run from 0 to 150000]
Figure 2: 2009 Used Ford Prices by Mileage and Location
[Plot: axis ticks run from 0.00 to 1.00]
Figure 3: 2009 Used Ford Models by Location