Midterm Project
Math 2820L
Feb 27, 2019
library(dplyr)
library(ggplot2)
library(fivethirtyeight)
Problems must be done independently. No help from other people is allowed.
Questions:
I. Probability (10pts)
Choose one probability problem and solve it using the Monte Carlo method.
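To illustrate the method only, here is a minimal sketch on a hypothetical example problem that is not part of this assignment: estimating the probability that two fair dice sum to 7 (exact value \(1/6\)). The seed, the number of simulations, and the variable names `n_sims`, `hits`, and `estimate` are assumptions chosen for the illustration.

```r
# Hypothetical example problem: P(sum of two fair dice equals 7).
# The exact answer is 1/6, which lets us sanity-check the estimate.
set.seed(1)        # arbitrary seed, used only for reproducibility
n_sims <- 10000    # number of simulated experiments (assumed)

# Each replicate rolls two dice and records whether the sum is 7.
hits <- replicate(n_sims, sum(sample(1:6, size = 2, replace = TRUE)) == 7)

# The Monte Carlo estimate is the proportion of successful replicates.
estimate <- mean(hits)
estimate
```

The same pattern (simulate one experiment, repeat it many times, take the proportion of successes) carries over to whichever problem you choose.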
II. Third smallest of seven dice (12pts)
1. (2 pts) Write a function that rolls 7 dice and returns the third smallest value among them.

2. (5 pts) Use Monte Carlo simulation to estimate the probability that the third smallest number is equal to \(1\), and denote it by \(P_1\). Use set.seed(100) and generate 10,000 cases.

3. (3 pts) Repeat step (2) for \(k = 2, \dots, 6\), i.e., calculate \(P_2, P_3, P_4, P_5, P_6\). Use data.frame() to create a data frame named PDF that has two variables: \(face = \{1,2,3,4,5,6\}\) and \(P = \{P_1, P_2, P_3, P_4, P_5, P_6\}\). Use geom_line() to make a line chart of the \(PDF\) function.

4. (2 pts) Make a line chart of the \(CDF\) function.
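For reference, the standard relation between the two functions for a discrete variable is sketched below. The symbols \(X\) (the third smallest value) and \(F\) (the CDF) are notation introduced here, not taken from the assignment.

\[
F(k) = P(X \le k) = \sum_{j=1}^{k} P_j, \qquad k = 1, \dots, 6,
\]

so \(F(1) = P_1\) and \(F(6) = P_1 + \dots + P_6 = 1\). In R, running totals of this kind can be computed with the base function cumsum().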

III. Births in US from 2000 to 2014 (18pts)
Visualizations alone can tell a story. We'll be using the US_births_2000_2014 data within the fivethirtyeight package.
# Load the data
data(US_births_2000_2014)
1. (2 pts) Familiarize yourself with the data structure. How many cases are in the data? What’s one observation/case?

2. (9 pts) Focus on 2014:

a. (1 pt) Use filter() to create a new data set births_2014 that only contains the 2014 data.

b. (2 pts) Construct a univariate visualization of the variability in births from day to day in 2014.

c. (1 pt) The time of year might explain some of this variability. Construct & interpret a plot that illustrates the relationship between births (y-axis) and date in 2014.

d. (2 pts) One goofy thing that stands out is the 2-3 distinct groups of points. Add a layer to this plot that explains the distinction between these groups.

e. (1 pt) There are some exceptions to the rule in part d, i.e., some cases that should belong to group 1 but behave like the cases in group 2. Explain why these cases are exceptions: what makes them special cases?

f. (2 pts) Summarize your investigation in 1-2 sentences. Be sure to comment on both the 2-3 distinct groups of points and the trends across the months.

3. (3 pts) Look at all years
The US_births_1994_2003 data set contains similar data from the previous decade. Combine US_births_1994_2003 and US_births_2000_2014 into one data table using full_join():
all_years <- full_join(US_births_1994_2003, US_births_2000_2014)
## Joining, by = c("year", "month", "date_of_month", "date", "day_of_week", "births")
a. (1 pt) Construct a visualization of birth trends across all years.

b. (2 pts) Summarize your investigation in 1-2 sentences. Be sure to comment on both the common seasonal trends within years and the trends across the years.

4. (4 pts) Friday the 13th
Some people are superstitious about Friday the 13th.

a. (1 pt) In one code chunk, create a new data set friday_only that only contains births that occur on Fridays, and create a new variable within this data set, fri_13, that indicates whether the case falls on the 13th of the month.

b. (2 pts) Using the friday_only data, construct and interpret a visualization that illustrates the distribution of births among Fridays that fall on and off the 13th.

c. (1 pt) Do you see any evidence of superstition?

Due: 3pm on March 20. Please submit both the .Rmd and .pdf files. After you knit the file into .html, use the print-to-PDF option to generate the .pdf file.