US1994-2003CDCNCHS: U.S. birth data for the years 1994 to 2003; data set of dimension 3653×5.
US2000-2014SSA: U.S. birth data for the years 2000 to 2014; data set of dimension 5480×5.
Questions to be Addressed
Is there any relationship between variables?
number of births vs day of the week
number of births vs date of the month
number of births vs month
Are there any seasonal/holiday trends to birthdates?
Do fall months have more babies than other months?
Does medical intervention influence birth counts on particular dates?
Are there fewer babies born on Friday the 13th?
Are there more babies delivered just before Christmas?
This result would confirm or dispel the hypothesis that obstetricians perform early C-sections so they can be home for the holidays.
Are there any other underlying temporal trends?
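Most of the questions above reduce to comparing average daily birth counts across calendar groupings. The following is a minimal sketch of that idea, written in Python purely for illustration (the proposal names only ggplot as tooling). It assumes each record is a (date, births) pair; that layout, the helper name `mean_births_by`, and the synthetic counts are all hypothetical and are not taken from the data sets.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

FRIDAY = 4  # date.weekday(): Monday = 0, ..., Sunday = 6


def mean_births_by(records, key):
    """Average the daily birth counts within each group defined by key(day)."""
    groups = defaultdict(list)
    for day, births in records:
        groups[key(day)].append(births)
    return {group: mean(counts) for group, counts in sorted(groups.items())}


# Synthetic placeholder data (NOT real birth counts):
# 100 births on weekdays, 80 on weekends, for every day of the year 2000.
records = []
for offset in range(366):
    day = date(2000, 1, 1) + timedelta(days=offset)
    records.append((day, 100 if day.weekday() < 5 else 80))

by_weekday = mean_births_by(records, lambda d: d.isoweekday())  # 1 = Monday
by_day_of_month = mean_births_by(records, lambda d: d.day)
by_month = mean_births_by(records, lambda d: d.month)

# Compare Friday the 13th only against other Fridays, so that the ordinary
# weekday effect does not mask or mimic a superstition effect.
fridays = [(d, b) for d, b in records if d.weekday() == FRIDAY]
friday_13th_vs_other = mean_births_by(fridays, lambda d: d.day == 13)
```

Restricting the Friday the 13th comparison to Fridays is a design choice of this sketch: comparing against all days would mix in the ordinary weekday pattern.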
Methodologies
We will use linear regression models together with outlier and influential-point analysis to examine the data sets. Fitting the regression models after outliers have been identified and excluded gives a best-fit baseline against which the observed birth counts can be compared. Graphical analysis will be used to identify significant points, check homoscedasticity, and decide whether variable transformations are needed. Seasonality and superstition may also contribute to spikes or dips in the birth rate. To examine these, we plan to use ggplot graphs to display general trends within subgroups, e.g. grouping birthdates by season or month. Testing variable interactions, specifically between day of the week and day of the month, will also be useful in determining whether Friday the 13th birth rates differ significantly.
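As a rough illustration of the regression and outlier steps described above, the sketch below fits an ordinary least-squares model with weekday dummy variables plus a single Friday-the-13th term, then flags days whose residual is large relative to the residual standard deviation. Several things here are assumptions rather than the planned analysis: Python stands in for whatever tool the project uses, the single indicator is a simplification of a full day-of-week by day-of-month interaction, the three-sigma cutoff is a crude rule rather than a leverage-adjusted influence measure, and the data are synthetic.

```python
from datetime import date, timedelta
from math import sqrt

FRIDAY = 4  # date.weekday(): Monday = 0, ..., Sunday = 6
NAMES = ["intercept", "tue", "wed", "thu", "fri", "sat", "sun", "friday_13th"]


def design_row(day):
    """Intercept, six weekday dummies (Monday is baseline), Friday-13th term."""
    dummies = [1.0 if day.weekday() == w else 0.0 for w in range(1, 7)]
    interaction = 1.0 if day.weekday() == FRIDAY and day.day == 13 else 0.0
    return [1.0] + dummies + [interaction]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def fake_births(day):
    """Synthetic placeholder counts (NOT real data) with one planted outlier."""
    if day == date(2000, 12, 25):
        return 40.0  # planted holiday dip
    if day.weekday() == FRIDAY and day.day == 13:
        return 90.0  # planted Friday-the-13th effect
    return 80.0 if day.weekday() >= 5 else 100.0


days = [date(2000, 1, 1) + timedelta(days=i) for i in range(366)]
rows = [design_row(d) for d in days]
y = [fake_births(d) for d in days]

beta = fit_ols(rows, y)
coef = dict(zip(NAMES, beta))

# Residuals and a crude three-sigma flag for candidate outliers.
fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
residuals = [obs - fit for obs, fit in zip(y, fitted)]
sigma = sqrt(sum(e * e for e in residuals) / (len(y) - len(beta)))
outliers = [d for d, e in zip(days, residuals) if abs(e) > 3 * sigma]
```

In this setup `coef["friday_13th"]` estimates how Friday the 13th differs from other Fridays, and `outliers` lists dates that would merit a closer look before refitting. A date that has its own indicator and occurs only once is fitted exactly, so its residual carries no information; with real multi-year data there would be several Friday the 13ths to estimate from.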