---
title: "STA 108 Project"
output: html_document
---
```{r, echo=FALSE, warning=FALSE, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
```
```{r, echo=FALSE, warning=FALSE, include=FALSE}
# AER provides the STAR data set; tidyverse provides dplyr and ggplot2
pacman::p_load(AER, tidyverse)
```
```{r, echo=FALSE, warning=FALSE, include=FALSE}
data("STAR")
```
Lei Han (Q1, Q2, Q9)

Carmen Li (Q4, Q5, Q6, introduction)

Yihui Yang (Q3, Q7, Q8)
# Introduction
In this project, we investigate whether first-grade math scores are related to class type, which also determines class size (small, regular, or regular with an aide). If such a relationship exists, we want to know whether each class type has a statistically significant effect on math scores, whether the effects differ substantially between class types, and how strong the relationship is. We fit a linear regression model to provide evidence on these questions.
```{r, echo=FALSE, warning=FALSE}
# Number of students by gender within each class type
STAR %>%
  count(stark, gender, name = "Count") %>%
  ggplot(aes(x = gender, y = Count, fill = gender)) +
  geom_col() +
  geom_text(aes(label = Count), vjust = 0) +
  facet_wrap(~stark) +
  labs(x = "Gender", y = "Number of students",
       title = "Number of students by gender in each class type")
```
```{r, echo=FALSE, warning=FALSE}
# Number of students by ethnicity within each class type
STAR %>%
  count(stark, ethnicity, name = "Count") %>%
  ggplot(aes(x = ethnicity, y = Count, fill = ethnicity)) +
  geom_col() +
  geom_text(aes(label = Count), vjust = 0) +
  facet_wrap(~stark) +
  labs(x = "Ethnicity", y = "Number of students",
       title = "Number of students by ethnicity in each class type")
```
```{r, echo=FALSE, warning=FALSE}
# Number of students by birth quarter within each class type
STAR %>%
  count(stark, birth, name = "Count") %>%
  ggplot(aes(x = as.factor(birth), y = Count, fill = as.factor(birth))) +
  geom_col() +
  geom_text(aes(label = Count), vjust = 0, size = 2.5) +
  facet_wrap(~stark) +
  labs(x = "Birth Quarter", y = "Number of students", fill = "Birth Quarter",
       title = "Number of students by birth quarter in each class type")
```
```{r, echo=FALSE, warning=FALSE}
# Drop observations with a missing first-grade math score
Math1 <- STAR %>% filter(!is.na(math1))
# Count the students with no recorded class type (NA in stark)
sum(is.na(Math1$stark))
# Indicator variables for class type; the !is.na() guard codes NA as 0
# stark.small = 1 if the child attended a small class
Math1$stark.small <- as.integer(!is.na(Math1$stark) & Math1$stark == "small")
# stark.regular = 1 if the child attended a regular class
Math1$stark.regular <- as.integer(!is.na(Math1$stark) & Math1$stark == "regular")
# stark.regularaide = 1 if the child attended a regular-with-aide class
Math1$stark.regularaide <- as.integer(!is.na(Math1$stark) & Math1$stark == "regular+aide")
# stark.small = stark.regular = stark.regularaide = 0 means no STAR class was attended
```
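The indicator coding above has to turn a missing class type into 0 rather than NA, otherwise those students would be dropped from the regression. The sketch below illustrates the difference on a made-up factor; `class_type` and the helper `make_indicator` are hypothetical names used only for this illustration and are not part of the analysis.

```r
# Toy class-type factor with one missing value (illustrative data only)
class_type <- factor(c("small", "regular", NA, "regular+aide", "small"))

# Hypothetical helper: 1 if x equals `level`, 0 otherwise, with NA coded as 0
make_indicator <- function(x, level) {
  as.integer(!is.na(x) & x == level)
}

naive <- as.integer(class_type == "small")   # the missing value stays NA
safe  <- make_indicator(class_type, "small") # the missing value becomes 0

print(naive)
print(safe)
```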
# Method
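$$y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\epsilon_i$$

- $y_i$ is the first-grade math score of student $i$
- $x_{i1}$ is 1 if student $i$ was assigned to a small class and 0 otherwise
- $x_{i2}$ is 1 if student $i$ was assigned to a regular class and 0 otherwise
- $x_{i3}$ is 1 if student $i$ was assigned to a regular class with an aide and 0 otherwise
- $\beta_0$ is the mean math score of students for whom all three indicators are zero, that is, students who attended no STAR class
- $\beta_1$ is the difference in mean math score between small-class students and students who attended no STAR class
- $\beta_2$ is the difference in mean math score between regular-class students and students who attended no STAR class
- $\beta_3$ is the difference in mean math score between regular-with-aide students and students who attended no STAR class
- $\epsilon_i$ is the random error, the deviation of the observed score $y_i$ from the mean given by the model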
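
Because every predictor in the model is a 0/1 indicator, the least-squares intercept equals the mean of the baseline group and each slope equals a difference between two group means. The following minimal sketch checks this on simulated scores; the data, the group labels, and the names `toy`, `x1`, `x2`, `x3` are assumptions made for illustration and are unrelated to the STAR results.

```r
set.seed(1)
# Simulated scores for four groups (illustrative values, not the STAR data)
group <- rep(c("none", "small", "regular", "aide"), times = c(40, 30, 35, 25))
score <- rnorm(length(group), mean = 520, sd = 40) + 15 * (group == "small")

toy <- data.frame(
  score,
  x1 = as.integer(group == "small"),
  x2 = as.integer(group == "regular"),
  x3 = as.integer(group == "aide")
)

fit <- lm(score ~ x1 + x2 + x3, data = toy)
group_means <- tapply(toy$score, group, mean)

# Intercept minus baseline mean, and slope minus difference in means:
# both gaps should be zero up to floating-point error
intercept_gap <- unname(coef(fit)["(Intercept)"] - group_means["none"])
slope_gap <- unname(coef(fit)["x1"] - (group_means["small"] - group_means["none"]))
print(c(intercept_gap, slope_gap))
```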
# Result
**Fitted model**

$$y_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\beta_3x_{i3}+\epsilon_i$$

**Summary statistics of the fit:**

| Coefficient | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| Intercept | 522.4761 | 0.9129 | 572.357 | <2e-16 |
| small | 18.7197 | 1.4673 | 12.758 | <2e-16 |
| regular | 9.2512 | 1.4302 | 6.469 | 1.06e-10 |
| regular_with_aide | 8.7684 | 1.4143 | 6.200 | 5.99e-10 |

| Statistic | Value |
|---|---|
| Residual standard error | 42.58 |
| Residual degrees of freedom | 6596 |
| Multiple R-squared | 0.0245 |
| F-statistic | 55.22 |
| p-value of the F-test | <2.2e-16 |

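
The F-statistic in the table is tied to the multiple R-squared through $F = \frac{R^2/(p-1)}{(1-R^2)/(n-p)}$, where $p-1 = 3$ is the number of class-type indicators and $n-p = 6596$ is the residual degrees of freedom. The sketch below recomputes it from the rounded table values, so it should agree with the reported 55.22 only up to rounding; the variable names are illustrative.

```r
r_squared   <- 0.0245  # multiple R-squared from the table (rounded)
df_model    <- 3       # number of class-type indicators
df_residual <- 6596    # residual degrees of freedom from the table

# Overall F-statistic recovered from R-squared
f_stat <- (r_squared / df_model) / ((1 - r_squared) / df_residual)
print(round(f_stat, 2))

# Upper-tail p-value of the overall F-test
p_value <- pf(f_stat, df_model, df_residual, lower.tail = FALSE)
print(p_value)
```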
From the table above:

- $\hat\beta_0$ = 522.4761 estimates $\beta_0$, the mean math score of students for whom all class-type indicators are zero (no STAR class attended).
- $\hat\beta_1$ = 18.7197 estimates $\beta_1$, the difference in mean math score between small-class students and students who attended no STAR class.
- $\hat\beta_2$ = 9.2512 estimates $\beta_2$, the difference in mean math score between regular-class students and students who attended no STAR class.

For joint 95% confidence intervals for the class-type coefficients $\beta_1$ and $\beta_2$, we use the Bonferroni formula with $\alpha = 0.05$, which gives each of the two intervals an individual confidence level of $1-\alpha/2 = 97.5\%$:

$$C.I.=\left[\hat\beta_i-t_{1-\alpha/4,\,n-p}\,\widehat{SE}(\hat\beta_i),\ \hat\beta_i+t_{1-\alpha/4,\,n-p}\,\widehat{SE}(\hat\beta_i)\right],\quad i=1,2$$

Here $n-p = 6596$ is the residual degrees of freedom.

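
As a worked example of the formula, the interval for the small-class coefficient can be computed by hand from the estimate and standard error in the summary table. This is a sketch using the rounded table values, so it should match `confint(model, level = 0.975)` only up to rounding; the variable names are illustrative.

```r
alpha    <- 0.05
n_int    <- 2        # two intervals: beta_1 and beta_2
df_resid <- 6596     # residual degrees of freedom from the summary table

estimate  <- 18.7197 # estimate of the small-class coefficient
std_error <- 1.4673  # its standard error

# Bonferroni critical value: t quantile at 1 - alpha / (2 * n_int) = 1 - alpha / 4
t_crit <- qt(1 - alpha / (2 * n_int), df_resid)

ci <- estimate + c(-1, 1) * t_crit * std_error
print(ci)
```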
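```{r, echo=FALSE, warning=FALSE, include=FALSE}
model <- lm(math1 ~ stark.small + stark.regular + stark.regularaide, data = Math1)
# Individual level 0.975 for each of the two intervals (t quantile at 1 - 0.05/4)
confint(model, level = 0.975)[2, ]  # stark.small
confint(model, level = 0.975)[3, ]  # stark.regular
```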
Confidence interval for $\beta_1$: (15.43, 22.01)

Confidence interval for $\beta_2$: (6.04, 12.46), computed as $9.2512 \pm t_{0.9875,\,6596} \times 1.4302$ with $t_{0.9875,\,6596} \approx 2.2419$

With a family confidence level of 95%, $\beta_1$, the difference in mean math score between small-class students and students who attended no STAR class, lies between 15.43 and 22.01, and $\beta_2$, the difference in mean math score between regular-class students and students who attended no STAR class, lies between 6.04 and 12.46. Neither interval contains zero, so both class types are associated with significantly higher math scores.

```{r, echo=FALSE, warning=FALSE}
# Standard lm diagnostic plots for the fitted model
model <- lm(math1 ~ stark.small + stark.regular + stark.regularaide, data = Math1)
plot(model)
```
The fitted model is

$$\widehat{\text{math1}} = 522.4761 + 18.7197\,\text{stark.small} + 9.2512\,\text{stark.regular} + 8.7684\,\text{stark.regularaide}$$

- A child who attended no STAR class has an average score of 522.48 points.
- A child who attended a small class scores on average 18.72 points higher than a child who attended no STAR class.
- A child who attended a regular class scores on average 9.25 points higher than a child who attended no STAR class.
- A child who attended a regular-with-aide class scores on average 8.77 points higher than a child who attended no STAR class.

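
The fitted equation also gives the estimated mean score of each group directly: the intercept plus that group's coefficient. The sketch below does this arithmetic with the reported estimates; the vector names are illustrative, and the comparison between class types at the end uses the point estimates only, without a significance test.

```r
# Reported coefficient estimates from the fitted model
coefs <- c(intercept = 522.4761, small = 18.7197,
           regular = 9.2512, regular_aide = 8.7684)

# Estimated mean for each group: intercept plus that group's coefficient
group_means <- c(
  no_star_class = coefs[["intercept"]],
  small         = coefs[["intercept"]] + coefs[["small"]],
  regular       = coefs[["intercept"]] + coefs[["regular"]],
  regular_aide  = coefs[["intercept"]] + coefs[["regular_aide"]]
)
print(round(group_means, 2))

# Estimated gap between small classes and the other two class types
small_gap <- coefs[["small"]] - coefs[c("regular", "regular_aide")]
print(small_gap)
```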
# Conclusion
# Appendix

```{r, eval=FALSE, echo=TRUE}
library(AER)
library(tidyverse)
data("STAR")

# Number of students by gender, ethnicity, and birth quarter within each class type
STAR %>%
  count(stark, gender, name = "Count") %>%
  ggplot(aes(x = gender, y = Count, fill = gender)) +
  geom_col() + geom_text(aes(label = Count), vjust = 0) + facet_wrap(~stark) +
  labs(x = "Gender", y = "Number of students",
       title = "Number of students by gender in each class type")
STAR %>%
  count(stark, ethnicity, name = "Count") %>%
  ggplot(aes(x = ethnicity, y = Count, fill = ethnicity)) +
  geom_col() + geom_text(aes(label = Count), vjust = 0) + facet_wrap(~stark) +
  labs(x = "Ethnicity", y = "Number of students",
       title = "Number of students by ethnicity in each class type")
STAR %>%
  count(stark, birth, name = "Count") %>%
  ggplot(aes(x = as.factor(birth), y = Count, fill = as.factor(birth))) +
  geom_col() + geom_text(aes(label = Count), vjust = 0, size = 2.5) + facet_wrap(~stark) +
  labs(x = "Birth Quarter", y = "Number of students", fill = "Birth Quarter",
       title = "Number of students by birth quarter in each class type")

# Keep students with a first-grade math score and code class-type indicators (NA coded as 0)
Math1 <- STAR %>% filter(!is.na(math1))
sum(is.na(Math1$stark))
Math1$stark.small <- as.integer(!is.na(Math1$stark) & Math1$stark == "small")
Math1$stark.regular <- as.integer(!is.na(Math1$stark) & Math1$stark == "regular")
Math1$stark.regularaide <- as.integer(!is.na(Math1$stark) & Math1$stark == "regular+aide")

# Fit the model, compute the confidence intervals, and draw the diagnostic plots
model <- lm(math1 ~ stark.small + stark.regular + stark.regularaide, data = Math1)
confint(model, level = 0.975)[2, ]
confint(model, level = 0.975)[3, ]
plot(model)
```