---
title: "Student Performance Analysis"
date: "October 16, 2018"
output: html_document
---
```{r setup, include=TRUE}
knitr::opts_chunk$set(echo = FALSE)
library(ggplot2)
library(randomForest)
library(car)
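# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which randomForest needs for a classification response
standard <- read.csv("Student_data_standard.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
colnames(standard)[1] <- "Year"
advanced <- read.csv("Student_data_advanced.csv", header = TRUE, sep = ",", stringsAsFactors = TRUE)
colnames(advanced)[1] <- "Year"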
```
## Distribution of Grades by Year
The charts below show the distribution of grades by year, which helps us understand the trend in performance within the university. The number of pass grades does not appear to change significantly, although a small decrease was recorded in 2015. Fails increased from 2012 to 2015 and declined slightly after 2015.
Distinction grades increased throughout the period 2012 to 2017. High distinctions increased from 2012 to 2015 and declined in the years after 2015. The year 2015 recorded higher grades than the other years.
```{r}
d<-data.frame(table(standard$Year,standard$Unit.of.Study.Grade))
colnames(d)<-c("year","Unit.of.Study.Grade","Freq")
ggplot()+geom_col(data=d,aes(x=year,y=Freq,fill=Unit.of.Study.Grade),position = "dodge")
ggplot(data=d,aes(x=year,y=Freq,group=Unit.of.Study.Grade))+
geom_line(aes(color = Unit.of.Study.Grade))+
geom_point(aes(color = Unit.of.Study.Grade))
```
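The frequency table behind both charts can be sketched on made-up data. The `toy` data frame and its values are invented for illustration; only the column names `Year`, `Unit.of.Study.Grade` and `Count` are taken from the code in this report. Note that `table()` counts rows. If each row of the real data is an aggregated record with a `Count` column (an assumption about the file, not something confirmed here), `xtabs()` would sum the counts instead.

```r
# Hypothetical toy data; the real data come from Student_data_standard.csv.
toy <- data.frame(
  Year = c(2012, 2012, 2013, 2013),
  Unit.of.Study.Grade = c("Pass", "Fail", "Pass", "Fail"),
  Count = c(30, 5, 28, 7)
)

# table() counts how many ROWS fall in each year/grade cell.
rows <- as.data.frame(
  table(Year = toy$Year, Unit.of.Study.Grade = toy$Unit.of.Study.Grade),
  responseName = "Freq"
)

# xtabs() with a left-hand side sums Count within each year/grade cell.
weighted <- as.data.frame(
  xtabs(Count ~ Year + Unit.of.Study.Grade, data = toy),
  responseName = "Freq"
)

print(rows)
print(weighted)
```

Either long-format data frame can be passed to `ggplot()` in the same way as `d` above.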
## Classification
A classification model, specifically a random forest, was fitted to predict student grades from the following predictors: year of study, whether the student is domestic or international, gender, mode of study, age of the student, unit of study and unit of study level. The model is as follows.
```{r}
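# Split the data into training (about 70%) and validation (about 30%) sets
set.seed(100)
# Drop the first column (the one renamed to Year in the setup chunk)
dat <- advanced[, -1]
ind <- sample(2, nrow(dat), replace = TRUE, prob = c(0.7, 0.3))
training <- dat[ind == 1, ]
testing <- dat[ind == 2, ]
# Random forest classifier with 10 trees; importance = TRUE stores variable importance
classi <- randomForest(Unit.of.Study.Grade ~ ., data = training, importance = TRUE, ntree = 10)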
classi
```
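Because `sample(2, n, prob = ...)` assigns each row independently, the split above is only approximately 70/30. If an exact split is wanted, one alternative (not what the chunk above does) is to sample row indices without replacement. A minimal sketch on a hypothetical data frame `toy`:

```r
set.seed(100)

# Hypothetical data standing in for the student records.
toy <- data.frame(id = 1:200, x = rnorm(200))

# Draw exactly 70% of the row indices without replacement.
n_train <- floor(0.7 * nrow(toy))
train_idx <- sample(seq_len(nrow(toy)), size = n_train)

toy_train <- toy[train_idx, ]
toy_test <- toy[-train_idx, ]

# Sizes of the two sets.
c(train = nrow(toy_train), test = nrow(toy_test))
```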
## Prediction
The prediction accuracy of the model is as follows.
```{r}
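# Predict on the training set
predictionTrain <- predict(classi, training, type = "class")
# Confusion matrix for the training set (rows: predicted, columns: actual)
table(predictionTrain, training$Unit.of.Study.Grade)
# Predict on the validation set
predictionTest <- predict(classi, testing, type = "class")
# Accuracy on the validation set: proportion of correct predictions
mean(predictionTest == testing$Unit.of.Study.Grade)
# Confusion matrix for the validation set (rows: predicted, columns: actual)
table(predictionTest, testing$Unit.of.Study.Grade)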
```
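The accuracy and the confusion matrix are two views of the same comparison: accuracy is the share of counts on the diagonal of the matrix. The sketch below shows this on invented label vectors (`actual` and `predicted` are hypothetical, not model output) and also derives per-grade recall from the same table.

```r
# Hypothetical true and predicted grades, sharing one set of levels.
grade_levels <- c("Distinction", "Fail", "Pass")
actual <- factor(c("Pass", "Pass", "Fail", "Pass", "Fail", "Distinction"),
                 levels = grade_levels)
predicted <- factor(c("Pass", "Fail", "Fail", "Pass", "Pass", "Distinction"),
                    levels = grade_levels)

# Confusion matrix: rows are predictions, columns are actual grades.
cm <- table(predicted = predicted, actual = actual)

# Overall accuracy: correct predictions (diagonal) over all predictions.
accuracy <- sum(diag(cm)) / sum(cm)

# Recall per grade: correct predictions over the actual count of that grade.
recall <- diag(cm) / colSums(cm)

print(cm)
print(accuracy)
print(recall)
```

Per-grade recall is useful when some grades are rare, because overall accuracy can stay high while a rare grade is almost never predicted correctly.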
## Checking for important variables
The importance of each predictor in the random forest model is shown below.
```{r}
importance(classi)
varImpPlot(classi)
```
## Linear model
An ANOVA was used to check whether the number of grades with the preceding attributes (Count) differs across the variables Gender, Mode, Unit.of.Study.Level and Domestic.Intl. The fitted model is shown in the next section.
Before fitting the model, Levene's test was used to check the assumption of equal group variances. The results are as follows:
Levene's test is significant (p < 0.05), indicating that the group variances are not constant, so the ANOVA may not be reliable.
```{r}
leveneTest(Count~Gender*Mode*Unit.of.Study.Level*Domestic.Intl,data = standard)
```
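To make the test less of a black box: with its default `center = median`, `leveneTest()` amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that idea in base R on simulated data; the `toy` data frame, its group labels and the standard deviations are invented for illustration.

```r
set.seed(1)

# Hypothetical data: group C has a much larger spread than A and B.
toy <- data.frame(
  group = factor(rep(c("A", "B", "C"), each = 30)),
  y = c(rnorm(30, sd = 1), rnorm(30, sd = 1), rnorm(30, sd = 5))
)

# Absolute deviation of each observation from its own group median.
abs_dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# One-way ANOVA on the deviations: a small p-value means the spreads differ.
levene_fit <- anova(lm(abs_dev ~ toy$group))
levene_p <- levene_fit[["Pr(>F)"]][1]

print(levene_fit)
```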
## The fitted ANOVA model
The fitted ANOVA model had significant p-values for all terms (p < 0.05), which means that Gender, Mode, Unit.of.Study.Level and Domestic.Intl each have a significant effect on the number of grades with the preceding attributes.
```{r}
fit <- aov(Count ~ Gender + Mode + Unit.of.Study.Level + Domestic.Intl, data = standard)
summary(fit)
```
## More assumption checks
The ANOVA assumptions were checked further. The histogram of the residuals is positively skewed and the Q-Q plot is not linear, so the normality assumption is not met. The residual plot shows that the variance is not constant as the fitted values increase, so homoscedasticity is not met either. Because these violations might affect the accuracy of the ANOVA results, a Kruskal-Wallis test was used to confirm them.
```{r}
plot(fit)
hist(residuals(fit),col="cyan")
```
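The visual checks above can be backed by numbers. The sketch below, on simulated right-skewed data (the `toy` data frame and the model `fit_toy` are hypothetical), computes the sample skewness of the residuals and runs a Shapiro-Wilk test. Note that `shapiro.test()` only accepts between 3 and 5000 observations, so on a larger data set it would have to be applied to a random subsample of the residuals.

```r
set.seed(2)

# Hypothetical right-skewed response in two groups.
toy <- data.frame(
  g = factor(rep(c("A", "B"), each = 100)),
  y = rexp(200, rate = 1)
)

fit_toy <- aov(y ~ g, data = toy)
res <- residuals(fit_toy)

# Sample skewness: third central moment over the 1.5 power of the variance.
# Values well above 0 indicate a long right tail.
centred <- res - mean(res)
skewness <- mean(centred^3) / mean(centred^2)^1.5

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(res)

print(skewness)
print(sw)
```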
## Kruskal-Wallis test
All variables except gender were significant (p < 0.05), which indicates that the number of grades with the preceding attributes depends on those variables.
```{r}
kruskal.test(Count~ Gender,data=standard)
kruskal.test(Count~ Unit.of.Study.Level,data=standard)
kruskal.test(Count~ Domestic.Intl,data=standard)
kruskal.test(Count~ Mode,data=standard)
```
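The four separate calls can also be written as a loop that collects the p-values in one named vector, which makes them easier to compare side by side. This is a sketch on simulated data: the `toy` data frame and the level names `Internal`, `External`, `F` and `M` are invented, and only two of the four predictors are included to keep it short.

```r
set.seed(3)

# Hypothetical data: Count differs strongly by Mode but not by Gender.
toy <- data.frame(
  Count = c(rpois(60, 5), rpois(60, 15)),
  Mode = factor(rep(c("Internal", "External"), each = 60)),
  Gender = factor(rep(c("F", "M"), times = 60))
)

predictors <- c("Mode", "Gender")

# Run one Kruskal-Wallis test per predictor and keep only the p-value.
p_values <- sapply(predictors, function(v) {
  f <- reformulate(v, response = "Count")  # builds e.g. Count ~ Mode
  kruskal.test(f, data = toy)$p.value
})

print(p_values)
```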