---
title: "Modelling"
author: "Jack Bill"
date: "October 5, 2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

In this R project, we want to know whether Projected Households has a significant relationship with the other variables. To find out, we fit three models: a decision tree, a logistic regression and a linear regression. Before fitting the models, we started by examining the correlations between the variables.
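
The report plots a correlation matrix; a formal test for a single pair of variables could look like the sketch below. It uses the built-in `mtcars` data as a stand-in (an assumption, since `newjc.csv` is not available here), so the columns `mpg` and `wt` are placeholders for a pair such as `ph` and `r_bc`.

```r
# Stand-in data: mtcars ships with R and is NOT the newjc.csv data.
# Pearson correlation test between two numeric columns.
ct <- cor.test(mtcars$mpg, mtcars$wt, method = "pearson")

r <- unname(ct$estimate)  # sample correlation coefficient
p <- ct$p.value           # p-value for H0: true correlation is 0

cat("r =", round(r, 3), " p-value =", signif(p, 3), "\n")
```

A small p-value here only says the linear association is unlikely to be zero; it does not say which variable drives the other.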

Note: we shortened the variable names so that the graphics have enough room for the labels. The new names are:

- `ph`: Projected Households
- `php`: Projected_Households_prev
- `psg`: Percentage of Shopper Group
- `psgp`: Percentage of Shopper Group Prev
- `bspt`: Basket Spend per Trip
- `bsptp`: Basket Spend per Trip Prev
- `ps`: Projected Sales
- `psp`: Projected Sales Previous
- `bupt`: Basket Units per Trip
- `buptp`: Basket Units per Trip Prev
- `spt`: Spend per Trip
- `sptp`: Spend_per_Trip_Prev
- `perc_hp`: Percent of Household Penetration
- `Perc_hpp`: Percent of Household Penetration Prev
- `b_r`: Buying Rate
- `b_rp`: Buying_Rate_Prev
- `pf`: Purchase Frequency
- `p_fp`: Purchase_Frequency_Prev
- `r_bc`: Raw_Buyer_Count
- `r_bcp`: Raw_Buyer_Count_Prev

```{r corr}
# jc <- read.csv(file.choose(), header = TRUE)  # interactive alternative
jc <- read.csv("newjc.csv", header = TRUE)

# Correlation matrix of the 20 variables (columns 1 to 20).
# The original indexed rows (jc[1:20, ]) rather than columns.
cr <- cor(jc[, 1:20])

library(ggcorrplot)
ggcorrplot(cr, hc.order = TRUE, lab = TRUE)
```

The correlation plot above shows a strong relationship between Projected Households (ph) and some of the other variables, particularly Raw Buyer Count (r_bc) and Percentage of Shopper Group (psg).

We then fitted the decision tree model.

```{r dtree}
library(caret)

# Drop rows with missing values before training
jc <- na.omit(jc)

# Tree for ph on all other variables, tuned by caret
dtree <- train(ph ~ ., data = jc, method = "rpart")

suppressMessages(library(rattle))
fancyRpartPlot(dtree$finalModel)
```

The results of the decision tree above show that Percentage of Shopper Group (psg) and Raw Buyer Count (r_bc) have a significant relationship with Projected Households. From the results obtained above, Percentage of Shopper Group (psg) accounted for 98% and Raw Buyer Count (r_bc) for 90% of the variation in Projected Households.

We then fitted the logistic regression model. To do so, we split Projected Households into two categories at its mean: "Low" for values below the mean and "High" for values greater than or equal to the mean. The new variable is called ph_cat (Projected Households Categorized). This was necessary because binary logistic regression requires a response with exactly two categories.

```{r}
# 248369 is the mean of ph used as the cut-off
jc$ph_cat[jc$ph < 248369] <- "Low"
jc$ph_cat[jc$ph >= 248369] <- "High"
jc$ph_cat <- factor(jc$ph_cat)

# Columns 2 to 21 leave out ph itself and keep ph_cat as the response
suppressWarnings(logis <- glm(ph_cat ~ ., data = jc[2:21], family = binomial))
summary(logis)
```

From the logistic regression results above, it appears that none of our independent variables is significant for the dependent variable, as all of them have p-values greater than 0.05. This indicates that the fitted logistic regression model does not perform well.

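
The `suppressWarnings()` call above hides whatever `glm()` reported while fitting. One possible explanation for uniformly large p-values, which the results shown here do not confirm, is complete separation: when a predictor almost determines the dichotomised outcome, the fit diverges and the standard errors become very large. The sketch below uses simulated data (the names `toy`, `x`, `y` and `y_cat` are hypothetical) to show how the warnings could be recorded instead of discarded.

```r
set.seed(1)

# Simulated stand-in data: y is an exact linear function of x,
# so splitting y at its mean is perfectly predicted by x.
toy <- data.frame(x = rnorm(100))
toy$y <- 250000 + 1000 * toy$x
cutoff <- mean(toy$y)  # computed from the data rather than hard-coded
toy$y_cat <- factor(ifelse(toy$y < cutoff, "Low", "High"),
                    levels = c("Low", "High"))

# Collect warning messages while still letting the fit finish
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y_cat ~ x, data = toy, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)                       # warnings raised by glm()
print(summary(fit)$coefficients)  # inspect estimates and standard errors
```

If the real fit raises similar warnings, the non-significant p-values say more about the fitting problem than about the predictors.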
Finally, we fitted the linear regression model.

```{r}
# Drop column 21 (ph_cat) so that only the original variables are used
lmodel <- lm(ph ~ ., data = jc[, -21])
summary(lmodel)
```

The output of the regression model shows that Raw Buyer Count (r_bc) appears to be the only significant predictor of our dependent variable, Projected Households (ph). This result is partially in line with the results obtained from the decision tree.

Conclusion

Based on the results obtained from the decision tree and the linear regression model, we can conclude that Percentage of Shopper Group (psg) and Raw Buyer Count (r_bc) have a significant relationship with the dependent variable, Projected Households (ph).
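
Statements about which predictors are significant in a linear model can be checked directly from the coefficient table rather than read off the printed summary. The following is a minimal sketch on the built-in `mtcars` data (a stand-in, not the `newjc.csv` data); the object names `fit`, `coefs`, `alpha` and `sig` are illustrative.

```r
# Stand-in data: mtcars ships with R and is NOT the newjc.csv data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Coefficient table: estimate, std. error, t value, p-value
coefs <- coef(summary(fit))

alpha <- 0.05  # significance level used in the report
sig <- coefs[coefs[, "Pr(>|t|)"] < alpha, , drop = FALSE]
sig <- sig[rownames(sig) != "(Intercept)", , drop = FALSE]

print(sig)  # predictors with p-value below alpha
```

Applied to `lmodel`, the same filter would list the predictors the summary marks as significant at the 5% level.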