Part 2.1
(a)
set.seed(1122)
train <- read.csv("HW2/adult-train.csv")
index.train <- numeric(ncol(train))
for(i in 1:ncol(train)) {
  index.train[i] <- sum(train[,i] == "?")
}
which(index.train > 0)
## [1] 2 7 14
train <- train[-which(train[,2] == "?"),]
train <- train[-which(train[,7] == "?"),]
train <- train[-which(train[,14] == "?"),]
dim(train)
## [1] 30161 15
test <- read.csv("HW2/adult-test.csv")
index.test <- numeric(ncol(test))
for(i in 1:ncol(test)) {
  index.test[i] <- sum(test[,i] == "?")
}
which(index.test > 0)
## [1] 2 7 14
test <- test[-which(test[,2] == "?"),]
test <- test[-which(test[,7] == "?"),]
test <- test[-which(test[,14] == "?"),]
dim(test)
## [1] 15060 15
(b)
library(rpart)
library(rpart.plot)
model <- rpart(income~.,data=train)
rpart.plot(model, extra=104, fallen.leaves = T, type=4, main="Rpart on data (Full Tree)")
(i)
The top three important predictors in the model are relationship, education, and capital_gain.
(ii)
1. The first split is done on the predictor relationship.
2. The predicted class of the first node is “<=50K”.
3. At the first node, about 75% of the observations are in the “<=50K” class and about 25% are in the “>50K” class.
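A note on the cleaning step in (a): the pattern `x[-which(cond), ]` works there because every filtered column actually contains "?", but it silently drops all rows when nothing matches. The following is a minimal sketch on a hypothetical toy data frame (`toy` and its values are invented for illustration, not taken from the adult data) showing a logical-index alternative and the pitfall:

```r
# Toy data frame standing in for the adult data (hypothetical values).
toy <- data.frame(
  workclass  = c("Private", "?", "State-gov", "Private"),
  occupation = c("Sales", "?", "?", "Tech-support"),
  age        = c(39, 50, 38, 53),
  stringsAsFactors = FALSE
)

# Flag rows where any column equals "?", then keep the rest.
# Logical indexing stays correct even when no row matches.
has_missing <- rowSums(toy == "?") > 0
clean <- toy[!has_missing, ]

# Pitfall: with no matches, which() returns integer(0),
# and x[-integer(0), ] selects zero rows instead of all rows.
dropped_all <- clean[-which(clean$workclass == "?"), ]

nrow(clean)
nrow(dropped_all)
```

Under these assumptions, `clean` keeps the two rows without "?", while `dropped_all` ends up empty even though nothing needed to be removed.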

(c)
(i)
library(caret)
pred <- predict(model, test, type="class")
confusionMatrix(pred, test$income, positive = ">50K")
## Confusion Matrix and Statistics
##
##           Reference
## Prediction <=50K  >50K
##      <=50K 10772  1837
##      >50K    588  1863
##
##                Accuracy : 0.839
##                  95% CI : (0.833, 0.8448)
##     No Information Rate : 0.7543
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 0.5098
##  Mcnemar's Test P-Value : < 2.2e-16
##
##             Sensitivity : 0.5035
##             Specificity : 0.9482
##          Pos Pred Value : 0.7601
##          Neg Pred Value : 0.8543
##              Prevalence : 0.2457
##          Detection Rate : 0.1237
##    Detection Prevalence : 0.1627
##       Balanced Accuracy : 0.7259
##
##        'Positive' Class : >50K
##
The balanced accuracy of the model is (0.5035+0.9482)/2 = 0.726.
(ii)
The balanced error rate of the model is 1 - 0.726 = 0.274.
(iii)
The sensitivity is 0.504 and the specificity is 0.948.
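The figures in (i) to (iii) can be reproduced by hand from the four cells of the confusion matrix. A minimal sketch, assuming ">50K" is the positive class as in the `confusionMatrix` call (the variable names `tp`, `fn`, `tn`, `fp` are introduced here for illustration):

```r
# Cell counts copied from the confusion matrix in (c)(i).
tp <- 1863   # predicted >50K,  actual >50K
fn <- 1837   # predicted <=50K, actual >50K
tn <- 10772  # predicted <=50K, actual <=50K
fp <- 588    # predicted >50K,  actual <=50K

sensitivity <- tp / (tp + fn)          # true positive rate
specificity <- tn / (tn + fp)          # true negative rate
balanced_accuracy   <- (sensitivity + specificity) / 2
balanced_error_rate <- 1 - balanced_accuracy

round(c(sensitivity = sensitivity,
        specificity = specificity,
        balanced_accuracy = balanced_accuracy,
        balanced_error_rate = balanced_error_rate), 3)
```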
(iv)
The AUC of the ROC curve is 0.843. The ROC curve is plotted as follows:
# ROC curve
library(ROCR)
pred.rocr <- predict(model, newdata=test, type="prob")[,2]
f.pred <- prediction(pred.rocr, test$income )
f.perf <- performance(f.pred, "tpr", "fpr")
plot(f.perf, colorize=T, lwd=3)
abline(0,1)
auc <- performance(f.pred, measure = "auc")
cat(paste("The area under curve (AUC) for this model is ", round(auc@y.values[[1]], 3)))
## The area under curve (AUC) for this model is 0.843
(d)
printcp(model)
##
## Classification tree:
## rpart(formula = income ~ ., data = train)
##
## Variables actually used in tree construction:
## [1] capital_gain education relationship
##
## Root node error: 7508/30161 = 0.24893
##
## n= 30161
##
##         CP nsplit rel error  xerror      xstd
## 1 0.129995      0   1.00000 1.00000 0.0100018
## 2 0.064198      2   0.74001 0.74001 0.0089670
## 3 0.037294      3   0.67581 0.67581 0.0086527
## 4 0.010000      4   0.63852 0.63852 0.0084574
cpx=model$cptable[which.min(model$cptable[,"xerror"]), "CP"]
cpx
## [1] 0.01
The complexity table shows that CP = 0.01 gives the lowest cross-validated error (xerror = 0.63852). The tree therefore does not need pruning, because the default CP used when fitting the model is already 0.01.
(e)
(i)
table(train$income)
##
## <=50K  >50K
## 22653  7508
22653 are in the class “<=50K” and 7508 are in the class “>50K”.
(ii)
d <- sample(which(train$income == ">50K"),22653-7508,replace=TRUE)
newtrain <- rbind(train,train[d,])
table(newtrain$income)
##
## <=50K  >50K
## 22653 22653
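The oversampling step above can be written without hard-coding the class counts. The sketch below uses a hypothetical ten-row data frame (`toy`, its column `x`, and the seed are invented for illustration) and computes the deficit from `table()`:

```r
set.seed(1)  # arbitrary seed, used only to make this sketch reproducible

# Hypothetical imbalanced data: 7 majority rows, 3 minority rows.
toy <- data.frame(
  x = 1:10,
  income = c(rep("<=50K", 7), rep(">50K", 3)),
  stringsAsFactors = FALSE
)

counts  <- table(toy$income)
deficit <- counts[["<=50K"]] - counts[[">50K"]]  # rows needed to balance

# Sample minority row indices with replacement. sample.int() on the
# position avoids the sample(x) surprise when x has length one.
minority_rows <- which(toy$income == ">50K")
extra <- minority_rows[sample.int(length(minority_rows), deficit, replace = TRUE)]

balanced <- rbind(toy, toy[extra, ])
table(balanced$income)
```

Because only training rows are duplicated, the test set keeps its original class distribution, which matches how the models are evaluated in this part.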
(iii)
model2 <- rpart(income~.,data=newtrain)
pred <- predict(model2, test, type="class")
confusionMatrix(pred, test$income, positive = ">50K")
## Confusion Matrix and Statistics
##
##           Reference
## Prediction <=50K  >50K
##      <=50K  8785   616
##      >50K   2575  3084
##
##                Accuracy : 0.7881
##                  95% CI : (0.7815, 0.7946)
##     No Information Rate : 0.7543
##     P-Value [Acc > NIR] : < 2.2e-16
##
##                   Kappa : 0.5149
##  Mcnemar's Test P-Value : < 2.2e-16
##
##             Sensitivity : 0.8335
##             Specificity : 0.7733
##          Pos Pred Value : 0.5450
##          Neg Pred Value : 0.9345
##              Prevalence : 0.2457
##          Detection Rate : 0.2048
##    Detection Prevalence : 0.3758
##       Balanced Accuracy : 0.8034
##
##        'Positive' Class : >50K
##
(i)
The balanced accuracy of the model is (0.8335+0.7733)/2 = 0.803.
(ii)
The balanced error rate of the model is 1 - 0.803 = 0.197.
(iii)
The sensitivity is 0.834 and the specificity is 0.773.
(iv)
The AUC of the ROC curve is 0.845. Plot the ROC curve:
pred.rocr <- predict(model2, newdata=test, type="prob")[,2]
f.pred <- prediction(pred.rocr, test$income )
f.perf <- performance(f.pred, "tpr", "fpr")
plot(f.perf, colorize=T, lwd=3)
abline(0,1)
auc <- performance(f.pred, measure = "auc")
cat(paste("The area under curve (AUC) for this model is ", round(auc@y.values[[1]], 3)))
## The area under curve (AUC) for this model is 0.845
(f)
Comparing the models used in (c) and (e): the model trained on the balanced data has higher balanced accuracy (0.803 vs. 0.726), sensitivity (0.834 vs. 0.504), and AUC (0.845 vs. 0.843), but lower specificity (0.773 vs. 0.948) and positive predictive value (0.545 vs. 0.760). Judged by balanced accuracy, the model trained on balanced data performs better overall, which indicates that using balanced data is the better choice here.
Part 2.2
(a)
library(arules)
product <- read.csv("HW2/products.csv",header=FALSE)
product.name <- as.character(product[,2])
names(product.name) <- product[,1]
d1 <- readLines("HW2/tr-1k.csv")
total <- c()
for(i in 1:length(d1)) {
  a <- strsplit(d1[i],split= ", ")[[1]]
  #remove id
  a <- a[-1]
  res <- product.name[a]
  b <- paste(res, collapse = ", ")
  total <- c(total, b)
}
writeLines(total, "HW2/tr-1k-canonical.csv")
d1 <- readLines("HW2/tr-5k.csv")
total <- c()
for(i in 1:length(d1)) {
  a <- strsplit(d1[i],split= ", ")[[1]]
  #remove id
  a <- a[-1]
  res <- product.name[a]
  b <- paste(res, collapse = ", ")
  total <- c(total, b)
}
writeLines(total, "HW2/tr-5k-canonical.csv")
d1 <- readLines("HW2/tr-20k.csv")
total <- c()
for(i in 1:length(d1)) {
  a <- strsplit(d1[i],split= ", ")[[1]]
  #remove id
  a <- a[-1]
  res <- product.name[a]
  b <- paste(res, collapse = ", ")
  total <- c(total, b)
}
writeLines(total, "HW2/tr-20k-canonical.csv")
d1 <- readLines("HW2/tr-75k.csv")
total <- c()
for(i in 1:length(d1)) {
  a <- strsplit(d1[i],split= ", ")[[1]]
  #remove id
  a <- a[-1]
  res <- product.name[a]
  b <- paste(res, collapse = ", ")
  total <- c(total, b)
}
writeLines(total, "HW2/tr-75k-canonical.csv")
(b)
frequent itemsets:
d1 <- read.transactions("HW2/tr-1k-canonical.csv",sep=",")
d2 <-
  read.transactions("HW2/tr-5k-canonical.csv",sep=",")
d3 <- read.transactions("HW2/tr-20k-canonical.csv",sep=",")
d4 <- read.transactions("HW2/tr-75k-canonical.csv",sep=",")
f1 <- apriori(d1, parameter=list(support=0.03, target="frequent itemsets"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 frequent itemsets FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 30
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1000 transaction(s)] done [0.00s].
## sorting and recoding items ... [49 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [85 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(sort(f1, decreasing = T, by="count"))
## items support count
## [1] {Gongolais Cookie} 0.108 108
## [2] {Truffle Cake} 0.103 103
## [3] {Tuile Cookie} 0.102 102
## [4] {Berry Tart} 0.095 95
## [5] {Hot Coffee} 0.094 94
## [6] {Coffee Eclair} 0.093 93
## [7] {Strawberry Cake} 0.091 91
## [8] {Apple Croissant} 0.091 91
## [9] {Marzipan Cookie} 0.090 90
## [10] {Napoleon Cake} 0.090 90
## [11] {Lemon Cake} 0.085 85
## [12] {Chocolate Coffee} 0.085 85
## [13] {Chocolate Cake} 0.084 84
## [14] {Cherry Tart} 0.084 84
## [15] {Apple Danish} 0.084 84
## [16] {Orange Juice} 0.082 82
## [17] {Raspberry Cookie} 0.082 82
## [18] {Blueberry Tart} 0.081 81
## [19] {Apple Tart} 0.079 79
## [20] {Opera Cake} 0.078 78
## [21] {Cheese Croissant} 0.078 78
## [22] {Bottled Water} 0.077 77
## [23] {Cherry Soda} 0.077 77
## [24] {Lemon Tart} 0.076 76
## [25] {Apricot Croissant} 0.076 76
## [26] {Apricot Danish} 0.075 75
## [27] {Vanilla Frappuccino} 0.074 74
## [28] {Blackberry Tart} 0.073 73
## [29] {Casino Cake} 0.072 72
## [30] {Raspberry Lemonade} 0.072 72
## [31] {Apple Pie} 0.068 68
## [32] {Lemon Lemonade} 0.066 66
## [33] {Lemon Cookie} 0.066 66
## [34] {Almond Twist} 0.065 65
## [35] {Green Tea} 0.062 62
## [36] {Walnut Cookie} 0.061 61
## [37] {Single Espresso} 0.059 59
## [38] {Gongolais Cookie,
## Truffle Cake} 0.058 58
## [39] {Apricot Tart} 0.056 56
## [40] {Blueberry Danish} 0.055 55
## [41] {Marzipan Cookie,
## Tuile Cookie} 0.053 53
## [42] {Chocolate Tart} 0.051 51
## [43] {Almond Croissant} 0.049 49
## [44] {Napoleon Cake,
## Strawberry Cake} 0.049 49
## [45] {Vanilla Meringue} 0.047 47
## [46] {Chocolate Cake,
## Chocolate Coffee} 0.047 47
## [47] {Apricot Danish,
## Cherry Tart} 0.046 46
## [48] {Ganache Cookie} 0.044 44
## [49] {Apple Croissant,
## Apple Tart} 0.044 44
## [50] {Chocolate Croissant} 0.042 42
## [51] {Apple Croissant,
## Apple Danish} 0.042 42
## [52] {Almond Tart} 0.041 41
## [53] {Cherry Tart,
## Opera Cake} 0.041 41
## [54] {Apple Danish,
## Apple Tart} 0.041 41
## [55] {Pecan Tart} 0.040 40
## [56] {Lemon Cake,
## Lemon Tart} 0.040 40
## [57] {Casino Cake,
## Chocolate Cake} 0.040 40
## [58] {Apricot Croissant,
## Blueberry Tart} 0.040 40
## [59] {Apple Croissant,
## Apple Danish,
## Apple Tart} 0.040 40
## [60] {Casino Cake,
## Chocolate Coffee} 0.039 39
## [61] {Apricot Danish,
## Opera Cake} 0.039 39
## [62] {Chocolate Meringue} 0.038 38
## [63] {Cheese Croissant,
## Orange Juice} 0.038 38
## [64] {Casino Cake,
## Chocolate Cake,
## Chocolate Coffee} 0.038 38
## [65] {Apricot Danish,
## Cherry Tart,
## Opera Cake} 0.038 38
## [66] {Vanilla Eclair} 0.037 37
## [67] {Apple Tart,
## Cherry Soda} 0.036 36
## [68] {Chocolate Eclair} 0.034 34
## [69] {Berry Tart,
## Bottled Water} 0.034 34
## [70] {Apple Pie,
## Coffee Eclair} 0.033 33
## [71] {Lemon Cookie,
## Raspberry Cookie} 0.033 33
## [72] {Blueberry Tart,
## Hot Coffee} 0.033 33
## [73] {Apple Danish,
## Cherry Soda} 0.033 33
## [74] {Apple Croissant,
## Cherry Soda} 0.033 33
## [75] {Apricot Croissant,
## Hot Coffee} 0.032 32
## [76] {Apricot Croissant,
## Blueberry Tart,
## Hot Coffee} 0.032 32
## [77] {Blackberry Tart,
## Coffee Eclair} 0.031 31
## [78] {Lemon Cookie,
## Lemon Lemonade} 0.031 31
## [79] {Lemon Lemonade,
## Raspberry Cookie} 0.031 31
## [80] {Apple Danish,
## Apple Tart,
## Cherry Soda} 0.031 31
## [81] {Apple Croissant,
## Apple Tart,
## Cherry Soda} 0.031 31
## [82] {Apple Croissant,
## Apple Danish,
## Cherry Soda} 0.031 31
## [83] {Apple Croissant,
## Apple Danish,
## Apple Tart,
## Cherry Soda} 0.031 31
## [84] {Almond Twist,
## Coffee Eclair} 0.030 30
## [85] {Lemon Cookie,
## Raspberry Lemonade} 0.030 30
f2 <- apriori(d2, parameter=list(support=0.03, target="frequent itemsets"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 frequent itemsets FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 150
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 5000 transaction(s)] done [0.00s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [85 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(sort(f2, decreasing = T, by="count"))
## items support count
## [1] {Coffee Eclair} 0.1108 554
## [2] {Hot Coffee} 0.1026 513
## [3] {Tuile Cookie} 0.0998 499
## [4] {Strawberry Cake} 0.0960 480
## [5] {Gongolais Cookie} 0.0954 477
## [6] {Orange Juice} 0.0922 461
## [7] {Cherry Tart} 0.0920 460
## [8] {Apricot Danish} 0.0892 446
## [9] {Truffle Cake} 0.0876 438
## [10] {Blueberry Tart} 0.0852 426
## [11] {Lemon Cake} 0.0850 425
## [12] {Marzipan Cookie} 0.0836 418
## [13] {Opera Cake} 0.0836 418
## [14] {Apricot Croissant} 0.0822 411
## [15] {Napoleon Cake} 0.0818 409
## [16] {Almond Twist} 0.0814 407
## [17] {Chocolate Coffee} 0.0808 404
## [18] {Chocolate Cake} 0.0798 399
## [19] {Berry Tart} 0.0782 391
## [20] {Apple Danish} 0.0782 391
## [21] {Apple Pie} 0.0782 391
## [22] {Bottled Water} 0.0772 386
## [23] {Cheese Croissant} 0.0770 385
## [24] {Chocolate Tart} 0.0762 381
## [25] {Blackberry Tart} 0.0760 380
## [26] {Casino Cake} 0.0746 373
## [27] {Apple Croissant} 0.0742 371
## [28] {Vanilla Frappuccino} 0.0734 367
## [29] {Apple Tart} 0.0734 367
## [30] {Lemon Tart} 0.0708 354
## [31] {Walnut Cookie} 0.0706 353
## [32] {Raspberry Lemonade} 0.0678 339
## [33] {Cherry Soda} 0.0670 335
## [34] {Single Espresso} 0.0654 327
## [35] {Lemon Lemonade} 0.0648 324
## [36] {Lemon Cookie} 0.0642 321
## [37] {Raspberry Cookie} 0.0640 320
## [38] {Green Tea} 0.0620 310
## [39] {Apricot Danish,Cherry Tart} 0.0512 256
## [40] {Marzipan Cookie,Tuile Cookie} 0.0496 248
## [41] {Gongolais Cookie,Truffle Cake} 0.0472 236
## [42] {Vanilla Eclair} 0.0460 230
## [43] {Almond Croissant} 0.0456 228
## [44] {Chocolate Meringue} 0.0452 226
## [45] {Pecan Tart} 0.0444 222
## [46] {Apricot Croissant,Blueberry Tart} 0.0440 220
## [47] {Cherry Tart,Opera Cake} 0.0436 218
## [48] {Chocolate Croissant} 0.0432 216
## [49] {Apricot Danish,Opera Cake} 0.0432 216
## [50] {Cheese Croissant,Orange Juice} 0.0430 215
## [51] {Almond Bear Claw} 0.0428 214
## [52] {Apricot Tart} 0.0422 211
## [53] {Napoleon Cake,Strawberry Cake} 0.0422 211
## [54] {Almond Twist,Coffee Eclair} 0.0412 206
## [55] {Apricot Danish,Cherry Tart,Opera Cake} 0.0408 204
## [56] {Apple Pie,Coffee Eclair} 0.0406 203
## [57] {Blueberry Danish} 0.0400 200
## [58] {Vanilla Meringue} 0.0398 199
## [59] {Chocolate Cake,Chocolate Coffee} 0.0394 197
## [60] {Almond Twist,Apple Pie} 0.0394 197
## [61] {Ganache Cookie} 0.0388 194
## [62] {Almond Tart} 0.0386 193
## [63] {Chocolate Eclair} 0.0382 191
## [64] {Almond Twist,Apple Pie,Coffee Eclair} 0.0382 191
## [65] {Berry Tart,Bottled Water} 0.0366 183
## [66] {Blackberry Tart,Coffee Eclair} 0.0356 178
## [67] {Blueberry Tart,Hot Coffee} 0.0350 175
## [68] {Chocolate Tart,Vanilla Frappuccino} 0.0348 174
## [69] {Apricot Croissant,Hot Coffee} 0.0348 174
## [70] {Casino Cake,Chocolate Coffee} 0.0346 173
## [71] {Casino Cake,Chocolate Cake} 0.0342 171
## [72] {Coffee Eclair,Hot Coffee} 0.0338 169
## [73] {Lemon Cake,Lemon Tart} 0.0336 168
## [74] {Apple Pie,Hot Coffee} 0.0336 168
## [75] {Almond Twist,Hot Coffee} 0.0336 168
## [76] {Apple Croissant,Apple Danish} 0.0330 165
## [77] {Apricot Croissant,Blueberry Tart,Hot Coffee} 0.0328 164
## [78] {Apple Danish,Apple Tart} 0.0324 162
## [79] {Apple Croissant,Apple Tart} 0.0316 158
## [80] {Blackberry Tart,Single Espresso} 0.0314 157
## [81] {Casino Cake,Chocolate Cake,Chocolate Coffee} 0.0312 156
## [82] {Almond Twist,Apple Pie,Hot Coffee} 0.0308 154
## [83] {Apple Pie,Coffee Eclair,Hot Coffee} 0.0308 154
## [84] {Almond Twist,Coffee Eclair,Hot Coffee} 0.0308 154
## [85] {Almond Twist,Apple Pie,Coffee Eclair,Hot Coffee} 0.0308 154
f3 <- apriori(d3, parameter=list(support=0.03, target="frequent itemsets"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 frequent itemsets FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 600
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 20000 transaction(s)] done [0.00s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [80 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(sort(f3, decreasing = T, by="count"))
## items support count
## [1] {Coffee Eclair} 0.10985 2197
## [2] {Hot Coffee} 0.10360 2072
## [3] {Tuile Cookie} 0.09865 1973
## [4] {Apricot Danish} 0.09270 1854
## [5] {Orange Juice} 0.09240 1848
## [6] {Strawberry Cake} 0.09200 1840
## [7] {Gongolais Cookie} 0.09185 1837
## [8] {Cherry Tart} 0.09125 1825
## [9] {Marzipan Cookie} 0.08645 1729
## [10] {Lemon Cake} 0.08600 1720
## [11] {Truffle Cake} 0.08465 1693
## [12] {Napoleon Cake} 0.08450 1690
## [13] {Berry Tart} 0.08430 1686
## [14] {Blueberry Tart} 0.08390 1678
## [15] {Chocolate Cake} 0.08365 1673
## [16] {Opera Cake} 0.08365 1673
## [17] {Chocolate Coffee} 0.08225 1645
## [18] {Cheese Croissant} 0.08170 1634
## [19] {Apricot Croissant} 0.08170 1634
## [20] {Vanilla Frappuccino} 0.07675 1535
## [21] {Blackberry Tart} 0.07670 1534
## [22] {Chocolate Tart} 0.07635 1527
## [23] {Lemon Tart} 0.07595 1519
## [24] {Casino Cake} 0.07530 1506
## [25] {Apple Pie} 0.07415 1483
## [26] {Almond Twist} 0.07315 1463
## [27] {Bottled Water} 0.07215 1443
## [28] {Single Espresso} 0.07110 1422
## [29] {Apple Croissant} 0.07100 1420
## [30] {Walnut Cookie} 0.06950 1390
## [31] {Raspberry Cookie} 0.06945 1389
## [32] {Apple Tart} 0.06925 1385
## [33] {Raspberry Lemonade} 0.06845 1369
## [34] {Lemon Cookie} 0.06825 1365
## [35] {Apple Danish} 0.06755 1351
## [36] {Lemon Lemonade} 0.06655 1331
## [37] {Cherry Soda} 0.06530 1306
## [38] {Green Tea} 0.06215 1243
## [39] {Apricot Danish,Cherry Tart} 0.05255 1051
## [40] {Marzipan Cookie,Tuile Cookie} 0.04855 971
## [41] {Chocolate Croissant} 0.04460 892
## [42] {Napoleon Cake,Strawberry Cake} 0.04455 891
## [43] {Chocolate Meringue} 0.04450 890
## [44] {Almond Bear Claw} 0.04425 885
## [45] {Chocolate Cake,Chocolate Coffee} 0.04405 881
## [46] {Cheese Croissant,Orange Juice} 0.04390 878
## [47] {Cherry Tart,Opera Cake} 0.04365 873
## [48] {Gongolais Cookie,Truffle Cake} 0.04335 867
## [49] {Apricot Danish,Opera Cake} 0.04335 867
## [50] {Ganache Cookie} 0.04330 866
## [51] {Apricot Tart} 0.04275 855
## [52] {Vanilla Eclair} 0.04270 854
## [53] {Chocolate Eclair} 0.04260 852
## [54] {Vanilla Meringue} 0.04240 848
## [55] {Almond Croissant} 0.04205 841
## [56] {Apricot Croissant,Blueberry Tart} 0.04185 837
## [57] {Pecan Tart} 0.04155 831
## [58] {Blueberry Danish} 0.04115 823
## [59] {Apricot Danish,Cherry Tart,Opera Cake} 0.04100 820
## [60] {Almond Tart} 0.04055 811
## [61] {Apple Pie,Coffee Eclair} 0.03725 745
## [62] {Lemon Cake,Lemon Tart} 0.03700 740
## [63] {Blackberry Tart,Coffee Eclair} 0.03675 735
## [64] {Chocolate Tart,Vanilla Frappuccino} 0.03675 735
## [65] {Almond Twist,Coffee Eclair} 0.03625 725
## [66] {Almond Twist,Apple Pie} 0.03595 719
## [67] {Casino Cake,Chocolate Cake} 0.03585 717
## [68] {Berry Tart,Bottled Water} 0.03570 714
## [69] {Casino Cake,Chocolate Coffee} 0.03570 714
## [70] {Blueberry Tart,Hot Coffee} 0.03570 714
## [71] {Apricot Croissant,Hot Coffee} 0.03510 702
## [72] {Almond Twist,Apple Pie,Coffee Eclair} 0.03415 683
## [73] {Casino Cake,Chocolate Cake,Chocolate Coffee} 0.03390 678
## [74] {Apricot Croissant,Blueberry Tart,Hot Coffee} 0.03260 652
## [75] {Coffee Eclair,Hot Coffee} 0.03170 634
## [76] {Vanilla Frappuccino,Walnut Cookie} 0.03095 619
## [77] {Almond Twist,Hot Coffee} 0.03085 617
## [78] {Apple Pie,Hot Coffee} 0.03085 617
## [79] {Chocolate Tart,Walnut Cookie} 0.03055 611
## [80] {Blackberry Tart,Single Espresso} 0.03015 603
f4 <- apriori(d4, parameter=list(support=0.03, target="frequent itemsets"))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 frequent itemsets FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2250
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 75000 transaction(s)] done [0.02s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [77 set(s)] done [0.00s].
## creating S4 object ... done [0.01s].
inspect(sort(f4, decreasing = T, by="count"))
## items support count
## [1] {Coffee Eclair} 0.10924000 8193
## [2] {Hot Coffee} 0.10266667 7700
## [3] {Tuile Cookie} 0.10074667 7556
## [4] {Cherry Tart} 0.09316000 6987
## [5] {Strawberry Cake} 0.09264000 6948
## [6] {Apricot Danish} 0.09257333 6943
## [7] {Orange Juice} 0.09161333 6871
## [8] {Gongolais Cookie} 0.09044000 6783
## [9] {Marzipan Cookie} 0.08977333 6733
## [10] {Berry Tart} 0.08482667 6362
## [11] {Apricot Croissant} 0.08398667 6299
## [12] {Lemon Cake} 0.08361333 6271
## [13] {Chocolate Cake} 0.08353333 6265
## [14] {Chocolate Coffee} 0.08314667 6236
## [15] {Blueberry Tart} 0.08294667 6221
## [16] {Napoleon Cake} 0.08274667 6206
## [17] {Truffle Cake} 0.08224000 6168
## [18] {Cheese Croissant} 0.08221333 6166
## [19] {Opera Cake} 0.08209333 6157
## [20] {Vanilla Frappuccino} 0.07746667 5810
## [21] {Almond Twist} 0.07720000 5790
## [22] {Apple Pie} 0.07712000 5784
## [23] {Blackberry Tart} 0.07586667 5690
## [24] {Lemon Tart} 0.07580000 5685
## [25] {Bottled Water} 0.07520000 5640
## [26] {Casino Cake} 0.07501333 5626
## [27] {Chocolate Tart} 0.07372000 5529
## [28] {Lemon Lemonade} 0.06824000 5118
## [29] {Apple Tart} 0.06822667 5117
## [30] {Lemon Cookie} 0.06801333 5101
## [31] {Single Espresso} 0.06797333 5098
## [32] {Walnut Cookie} 0.06785333 5089
## [33] {Raspberry Lemonade} 0.06774667 5081
## [34] {Apple Danish} 0.06769333 5077
## [35] {Raspberry Cookie} 0.06764000 5073
## [36] {Apple Croissant} 0.06729333 5047
## [37] {Green Tea} 0.06246667 4685
## [38] {Cherry Soda} 0.06198667 4649
## [39] {Apricot Danish,Cherry Tart} 0.05309333 3982
## [40] {Marzipan Cookie,Tuile Cookie} 0.05092000 3819
## [41] {Blueberry Danish} 0.04409333 3307
## [42] {Chocolate Cake,Chocolate Coffee} 0.04404000 3303
## [43] {Gongolais Cookie,Truffle Cake} 0.04392000 3294
## [44] {Apricot Croissant,Blueberry Tart} 0.04350667 3263
## [45] {Pecan Tart} 0.04337333 3253
## [46] {Cherry Tart,Opera Cake} 0.04337333 3253
## [47] {Chocolate Croissant} 0.04324000 3243
## [48] {Ganache Cookie} 0.04324000 3243
## [49] {Napoleon Cake,Strawberry Cake} 0.04314667 3236
## [50] {Cheese Croissant,Orange Juice} 0.04306667 3230
## [51] {Apricot Danish,Opera Cake} 0.04302667 3227
## [52] {Almond Croissant} 0.04273333 3205
## [53] {Vanilla Eclair} 0.04252000 3189
## [54] {Almond Bear Claw} 0.04244000 3183
## [55] {Vanilla Meringue} 0.04238667 3179
## [56] {Chocolate Eclair} 0.04237333 3178
## [57] {Apricot Tart} 0.04236000 3177
## [58] {Almond Tart} 0.04204000 3153
## [59] {Chocolate Meringue} 0.04193333 3145
## [60] {Apricot Danish,Cherry Tart,Opera Cake} 0.04110667 3083
## [61] {Berry Tart,Bottled Water} 0.03780000 2835
## [62] {Apple Pie,Coffee Eclair} 0.03726667 2795
## [63] {Almond Twist,Coffee Eclair} 0.03712000 2784
## [64] {Lemon Cake,Lemon Tart} 0.03685333 2764
## [65] {Almond Twist,Apple Pie} 0.03668000 2751
## [66] {Blackberry Tart,Coffee Eclair} 0.03641333 2731
## [67] {Chocolate Tart,Vanilla Frappuccino} 0.03596000 2697
## [68] {Casino Cake,Chocolate Cake} 0.03553333 2665
## [69] {Apricot Croissant,Hot Coffee} 0.03537333 2653
## [70] {Casino Cake,Chocolate Coffee} 0.03524000 2643
## [71] {Blueberry Tart,Hot Coffee} 0.03504000 2628
## [72] {Almond Twist,Apple Pie,Coffee Eclair} 0.03432000 2574
## [73] {Casino Cake,Chocolate Cake,Chocolate Coffee} 0.03338667 2504
## [74] {Apricot Croissant,Blueberry Tart,Hot Coffee} 0.03282667 2462
## [75] {Coffee Eclair,Hot Coffee} 0.03156000 2367
## [76] {Apple Pie,Hot Coffee} 0.03102667 2327
## [77] {Almond Twist,Hot Coffee} 0.03092000 2319
strong association rules:
rules <- apriori(d1, parameter = list(support=0.03))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 30
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 1000 transaction(s)] done [0.00s].
## sorting and recoding items ... [49 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [21 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules, by="confidence")
## lhs rhs support confidence lift count
## [1] {Casino Cake,
## Chocolate Cake} => {Chocolate Coffee} 0.038 0.9500000 11.176471 38
## [2] {Casino Cake,
## Chocolate Coffee} => {Chocolate Cake} 0.038 0.9743590 11.599512 38
## [3] {Chocolate Cake,
## Chocolate Coffee} => {Casino Cake} 0.038 0.8085106 11.229314 38
## [4] {Apricot Croissant,
## Blueberry Tart} => {Hot Coffee} 0.032 0.8000000 8.510638 32
## [5] {Apricot Croissant,
## Hot Coffee} => {Blueberry Tart} 0.032 1.0000000 12.345679 32
## [6] {Blueberry Tart,
## Hot Coffee} => {Apricot Croissant} 0.032 0.9696970 12.759171 32
## [7] {Apricot Danish,
## Opera Cake} => {Cherry Tart} 0.038 0.9743590 11.599512 38
## [8] {Cherry Tart,
## Opera Cake} => {Apricot Danish} 0.038 0.9268293 12.357724 38
## [9] {Apricot Danish,
## Cherry Tart} => {Opera Cake} 0.038 0.8260870 10.590858 38
## [10] {Apple Tart,
## Cherry Soda} => {Apple Danish} 0.031 0.8611111 10.251323 31
## [11] {Apple Danish,
## Cherry Soda} => {Apple Tart} 0.031 0.9393939 11.891063 31
## [12] {Apple Tart,
## Cherry Soda} => {Apple Croissant} 0.031 0.8611111 9.462759 31
## [13] {Apple Croissant,
## Cherry Soda} => {Apple Tart} 0.031 0.9393939 11.891063 31
## [14] {Apple Danish,
## Apple Tart} => {Apple Croissant} 0.040 0.9756098 10.720986 40
## [15] {Apple Croissant,
## Apple Tart} => {Apple Danish} 0.040 0.9090909 10.822511 40
## [16] {Apple Croissant,
## Apple Danish} => {Apple Tart} 0.040 0.9523810 12.055455 40
## [17] {Apple Danish,
## Cherry Soda} => {Apple Croissant} 0.031 0.9393939 10.323010 31
## [18] {Apple Croissant,
## Cherry Soda} => {Apple Danish} 0.031 0.9393939 11.183261 31
## [19] {Apple Danish,
## Apple Tart,
## Cherry Soda} => {Apple Croissant} 0.031 1.0000000 10.989011 31
## [20] {Apple Croissant,
## Apple Tart,
## Cherry Soda} => {Apple Danish} 0.031 1.0000000 11.904762 31
## [21] {Apple Croissant,
## Apple Danish,
## Cherry Soda} => {Apple Tart} 0.031 1.0000000 12.658228 31
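The support, confidence, and lift columns in the table above follow directly from the itemset counts listed for the 1,000-transaction set. A minimal sketch for rule [1], {Casino Cake, Chocolate Cake} => {Chocolate Coffee}, using counts copied from the frequent-itemset output (the variable names are introduced here for illustration):

```r
n          <- 1000  # number of transactions in the 1k data set
count_lhs  <- 40    # {Casino Cake, Chocolate Cake}
count_rhs  <- 85    # {Chocolate Coffee}
count_both <- 38    # {Casino Cake, Chocolate Cake, Chocolate Coffee}

support    <- count_both / n             # share of transactions with all items
confidence <- count_both / count_lhs     # P(rhs | lhs)
lift       <- confidence / (count_rhs / n)  # confidence relative to P(rhs)

c(support = support, confidence = confidence, lift = lift)
```

A lift well above 1, as here, means the right-hand item appears far more often with the left-hand items than its overall frequency would suggest.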
rules <- apriori(d2, parameter = list(support=0.03))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 150
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 5000 transaction(s)] done [0.00s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [19 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules, by="confidence")
## lhs rhs support confidence lift count
## [1] {Casino Cake,
## Chocolate Cake} => {Chocolate Coffee} 0.0312 0.9122807 11.290603 156
## [2] {Casino Cake,
## Chocolate Coffee} => {Chocolate Cake} 0.0312 0.9017341 11.299926 156
## [3] {Apricot Croissant,
## Hot Coffee} => {Blueberry Tart} 0.0328 0.9425287 11.062544 164
## [4] {Blueberry Tart,
## Hot Coffee} => {Apricot Croissant} 0.0328 0.9371429 11.400765 164
## [5] {Apricot Danish,
## Opera Cake} => {Cherry Tart} 0.0408 0.9444444 10.265700 204
## [6] {Cherry Tart,
## Opera Cake} => {Apricot Danish} 0.0408 0.9357798 10.490805 204
## [7] {Apple Pie,
## Hot Coffee} => {Almond Twist} 0.0308 0.9166667 11.261261 154
## [8] {Almond Twist,
## Hot Coffee} => {Apple Pie} 0.0308 0.9166667 11.722080 154
## [9] {Almond Twist,
## Apple Pie} => {Coffee Eclair} 0.0382 0.9695431 8.750389 191
## [10] {Apple Pie,
## Coffee Eclair} => {Almond Twist} 0.0382 0.9408867 11.558805 191
## [11] {Almond Twist,
## Coffee Eclair} => {Apple Pie} 0.0382 0.9271845 11.856579 191
## [12] {Apple Pie,
## Hot Coffee} => {Coffee Eclair} 0.0308 0.9166667 8.273165 154
## [13] {Coffee Eclair,
## Hot Coffee} => {Apple Pie} 0.0308 0.9112426 11.652719 154
## [14] {Almond Twist,
## Hot Coffee} => {Coffee Eclair} 0.0308 0.9166667 8.273165 154
## [15] {Coffee Eclair,
## Hot Coffee} => {Almond Twist} 0.0308 0.9112426 11.194627 154
## [16] {Almond Twist,
## Apple Pie,
## Hot Coffee} => {Coffee Eclair} 0.0308 1.0000000 9.025271 154
## [17] {Almond Twist,
## Apple Pie,
## Coffee Eclair} => {Hot Coffee} 0.0308 0.8062827 7.858506 154
## [18] {Apple Pie,
## Coffee Eclair,
## Hot Coffee} => {Almond Twist} 0.0308 1.0000000 12.285012 154
## [19] {Almond Twist,
## Coffee Eclair,
## Hot Coffee} => {Apple Pie} 0.0308 1.0000000 12.787724 154
rules <- apriori(d3, parameter = list(support=0.03))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 600
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 20000 transaction(s)] done [0.00s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.01s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [9 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(rules, by="confidence")
## lhs rhs support
## [1] {Casino Cake,Chocolate Coffee} => {Chocolate Cake} 0.03390
## [2] {Casino Cake,Chocolate Cake} => {Chocolate Coffee} 0.03390
## [3] {Almond Twist,Apple Pie} => {Coffee Eclair} 0.03415
## [4] {Almond Twist,Coffee Eclair} => {Apple Pie} 0.03415
## [5] {Apple Pie,Coffee Eclair} => {Almond Twist} 0.03415
## [6] {Apricot Croissant,Hot Coffee} => {Blueberry Tart} 0.03260
## [7] {Blueberry Tart,Hot Coffee} => {Apricot Croissant} 0.03260
## [8] {Cherry Tart,Opera Cake} => {Apricot Danish} 0.04100
## [9] {Apricot Danish,Opera Cake} => {Cherry Tart} 0.04100
## confidence lift count
## [1] 0.9495798 11.351821 678
## [2] 0.9456067 11.496738 678
## [3] 0.9499305 8.647524 683
## [4] 0.9420690 12.704909 683
## [5] 0.9167785 12.532857 683
## [6] 0.9287749 11.070023 652
## [7] 0.9131653 11.177053 652
## [8] 0.9392898 10.132576 820
## [9] 0.9457901 10.364823 820
rules <- apriori(d4, parameter = list(support=0.03))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.03 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 2250
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[50 item(s), 75000 transaction(s)] done [0.02s].
## sorting and recoding items ... [50 item(s)] done [0.00s].
## creating transaction tree ... done [0.03s].
## checking subsets of size 1 2 3 done [0.00s].
## writing ... [9 rule(s)] done [0.00s].
## creating S4 object ... done [0.01s].
inspect(rules, by="confidence")
## lhs rhs support
## [1] {Casino Cake,Chocolate Coffee} => {Chocolate Cake} 0.03338667
## [2] {Casino Cake,Chocolate Cake} => {Chocolate Coffee} 0.03338667
## [3] {Almond Twist,Apple Pie} => {Coffee Eclair} 0.03432000
## [4] {Apple Pie,Coffee Eclair} => {Almond Twist} 0.03432000
## [5] {Almond Twist,Coffee Eclair} => {Apple Pie} 0.03432000
## [6] {Blueberry Tart,Hot Coffee} => {Apricot Croissant} 0.03282667
## [7] {Apricot Croissant,Hot Coffee} => {Blueberry Tart} 0.03282667
## [8] {Apricot Danish,Opera Cake} => {Cherry Tart} 0.04110667
## [9] {Cherry Tart,Opera Cake} => {Apricot Danish} 0.04110667
## confidence lift count
## [1] 0.9474082 11.341679 2504
## [2] 0.9395872 11.300360 2504
## [3] 0.9356598 8.565175 2574
## [4] 0.9209302 11.929148 2574
## [5] 0.9245690 11.988705 2574
## [6] 0.9368341 11.154557 2462
## [7] 0.9280060 11.187985 2462
## [8] 0.9553765 10.255222 3083
## [9] 0.9477405 10.237727 3083
(c)
Comparing the rules obtained for each subset (1,000 to 75,000 transactions): the number of rules that meet the specified support and confidence thresholds decreases as the number of transactions grows (21, 19, 9, and 9 rules for the 1,000-, 5,000-, 20,000-, and 75,000-transaction sets), and the support and confidence values differ slightly between subsets. However, the strongest rules recur in every subset, for example those among Casino Cake, Chocolate Cake, and Chocolate Coffee.
(d)
(i)
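The most frequently purchased item or itemset is {Coffee Eclair}, with a count of 8193 in the 75,000-transaction set.
(ii)
Among the frequent itemsets (support of at least 0.03) in the 75,000-transaction set, the least frequently purchased is {Almond Twist,Hot Coffee}, with a count of 2319.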