ETX2250/ETF5922: Data Visualization and Analytics
Market Basket Analysis
Lecturer:
Department of Econometrics and Business Statistics
Week 9
Market basket analysis
Also known as: Association analysis, Affinity analysis
When would we use market basket analysis? A supermarket would like to increase sales by:
Placing products in such a way that customers see items they are likely to add to their basket (e.g., chips and dip)
Providing vouchers to encourage the purchase of likely items (e.g., fancy cat toys to owners who buy super premium cat food)
What data do we have? Items purchased during each transaction:
What items are commonly purchased together?
Does the purchase of item A make the purchase of item B more likely?
2/41
Can you think of an example where you have experienced the result of this analysis?
Online book stores increase sales by drawing customers' attention to particular books. Netflix helps you find the next movie.
Terminology and Notation
A transaction can be thought of as a basket of items. The letters represent the items. This transaction contains items A, B, C, D, E:

Transaction = {A, B, C, D, E}

A particular group of items is called an itemset. The following group contains items A, C and F:

Itemset = {A, C, F}

The data might suggest that where A and F occur in a basket, C is also likely to occur. This is represented with an Association Rule:

{A, F} → {C}

How do we find these association rules?
4/41
A transaction by product matrix

Transaction   A  B  C  D  E
1             1  1  1  0  0
2             1  1  1  1  1
3             0  0  0  1  0
4             1  0  0  0  0
5             0  1  1  0  0
6             1  0  1  1  0
7             0  0  1  0  0
8             0  1  1  0  0

Rows represent transactions (different baskets)
Columns represent different items that might be purchased
1 means the item was in the basket, 0 means it was not
5/41
Consider the itemset {A, B, C}

There are several possible association rules, of the form IF → THEN:
{A, C} → {B}
{B, C} → {A}
{A, B} → {C}

These are probabilistic. Baskets with A and B don't necessarily include C, etc.

{A, B} → {C}: Two baskets include A and B, both include C, so this rule works all the time.
{B, C} → {A}: Four baskets include C and B, and of those, two include A, so this rule works half the time.
{A, C} → {B}: Three baskets include A and C, and of those, two include B, so this rule works 2/3 of the time.
6/41

How often does the rule work?
{C, D} → {B}
More formal language

In the rule {A, B} → {C}:
{A, B} is called the antecedent
{C} is called the consequent

The support of the itemset {A, B, C} is the proportion of all transactions that include A, B and C
The support count of the itemset {A, B, C} is the number of transactions that include A, B and C
The support of the association rule {A, B} → {C} is the support of {A, B, C}
8/41
Confidence

How can we describe uncertainty about this rule?

Confidence is a conditional probability: Pr(Consequent | Antecedent)

Confidence = p(Antecedent AND Consequent) / p(Antecedent)

Confidence = Support Count {A, B, C} / Support Count {A, B}
9/41
Consider {C, D} → {B}

What is the association rule?
What is the antecedent?
What is the consequent?
What is the itemset?
What is the support?
What is the support count?
What is the confidence?
Lift

What if C is just super common in the data?

Are you more likely to find the consequent (C) in a basket that contains the antecedent (A and B)? How much more likely?

Lift of an association rule = Confidence of association rule / Support of consequent

Numerator = likelihood of C in a basket that contains A and B
Denominator = likelihood of C in any basket

Lift measures how many times more likely the consequent is when the antecedent is present. For the rule to be useful, lift needs to be more than 1.
11/41
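To make these definitions concrete, here is a sketch in R computing support, confidence and lift by hand for the rule {A, B} → {C} on the toy transaction-by-product matrix from earlier (the variable names are mine):

```r
# Toy transaction-by-product matrix from the slides (8 baskets, items A-E)
m <- matrix(c(1,1,1,0,0,
              1,1,1,1,1,
              0,0,0,1,0,
              1,0,0,0,0,
              0,1,1,0,0,
              1,0,1,1,0,
              0,0,1,0,0,
              0,1,1,0,0),
            nrow = 8, byrow = TRUE,
            dimnames = list(1:8, c("A", "B", "C", "D", "E")))

# Rule {A, B} -> {C}
has_antecedent <- rowSums(m[, c("A", "B")]) == 2      # baskets with both A and B
has_itemset    <- has_antecedent & m[, "C"] == 1      # baskets with A, B and C

support    <- mean(has_itemset)                       # support of {A, B, C}: 2/8
confidence <- sum(has_itemset) / sum(has_antecedent)  # Pr(C | A and B): 2/2 = 1
lift       <- confidence / mean(m[, "C"])             # 1 / (6/8) = 4/3
```

Both baskets containing A and B also contain C, so the confidence is 1; the lift of 4/3 says C is a third more likely in those baskets than in baskets generally.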
What is a good association rule?
An association rule is ultimately judged on how actionable it is and how well it explains the relationship between item sets.
Ex. Wal-Mart mined its transactional data to uncover strong evidence of the association rule, “If a customer purchases a Barbie doll, then a customer also purchases a candy bar.”
An association rule is useful if:
Its antecedents strongly increase the likelihood of the consequent (measured by the lift), and
It explains an important, previously unknown relationship.
It should also have reasonably large support.
12/41
Computational challenges & the apriori algorithm
How do we generate these candidate rules?

We could use all possible rules, but why couldn't we do this for very big data?

Take a three item set {A, B, C}. How many possible rules are there?

If p = number of items in the set, the number of possible rules is 3^p − 2^(p+1) + 1

In the case of 3 items, this is 12:
A → {B, C}, B → {A, C}, C → {A, B}, {B, C} → A, {A, C} → B, {A, B} → C, B → C, C → B, A → C, A → B, C → A, B → A

In the case of 6 items, 602; 7 items, 1932

If we have a large number of items there will be too many rules! Often we focus on frequent item-sets (see support)
13/41
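A quick check of the rule-count formula in R (the helper name n_rules is mine):

```r
# Number of possible association rules among p items: 3^p - 2^(p+1) + 1
n_rules <- function(p) 3^p - 2^(p + 1) + 1

n_rules(3)  # 12
n_rules(6)  # 602
n_rules(7)  # 1932
n_rules(20) # about 3.5 billion rules from just 20 items
```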
Apriori algorithm
1. Generate frequent itemsets with just one item (count how many transactions contain that item in the data)
2. Drop items that have support below the desired minimum support
3. Generate frequent two item sets using only the frequent one item sets.
4. Generate the frequent three item sets using only the frequent two item sets…
The idea is that if the smaller itemset doesn't have sufficient support, a larger itemset won't have sufficient support.
14/41
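A minimal sketch of these four steps in base R, run on the toy A–E matrix from earlier (the function apriori_frequent and all names are mine; in practice you would use arules::apriori):

```r
# Apriori-style frequent itemset generation: grow itemsets one item at a
# time, extending only sets that met the minimum support at the previous size.
apriori_frequent <- function(m, min_support = 0.5) {
  # support of an itemset = share of baskets containing every item in it
  support <- function(s) mean(rowSums(m[, s, drop = FALSE]) == length(s))
  items <- colnames(m)

  # Steps 1-2: frequent one-item sets
  frequent <- as.list(items[colMeans(m) >= min_support])
  result <- frequent

  # Steps 3-4: extend each frequent set by one item, keep frequent candidates
  while (length(frequent) > 0) {
    candidates <- list()
    for (s in frequent)
      for (i in setdiff(items, s))
        candidates <- c(candidates, list(sort(c(s, i))))
    frequent <- Filter(function(s) support(s) >= min_support, unique(candidates))
    result <- c(result, frequent)
  }
  result
}

# Toy transaction-by-product matrix (8 baskets, items A-E)
m <- matrix(c(1,1,1,0,0,  1,1,1,1,1,  0,0,0,1,0,  1,0,0,0,0,
              0,1,1,0,0,  1,0,1,1,0,  0,0,1,0,0,  0,1,1,0,0),
            nrow = 8, byrow = TRUE,
            dimnames = list(1:8, c("A", "B", "C", "D", "E")))

apriori_frequent(m, min_support = 0.5)
# frequent itemsets: {A}, {B}, {C} and {B, C}
```

Because {A, B} and {A, C} fall short of the 0.5 support threshold, no itemset containing them is ever generated; that pruning is exactly what makes apriori feasible on large data.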
An example of the power of associations: The Netflix Prize

Netflix offered a prize of $1 million to anyone who could improve their recommendations by 10%. Online selling provides data:
Observe what movies are rented together, or by the same individual at different times (this was before the monthly-fee version of Netflix)
Netflix also has customers' ratings of movies

Develop recommendation systems: suggest purchases based on what the individual has already purchased and their ratings
15/41
Two approaches:
The direct approach:
Rate movies by factors such as: amount of comedy, complication of plot, amount of blood and gore, how handsome the lead actor was, and a couple more.
Classify the viewer’s taste in the same way
Recommend movies that coincide well with the viewer's taste

Alternatively:
Start with random factors assigned to movies, each a combination of available linear variables
Tune the factors to align more and more with the viewer's tastes
The factors may not be as comprehensible as "how much comedy", but they predicted viewers' ratings of movies very well
Actually even more complex (http://blog.echen.me/2011/10/24/winning-the-netflix-prize-a-summary/)
16/41
How to in R

Shopping basket transactions at the Hy-Vee store.

Aims: to suggest:
How to increase sales through product placement
How to increase sales through cross-product promotions

First read in the data:

transaction_matrix_v1 <- read_csv(here::here("data/HyVeetr.csv"), col_names = FALSE)

Is this a good format? Rows = baskets, columns = items (no particular order)
18/41
Shopping basket transactions at the Hy-Vee store.

Let's use a special function from the arules package to read the data in as transactions:

library(arules)
transaction_matrix <- read.transactions(here::here("data/HyVeetr.csv"), sep = ",")
transaction_matrix
## transactions in sparse format with
## 10 transactions (rows) and
## 15 items (columns)
19/41
"
This is a very different structure to what we have used before, what does it look like?
str(transaction_matrix)
## Formal class 'transactions' [package "arules"] with 3 slots
##   ..@ data       :Formal class 'ngCMatrix' [package "Matrix"] with 5 slots
##   .. .. ..@ i       : int [1:53] 1 3 6 7 8 10 1 2 3 4 ...
##   .. .. ..@ p       : int [1:11] 0 6 15 19 25 31 38 42 46 49 ...
##   .. .. ..@ Dim     : int [1:2] 15 10
##   .. .. ..@ Dimnames:List of 2
##   .. .. .. ..$ : NULL
##   .. .. .. ..$ : NULL
##   .. .. ..@ factors : list()
##   ..@ itemInfo   :'data.frame': 15 obs. of 1 variable:
##   .. ..$ labels: chr [1:15] "Beer" "Bread" "Cheese" "Cream" ...
##   ..@ itemsetInfo:'data.frame': 0 obs. of 0 variables
Complex!
20/41
Getting back to a binary incidence matrix
The storage of the transaction matrix is designed for efficient memory use and computation (remember the computation issues!)

This is a deviation from our usual tidy data, purely for computational efficiency.

But I want to visually inspect our transaction matrix, so I am going to extract it out. You won't be expected to do this in an exam.

binary_incidence <- as(transaction_matrix, "matrix")
colnames(binary_incidence) <- itemLabels(transaction_matrix)
21/41
kableExtra::kable(binary_incidence)
[Output: a 10 × 15 TRUE/FALSE incidence table. Rows are the ten transactions; columns are the items (Beer, Bread, Cheese, Cream, crisps, Crisps, Eggs, Fruit, Jam, MapleSyrup, Milk, Softdrink, Steak, …). TRUE means the item was in that basket.]
22/41
Consider {Bread, Milk} → {Cheese}

What is the association rule?
What is the antecedent?
What is the consequent?
What is the itemset?
What is the support?
What is the support count?
What is the confidence?
Frequency of items
Let's get the most frequent items using the eclat function
Specify the support (supp) and the maximum number of items in each item set (maxlen)
frequentItems <- eclat (transaction_matrix,
parameter = list(supp = 0.5, maxlen = 3))
## Eclat
##
## parameter specification:
##  tidLists support minlen maxlen            target ext
##     FALSE     0.5      1      3 frequent itemsets TRUE
##
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
##
## Absolute minimum support count: 5
24/41
inspect(frequentItems)
##      items             support transIdenticalToItemsets count
## [1]  {Fruit,Softdrink}     0.5                        5     5
## [2]  {Cream,Fruit,Jam}     0.5                        5     5
## [3]  {Fruit,Jam}           0.5                        5     5
## [4]  {Cream,Jam}           0.5                        5     5
## [5]  {Fruit,Milk}          0.6                        6     6
## [6]  {Cream,Fruit}         0.6                        6     6
## [7]  {Fruit}               0.9                        9     9
## [8]  {Cream}               0.6                        6     6
## [9]  {Milk}                0.6                        6     6
## [10] {Jam}                 0.5                        5     5
## [11] {Softdrink}           0.5                        5     5
25/41
Item Frequency plot
Use the command itemFrequencyPlot
Specify that we only plot the 10 most frequent items (with the topN argument).
Specify we want the absolute frequency (rather than the relative) using the type argument, and add a title using main.
itemFrequencyPlot(transaction_matrix, topN=10,
type="absolute", main="Item Frequency")
26/41
Mine the rules (find the most common)

Use the apriori function to run the apriori algorithm.

We need to specify the minimum support (supp), the minimum confidence (conf) and the minimum length (minlen) considered.
basket_apriori <-apriori(transaction_matrix,
parameter = list(supp = 0.5,
conf = 0.8,
minlen = 2 ))
27/41
basket_apriori <-apriori(transaction_matrix,
parameter = list(supp = 0.5,
conf = 0.8,
minlen = 2 ))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.5 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 5
##
28/41
Extract the candidate rules
inspect(basket_apriori)
##     lhs              rhs     support confidence coverage     lift count
## [1] {Softdrink}   => {Fruit}     0.5  1.0000000      0.5 1.111111     5
## [2] {Jam}         => {Cream}     0.5  1.0000000      0.5 1.666667     5
## [3] {Cream}       => {Jam}       0.5  0.8333333      0.6 1.666667     5
## [4] {Jam}         => {Fruit}     0.5  1.0000000      0.5 1.111111     5
## [5] {Milk}        => {Fruit}     0.6  1.0000000      0.6 1.111111     6
## [6] {Cream}       => {Fruit}     0.6  1.0000000      0.6 1.111111     6
## [7] {Cream,Jam}   => {Fruit}     0.5  1.0000000      0.5 1.111111     5
## [8] {Fruit,Jam}   => {Cream}     0.5  1.0000000      0.5 1.666667     5
## [9] {Cream,Fruit} => {Jam}       0.5  0.8333333      0.6 1.666667     5
29/41
What makes a rule good?
When the antecedents hold, the consequent is very likely to hold; that is, the confidence is high
The antecedents strongly increase the likelihood of the consequent; that is, the lift is reasonably large
It explains a previously unknown relationship. This is beyond what the calculations can tell you.
A rule will not be useful if it applies to hardly any baskets. However, if the dataset (and your business) is sufficiently large, you may be interested in rules whose itemsets have relatively low support, as we will see with the lastfm example below.
30/41
How often does data come like this?
The grocery data came in a form where each row represented a transaction, and the items in that transaction were listed, separated by commas.

There are other common forms of data, but they take a little more data munging before we can use them.

One example is the data you will use in your tutorial. You will need more data munging tools than the ones we covered in week 3 (R has extensive ways of manipulating data, but the basics we covered in week 3 are sufficient in many scenarios).
31/41
Last fm artist data
lastfm.csv records 300,000 artist selections by 15,000 users. We also have each user's country and gender.
library(tidyverse)
lastfm <- read_csv(here::here("lectures/data/week09/lastfm.csv")) %>%
mutate(user = as.factor(user))
head(lastfm)
## # A tibble: 6 x 4
##   user  artist                  sex   country
##   <fct> <chr>                   <chr> <chr>
## 1 1     red hot chili peppers   f     Germany
## 2 1     the black dahlia murder f     Germany
## 3 1     goldfrapp               f     Germany
## 4 1     dropkick murphys        f     Germany
## 5 1     le tigre                f     Germany
## 6 1     schandmaul              f     Germany
32/41
How is this scenario like the supermarket scenario?
user = shopper
artist = item bought
How is the data structure different from the structure of the supermarket scenario?
33/41
Group split
lastfm_split <- lastfm %>%
group_by(user) %>%
group_split()
Now we have a new data object type! This is something called a list, which lets us store many dataframes in one object.

Here we use group_split() to make a dataframe (a tibble, to be more accurate) for every user, which we store in this list for easy access.
34/41
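A minimal illustration of what group_split() produces, on a made-up two-user tibble (the toy data is mine):

```r
library(tidyverse)

# Two rows for user "1", one row for user "2"
toy <- tibble(user   = c("1", "1", "2"),
              artist = c("goldfrapp", "le tigre", "edguy"))

toy_split <- toy %>%
  group_by(user) %>%
  group_split()

length(toy_split)  # 2 -- one tibble per user
toy_split[[1]]     # the two rows belonging to user "1"
```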
Using map from purrr
This list of dataframes (tibbles) means we can apply some tidyverse data munging to each dataframe (user) using our usual dataframe tools.

To see what's going on, let's focus on the first user. We use double square brackets to index into a list:
user1 <- lastfm_split[[1]]
35/41
Data munging for user1
head(user1)
We want to:
Select only the artist column
Make sure we only have the unique artists (so drop any multiple entries)
Take it out of a dataframe
36/41
Data munging for user1
user1 %>%
select(artist)%>%
unique() %>%
deframe()
37/41
Map from the purrr package.
The map function in purrr lets us do this over every dataframe (tibble) in our list, with just one function call! Neat!
We do this by creating a function that takes in x (in this case each data.frame/tibble in our list), and then runs our little cleaning routine over it.
playlist<- lastfm_split %>% purrr::map(function(x) x %>%
select(artist) %>%
unique() %>%
deframe())
Let’s check
playlist[[1]]
## [1] "red hot chili peppers"   "the black dahlia murder"
## [3] "goldfrapp"               "dropkick murphys"
## [5] "le tigre"                "schandmaul"
## [7] "edguy"                   "jack johnson"
38/41
Almost finished with the data munging

Now we have, for every user, every artist that user has listened to, stored in a list. Fortunately this is sufficient to coerce into the transactions matrix from before – phew!
playlist_tr<-as(playlist, "transactions")
playlist_tr
## transactions in sparse format with
## 15000 transactions (rows) and
## 1004 items (columns)
39/41
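From here, the same apriori() call as in the supermarket example can be reused on playlist_tr. A sketch with a tiny self-contained stand-in list, since the real file isn't loaded here; the supp and conf values are illustrative (with 15,000 real users, a support as low as 0.01 still means 150 people):

```r
library(arules)

# Tiny stand-in for `playlist` so the sketch runs on its own;
# the real list has 15,000 users and 1,004 artists.
playlist <- list(c("goldfrapp", "le tigre"),
                 c("goldfrapp", "le tigre", "edguy"),
                 c("goldfrapp", "edguy"),
                 c("le tigre", "edguy"))
playlist_tr <- as(playlist, "transactions")

# supp and conf here suit the tiny example; for the real data you would
# set supp much lower (e.g. 0.01) than the supermarket example's 0.5
music_rules <- apriori(playlist_tr,
                       parameter = list(supp = 0.5, conf = 0.5, minlen = 2))

# Strongest associations first
inspect(sort(music_rules, by = "lift"))
```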
A popular story (from 2012)
A father walks into a local Target store, furious.
Turns out (as the story goes) his teenage daughter had been sent coupons and advertising for maternity and newborn goods.
The store manager apologizes profusely and the man leaves.
A few days later, the father calls the store and explains that his daughter was in fact pregnant.
This was a widely publicized example, but I cannot find actual verification for it.
Still, it is a good cautionary tale of both the benefits and ethical risks of these methods.
40/41
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.