IT enabled Business Intelligence, CRM, Database Applications
Sep-18
Association Rules
Market Basket Analysis
Prof. Vibs Abhishek
The Paul Merage School of Business
University of California, Irvine
BANA 273 Session 7
1
Agenda
Assignment due dates on Canvas
Please work on your projects
Gather Data
Refer to project guidelines
Market Basket Analysis
Generating and identifying good Association Rules
2
Weka Memory
To increase Java virtual memory for WEKA
On your computer go to “Start”
In “Search Program and Files” type: “cmd”
Use “cd …” to change directory to Weka folder (under Program Files)
On command prompt type:
“Java –Xmx512m –jar weka.jar”
3
Why mine association rules?
The goal may be fuzzy or unstructured
More than one class variable
Interesting patterns (previously unknown) may emerge that can be used within a business
4
Transaction data set:
Shopper ID Date and Time of transaction Items included in the transaction Store ID Trans ID
Shopper111 09/10/2010, 12:09:01pm Milk, egg, bread Store123 Tran321
Shopper Information:
Shopper ID Address Most purchased category Recency
(days) Frequency
Total $ (year)
Shopper111 95616 food 12 4/month $4000
Census Data:
Zip Code Median family Income Median house value Median age Population Population density
95616 70,000 500,000 25 60,000 5700/mile2
What can we do with this data?
A retailer (e.g. Target) has the following data sources:
1. Shopping transactions, 2. Shopper information, 3. Census data with information for each zip code
5
Market Basket Analysis (MBA)
MBA in retail setting
Find out what items are bought together
Cross-selling
Optimize shelf layout
Product bundling
Timing promotions
Discount planning
Product selection under limited space
Targeted advertisement, Personalized coupons, item recommendations
Usage beyond Market Basket
Medical (associated symptoms)
6
Association Rules
Rule format:
If {set of items} Then {set of items}
LHS implies RHS
If {Diaper,
Baby Food}
{Beer, Wine}
Then
LHS
RHS
An association rule is valid if it satisfies some evaluation measures
7
Association Rule Discovery: Definition
Given a set of records each of which contain some number of items from a given collection;
Produce dependency rules which will predict occurrence of an item based on occurrences of other items.
Rules Discovered:
{Milk} –> {Coke}
{Diaper, Milk} –> {Beer}
8
Association Rule Discovery: Application 1
Marketing and Sales Promotion:
Let the rule discovered be
{Bagels, … } –> {Potato Chips}
Potato Chips as consequent => Can be used to determine what should be done to boost its sales.
Bagels in the antecedent => Can be used to see which products would be affected if the store discontinues selling bagels.
Bagels in antecedent and Potato chips in consequent => Can be used to see what products should be sold with Bagels to promote sale of Potato chips.
9
Association Rule Discovery: Application 2
Supermarket shelf management.
Goal: To identify items that are bought together by sufficiently many customers.
Approach: Process the point-of-sale data collected with barcode scanners to find dependencies among items.
A classic rule —
If a customer buys diaper and milk, then he is very likely to buy beer.
So, don’t be surprised if you find six-packs stacked next to diapers.
10
Association Rule Discovery: Application 3
Inventory Management:
Goal: A consumer appliance repair company wants to anticipate the nature of repairs on its consumer products and keep the service vehicles equipped with right parts to reduce on number of visits to consumer households.
Approach: Process the data on tools and parts required in previous repairs at different consumer locations and discover the co-occurrence patterns.
11
Association Rule Mining
Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction
Market-Basket transactions
Example of Association Rules
{Diaper} {Beer},
{Milk, Bread} {Eggs,Coke},
{Beer, Bread} {Milk},
Implication means co-occurrence, not causality!
12
Definition: Frequent Itemset
Itemset
A collection of one or more items
Example: {Milk, Bread, Diaper}
k-itemset
An itemset that contains k items
Support count ()
Frequency of occurrence of an itemset
E.g. ({Milk, Bread,Diaper}) = 2
Support
Fraction of transactions that contain an itemset
E.g. s({Milk, Bread, Diaper}) = 2/5
Frequent Itemset
An itemset whose support is greater than or equal to a minsup threshold
13
Example:
Rule Evaluation Metrics
Support (s)
Fraction of transactions that contain both X and Y
Confidence (c)
Measures how often items in Y
appear in transactions that
contain X
Support =
No. of transactions containing items in LHS and RHS
Total No. of transactions in the dataset
Confidence =
No. of transactions containing both LHS and RHS
No. of transactions containing LHS
14
Rule Evaluation – Lift
Transaction No. Item 1 Item 2 Item 3 Item 4 …
100 Beer Diaper Chocolate
101 Milk Chocolate Shampoo
102 Beer Milk Vodka Chocolate
103 Beer Milk Diaper Chocolate
104 Milk Diaper Beer
What’s the support and confidence for rule {Chocolate}{Milk}?
Support = 3/5
Confidence = 3/4
Very high support and confidence.
Is Chocolate a good predictor of Milk purchase?
No! Because Milk occurs in 4 out of 5 transactions. Chocolate is even
decreasing the chance of Milk purchase
3/4 < 4/5, i.e. P(Milk|Chocolate)
Expensive since M = 2d !!!
Optional
24
Computational Complexity
Given d unique items:
Total number of itemsets = 2d
Total number of possible association rules:
If d=6, R = 602 rules
Optional
25
Identifying Association Rules
Two-step approach:
Frequent Itemset Generation
Generate all itemsets whose support minsup
Rule Generation
Generate high confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still computationally expensive
26
If {diaper, beer} is frequent then {diaper} and {beer} are each frequent as well
This means that…
If an itemset is not frequent (e.g., {wine}) then no itemset that includes wine can be frequent either, such as {wine, beer} .
We therefore first find all itemsets of size 1 that are frequent.
Then try to “expand” these by counting the frequency of all itemsets of size 2 that include frequent itemsets of size 1.
Example:
If {wine} is not frequent we need not try to find out whether {wine, beer} is frequent. But if both {wine} & {beer} were frequent then it is possible (though not guaranteed) that {wine, beer} is also frequent.
Then take only itemsets of size 2 that are frequent, and try to expand those, etc.
Phase 1: Finding all frequent itemsets
How to perform an efficient search of all frequent itemsets?
27
Assume {Milk, Bread, Butter} is a frequent itemset.
Using items contained in the itemset, list all possible rules
{Milk} {Bread, Butter}
{Bread} {Milk, Butter}
{Butter} {Milk, Bread}
{Milk, Bread} {Butter}
{Milk, Butter} {Bread}
{Bread, Butter} {Milk}
Calculate the confidence of each rule
Pick the rules with confidence above the minimum confidence
Support {Milk, Bread, Butter}
Support {Milk}
No. of transaction that support {Milk, Bread, Butter}
No. of transaction that support {Milk}
=
Phase 2: Generating Association Rules
Confidence of {Milk} {Bread, Butter}:
28
Algorithm Apriori-Gen to Generate Frequent Itemsets (Agrawal and Srikant 1994)
Input: two itemsets, I and J, of size (k-1)
Output: a supported itemset, L of size k
L=Null
IF (the first (k-2) items of I and J match)
{ copy all items of I into L;
copy the last item of J into L;
FOR (every subset l ⊂ L of size (k-1))
IF (l is not supported) discard L and exit;
Calculate, from data, support(L);
IF (support(L)< target) discard L and exit;
return L;
}
ELSE exit;
29
Agrawal (94)’s Apriori Algorithm—An Example
Transactions
T-ID Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
An itemset must have been purchased at least twice in order to be considered frequent or supported
Agrawal (94)’s Apriori Algorithm—An Example
Transactions
1st scan
C1
L1
L2
C2
C2
2nd scan
C3
L3
3rd scan
T-ID Items
10 A, C, D
20 B, C, E
30 A, B, C, E
40 B, E
Itemset sup
{A} 2
{B} 3
{C} 3
{D} 1
{E} 3
Itemset sup
{A} 2
{B} 3
{C} 3
{E} 3
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}
Itemset sup
{A, B} 1
{A, C} 2
{A, E} 1
{B, C} 2
{B, E} 3
{C, E} 2
Itemset sup
{A, C} 2
{B, C} 2
{B, E} 3
{C, E} 2
Itemset
{B, C, E}
Itemset sup
{B, C, E} 2
{A,B,C}, {A, C, E}?
31
Exercise 5
Given the above list of transactions, do the following:
1) Find all the frequent itemsets (minimum support 40%)
2) Find all the association rules (minimum confidence 70%)
3) For the discovered association rules, calculate the lift
Transaction No. Item 1 Item 2 Item 3 Item 4
100 Beer Diaper Chocolate
101 Milk Chocolate Shampoo
102 Beer Soap Vodka
103 Beer Cheese Wine
104 Milk Diaper Beer Chocolate
32
Given table below, the confidence of the rule
Bread Coke is
A: 1/4
B: 2/4
C: 3/4
D: 4/4
E: None of the above
33
What is the objective of Apriori?
A: Identify good association rules
B: Identify all frequent itemsets
C: Identify all rules from frequent itemsets
D: Determine the computational complexity of finding association rules
E: None of the above
34
Other Applications of Association Rules
Recommendations: Determines which books are frequently purchased together and recommends associated books or products to people who express interest in an item.
Healthcare: Studying the side-effects in patients with multiple prescriptions, we can discover previously unknown interactions and warn patients about them.
Fraud detection: Finding in insurance data that a certain doctor often works with a certain lawyer may indicate potential fraudulent activity. (virtual items)
Sequence Discovery: looks for associations between items bought over time. E.g., we may notice that people who buy chili tend to buy antacid within a month. Knowledge like this can be used to plan inventory levels.
35
WEKA
Find association rules
Apriori
http://facweb.cs.depaul.edu/mobasher/classes/ect584/WEKA/associate.html
36
Nest Week
Clustering using K-Means
TID
Items
1
Bread, Coke, Milk
2
Beer, Bread
3
Beer, Coke, Diaper, Milk
4
Beer, Bread, Diaper, Milk
5
Coke, Diaper, Milk
TID
Items
1
Bread, Coke, Milk
2
Beer, Bread
3
Beer, Coke, Diaper, Milk
4
Beer, Bread, Diaper, Milk
5
Coke, Diaper, Milk
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
TID
Items
1
Bread, Milk
2
Bread, Diaper, Beer, Eggs
3
Milk, Diaper, Beer, Coke
4
Bread, Milk, Diaper, Beer
5
Bread, Milk, Diaper, Coke
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
TID
Items
1
Bread, Milk
2
Bread, Diaper, Beer, Eggs
3
Milk, Diaper, Beer, Coke
4
Bread, Milk, Diaper, Beer
5
Bread, Milk, Diaper, Coke
Beer
}
Diaper
,
Milk
{
Þ
4
.
0
5
2
|
T
|
)
Beer
Diaper,
,
Milk
(
=
=
=
s
s
67
.
0
3
2
)
Diaper
,
Milk
(
)
Beer
Diaper,
Milk,
(
=
=
=
s
s
c
TID
Items
1
Bread, Milk
2
Bread, Diaper, Beer, Eggs
3
Milk, Diaper, Beer, Coke
4
Bread, Milk, Diaper, Beer
5
Bread, Milk, Diaper, Coke
TID
Items
1
Bread, Milk
2
Bread, Diaper, Beer, Eggs
3
Milk, Diaper, Beer, Coke
4
Bread, Milk, Diaper, Beer
5
Bread, Milk, Diaper, Coke
null
AB
AC
AD
AE
BC
BD
BE
CD
CE
DE
A
B
C
D
E
ABC
ABD
ABE
ACD
ACE
ADE
BCD
BCE
BDE
CDE
ABCD
ABCE
ABDE
ACDE
BCDE
ABCDE
TID Items
1 Bread, Milk
2 Bread, Diaper, Beer, Eggs
3 Milk, Diaper, Beer, Coke
4 Bread, Milk, Diaper, Beer
5 Bread, Milk, Diaper, Coke
N
Transactions
List of
Candidates
M
w
N�
w�
M�
List of Candidates�
1
2
3
1
1
1
1
+
-
=
ú
û
ù
ê
ë
é
÷
ø
ö
ç
è
æ
-
´
÷
ø
ö
ç
è
æ
=
+
-
=
-
=
å
å
d
d
d
k
k
d
j
j
k
d
k
d
R
TID
Items
1
Bread, Milk
2
Bread, Diaper, Beer, Eggs
3
Milk, Diaper, Beer, Coke
4
Bread, Milk, Diaper, Beer
5
Bread, Milk, Diaper, Coke
/docProps/thumbnail.jpeg