ASSOCIATION RULES: MARKET BASKET ANALYSIS
Applied Analytics: Frameworks and Methods 2

Outline
■ Discuss applications of association rules
■ Conduct market basket analysis
■ Describe mathematical criteria for evaluating rules
■ Explain the importance of domain expertise for interpreting association rules

Association Rules: Market Basket Analysis
■ Market basket analysis is …
■ … a data mining technique that finds combinations of products or services that tend to be purchased together, which marketers can use to provide recommendations, optimize product placement, or develop marketing programs that take advantage of cross-selling.
■ … a technique to see which items go together
■ … widely used by retailers to identify items that are purchased in the same shopping trip.
■ … not limited to retailers. May also be used by insurance companies, banks, etc.

Analytical Technique
■ Unsupervised Learning
■ Based on numeric thresholds (e.g., minimum support and confidence), not statistical tests
■ Involves working with large sparse matrices

ILLUSTRATIONS

One of the most quoted illustrations of market basket analysis is a supermarket chain that found, “male customers that bought diapers often bought beer as well”, so “they put the diapers close to beer coolers, and their sales increased dramatically” (Wikipedia).

Source: Amazon.com webpage for Vishal Lala

Source: Found on http://www.theexaminingroom.com/ in Sept, 2012

SMALL DATA

Data
beer diapers bread
diapers eggs
diapers beer
beer diapers eggs
beer diapers
diapers milk
milk bread
diapers beer milk bread
beer diapers milk

transactions in sparse format with 9 transactions (rows) and 5 items (columns)
most frequent items:
diapers    beer    milk   bread    eggs (Other)
      8       6       4       3       2       0
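The item counts in the summary can be reproduced with a few lines of Python (the slides themselves use R). This is a minimal sketch, and it assumes the lone "bread" row in the listing belongs to the adjacent diapers/beer/milk basket, which is the reading that yields the 9 transactions and the counts reported above. The variable names are illustrative.

```python
from collections import Counter

# Assumption: the stray "bread" row is read as part of a four-item basket,
# which gives the 9 transactions reported in the summary.
transactions = [
    {"beer", "diapers", "bread"},
    {"diapers", "eggs"},
    {"diapers", "beer"},
    {"beer", "diapers", "eggs"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"diapers", "beer", "milk", "bread"},
    {"beer", "diapers", "milk"},
]

# Count how many transactions contain each item.
item_counts = Counter(item for basket in transactions for item in basket)

# Print items from most to least frequent, with their relative frequency.
for item, count in item_counts.most_common():
    print(f"{item:8s} {count}  ({count / len(transactions):.2f})")
```

Dividing each count by the number of transactions gives the relative frequencies that an item frequency plot would display.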

[Figure: Item Frequency Plot]

Many Rules
For k items, the number of candidate item sets is of the order 2^k, so the number of possible rules grows exponentially with k.
We are interested in selecting rules that indicate a strong association (i.e., high lift) and are not rare (i.e., high support).
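To get a feel for how quickly the rule space grows, the sketch below counts candidate rules for a few values of k. It assumes a rule is any pair of non-empty, disjoint item sets (LHS and RHS); a tool that restricts the RHS to a single item would produce fewer. The function name `count_rules` is illustrative.

```python
from math import comb

def count_rules(k: int) -> int:
    """Count rules LHS -> RHS with non-empty, disjoint LHS and RHS drawn from k items."""
    total = 0
    for size in range(2, k + 1):
        # comb(k, size) item sets of this size; each can be split into a
        # non-empty LHS and non-empty RHS in 2**size - 2 ways.
        total += comb(k, size) * (2 ** size - 2)
    return total

for k in (2, 3, 5, 10):
    print(f"k={k:2d}  item sets={2 ** k - 1:5d}  rules={count_rules(k)}")
```

Even the five-item example has far more candidate rules than anyone would read by hand, which is why the support and lift filters matter.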

Measuring Affinity
■ Rule: A → B
■ Support: p(A&B)
– Prevalence of an item set
■ Confidence: p(B|A)
– Predictability of an association rule
■ Lift: p(B|A)/p(B) = p(A&B)/(p(A)*p(B))
– Strength of association. Lift of 1 indicates independence, i.e., no association.
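The two expressions for lift are equal by the definition of conditional probability. Written out step by step:

```latex
\begin{align*}
\text{Lift}(A \rightarrow B)
  &= \frac{p(B \mid A)}{p(B)} \\
  &= \frac{p(A \,\&\, B) / p(A)}{p(B)} \\
  &= \frac{p(A \,\&\, B)}{p(A)\, p(B)}
\end{align*}
```

If A and B are independent, then p(A&B) = p(A)*p(B), so the ratio equals 1.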

Measuring Affinity
Transactions:
A B C
A C D
B C D
A D E
B C E

Rule        Support   Confidence
A → D       2/5       2/3
C → A       2/5       2/4
A → C       2/5       2/3
B&C → D     1/5       1/3
Source: Found on https://themainstreamseer.blogspot.com/2012/09/numeric-measures-for-association-rules.html on Sept 2012
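The support and confidence values in the rule table can be checked directly against the five transactions. The following is a small Python sketch (the slides use R); the helper names `support` and `confidence` are illustrative, not part of any library.

```python
from fractions import Fraction

# The five transactions from the A to E example.
transactions = [set("ABC"), set("ACD"), set("BCD"), set("ADE"), set("BCE")]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))

def confidence(lhs, rhs, transactions):
    """p(RHS | LHS) = support(LHS and RHS) / support(LHS)."""
    return support(lhs | rhs, transactions) / support(lhs, transactions)

rules = [({"A"}, {"D"}), ({"C"}, {"A"}), ({"A"}, {"C"}), ({"B", "C"}, {"D"})]
for lhs, rhs in rules:
    s = support(lhs | rhs, transactions)
    c = confidence(lhs, rhs, transactions)
    print(sorted(lhs), "->", sorted(rhs), "support", s, "confidence", c)
```

`Fraction` reduces automatically, so the confidence of C → A prints as 1/2 rather than 2/4.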

Consider the Rule: Diapers → Beer
Rule Frequency: How often do diapers and beer co-occur? 6
Support, p(Beer & Diapers): What proportion of transactions contain both diapers and beer? 6/9
Confidence, p(Beer|Diapers): Given that diapers occur in a transaction, what is the chance the transaction contains beer? 6/8
Lift = p(Beer|Diapers)/p(Beer)
= p(Beer & Diapers)/(p(Beer)*p(Diapers)) = (6/9)/((6/9)*(8/9)) = 9/8 ≈ 1.13
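The same arithmetic can be done by counting over the transactions. This Python sketch assumes the nine-basket reading of the data, with the lone "bread" row treated as part of the adjacent four-item basket; the beer and diapers counts do not depend on that choice.

```python
from fractions import Fraction

# Assumption: nine baskets, with "bread" folded into a four-item basket.
transactions = [
    {"beer", "diapers", "bread"},
    {"diapers", "eggs"},
    {"diapers", "beer"},
    {"beer", "diapers", "eggs"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"diapers", "beer", "milk", "bread"},
    {"beer", "diapers", "milk"},
]

n = len(transactions)
n_diapers = sum("diapers" in t for t in transactions)
n_beer = sum("beer" in t for t in transactions)
n_both = sum({"diapers", "beer"} <= t for t in transactions)

support = Fraction(n_both, n)             # p(Beer & Diapers)
confidence = Fraction(n_both, n_diapers)  # p(Beer | Diapers)
lift = confidence / Fraction(n_beer, n)   # p(Beer | Diapers) / p(Beer)

print("co-occurrences:", n_both)
print("support:", support, "confidence:", confidence, "lift:", lift)
```

A lift slightly above 1 says that beer is a little more likely in baskets that contain diapers than in baskets overall.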

[Figure: Support and Lift matrices from data]
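Pairwise support and lift matrices of this kind can be built by looping over every pair of items. The sketch below is one way to do it in Python, not the code behind the slide. It assumes the nine-basket reading of the data (the lone "bread" row folded into the adjacent four-item basket), and it stores each matrix as a dictionary keyed by an alphabetically ordered item pair.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: nine baskets, with "bread" folded into a four-item basket.
transactions = [
    {"beer", "diapers", "bread"},
    {"diapers", "eggs"},
    {"diapers", "beer"},
    {"beer", "diapers", "eggs"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"diapers", "beer", "milk", "bread"},
    {"beer", "diapers", "milk"},
]
n = len(transactions)
items = sorted(set().union(*transactions))

def p(*wanted):
    """Proportion of transactions containing all of the given items."""
    return Fraction(sum(set(wanted) <= t for t in transactions), n)

support = {}
lift = {}
for a, b in combinations(items, 2):
    support[(a, b)] = p(a, b)
    lift[(a, b)] = p(a, b) / (p(a) * p(b))

for pair in support:
    print(pair, "support", support[pair], "lift", float(lift[pair]))
```

Both measures are symmetric in the two items, so only one triangle of each matrix needs to be stored.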

Sifting Through Rules
■ Set up rule filtering criteria, e.g., support > 0.2 and confidence > 0.2.
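As an illustration of threshold filtering, the sketch below applies support > 0.2 and confidence > 0.2 to the four rules from the A to E rule table. It is written in plain Python rather than the R tooling the slides use, and the tuple layout is an assumption made for the example.

```python
from fractions import Fraction

# (lhs, rhs, support, confidence), copied from the A to E rule table.
rules = [
    ("A", "D", Fraction(2, 5), Fraction(2, 3)),
    ("C", "A", Fraction(2, 5), Fraction(2, 4)),
    ("A", "C", Fraction(2, 5), Fraction(2, 3)),
    ("B&C", "D", Fraction(1, 5), Fraction(1, 3)),
]

MIN_SUPPORT = Fraction(1, 5)     # 0.2
MIN_CONFIDENCE = Fraction(1, 5)  # 0.2

# Keep only rules that strictly exceed both thresholds.
kept = [r for r in rules if r[2] > MIN_SUPPORT and r[3] > MIN_CONFIDENCE]

for lhs, rhs, s, c in kept:
    print(f"{lhs} -> {rhs}  support={float(s):.2f}  confidence={float(c):.2f}")
```

B&C → D is dropped because its support is exactly 0.2, which does not satisfy a strict "greater than" threshold. Whether the cutoff is strict or inclusive is worth checking in whatever tool is used.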

Sifting Through Rules
■ In some cases, interest may be in a specific association, e.g., all rules with diapers on the left-hand side (LHS).
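A sketch of this kind of targeted filtering: generate every single-item rule from the beer/diapers data, then keep only those with diapers on the LHS. This is plain Python, not the R workflow of the slides. It assumes the nine-basket reading of the data (the lone "bread" row folded into the adjacent four-item basket), and the dictionary keys are illustrative.

```python
from fractions import Fraction
from itertools import permutations

# Assumption: nine baskets, with "bread" folded into a four-item basket.
transactions = [
    {"beer", "diapers", "bread"},
    {"diapers", "eggs"},
    {"diapers", "beer"},
    {"beer", "diapers", "eggs"},
    {"beer", "diapers"},
    {"diapers", "milk"},
    {"milk", "bread"},
    {"diapers", "beer", "milk", "bread"},
    {"beer", "diapers", "milk"},
]
n = len(transactions)
items = sorted(set().union(*transactions))

def count(*wanted):
    """Number of transactions containing all of the given items."""
    return sum(set(wanted) <= t for t in transactions)

# Build every rule lhs -> rhs between two distinct items that co-occur at least once.
rules = []
for lhs, rhs in permutations(items, 2):
    both = count(lhs, rhs)
    if both == 0:
        continue
    rules.append({
        "lhs": lhs,
        "rhs": rhs,
        "support": Fraction(both, n),
        "confidence": Fraction(both, count(lhs)),
        "lift": Fraction(both * n, count(lhs) * count(rhs)),
    })

# Keep only rules with diapers on the LHS, strongest association first.
diaper_rules = sorted(
    (r for r in rules if r["lhs"] == "diapers"),
    key=lambda r: r["lift"],
    reverse=True,
)
for r in diaper_rules:
    print(f"diapers -> {r['rhs']:6s} confidence={r['confidence']}  lift={float(r['lift']):.3f}")
```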

Visualizing Rules
library(arulesViz)
# Replace 'ENTER METHOD' with a plot method, e.g., 'scatterplot', 'grouped', or 'graph'
plot(rules, method = 'ENTER METHOD', measure = c('support', 'confidence'), shading = 'lift', interactive = FALSE, control = list(reorder = TRUE), ...)
Source: arulesViz Vignette

MEDIUM DATA

Medium Data
■ Online retail dataset downloaded from UCI Machine Learning Repository
■ Contains all the transactions occurring between 01/12/2010 and 09/12/2011 (DD/MM/YYYY) for a UK-based and registered non-store online retailer.
■ 19,296 transactions on 7,868 items

More on Market Basket Analysis
■ Market basket analysis generates a large number of rules. It is up to the domain expert to identify the rules that are relevant and meet association thresholds.
■ Rules can be reduced by
– grouping items using domain knowledge. E.g., combining milk, 1% milk, and 2% milk into a single category.
– text analysis. E.g., removing mentions of colors from products. The difference in color between a blue mug and a red mug may not be relevant for an association rule.
– clustering.
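The first two reduction ideas can be sketched as a preprocessing step applied to item names before mining. Everything in this Python example is hypothetical and chosen only to mirror the examples above: the category mapping, the color list, and the three sample baskets are not taken from the retail dataset.

```python
import re

# Hypothetical mapping and color list, mirroring the milk and mug examples.
CATEGORY = {"milk": "milk", "1% milk": "milk", "2% milk": "milk"}
COLORS = ("blue", "red")
COLOR_PATTERN = re.compile(r"\b(" + "|".join(COLORS) + r")\b\s*", re.IGNORECASE)

def normalize(item: str) -> str:
    """Map an item name to its category and strip color words."""
    item = item.strip().lower()
    item = CATEGORY.get(item, item)             # group variants by domain knowledge
    item = COLOR_PATTERN.sub("", item).strip()  # drop color mentions
    return item

# Hypothetical baskets, used only to show the effect of normalization.
baskets = [["1% milk", "blue mug"], ["2% milk", "red mug"], ["milk", "bread"]]
normalized = [sorted({normalize(i) for i in basket}) for basket in baskets]

for before, after in zip(baskets, normalized):
    print(before, "->", after)
```

After normalization the first two baskets become identical, so a milk/mug association that was split across four distinct item names is counted as one.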

Summary
■ In this module, we
– Discussed applications of association rules
– Conducted market basket analysis
– Described mathematical criteria for evaluating rules
– Explained the importance of domain expertise for interpreting association rules