COMP9318 Tutorial 4: Association Rule Mining
Wei Wang @ UNSW
Q1 I
Show that if A → B does not meet the minconf constraint, then A → BC does not
either.
Solution to Q1 I
\[
\mathrm{conf}(A \to BC) = \frac{\mathrm{supp}(ABC)}{\mathrm{supp}(A)} \le \frac{\mathrm{supp}(AB)}{\mathrm{supp}(A)} = \mathrm{conf}(A \to B),
\]
since supp(ABC) ≤ supp(AB) by the anti-monotonicity of support.
Like the Apriori property for itemsets, this gives us a pruning rule when generating association rules: once A → B fails minconf, no rule obtained by moving additional items into the consequent needs to be checked.
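To illustrate, below is a minimal Python sketch of this pruning (my own illustration, not part of the tutorial); `supp` is a hypothetical dictionary mapping frozensets of items to support counts, and consequents are grown Apriori-style so that only consequents of confident rules are ever extended.

```python
def confident_rules(itemset, supp, minconf):
    """Return rules (antecedent, consequent, confidence) from one frequent
    itemset, pruning with: if A -> B fails minconf, so does A -> BC."""
    items = frozenset(itemset)
    rules = []
    consequents = [frozenset([i]) for i in items]      # size-1 consequents
    while consequents:
        confident = []
        for cons in consequents:
            ante = items - cons
            if not ante:                               # no empty antecedents
                continue
            conf = supp[items] / supp[ante]
            if conf >= minconf:
                rules.append((ante, cons, conf))
                confident.append(cons)                 # only these are extended
        # grow consequents by one item, Apriori-style, from confident ones only
        consequents = list({a | b for a in confident for b in confident
                            if len(a | b) == len(a) + 1})
    return rules
```

For instance, with the support counts later derived in Q2 (e.g. supp[frozenset('BCH')] = 2), confident_rules(frozenset('BCH'), supp, 0.7) would return the five rules over BCH found there.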
Q2 I
Given the following transactional database:

TID  Items
1    C, B, H
2    B, F, S
3    A, F, G
4    C, B, H
5    B, F, G
6    B, E, O
1. We want to mine all the frequent itemsets in the data using the Apriori
algorithm. Assume the minimum support is 30%. (You need to give the
set of frequent itemsets in L1, L2, . . . , candidate itemsets in C2, C3, . . . ).
2. Find all the association rules that involve only B, C, and H (on either the
left- or right-hand side of the rule). The minimum confidence is 70%.
Solution to Q2 I
1. Apriori
1.1 minsup = 30% × 6 = 1.8. Since support counts are integers, a frequent
itemset must appear in at least 2 transactions.
1.2 C1 = {A, B, C, E, F, G, H, O, S}. Scan the DB and collect the supports:

Item     A  B  C  E  F  G  H  O  S
Support  1  5  2  1  3  2  2  1  1

Therefore, L1 = {B, C, F, G, H}.
1.3 C2 is generated from L1 by enumerating all pairs:
{BC, BF, BG, BH, CF, CG, CH, FG, FH, GH}. Scan the DB and collect the
supports (you may want to sort the items in each transaction and remove
non-frequent items from the DB first):

Itemset  BC  BF  BG  BH  CF  CG  CH  FG  FH  GH
Support   2   2   1   2   0   0   2   2   0   0

Therefore, L2 = {BC, BF, BH, CH, FG}.
1.4 C3 is generated from L2 by the join-and-prune procedure: join pairs of
itemsets in L2 that share all but their last item, then prune any candidate
with an infrequent 2-subset. The result is {BCH}. Scan the DB and collect
the support:

Itemset  BCH
Support    2

Therefore, L3 = {BCH}.
1.5 C4 will be the empty set, therefore we stop here.
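For reference, here is a self-contained Python sketch of the above computation (an assumed implementation, not the course code); running it reproduces L1, C2, L2, C3 and L3 for this database.

```python
from itertools import combinations

db = [{'C', 'B', 'H'}, {'B', 'F', 'S'}, {'A', 'F', 'G'},
      {'C', 'B', 'H'}, {'B', 'F', 'G'}, {'B', 'E', 'O'}]
minsup = 2  # 30% of 6 transactions, rounded up to an integer count

def support(cands):
    """Count how many transactions contain each candidate itemset."""
    return {c: sum(1 for t in db if c <= t) for c in cands}

def fmt(itemsets):
    return sorted(''.join(sorted(s)) for s in itemsets)

C = {frozenset([i]) for t in db for i in t}          # C1: all single items
k = 1
while C:
    L = {c for c, s in support(C).items() if s >= minsup}
    if not L:
        break
    print(f"L{k}:", fmt(L))
    # join step: union pairs of Lk that differ in exactly one item
    C = {a | b for a, b in combinations(L, 2) if len(a | b) == k + 1}
    # prune step: drop candidates with an infrequent k-subset
    C = {c for c in C if all(frozenset(s) in L for s in combinations(c, k))}
    if C:
        print(f"C{k + 1}:", fmt(C))
    k += 1
```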
2. We list the frequent itemsets related to B, C, and H below:
Solution to Q2 II
Itemset  B  C  H  BC  BH  CH  BCH
Support  5  2  2   2   2   2    2
2.1 For BC, we need to consider the candidate rules B → C and C → B. The
former has confidence $\frac{\mathrm{supp}(BC)}{\mathrm{supp}(B)} = \frac{2}{5} = 40\%$ and does not meet the minconf
requirement. The latter has confidence $\frac{\mathrm{supp}(BC)}{\mathrm{supp}(C)} = \frac{2}{2} = 100\%$ and is
qualified.
2.2 Any rule of the form B → . . . cannot meet the minconf requirement on this
dataset: supp(B) = 5, while every strict superset of {B} has support at
most 2, so the confidence is at most 2/5 = 40% < 70%. Repeating the above
procedure for the remaining itemsets yields the following rules
(double-checked programmatically in the sketch below):
▶ H → B (100%)
▶ C → H (100%)
▶ H → C (100%)
▶ BC → H (100%)
▶ BH → C (100%)
▶ CH → B (100%)
▶ C → BH (100%)
▶ H → BC (100%)
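The rule list can be verified with a few lines of Python (supports hard-coded from the table above; the `key` helper is just for canonical item ordering):

```python
from itertools import combinations

supp = {'B': 5, 'C': 2, 'H': 2, 'BC': 2, 'BH': 2, 'CH': 2, 'BCH': 2}

def key(s):                      # canonical key: items sorted alphabetically
    return ''.join(sorted(s))

for itemset in ['BC', 'BH', 'CH', 'BCH']:
    for r in range(1, len(itemset)):
        for ante in combinations(itemset, r):
            cons = set(itemset) - set(ante)
            conf = supp[key(itemset)] / supp[key(ante)]
            if conf >= 0.7:
                print(key(ante), '->', key(cons), f'({conf:.0%})')
```

This prints the nine confident rules: C → B from step 2.1 plus the eight listed above.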
Q3 I
Compute the frequent itemsets for the data in Q2 using the FP-growth
algorithm.
Solution to Q3 I
1. Similar to the first step in Apriori, count the support of all items and
normalize the original transaction DB by removing non-frequent items and
sorting the remaining items in decreasing order of support:

Item (in order)  B  F  C  G  H
Support          5  3  2  2  2

TID  Items
1    B, C, H
2    B, F
3    F, G
4    B, C, H
5    B, F, G
6    B

We can already output all frequent 1-itemsets: B, C, F, G, H.
2. Construct the FP-tree (with its header table):

Header table:
B : 5
F : 3
C : 2
G : 2
H : 2

FP-tree:
∅
├── B:5
│   ├── C:2
│   │   └── H:2
│   └── F:2
│       └── G:1
└── F:1
    └── G:1
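As an aside, here is a compact Python sketch of how such a tree is built (assumed data structures, not the lecture code): each normalized transaction is inserted along a shared prefix path, incrementing counts as it goes.

```python
class Node:
    """One FP-tree node: an item, its count, a parent link and children."""
    def __init__(self, item, parent=None):
        self.item, self.count, self.parent = item, 0, parent
        self.children = {}

order = {'B': 0, 'F': 1, 'C': 2, 'G': 3, 'H': 4}   # decreasing support
norm_db = [['B', 'C', 'H'], ['B', 'F'], ['F', 'G'],
           ['B', 'C', 'H'], ['B', 'F', 'G'], ['B']]

root = Node(None)
header = {}                # header table: item -> list of nodes with that item
for trans in norm_db:
    node = root
    for item in sorted(trans, key=order.get):
        if item not in node.children:
            node.children[item] = Node(item, node)
            header.setdefault(item, []).append(node.children[item])
        node = node.children[item]
        node.count += 1
```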
Solution to Q3 II
3. H’s conditional pattern base is:
BC : 2
All of the items in it are frequent, and thus we can output BH and CH.
Construct the H-conditional FP-tree as:

Header table:
B : 2
C : 2

H-conditional FP-tree:
∅
└── B:2
    └── C:2

Since it is a single-path tree, we directly output the remaining combination
of its items together with H: BCH.
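With the header table from the sketch above, a conditional pattern base can be obtained by walking up from every node of the item (again a hypothetical helper, not the lecture code):

```python
def conditional_pattern_base(item):
    """Collect (prefix path, count) pairs for every occurrence of `item`."""
    base = []
    for node in header.get(item, []):
        path, p = [], node.parent
        while p is not None and p.item is not None:    # stop at the root
            path.append(p.item)
            p = p.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

print(conditional_pattern_base('H'))    # [(['B', 'C'], 2)]
```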
4. We backtrack and can now safely remove all H nodes from the initial
FP-tree, as shown below.
Solution to Q3 III
Header table:
B : 5
F : 3
C : 2
G : 2

FP-tree (H nodes removed):
∅
├── B:5
│   ├── C:2
│   └── F:2
│       └── G:1
└── F:1
    └── G:1
We now find G’s conditional pattern base as:
BF : 1
F : 1
Only F is frequent (count 1 + 1 = 2). We output FG, and it is clear that we
can stop this branch.
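Note that with the hypothetical helper above, G’s base (and likewise the bases used in the next two steps) can be read off the original tree directly, since prefix paths never contain less frequent items such as H:

```python
print(conditional_pattern_base('G'))  # [(['F'], 1), (['B', 'F'], 1)]
print(conditional_pattern_base('C'))  # [(['B'], 2)]
print(conditional_pattern_base('F'))  # [(['B'], 2)]  (root-level F:1 has an empty prefix)
```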
5. We backtrack and can now safely remove all G nodes from the FP-tree,
and then process C’s conditional pattern base:
B : 2
B is frequent, output BC, and we can stop here.
6. We backtrack and can now safely remove all C nodes from the FP-tree,
and then process F’s conditional pattern base:
Solution to Q3 IV
B : 2
B is frequent, output BF, and we can stop here.
7. Since we are left with only one item (B), which was already output as a
frequent 1-itemset, we can stop the whole mining process.
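Finally, steps 3–7 can be condensed into a short recursive sketch of FP-growth (an assumed pattern-base formulation rather than the pointer-based tree above); calling it on the normalized DB with minsup = 2 reproduces all eleven frequent itemsets found in this tutorial.

```python
from collections import defaultdict

def fp_growth(pattern_base, minsup, suffix=()):
    """pattern_base: list of (prefix path, count) pairs, items in support
    order. Yields (frequent itemset, support) pairs."""
    counts = defaultdict(int)
    for path, cnt in pattern_base:
        for item in path:
            counts[item] += cnt
    frequent = {i for i, c in counts.items() if c >= minsup}
    for item in frequent:
        new_suffix = (item,) + suffix
        yield new_suffix, counts[item]
        # build item's conditional pattern base and recurse on it
        cond = []
        for path, cnt in pattern_base:
            if item in path:
                prefix = [i for i in path[:path.index(item)] if i in frequent]
                if prefix:
                    cond.append((prefix, cnt))
        yield from fp_growth(cond, minsup, new_suffix)

norm_db = [['B', 'C', 'H'], ['B', 'F'], ['F', 'G'],
           ['B', 'C', 'H'], ['B', 'F', 'G'], ['B']]
for itemset, supp in fp_growth([(t, 1) for t in norm_db], 2):
    print(''.join(sorted(itemset)), supp)
```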