MS6711 Data Mining
Exercise 6

• Consider the rule “if milk, then alcohol” and the following information on the number of transactions.

            Alcohol   No Alcohol   Total
Milk        90        58           148
No Milk     220       418          638
Total       310       476          786

• Find the support of the above rule. Interpret this number precisely in the context of the application.
• Find the confidence of the above rule. Interpret this number precisely in the context of the application.
• Find the lift of the above rule. Interpret this number precisely in the context of the application.
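
As a check on the hand calculation, the three measures for the rule "if milk, then alcohol" can be computed directly from the table counts. The following is a minimal Python sketch; the variable names are chosen here for illustration and are not part of the exercise.

```python
# Counts taken from the contingency table above.
n_total = 786      # all transactions
n_milk = 148       # transactions containing milk
n_alcohol = 310    # transactions containing alcohol
n_both = 90        # transactions containing both milk and alcohol

support = n_both / n_total                  # P(milk and alcohol)
confidence = n_both / n_milk                # P(alcohol | milk)
lift = confidence / (n_alcohol / n_total)   # P(alcohol | milk) / P(alcohol)

print(f"support    = {support:.4f}")
print(f"confidence = {confidence:.4f}")
print(f"lift       = {lift:.4f}")
```

The interpretation of each number in the context of the application still has to be written out in words; the code only confirms the arithmetic.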

• Consider the following market basket data:
Basket ID   Item   Basket ID   Item
1           A      5           B
1           B      5           C
1           C      5           D
2           B      5           D
2           C      6           A
2           C      6           B
3           A      6           C
3           D      7           B
4           A      7           D
4           B      8           B
4           D      8           C

Use the Apriori algorithm to derive all frequent k-itemsets, using a minimum support count of 3.
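
The level-wise search can be sketched in Python as follows. This is an illustrative sketch, not the SAS EM procedure. It assumes that each basket is treated as a set, so an item listed twice in the same basket counts once, and that "support count equal to 3" means a minimum support count of 3. The function name `apriori` and its parameters are chosen here for illustration.

```python
from itertools import combinations

# Basket contents from the table above. Repeated items within a basket
# collapse because each basket is stored as a set (an assumption).
baskets = {
    1: {"A", "B", "C"},
    2: {"B", "C"},
    3: {"A", "D"},
    4: {"A", "B", "D"},
    5: {"B", "C", "D"},
    6: {"A", "B", "C"},
    7: {"B", "D"},
    8: {"B", "C"},
}

def apriori(transactions, min_count):
    """Return {itemset: support count} for all frequent itemsets."""
    def count(candidates):
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_count}

    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 2
    while level:
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent

frequent = apriori(list(baskets.values()), min_count=3)
for itemset, n in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

Printing the candidate sets before and after the prune step at each level is a convenient way to compare the run with a hand-worked derivation.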

• Describe how product hierarchy may be used in market basket analysis.

• Consider the following transactions:

Transaction   Items
T1            Bread, Jelly, Peanut Butter
T2            Bread, Peanut Butter
T3            Bread, Milk, Peanut Butter
T5            Beer, Bread
T6            Beer, Milk
(a) Find all frequent itemsets using the Apriori algorithm with minimum support = 20% and minimum confidence = 40%.
(b) Indicate the association rules that will be generated from each frequent itemset found in (a).
(c) Compute the support, confidence, and lift values for each rule generated in (b).
(d) Use SAS EM to generate the same set of association rules and compare your calculations with those given by EM. Are they the same? If not, find out why they differ.
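
For cross-checking the hand calculations before comparing with SAS EM, the rule measures can be sketched in Python. This sketch assumes that 20% and 40% are minimum thresholds and that lift is confidence divided by the support of the consequent. The helper names `support` and `rules_from` are illustrative, and brute-force enumeration stands in for Apriori because the data set is tiny.

```python
from itertools import combinations

# Transactions as listed in the table above (IDs kept as given).
transactions = {
    "T1": {"Bread", "Jelly", "Peanut Butter"},
    "T2": {"Bread", "Peanut Butter"},
    "T3": {"Bread", "Milk", "Peanut Butter"},
    "T5": {"Beer", "Bread"},
    "T6": {"Beer", "Milk"},
}
MIN_SUPPORT = 0.20      # assumed to be a minimum threshold
MIN_CONFIDENCE = 0.40   # assumed to be a minimum threshold

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(set(itemset) <= t for t in transactions.values())
    return hits / len(transactions)

def rules_from(itemset):
    """Yield (antecedent, consequent, support, confidence, lift) for one itemset."""
    items = sorted(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(items, r):
            consequent = tuple(i for i in items if i not in antecedent)
            supp = support(items)
            conf = supp / support(antecedent)
            lift = conf / support(consequent)
            if conf >= MIN_CONFIDENCE:
                yield antecedent, consequent, supp, conf, lift

# Brute-force enumeration of every itemset that meets the support threshold.
all_items = sorted(set().union(*transactions.values()))
frequent = [
    c for k in range(1, len(all_items) + 1)
    for c in combinations(all_items, k)
    if support(c) >= MIN_SUPPORT
]

for itemset in frequent:
    for a, c, supp, conf, lift in rules_from(itemset):
        print(f"{a} => {c}: support={supp:.2f}, confidence={conf:.2f}, lift={lift:.2f}")
```

If the SAS EM output differs, the tool's own definitions of support and its default limits on rule length are worth checking first; this sketch makes no claim about how EM computes them.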

• Consider the following data of responses to various promotions:

• Use the above data to give confidence and support values for the following association rule:
IF Sex = Male & Magazine Promotion = Yes THEN Life Insurance Promotion = Yes
• For this data set, is the rule useful in predicting the response to the Life Insurance Promotion?

• Consider the following data sequence: < {1,3} {2} {2,3} {4}>
(a) List all the 4-item subsequences contained in the data sequence, assuming that no timing constraint is imposed.
(b) List all the 3-element subsequences contained in the data sequence from part (a), assuming that no timing constraint is imposed.
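
A hand-made list can be checked by brute-force enumeration. The Python sketch below assumes the usual definition: a subsequence is obtained by deleting items from elements and dropping any element that becomes empty, with the order of the remaining elements preserved. Identical results reached by different deletions are counted once. The helper names are illustrative.

```python
from itertools import combinations, product

data_sequence = [{1, 3}, {2}, {2, 3}, {4}]

def subsets(element):
    """All subsets of one element, including the empty set."""
    items = sorted(element)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def subsequences(sequence):
    """Distinct non-empty subsequences, each a tuple of frozensets."""
    found = set()
    for choice in product(*(subsets(e) for e in sequence)):
        kept = tuple(s for s in choice if s)   # drop elements left empty
        if kept:
            found.add(kept)
    return found

def show(seq):
    """Format a subsequence in the <{..} {..}> notation used above."""
    return "<" + " ".join("{" + ",".join(map(str, sorted(e))) + "}" for e in seq) + ">"

all_subs = subsequences(data_sequence)
four_item = sorted(show(s) for s in all_subs if sum(len(e) for e in s) == 4)
three_element = sorted(show(s) for s in all_subs if len(s) == 3)

print(len(four_item), four_item)
print(len(three_element), three_element)
```

Note the difference between the two filters: the first counts items across all elements, while the second counts elements regardless of how many items each contains.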

• Consider the following sequential database with maximum span = and window size = 0:
Object   Timestamp   Events
A        10          2,3,5
A        20          6,1
A        23          1
B        11          2,5,6
B        17          2
B        21          4,7,1,2
C        14          2,6
C        28          1,7,3

• How many data sequences are there in the database?
• How many elements are there in each data sequence?
• List all of the data sequences for each object.
• List all 2-element subsequences contained in the data sequence for object A.
• List all 5-item subsequences contained in the data sequence for object A.
• Compute the support for the sequences <{2,6}> and <{2} {6}>.
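
The grouping of rows into data sequences and the support calculation can be sketched in Python. Because the maximum span value is not given above, this sketch ignores the maximum span entirely (an assumption) and uses window size 0, so each pattern element must be contained in a single event set. The names `sequences`, `contains`, and `support` are illustrative.

```python
from collections import defaultdict

# (object, timestamp, events) rows from the table above.
rows = [
    ("A", 10, {2, 3, 5}), ("A", 20, {6, 1}), ("A", 23, {1}),
    ("B", 11, {2, 5, 6}), ("B", 17, {2}), ("B", 21, {4, 7, 1, 2}),
    ("C", 14, {2, 6}), ("C", 28, {1, 7, 3}),
]

# One data sequence per object: its event sets ordered by timestamp.
sequences = defaultdict(list)
for obj, _, events in sorted(rows, key=lambda r: (r[0], r[1])):
    sequences[obj].append(events)

def contains(data_sequence, pattern):
    """True if pattern is a subsequence, ignoring timing constraints.

    Greedy matching: each pattern element is matched to the earliest
    remaining element of the data sequence that includes it.
    """
    position = 0
    for wanted in pattern:
        while position < len(data_sequence) and not wanted <= data_sequence[position]:
            position += 1
        if position == len(data_sequence):
            return False
        position += 1
    return True

def support(pattern):
    """Return (number of data sequences containing pattern, total sequences)."""
    hits = sum(contains(seq, pattern) for seq in sequences.values())
    return hits, len(sequences)

print(support([{2, 6}]))     # pattern <{2,6}>
print(support([{2}, {6}]))   # pattern <{2} {6}>
```

A finite maximum span could remove some matches, so the result for <{2} {6}> should be rechecked against the timestamps once the intended span value is known.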

• Consider the following market basket data:
Customer ID   Timestamp   Item     Customer ID   Timestamp   Item
1             1           A,B      5             1           B,C,D
1             2           B,C      5             2           B,C
1             3           D        5             4           C
2             1           B        5             6           D
2             3           C        6             1           B
2             5           D        6             2           C
3             1           A,D,B    6             3           D
3             2           B        7             1           B,C
4             1           B        7             4           D
4             2           C,D      7             5           C
4             3           D        7             8           D

(a) Calculate the support for the following rule: <{B,C}> => <{D}>, assuming that no timing constraints are imposed.
(b) List all valid 2-element subsequences for the customer with ID = 5 when the maximum span is set to 3 time units.
(c) List all valid 2-element subsequences for the customer with ID = 5 when the window size is set to 2 time units.
(d) Repeat part (a) but subject to a maximum span of 3 time units.
(e) Repeat part (a) but subject to a window size of 2 time units.
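
The effect of the two timing constraints on the support count can be explored with the Python sketch below. It rests on several assumptions that should be checked against the course notes: the support of the rule <{B,C}> => <{D}> is taken to be the support of the sequence <{B,C} {D}>; the maximum span is the difference between the last and first timestamps used by a match; the window size lets events no more than that many time units apart be merged into one element; and each element must start strictly after the previous one ends. The function names are illustrative.

```python
# (timestamp, items) events per customer, from the table above.
customers = {
    1: [(1, {"A", "B"}), (2, {"B", "C"}), (3, {"D"})],
    2: [(1, {"B"}), (3, {"C"}), (5, {"D"})],
    3: [(1, {"A", "D", "B"}), (2, {"B"})],
    4: [(1, {"B"}), (2, {"C", "D"}), (3, {"D"})],
    5: [(1, {"B", "C", "D"}), (2, {"B", "C"}), (4, {"C"}), (6, {"D"})],
    6: [(1, {"B"}), (2, {"C"}), (3, {"D"})],
    7: [(1, {"B", "C"}), (4, {"D"}), (5, {"C"}), (8, {"D"})],
}

def contains(events, pattern, max_span=None, window=0):
    """True if the time-ordered events contain pattern under the constraints."""
    def search(k, prev_end, first_start):
        if k == len(pattern):
            return True
        for s in range(len(events)):
            start = events[s][0]
            if prev_end is not None and start <= prev_end:
                continue   # element must begin after the previous one ends
            origin = start if first_start is None else first_start
            merged = set()
            for e in range(s, len(events)):
                end = events[e][0]
                if end - start > window:
                    break  # merged events exceed the window size
                if max_span is not None and end - origin > max_span:
                    break  # match would exceed the maximum span
                merged |= events[e][1]
                if pattern[k] <= merged and search(k + 1, end, origin):
                    return True
        return False
    return search(0, None, None)

def support_count(pattern, **constraints):
    """Number of customers whose sequence contains pattern."""
    return sum(contains(ev, pattern, **constraints) for ev in customers.values())

pattern = [{"B", "C"}, {"D"}]
total = len(customers)
print("no constraints:", support_count(pattern), "of", total)
print("max span 3:    ", support_count(pattern, max_span=3), "of", total)
print("window size 2: ", support_count(pattern, window=2), "of", total)
```

The search is exhaustive rather than efficient, which is acceptable for seven short sequences and keeps the handling of each constraint visible.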

Exercises for Association Analysis with SAS EM
• The data set BANKSERVICE.SAS7BDAT contains information on the types of bank products owned by the customers of a bank. Use SAS EM to find all the association rules that could be valuable to the bank for promoting credit card (CCRD) ownership.

• Use SAS EM to answer parts (a), (d), and (e) of Question 8.

• The data set MARKSIX_2011.SAS7BDAT contains the results of the Mark Six Lottery for the year 2011. Find the most frequently occurring combinations of three, four, five, and six numbers.

• Refer to the Grocery_seq.sas7bdat data (Example 4) in the notes, assuming that no timing constraints are imposed.
• Find one association rule that could/would be valuable to the grocery chain stores. Write down the rule and explain why it could be valuable. Be sure to report the support, confidence, and lift of the rule.
• The owner of the chain stores notices that the sales of Bordeaux are rather low. Based on the results of the basket analysis, what suggestions would you make to the store owner to improve the sales of Bordeaux?

• Use SAS EM to generate all association rules for predicting the Life Insurance Promotion with the data given in Question 5. You need to create your own transaction data set for this purpose. Are there any differences in characteristics between those who respond to the Life Insurance Promotion and those who do not?