HW3_Association_Rules
Student Information
Please fill in the block below. End each line with two spaces so that Markdown renders each entry on its own line.
Name:
Email:
Date:
In [74]:
import pandas as pd
In [84]:
# The sample data path is used for the display below.
sample_data_path = 'Online_Retail_Sample.csv'
# You may wish to extract other samples from the full dataset
# for practice.
real_data_path = 'Online_Retail.csv'
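The comment above suggests extracting smaller samples from the full dataset for practice. A minimal sketch, assuming the full file stores one transaction per line (as the sample below does); `sample_transactions` and its `seed` parameter are hypothetical names, not part of the assignment:

```python
# random: https://docs.python.org/3/library/random.html
import random


def sample_transactions(lines, k, seed=None):
    """Return k randomly chosen lines, kept in their original file order.

    Hypothetical helper: `lines` is a list of CSV lines, one transaction
    per line, e.g. the result of reading real_data_path with readlines().
    """
    # A private generator, so passing the same seed repeats the same sample.
    rng = random.Random(seed)
    # Choose k distinct line positions without replacement, then sort them
    # so the sample preserves the order of the full file.
    chosen = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in chosen]
```

For example, `sample_transactions(open(real_data_path).readlines(), 1000, seed=0)` would return 1,000 lines that could be written to a new practice file with `writelines`; the sample size of 1,000 is arbitrary.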
Sample Data Format
In [85]:
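# The general format of the dataset.
# Each row corresponds to a transaction.
# Each number corresponds to the ID of a product bought in the transaction.
# names=list(range(0,6)) labels six columns, enough for the longest
# transaction in this sample; shorter rows are padded with NaN.
df = pd.read_csv(sample_data_path, names=list(range(0,6)))
display(df)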
       0     1     2       3       4       5
0   1084  1097  1126  2183.0  2375.0     NaN
1   1261  1394  2375     NaN     NaN     NaN
2    582   644   668  1082.0  1100.0     NaN
3    349   897  1142  1243.0  2316.0  2363.0
4   1098  1143  1816  2375.0  2402.0     NaN
5    121   219   363  1500.0  1943.0     NaN
6    964  1017  1126  2096.0  2183.0     NaN
7   1079  1189  2316  2356.0     NaN     NaN
8    766  1079  1720  1816.0  2356.0     NaN
9    209   298   593  1565.0     NaN     NaN
10   276  1709  1737     NaN     NaN     NaN
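Because pandas pads shorter rows with NaN, the item IDs in columns 3 to 5 are shown as floats (2183.0 rather than 2183). For counting itemsets it may be more convenient to hold each transaction as a set of integer IDs. A minimal sketch, assuming each line of the file is a comma-separated list of IDs with no header row, as the `read_csv` call above implies; `load_transactions` is a hypothetical name. Unlike `names=list(range(0,6))`, `csv.reader` does not need a fixed number of columns.

```python
# csv: https://docs.python.org/3/library/csv.html
# io:  https://docs.python.org/3/library/io.html
import csv
import io


def load_transactions(text):
    """Parse comma-separated transactions into a list of sets of item IDs.

    Rows may have different lengths; empty fields and blank lines are skipped.
    """
    transactions = []
    for row in csv.reader(io.StringIO(text)):
        # int(float(...)) accepts both "2183" and "2183.0".
        items = {int(float(field)) for field in row if field.strip()}
        if items:
            transactions.append(items)
    return transactions


# The first three transactions of the sample above, written as plain integers.
sample_text = """1084,1097,1126,2183,2375
1261,1394,2375
582,644,668,1082,1100
"""
transactions = load_transactions(sample_text)
print(len(transactions))        # 3
print(sorted(transactions[1]))  # [1261, 1394, 2375]
```

To read the actual file, `load_transactions(open(sample_data_path).read())` would be one option.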
Expected Output Format
Helper functions for preparing dummy data
In [86]:
import random

import pandas as pd

def generate_dummy_data():
    # 100 random (item_a, item_b) ID pairs with random support and
    # confidence values; the numbers only illustrate the output format.
    item_a_id = pd.DataFrame(random.sample(range(1, 2000), 100))
    item_b_id = pd.DataFrame(random.sample(range(1, 2000), 100))
    support_ab = pd.DataFrame([random.uniform(0, 0.1) for _ in range(100)])
    confidence_a_to_b = pd.DataFrame([random.uniform(0, 0.5) for _ in range(100)])
    confidence_b_to_a = pd.DataFrame([random.uniform(0, 0.5) for _ in range(100)])
    return [item_a_id, item_b_id, support_ab,
            confidence_a_to_b, confidence_b_to_a]

def generate_cols():
    # Column names used in the expected output.
    cols = ['item_a',
            'item_b',
            'support_ab',
            'confidence_a_to_b',
            'confidence_b_to_a']
    return cols

def prep_df(dummy_data, cols):
    # Place the five single-column frames side by side, name the columns,
    # and sort the rows by support in descending order.
    df = pd.concat(dummy_data, axis=1)
    df.columns = cols
    df.sort_values(by='support_ab', inplace=True, ascending=False)
    df.reset_index(inplace=True, drop=True)
    return df
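Expected output
Below is an example of the output you are expected to produce: one row per item pair, sorted by support_ab in descending order. The values here are random dummy data produced by the helper functions above, not real results.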
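The column names follow the usual definitions for a pair of items A and B among N transactions: support_ab is the fraction of transactions that contain both items, and confidence_a_to_b is the number of transactions containing both divided by the number containing A (confidence_b_to_a divides by the number containing B instead). The assignment does not spell these definitions out, so treat them as an assumption. A brute-force sketch on made-up toy transactions; `pair_metrics` is a hypothetical name:

```python
def pair_metrics(transactions, a, b):
    """Support of {a, b} and confidence in both directions (standard definitions)."""
    n = len(transactions)
    count_a = sum(1 for t in transactions if a in t)
    count_b = sum(1 for t in transactions if b in t)
    count_ab = sum(1 for t in transactions if a in t and b in t)
    support_ab = count_ab / n
    # Guard against items that never occur, to avoid dividing by zero.
    conf_a_to_b = count_ab / count_a if count_a else 0.0
    conf_b_to_a = count_ab / count_b if count_b else 0.0
    return support_ab, conf_a_to_b, conf_b_to_a


# Toy transactions, made up for illustration only.
toy = [{1, 2, 3}, {1, 2}, {1, 4}, {3}]
print(pair_metrics(toy, 1, 2))  # (0.5, 0.6666666666666666, 1.0)
```

Item 1 appears in three transactions and item 2 in two, both together in two, so the rule 2 → 1 always holds (confidence 1.0) while 1 → 2 holds two times out of three.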
In [87]:
import pandas as pd
import random
dummy_data = generate_dummy_data()
cols = generate_cols()
df = prep_df(dummy_data, cols)
display(df)
    item_a  item_b  support_ab  confidence_a_to_b  confidence_b_to_a
0     1591    1818    0.099874           0.188990           0.351385
1      661      10    0.099433           0.218578           0.353515
2     1757    1996    0.098234           0.350840           0.057046
3      350    1797    0.097202           0.365008           0.344481
4      649    1648    0.096667           0.000869           0.141335
5      757    1229    0.096079           0.475160           0.340172
6      760     723    0.095868           0.489010           0.456414
7     1456     503    0.093613           0.328031           0.431090
8      954     280    0.093305           0.442659           0.193428
9     1989    1368    0.092827           0.077373           0.180316
10      15      45    0.091973           0.085652           0.447166
11      41    1437    0.091866           0.229112           0.218635
12    1248    1098    0.091346           0.315638           0.321172
13    1961     357    0.088087           0.378523           0.129584
14     724     193    0.087411           0.410789           0.416510
15    1067    1340    0.086449           0.291933           0.287232
16      81      50    0.085358           0.335395           0.161805
17     478    1662    0.084293           0.437249           0.081967
18     200    1791    0.084246           0.167496           0.465840
19    1334    1790    0.082914           0.027239           0.483777
20     751     509    0.082302           0.067944           0.290998
21     656    1200    0.080859           0.439627           0.248154
22     214    1989    0.080674           0.304653           0.200182
23    1474    1656    0.080658           0.198373           0.367896
24     880    1357    0.077824           0.158162           0.063408
25     668     782    0.076413           0.235666           0.243995
26     994    1373    0.076406           0.299694           0.252299
27     702    1296    0.074409           0.272348           0.313928
28    1958    1658    0.072884           0.007035           0.304266
29    1545     953    0.071684           0.375723           0.286896
…        …       …           …                  …                  …
70     693     356    0.030896           0.432301           0.283440
71     272    1417    0.029143           0.428839           0.034106
72     514    1139    0.027859           0.443013           0.131633
73     145     543    0.026729           0.344940           0.334551
74     983    1470    0.025193           0.485808           0.152076
75     936    1222    0.025070           0.402440           0.463816
76     138     169    0.022880           0.066795           0.369706
77     306     736    0.022332           0.223520           0.463071
78      42    1303    0.020956           0.133731           0.290236
79    1152    1670    0.020657           0.229391           0.276869
80    1817    1321    0.020309           0.294134           0.070691
81     915    1618    0.020277           0.388954           0.246102
82      87     877    0.017768           0.455113           0.458453
83    1434    1191    0.015685           0.310262           0.071290
84    1094    1706    0.015649           0.022586           0.493543
85     249     725    0.014983           0.214562           0.269430
86    1243     933    0.014486           0.425215           0.198460
87     614    1572    0.014074           0.488225           0.298015
88     156    1267    0.014060           0.067942           0.410818
89     717    1705    0.013757           0.305105           0.055637
90    1024    1817    0.012745           0.254574           0.431160
91     660      36    0.012188           0.491161           0.318240
92    1798     146    0.010543           0.030498           0.024547
93     573     100    0.010381           0.459442           0.362508
94     934     561    0.008149           0.222814           0.159166
95     533    1551    0.005294           0.117332           0.048354
96     653     434    0.002694           0.182333           0.075884
97    1863    1715    0.002145           0.422909           0.309899
98     603    1788    0.001197           0.286023           0.106362
99      80    1498    0.000891           0.066483           0.085059
100 rows × 5 columns
Your code
You may use any existing packages, algorithms, etc.
Please include website links for any imports used.
Please describe your code, and how you used each import, in comments.
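One way to produce the expected output is the two-pass A-Priori idea restricted to pairs: count single items first, then count only pairs whose two items are both frequent, since a pair can never occur in more transactions than either of its items. The sketch below assumes `transactions` is a list of sets of item IDs (one per CSV row) and uses a hypothetical `min_support` threshold with a default of 0.01; the threshold, the function name `frequent_pairs`, and the choice to report only pairs above the threshold are assumptions, not requirements stated in the assignment.

```python
# collections.Counter: https://docs.python.org/3/library/collections.html#collections.Counter
# itertools.combinations: https://docs.python.org/3/library/itertools.html#itertools.combinations
# pandas: https://pandas.pydata.org/docs/
from collections import Counter
from itertools import combinations

import pandas as pd


def frequent_pairs(transactions, min_support=0.01):
    """A-Priori-style sketch for item pairs.

    Returns a DataFrame with the columns of the expected output,
    sorted by support_ab in descending order.
    """
    n = len(transactions)
    # Pass 1: count how many transactions contain each item.
    item_counts = Counter(item for t in transactions for item in set(t))
    min_count = min_support * n
    frequent = {item for item, c in item_counts.items() if c >= min_count}

    # Pass 2: count only pairs whose items are both frequent. A pair cannot
    # reach min_support unless each of its items does (monotonicity).
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent)
        pair_counts.update(combinations(kept, 2))

    rows = []
    for (a, b), c in pair_counts.items():
        if c >= min_count:
            rows.append({
                'item_a': a,
                'item_b': b,
                'support_ab': c / n,
                'confidence_a_to_b': c / item_counts[a],
                'confidence_b_to_a': c / item_counts[b],
            })
    cols = ['item_a', 'item_b', 'support_ab',
            'confidence_a_to_b', 'confidence_b_to_a']
    df = pd.DataFrame(rows, columns=cols)
    return df.sort_values(by='support_ab', ascending=False).reset_index(drop=True)
```

Sorting each transaction's frequent items before calling `combinations` means every pair is counted under a single key with `item_a < item_b`, so (A, B) and (B, A) are never counted separately. Lowering `min_support` reports more pairs at the cost of counting more of them in the second pass.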
In [ ]:
# Your code goes here.