
Tools
Spark version: 2.2.1
Scala version: 2.11
Commands
I use the time command to record the execution time.
Small2.case1.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/small2.csv 3

Small2.case2.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/small2.csv 5

Beauty.case1-50.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/beauty.csv 50

Beauty.case2-40.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/beauty.csv 40

Books.case1-1200.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/books.csv 1200

Books.case2-1500.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/books.csv 1500

Run Time
File Name     Case Number   Support   Runtime (sec)
beauty.csv    1             50        484.52
beauty.csv    2             40        56.29
books.csv     1             1200      920.63
books.csv     2             1500      111.53

Approach
I use the SON algorithm as required, with the A-Priori algorithm to process each chunk. I first use a HashMap to count the occurrences of each single item and keep only the frequent singletons. I then loop to build larger frequent itemsets from smaller ones: every frequent itemset of size n+1 is the union of two frequent itemsets of size n, so I generate the candidate sets as follows:
for each pair (a, b) of frequent itemsets of size n:
    c = union of a and b
    if c has size n + 1:
        c is a candidate itemset
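
As a minimal Scala sketch of this candidate-generation and chunk-local counting step: the names AprioriSketch, genCandidates, and countFrequent are illustrative assumptions, not the actual methods in Yang_Yueqin_SON.jar, and itemsets are assumed to be represented as Set[String].

import scala.collection.mutable

object AprioriSketch {
  // Generate size-(n+1) candidates: the union of two frequent size-n
  // itemsets is a candidate iff the union has exactly n+1 items.
  def genCandidates(frequent: List[Set[String]], n: Int): Set[Set[String]] =
    (for {
      a <- frequent
      b <- frequent
      c = a union b
      if c.size == n + 1
    } yield c).toSet

  // Count each candidate within one chunk of baskets (HashMap counting,
  // as described above) and keep those meeting the chunk-local support.
  def countFrequent(chunk: Seq[Set[String]],
                    candidates: Set[Set[String]],
                    localSupport: Int): List[Set[String]] = {
    val counts = mutable.HashMap.empty[Set[String], Int].withDefaultValue(0)
    for (basket <- chunk; c <- candidates if c.subsetOf(basket))
      counts(c) += 1
    counts.collect { case (itemset, cnt) if cnt >= localSupport => itemset }.toList
  }

  def main(args: Array[String]): Unit = {
    // Tiny hypothetical example: find frequent pairs at support 2.
    val baskets = Seq(Set("a", "b", "c"), Set("a", "b"), Set("a", "c"), Set("b", "c"))
    val frequentSingletons = List(Set("a"), Set("b"), Set("c")) // each appears >= 2 times
    val pairCandidates = genCandidates(frequentSingletons, 1)
    println(countFrequent(baskets, pairCandidates, 2))
  }
}

In the first SON pass, this local counting runs on each chunk (e.g., inside mapPartitions) with the support threshold scaled to the chunk's fraction of the input; the second pass then recounts the surviving candidates over the whole dataset to eliminate false positives.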