Versions Used
I chose Scala for this project, using Scala 2.11 and Spark 2.2.1.
Methods
I use A-Priori to compute the local frequent itemsets in the SON algorithm. The hard part of this approach is generating and filtering the candidate sets. I first generate all combinations of length n, then keep only those combinations in which every (n-1)-subset is already in the frequent set. This prunes many hopeless candidates and saves computation.
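The pruning step above can be sketched as follows. This is a minimal, hypothetical illustration (the object and method names are my own, not from the project code): size-n candidates are built from the items seen in the frequent (n-1)-itemsets, and a candidate survives only if all of its (n-1)-subsets are already frequent.

```scala
object CandidateGen {
  // Generate size-n candidates from the frequent (n-1)-itemsets.
  // A candidate is kept only when every one of its (n-1)-subsets
  // is itself frequent (the A-Priori monotonicity property).
  def genCandidates(frequent: Set[Set[String]], n: Int): Set[Set[String]] = {
    val items = frequent.flatten // all items appearing in some frequent itemset
    items.subsets(n)
      .filter(c => c.subsets(n - 1).forall(frequent.contains))
      .toSet
  }

  def main(args: Array[String]): Unit = {
    // Frequent pairs: {a,b}, {a,c}, {b,c}, {b,d}.
    val freq2 = Set(Set("a", "b"), Set("a", "c"), Set("b", "c"), Set("b", "d"))
    // Only {a,b,c} has all of its 2-subsets frequent, so it is the sole candidate.
    println(CandidateGen.genCandidates(freq2, 3))
  }
}
```

For example, {a,b,d} is pruned because {a,d} is not frequent, so it has no hope of being frequent itself.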
Run
Small2.csv case 1 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 1 Data/small2.csv 3
Small2.csv case 2 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 2 Data/small2.csv 5
Beauty.csv case 1 support 50 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 1 Data/beauty.csv 50
Beauty.csv case 2 support 40 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 2 Data/beauty.csv 40
Books.csv case 1 support 1200 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 1 Data/books.csv 1200
Books.csv case 2 support 1500 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 2 Data/books.csv 1500
Time
File Name    Case Number    Support    Runtime (sec)
beauty.csv   1              50         373
beauty.csv   2              40         53
books.csv    1              1200       896
books.csv    2              1500       105