Versions Used
I chose Scala for this project, using Scala 2.11 and Spark 2.2.1.
Methods
I use A-Priori to compute the locally frequent itemsets within each partition in the SON algorithm. The hard part of this approach is generating and filtering the candidate sets. I first generate the combinations of length n, and then keep only those combinations in which every sub-combination of length n-1 is already in the frequent set. In this way, I can prune many hopeless candidates and save computation, as sketched below.
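The following is a minimal sketch of this pruning step under stated assumptions: the object and method names (AprioriCandidates, genCandidates, localFrequent) and the exact data types are illustrative, not the actual code inside Hu_Zhendong_SON.jar.

// Sketch of A-Priori candidate generation with (n-1)-subset pruning,
// plus local counting as used in the map phase of SON. Names are assumed.
object AprioriCandidates {

  // Generate size-n candidates from the frequent (n-1)-itemsets, keeping only
  // combinations whose every (n-1)-subset is already known to be frequent.
  def genCandidates(frequentPrev: Set[Set[String]], n: Int): Set[Set[String]] = {
    val items = frequentPrev.flatten
    items.subsets(n)                                      // all size-n combinations
      .filter(c => c.subsets(n - 1).forall(frequentPrev.contains))
      .toSet
  }

  // Count candidates inside one partition and keep those meeting the
  // scaled-down local support threshold.
  def localFrequent(baskets: Seq[Set[String]],
                    candidates: Set[Set[String]],
                    localSupport: Int): Set[Set[String]] =
    candidates.filter(c => baskets.count(b => c.subsetOf(b)) >= localSupport)
}

In practice this runs level by level: the frequent itemsets of size n-1 feed genCandidates, the surviving candidates are counted locally, and the loop stops once no candidate is frequent.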
Run
Small2.csv case 1 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 1 Data/small2.csv 3
Small2.csv case 2 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 2 Data/small2.csv 5
Beauty.csv case 1 support 50 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 1 Data/beauty.csv 50
Beauty.csv case 2 support 40 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 2 Data/beauty.csv 40
Books.csv case 1 support 1200 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 1 Data/books.csv 1200
Books.csv case 2 support 1500 :
bin/spark-submit --class project Hu_Zhendong_SON.jar 2 Data/books.csv 1500
Time
File Name    Case Number    Support    Runtime (sec)
beauty.csv   1              50         373
beauty.csv   2              40         53
books.csv    1              1200       896
books.csv    2              1500       105