
Yang_Yueqin_Description

Tools

Spark version: 2.2.1 Scala version: 2.11

Commands

I use the time command to record the execution time.

Run Time

Small2.case1.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/small2.csv 3

Small2.case2.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/small2.csv 5

Beauty.case1-50.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/beauty.csv 50

Beauty.case2-40.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/beauty.csv 40

Books.case1-1200.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/books.csv 1200

Books.case2-1500.txt:

time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/books.csv 1500


File Name    Case Number    Support    Runtime (sec)
beauty.csv   1              50         484.52
beauty.csv   2              40         56.29
books.csv    1              1200       920.63
books.csv    2              1500       111.53

Approach

I use the SON algorithm as required, with the A-Priori algorithm to process each chunk. I first use a HashMap to count the occurrences of each single item, then keep only the frequent singletons.
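A minimal sketch of this counting pass in plain Scala (outside Spark; the function name and the chunk-level threshold parameter are illustrative):

```scala
import scala.collection.mutable

// Count every item across the chunk's baskets with a mutable HashMap,
// then keep only the items whose count reaches the chunk-level threshold.
def frequentSingletons(baskets: Seq[Set[String]], threshold: Int): Set[String] = {
  val counts = mutable.HashMap.empty[String, Int]
  for (basket <- baskets; item <- basket)
    counts(item) = counts.getOrElse(item, 0) + 1
  counts.collect { case (item, c) if c >= threshold => item }.toSet
}
```

In the SON setting, the threshold passed here would be the global support scaled down by the fraction of the data the chunk holds.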

Then I loop to build the frequent itemsets of size n+1 from those of size n, because every frequent (n+1)-set is the union of two frequent n-sets. Thus, I generate the candidate sets as follows:

for each pair (a, b) of frequent itemsets of length n:
    c = union of a and b
    if c has length n+1:
        c is a candidate itemset
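The candidate-generation step above can be sketched in plain Scala (the function name is illustrative; in the full A-Priori loop each candidate would then be counted against the baskets and filtered by the support threshold):

```scala
// Union every pair of frequent n-itemsets; keep only unions of size n+1,
// since any frequent (n+1)-set must be the union of two of its frequent n-subsets.
def genCandidates(frequent: Set[Set[String]], n: Int): Set[Set[String]] =
  for {
    a <- frequent
    b <- frequent
    c = a union b
    if c.size == n + 1
  } yield c
```

Pairs whose union is larger or smaller than n+1 (including a paired with itself) are dropped by the size guard.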

