## Tools

**Spark version**: 2.2.1
**Scala version**: 2.11

## Commands

I use the `time` command to record the execution time.

```bash
# Small2.case1.txt:
time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/small2.csv 3

# Small2.case2.txt:
time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/small2.csv 5

# Beauty.case1-50.txt:
time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/beauty.csv 50

# Beauty.case2-40.txt:
time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/beauty.csv 40

# Books.case1-1200.txt:
time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 1 Data/books.csv 1200

# Books.case2-1500.txt:
time spark-submit --class FrequentItemsets Yang_Yueqin_SON.jar 2 Data/books.csv 1500
```

## Run Time

| File Name | Case Number | Support | Runtime (sec) |
| --------- | ----------- | ------- | ------------- |
| beauty.csv | 1 | 50 | 484.52 |
| beauty.csv | 2 | 40 | 56.29 |
| books.csv | 1 | 1200 | 920.63 |
| books.csv | 2 | 1500 | 111.53 |

## Approach

I use the SON algorithm as required, with the `A-Priori` algorithm to process each chunk. I first use a `HashMap` to count each single item and keep only the frequent singletons. Then I iterate, building frequent itemsets of size $n+1$ from those of size $n$: every frequent set of size $n+1$ must be the union of two frequent sets of size $n$, since all of its size-$n$ subsets are themselves frequent.
Thus, I generate the candidate sets as follows (a Scala sketch follows the pseudocode):
```
for each pair (a, b) of frequent itemsets of length n:
    c = a ∪ b
    if c has length n + 1:
        c is a candidate itemset
```
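
Below is a minimal Scala sketch of this candidate-generation step and the per-chunk counting it feeds into. It is illustrative only: the object and method names (`CandidateGeneration`, `genCandidates`, `frequentInChunk`) and the `Set[String]` itemset representation are assumptions, not the actual code in `Yang_Yueqin_SON.jar`.

```scala
import scala.collection.mutable

// Illustrative sketch only; names and types are assumptions, not the submitted code.
object CandidateGeneration {

  // Build size-(n+1) candidates by unioning pairs of frequent size-n itemsets.
  def genCandidates(frequentN: List[Set[String]], n: Int): Set[Set[String]] = {
    (for {
      a <- frequentN
      b <- frequentN
      c = a union b
      if c.size == n + 1
    } yield c).toSet
  }

  // Count each candidate within one chunk and keep those that meet the local
  // support threshold (the first SON pass, run inside each partition).
  def frequentInChunk(baskets: Iterable[Set[String]],
                      candidates: Set[Set[String]],
                      localSupport: Int): Set[Set[String]] = {
    val counts = mutable.HashMap.empty[Set[String], Int].withDefaultValue(0)
    for (basket <- baskets; cand <- candidates if cand.subsetOf(basket))
      counts(cand) += 1
    counts.filter(_._2 >= localSupport).keys.toSet
  }
}
```

In the SON scheme, the itemsets found frequent in any chunk become the global candidates, which a second full pass over the data then counts against the true support threshold.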