2020/4/6
COMP5349 – Workspaces
Word Count Program
This is the word count program used in the week 5 lecture to illustrate the basic structure of a Spark program. It reads a text file from local disk and counts the occurrences of each word in the text. For simplicity, words are considered to be separated by white space only.
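from pyspark import SparkConf, SparkContext

spark_conf = SparkConf()\
    .setAppName("Week 5 Lecture Sample Code")
sc = SparkContext.getOrCreate(spark_conf)

# You can change the input path pointing to your own HDFS
# If spark is able to read hadoop configuration, you can use relative path
input_file = 'file:///home/1984-GeorgeOrwell.txt'
output_path = 'file:///home/1984_wordcount'

text_file = sc.textFile(input_file)
# Split each line into words, pair each word with 1, then sum the 1s per word
counts = text_file.flatMap(lambda line: line.strip().split(" ")) \
    .map(lambda word: (word, 1)) \
    .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile(output_path)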
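To see what each transformation contributes, the same pipeline can be imitated in plain Python on a small in-memory list, with no Spark installation required. This is only a sketch: `sample_lines` is made-up data and `word_count` is a hypothetical helper, not part of the lecture code. It also shows that splitting on a single space, as the program does, produces an empty-string "word" whenever a line contains two consecutive spaces.

```python
from collections import defaultdict


def word_count(lines):
    """Imitate flatMap -> map -> reduceByKey on a plain list of lines."""
    # flatMap: each line becomes zero or more words, flattened into one list
    words = [word for line in lines for word in line.strip().split(" ")]
    # map: pair every word with an initial count of 1
    pairs = [(word, 1) for word in words]
    # reduceByKey: add up the counts of all pairs that share the same word
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# Made-up sample input; the second line contains a double space on purpose
sample_lines = ["war is peace", "freedom  is slavery"]
counts = word_count(sample_lines)
print(counts)
```

Replacing `split(" ")` with `split()` would split on any run of white space and would not produce the empty-string key.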
Movie Rating Computing
This is a sample notebook showing basic Spark RDD operations. The program has two input data sources: ratings.csv and movies.csv.
The movies.csv file contains movie information. Each row represents one movie, and has the following format:
movieId,title,genres
The ratings.csv file contains rating information. Each row represents one rating of one movie by one user, and has the following format:
userId,movieId,rating,timestamp
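As a sketch of how rows in these two formats might be parsed, the snippet below uses Python's standard `csv` module. The sample rows and the helper names `parse_movie` and `parse_rating` are hypothetical. It assumes that a title containing a comma is wrapped in double quotes, which is why `csv.reader` is used for movie rows rather than a plain `split(",")`.

```python
import csv


def parse_movie(line):
    """Parse one 'movieId,title,genres' row into a (movieId, title, genres) tuple."""
    # csv.reader respects quoted fields, so a comma inside a quoted title is kept
    movie_id, title, genres = next(csv.reader([line]))
    return movie_id, title, genres


def parse_rating(line):
    """Parse one 'userId,movieId,rating,timestamp' row into (movieId, rating)."""
    # Rating rows hold only plain numeric fields, so a simple split is enough
    user_id, movie_id, rating, timestamp = line.split(",")
    return movie_id, float(rating)


# Made-up rows for illustration only
movie = parse_movie('7,"Example, The (2000)",Drama')
rating = parse_rating("3,7,4.5,1000000000")
print(movie)
print(rating)
```

Keying both results by `movieId` is a deliberate choice here: it is the only field the two files share, so it is the natural key for combining them later.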
The following cell defines a number of functions to be used in the computation.
import csv
"""
This module includes a few functions used in computing average rating per genre
"""
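To illustrate the kind of computation this module supports, here is a minimal plain-Python sketch of an average rating per genre. It is not the notebook's implementation: the function name `average_rating_per_genre`, the sample data, and the assumption that a movie's genres are separated by `|` are all hypothetical.

```python
from collections import defaultdict


def average_rating_per_genre(movies, ratings):
    """movies: iterable of (movieId, genres); ratings: iterable of (movieId, rating)."""
    # Assumption: multiple genres for one movie are separated by '|'
    genres_by_movie = {movie_id: genres.split("|") for movie_id, genres in movies}
    sums = defaultdict(float)
    counts = defaultdict(int)
    for movie_id, rating in ratings:
        # Ratings whose movieId has no movie entry are skipped, like an inner join
        for genre in genres_by_movie.get(movie_id, []):
            sums[genre] += rating
            counts[genre] += 1
    return {genre: sums[genre] / counts[genre] for genre in sums}


# Made-up data for illustration only
movies = [("1", "Comedy|Drama"), ("2", "Drama")]
ratings = [("1", 4.0), ("2", 2.0), ("1", 3.0), ("9", 5.0)]
averages = average_rating_per_genre(movies, ratings)
print(averages)
```

Keeping a running (sum, count) pair per genre, rather than a list of ratings, mirrors how such an aggregation is usually expressed with key-value pairs and a reduce step.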