Hadoop HW1

Description

Deadline: Feb. 19th 5:59 P.M. (before class)

In the lab session, we showed how to perform WordCount using Hadoop. In this homework, you will write a MapReduce program that counts, for each letter, how many words have it as their 2nd letter.

Input

Download the Complete Works of William Shakespeare from Project Gutenberg at http://www.gutenberg.org/cache/epub/100/pg100.txt or download the pg100.txt from Blackboard.

TODO

In your implementation, you need to:
1. transform all words to lowercase
2. ignore all non-alphabetic characters (except whitespace, which separates words)
3. count nothing for a word that has fewer than two alphabetic characters (e.g. A2)
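
The three rules above can be illustrated with a small plain-Java sketch. It is not a Hadoop Mapper and not a reference solution. The class and method names are hypothetical, and it assumes that "alphabetic" means the ASCII letters a-z and that words are separated by whitespace.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; it uses no Hadoop classes.
class SecondLetterSketch {

    // Lowercases the token, drops every character outside a-z (assumption:
    // "alphabetic" means ASCII letters), and returns the 2nd remaining letter,
    // or null when fewer than two letters remain.
    static Character secondLetter(String token) {
        String letters = token.toLowerCase(Locale.ROOT).replaceAll("[^a-z]", "");
        return letters.length() >= 2 ? letters.charAt(1) : null;
    }

    // Counts 2nd letters over one line of text, splitting on whitespace.
    static Map<Character, Integer> countLine(String line) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (String token : line.trim().split("\\s+")) {
            Character c = secondLetter(token);
            if (c != null) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "Bank" -> a, "abstract" -> b, "a-b-c" -> b, "A2" -> not counted
        System.out.println(countLine("Bank abstract a-b-c A2")); // prints {a=1, b=2}
    }
}
```

In a MapReduce job, the per-token logic would typically sit in the map step, emitting the letter as the key and 1 as the value for each qualifying word.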

Output must be in the following format, where <\tab> stands for a tab character:

a<\tab>count of words whose 2nd letter is “a” (e.g. bank)
b<\tab>count of words whose 2nd letter is “b” (e.g. abstract, a–b–c)
…
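
As a rough illustration of this layout, the plain-Java sketch below renders a map of counts as one "letter, tab, count" line per letter. The class name, method name, and sample counts are hypothetical and made up for the example. In an actual Hadoop job, the default text output format already separates key and value with a tab, so a reducer that emits the letter as key and the total as value would normally produce this shape without manual formatting.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; it uses no Hadoop classes.
class OutputFormatSketch {

    // Renders each entry as "<letter>\t<count>\n", sorted by letter.
    static String format(Map<Character, Long> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Character, Long> e : new TreeMap<>(counts).entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample counts are invented for illustration only.
        Map<Character, Long> counts = new TreeMap<>();
        counts.put('b', 2L);
        counts.put('a', 1L);
        System.out.print(format(counts));
    }
}
```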

Submission
Upload your Java file(s) (Mapper, Reducer, Driver) and your output results to Blackboard.