HW1
Deadline: Feb. 19th 5:59 P.M. (before class)
Description
In the lab session, we showed how to perform WordCount using Hadoop. In this homework,
write a MapReduce program that counts how often each letter occurs as the 2nd letter of a word.
Input
Download the Complete Works of William Shakespeare from Project Gutenberg at
http://www.gutenberg.org/cache/epub/100/pg100.txt or download the pg100.txt from
Blackboard.
TODO
In your implementation, you need to
1. transform all words to lowercase
2. ignore all non-alphabetic characters (except whitespace)
3. count nothing if there’s only one alphabetic character in a word (e.g. A2)
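The three rules above can be checked locally before wiring them into a Mapper. The following is only a minimal sketch in plain Java (no Hadoop dependency): the class name SecondLetterRule and the method secondLetter are hypothetical, and it assumes that "alphabetic" means the ASCII letters A-Z and a-z and that words are delimited by whitespace.

```java
// Hypothetical helper for illustration; not part of the assignment handout.
final class SecondLetterRule {
    private SecondLetterRule() {}

    /**
     * Returns the second alphabetic character of a whitespace-delimited token,
     * lowercased, or null if the token contains fewer than two letters.
     */
    static Character secondLetter(String token) {
        int lettersSeen = 0;
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            // Assumption: only ASCII A-Z / a-z count as alphabetic.
            boolean isLetter = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (!isLetter) {
                continue; // rule 2: skip non-alphabetic characters
            }
            lettersSeen++;
            if (lettersSeen == 2) {
                return Character.toLowerCase(c); // rule 1: lowercase
            }
        }
        return null; // rule 3: fewer than two letters, count nothing
    }

    public static void main(String[] args) {
        for (String token : new String[] {"bank", "****abstract", "a-b-c", "A2"}) {
            System.out.println(token + " -> " + secondLetter(token));
        }
        // prints:
        // bank -> a
        // ****abstract -> b
        // a-b-c -> b
        // A2 -> null
    }
}
```

In a Mapper, a non-null result would be emitted as the key with a count of 1, and a null result would simply be skipped.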
Output must be in the following format:
a<\tab>number of words whose 2nd letter is “a” (e.g. bank)
b<\tab>number of words whose 2nd letter is “b” (e.g. ****abstract, a–b–c)
…
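To make the expected layout concrete, here is a small sketch in plain Java that renders a map of counts as one letter<\tab>count line per letter. The class name OutputFormatSketch and the counts in main are made-up placeholders, not real results for pg100.txt. In an actual Hadoop job this step is usually unnecessary, since the default TextOutputFormat already writes each key and value separated by a tab.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration of the output layout; not part of the handout.
final class OutputFormatSketch {
    private OutputFormatSketch() {}

    /** Renders counts as "letter<TAB>count" lines, sorted by letter. */
    static String format(Map<Character, Long> counts) {
        StringBuilder sb = new StringBuilder();
        // TreeMap iterates keys in ascending order: a, b, c, ...
        for (Map.Entry<Character, Long> e : new TreeMap<>(counts).entrySet()) {
            sb.append(e.getKey()).append('\t').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<Character, Long> counts = new HashMap<>();
        counts.put('b', 2L); // placeholder values, not real counts
        counts.put('a', 5L);
        System.out.print(format(counts));
    }
}
```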
Submission
Upload your Java file(s) (Mapper, Reducer, Driver) and your output results to Blackboard.