README
1. Since Hadoop reads and writes text as UTF-8 by default, I convert city.txt, country.txt and countrylanguage.txt to UTF-8.
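The conversion can be done in a few lines of plain Java. This is a minimal sketch, not necessarily the tool used here. It assumes the original files are ISO-8859-1 (Latin-1), which this README does not state, so substitute the real source charset. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

class Utf8Convert {
    // Decode the bytes with the source charset, then re-encode the text as UTF-8.
    static byte[] reencode(byte[] data, Charset source) {
        return new String(data, source).getBytes(StandardCharsets.UTF_8);
    }

    // Usage: java Utf8Convert <input> <output>
    // Assumption: the input file is ISO-8859-1 (Latin-1) encoded.
    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java Utf8Convert <input> <output>");
            System.exit(1);
        }
        byte[] original = Files.readAllBytes(Paths.get(args[0]));
        Files.write(Paths.get(args[1]), reencode(original, StandardCharsets.ISO_8859_1));
    }
}
```

Run it once per file (city.txt, country.txt, countrylanguage.txt), writing to a new file so the original is kept.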
2. Computing Selection by MapReduce.
I implement the class City, which parses the fields of city.txt.
I implement the class Q1Mapper as the Mapper and the class Q1 as the Driver.
Q1Mapper tests whether the population is >= 300000 and writes the city only if it is.
I don’t use a Reducer for this problem.
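The selection logic can be sketched in plain Java, without the Hadoop API, to show what Q1Mapper decides for each line. The column order (ID, Name, CountryCode, District, Population), the tab delimiter, and writing the whole input line for each selected city are assumptions about city.txt and the output; the nested City class is a hypothetical stand-in, not the class described above. In Hadoop, a map-only job like this is normally configured with job.setNumReduceTasks(0).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class SelectionSketch {
    static final long MIN_POPULATION = 300_000;

    // Hypothetical parser for one city.txt line, assuming tab-separated
    // columns ID, Name, CountryCode, District, Population.
    static final class City {
        final String name;
        final String countryCode;
        final String district;
        final long population;

        City(String name, String countryCode, String district, long population) {
            this.name = name;
            this.countryCode = countryCode;
            this.district = district;
            this.population = population;
        }

        static City parse(String line) {
            String[] f = line.split("\t");
            return new City(f[1].trim(), f[2].trim(), f[3].trim(), Long.parseLong(f[4].trim()));
        }
    }

    // Map step: keep the line only if population >= 300000. There is no reduce step.
    static List<String> map(List<String> lines) {
        List<String> selected = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            if (City.parse(line).population >= MIN_POPULATION) {
                selected.add(line);
            }
        }
        return selected;
    }

    // Usage: java SelectionSketch city.txt
    public static void main(String[] args) throws IOException {
        for (String line : map(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(line);
        }
    }
}
```

The comparison uses >=, so a city with exactly 300000 inhabitants is selected.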
3. Computing Projection by MapReduce.
This also uses the class City.
I implement the class Q2Mapper as the Mapper and the class Q2 as the Driver.
Q2Mapper writes only the city name and district fields.
I don’t use a Reducer for this problem.
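A plain-Java sketch of what Q2Mapper emits, assuming tab-separated city.txt lines with columns ID, Name, CountryCode, District, Population. The tab between the two output fields is also an assumption about the output format, and the class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ProjectionSketch {
    // Map step: keep only Name (column 1) and District (column 3) of each
    // city.txt line. There is no reduce step.
    static List<String> map(List<String> lines) {
        List<String> projected = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] f = line.split("\t");
            projected.add(f[1].trim() + "\t" + f[3].trim());
        }
        return projected;
    }

    // Usage: java ProjectionSketch city.txt
    public static void main(String[] args) throws IOException {
        for (String row : map(Files.readAllLines(Paths.get(args[0])))) {
            System.out.println(row);
        }
    }
}
```

Because there is no reduce step, duplicate (name, district) pairs are kept, which gives bag rather than set semantics; removing duplicates would need a reducer that groups on the pair.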
4. Computing Natural Join by MapReduce.
I implement the class Q3Mapper as the Mapper, the class Q3Reducer as the Reducer and the class Q3 as the Driver.
Q3Mapper takes country.txt and countrylanguage.txt as input and uses CountryCode as the output key. For country.txt the value is the country name; for countrylanguage.txt the value is English, taken from the rows whose language is English.
Q3Reducer then tests whether a CountryCode key has two values; if it does, it writes the country name.
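The reduce-side join can be sketched in plain Java, with a TreeMap standing in for Hadoop's shuffle (grouping values by key). Assumptions not stated in this README: both files are tab-separated, country.txt starts with the columns Code and Name, countrylanguage.txt starts with CountryCode and Language, and all class and method names are hypothetical. Since Hadoop does not guarantee the order of values within a key, the reducer picks the value that is not the English marker instead of relying on position.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class JoinSketch {
    static final String ENGLISH = "English";

    // Stand-in for context.write plus the shuffle: group values by key.
    static void emit(Map<String, List<String>> shuffle, String key, String value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Map step for country.txt: emit (Code, Name).
    static void mapCountry(String line, Map<String, List<String>> shuffle) {
        String[] f = line.split("\t");
        emit(shuffle, f[0].trim(), f[1].trim());
    }

    // Map step for countrylanguage.txt: emit (CountryCode, "English") for English rows only.
    static void mapLanguage(String line, Map<String, List<String>> shuffle) {
        String[] f = line.split("\t");
        if (ENGLISH.equals(f[1].trim())) {
            emit(shuffle, f[0].trim(), ENGLISH);
        }
    }

    // Reduce step: a key with two values occurs in both inputs, so write the country name.
    static List<String> reduce(Map<String, List<String>> shuffle) {
        List<String> names = new ArrayList<>();
        for (List<String> values : shuffle.values()) {
            if (values.size() == 2) {
                for (String v : values) {
                    if (!ENGLISH.equals(v)) {
                        names.add(v); // value order is not guaranteed, so skip the marker
                    }
                }
            }
        }
        return names;
    }

    static List<String> run(List<String> countries, List<String> languages) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : countries) {
            if (!line.trim().isEmpty()) mapCountry(line, shuffle);
        }
        for (String line : languages) {
            if (!line.trim().isEmpty()) mapLanguage(line, shuffle);
        }
        return reduce(shuffle);
    }

    // Usage: java JoinSketch country.txt countrylanguage.txt
    public static void main(String[] args) throws IOException {
        List<String> names = run(Files.readAllLines(Paths.get(args[0])), Files.readAllLines(Paths.get(args[1])));
        for (String name : names) {
            System.out.println(name);
        }
    }
}
```

The two-value test relies on country.txt having one row per code and countrylanguage.txt having at most one English row per country (assuming each CountryCode and Language pair is unique). A code that appears in only one file yields a single value and is dropped.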
5. Computing Aggregation by MapReduce.
I implement the class Q4Mapper as the Mapper, the class Q4Reducer as the Reducer and the class Q4 as the Driver.
Q4Mapper takes city.txt as input and outputs the district as key and the integer 1 as value.
Q4Reducer sums all the 1 values for a district and outputs the (district, count) pair.
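The counting job can be sketched in plain Java, with a TreeMap standing in for the shuffle. It assumes tab-separated city.txt lines with District in column 3 (ID, Name, CountryCode, District, Population); the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class AggregationSketch {
    // Map step: emit (District, 1) for each city.txt line (District is column 3).
    static Map<String, List<Integer>> mapAndShuffle(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String district = line.split("\t")[3].trim();
            // computeIfAbsent plays the role of the shuffle: it groups the 1s by key.
            grouped.computeIfAbsent(district, k -> new ArrayList<>()).add(1);
        }
        return grouped;
    }

    // Reduce step: sum the 1s for each district to get its city count.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int one : e.getValue()) {
                sum += one;
            }
            counts.put(e.getKey(), sum);
        }
        return counts;
    }

    // Usage: java AggregationSketch city.txt
    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = reduce(mapAndShuffle(Files.readAllLines(Paths.get(args[0]))));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Because summing is associative and commutative, a reducer like Q4Reducer could also be registered as a combiner with job.setCombinerClass, provided its output key and value types match its input types; this would shrink the data sent through the shuffle.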