Descriptions and Tasks
Access data: The dataset is available as ncdcdata.zip at:
https://1drv.ms/u/s!AslHHkfDcU2Ngx9FuLmjD9RSkH0A
https://pan.baidu.com/s/1IwLMQMCLGME3HGjUWcjQg (code: xkl3)

More information about the data can be found at:
https://www.ncdc.noaa.gov/data-access
ftp://ftp.ncdc.noaa.gov/pub/data/gsod
https://blog.csdn.net/MrCharles/article/details/50442367 (in Chinese)

Annual files:
e.g., gsod_2006.tar: All 2006 files compressed by station, in one tar file.
etc., etc.: For each annual volume.
Note: Each year's data are contained in subdirectories/folders by year.

Station files:
e.g., 010010-99999-2006.op.gz: Files by station year, identified by WMO number, WBAN number (if appropriate), and year. For a cross reference of the file names with location, see: ish-history.txt.

Informational/Utility Files:
country-list.txt: A list showing the station number range for each country.
ish-history.txt: A station list to be used with the data files, showing the names and locations for each station. Note: Global summary of day contains a subset of the stations listed in this station history.
readme.txt: A description of the data and its format.
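
The station file naming scheme above can be decoded mechanically. The following is a minimal sketch, assuming a name is a 6-digit station number, a 5-digit WBAN number, and a 4-digit year, optionally separated by hyphens, followed by .op.gz. The names StationFile and parse_station_filename are made up for this illustration.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern: 6-digit station number, 5-digit WBAN, 4-digit year.
# Hyphens are optional, so "010010-99999-2006.op.gz" and
# "010010999992006.op.gz" are both accepted.
_STATION_FILE = re.compile(r"^(\d{6})-?(\d{5})-?(\d{4})\.op\.gz$")


class StationFile(NamedTuple):
    stn: str    # station (WMO) number, kept as text to preserve leading zeros
    wban: str   # WBAN number, 99999 when not applicable
    year: int


def parse_station_filename(name: str) -> Optional[StationFile]:
    """Return the parts of a station file name, or None if it does not match."""
    m = _STATION_FILE.match(name)
    if m is None:
        return None
    return StationFile(stn=m.group(1), wban=m.group(2), year=int(m.group(3)))
```

Calling parse_station_filename on readme.txt or any other utility file returns None, so the function can also be used to filter a directory listing down to station files.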

Description of Data Format:

FIELD   POSITION  TYPE   DESCRIPTION
STN     1-6       Int.   Station number (WMO/DATSAV3 number)
                         for the location.

WBAN    8-12      Int.   WBAN number where applicable; this is the
                         historical "Weather Bureau Air Force Navy"
                         number, with WBAN being the acronym.

YEAR    15-18     Int.   The year.

MODA    19-22     Int.   The month and day.

TEMP    25-30     Real   Mean temperature for the day in degrees
                         Fahrenheit to tenths. Missing = 9999.9
Count   32-33     Int.   Number of observations used in
                         calculating mean temperature.

DEWP    36-41     Real   Mean dew point for the day in degrees
                         Fahrenheit to tenths. Missing = 9999.9
Count   43-44     Int.   Number of observations used in
                         calculating mean dew point.

SLP     47-52     Real   Mean sea level pressure for the day
                         in millibars to tenths. Missing =
                         9999.9
Count   54-55     Int.   Number of observations used in
                         calculating mean sea level pressure.

STP     58-63     Real   Mean station pressure for the day
                         in millibars to tenths. Missing =
                         9999.9
Count   65-66     Int.   Number of observations used in
                         calculating mean station pressure.

VISIB   69-73     Real   Mean visibility for the day in miles
                         to tenths. Missing = 999.9
Count   75-76     Int.   Number of observations used in
                         calculating mean visibility.

WDSP    79-83     Real   Mean wind speed for the day in knots
                         to tenths. Missing = 999.9
Count   85-86     Int.   Number of observations used in
                         calculating mean wind speed.

MXSPD   89-93     Real   Maximum sustained wind speed reported
                         for the day in knots to tenths.
                         Missing = 999.9

GUST    96-100    Real   Maximum wind gust reported for the day
                         in knots to tenths. Missing = 999.9

MAX     103-108   Real   Maximum temperature reported during the
                         day in Fahrenheit to tenths; time of max
                         temp report varies by country and
                         region, so this will sometimes not be
                         the max for the calendar day. Missing =
                         9999.9
Flag    109-109   Char   Blank indicates max temp was taken from the
                         explicit max temp report and not from the
                         "hourly" data. * indicates max temp was
                         derived from the hourly data (i.e., highest
                         hourly or synoptic-reported temperature).

MIN     111-116   Real   Minimum temperature reported during the
                         day in Fahrenheit to tenths; time of min
                         temp report varies by country and
                         region, so this will sometimes not be
                         the min for the calendar day. Missing =
                         9999.9
Flag    117-117   Char   Blank indicates min temp was taken from the
                         explicit min temp report and not from the
                         "hourly" data. * indicates min temp was
                         derived from the hourly data (i.e., lowest
                         hourly or synoptic-reported temperature).

PRCP    119-123   Real   Total precipitation (rain and/or melted
                         snow) reported during the day in inches
                         and hundredths; will usually not end
                         with the midnight observation (i.e.,
                         may include latter part of previous day).
                         .00 indicates no measurable
                         precipitation (includes a trace).
                         Missing = 99.99
                         Note: Many stations do not report "0" on
                         days with no precipitation; therefore,
                         "99.99" will often appear on these days.
                         Also, for example, a station may only
                         report a 6-hour amount for the period
                         during which rain fell.
                         See Flag field for source of data.
Flag    124-124   Char   A = 1 report of 6-hour precipitation
                             amount.
                         B = Summation of 2 reports of 6-hour
                             precipitation amount.
                         C = Summation of 3 reports of 6-hour
                             precipitation amount.
                         D = Summation of 4 reports of 6-hour
                             precipitation amount.
                         E = 1 report of 12-hour precipitation
                             amount.
                         F = Summation of 2 reports of 12-hour
                             precipitation amount.
                         G = 1 report of 24-hour precipitation
                             amount.
                         H = Station reported "0" as the amount
                             for the day (e.g., from 6-hour reports),
                             but also reported at least one
                             occurrence of precipitation in hourly
                             observations; this could indicate a
                             trace occurred, but should be considered
                             as incomplete data for the day.
                         I = Station did not report any precip data
                             for the day and did not report any
                             occurrences of precipitation in its hourly
                             observations; it's still possible that
                             precip occurred but was not reported.

SNDP    126-130   Real   Snow depth in inches to tenths; last
                         report for the day if reported more than
                         once. Missing = 999.9
                         Note: Most stations do not report "0" on
                         days with no snow on the ground; therefore,
                         "999.9" will often appear on these days.

FRSHTT  133-138   Int.   Indicators (1 = yes, 0 = no/not
                         reported) for the occurrence during the
                         day of:
                         Fog ('F' - 1st digit).
                         Rain or Drizzle ('R' - 2nd digit).
                         Snow or Ice Pellets ('S' - 3rd digit).
                         Hail ('H' - 4th digit).
                         Thunder ('T' - 5th digit).
                         Tornado or Funnel Cloud ('T' - 6th
                         digit).
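
Because every field sits at a fixed character position, one line of an .op file can be decoded by slicing. The sketch below is one possible parser, not a reference implementation. It assumes positions are 1-based and inclusive (STN at 1-6, WBAN at 8-12, and so on), that header lines start with "STN", and that the "missing" sentinels (9999.9, 999.9, 99.99) should become None. The function and dictionary key names are invented for this example.

```python
from typing import Dict, Optional

# (name, start, end, missing sentinel); positions are 1-based and inclusive.
REAL_FIELDS = [
    ("TEMP", 25, 30, 9999.9), ("DEWP", 36, 41, 9999.9),
    ("SLP", 47, 52, 9999.9), ("STP", 58, 63, 9999.9),
    ("VISIB", 69, 73, 999.9), ("WDSP", 79, 83, 999.9),
    ("MXSPD", 89, 93, 999.9), ("GUST", 96, 100, 999.9),
    ("MAX", 103, 108, 9999.9), ("MIN", 111, 116, 9999.9),
    ("PRCP", 119, 123, 99.99), ("SNDP", 126, 130, 999.9),
]
COUNT_FIELDS = [
    ("TEMP_COUNT", 32, 33), ("DEWP_COUNT", 43, 44), ("SLP_COUNT", 54, 55),
    ("STP_COUNT", 65, 66), ("VISIB_COUNT", 75, 76), ("WDSP_COUNT", 85, 86),
]
FLAG_FIELDS = [("MAX_FLAG", 109, 109), ("MIN_FLAG", 117, 117), ("PRCP_FLAG", 124, 124)]
FRSHTT_NAMES = ["fog", "rain", "snow", "hail", "thunder", "tornado"]


def _cut(line: str, start: int, end: int) -> str:
    """Slice a 1-based inclusive column range and trim the padding."""
    return line[start - 1:end].strip()


def parse_record(line: str) -> Optional[Dict[str, object]]:
    """Decode one data line; return None for header or blank lines."""
    if not line.strip() or line.startswith("STN"):
        return None
    line = line.rstrip("\n").ljust(138)  # pad so short lines still slice safely
    rec: Dict[str, object] = {
        "STN": _cut(line, 1, 6),
        "WBAN": _cut(line, 8, 12),
        "YEAR": int(_cut(line, 15, 18)),
        "MODA": _cut(line, 19, 22),
    }
    for name, start, end, missing in REAL_FIELDS:
        text = _cut(line, start, end)
        value = float(text) if text else None
        rec[name] = None if value == missing else value
    for name, start, end in COUNT_FIELDS:
        text = _cut(line, start, end)
        rec[name] = int(text) if text else None
    for name, start, end in FLAG_FIELDS:
        rec[name] = _cut(line, start, end)  # empty string means a blank flag
    bits = _cut(line, 133, 138)
    rec["FRSHTT"] = {name: bit == "1" for name, bit in zip(FRSHTT_NAMES, bits)}
    return rec
```

Station and WBAN numbers are kept as strings so that leading zeros survive; converting the sentinels to None early keeps later averages from being skewed by values such as 9999.9.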

Task 1:
Following the description of the NCDC data format in the Description of Data.txt file, store all the data from the .op.gz files in HDFS. Then load the data from HDFS into two HBase tables, observations and counts. Set the column families of the two tables to info and data, respectively. The counts table stores all the count information from the .op.gz files, and the observations table stores everything else.
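
One way to picture the split between the two tables is to turn each parsed record into two row payloads, one per table. The sketch below is only an illustration under stated assumptions: the input is a dictionary of field names to values, count fields are recognised by a hypothetical "_COUNT" suffix, and the row key (station number + WBAN number + year + month/day) is a layout chosen for this example, not one required by the task. No HBase client is called here.

```python
from typing import Dict, Mapping, Tuple

Row = Tuple[bytes, Dict[bytes, bytes]]


def make_row_key(rec: Mapping[str, object]) -> bytes:
    # Hypothetical key layout: station, WBAN, then date, so that all rows
    # of one station sort next to each other.
    return f"{rec['STN']}{rec['WBAN']}{rec['YEAR']:04d}{rec['MODA']}".encode()


def split_record(rec: Mapping[str, object]) -> Tuple[Row, Row]:
    """Return (observations_row, counts_row) for one parsed record."""
    key = make_row_key(rec)
    observations: Dict[bytes, bytes] = {}
    counts: Dict[bytes, bytes] = {}
    for name, value in rec.items():
        if value is None:
            continue  # leave missing values out instead of storing a sentinel
        cell = str(value).encode()
        if name.endswith("_COUNT"):
            counts[f"data:{name.lower()}".encode()] = cell       # counts table, family "data"
        else:
            observations[f"info:{name.lower()}".encode()] = cell  # observations table, family "info"
    return (key, observations), (key, counts)
```

Each payload is a row key plus a mapping from "family:qualifier" to a byte value, which is the shape a Python client such as happybase expects in Table.put; the actual write, and the choice of client, are left open here.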

Task 2:
An HBase table can be the source or the target of a MapReduce job, or both at once. Read data from the observations and counts tables and use MapReduce to calculate the following results:
Which station has the most records? One row represents one data record (one day's data).

Each station only records part of the days in a year (e.g., in 2016 the station with ID 00702699999 observed only 8 days of data, from June 22 to 29), so you need to count which station has the most total days over the last 100 years.
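
The counting logic for this question can be tried out locally before writing the actual MapReduce job. The sketch below simulates the map, shuffle, and reduce phases in plain Python. It assumes row keys of the hypothetical form station number (6 digits) + WBAN number (5 digits) + YYYYMMDD, so that the first 11 characters identify the station; min_year is an invented parameter for restricting the count to a range of years.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_row(row_key: str, min_year: int) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (station_id, 1) for every daily record in range."""
    station_id = row_key[:11]       # assumed: 6-digit STN + 5-digit WBAN
    year = int(row_key[11:15])      # assumed: 4-digit year follows the station ID
    if year >= min_year:
        yield station_id, 1


def reduce_counts(station_id: str, ones: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: sum the ones emitted for a single station."""
    return station_id, sum(ones)


def run_job(row_keys: Iterable[str], min_year: int) -> Dict[str, int]:
    groups: Dict[str, List[int]] = defaultdict(list)
    for key in row_keys:
        for station_id, one in map_row(key, min_year):
            groups[station_id].append(one)  # stands in for the shuffle step
    return dict(reduce_counts(s, v) for s, v in groups.items())


def busiest_station(totals: Dict[str, int]) -> Tuple[str, int]:
    # Ties are broken by taking the smallest station ID.
    return max(sorted(totals.items()), key=lambda kv: kv[1])
```

In a real job the mapper would receive rows from the observations table and the reducer would write its totals to HDFS or back to HBase; picking the maximum is a small final step over the reducer output.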

Derive one or more conclusions from the dataset through calculation and data processing. Give detailed procedures for the data analytics.