COMP606 Assignment 2
Data Cleaning Procedure
Each dataset has a different temporal resolution, which varies between monitoring stations and depends on the purpose of data collection. Some datasets record at hourly intervals, while others record every 10 minutes or every 3 minutes. Regardless of the interval, take the following steps to clean the data:
1) Identify data that are out of the expected range and delete the value (not the entire row). For example:
a. Unusual reading of air temperature (51 °C).
b. Particulate Matter (PM10, PM2.5), NO, NO2, NOX, SO2, and CO concentrations cannot be zero or negative. These values indicate a sensor fault.
c. Humidity cannot be zero, negative or above 100%.
d. Any unexplained outliers (unexplained high measurements).
2) Calculate the 24-hour average.
3) Remove days with null values.
Note: The standard guidelines for calculating a 24-hour average differ for each pollutant. If these guidelines are not taken into account, the data cannot support a scientific conclusion. However, for this assignment, you can include all available data (as long as they meet the above-mentioned criteria) in the 24-hour average calculation.
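
The three steps above could be sketched in plain Python as follows. This is only an illustration under stated assumptions, not a required implementation: the column names (AirTemp, Humidity, PM10), the timestamp format, and the 50 °C temperature ceiling are hypothetical and must be adapted to your dataset. Step 3 is interpreted here as dropping any day whose 24-hour average is missing for at least one column. Unexplained outliers (item d) need case-by-case judgement and are not handled.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Assumed column names; rename to match the actual dataset headers.
POLLUTANTS = {"PM10", "PM2.5", "NO", "NO2", "NOX", "SO2", "CO"}
AIR_TEMP_MAX = 50.0  # assumed ceiling in °C; the assignment gives no exact limit


def clean_value(column, value):
    """Step 1: return the value if plausible, else None (the row itself is kept)."""
    if value is None:
        return None
    if column in POLLUTANTS and value <= 0:
        return None  # zero or negative concentration indicates a sensor fault
    if column == "Humidity" and not (0 < value <= 100):
        return None
    if column == "AirTemp" and value > AIR_TEMP_MAX:
        return None
    return value


def daily_averages(rows, columns, time_format="%Y-%m-%d %H:%M"):
    """Steps 2 and 3: average valid readings per calendar day, drop incomplete days."""
    by_day = defaultdict(lambda: defaultdict(list))
    for row in rows:
        day = datetime.strptime(row["timestamp"], time_format).date()
        for col in columns:
            value = clean_value(col, row.get(col))
            if value is not None:
                by_day[day][col].append(value)

    result = {}
    for day in sorted(by_day):
        readings = by_day[day]
        averages = {c: mean(readings[c]) if readings[c] else None for c in columns}
        # Keep the day only if every column has an average.
        if all(v is not None for v in averages.values()):
            result[day] = averages
    return result


# Made-up sample readings, for illustration only.
rows = [
    {"timestamp": "2021-03-01 00:00", "PM10": 12.0, "Humidity": 60.0},
    {"timestamp": "2021-03-01 12:00", "PM10": -1.0, "Humidity": 80.0},
    {"timestamp": "2021-03-02 00:00", "PM10": 0.0, "Humidity": 101.0},
    {"timestamp": "2021-03-02 12:00", "PM10": 8.0, "Humidity": 120.0},
]
result = daily_averages(rows, ["PM10", "Humidity"])
```

In this sample, the negative PM10 reading on the first day is discarded but the day is kept, because one valid PM10 reading remains. The second day is removed entirely, since none of its humidity readings are within range. Because readings are grouped by calendar date, the same code works for hourly, 10-minute, or 3-minute data.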