CITS1401 Computational Thinking with Python
Project 1 Semester 2 2020
Page 1 of 6
Project 1:
Submission deadline: 5:00 pm, Friday 18th September 2020
Value: 15% of CITS1401.
To be completed individually.
You should construct a Python 3 program containing your solution to the following problem
and submit your program electronically on Moodle. No other method of submission is
allowed. Your program will be automatically tested on Moodle. Remember that your first two
checks against the tester on Moodle will not incur any penalty. However, each further check
will carry a 10% penalty.
You are expected to have read and understood the University’s guidelines on academic
conduct (http://www.teachingandlearning.uwa.edu.au/staffnet/policies/conduct). In accordance
with this policy, you may discuss with other students the general principles required to
understand this project, but the work you submit must be the result of your own effort.
Plagiarism detection, and other systems for detecting potential malpractice, will therefore be
used. Moreover, if what you submit is not your own work, then you will have learnt little and
will therefore likely fail the final exam.
You must submit your project before the submission deadline listed above. Following UWA
policy, a late penalty of 5% will be deducted for each day (or part day) after the deadline
that the assignment is submitted. No submissions will be allowed more than 7 days after the
deadline, except in approved special consideration cases.
Overview
The Government of Australia established the Bureau of Meteorology (“BOM”) in 1906 under
the Meteorology Act which, at the time, brought together the state meteorological services.
The main objective of the BOM is to provide weather services to Australia and its surrounding
areas.
A research group at UWA requires a computer program that can read data from a CSV
(comma-separated values) file and return different statistical aspects of the rainfall recorded
in the Perth metropolitan area. Your task is to write this program so that it meets the
following specification.
Specification (i.e. what your program will need to do)
Input:
Your program must define the function main with the following signature:
def main(csvfile, year, type):
The input arguments to this function are:
• csvfile is the name of the CSV file containing the record of the rainfall in Perth which needs
to be analysed. The first row of the CSV file will contain the headers. From the second row,
the first value of each row contains the station number of the Bureau of Meteorology, and
the second, third and fourth values will contain the year, month and day respectively. The
last value of the row will contain the recorded rainfall in millimetres. We do not have prior
knowledge about the number of days of rainfall (i.e. the number of rows) that the CSV file
will contain.
• year is the year, or years, to be analysed. If the third input argument requests the
statistical analysis of a particular year, this argument is an integer. Otherwise it is a list
of two integers: the two years for which correlations are to be calculated.
• type specifies which type of analysis is required. It takes exactly one of two string
values: “stats” or “corr”. If the third input argument is “stats”, the program must produce
the statistical analysis of a single year. If it is “corr”, the program must calculate the
correlation between the statistics of the two years.
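One way to read the file described above without importing any module is a single pass that groups each requested year's readings by month. This is an illustrative sketch, not a model solution: the helper name `collect_monthly_rainfall` is hypothetical, and it assumes the header row has as many comma-separated columns as a complete data row, so a shorter row or an empty last field is read as missing rainfall (treated as zero).

```python
def collect_monthly_rainfall(lines, years):
    """Group rainfall readings by year and month in a single pass.

    lines: an iterable of text lines (e.g. an open file object); the first
           line is the header row.
    years: the year(s) of interest, e.g. [2019] or [2019, 2018].
    Returns {year: [jan_readings, ..., dec_readings]} as lists of floats.
    """
    monthly = {year: [[] for _ in range(12)] for year in years}
    rows = iter(lines)
    header = next(rows, "")
    n_columns = len(header.strip().split(","))
    for line in rows:
        line = line.strip()
        if not line:
            continue  # ignore blank lines, e.g. a trailing newline
        fields = line.split(",")
        # Columns 2 and 3 hold the year and month (column 1 is the station)
        year, month = int(fields[1]), int(fields[2])
        if year not in monthly or not 1 <= month <= 12:
            continue
        # A short row or an empty last field means the rainfall is missing,
        # which the specification says to treat as zero
        rain = fields[-1].strip() if len(fields) >= n_columns else ""
        monthly[year][month - 1].append(float(rain) if rain else 0.0)
    return monthly
```

Because iterating over a file yields its lines, main() could pass the open file object directly, e.g. with `[year]` in “stats” mode or the two-year list in “corr” mode, and the rows may appear in any order.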
Output:
The function is required to return the following outputs, in the order given below:
• If the third input is “stats”, a list containing the minimum recorded rainfall (that is greater
than zero) for each month of the year given as the second input argument. If the third input
is “corr”, a single value: the correlation between the monthly minimum recorded rainfalls of
the two years provided as a list in the second input argument.
• In the same fashion, a list of the maximum recorded rainfall for each month, or the
correlation of those maxima.
• In the same fashion, a list of the average recorded rainfall for each month, or the
correlation of those averages.
• In the same fashion, a list of the standard deviations of the recorded rainfall for each
month, or the correlation of those standard deviations.
All returned lists should contain the values for each month of the year, in order from
January to December. All returned numerical values (both in lists and as single values) must
be rounded to four decimal places where rounding is needed. Do not round values during the
calculations; round them only when you store them in the final output variables.
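The four monthly statistics, together with the rule of rounding only at the end, might be sketched as follows. The helper names are hypothetical. The sketch assumes the population standard deviation (dividing by n, the number of days recorded for that month). This appears consistent with the sample output: February 2019 has minimum and maximum 0.2 and average 0.0071, i.e. one 0.2 mm reading over 28 days, which gives 0.0371 when dividing by n but 0.0378 when dividing by n − 1.

```python
def month_statistics(readings):
    """Return (minimum above zero, maximum, mean, population std) for a month.

    readings: the rainfall values present in the file for that month; days
    absent from the file are not invented, so n is the number of readings.
    A month with no rainfall at all yields zero for every statistic.
    """
    n = len(readings)
    if n == 0:
        return 0.0, 0.0, 0.0, 0.0
    positive = [r for r in readings if r > 0]
    minimum = min(positive) if positive else 0.0
    maximum = max(readings)
    mean = sum(readings) / n
    # Population variance: average squared deviation from the mean
    variance = sum((r - mean) ** 2 for r in readings) / n
    return minimum, maximum, mean, variance ** 0.5  # ** 0.5 avoids importing math


def yearly_statistics(monthly_readings):
    """Build the four 12-element output lists, rounding only at the very end."""
    stats = [month_statistics(month) for month in monthly_readings]
    return tuple([round(s[k], 4) for s in stats] for k in range(4))
```

For “corr” mode, the unrounded values from month_statistics would be used for the correlation, so that only the final correlation is rounded.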
Examples:
Download the sample_rainfall_data.csv file from the folder of Project 1 on LMS or Moodle.
Some examples of how you can call your program from the Python shell (and examine the
results it returns) are:
>>> mn1, mx1, avg1, std1 = main('sample_rainfall_data.csv', 2019, "stats")
Your program’s outputs (that will be stored in the variables) are:
>>> mn1
[0.2, 0.2, 0.2, 3.0, 0.2, 0.2, 0.2, 0.2, 0.2, 0.8, 0.2, 0.2]
>>> mx1
[4.8, 0.2, 3.4, 23.4, 7.2, 58.0, 44.2, 28.0, 13.8, 9.0, 13.0, 1.8]
>>> avg1
[0.2323, 0.0071, 0.2, 1.4733, 0.5742, 7.0667, 3.5032, 3.6839, 1.0467, 0.5742, 0.6133, 0.0774]
>>> std1
[0.8946, 0.0371, 0.6584, 4.4946, 1.6568, 12.4251, 8.3301, 7.563, 3.1806, 1.7029, 2.4684, 0.324]
>>> mn2, mx2, avg2, std2 = main('sample_rainfall_data.csv', [2019, 2018], "corr")
Your program’s outputs (that will be stored in the variables) are:
>>> mn2
-0.0916
>>> mx2
0.0677
>>> avg2
0.7229
>>> std2
0.254
Assumptions:
Your program can assume a number of things:
• Anything that is meant to be a string (i.e. header row) will be a string, and anything that is
meant to be a number (i.e. data) will be a number.
• The order of columns in each row will follow the order of the headings provided in the first
row. However, the rows may be in any order (i.e. they will not necessarily be in date order),
and their number is not constant across different testing CSV files.
• No data will be missing in the CSV file except recorded rainfall. You can treat any missing
rainfall value as zero.
• Rainfall is not necessarily recorded on every day of the month or year. Therefore, when
finding the average or standard deviation for a month, do not make assumptions about the
missing days’ data; use only the days that appear in the file.
• If no rainfall is recorded over a month, consider it to be zero. In that case the minimum is
also zero, which cannot otherwise happen because the minimum only counts rainfall greater
than zero.
• The main() function will always be provided with valid input parameters.
• The formulas for standard deviation and correlation are given at the end of this project
sheet.
Important grading instruction:
You will have noticed that you have not been asked to write specific functions. That has been
left to you. However, it is essential that your program defines the top-level
function main(csvfile, year, type). Hereafter it is referred to as “main()” to save space, but
“main()” always means the function defined with its three input arguments. The idea is that
within main(), the program calls the other functions. (Of course, these functions may then
call further functions.) The reason this is essential is that
when your program is tested on Moodle, the testing program will call your main() function.
So, if you fail to define main(), the testing program will not be able to test your program
and your submission will be graded zero. Don’t forget the penalty for repeated submissions
(see the Project 1 Moodle submission page for more information about this).
Things to avoid:
There are a few things for your program to avoid.
• You are not allowed to import any Python module. While using many of these modules,
e.g. csv or math, is perfectly sensible in a production setting, it takes away much of the
point of the project, which is about getting practice opening text files, processing text file
data, and using basic Python structures, in this case lists and loops.
• Do not assume that the input file names will end in .csv. File name suffixes such as .csv and
.txt are not mandatory on systems other than Microsoft Windows. Do not enforce within your
program that the file must end with .csv or any other extension (or try to add an extension
onto the provided csvfile argument); doing so can easily lead to lost marks.
• Ensure your program does NOT call the input() function at any time. Calling the input()
function will cause your program to hang, waiting for input that the automated testing
system will not provide (in fact, if the marking program detects any call to input(), it will not
test your code at all, which may result in a zero grade).
• Your program should also not call the print() function at any time. If it encounters an
error state and exits gracefully, your program needs to return zero for the correlation values
(in “corr” mode) or empty lists (in “stats” mode). At no point should you print the program’s
outputs instead of (or in addition to) returning them, or print the program’s progress in
calculating such outputs.
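The graceful-failure rule above can be kept in one place. In this sketch (hypothetical helper names, no imports), a missing or unreadable file produces the fallback values through the return value rather than a printed message, and the file name is used exactly as given, with no suffix check.

```python
def fallback_result(analysis_type):
    """Values to return when the analysis cannot be completed.

    "corr" mode: four zero correlations; "stats" mode: four empty lists.
    """
    if analysis_type == "corr":
        return 0, 0, 0, 0
    return [], [], [], []


def read_data_lines(csvfile):
    """Return the file's lines, or None if the file cannot be opened.

    The name is used as given: no '.csv' suffix is checked or appended,
    and nothing is printed on failure.
    """
    try:
        with open(csvfile) as infile:
            return infile.readlines()
    except OSError:  # includes FileNotFoundError and PermissionError
        return None
```

main() could then start by calling read_data_lines(csvfile) and return fallback_result(type) when the result is None or contains only the header row.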
Submission:
Submit your solution on Moodle before the deadline.
You need to contact the unit coordinator if you have special considerations or you plan to
make a submission after the due date.
Marking Rubric:
Your program will be marked out of 30 (later scaled to 15% of the final mark).
22 out of 30 marks will be awarded automatically based on how well your program completes
a number of tests, reflecting normal use of the program, and also on how the program handles
various states including, but not limited to, different numbers of rows in the input file, missing
rainfall record(s) and/or any error states. You need to think creatively about what your
program may face. Your submission will be graded using data files other than the provided
data file, so look carefully at corner and worst cases. There are some hidden tests in Moodle,
and the project may also undergo further automated testing after the deadline.
8 out of 30 marks will be awarded manually after the deadline. They will be based on style
(5/8) “the code is clear to read” and efficiency (3/8) “your program is well constructed and
runs efficiently”. For style, think about use of comments, sensible variable names, your name
at the top of the program, etc. (Please look at your lecture notes, where this is discussed.)
Style Rubric:
0 Gibberish, impossible to understand
1-2 Style is really poor or fair
3-4 Style is good or very good, with small lapses
5 Excellent style, really easy to read and follow
Your program will be traversing text files of various sizes (possibly including large CSV files),
so try to minimise the number of times your program looks at the same data items. You may
consider using different data structures such as tuples, lists, or dictionaries.
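If memory is a concern for very large files, one alternative to storing every reading is to keep running totals per (year, month) in a dictionary, so each data item is examined exactly once. This is a sketch with hypothetical helper names. It computes the population variance as the mean of the squares minus the square of the mean; this can differ from a two-pass calculation in the last floating-point digits, so its results should be checked against the sample output before relying on it.

```python
def add_reading(totals, year, month, rain):
    """Fold one reading into the running totals for (year, month) in O(1) time.

    Each entry is [count, total, total_of_squares, min_positive, maximum];
    min_positive stays 0.0 until a reading above zero is seen.
    """
    entry = totals.setdefault((year, month), [0, 0.0, 0.0, 0.0, 0.0])
    entry[0] += 1
    entry[1] += rain
    entry[2] += rain * rain
    if rain > 0 and (entry[3] == 0.0 or rain < entry[3]):
        entry[3] = rain
    if rain > entry[4]:
        entry[4] = rain


def finish(totals, year, month):
    """Turn running totals into (min above zero, max, mean, population std)."""
    count, total, squares, min_positive, maximum = totals.get(
        (year, month), [0, 0.0, 0.0, 0.0, 0.0])
    if count == 0:
        return 0.0, 0.0, 0.0, 0.0  # no rainfall recorded for this month
    mean = total / count
    # E[x^2] - mean^2, clamped at zero to absorb floating-point error
    variance = max(squares / count - mean * mean, 0.0)
    return min_positive, maximum, mean, variance ** 0.5
```

The trade-off is constant memory per month in exchange for a slightly less numerically robust variance formula.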
Efficiency Rubric:
0 Code too incomplete to judge efficiency, or wrong problem tackled
1 Very poor efficiency, additional loops, inappropriate use of readline()
2 Acceptable or good efficiency with some lapses
3 Excellent efficiency, should have no problem on large files, etc.
Automated Moodle testing is being used so that all submitted programs are tested the same
way. However, there is randomness in the testing data. Sometimes a single mistake in your
program means that no tests are passed and you get a zero grade. Remember there is a
penalty for re-submissions, so it is better to check your program thoroughly in Thonny before
attempting to submit it on Moodle.
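Formulas:
Standard deviation is mathematically expressed as:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$
where x_1, …, x_N are the N recorded values and μ is their mean. This is the population
standard deviation (dividing by N), which matches the sample outputs given earlier.
You can find more details at https://en.wikipedia.org/wiki/Standard_deviation
The correlation r_xy for paired data {(x_1, y_1), …, (x_n, y_n)} consisting of n pairs is
mathematically expressed as:
$$r_{xy} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
where x̄ and ȳ are the means of the x and y values respectively.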
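Pearson's correlation coefficient r_xy (the sum of products of deviations from the two means, divided by the square root of the product of the two sums of squared deviations) translates directly into plain Python. In “corr” mode it would be applied to the two years' unrounded 12-element monthly lists (for example, the two lists of monthly minima), rounding only the final value to four decimal places. The function name is hypothetical, and returning 0 when either list has no variation (where r_xy is undefined) is an assumption borrowed from the project's rule of returning zero correlations in error states.

```python
def pearson_correlation(xs, ys):
    """Pearson correlation r_xy of two equal-length sequences of numbers.

    Returns 0 for empty or mismatched input, or when either sequence has
    zero variance (the coefficient is undefined there); this fallback is an
    assumption, not part of the formula.
    """
    n = len(xs)
    if n == 0 or n != len(ys):
        return 0
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Numerator: sum of products of paired deviations from the means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations for each sequence
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        return 0
    return covariance / (spread_x * spread_y) ** 0.5
```

For example, round(pearson_correlation(minima_2019, minima_2018), 4) would give the first output in “corr” mode, where minima_2019 and minima_2018 are hypothetical names for the unrounded monthly minima of each year.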