DATA1001 assignment – data dictionary
This data set contains 64,486 anonymised student grades for 14 first-year Maths units of study from 2012 to 2017. It is real data from the University’s student system, supplied by Institutional Analytics and Planning, the department of the University responsible for student data reporting and analysis.
There are two data sets, a ‘standard’ set and an ‘advanced’ set. The fields in each are mostly the same. Below are the definitions of each field:
Standard data set:
· Year: The academic year in which the unit of study was run. This is an integer between 2012 and 2017. Results from Semesters 1 and 2 have been combined.
· Domestic/Intl: Whether the student is a domestic or an international student. “D” denotes a domestic student and “I” denotes an international student.
· Gender: The gender of the student. “M” denotes male students and “F” denotes female students. To preserve anonymity, students who identify as neither male nor female have been coded as female. This is the same approach that is used by the Department of Education when reporting aggregated student statistics: see the notes on ‘Gender’ at http://highereducationstatistics.education.gov.au/DataNotes.aspx
· Mode: Whether the student is full time or part time. Full time is defined as taking 18 or more credit points in the semester in which the student took the unit of study.
· Age: The age of the student at the time that they undertook the unit of study. This is reported as one of four bands:
    · 18 and under
    · 19-21
    · 22-25
    · Over 25
· Unit of Study: The “name” of the unit of study. To preserve anonymity, this is not the actual unit of study code but a made-up identifier such as “Unit A”, “Unit B” and so forth. You can assume that each of these identifiers relates to a junior unit of study offered by the School of Mathematics and Statistics between 2012 and 2017.
· Unit of Study Level: The level of the unit of study – either fundamental, mainstream or advanced.
· Unit of Study Grade: The final grade achieved by the student in the unit of study:
    · FA: Fail (0-49)
    · PS: Pass (50-64)
    · CR: Credit (65-74)
    · DI: Distinction (75-84)
    · HD: High Distinction (85-100)
Grades other than these five descriptors (for example, discontinuations, withdrawals, absent fails and the like) have been removed from the dataset to preserve anonymity.
· Count: The number of grades that share the combination of values in the preceding columns.
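Because each row of the standard data set is a combination of attribute values rather than a single grade, totals and proportions have to be weighted by Count; counting rows only gives the number of distinct combinations. A minimal sketch using only the Python standard library, assuming the rows have already been read into dictionaries keyed by the column names above (the exact header strings in the supplied files, and the made-up rows themselves, are assumptions for illustration):

```python
from collections import Counter

# Grade codes from lowest to highest, as listed in the data dictionary.
GRADE_ORDER = ["FA", "PS", "CR", "DI", "HD"]

# Hypothetical rows in the shape of the standard data set. The values are
# made up for illustration, and columns not used here are omitted.
rows = [
    {"Year": 2015, "Gender": "F", "Unit of Study": "Unit A", "Unit of Study Grade": "CR", "Count": 12},
    {"Year": 2015, "Gender": "M", "Unit of Study": "Unit A", "Unit of Study Grade": "CR", "Count": 9},
    {"Year": 2015, "Gender": "F", "Unit of Study": "Unit A", "Unit of Study Grade": "FA", "Count": 3},
    {"Year": 2016, "Gender": "M", "Unit of Study": "Unit A", "Unit of Study Grade": "HD", "Count": 6},
    {"Year": 2016, "Gender": "F", "Unit of Study": "Unit B", "Unit of Study Grade": "PS", "Count": 4},
]


def grade_distribution(rows, unit):
    """Number of grades of each type in `unit`, weighting every row by its Count."""
    totals = Counter()
    for row in rows:
        if row["Unit of Study"] == unit:
            totals[row["Unit of Study Grade"]] += row["Count"]
    # Counter returns 0 for grades that never appear, so every grade is listed.
    return {grade: totals[grade] for grade in GRADE_ORDER}


def pass_rate(rows, unit):
    """Share of grades in `unit` that are PS or better, i.e. anything but FA."""
    dist = grade_distribution(rows, unit)
    total = sum(dist.values())
    return (total - dist["FA"]) / total if total else float("nan")


print(grade_distribution(rows, "Unit A"))  # {'FA': 3, 'PS': 0, 'CR': 21, 'DI': 0, 'HD': 6}
print(pass_rate(rows, "Unit A"))           # 0.9
```

Here Unit A appears in only four rows but accounts for 30 grades, which is why every total is built from Count rather than from the number of rows.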
Advanced data set:
· The advanced data set is the same as the standard data set, except that the “Count” column has been replaced by a “Student Identifier” column, which is a hashed representation of the student’s ID number. In this data set, each row is a single unit of study grade.
· This column links grades belonging to the same student across units of study, so that deeper relationships can be explored: for example, how a student’s performance in one unit of study relates to their performance in another.
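Pairing a student’s grades across two units amounts to grouping the advanced data set by Student Identifier. A minimal sketch using only the Python standard library, assuming rows are dictionaries keyed by the column names above; the identifiers and grades are made up, and keeping a student’s highest grade when they have more than one grade in the same unit (for example, after repeating it) is an assumption of this sketch, not a rule from the data set:

```python
from collections import Counter

# Grade codes from lowest to highest, as listed in the data dictionary.
GRADE_ORDER = ["FA", "PS", "CR", "DI", "HD"]
GRADE_RANK = {grade: rank for rank, grade in enumerate(GRADE_ORDER)}

# Hypothetical rows in the shape of the advanced data set: one row per grade.
# Identifiers and grades are made up; other columns are omitted for brevity.
rows = [
    {"Student Identifier": "a1f3", "Unit of Study": "Unit A", "Unit of Study Grade": "CR"},
    {"Student Identifier": "a1f3", "Unit of Study": "Unit B", "Unit of Study Grade": "DI"},
    {"Student Identifier": "9c07", "Unit of Study": "Unit A", "Unit of Study Grade": "FA"},
    {"Student Identifier": "9c07", "Unit of Study": "Unit A", "Unit of Study Grade": "PS"},
    {"Student Identifier": "9c07", "Unit of Study": "Unit B", "Unit of Study Grade": "PS"},
    {"Student Identifier": "5e2b", "Unit of Study": "Unit B", "Unit of Study Grade": "HD"},
]


def best_grades(rows, unit):
    """Map each student to their highest grade in `unit` (assumed tie-break rule)."""
    best = {}
    for row in rows:
        if row["Unit of Study"] != unit:
            continue
        sid, grade = row["Student Identifier"], row["Unit of Study Grade"]
        if sid not in best or GRADE_RANK[grade] > GRADE_RANK[best[sid]]:
            best[sid] = grade
    return best


def paired_grades(rows, unit_x, unit_y):
    """Count (grade in unit_x, grade in unit_y) pairs for students who took both units."""
    x, y = best_grades(rows, unit_x), best_grades(rows, unit_y)
    return Counter((x[sid], y[sid]) for sid in x.keys() & y.keys())


pairs = paired_grades(rows, "Unit A", "Unit B")
print(pairs[("CR", "DI")], pairs[("PS", "PS")])  # 1 1
```

Student 5e2b took only Unit B and so drops out of the pairing; with the real data, the resulting cross-tabulation of grade pairs is the starting point for asking whether strong results in one unit tend to go with strong results in another.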