The deadline for project 1 is:
Fri 08 Nov, 5:00 pm
1. Aims
Project 1 SQL
This project aims to give you practice in
reading and understanding a moderately large relational schema MyMyUNSW.
implementing SQL queries and views to satisfy requests for information.
The goal is to build some useful data access operations on the MyMyUNSW database. A
theme of this project is dirty data. As I was building the database, using a collection of reports from UNSWs information systems and the database for the academic proposal system MAPPS, I discovered that there were some inconsistencies in parts of the data e.g. duplicate entries in the table for UNSW buildings, or students who were mentioned in the student data, but had no enrolment records, and, worse, enrolment records with marks and grades for students who did not exist in the student data. I removed most of these problems as I discovered them, but no doubt missed some. Some of the exercises below aim to uncover such anomalies; please explore the database and let me know if you find other anomalies.
2. Howtodothisproject:
Read this specification carefully and completely
Familiarize yourself with the database schema description, SQL schema, summary
Make a private directory for this project, and put a copy of the proj1.sql template there
You must use the create statements in proj1.sql when defining your solutions
Look at the expected outputs in the expectedqX tables loaded as part of the check.sql file
Solve each of the problems below, and put your completed solutions into proj1.sql
Check that your solution is correct by verifying against the example outputs and by using
the checkqX functions
Test that your proj1.sql file will load without error into a database containing just the
original MyMyUNSW data
Doublecheck that your proj1.sql file loads in a single pass into a database containing just
the original MyMyUNSW data
Submit the project via give
PLpgSQL functions are not allowed to use in this project
For each question, you must output result within 120 seconds on Grieg server.
3. Introduction
All Universities require a significant information infrastructure in order to manage their affairs. This typically involves a large commercial DBMS installation. UNSWs student information system sits behind the MyUNSW web site. MyUNSW provides an interface to a PeopleSoft enterprise management system with an underlying Oracle database. This backend system PeoplesoftOracle is often called NSS.
UNSW has spent a considerable amount of money 80M on the MyUNSWNSS system, and it handles much of the educational administration plausibly well. Most people gripe about the quality of the MyUNSW interface, but the system does allow you to carry out most basic enrolment tasks online.
Despite its successes, MyUNSWNSS still has a number of deficiencies, including:
no waiting lists for course or class enrolment
no representation for degree program structures
poor integration with the UNSW Online Handbook
The first point is inconvenient, since it means that enrolment into a full course or class becomes a sequence of trialanderror attempts, hoping that somebody has dropped out just before you attempt to enrol and that noone else has grabbed the available spot.
The second point prevents MyUNSWNSS from being used for three important operations that would be extremely helpful to students in managing their enrolment:
finding out how far they have progressed through their degree program, and what remains to be completed
checking what are their enrolment options for next semester e.g. get a list of suggested courses
determining when they have completed all of the requirements of their degree program and are eligible to graduate
NSS contains data about student, courses, classes, prerequisites, quotas, etc. but does not contain any representation of UNSWs degree program structures. Without such information in the NSS database, it is not possible to do any of the above three. So, in 2007 the COMP9311 class devised a data model that could represent program requirements and rules for UNSW degrees. This was built on top of an existing schema that represented all of the core NSS data students, staff, courses, classes, etc.. The enhanced data model was named the MyMyUNSW schema.
The MyMyUNSW database includes information that encompasses the functionality of NSS, the UNSW Online Handbook, and the CATS room allocation database. The MyMyUNSW data model, schema and database are described in a separate document.
4. SettingUp
To install the MyMyUNSW database under your Grieg server, simply run the following two commands:
createdb proj1
psql proj1 f homecs9311web19T3projproj1mymyunsw.dump
If youve already set up PLpgSQL in your template1 database, you will get one error message as the database starts to load:
psql:mymyunsw.dump:NN: ERROR: language plpgsql already exists
You can ignore this error message, but any other occurrence of ERROR during the load needs to be investigated.
If everything proceeds correctly, the load output should look something like:
SET
SET
SET
SET
SET
psql:mymyunsw.dump:NN: ERROR: language plpgsql already exists … if PLpgSQL is not already defined,
… the above ERROR will be replaced by CREATE LANGUAGE SET
SET
SET
CREATE TABLE
CREATE TABLE
… a whole bunch of these
CREATE TABLE
ALTER TABLE
ALTER TABLE
… a whole bunch of these
ALTER TABLE
Apart from possible messages relating to plpgsql, you should get no error messages. The database loading should take less than 60 seconds on Grieg, assuming that Grieg is not under heavy load. If you leave your project until the last minute, loading the database on Grieg will be considerably slower,
thus delaying your work even more. The solution: at least load the database Right Now, even if you dont start using it for a while. Note that the mymyunsw.dump file is 50MB in size; copying it under your home directory or your srvr directory is not a good idea.
If you have other large databases under your PostgreSQL server on Grieg or you have large files under your srvrYOU directory, it is possible that you will exhaust your Grieg disk quota. In particular, you will not be able to store two copies of the MyMyUNSW database under your Grieg server. The solution: remove any existing databases before loading your MyMyUNSW database.
If you are running PostgreSQL at home, you can download the files: mymyunsw.dump, proj1.sql to get you started. You can grab the check.sql separately, once it becomes available.
A useful thing to do initially is to get a feeling for what data is actually there. This may help you understand the schema better, and will make the descriptions of the exercises easier to understand. Look at the schema. Ask some queries. Do it now.
Examples …
psql proj1
… PostgreSQL welcome stuff … proj1 d
… look at the schema … proj1 select from Students;
… look at the Students table …
proj1 select p.unswid,p.name from People p join Students s on p.ids.id;
… look at the names and UNSW ids of all students …
proj1 select p.unswid,p.name,s.phone from People p join Staff s on p.ids.id; … look at the names, staff ids, and phone s of all staff …
proj1 select count from CourseEnrolments;
… how many course enrolment records …
proj1 select from dbpop;
… how many records in all tables …
proj1 select from transcript3197893;
… transcript for student with ID 3197893 …
proj1 … etc. etc. etc.
proj1 q
You will find that some tables e.g. Books, Requirements, etc. are currently unpopulated; their contents are not needed for this project. You will also find that there are a number of views and functions defined in the database e.g. dbpop and transcript from above, which may or may not be useful in this project.
Summary on Getting Started
To set up your database for this project, run the following commands in the order supplied: createdb proj1
psql proj1 f homecs9311web19T3projproj1mymyunsw.dump
psql proj1
… run some checks to make sure the database is ok
mkdir Project1Directory
… make a working directory for Project 1
cp homecs9311web19T3projproj1proj1.sql Project1Directory
The only error messages produced by these commands should be those noted above. If you omit any of the steps, then things will not work as planned.
Notes
Read these before you start on the exercises:
the marks reflect the relative difficultylength of each question
use the supplied proj1.sql template file for your work
you may define as many additional functions and views as you need, provided that a the
definitions in proj1.sql are preserved, b you follow the requirements in each question on
what you are allowed to define
make sure that your queries would work on any instance of the MyMyUNSW schema;
dont customize them to work just on this database; we may test them on a different
database instance
do not assume that any query will return just a single result; even if it phrased as most or
biggest, there may be two or more equally big instances in the database
when queries ask for peoples names, use the Person.name field; its there precisely to
produce displayable names
when queries ask for student ID, use the People.unswid field; the People.id field is an internal numeric key and of no interest to anyone outside the database
unless specifically mentioned in the exercise, the order of tuples in the result does not matter; it can always be adjusted using order by. In fact, our check.sql will order your results automatically for comparison.
the precise formatting of fields within a result tuple does matter; e.g. if you convert a number to a string using tochar it may no longer match a numeric field containing the same value, even though the two fields may look similar
develop queries in stages; make sure that any subqueries or subjoins that youre using actually work correctly before using them in the query for the final viewfunction
You can define either SQL views OR SQL functions to answer the following questions.
If you meet with error saying something like cannot change name of view column, you can
drop the view you just created by using command drop view VIEWNAME cascade; then create your new view again.
Each question is presented with a brief description of whats required. If you want the full details of the expected output, take a look at the expectedqX tables supplied in the checking script.
5. Tasks
To facilitate the semiauto marking, please pack all your SQL solutions into view or function as defined in each problem see details from the solution template we provided.
Q1 4 marks
Define an SQL view Q1unswid,longname that gives the distinct room id and name of any room that is Airconditioned refers to the facilities.description. The view should return the following details about each room:
unswid should be taken from Rooms.unswid field.
longname should be taken from Rooms.longname field.
Q2 4 marks
Define a SQL view Q2unswid,name that displays unswid and name of all the distinct staff who taught the student Hemma Margareta. The view should return the following details about each staff:
unswid should be taken from People.unswid field.
name should be taken from People.name field.
Q3 4 marks
Define a SQL view Q3unswid,name that gives all the distinct international students who enrolled in COMP9311 and COMP9024 in the same semester and got HD Courseenrolments.mark 85 in both courses. The view should return the following details about each student:
unswid should be taken from People.unswid field.
name should be taken from People.name field.
Q4 4 marks
Define a SQL view Q4numstudent that gives the number of distinct students who get more HD Courseenrolments.mark 85 than average student. For example, if on average a student gets 3 HDs, we only count students who get more than 3 HDs.
Note:
when calculating the average, only consider students who have at least one not null mark.
Q5 5 marks
Define a SQL view Q5code,name,semester that displays the courses with the lowest maximum mark in each semester. The view should return the following details about each course:
code should be taken from Subjects.code field.
name should be taken from Subjects.name field.
semester should be taken from Semesters.name field.
Note:
only consider valid courses which have at least 20 not null mark records.
if several courses have the same lowest maximum mark in one semester, return all of them.
skip the semester with no valid course.
Q6 5 marks
Define SQL view Q6num, which gives the number of distinct local students enrolling in 10S1 in Management stream but never enrolling in any course offered by Faculty of Engineering.
Note:
the student IDs are the UNSW ids i.e. student numbers defined in the People.unswid field.
Do not count duplicate records.
Q7 6 marks
Database Systems course admins would like to know the average mark of each semester. Define a SQL view Q7year, term, averagemark to help them monitor the average mark each semester. Each tuple in the view should contain the following:
the year Semesters.year of the semester
the term Semesters.term
the average mark of students enrolled in this course this semester as numeric4,2
Database Systems has value Database Systems in the Subjects.name field. Y ou can find the information about all the course offerings for a given subject from Courses. You should calculate the average mark of enrolled students for a course offering from the table Courseenrolments. Note:
There are two subjects that share the same name Database Systems, and we do not distinguish them in this question. In consequence, you may find more than one course for a single semester. In such case, there is no student enrolling in more than one course.
When calculating the average marks, only consider not null mark records.
Only consider the semesters which have Database Systems.
Q8 6 marks
The head of school would like to know the performance of students in a set of CSE subjects. A subject in this set has two properties: 1 its subject code must start with COMP93, and 2 it must be offered in every major semester i.e., S1 and S2 from 2004 inclusive to 2013 inclusive. The head of school requests a list of students who failed every subject in the set. We say a student fails a subject if heshe had received a not null mark 50 for at least one course offering of the subject. Define a SQL view Q8zid, name for the head of school. Each tuple in the view should contain the following:
zid z concatenating with People.unswid field
student name taken from the People.name field
Note:
For a given subject, the number of course offerings that a student can enroll in is at least one. For example, one may fail the course offering of a subject in 03S1, and then reenroll in another course offering of the same subject in 04S1.
Assume there are two subjects in the set, A and B. We only count students who failed both A and B, but not students who failed either A or B.
Q9 6 marks
Define SQL view Q9unswid,namethat gives all the distinct students who are satisfied the following conditions in one program:
enroll a program in BSc refer to programdegrees.abbrev
must pass at least one course in the program in semester 2010 S2.
average mark 80. Average mark means the average mark of all courses a student has
passed before 2011exclusive in the program.
the total UOC refer to subjects.uoc earned in the program should be no less than the
required UOC of the program refer to programs.uoc. A student can only earn the UOC
of the courses heshe passed.
The view should return the following details about each student:
before 2011 exclusive
Note:
unswid should be taken from People.unswid field. name should be taken from People.name field.
to pass a course, a student must get at least 50 in that course Courseenrolments.mark 50.
if a student has enrolled into several different programs, you need to calculate the UOC and average mark separately according to different programs. A course belongs to a program if this student enrolls into course and program in a same semester refer to semesters.id.
Q10 6 marks
The university is interested in the Lecture Theatre usage status in 2011 S1. So please define SQL view Q10unswid,longname,num,rank, which gives the distinct room id and name of all the Lecture Theatre with the number of distinct classes that use this theatre and the rank. Theatre rankings are ordered by num from highest to lowest. If there are multiple theatres with the same num, they have the same rank. The view should return the following details about each theatre:
unswid should be taken from Rooms.unswid field.
longname should be taken from Rooms.longname field.
num counts the total number of distinct classes that use this theatre.
rank records the rank of the number of distinct classes that use this theatre.
Note:
if there is no class using a theatre, the num would be 0.
the ranking is with gaps. i.e., if there are 2 theatres ranked as first, the third theatre will be ranked as third.
6. Submission
You can submit this project by doing the following:
The file name should be proj1.sql.
Log into the CSE server, ensure that you are in the directory containing the file to be
submitted.
Type give cs9311 proj1 proj1.sql to submit.
You can also use the web give system to submit.
If you submit your project more than once, the last submission will replace the previous one
In case that the system is not working properly, you must take the following actions:
Please keep a screen capture including timestamp and the size of the submitted file for
your submissions as proof. If you are not sure how, please have a look at the guidelines.
Please keep a copy of your submitted file on the CSE server. If you are not sure how, please
have a look at taggi.
The proj1.sql file should contain answers to all of the exercises for this project. It should be completely
selfcontained and able to load in a single pass, so that it can be autotested as follows:
a fresh copy of the MyMyUNSW database will be created using the schema from mymyunsw.dump
the data in this database may be different from the database that youre using for testing
a new check.sql file may be loaded with expected results appropriate for the database
the contents of your proj1.sql file will be loaded
each checking function will be executed, and the results recorded
Before you submit your solution, you should check that it will load correctly for testing by using something like the following operations:
dropdb proj1 … remove any existing DB
createdb proj1 … create an empty database
psql proj1 f homecs9311web19T3projproj1mymyunsw.dump … load the MyMyUNSW schema and data
psql proj1 f homecs9311web19T3projproj1check.sql … load the checking code
psql proj1 f proj1.sql
psql proj1
proj1 select checkq1; …
If your code loads with errors, fix it and repeat the above until it does not.
You must ensure that your proj1.sql file will load correctly i.e. it has no syntax errors and it contains all of your view definitions in the correct order. If I need to manually fix problems with
your proj1.sql file in order to test it e.g. change the order of some definitions, you will be fined via
… load your solution
… check your solution to question1 …checkyoursolutiontoquestion5
proj1 selectcheckq5;
…
proj1 select checkq10;
proj1 select checkall;
Note: if your database contains any views or functions that are not available in a file somewhere, you should put them into a file before you drop the database.
… check your solution to question10 … check all your solutions
half of the mark penalty for each problem. In addition, make sure that your queries are reasonably efficient. For each question, you must output result within 120 seconds on Grieg server. This time restriction applies to the execution of the select from checkQn calls. For each question, you will be fined via half of the mark penalty if your solution cannot output results within 120 seconds.
7. LateSubmissionPenalty
20 reduction for each day late.