COMP3311 22T1 – Assignment 1
COMP3311 22T1
Copyright By PowCoder代写 加微信 powcoder
Assignment 1
Queries and Functions on MyMyUNSW
Database Systems
Last updated: Wednesday 16th March 11:25pm
Most recent changes are shown in red …
older changes are shown in brown.
[Assignment Spec] [Database Design] [Schema in SQL] [check1.sql] [Examples] [Fixes+Updates]
This assignment aims to give you practice in
reading and understanding a moderately large relational schema (MyMyUNSW)
implementing SQL queries and views to satisfy requests for information
implementing SQL/PLpgSQL functions to aid in satisfying requests for information
The goal is to build some useful data access operations on the MyMyUNSW
which contains a wealth of information about what happens at UNSW.
One aim of this assignment is to use SQL queries (packaged
as views) to extract such information.
Another is to build SQL/PlpgSQL functions that can support
higher-level activities, such as might be needed in a Web
interface.
A theme of this assignment is “dirty data”. As I was building the database,
using a collection of reports from UNSW’s information systems and the
database for the academic proposal system (MAPPS), I discovered
that there were some inconsistencies in parts of the data (e.g. duplicate
entries in the table for UNSW buildings, or students who
were mentioned in the student data, but had no enrolment records, and, worse,
enrolment records with marks and grades for students who did not exist in
the student data). I removed most of these problems as I discovered them,
but no doubt missed some. Some of the exercises below aim to uncover such
anomalies; please explore the database and let me know if you find
other anomalies.
Submission: Submit required files on moodle
Required Files: ass1.sql (contains both SQL views and SQL/PLpgSQL functions)
Deadline: 21:00 Friday 18 March 21:00 Saturday 19 March
Marks: 15 marks toward your total mark for this course
Late Penalty: 0.1 marks off the ceiling mark for each hour late, no submission is accepted 5 days (120h) after the deadline
How to do this assignment:
read this specification carefully and completely
create a directory for this assignment
copy the supplied files into this directory
login to d.cse and run your PostgreSQL server
load the database and start exploring
complete the tasks below by editing ass1.sql
submit ass1.sql via moodle
Details of the above steps are given below. Note that you can
put the files wherever you like; they do not have to be under
your /localstorage directory. You also edit your SQL files
on hosts other than d.cse. The only time that
you need to use d.cse is to manipulate your database.
Introduction
All Universities require a significant information infrastructure in order
to manage their affairs. This typically involves a large commercial DBMS
installation.
UNSW’s student information system sits behind the MyUNSW web site.
MyUNSW provides an interface to a PeopleSoft enterprise management
system with an underlying Oracle database. This back-end system
(Peoplesoft/Oracle) is sometimes called NSS. The specific version of
PeopleSoft that we use is called Campus Solutions.
Despite its successes, however, MyUNSW/NSS still has a number of
deficiencies, including:
no easy way to swap classes once enrolled
no representation for degree program structures
poor integration with the UNSW Online Handbook
The first point is inconvenient, since it means that the only way for
a student to change tute classes is to drop the course and re-enrol
into the course, selecting th new tute. If the course is already full,
students would be unwilling to drop the course in case someone else
grabs their place before they can re-enrol.
The second point prevents MyUNSW/NSS from being used for three
important operations that would be extremely helpful to students
in managing their enrolment:
finding out how far they have progressed through their degree
program, and what remains to be completed
checking what are their enrolment options for next semester
(e.g. get a list of suggested courses)
determining when they have completed all of the requirements
of their degree program and are eligible to graduate
In this assignment, you will be working with two instances
of a database to hold information about academic matters at
The first database instance (mymy1) contains data from 2000 to 2015.
The second database instance (mymy2) will contain data from 2014 to 2019.
Note that all People data about students, and much of the
People data about staff is synthetic.
Doing this Assignment
The following sections describe how to carry out this assignment.
Some of the instructions must be followed exactly; others require
you to exercise some discretion. The instructions are targetted at
people doing the assignment on d.cse. If you plan to work on this
assignment at home on your own computer, you’ll need to adapt the
instructions to local conditions.
If you’re doing your assignment on the CSE machines,
some commands must be carried out on d.cse,
while others can (and probably should) be done on a
CSE machine other than d.cse.
In the examples below, we’ll use vxdb$ to
indicate that the comand must be done on d.cse
and cse$ to indicate that it can be done elsewhere.
Setting Up
The first step in setting up this assignment is to set up a
directory to hold your files for this assignment.
cse$ mkdir /my/dir/for/ass1
cse$ cd /my/dir/for/ass1
cse$ cp /home/cs3311/web/22T1/assignments/ass1/ass1.sql ass1.sql
… or …
vxdb$ cd /localstorage/YourZid
vxdb$ cp /home/cs3311/web/22T1/assignments/ass1/ass1.sql ass1.sql
The next step is to set up your database:
… login to d.cse and source env as usual …
vxdb$ dropdb mymy … if you already had such a database
vxdb$ createdb mymy
vxdb$ bzcat /home/cs3311/web/22T1/assignments/ass1/mymy1.dump.bz2 | psql mymy …or mymy2.dump.bz2
vxdb$ psql mymy
… examine the database contents …
Note that the database dump is quite large.
It’s not worth copying it into your assignment directory
on the CSE servers, because you only need to read it
once to build your database (see below).
The database loading should take less than 60 seconds on d.cse, assuming
that d.cse is not under heavy load.
(If you leave your assignment until the last minute, loading the database
on d.cse will be considerably slower, thus delaying your work even more.
The solution: at least load the database Right Now, even if you
don’t start using it for a while.)
(Note that the mymy1.dump file is 50MB in size; copying it under
your home directory is not a good idea).
If you have other large databases under your PostgreSQL server on d.cse
or you have large files under your /localstorage/YOU/ directory,
it is possible that you will exhaust your d.cse disk quota. In particular,
you will not be able to store two copies of the MyMyUNSW database under
your d.cse server. The solution: remove any existing databases before
loading your MyMyUNSW database.
If you’re running PostgreSQL at home, the file
ass1.zip contains
copies of the files: mymy1.dump.bz2, mymy2.dump.bz2, ass1.sql
to get you started.
If you copy ass1.zip to your home computer, unzip it,
and perform commands analogous to the above, you should have a
copy of the MyMyUNSW database that you can use at home to do this
assignment.
Think of some questions you could ask on the database (e.g. like
the ones in Prac Exercises and Online Exercises) and work out
SQL queries to answer them.
One useful query is
mymy=# select * from dbpop();
This will give you a list of tables and the number of tuples in each.
Some tables are empty, and are not relevant to this assignment.
They are included for the sake of “completeness”, i.e. to show
what kinds of data might be stored in a real database to replace
MyUNSW/NSS.
Your Tasks
Answer each of the following questions by typing SQL or PLpgSQL
into the ass1.sql file.
You may find it convenient to work on each question in a temporary
file, so that you don’t have to keep loading all of the
other views and functions each time you change the one you’re
working on.
Note that you can add as many auxuliary views and functions to
ass1.sql as you want.
However, make sure that everything that’s required to
make all of your views and functions work ends up in the
ass1.sql file before you submit.
Q1 (1 mark)
Define an SQL view Q1(unswid,name) that gives the student id and
name of any student who has enrolled in more than 4 distinct programs at UNSW.
The name should be take from the People.name field for the
student, and the student id should be taken from People.unswid.
Q2 (1 marks)
Define an SQL view Q2(unswid,name,course_cnt) that gives the unswid
and name of the person(s) who has been course tutor of the most courses at UNSW
and the number of courses they have been course tutor for. In the database,
the course tutor has the role of Course Tutor.
Q3 (1 marks)
Define an SQL view Q3(unswid,name) that gives all the distinct
international students who have enrolled in the course offered by the School
of Law (refers to the OrgUnits.Name) and got a mark higher than 85.
Q4 (1 marks)
Define a SQL view Q4(unswid,name) that gives all the distinct local
students who enrolled in COMP9020 and COMP9331 (refer to Subjects.code) in the same term.
Q5 (2 marks)
For the mymy1 database instance … Define an SQL view Q5a(term,min_fail_rate)
that gives the term and the minimum fail rate of the course COMP3311 from year 2009 to year 2012.
For the mymy2 database instance … Define an SQL view Q5b(term,min_fail_rate)
that gives the term and the minimum fail rate of the course COMP3311 from year 2016 to year 2019.
The term should be taken from Terms.name;
Only count the students with valid marks (not null), fail rate = (number of students with mark less than 50 / number of students with mark);
Round min_fail_rate to the nearest 0.0001. (i.e. if minimum fail rate = 0.01,
then return 0.0100; if minimum fail rate = 0.01234, then return 0.0123; if minimum fail rate = 0.02345, then return 0.0235). This rounding behaviour is different from the IEEE 754 specification for floating point rounding which PostgreSQL uses for float/real/double precision types. PostgreSQL only performs this type of rounding for numeric and decimal types.
Q6 (1 mark)
Define an SQL function (SQL, not PLpgSQL) called Q6(id integer, code text) that takes as
parameters: a People.id value (i.e. an internal database identifier) and a Subjects.code
value (i.e. a subject code), and returns the student’s mark for the course with the given
subject code. If either id or code is invalid, return NULL as the result.
The function must use the following interface:
create or replace function Q6(id integer, code text) returns integer
Q7 (1 marks)
Define an SQL function (SQL, not PLpgSQL) called Q7(year integer, session text) that
takes as parameters the year (e.g. 2019) and a session (aka term e.g. ‘T1’), and returns a list of
all the postgraduate COMP courses (refers to Subjects.code starting with COMP) offered at
the given year and session. An postgraduate course is the one whose Subjects.career is PG.
The function must use the following interface:
create or replace function Q7(year integer, session text)
returns table (code text)
Q8 (2 marks)
Define a PLpgSQL function Q8(zid integer), which
takes a student zid (People.unswid) and produces a term transcript as a table of
TermTranscriptRecords.
Each transcript record should contain information about
the student’s attempt of a term.
Records should appear ordered chronologically by term.
Use the following definition for the transcript tuples:
create type TermTranscriptRecord as (
term char(4), — term code (e.g. 98s1)
termwam integer, — numeric term WAM acheived
termuocpassed integer — units of credit passed this term
Note that this type is already defined in the database. term in a record can be obtained by termName(terms.id).
But it is text type. Use CAST (termName(terms.id) AS char(4)) to convert it into char(4) type.
Only count a UOC value when calculating termuocpassed if the student actually passed the
course (i.e. has a grade from the set
{SY, PT, PC, PS, CR, DN, HD, A, B, C})
or has an XE grade (for credit from exchange) or a T grade
(transferred credit) or a PE grade (professional experience) or a RC or RS
grade (research courses)).
A null grade or any grade other than those just mentioned should not
be treated as a pass. (For simplicity, grade GP and EC are treated as unpassed)
Only results with associated marks are included in the termwam calculation. If no mark or grade is available for a course (a null mark or grade),
do not consider it when calcualting termwam. (i.e. grade SY, PT, XE, T, PE, RC and RS…or other grade-only results,
they normally have a null mark).
termuocpassed and termwam
are computed as follows:
termuocpassed = ΣUOC for all courses passed in this term
termwam = Σ(M*U)/ΣU for all courses completed (marked) in this term
M = mark received in a course
U = units of credit for a course
Round the WAM value to the nearest integer and refer to the note of Q5. (i.e. the nearest integer of 10.49 is 10; the nearest integer of 10.50 is 11)
If no termwam or termwam is 0 this term, set termwam as null;
If no courses have been passed this term, set termuocpassed as null. Examples can be found in Example page.
At the end of the transcript, add an extra TermTranscriptRecord
which contains
(‘OVAL’, overallwam, overalluocpassed)
where the overalluocpassed and overallwam
are computed as follows:
overalluocpassed = ΣUOC for all courses passed
overallwam = Σ(M*U)/ΣU for all courses completed (marked)
M = mark received in a course
U = units of credit for a course
The calculation rules are similar between the term and overall.
Round the WAM value to the nearest integer.
If no overallwam or overallwam is 0, just set overallwam as null;
If no courses have been passed overall, set overalluocpassed as null.
For simplicity, ignore the situiation of degree changes. If a student zid is invalid, return an empty table.
See the example in Example page.
How can you find interesting transcripts to look at?
The answer is to think of some properties of a transcript that
might make it interesting, and then ask a query to get information
about any students who have these kinds of transcripts.
I used the following query to find some students.
Work out what it does and then try variations to find other kinds
of “interesting” students:
select p.unswid,pr.code,termName(min(pe.term)),count(*)
from People p
join Program_enrolments pe on (pe.student=p.id)
join Programs pr on (pe.program=pr.id)
where pr.code like ‘3%’
group by p.unswid,pr.code
having count(*) > 5;
Q9 (2 marks)
An important part of defining academic rules in MyMyUNSW is the ability
to define groups of academic objects (e.g. groups of subjects, streams
or programs)
In MyMyUNSW, groups can be defined in three different ways:
enumerated by giving a list of objects in a X_members table
pattern by giving a pattern that identifies all relevant objects
query by storing an SQL query which returns a set of object ids
In all cases, the result is a set of academic objects of a specific type
(given in the gtype attribute).
Write a PLpgSQL function Q9(gid integer) that takes the internal
ID of an academic object group and returns the codes for all members
of the academic object group, including any child groups.
Associated with each code should be the
type of the corresponding object, either subject, stream
or program.
You should return distinct codes (i.e. ignore multiple
versions of any object), and there is no need to check whether the academic
object is still being offered.
The function is defined as follows:
create or replace function Q9(gid integer) returns setof AcObjRecord
where AcObjRecord is already defined in the database as follows:
create type AcObjRecord as (
objtype text, — academic object’s type e.g. subject, stream, program
objcode text — academic object’s code e.g. COMP3311, SENGA1, 3978
Groups of academic objects are defined in the tables:
acad_object_groups(id, name, gtype, glogic, gdefby, negated, parent, definition)
where the most important fields are:
gtype … what kind of objects in the group
gdefby … how the group is defined
definition … where queries or patterns are given
program_group_members(program, ao_group) … for enumerated program groups
stream_group_members(stream, ao_group) … for enumerated stream groups
subject_group_members(subject, ao_group) … for enumerated subject groups
There are a wide variety of patterns. You should explore the acad_object_groups
table yourself to see what’s available. To give you a head start, here are some
existing patterns and what they mean:
COMP2### … any level 2 computing course (e.g. COMP2911, COMP2041)
COMP[34]### … any level 3 or 4 computing course (e.g. COMP3311, COMP4181)
####1### … any level 1 course at UNSW
(COMP|SENG|BINF)2### … any level 2 course from CSE
COMP1917,COMP1927 … core first year computing courses
COMP1###,COMP2### … same as COMP[12]###
You do not need to handle any of the following types of academic object groups:
any groups defined using a query (gdefby=’query’)
any groups defined using negation (negated=true)
any groups defined by a pattern which includes ‘FREE’, ‘GEN’ or ‘F=’ as a substring
For any group like the above, simply return no reults (zero rows).
You can also ignore the glogic field; treat them all as or
If any group has a child group containing FREE, ignore just the child group.
For pattern groups, you do not need to check whether codes used in the
pattern correspond to real objects in the relevant table (e.g. a pattern
string may contain a subject code which does not exist in the Subjects
Your function should be able to expand any pattern element from the above classes
of patterns (i.e. pattern elements that include #, […] and
(…|…)).
If patterns include {xxx;yyy;zzz} alternatives, include all of the
alternatives as group members (i.e. as if they were xxx,yyy,zzz).
If patterns have child patterns, include all of the acdemic objects in the
child patterns.
You can recognise that a group with id=X has child groups,
by the existnce of other academic groups with parent=X.
Hint: In order to solve this, you’ll probably need to look in the
PostgreSQL manual at Section 9.4 “String Functions and Operators” and
Section 42.5.4 “Executing Dynamic Commands”.
Q10 (1 marks)
Define a PLpgSQL function that takes a subject code and
returns the set of all subjects that include this subject
in their pre-reqs.
This kind of function could help in planning what courses
you could study in subsequent semesters.
The function is defined as follows:
create or replace function Q10(code text) returns setof text …
You only need to consider literal subject codes (e.g. COMP1234) in the pre-reqs.
If a pre-req object group contains a pattern, ignore the pattern.
Hint: This function can probably make use of (a variation of) the
Q9() function.
Submission and Testing
We will test your submission as follows:
create a testing subdirectory
create a new database TestingDB and initialise it with mymy1.dump
run the command: psql TestingDB -f ass1.sql (using your ass1.sql)
load and run the tests in the check1.sql script
repeat the above for mymy2.dump and check2.sql
Note that there is a time-limit on the execution of queries.
If any query takes longer than 60 seconds to run (you can check
this using the \timing flag) your mark for that query
will be reduced.
Your submitted code must be complete so that when we do the
above, your ass1.sql will load without errors, which
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com