Columbia University, COMS4111, Fall 2021
Homework 2
• Due date: Tuesday, October 19, 11:59 pm ET.
You cannot use grace days for this homework. We will post the solutions right after
the deadline so you can prepare for the upcoming midterm exam.
• This is a homework assignment, not a project; you should work on this
assignment individually.
This assignment uses an actual PostgreSQL database with data about a university.
The schema of the database is provided below; you should not make any additional
assumptions about this database besides what is provided in the schema. Primary
keys are in bold face and underlined, field types are omitted. Assume that all
attributes are defined NOT NULL.
• student(sid, sname, sex, age, year, gpa)
• dept(dname, numphds)
• prof(pname, dname, FK(dname)->dept)
• course(cno, cname, dname, FK(dname)->dept)
• major(dname, sid, FK(dname)->dept, FK(sid)->student)
• section(dname, cno, sectno, pname, FK(dname, cno)->course, FK(pname)->prof)
• enroll(sid, grade, dname, cno, sectno, FK(sid)->student, FK(dname,cno,sectno)->section)
We ask that you write the following queries in SQL (see instructions below). Each
question is worth 2 points.
1. Print the sid and name of students who major in ‘Mathematics’ and have
enrolled in at least one course offered by the ‘Chemical Engineering’
department and not enrolled in any course offered by the ‘Industrial
Engineering’ department.
2. Print the name, age, and gpa of the students that are younger than 20 years
old and have the highest GPA among all students of the same age.
3. For each department with at least 20 students majoring in the department,
print the department name, the number of students majoring in the
department, and finally, the average GPA and the difference between the
highest GPA and the lowest GPA of the students majoring in the department.
4. Difficult courses. For each department offering more than 2 courses, print the
department name, course number, and course average grade of each course
that has an average grade that is at least 5% less than the average grade
obtained by students in all courses offered by the same department.
5. Popular courses. Print the department name, course number, and course
enrollment of each course that has an enrollment of at least 20% higher than
the average enrollment of all courses offered by the same department. The
enrollment of a course is the sum of the enrollment of all its sections. Note:
for this query, when computing the average enrollment, you should consider
all courses including those with zero enrollment. [Hint: You will need to use
outer joins for this query.]
What to do
Write one SQL query for each question above. Run your SQL queries on one of
our PostgreSQL servers (choose any one, they are identical) using the web interface
that we have set up at:
http://w4111p1.cs.columbia.edu/
http://w4111p2.cs.columbia.edu/
The SQL interpreter in PostgreSQL is not identical as the one described in the
textbook. If the query you write is not accepted by the server, try different
equivalent ways until you get one that works with PostgreSQL. You can find the SQL
commands supported by PostgreSQL
at http://www.postgresql.org/docs/12/interactive/index.html.
Submission Instructions
a. For each of the 5 questions, your answer should include (a) the SQL query and (b) the
results of the execution of the query on the with PostgreSQL server described above,
which you should cut and paste from the web interface into a single document (of any
format we can read such as text, word, html, pdf).
b. Submit on CourseWorks the single document you created above.
http://w4111p1.cs.columbia.edu/
http://w4111p2.cs.columbia.edu/
http://www.postgresql.org/docs/12/interactive/index.html
Grading
Your grade is based exclusively on whether your SQL statement is correct, and
nothing else. Details:
1. We don’t care if your answer contains duplicate rows or not.
2. We don’t care about the performance of your queries, seriously, we don’t.
3. Do not spend time on nicely formatting the resulting tables – we don’t care.
4. Do you see yourself spending more than 5 seconds on how to name a column in a
resulting table? Stop, we don’t care.
5. We love elegant solutions but still this is not a criterion when grading this particular
homework – if your query is correct, you will get full credit regardless how convoluted
your solution is. We do ask you, however, to write your query in way that is easily
understood (using appropriate nesting levels, meaningful names, etc.).
6. More is not better (1). If the correct result is a table of two rows and your statement
returns a table with those two rows and 20 more, your SQL statement is wrong.
7. More is not better (2). The returned table of your SQL statement must return the exact
columns that you are asked to return (no more, no less). For example, if the query asks
you to find the names of students that satisfy some property and your SQL statement
returns a table with columns (sid, sname), your SQL statement is wrong.
8. While a SQL query that returns a wrong result is certainly wrong, a query that returns
the correct result is not necessarily correct. In other words, a wrong query may produce
the correct result by accident, just because of the way the tables are currently
populated. Your query is correct if and only if it produces the correct result for any valid
instance of this university database. This is the reason why, for grading purposes, we will
be looking at your SQL statements not just the results.
9. Your score for each problem will be one of the following numbers only.
– 2.0 points for a correct SQL
– 0.5 points for a SQL that is really truly very close to be correct but it isn’t – it may
contain a small mistake
– 0.2 points for a SQL where some SQL components are correct
– 0.0 points for anything else
Columbia University, COMS4111, Fall 2021
Homework 2
What to do
Submission Instructions
Grading