Week 6: Lecture Recap
CSC 343 Winter 2021
MICHAEL LIUT (MICHAEL.LIUT@UTORONTO.CA) ILIR DEMA (ILIR.DEMA@UTORONTO.CA)
DEPARTMENT OF MATHEMATICAL AND COMPUTATIONAL SCIENCES UNIVERSITY OF TORONTO MISSISSAUGA
Michael Liut CSC343, Winter 2018 University of Toronto
Count
Table of Contents
Summation
Grouping
Aggregation
Average
Minimum and Maximum
Ordering
Michael Liut 2 CSC343, Winter 2021 University of Toronto
Aggregation
There are two rules you must follow when aggregating:
1 Aggregation functions can be used in both the SELECT and HAVING clauses (the HAVING clause will be discussed at another time).
Aggregation functions cannot be used in a WHERE clause.
RECALL: Aggregation is a column procedure, not a row operation!
2
Michael Liut 3 CSC343, Winter 2021 University of Toronto
Aggregation Overview
Function Syntax
COUNT( [ALL |DISTINCT] expression )
COUNT(*)
AVG( [ALL | DISTINCT] expression )
SUM( [ALL | DISTINCT] expression )
MAX(expression)
MIN(expression)
Function Usage
The number of (distinct) non-NULL values in a column/expression.
The number of selected rows.
The average of the (distinct) values in a numeric column/expression.
The total of the (distinct) values in a numeric column/expression.
The largest value in a column/expression.
The smallest value in a column/ expression.
Michael Liut https://www.postgresql.org/docs/13/functions-aggregate.html 4 CSC343, Winter 2021 University of Toronto
Count
Count(*) will count all of the rows in a table.
Count(columnName) will count a specified column.
Rows containing NULL (unknown) values are omitted. An empty or zero value is not to be confused with NULL.
Michael Liut 5 CSC343, Winter 2021 University of Toronto
Count
Count() returns a single scalar value.
e.g. let¡¯s say a manager of a company needs to know the number of employees that work within the organization, count(*) can produce this information.
Let¡¯s say that the HR department has a table of all ¡°employee[s]¡±.
SELECT COUNT(*) FROM employee;
COUNT(*) ——– 8
Michael Liut 6 CSC343, Winter 2021 University of Toronto
Count
The result is ¡°8¡±, a single scalar value.
Notice that the result table column heading, in this case ¡°COUNT(*)¡± for example, is not the most meaningful name.
The output column name can be made more meaningful through query manipulation.
This is accomplished by using the ¡°as¡± keyword.
SELECT COUNT(*) AS numOfEmps FROM employee;
numOfEmps ———- 8
SELECT COUNT(*) FROM employee;
COUNT(*) ——– 8
Michael Liut 7 CSC343, Winter 2021 University of Toronto
Average
AVG(columnName) returns the average for that column.
e.g. let¡¯s say a manager of a company needs to know the average salary of their employees.
SELECT AVG(empSalary)
AS Average_Employee_Salary
FROM employee;
Average_Employee_Salary ———————— $79,575
Michael Liut 8 CSC343, Winter 2021 University of Toronto
Summation
SUM(columnName) returns the computed total summation for that column.
e.g. let¡¯s say a manager of a company needs to know the total salary of their employees.
SELECT SUM(empSalary)
AS Total_Employee_Salary
FROM employee;
Total_Employee_Salary ———————- $1,875,750
Michael Liut 9 CSC343, Winter 2021 University of Toronto
Summation
Who or when would use SUM in a query?
e.g. let¡¯s say that you are a finance officer for the Faculty of Engineering and that your responsibility is to prepare budgets and expense reports for various departments. You were tasked with computing are required to specify what each department¡¯s salary expense is.
SELECT SUM(empSalary)
AS Total_CS_Employee_Salary FROM employee
WHERE empDept = ¡®CS¡¯;
Total_CS_Employee_Salary ————————– $550,000
Michael Liut 10 CSC343, Winter 2021 University of Toronto
Minimum and Maximum
The MIN function will return the smallest value stored in the search
column.
The MAX function will return the largest value stored in the search
column.
Unlike AVG and SUM, MIN and MAX work with both numeric and
character data.
e.g. let¡¯s say that you you want to know who from your employee table comes first alphabetically by last name and who comes last.
Michael Liut 11 CSC343, Winter 2021 University of Toronto
Minimum and Maximum
e.g. let¡¯s say that you you want to know who from your employee table comes first alphabetically by last name and who comes last.
MIN will return the employee row that comes first alphabetical (aka is
the smallest).
MAX will return the employee row that comes last alphabetical (aka is
the largest).
SELECT MIN(empSurname), MAX(empSurname) FROM employee;
MIN(EMPSURNAME) MAX(EMPSURNAME) —————— —————— Amin Zhu
Michael Liut 12 CSC343, Winter 2021 University of Toronto
Minimum and Maximum
The same idea can be said if we look at a numeric value.
e.g. let¡¯s say that you you want to know who from your employee table has the largest and smallest salary.
SELECT MIN(empSalary),MAX(empSalary) AS Least_Paid, Most_Paid
FROM employee;
Least_Paid ———– $41,512
Most_Paid ——————
$315,945
Michael Liut 13 CSC343, Winter 2021 University of Toronto
Grouping
Aggregation functions are more powerful when utilized with the GROUP BY clause.
In fact, the GROUP BY clause is rarely used without an aggregation
function.
I know what you¡¯re going to ask, ¡°when would it be used without an
aggregate¡±?
The schema construction that must be in existence for you to use this would have to be extremely poor, so poorly constructed that any results would likely lead to a confusing or misleading results table.
We will talk about the ORDER BY clause shortly!
Michael Liut 14 CSC343, Winter 2021 University of Toronto
Grouping
If any aggregation is used, then each element of the SELECT list must be:
aggregated; or
an attribute on the GROUP BY list.
The name of the attribute used in GROUP BY does not have to be listed in the SELECT clause.
However, it must be a column from one of the tables in the FROM clause.
1 2
Michael Liut 15 CSC343, Winter 2021 University of Toronto
Grouping
e.g. let¡¯s say you want to know what department numbers each employee you have belongs to.
For simplicity, remember the following:
1
SELECT COUNT(*)
AS Department_Count FROM employee
GROUP BY empDeptNum;
Department_Count —————– 1
3
4
2
if you have column names and aggregate functions in SELECT, then you must have a GROUP BY clause; and
the column names in SELECT must match the column names listed in GROUP BY.
Michael Liut 16 CSC343, Winter 2021 University of Toronto
Grouping with WHERE
The WHERE clause eliminates rows prior to being grouped.
Why is this important?
It cuts down on unnecessary computation time.
Less computations == less money and resources spent.
Let¡¯s look at a query that produces the average number of hours an employee works per week, where the employee¡¯s SIN is greater than 999-500-000.
Michael Liut 17 CSC343, Winter 2021 University of Toronto
Grouping with WHERE
Let¡¯s look at a query that produces the average number of hours an employee works per week, where the employee¡¯s SIN is greater than 999-500-000.
SELECT empSIN AS SIN,
AVG(workHours) AS Average_Hours_Worked FROM employee
WHERE empSIN > 999500000
GROUP BY empSIN;
SIN Average_Hours_Worked ——— ——————– 999666666 39.5
999887777 40.5
999888888 41.5
Michael Liut 18 CSC343, Winter 2021 University of Toronto
Grouping with HAVING
The HAVING clause is used for aggregate functions. Much like the WHERE clause is used for column names and expressions.
HAVING and WHERE in technicality do the same thing, just at different times.
i.e. both filter rows from inclusion in a result table based on some condition.
The WHERE clause filters rows BEFORE the grouping action. The HAVING clause filters rows AFTER the grouping action.
1 2
Michael Liut 19 CSC343, Winter 2021 University of Toronto
Grouping with HAVING
Let¡¯s look at the ordering example, this time using GROUP BY and HAVING.
SELECT empDeptNum AS Department, AVG(empSalary) as Average_Salary
FROM employee
GROUP BY empDeptNum
HAVING AVG(empSalary) < 90000;
Department Average_Salary ---------- -------------- 1 $79,575
7 $83,400
Michael Liut 20 CSC343, Winter 2021 University of Toronto
Ordering
The ORDER BY clause allows you to specify how rows in a result table are sorted.
PostgreSQL¡¯s default is ascending order (smallest to largest).
SELECT empDeptNum AS Department, AVG(empSalary) AS Average_Salary
FROM employee
GROUP BY empDeptNum ORDER BY AVG(empSalary);
Department Average_Salary ---------- --------------
1 7 3
$79,575 $83,400
$93,660
Michael Liut 21 CSC343, Winter 2021 University of Toronto
Grouping with WHERE and HAVING
Let¡¯s look at the combination of the two clauses.
SELECT empDeptNum AS Department, AVG(empSalary) as Average_Salary FROM employee
WHERE empDeptNum <> 1
GROUP BY empDeptNum
HAVING AVG(empSalary) < 90000;
Department Average_Salary ---------- -------------- 7 $83,400
Michael Liut 22 CSC343, Winter 2021 University of Toronto
SQL step-by-step
Conceptually, SQL performs the following steps for the query:
1 TheWHEREclausefilterstheempDeptNumnotequalto1.
2 TheGROUPBYclausecollectstheremainingrowsintooneormore groups for each unique empDeptNum.
3 Theaggregatefunctioncalculatestheaveragesalaryforeach empDeptNum grouping.
4 TheHAVINGclausefilterstherowsfromtheresulttablewhichfailto meet the condition (i.e. the average salary is less than $90,000).
SELECT empDeptNum AS Department, AVG(empSalary) as Average_Salary
FROM employee
WHERE empDeptNum <> 1
GROUP BY empDeptNum
HAVING AVG(empSalary) < 90000;
Michael Liut 23 CSC343, Winter 2021 University of Toronto
Task (Week 4 Extension!) Write a single statement to do the following:
¡ñ Return how many aliases citizen 4 has and name the column ¡°Strange Nicknames¡±
¡ñ Return the average weight of Superheroes who are older than 50.
¡ñ Return the most vicious (in terms of people killed) Villain.
¡ñ For all citizens with 2 or more nicknames, show me their cid, and how many nicknames they
have.
¡ñ Show the names of all the ExtraordinaryCitizen next to their total PowerGrid points
Note: You will have to have a clean/empty version of the tables from Lecture¡¯s Week 4 to do this. Use the sample solution DDL posted on the class site to create the tables for consistency in the solution.
Michael Liut 24 CSC343, Winter 2021 University of Toronto
Hmm..
¡ñ Show the names of the ExtraordinaryCitizen with the most PowerGrid points
¡ñ I can¡¯t do this, the DDL design doesn¡¯t allow me to do it!
¡ñ Let¡¯s consider the concept of joins.
Michael Liut 25 CSC343, Winter 2021 University of Toronto
Cross product
course_id
title
dept_name
credits
BIO-301
Genetics
Biology
4
CS-190
Game Design
Comp Sci
4
course_id
BIO-301
BIO-301
BIO-301
CS-190
title
Genetics
Genetics
Genetics
dept_name
Biology
Biology
Biology
credits
4
4
4
course_id
BIO-301
CS-190
CS-347
course_id
BIO-301
CS-190
CS-347
prereq_id
BIO-101
CS-101
CS-101
prereq_id
BIO-101
CS-101
CS-101
CS-190
Note: Taken froCmS-l1e9ct0ure slides
Game Design
Game Design
Game Design
Comp Sci
Comp Sci
Comp Sci
4
4
4
BIO-301
CS-190
CS-347
BIO-101
CS-101
CS-101
Michael Liut 26 CSC343, Winter 2021 University of Toronto
Left outer join
course_id
title
dept_name
credits
BIO-301
Genetics
Biology
4
CS-190
Game Design
Comp Sci
4
CS-315
Robotics
Comp Sci
3
course_id
prereq_id
BIO-301
BIO-101
CS-190
CS-101
CS-347
CS-101
course_id
title
dept_name
credits
prereq_id
BIO-301
Genetics
Biology
4
BIO-101
CS-190
Game Design
Comp Sci
4
CS-101
CS-315
Robotics
Comp Sci
3
NULL
Michael Liut
Note: Taken from lecture slides 27 CSC343, Winter 2021 University of Toronto
Right outer join
course_id
title
dept_name
credits
BIO-301
Genetics
Biology
4
CS-190
Game Design
Comp Sci
4
CS-315
Robotics
Comp Sci
3
course_id
prereq_id
BIO-301
BIO-101
CS-190
CS-101
CS-347
CS-101
course_id
prereq_id
title
dept_name
credits
BIO-301
BIO-101
Genetics
Biology
4
CS-190
CS-101
Game Design
Comp Sci
4
CS-347
CS-101
NULL
NULL
NULL
Michael Liut
Note: Taken from lecture slides 28 CSC343, Winter 2021 University of Toronto
Full outer join
course_id
title
dept_name
credits
BIO-301
Genetics
Biology
4
CS-190
Game Design
Comp Sci
4
CS-315
Robotics
Comp Sci
3
course_id
prereq_id
BIO-301
BIO-101
CS-190
CS-101
CS-347
CS-101
course_id
BIO-301
CS-190
CS-315
title
Genetics
Game Design
dept_name
Biology
Comp Sci
credits
4
4
prereq_id
BIO-101
CS-101
NULL
CS-347
Robotics
NULL
Comp Sci
NULL
3
NULL
CS-101
Michael Liut
Note: Taken from lecture slides 29 CSC343, Winter 2021 University of Toronto
Inner join
course_id
title
dept_name
credits
BIO-301
Genetics
Biology
4
CS-190
Game Design
Comp Sci
4
CS-315
Robotics
Comp Sci
3
course_id
prereq_id
BIO-301
BIO-101
CS-190
CS-101
CS-347
CS-101
course_id
title
dept_name
credits
prereq_id
BIO-301
Genetics
Biology
4
BIO-101
CS-190
Game Design
Comp Sci
4
CS-101
Michael Liut
Note: Taken from lecture slides 30 CSC343, Winter 2021 University of Toronto
Let¡¯s try it again
¡ñ Show the names of all the ExtraordinaryCitizen next to their total PowerGrid points
Michael Liut 31 CSC343, Winter 2021 University of Toronto
Example
SELECT cid, Intelligence + EnergyProjection + Durability + Strength + Speed + FightingSkills AS SumPoints
FROM PowerGrid;
That gets us the cid and the point sum for each citizen, but how do we get the Names of each of those citizens?
Michael Liut 32 CSC343, Winter 2021 University of Toronto
Example
SELECT Name, SumPoints
FROM ExtraordinaryCitizen EC INNER JOIN (SELECT
cid,
Intelligence + EnergyProjection + Durability + Strength
+ Speed + FightingSkills AS SumPoints FROM PowerGrid) SumPgTable
ON SumPgTable.cid = EC.cid;
Michael Liut 33 CSC343, Winter 2021 University of Toronto
Additional Trigger Exercise
Create a trigger that will automatically check when inserting a new villain. If this villain is in prison and kills 0 people with an age older than 100, release the new villain from the prison (i.e., set imprisonment to false).
Michael Liut 34 CSC343, Winter 2021 University of Toronto
Solution
CREATE OR REPLACE FUNCTION show_mercy() RETURNS TRIGGER AS $show_mercy_trigger$ BEGIN
IF (NEW.Imprisonment = true) AND (NEW.PeopleKilled = 0) AND (NEW.vid IN (SELECT cid FROM ExtraordinaryCitizen WHERE age > 100))
THEN NEW.Imprisonment = false; END IF;
RETURN NEW; END;
$show_mercy_trigger$ LANGUAGE PLPGSQL;
CREATE TRIGGER show_mercy_trigger BEFORE INSERT ON villain FOR EACH ROW EXECUTE PROCEDURE show_mercy();
Michael Liut 35 CSC343, Winter 2021 University of Toronto