Database Fundamentals
Aggregate Queries
SUM, Avg, MIN, MAX …
Example database – Employees & Departments
FirstName Surname Dept Office Salary City
Mary Brown Administration 10 45 London
Charles White Production 20 36 Toulouse
Gus Green Administration 20 40 Oxford
Jackson Neri Distribution 16 45 Dover
Charles Brown Planning 14 80 London
Laurence Chen Planning 7 73 Worthing
Pauline Bradshaw Administration 75 40 Brighton
Alice Jackson Production 75 46 Toulouse
Employees
DeptName Address City
Administration Bond Street London
Production Rue Victor Hugo Toulouse
Distribution Pond Road Brighton
Planning Bond Street London
Research Sunset Street San Jose
Departments
2
Example database – Employees & Departments
CREATE TABLE Departments(
DeptName varchar(50) PRIMARY KEY ,
Address varchar(50) NULL,
City varchar(50) NULL
);
CREATE TABLE dbo.Employees(
FirstName varchar(50) NOT NULL,
Surname varchar(50) NOT NULL,
Dept varchar(50) REFERENCES Departments(DeptName),
Office varchar(50) NULL,
Salary int NULL,
City varchar(50) NULL,
PRIMARY KEY (FirstName, Surname)
);
INSERT INTO Departments VALUES
(‘Administration’, ‘Bond Street’, ‘London’),
(‘Distribution’, ‘Pond Road’, ‘Brighton’),
(‘Planning’, ‘Bond Street’, ‘London’),
(‘Production’, ‘Rue Victor Hugo’, ‘Toulouse’),
(‘Research’, ‘Sunset Street’, ‘San Jose’);
INSERT INTO Employees VALUES
(‘Mary’, ‘Brown’, ‘Administration’, ’10’, 45, ‘London’),
(‘Charles’, ‘White’, ‘Production’, ’20’, 36, ‘Toulouse’),
(‘Gus’, ‘Green’, ‘Administration’, ’20’, 40, ‘Oxford’),
(‘Jackson’, ‘Neri’, ‘Distribution’, ’16’, 45, ‘Dover’),
(‘Charles’, ‘Brown’, ‘Planning’, ’14’, 80, ‘London’),
(‘Laurence’, ‘Chen’, ‘Planning’, ‘7’, 73, ‘Worthing’),
(‘Pauline’, ‘Bradshaw’, ‘Administration’, ’75’, 40, ‘Brighton’),
(‘Alice’, ‘Jackson’, ‘Production’, ’75’, 46, ‘Toulouse’);
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
SQL – Aggregate Queries
Aggregate queries utilise SQL functions to count, sum and calculate the maximum, minimum and average values over numerical fields in a Database
Example questions these functions answer:
COUNT: How to count the number of employees, the number of distinct salary values?
SUM: What is the salary expenditure of each department?
MIN: What is minimum salary among all employees?
MAX: Who receives the highest salary in the department?
AVG: What is the average salary of all employees of a department?
With the exception of COUNT, all aggregate operators apply to a single attribute
4
Aggregation functions appear in the select clause and perform computations over sets of values in multiple rows of a relation and return a single value.
There are five main basic aggregate functions: – list them…
Group BY allows us to partition our relations into groups and then we can compute our aggregate values over each group independently
The Having condition allows us to filter our results based on aggregate values
4
SQL – Aggregate Queries – COUNT
Standard syntax of the count function:
count(<*|[distinct|all]AttributeList>)
Find the number of employees
Find the number of different values of Salary
5
SELECT COUNT(*) AS empCount FROM Employees;
SELECT COUNT (DISTINCT Salary) FROM Employees
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
The count function finds the number of tuples in the result.
5
SQL – Aggregate Queries – COUNT
Returns the number of rows or DISTINCT values:
COUNT(<*|[DISTINCT|ALL]AttributeList>)
COUNT applies to duplicates, unless otherwise stated
Find the number of employees
SELECT COUNT(*) FROM Employees
Find the number of different values of Salary
SELECT COUNT(DISTINCT Salary) FROM Employees
Note: DISTINCT will return only the 1st of duplicate values
If two people earn 42k, it will only be counted once not twice
Find the number of rows having a not null value for Salary
SELECT COUNT(Salary) FROM Employees; or alternatively:
SELECT COUNT(ALL Salary) FROM Employees
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
6
Aggregate Queries – SUM, AVG, Max and MIN
Standard syntax of the sum function:
Find the sum of the salaries of the Administration department:
7
SELECT SUM(Salary) as SumSalary
FROM Employees
WHERE Dept = ‘Administration’
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
Aggregate Queries – SUM, AVG, Max and MIN
SELECT Sum(salary) AS TotalSalaries
FROM Employees
SELECT Sum(salary) AS TotalSalaries
FROM Employees
WHERE Dept = ‘Administration’
What is the difference in their meaning?
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
8
8
Aggregate Queries – SUM, AVG, Max and MIN
Incorrect query:
Might be more than one employee – whose name?
Remember, SELECT returns ROWS (tuples) of data whereas an aggregate like MAX or AVG aggregates all those rows and returns a single value
They are not compatible expressions in the same query
SELECT FirstName, Surname, MAX(Salary) AS TopSalary
FROM Employees
Msg 8120, Level 16, State 1, Line 1
Column ‘Employees.FirstName’ is invalid in the select list because
it is not contained in either an aggregate function or the GROUP BY clause.
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
FirstName Surname Max(Sal)
Mary Brown 80
Charles White ?
Gus Green ?
… … ?
9
Aggregate Queries – SUM, AVG, Max and MIN
Multiple aggregates can be combined into a single query
This is acceptable since they all return a single value
SELECT MAX(Salary) AS MaxSal, MIN(Salary) AS MinSal
FROM Employees
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
10
SQL – Aggregate Queries and GROUP BY
Multiple aggregates can be combined into a single query
This is acceptable since they all return a single value
The GROUP BY clause aggregates operators into subsets of rows
Find the highest salary paid in each department:
11
SELECT Dept, MAX(Salary) AS HighestSalary
FROM Employees GROUP BY Dept;
When one or more columns in the SQL SELECT statement is not encapsulated in the Aggregate function, the column(s) must be in a SQL GROUP BY clause.
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
Aggregate Queries with GROUP BY
GROUP BY collects data across multiple records and then groups the results by one or more columns
Find the sum of salaries for every department:
12
SELECT Dept, SUM(Salary) AS TotalSalaries
FROM Employees GROUP BY Dept;
SELECT Dept, Salary FROM Employees
ORDER BY Dept ASC
Un-grouped query
Grouped query
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
The group by clause is only used in conjunction with aggregation.
Our first query is going to find the sum of all salaries by each department.
The group by cluase is going to return one value for each department.
Grouping takes a relation and partitions it into groups of the same type and then runs an aggregate calculation over those groups to return a single result.
In this case we are going to group our employees by department and then find the sum of their salaries for each dept
12
Aggregate Queries with GROUP BY
GROUP BY collects data across multiple records and then groups the results by one or more columns
Find the MAX salary by first name:
SELECT FirstName, MAX(Salary)
FROM Employees
GROUP BY FirstName
SELECT FirstName, Salary
FROM Employees
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
Un-grouped query
Grouped query
13
Aggregate Queries GROUP BY vs Nested Query
GROUP BY queries can also be represented using Nested Queries
The nested query helps visualise how the group by statement works
For each row returned, the sub-query is run and returns a single value:
14
SELECT Dept, MAX(Salary) AS HighestSalary
FROM Employees GROUP BY Dept;
SELECT DISTINCT E1.Dept,
(
SELECT MAX(Salary) FROM Employees AS E2
WHERE E1.Dept = E2.Dept
) AS HighestSalary
FROM Employees E1;
Returns a single value for each row of outer query
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
Aggregate Queries – How GROUP BY works
First, the query is executed without group by clause and aggregate operators
Then the query result is divided into subsets with the same values for the attributes in the GROUP BY clause
Finally, the aggregate operators are applied
SELECT Dept, SUM(Salary) AS TotalSal
FROM Employees
GROUP BY Dept
Dept Salary
Administration 45
Production 36
Administration 40
Distribution 45
Planning 80
Planning 73
Administration 40
Production 46
Dept Salary
Administration 45
Administration 40
Administration 40
Distribution 45
Planning 80
Planning 73
Production 36
Production 46
Dept TotalSal
Administration 125
Distribution 45
Planning 153
Production 82
1
2
3
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
15
SQL – Aggregate Queries and GROUP BY
Multiple aggregates can be combined into a single query
This is acceptable since they all return a single value
The GROUP BY clause aggregates operators into subsets of rows
Invalid Query:
Correct Query:
16
SELECT DeptName, COUNT(*), D.City
FROM Employees E JOIN Departments D
ON E.Dept = D.Deptname
GROUP BY D.DeptName
SELECT DeptName, COUNT(*), D.City
FROM Employees E JOIN Departments D
ON E.Dept = D.Deptname
GROUP BY D.DeptName, D.City
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
16
Aggregate Queries With JOINs
Find the maximum salary among the employees who work in a department based in London:
1. Join Employees to Departments to take into account what city they work in (not live in!) – filter on London
This returns many tuples
2. Find the max salary from the filtered result – MAX(Salary)
This returns the single value
SELECT MAX(e.Salary) AS MaxLondonSal
FROM Employees e INNER JOIN Departments d
ON e.Dept = d.DeptName
WHERE d.City = ‘London’
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
17
17
Aggregate Queries With GROUP BY and JOINs
Incorrect query:
Incorrect query:
Correct query:
All non-aggregate attributes in the SELECT clause must appear in the GROUP BY clause
The GROUP BY clause may have more attributes than those non-aggregate attributes in the SELECT clause
SELECT Office FROM Employees
GROUP BY Dept
SELECT DeptName, COUNT(*), D.City
FROM Employees E INNER JOIN Departments D
ON E.Dept = D.Deptname
GROUP BY D.DeptName
SELECT DeptName, COUNT(*), D.City
FROM Employees E INNER JOIN Departments D
ON E.Dept = D.Deptname
GROUP BY D.DeptName, D.City
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
18
GROUP BY Predicates (conditions) – HAVING
The HAVING clause is used to place conditions on the result of an aggregate operator
Find which departments spend more than 100 on salaries
SELECT Dept, SUM(Salary) AS theSalary
FROM Employees
GROUP BY Dept
HAVING SUM(Salary) > 100
SELECT Dept, SUM(Salary) AS theSalary
FROM Employees
GROUP BY Dept
HAVING theSalary > 100
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
19
GROUP BY Predicates (conditions) – HAVING
When conditions are based on the result of an aggregate operator use the HAVING clause
Find which departments spend more than 100 on salaries
If you want to see the sum of the salaries:
20
SELECT Dept
FROM Employees
GROUP BY Dept
HAVING SUM (Salary) > 100
SELECT Dept, SUM(Salary)
FROM Employees
GROUP BY Dept
HAVING SUM (Salary) > 100
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
The Having clause is another clause that is only used in the presence of aggregation
The Having clause allows us to apply conditions to the whole result of aggregate functions and allows us to check conditions that apply to the whole group whereas the Where clause applies conditions 1 tuple at a time.
20
GROUP BY Predicates (conditions) – HAVING
WHERE vs HAVING?
Conditions involving aggregate operators must appear in a HAVING clause
Conditions involving non-aggregate operators must appear in the WHERE clause
EXAMPLE: Find the departments in which the average salary of employees working in office 20 is higher than 25
21
SELECT Dept, AVG(Salary)
FROM Employees
WHERE office = ’20’
GROUP BY Dept
HAVING AVG(Salary) > 25
Normal condition
Aggregate Condition
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
Syntax of an SQL query
Example complete query with all available options:
SELECT TargetList
FROM TableList
WHERE Condition
GROUP BY GroupingAttributeList
HAVING AggregateCondition
ORDER BY OrderingAttributeList
Oracle/MS-SQL SELECT options
22
SET Operators
SQL Queries – SET Operators
Set operations are used to combine query results
Standard SQL syntax:
SelectSQL {< UNION|INTERSECT|EXCEPT>[ALL] } SelectSQL
The standard uses EXCEPT instead of minus for difference
A set is a result table (eg, results returned by a SELECT query).
Both sets must have the same number of attributes (columns)
Both sets must have compatible attributes (domains/data types)
SET 1 Columns SET 2 Columns
Varchar Varchar
Int Int
Bit Bit
Varchar Int
Varchar NULL | ‘TBA’
Result
Varchar
Int
Bit
???
Varchar
NULL and ‘TBA’ are both examples of varchar
24
SET Queries – UNION
Set operations are used to combine query results
Duplicates are removed (unless the all option is used)
List the first names and surnames of employees
This will remove any duplicates
SELECT FirstName AS Name
FROM Employees
UNION
SELECT Surname
FROM Employees
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
No Duplicates
25
SET Queries – UNION ALL
Example: Find the first names and surnames of the employees
Use ALL if you want to keep any duplicates
By default, duplicates are removed
26
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
Duplicates retained by ALL
SELECT FirstName AS Name
FROM Employees
UNION ALL
SELECT Surname
FROM Employees
SET Queries – INTERSECT
Find the surnames of employees that are also first names
This is equivalent to
SELECT FirstName AS Name FROM Employees
INTERSECT
SELECT Surname AS Name FROM Employees
FROM Employees E1 INNER JOIN Employees E2
ON E1.FirstName = E2.Surname
Departments(DeptName, Address, City)
Employee(FirstName, Surname, Dept, Office, Salary, City)
FK(Dept) -> Departments(DeptName)
SELECT DISTINCT E1.FirstName AS Name
FROM Employees E1 WHERE EXISTS (
SELECT * FROM Employees E2
WHERE E1.FirstName = E2.Surname
)
Can also use IN|ANY
27
SET Queries – EXCEPT
Finds non-matching values between the two SQL data sets:
Find the surnames of employees that are not also first names
This is equivalent to:
SELECT FirstName AS Name FROM Employees
EXCEPT
SELECT Surname AS Name FROM Employees
SELECT DISTINCT FirstName AS Name FROM Employees
WHERE FirstName NOT IN (
SELECT Surname AS Name FROM Employees
)
Nested Query
SELECT DISTINCT E1.FirstName AS Name
FROM Employees E1 WHERE NOT EXISTS (
SELECT * FROM Employees E2
WHERE E1.FirstName = E2.Surname
)
Can also use <> ANY
28
SQL Functions
DATE, STRING. . .
29
SQL – Functions – STRING
Functions are calculations performed by the DBMS
Common functions include:
SELECT GETDATE() AS theCurrentDate
SELECT SUBSTRING(FirstName, 1, 1) AS Initial
Function Example Output
UPPER(col) UPPER(Name) Sam SAM
LOWER(col) LOWER(Name) Sam sam
RTRIM(col) RTRIM(Name) [Sam ] [Sam]
LTRIM(col) LTRIM(Name) [ Sam] [Sam]
LEN(col) LEN(Name) Returns int length of string
https://msdn.microsoft.com/en-us/library/ms181984.aspx
30
SQL – Functions – STRING
Functions are calculations performed by the DBMS
Common functions include:
SELECT GETDATE() AS theCurrentDate
SELECT SUBSTRING(FirstName, 1, 1) AS Initial
Function Example Output
LEFT(string, length) UPPER(Name) Sam SAM
RIGHT(string, length) LOWER(Name) Sam sam
CHARINDEX(string1, string2) CHARINDEX(‘Fun’, ‘DBFundamentals’) 3
SUBSTRING(col, start, length) SUBSTRING(Name, 1, 1) Returns char(s) at start position
https://msdn.microsoft.com/en-us/library/ms181984.aspx
30
SQL – Functions – STRING
Working with Strings in SQL is not easy:
32
DECLARE @address varchar(100) = ’13 Wayville road, Woodville, SA 5000′
SELECT
LEFT(@address, CHARINDEX(‘,’, @address) – 1) AS streetAddress,
LEFT(secondPart, LEN(secondPart) – CHARINDEX(‘ ‘, REVERSE(secondPart)) -1) AS suburb,
RIGHT(secondPart, CHARINDEX(‘ ‘, REVERSE(secondPart))) AS state,
REVERSE(SUBSTRING(REVERSE(@address), 1, 4)) AS postcode
FROM (
SELECT
RTRIM(
REVERSE(
SUBSTRING(
REVERSE(@address), 6, LEN(@address) – CHARINDEX(‘,’, @address) – 5
)
)
) AS secondPart
) AS t1
SQL – Functions – DATE
Functions are calculations performed by the DBMS
Common functions include:
Dateparts:
d | m | y => day | month | year number of the calander date
dw returns the day of the week number except when used with DATENAME where it returns the name of the day!
m will return the name of the month when used with DATENAME
Test it out for yourself by placing SELECT before the function
SELECT DATENAME(m, ’31/Dec/2015′)
Function Example Output
GETDATE() 01/09/2015
DATEPART(datepart, date) DATEPART(d, GetDate()) 3
DATENAME(datepart, date) DATENAME(dw, GetDate()) Wednesday
DATENAME(m, GetDate()) September
DATEADD(datepart, number, date) DATEADD(d, 7, GetDate()) Date 7 days from today!
https://msdn.microsoft.com/en-us/library/ms186724.aspx
31
33
SQL CONTROL – IF and ELSE
Control statements allow actions to happen depending on a condition
The action may involve setting a value
The action may involve running a different query
Basic Syntax
34
IF (
THEN ‘Over Paid’
WHEN (E.Salary <= 40) THEN 'Under Paid' ELSE 'Adequately Paid' END ) AS PayConclusion FROM Employees AS E Departments(DeptName, Address, City) Employee(FirstName, Surname, Dept, Office, Salary, City) FK(Dept) -> Departments(DeptName)
Single column result called “PayConclusion” with value that depends on the employee salary
SQL – NULL replacement Value
ISNULL is a function that can be used to provide a value when an unknown or NULL value is returned
ISNULL(expressionORattribute, replacementValue)
Find ALL Simpsons characters and where available their first aired episode else show ‘TBA’
38
SELECT CharacterName,
ISNULL(EpisodeName, ‘TBA’) AS FirstEpisode
FROM Characters C LEFT OUTER JOIN Episodes E
ON C.EpisodeID = E.EpisodeID
/docProps/thumbnail.jpeg