CS计算机代考程序代写 SQL SQL 3: SELECT, part 2

SQL 3: SELECT, part 2

SQL 2.3: SELECT
Jianjun Chen

1

Contents
Joins
Cross, Inner, Natural, Outer
ORDER BY to produce ordered output
Aggregate functions
MIN, MAX, SUM, AVG, COUNT
GROUP BY
HAVING
UNION
Missing Information

Joins
JOINs can be used to combine tables in a SELECT query

CROSS JOIN: Returns all pairs of rows from A and B, the same as Cartesian Product
INNER JOIN: Returns pairs of rows satisfying a condition
NATURAL JOIN: Returns pairs of rows with common values in identically named columns
OUTER JOIN: Returns pairs of rows satisfying a condition (as INNER JOIN), BUT ALSO handles NULLS

CROSS JOIN
Syntax:
SELECT * FROM A CROSS JOIN B;
Same as:
SELECT * FROM A, B;

Usually needs WHERE to filter out unrelated tuples

INNER JOIN
INNER JOIN specifies a condition that pairs of rows must satisfy.
SELECT * FROM A INNER JOIN B
ON condition

Can also use a USING clause that will output rows with equal values in the specified columns
SELECT * FROM A INNER JOIN B USING (col1, col2)

col1 and col2 must appear in both A and B.

INNER JOIN: Example

SELECT * FROM
Buyer INNER JOIN
Property
ON Price <= Budget CREATE TABLE Buyer (name varchar(60), budget integer); CREATE TABLE Property (addr varchar(60), price integer); INSERT INTO Buyer VALUES ('Smith',100000),('Jones',150000),('Green',80000); INSERT INTO Property VALUES ('15 high st',85000),('12 Queen st',125000),('87 Oak lane',175000); Result is the same as: SELECT * FROM Buyer, Property WHERE Price <= Budget; 6 INNER JOIN: Practice SELECT * FROM Student INNER JOIN Enrolment USING (ID) ? INNER JOIN: Practice A single ID row will be output representing the equal values from both Student.ID and Enrolment.ID SELECT * FROM Student INNER JOIN Enrolment USING (ID) NATURAL JOIN SELECT * FROM A NATURAL JOIN B; A NATURAL JOIN is effectively a special case of an INNER JOIN where the USING clause has specified all identically named columns. Same as the A ⋈ B in relational algebra. The data type of these columns can be different but will not always work. Strings can be automatically converted into integer or double, so they might work. However, data/times won’t work usually. 9 NATURAL JOIN SELECT * FROM Student NATURAL JOIN Enrolment; JOINs vs WHERE Clauses Inner/Natural JOINs are not absolutely necessary You can obtain the same results by selecting from multiple tables and using appropriate WHERE clauses Should you use them? Yes They often lead to concise and elegant queries NATURAL JOINs are extremely common No Support for JOINs can vary between DBMSs OUTER JOIN SELECT cols FROM table1 type-OUTER-JOIN table2 ON condition; Where type is one of LEFT, RIGHT or FULL. Example: Left Outer Join SELECT * FROM Student LEFT OUTER JOIN Enrolment ON Student.ID = Enrolment.ID; INSERT INTO student VALUES (123,'John'),(124,'Mary'),(125,'Mark'),(126,'Jane'); INSERT INTO enrolment VALUES (123,'DBS',60),(124,'PRG',70),(125,'DBS',50),(128,'DBS',80); 13 Example: Right Outer Join Example: Full Outer Join Full Outer Join in MySQL Only Left and Right outer joins are supported in MySQL. If you really want a FULL outer join: SELECT * FROM Student FULL OUTER JOIN Enrolment ON Student.ID = Enrolment.ID; Can be achieved using: (SELECT * FROM Student LEFT OUTER JOIN Enrolment ON Student.ID = Enrolment.ID) UNION (SELECT * FROM Student RIGHT OUTER JOIN Enrolment ON Student.ID = Enrolment.ID); 16 Why Using Outer Joins? Sometimes an outer join is the most practical approach. We may encounter NULL values, but may still wish to see the existing information. The next few slides will consider this problem: For students graduating in absentia, find a list of all student IDs, names, addresses, phone numbers and their final degree classifications. For students graduating in absentia, find a list of all student IDs, names, addresses, phone numbers and their final degree classifications. Problems with INNER JOINS An Inner Join with Student and Address will ignore Student 127, who doesn’t have an address record An Inner Join with Student and Phone will ignore student 125, who doesn’t have a phone record 19 For students graduating in absentia, find a list of all student IDs, names, addresses, phone numbers and their final degree classifications. What is your solution? Next table to combine Next table to combine Solution Using OURTER JOIN SELECT ID, Name, aStreet, aTown, aPostcode, pNumber, Classification FROM ((Student LEFT OUTER JOIN Phone ON Student.pID = Phone.pID) LEFT OUTER JOIN Address ON Student.aID = Address.aID) INNER JOIN Degree ON Student.ID = Degree.ID WHERE Grad = ‘A’; Solution Using OURTER JOIN Student LEFT OUTER JOIN Phone ON Student.pID = Phone.pID Address LEFT OUTER JOIN ON Student.aID = Address.aID INNER JOIN ON Student.ID = Degree.ID WHERE Grad = ‘A’; Final Result Using OUTER JOIN The records for students 125 and 127 have been preserved despite missing information SQL SELECT Overview SELECT [DISTINCT | ALL] column-list FROM table-names [WHERE condition] [GROUP BY column-list] [HAVING condition] [ORDER BY column-list] ([] optional, | or) ORDER BY The ORDER BY clause sorts the results of a query You can sort in ascending (default) or descending order Multiple columns can be given You cannot order by a column which isn’t in the result (Really so? Check Notes under this slide) SELECT columns FROM tables WHERE condition ORDER BY cols [ASC | DESC] SELECT y / 100 as y2 FROM a ORDER BY y DESC; This DOES work in MySQL. SELECT y / 100 + x as why FROM a ORDER BY y DESC; It WORKS too! SELECT y / 100 + x AS y1, y AS y2 FROM a ORDER BY y DESC; Is sorted by y2 in descending order. SELECT y AS y2 , y / 100 + x AS y1 FROM a ORDER BY y DESC; Is still sorted by y2 in descending order. Check this: SELECT y + 1 AS y1 , - y + 2 AS y2, y FROM a ORDER BY y DESC So apparently, Result is first sorted by y, then y + 1 and –y + 2 are calculated and added to the final result. Thus, “order by” is evaluated first, followed by Select xxx. 27 ORDER BY: Example 1 SELECT * FROM Grades ORDER BY Mark; ORDER BY: Example 2 SELECT * FROM Grades ORDER BY Code ASC, Mark DESC; Arithmetic As well as columns, a SELECT statement can also be used to Compute arithmetic expressions Evaluate functions Often helpful to use an alias when dealing with expressions or functions. SELECT Mark / 100 FROM Grades; SELECT Salary + Bonus FROM Employee; SELECT 1.20 * Price AS ‘Price inc. VAT’ FROM Products; Updated select flow: Once result returned from “from … where …”, each expression in select will be evaluated and included into the final table. 30 Aggregate Functions Aggregate functions compute summaries of data in a table. Most aggregate functions (except COUNT (*)) work on a single column of numerical data Aggregate functions: COUNT: The number of rows SUM: The sum of the entries in the column AVG: The average entry in a column MIN,MAX: The minimum/maximum entries in a column Again, it’s best to use an alias to name the result 31 COUNT SELECT COUNT(*) AS Count FROM Grades; SELECT COUNT(Code) AS Count FROM Grades; SELECT COUNT(DISTINCT Code) AS Count FROM Grades; How about “SELECT COUNT(y), y FROM a”? There will only be one value for y, Which means the aggregation function is first done, and then all tuples with index larger than 0 is removed! 32 COUNT SELECT COUNT(*) AS Count FROM Grades; SELECT COUNT(Code) AS Count FROM Grades; SELECT COUNT(DISTINCT Code) AS Count FROM Grades; SUM, MIN/MAX and AVG SELECT SUM(Mark) AS Total FROM Grades; SELECT MAX(Mark) AS Best FROM Grades; SELECT AVG(Mark) AS Mean FROM Grades; SUM, MIN/MAX and AVG SELECT SUM(Mark) AS Total FROM Grades; SELECT MAX(Mark) AS Best FROM Grades; SELECT AVG(Mark) AS Mean FROM Grades; Combining Aggregate Functions You can combine aggregate functions using arithmetic SELECT MAX(Mark) - MIN(Mark) AS Range_of_marks FROM Grades; Combining AF: Example Find John’s average mark, weighted by the credits of each module Combining AF: Example Find John’s average mark, weighted by the credits of each module SELECT SUM(Mark*Credits) / SUM (Credits) AS ‘Final Mark’ FROM Modules, Grades WHERE Modules.Code = Grades.Code AND Grades.Name = ‘John’; GROUP BY Sometimes we want to apply aggregate functions to groups of rows Example: find the average mark of each student individually The GROUP BY clause achieves this. SELECT column_set1 FROM tables WHERE predicate GROUP BY column_set2; Every entry in ‘column_set2’ should be in ‘column_set1’, be a constant, or be an aggregate function GROUP BY: Example SELECT Name, AVG(Mark) AS Average FROM Grades GROUP BY Name; GROUP BY: Example SELECT Name, AVG(Mark) AS Average FROM Grades GROUP BY Name; GROUP BY: Example Find the total value of the sales for each department in each month Can group by Month then Department or Department then Month GROUP BY: Example Same results, but produced in a different order HAVING HAVING is like a WHERE clause, except that it only applies to the results of a GROUP BY query It can be used to select groups which satisfy a given condition SELECT Name, AVG(Mark) AS Average FROM Grades GROUP BY Name HAVING AVG(Mark) >= 40;

WHERE and HAVING
WHERE refers to the rows of tables, so cannot make use of aggregate functions.
HAVING refers to the groups of rows, and so cannot use columns which are not in the GROUP BY or an aggregate function.
Think of a query being processed as follows:
Tables are joined
WHERE clauses
GROUP BY clauses and aggregates
Column selection
HAVING clauses
ORDER BY

SQL SELECT Overview
SELECT
[DISTINCT | ALL] column-list
FROM table-names
[WHERE condition]
[GROUP BY column-list]
[HAVING condition]
[ORDER BY column-list]

([] optional, | or)

SET operations
UNION, INTERSECT and EXCEPT
These treat the tables as sets and are the usual set operators of union, intersection and difference
We’ll be concentrating on UNION
They all combine the results from two select statements
The results of the two selects should have the same columns and corresponding data types

UNION: Example
Find, in a single query, the average mark for each student and the average mark overall.

UNION
The average for each student:
SELECT Name, AVG(Mark) AS Average
FROM Grades
GROUP BY Name;

The average overall:
SELECT ‘Total’ AS Name,
AVG(Mark) AS Average
FROM Grades;

UNION
SELECT Name,
AVG(Mark) AS Average
FROM Grades
GROUP BY Name
UNION
SELECT
‘Total’ AS Name,
AVG(Mark) AS Average
FROM Grades;

That’s Everything for SELECT 🙂

Missing Information

Missing Information
Sometimes we don’t know what value an entry in a relation should have
We know that there is a value, but don’t know what it is
There is no value at all that makes any sense
Two main methods have been proposed to deal with this
NULLs can be used as markers to show that information is missing
A default value can be used to represent the missing value

Null
Represents a state for an attribute that is currently unknown or is not applicable for this tuple.
Nulls are a way to deal with incomplete or exceptional data.
NULL is a placeholder for missing or unknown value of an attribute. It is not itself a value.

E.g. A new staff is just added, but hasn’t been decided which branch he belongs to.

NULLs
Codd proposed to distinguish two types of NULLs:
A-marks: data Applicable but not known (for example, someone’s age)
I-marks: data is Inapplicable (telephone number for someone who does not have a telephone, or spouse’s name for someone who is not married)

Problems with NULLs
Problems extending relational algebra operations to NULLs:
Selection operation: if we check tuples for “Mark > 40” and for some tuple Mark is NULL, do we include it?
Comparing tuples in two relations: are two tuples and the same or not?
Additional problems for SQL:
NULLs treated as duplicates?
Inclusion of NULLs in count, sum, average?
If yes, how?
Arithmetic operations behaviour with argument NULL?

Theoretical Solutions
Use three-valued logic instead of classical two valued logic to evaluate conditions.

When there are no NULLs around, conditions evaluate to true or false, but if a null is involved, a condition might evaluate to the third value (‘undefined’, or ‘unknown’)

3-valued logic
If the condition involves a Boolean combination, we evaluate it as follows:

SQL NULLs in Conditions
SELECT *
FROM Employee
Where
Salary > 15,000;

WHERE clause of SQL SELECT uses three-valued logic: only tuples where the condition evaluates to true are returned.

Salary > 15,000 evaluates to ‘unknown’ on the last tuple – not included

59

SQL NULLs in Conditions
SELECT * FROM Employee
Where
Salary > 15,000
OR
Name = ‘Chris’;

Salary > 15,000 OR Name = ‘Chris’ Is essentially Unknown OR TRUE on the last tuple

SQL NULLs in Arithmetic
SELECT Name,
Salary * 0.05 AS Bonus
FROM Employee;

Arithmetic operations applied to NULLs result in NULLS

SQL NULLs in Aggregation
SELECT
AVG(Salary) AS Average,
COUNT(Salary) AS Count,
SUM(Salary) AS Sum
FROM Employee;

Average = 20,000
Count = 3
Sum = 60,000
Using COUNT(*) would give 4, even if the name of Chris is changed to NULL.

SQL NULLs in GROUP BY
SELECT Salary,
COUNT(Name) AS Count
FROM Employee
GROUP BY Salary;

NULLs are treated as equivalents in GROUP BY clauses

SQL NULLs in ORDER BY
SELECT *
FROM Employee
ORDER BY Salary;

NULLs are considered and reported in ORDER BY clauses

Dealing with Missing Information
Sometimes we don’t know what value an entry in a relation should have
We know that there is a value, but don’t know what it is
There is no value at all that makes any sense
Two main methods have been proposed to deal with this
NULLs can be used as markers to show that information is missing
A default value can be used to represent the missing value

Default Values
Default values are an alternative to the use of NULLs
You can choose a value that makes no sense in normal circumstances:
age INT DEFAULT -1,
These are actual values
Default values can have more meaning than NULLs
‘none’
‘unknown’
‘not supplied’
‘not applicable’
Not all defaults represent missing information. It depends on the situation

Default Values Example
Default values are
“Unknown” for Name
-1 for Weight and Quantity
-1 is used for Wgt and Qty as it is not sensible otherwise

There are still problems:
UPDATE Parts SET Quantity = Quantity + 5

SQL Support
SQL allows both NULLs and defaults:
A table to hold data on employees
All employees have a name
All employees have a salary (default 10000)
Some employees have phone numbers, if not we use NULLs

CREATE TABLE Employee
(
Name VARCHAR(50)
NOT NULL,
Salary INT
DEFAULT 10000
NOT NULL,
Phone VARCHAR(15)
NULL
);

68

SQL Support
SQL allows you to insert NULLs
INSERT INTO Employee
VALUES (‘John’,
12000, NULL);

UPDATE Employee
SET Phone = NULL
WHERE Name = ‘Mark’;
You can also check for NULLs

SELECT Name FROM
Employee WHERE
Phone IS NULL;

SELECT Name FROM
Employee WHERE
Phone IS NOT NULL;

SQL: The Final Example

The Final Example
Examiners’ reports
We want a list of students and their average mark
For first and second years the average is for that year
For finalists (third year) it is 40% of the second year plus 60% of the final year averages
We want the results:
Sorted by year (desc), then by average mark (high to low) then by last name, first name and finally ID
To take into account of the number of credits each module is worth
Produced by a single query

Example Output

Getting Started
Finalists should be treated differently to other years
Write one SELECT for the finalists
Write a second SELECT for the first and second years
Merge the results using a UNION

QUERY FOR FINALISTS
UNION
QUERY FOR OTHERS

Table Joins
Both subqueries need information from all the tables
The student ID, name and year
The marks for each module and the year taken
The number of credits for each module

This is a natural join operation
But because we’re practicing, we’re going to use a standard CROSS JOIN and WHERE clause
Exercise: repeat the query using natural join

The Query so Far
SELECT some-information
FROM Student, Module, Grade
WHERE Student.ID = Grade.ID
AND Module.Code = Grade.Code
AND student-is-in-third-year

UNION

SELECT some-information
FROM Student, Module, Grade
WHERE Student.ID = Grade.ID
AND Module.Code = Grade.Code
AND student-is-in-first-or-second-year;

Information for Finalists
We must retrieve
Computed average mark,weighted 40-60 across years 2 and 3
First year marks must be ignored
The ID, Name and Year are needed as they are used for ordering
The average is difficult
We don’t have any statements to separate years 2 and 3 easily
We can exploit the fact that 40 = 20 * 2 and 60 = 20 * 3, so YearTaken and the weighting have the same relationship

The Query so Far
SELECT some-information
FROM Student, Module, Grade
WHERE Student.ID = Grade.ID
AND Module.Code = Grade.Code
AND student-is-in-third-year
UNION

Information for Finalists
SELECT Year, Student.ID, Last, First,
SUM(((20*YearTaken)/100)*Mark*Credits)/120
AS AverageMark
FROM
Student, Module, Grade
WHERE
Student.ID = Grade.ID
AND
Module.Code = Grade.Code
AND
YearTaken IN (2,3)
AND
Year = 3
GROUP BY
Year, Student.ID, First, Last

Information for Others
Other students are easier than finalists
We just need their average marks where YearTaken and Year are the same
As before, we need ID, Name and Year for ordering


UNION
SELECT some-information
FROM Student, Module, Grade
WHERE Student.ID = Grade.ID
AND Module.Code = Grade.Code
AND student-is-in-first-or-second-year;

Information for Finalists
SELECT Year, Student.ID, Last, First,
SUM(Mark*Credits)/120 AS AverageMark
FROM
Student, Module, Grade
WHERE
Student.ID = Grade.ID
AND
Module.Code = Grade.Code
AND
YearTaken = Year
AND
Year IN (1,2)
GROUP BY
Year, Student.ID, First, Last

SELECT Year, Student.ID, Last, First,
SUM(((20*YearTaken)/100)*Mark*Credits)/120
AS AverageMark
FROM Student, Module, Grade
WHERE Student.ID = Grade.ID
AND Module.Code = Grade.Code
AND YearTaken IN (2,3)
AND Year = 3
GROUP BY Year, Student.ID, Last, First
UNION
SELECT Year, Student.ID, Last, First,
SUM(Mark*Credits)/120 AS AverageMark
FROM Student, Module, Grade
WHERE Student.ID = Grade.ID
AND Module.Code = Grade.Code
AND YearTaken = Year
AND Year IN (1,2)
GROUP BY Year, Student.ID, Last, First

ORDER BY
Year desc, AverageMark desc, Last, First, ID;

Questions?