Chapter 5: Other Relational Languages
A Big Picture:
Basic Steps in Query Processing
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
A Big Picture:
Basic Steps in Query Processing (Cont.)
A SQL query can be translated into several relational algebra expressions (internal form).
E.g., select salary from instructor where salary < 75000;
salary75000(salary(instructor)), or
salary(salary75000(instructor))
The relational algebra representation of a query specifies (partially) how to evaluate a query.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Relational Algebra
Database System Concepts
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
3
Relational Algebra
A quick review of relational database
Six basic operators
select:
project:
union:
set difference: –
Cartesian product: x
rename:
Additional operations that simplify common queries
set intersection
join
assignment
outer join
Extended relational algebra operations
generalized projection
aggregate functions
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
4
Example of a Instructor Relation
attributes
(or columns)
tuples
(or rows)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
5
Attribute
The set of allowed values for each attribute is called the domain of the attribute
Attribute values are (normally) required to be atomic; that is, indivisible
The special value null is a member of every domain, indicated that the value is “unknown”
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
6
Relation Schema and Instance
A1, A2, …, An are attributes
R = (A1, A2, …, An ) is a relation schema
Example:
instructor = (ID, name, dept_name, salary)
Formally, given sets D1, D2, …. Dn, a relation r is a subset of
D1 x D2 x … x Dn
where Di is the domain of attribute Ai. Thus, a relation is a set of n-tuples (a1, a2, …, an) where each ai Di
The current values (relation instance) of a relation are specified by a table
An element t of r is a tuple, represented by a row in a table
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
7
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
Example: instructor relation with unordered tuples
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
8
Database
A database consists of multiple relations
Information about a university is broken up into parts
instructor
student
advisor
Bad design:
univ (instructor-ID, name, dept_name, salary, student_ID, ..)
results in
repetition of information (e.g., two students have the same instructor)
the need for null values (e.g., represent an student with no advisor)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
9
Keys
Let K R
K is a superkey of R if values for K are sufficient to identify a unique tuple of each possible relation r(R)
Example: {ID} and {ID,name} are both superkeys of instructor.
Superkey K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
One of the candidate keys is selected to be the primary key.
Foreign key constraint: Value in one relation must appear in another
Referencing relation
Referenced relation
Example – dept_name in instructor is a foreign key from instructor referencing department
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
10
Schema Diagram for University Database
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Relational Algebra
A quick review of relational database
Six basic operators
select:
project:
union:
set difference: –
Cartesian product: x
rename:
Additional operations that simplify common queries
set intersection
join
assignment
outer join
Extended relational algebra operations
generalized projection
aggregate functions
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
12
Select Operation
The select operation selects tuples that satisfy a given predicate.
Notation: p(r)
p is called the selection predicate
Example: select those tuples of the instructor relation where the instructor is in the “Physics” department.
Query:
dept_name=“Physics” (instructor)
Result:
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
13
Select Operation (Cont.)
We allow comparisons using
=, , >, , <,
in the selection predicate.
We can combine several predicates into a larger predicate by using the connectives:
(and), (or), (not)
Example: Find the instructors in Physics with a salary greater $90,000, we write:
dept_name=“Physics” salary > 90,000 (instructor)
Then select predicate may include comparisons between two attributes.
Example, find all departments whose name is the same as their building name:
dept_name=building (department)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
14
Project Operation
A unary operation that returns its argument relation, with certain attributes left out.
Notation:
A1,A2,A3 ….Ak (r)
where A1, A2 are attribute names and r is a relation name.
The result is defined as the relation of k columns obtained by erasing the columns that are not listed
Duplicate rows removed from result, since relations are sets
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
15
Project Operation (Cont.)
Example: eliminate the dept_name attribute of instructor
Query:
ID, name, salary (instructor)
Result:
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
16
Composition of Relational Operations
The result of a relational-algebra operation is relation and therefore relational-algebra operations can be composed together into a relational-algebra expression.
Consider the query — Find the names of all instructors in the Physics department.
name( dept_name =“Physics” (instructor))
Instead of giving the name of a relation as the argument of the projection operation, we give an expression that evaluates to a relation.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
17
Union Operation
The union operation allows us to combine two relations
Notation: r s
Output the union of tuples from the two input relations
For r s to be valid.
1. r, s must have the same arity (same number of attributes)
2. The attribute domains must be compatible (example: 2nd column
of r deals with the same type of values as does the 2nd
column of s)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
18
Union Operation – Example
Example: to find all courses taught in the Fall 2009 semester, or in the Spring 2010 semester, or in both
Query:
course_id ( semester=“Fall” Λ year=2009 (section))
course_id ( semester=“Spring” Λ year=2010 (section))
Result:
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
19
Set Difference Operation
The set-difference operation allows us to find tuples that are in one relation but are not in another.
Notation: r – s
Produce a relation containing those tuples in r but not in s
Set differences must be taken between compatible relations.
r and s must have the same arity
attribute domains of r and s must be compatible
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
20
Set Difference Operation – Example
Example: to find all courses taught in the Fall 2009 semester, but not in the Spring 2010 semester
Query:
course_id ( semester=“Fall” Λ year=2009 (section)) −
course_id ( semester=“Spring” Λ year=2010 (section))
Result:
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
21
Cartesian-Product Operation
The Cartesian-product operation (denoted by X) allows us to combine information from any two relations.
Notation: r x s
Output all pairs of rows from the two input relations (regardless of whether or not they have the same values on common attributes)
Assume that attributes of r(R) and s(S) are disjoint. (That is, R S = ).
If attributes of r(R) and s(S) are not disjoint, then renaming must be used.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
22
Cartesian-Product Operation – Example
Example: the Cartesian product of the relations instructor and teaches is written as:
instructor X teaches
We construct a tuple of the result out of each possible pair of tuples: one from the instructor relation and one from the teaches relation (see next slide)
Since the instructor ID appears in both relations we distinguish between these attribute by attaching to the attribute the name of the relation from which the attribute originally came.
instructor.ID
teaches.ID
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
23
The instructor X teaches table
instructor
teaches
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
24
Example Queries
Find the names of all instructors in the Physics department, along with the course_id of all courses they have taught
Query 1
name,course_id (instructor.ID=teaches.ID (
dept_name=“Physics” (instructor x teaches)))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
25
Example Queries
There is often more than one way to write a query in relational algebra
The following two queries are equivalent, i.e., they give the same result
Query 1
name,course_id ( instructor.ID=teaches.ID (
dept_name=“Physics” (instructor x teaches)))
Query 2
name,course_id (instructor.ID=teaches.ID (
dept_name=“Physics” (instructor) x teaches))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
26
Rename Operation
Allows us to name, and therefore to refer to, the results of relational-algebra expressions.
Allows us to refer to a relation by more than one name.
The expression
x (E)
returns the result of expression E under the name X
If a relational-algebra expression E has arity n, then
x(A1,A2, .. An) (E)
returns the result of expression E under the name X, and with the
attributes renamed to A1 , A2 , …., An .
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
27
Example Query
Find the highest salary in the university
Step 1: find instructor salaries that are less than some other instructor salary (i.e. not the highest)
using a copy of instructor under a new name d
instructor.salary ( instructor.salary < d.salary
(instructor x d (instructor)))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Example Query
Find the highest salary in the university
Step 2: Find the highest salary
salary (instructor) –
instructor.salary ( instructor.salary < d.salary
(instructor x d (instructor)))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Formal Definition
A basic expression in the relational algebra consists of either one of the following:
A relation in the database
A constant relation
Let E1 and E2 be relational-algebra expressions; the following are all relational-algebra expressions:
E1 E2
E1 – E2
E1 x E2
p(E1), p is a predicate on attributes in E1
s(E1), S is a list consisting of some of the attributes in E1
x(E1), x is the new name for the result of E1
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
30
Relational Algebra
A quick review of relational database
Six basic operators
select:
project:
union:
set difference: –
Cartesian product: x
rename:
Additional operations that simplify common queries
set intersection
Join
assignment
outer join
Extended relational algebra operations
generalized projection
aggregate functions
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
31
Additional Operations
We define additional operations that do not add any power to the
relational algebra, but that simplify common queries.
Set intersection
Join
Assignment
Outer join
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
32
Set-Intersection Operation
The set-intersection operation allows us to find tuples that are in both the input relations.
Notation: r s
Assume:
r, s have the same arity
attributes of r and s are compatible
Note: r s = r – (r – s)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
33
Set-Intersection Operation – Example
Example: to find all courses taught in both the Fall 2009 and the Spring 2010 semesters
Query:
course_id ( semester=“Fall” Λ year=2009 (section))
course_id ( semester=“Spring” Λ year=2010 (section))
Result:
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
34
Join Operation
The Cartesian-Product
instructor X teaches
associates every tuple of instructor with every tuple of teaches.
Most of the resulting rows have information about instructors who did NOT teach a particular course.
To get only those tuples of “instructor X teaches “ that pertain to instructors and the courses that they taught, we write:
instructor.id = teaches.id (instructor x teaches))
We get only those tuples of “instructor X teaches” that pertain to instructors and the courses that they taught.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
35
Join Operation (Cont.)
The join operation allows us to combine a select operation and a Cartesian-Product operation into a single operation.
Consider relations r(R) and s(S)
Let “theta” be a predicate on attributes in the schema R “union” S. The join operation r s is defined as follows:
Thus
instructor.id = teaches.id (instructor x teaches))
Can equivalently be written as
instructor Instructor.id = teaches.id teaches.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
36
Join Operation (Cont.)
The join operation without predicate is called natural join.
Notation: r s
Output pairs of rows from the two input relations that have the same value on all attributes that have the same name
Let r and s be relations on schemas R and S respectively.
Then, r s is a relation on schema R S obtained as follows:
Consider each pair of tuples tr from r and ts from s.
If tr and ts have the same value on each of the attributes in R S, add a tuple t to the result, where
t has the same value as tr on r
t has the same value as ts on s
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
37
Join Operation – Example
Find the names of all instructors in the Comp. Sci. department together with the course titles of all the courses that the instructors teach
name, title ( dept_name=“Comp. Sci.” (instructor teaches course))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Assignment Operation
It is convenient at times to write a relational-algebra expression by assigning parts of it to temporary relation variables.
The assignment operation is denoted by and works like assignment in a programming language.
Example: Find all instructor in the “Physics” and Music department.
Physics dept_name=“Physics” (instructor)
Music dept_name=“Music” (instructor)
Physics Music
With the assignment operation, a query can be written as a sequential program consisting of
a series of assignments
followed by an expression whose value is displayed as a result of the query.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
39
Outer Join
An extension of the join operation that avoids loss of information.
Computes the join and then adds tuples from one relation that does not match tuples in the other relation to the result of the join.
Uses null value to signify that the value is unknown or does not exist
Three forms of outer join:
left outer join
right outer join
full outer join
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
40
Outer Join – Example
Relation course
Relation prereq
Observe that
CS-315 is missing in prereq and
CS-347 is missing in course
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
41
Outer Join – Example
Left Outer Join
course ⟕ prereq
Join
course prereq
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
42
Outer Join – Example
Right Outer Join
course ⟖ prereq
Full Outer Join
course ⟗ prereq
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
43
Outer Join using Joins
Outer join can be expressed using basic operations
e.g. r ⟕ s can be written as
(r s) (r – R(r s)) x {(null, …, null)}
where the constant relation {(null, …, null)} is on the schema S - R
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Relational Algebra
A quick review of relational database
Six basic operators
select:
project:
union:
set difference: –
Cartesian product: x
rename:
Additional operations that simplify common queries
set intersection
join
assignment
outer join
Extended relational algebra operations
generalized projection
aggregate functions
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
45
Generalized Projection
Extends the projection operation by allowing arithmetic functions to be used in the projection list.
E is any relational-algebra expression
Each of F1, F2, …, Fn are arithmetic expressions involving constants and attributes in the schema of E.
Given relation instructor(ID, name, dept_name, salary) where salary is annual salary, get the same information but with monthly salary
ID, name, dept_name, salary/12 (instructor)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
46
Aggregate Functions and Operations
Aggregate function takes a collection of values and returns a single value as a result.
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
Aggregate operation in relational algebra
E is any relational-algebra expression
G1, G2 …, Gn is a list of attributes on which to group (can be empty)
Each Fi is an aggregate function
Each Ai is an attribute name
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
47
Aggregate Operation – Example
Find the average salary in each department
For convenience, we permit renaming as part of aggregate operation
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
48
Multiset Relational Algebra
Pure relational algebra removes all duplicates
e.g. after projection
Multiset relational algebra retains duplicates, to match SQL semantics
SQL duplicate retention was initially for efficiency, but is now a feature
Multiset relational algebra defined as follows
selection: has as many duplicates of a tuple as in the input, if the tuple satisfies the selection
(r1): If there are c1 copies of tuple t1 in r1, and t1 satisfies selection ,, then there are c1 copies of t1 in (r1).
projection: one tuple per input tuple, even if it is a duplicate
A (r1): For each copy of tuple t1 in r1, there is a copy of tuple A (t1) in A (r1) where A (t1) denotes the projection of the single tuple t1.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Multiset Relational Algebra (Cont.)
Multiset relational algebra defined as follows
Cartesian product:
r1 x r2 : If there are c1 copies of tuple t1 in r1 and c2 copies of tuple t2 in r2, there are c1 x c2 copies of the tuple t1. t2 in r1 x r2
Other operators similarly defined
union: m + n copies
intersection: min(m, n) copies
difference: min(0, m – n) copies
Example: Suppose multiset relations r1 (A, B) and r2 (C) are as follows:
r1 = {(1, a) (2,a)} r2 = {(2), (3), (3)}
Then B(r1) would be {(a), (a)}, while B(r1) x r2 would be
{(a,2), (a,2), (a,3), (a,3), (a,3), (a,3)}
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
SQL and Multiset Relational Algebra
select A1, A2, ..., An
from r1, r2, ..., rm
where P
is equivalent to the following expression in multiset relational algebra
The select clause corresponds to the projection operation
The from clause corresponds to the Cartesian product operation
The where clause corresponds to the selection operation
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
SQL and Multiset Relational Algebra- Examples
Find courses that ran in Fall 2009 or in Spring 2010
(select course_id from section where semester = ‘Fall’ and year = 2009)
union
(select course_id from section where semester = ‘Spring’ and year = 2010)
is equivalent to
course_id ( semester=“Fall” Λ year=2009 (section))
course_id ( semester=“Spring” Λ year=2010 (section))
Find courses that ran in Fall 2009 and in Spring 2010
(select course_id from section where semester = ‘Fall’ and year = 2009)
intersect
(select course_id from section where semester = ‘Spring’ and year = 2010)
is equivalent to
course_id ( semester=“Fall” Λ year=2009 (section))
course_id ( semester=“Spring” Λ year=2010 (section))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
52
SQL and Multiset Relational Algebra - Examples
Find courses that ran in Fall 2009 but not in Spring 2010
(select course_id from section where semester = ‘Fall’ and year = 2009)
except
(select course_id from section where semester = ‘Spring’ and year = 2010)
is equivalent to
course_id ( semester=“Fall” Λ year=2009 (section)) −
course_id ( semester=“Spring” Λ year=2010 (section))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
53
SQL and Multiset Relational Algebra (Cont.)
Set operations union, intersect, and except automatically eliminate duplicates
To retain all duplicates, use the corresponding multiset versions union all, intersect all and except all.
Suppose a tuple occurs m times in r and n times in s, then, it occurs:
m + n times in r union all s
min(m,n) times in r intersect all s
max(0, m – n) times in r except all s
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
54
SQL and Multiset Relational Algebra (Cont.)
select A1, A2, sum(A3)
from r1, r2, ..., rm
where P
group by A1, A2 having count (A4) > 2
is equivalent to the following expression in multiset relational algebra
Find the average salary of instructors in each department
select dept_name, avg (salary) as avg_salary
from instructor
group by dept_name;
is equivalent to
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
)
(
)
(
,
),
(
),
(
,
,
,
2
2
1
1
2
1
E
n
n
n
A
F
A
F
A
F
G
G
G
K
K
))
(
(
2
1
,
,
,
2
1
m
P
A
A
A
r
r
r
n
´
´
´
Õ
K
K
s
/docProps/thumbnail.jpeg