PowerPoint Presentation
Relational Algebra
R & G, Chapters 4.1 – 4.2
1
Architecture of a DBMS: What we’ve learned
Database Management
System
Database
Query Parsing
& Optimization
Relational Operators
Files and Index Management
Buffer Management
Disk Space Management
SQL Client
Completed
Completed
Completed
You are here!
Completed
Today: definitions of the relational operators.
Coming soon: implementations
2
An Overview of the Layer Above
SELECT S.name
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
AND R.bid = 100
AND S.rating > 5
SQL Query
Query Parser
& Optimizer
𝜋S.name(𝜎bid=100⋀rating>5(
Reserves ⋈R.sid=S.sid Sailors))
Relational Algebra
𝜋S.name
𝜎R.bid=100 ⋀ S.rating > 5
⋈R.sid=S.sid
Reserves
Sailors
(Logical) Query Plan:
Equivalent to…
On-the-fly
Select Iterator
But actually will produce…
⋈R.sid=S.sid
𝜋S.name
𝜎R.bid=100
Reserves
Sailors
𝜎S.rating>5
Optimized (Physical) Query Plan:
On-the-fly
Project Iterator
Indexed Nested Loop Join Iterator
Heap Scan Iterator
B+-Tree
Indexed Scan Iterator
Operator Code
3
SQL vs Relational Algebra
SELECT S.name
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
AND R.bid = 100
AND S.rating > 5
SQL Query
Query Parser
& Optimizer
𝜋S.name(𝜎bid=100⋀rating>5(
Reserves ⋈R.sid=S.sid Sailors))
Relational Algebra
𝜋S.name
𝜎R.bid=100 ⋀ S.rating > 5
⋈R.sid=S.sid
Reserves
Sailors
(Logical) Query Plan:
Equivalent to…
On-the-fly
Select Iterator
But actually will produce…
⋈R.sid=S.sid
𝜋S.name
𝜎R.bid=100
Reserves
Sailors
𝜎S.rating>5
Optimized (Physical) Query Plan:
On-the-fly
Project Iterator
Indexed Nested Loop Join Iterator
Heap Scan Iterator
B+-Tree
Indexed Scan Iterator
Operator Code
𝜋S.name(𝜎bid=100⋀rating>5(
Reserves ⋈R.sid=S.sid Sailors))
SELECT S.name
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
AND R.bid = 100
AND S.rating > 5
SQL Query
Relational Algebra
Operational description of
a computation.
Systems execute relational algebra query plan.
SQL
A declarative expression of the query result
Relational Algebra
4
SQL (Structured Query Language)
SELECT S.name
FROM Reserves R, Sailors S
WHERE R.sid = S.sid
AND R.bid = 100
AND S.rating > 5
Key System Features: Why do we like SQL
Declarative:
Say what you want, not how to get it
Enables system to optimize the how
Foundation in formal Query Languages
Relational Calculus
History: Formal Relational QL’s
Relational Calculus: (Basis for SQL)
Describe the result of computation
Based on first order logic
Tuple Relational Calculus (TRC)
{S | S ∈ Sailors ∃R ∈ Reserves
(R.sid = S.sid∧R.bid = 103)}
Relational Algebra:
Algebra on sets
Operational description of transformations
Are these equivalent?
Can we go from one to the other?
6
Codd’s Theorem
Established equivalence in
expressivity between :
Relational Calculus
Relational Algebra
Why an important result?
Connects declarative representation of queries with operational description
Constructive: we can compile SQL into relational algebra
Edgar F. “Ted” Codd
(1923 – 2003)
Turing Award 1981
1972
7
Relational Algebra Preliminaries
Algebra of operators on relation instances
𝜋S.name(𝜎R.bid=100 ⋀ S.rating>5(R ⋈R.sid=S.sid S))
Closed: result is also a relation instance
Enables rich composition!
Typed: input schema determines output
Can statically check whether queries are legal.
9
Why is typing important? Relations must all have schemas. Schema information is important for composition and also for system optimization.
This is something that map-reduce systems have struggled with.
Relational Algebra and Sets
Pure relational algebra has set semantics
No duplicate tuples in a relation instance
vs. SQL, which has multiset (bag) semantics
We will switch to multiset in the system discussion
Relational Algebra Operators: Unary
Unary Operators: on single relation
Projection (p ): Retains only desired columns (vertical)
Selection (s ): Selects a subset of rows (horizontal)
Renaming ( 𝜌 ): Rename attributes and relations.
11
Relational Algebra Operators: Binary
Binary Operators: on pairs of relations
Union ( ): Tuples in r1 or in r2.
Set-difference ( — ): Tuples in r1, but not in r2.
Cross-product ( ): Allows us to combine two relations.
Relational Algebra Operators: Compound
Compound Operators: common “macros” for the above
Intersection ( ∩ ): Tuples in r1 and in r2.
Joins ( ⋈𝜃 , ⋈ ): Combine relations that satisfy predicates
Projection ()
Corresponds to the SELECT list in SQL
Schema determined by schema of attribute list
Names and types correspond to input attributes
Selects a subset of columns (vertical)
psname,age(S2)
sname age
yuppy 35.0
lubber 55.5
guppy 35.0
rusty 35.0
List of Attributes
Relational Instance S2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
14
Projection (), cont.
Set semantics results in fewer rows
Real systems don’t automatically remove duplicates
Why? (Semantics and Performance reasons)
Selects a subset of columns (vertical)
page(S2)
age
35.0
55.5
35.0
35.0
age
35.0
55.5
Multiset
Set
Relational Instance S2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
15
Costly and SQL doesn’t enforce set semantics.
Size of output is
Selection(𝜎)
Corresponds to the WHERE clause in SQL
Output schema same as input
Duplicate Elimination? Not needed.
Selects a subset of rows (horizontal)
𝜎rating>8(S2)
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
Selection Condition (Boolean Expression)
sid sname rating age
28 yuppy 9 35.0
58 rusty 10 35.0
Relational Instance S2
16
Don’t need to worry about duplicate elimination because we don’t change schema and only remove additional elements from set.
Composing Select and Project
Names of sailors with rating > 8:
What about:
Invalid types. Input to 𝜎rating>8 does not contain rating.
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
sid sname rating age
28 yuppy 9 35.0
58 rusty 10 35.0
sname
yuppy
rusty
psname
𝜎rating>8
𝜎rating>8(psname(S2))
psname(𝜎rating>8(S2))
17
Union (∪)
Two input relations, must be compatible:
Same number of fields
Fields in corresponding positions have same type
SQL Expression: UNION
S1
S2
S1 ∪ S2
S1 ∪ S2
19
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Union (∪) VS Union ALL
Duplicate elimination in practice?
SQL’s UNION vs UNION ALL
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
Relational Instance S2
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
Relational Instance S1
sid sname rating age
22 dustin 7 45
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
S1 ∪ S2
20
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Set Difference ( − )
Same as with union, both input relations must be compatible.
SQL Expression: EXCEPT
S1
S2
S1 − S2
S1 − S2
21
Set Difference ( − ), cont.
Duplicate elimination?
Not required
EXCEPT vs EXCEPT ALL
sid sname rating age
22 dustin 7 45
S1 − S2
sid sname rating age
28 yuppy 9 35.0
44 guppy 5 35.0
S2 − S1
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
Relational Instance S2
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
Relational Instance S1
22
SQL treats EXCEPT like UNION and applies duplicate elimination afterwards. This could remove duplicates if the left relation had duplicates to begin with. This is not an issue in relational algebra where duplicates are not possible in inputs.
Cross-Product (×)
R1 × S1: Each row of R1 paired with each row of S1
How many rows in result? |R1|*|R2|
Schema compatability? Not needed.
Duplicates? None generated.
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1:
sid bid day
22 101 10/10/96
58 103 11/12/96
R1:
×
sid bid day sid sname rating age
22 101 10/10/96 22 dustin 7 45.0
22 101 10/10/96 31 lubber 8 55.5
22 101 10/10/96 58 rusty 10 35.0
58 103 11/12/96 22 dustin 7 45.0
58 103 11/12/96 31 lubber 8 55.5
58 103 11/12/96 58 rusty 10 35.0
=
23
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Renaming ( 𝜌 = “rho” )
Renames relations and their attributes:
Note that relational algebra doesn’t require names.
We could just use positional arguments.
sid bid day sid sname rating age
22 101 10/10/96 22 dustin 7 45.0
22 101 10/10/96 31 lubber 8 55.5
22 101 10/10/96 58 rusty 10 35.0
58 103 11/12/96 22 dustin 7 45.0
58 103 11/12/96 31 lubber 8 55.5
58 103 11/12/96 58 rusty 10 35.0
R1 × S1
𝜌( Temp1(1 sid1, 4 sid2), R1 × S1)
Output Relation
Name
Renaming List
position New Name
Input
Relation
sid1 bid day sid2 sname rating age
22 101 10/10/96 22 dustin 7 45.0
22 101 10/10/96 31 lubber 8 55.5
22 101 10/10/96 58 rusty 10 35.0
58 103 11/12/96 22 dustin 7 45.0
58 103 11/12/96 31 lubber 8 55.5
58 103 11/12/96 58 rusty 10 35.0
Temp1
24
Compound Operator: Intersection
Same as with union, both input relations must be compatible.
SQL Expression: INTERSECT
S1
S2
S1 ∩ S2
S1 ∩ S2
25
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Intersection (∩)
Equivalent to:
S1 — (S1 — S2)
sid sname rating age
31 lubber 8 55.5
58 rusty 10 35.0
S1 ∩ S2
sid sname rating age
28 yuppy 9 35.0
31 lubber 8 55.5
44 guppy 5 35.0
58 rusty 10 35.0
Relational Instance S2
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
Relational Instance S1
26
VALUES (‘Bob’), (‘Bob’), (‘Popeye’) INTERSECT VALUES (‘Bob’), (‘Bob’), (‘Popeye’);
VALUES (‘Bob’), (‘Bob’), (‘Popeye’) INTERSECT ALL VALUES (‘Bob’), (‘Bob’), (‘Popeye’);
VALUES (‘Bob’), (‘Popeye’) INTERSECT ALL VALUES (‘Bob’), (‘Bob’), (‘Popeye’);
Intersection (∩), Pt 2
S1 ∩ S2 = ?
∩
S1
S2
27
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Intersection (∩), Pt 3
S1 ∩ S2 = S1 – ?
S1
?
= –
∩
S1
S2
28
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Intersection (∩), Pt 4
S1 ∩ S2 = S1 – (S1 – S2)
Is intersection monotonic?
R1 ⊆ R2 ⇒ S ∩ R1 ⊆ S ∩ R2
S1
?
= –
∩
S1
S2
( – )
S1
S2
29
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Compound Operator: Join
Joins are compound operators (like intersection):
Generally, 𝜎𝜃( R × S)
Hierarchy of common kinds:
Theta Join ( ⋈𝜃 ): join on logical expression 𝜃
Equi-Join: theta join with theta being a conjunction of equalities
Natural Join ( ⋈ ): equi-join on all matching column names
Note: we will need to learn a good join algorithm.
Avoid cross-product if we can!!
31
Theta Join (⋈𝜃) Example
R1 ⋈sid=sid S1
Note that output needs a rename operator!
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1:
sid bid day
22 101 10/10/96
58 103 11/12/96
R1:
⋈sid=sid
sid bid day sid sname rating age
22 101 10/10/96 22 dustin 7 45.0
58 103 11/12/96 58 rusty 10 35.0
=
32
Default of Union is to apply set semantics.
Recall that Union All is required to keep duplicates.
Another Theta Join (⋈𝜃) Example
R ⋈𝜃 S = 𝜎𝜃( R × S)
Example: More senior sailors for each sailor.
S1 ⋈ f4 < f8 S1
f1 f2 f3 f4
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1:
S1 S1
f1 f2 f3 f4 f5 f6 f7 f8
22 dustin 7 45.0 22 dustin 7 45.0
22 dustin 7 45.0 31 lubber 8 55.5
22 dustin 7 45.0 58 rusty 10 35.0
31 lubber 8 55.5 22 dustin 7 45.0
31 lubber 8 55.5 31 lubber 8 55.5
31 lubber 8 55.5 58 rusty 10 35.0
58 rusty 10 35.0 22 dustin 7 45.0
58 rusty 10 35.0 31 lubber 8 55.5
58 rusty 10 35.0 58 rusty 10 35.0
33
Another Theta Join (⋈𝜃), Pt 2
f1 f2 f3 f4
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1 S1
f1 f2 f3 f4 f5 f6 f7 f8
22 dustin 7 45.0 22 dustin 7 45.0
22 dustin 7 45.0 31 lubber 8 55.5
22 dustin 7 45.0 58 rusty 10 35.0
31 lubber 8 55.5 22 dustin 7 45.0
31 lubber 8 55.5 31 lubber 8 55.5
31 lubber 8 55.5 58 rusty 10 35.0
58 rusty 10 35.0 22 dustin 7 45.0
58 rusty 10 35.0 31 lubber 8 55.5
58 rusty 10 35.0 58 rusty 10 35.0
S1:
R ⋈𝜃 S = 𝜎𝜃( R × S)
Example: More senior sailors for each sailor.
S1 ⋈ age < age2 S1
34
Another Theta Join (⋈𝜃), Pt 3
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
S1:
S1 S1
sid sname rating age sid sname rating age2
22 dustin 7 45.0 31 lubber 8 55.5
58 rusty 10 35.0 22 dustin 7 45.0
58 rusty 10 35.0 31 lubber 8 55.5
R ⋈𝜃 S = 𝜎𝜃( R × S)
Example: More senior sailors for each sailor.
S1 ⋈ f4 < f8 S1
Result schema same as that of cross-product.
Special Case:
Equi-Join: theta join with AND of = predicates
Special special case Natural Join …
35
Natural Join (⋈)
Special case of equi-join in which equalities are specified for all matching fields and duplicate fields are projected away
R ⋈ S = 𝜋unique fld. 𝜎eq. matching fld.( R × S )
Compute R × S
Select rows where fields appearing in both relations have equal values
Project onto the set of all unique fields.
36
Natural Join (⋈) Pt 2
R ⋈ S = 𝜋unique fld. 𝜎eq. matching fld.( R × S )
R1 ⋈ S1
S1:
sid bid day
22 101 10/10/96
58 103 11/12/96
R1:
sid bid day sid sname rating age
22 101 10/10/96 22 dustin 7 45.0
22 101 10/10/96 31 lubber 8 55.5
22 101 10/10/96 58 rusty 10 35.0
58 103 11/12/96 22 dustin 7 45.0
58 103 11/12/96 31 lubber 8 55.5
58 103 11/12/96 58 rusty 10 35.0
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
37
Natural Join (⋈), Pt 3
R ⋈ S = 𝜋unique fld. 𝜎eq. matching fld.( R × S )
Commonly used for foreign key joins (as above).
R1 ⋈ S1
sid bid day sname rating age
22 101 10/10/96 dustin 7 45.0
58 103 11/12/96 rusty 10 35.0
sid bid day
22 101 10/10/96
58 103 11/12/96
R1:
sid sname rating age
22 dustin 7 45.0
31 lubber 8 55.5
58 rusty 10 35.0
38
Extended Relational Algebra
Group By / Aggregation Operator (𝛾 ):
𝛾age, AVG(rating)(Sailors)
With selection (HAVING clause):
𝛾age, AVG(rating), COUNT(*)>2(Sailors)
Textbook uses two operators:
GROUP BY age, AVG(rating) (Sailors)
HAVING COUNT(*)>2
(GROUP BY age, AVG(rating)(Sailors))
Summary
Relational Algebra: a small set of operators mapping relations to relations
Operational, in the sense that you specify the explicit order of operations
A closed set of operators! Mix and match.
Basic ops include: s, p, , , —
Important compound ops: , ⋈
41
/docProps/thumbnail.jpeg