module4
Database Management, GMU, Dr. Brodsky, Module 4 1
Relational Algebra
Module 4
Professor Alex Brodsky
Database Systems
Database Management, GMU, Dr. Brodsky, Module 4 2
Relational Query Languages
❖ Query languages: Allow manipulation and retrieval
of data from a database.
❖ Relational model supports simple, powerful QLs:
– Strong formal foundation based on logic.
– Allows for much optimization.
❖ Query Languages != programming languages!
– QLs not expected to be “Turing complete”.
– QLs not intended to be used for complex calculations.
– QLs support easy, efficient access to large data sets.
Database Management, GMU, Dr. Brodsky, Module 4 3
Formal Relational Query Languages
Two mathematical Query Languages form the
basis for “real” languages (e.g. SQL), and for
implementation:
❶ Relational Algebra: More operational, very
useful for representing execution plans.
❷ Relational Calculus: Lets users describe what
they want, rather than how to compute it.
(Non-operational, declarative.)
☛ Understanding Algebra is key to understanding SQL,
☛ and query processing!
Database Management, GMU, Dr. Brodsky, Module 4 4
Algebra Preliminaries
❖ A query is applied to relation instances, and the
result of a query is also a relation instance.
– Schemas of input relations for a query are fixed (but
query will run regardless of instance!)
– The schema for the result of a given query is also
fixed! Determined by definition of query language
constructs.
Database Management, GMU, Dr. Brodsky, Module 4 5
Example Instances
R1
S1
S2
❖ “Sailors” and “Reserves”
relations for our examples.
Database Management, GMU, Dr. Brodsky, Module 4 6
Algebra Operations
❖ Look what we want to get from the following
table:
Database Management, GMU, Dr. Brodsky, Module 4 7
Projection
p
sname rating
S
,
( )2
page S( )2
❖ Deletes attributes that are not in
projection list.
❖ Schema of result contains exactly
the fields in the projection list,
with the same names that they
had in the (only) input relation.
❖ Projection operator has to
eliminate duplicates! (Why??)
– Note: real systems typically
don’t do duplicate elimination
unless the user explicitly asks
for it. (Why not?)
Database Management, GMU, Dr. Brodsky, Module 4 8
Selection
=
>
)2(
8
S
rating
s
❖ Selects rows that satisfy
selection condition.
❖ No duplicates in result!
(Why?)
❖ Schema of result
identical to schema of
(only) input relation.
S2
Database Management, GMU, Dr. Brodsky, Module 4 9
Composition of Operations
❖ Result relation can be the input for another relational
algebra operation! (Operator composition.)
=
>
))2(
8
(, Sratingratingsname
sp
S2
Database Management, GMU, Dr. Brodsky, Module 4 10
What do we want to get from two
relations?
R1
S1
What about: Who reserved boat 101?
Or: Find the name of the sailor who reserved boat 101.
Database Management, GMU, Dr. Brodsky, Module 4 11
Cross-Product
❖ Each row of S1 is paired with each row of R1.
❖ Result schema has one field per field of S1 and R1,
with field names inherited.
)1,2()1,1( RsidsidSsidsid ®´® rr
sid1 sname rating age sid2 bid day
22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
31 lubber 8 55.5 22 101 10/10/96
31 lubber 8 55.5 58 103 11/12/96
58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0 58 103 11/12/96
☛ Renaming operator (because naming conflict):
Database Management, GMU, Dr. Brodsky, Module 4 12
Why does this cross product help
Query: Find the name of the sailor who reserved boat 101.
))(
101and21
(Result
)1,2()1,1(
CP
bidsidsidSname
RsidsidSsidsidCP
==
=
®´®=
sp
rr
* Note my use of “temporary” relation CP.
Database Management, GMU, Dr. Brodsky, Module 4 13
Another example
❖ Find the name of the sailor having the highest
rating.
))AllR2((Result?
)2,(AllR
´
<
=
>-=
S
ratingAratingSname
SratingArating
ratingA
sp
p
What’s in “Result?” ?
Does it answer our query?
Database Management, GMU, Dr. Brodsky, Module 4 14
Union, Intersection, Set-Difference
❖ All of these operations take
two input relations, which
must be union-compatible:
– Same number of fields.
– `Corresponding’ fields
have the same type.
❖ What is the schema of result?
Database Management, GMU, Dr. Brodsky, Module 4 15
Back to our query
❖ Find the name of the sailor having the highest
rating.
Tmp)2
,
( Result
))AllR2((
,
Tmp
)2,(AllR
-=
´
<
=
>-=
)(S
SnameSidSname
S
ratingAratingSnameSid
SratingArating
ratingA
pp
sp
p
* Why not project on Sid only for Tmp?
Database Management, GMU, Dr. Brodsky, Module 4 16
Relational Algebra (Summary)
❖ Basic operations:
– Selection ( s ) Selects a subset of rows from relation.
– Projection ( p ) Deletes unwanted columns from relation.
– Cross-product ( ´ ) Allows us to combine two relations.
– Set-difference ( – ) Tuples in reln. 1, but not in reln. 2.
– Union ( È ) Tuples in reln. 1 and in reln. 2.
– Rename ( r ) Changes names of the attributes
❖ Since each operation returns a relation, operations can be
composed! (Algebra is “closed”.)
❖ Use of temporary relations recommended.
Database Management, GMU, Dr. Brodsky, Module 4 17
Natural Join
❖ Natural-Join:
❖ Result schema similar to cross-product, but only one copy of
each field which appears in both relations.
❖ Natural Join = Cross Product + Selection on common fields
+ drop duplicate fields.
=11 RS !”
Database Management, GMU, Dr. Brodsky, Module 4 18
Query revisited using natural join
Query: Find the name of the sailor who reserved boat 101.
))1(
101
1(Result
))11(
101
(Result
R
bid
S
Sname
Or
RS
bidSname
=
=
=
=
sp
sp
!”
!”
Database Management, GMU, Dr. Brodsky, Module 4 19
Consider yet another query
❖ Find the sailor(s) who reserved all the
red boats.
sid bid day
22 101 10/10/96
22 103 10/11/96
58 102 11/12/96
R1
bid color
101 Red
102 Green
103 Red
B
Database Management, GMU, Dr. Brodsky, Module 4 20
Start an attempt
❖ who reserved what boat:
❖ All the red boats:
sid bid
22 101
22 103
58 102
== )1(
,
1 R
bidsid
S p
=
=
= ))((2 B
redcolorbid
S sp
bid
101
103
Database Management, GMU, Dr. Brodsky, Module 4 21
Division
❖ Not supported as a primitive operator, but useful for
expressing queries like:
Find sailors who have reserved all red boats.
❖ Let S1 have 2 fields, x and y; S2 have only field y:
– S1/S2 =
– i.e., S1/S2 contains all x tuples (sailors) such that for every
y tuple (redboat) in S2, there is an xy tuple in S1 (i.e, x
reserved y).
❖ In general, x and y can be any lists of fields; y is the
list of fields in S2, and xÈy is the list of fields of S1.
ïþ
ï
ý
ü
ïî
ï
í
ì )1,(2| RinyxexiststhereRinyforallx
Database Management, GMU, Dr. Brodsky, Module 4 22
Examples of Division A/B
A
B1
B2
B3
A/B1 A/B2 A/B3
Database Management, GMU, Dr. Brodsky, Module 4 23
Find names of sailors who’ve reserved boat #103
❖ Solution 1: )))Re(( (
103
Sailorsserves
bidsname
!”
=
sp
❖ Solution 2: )Re1 (
103
servesTemp
bid =
=s
SailorsTempTemp !”12=
)2(Re Tempsult snamep=
❖ Solution 3:
Database Management, GMU, Dr. Brodsky, Module 4 24
Find names of sailors who’ve reserved a red boat
❖ Information about boat color only available in
Boats; so need an extra join:
❖ A more efficient
solution:
☛ A query optimizer can find this given the first solution!
Database Management, GMU, Dr. Brodsky, Module 4 25
Find sailors who’ve reserved a red or a green boat
❖ Can identify all red or green boats, then find
sailors who’ve reserved one of these boats:
)(
””
Boats
greencolorredcolor
Tempboats
=Ú=
=s
)Re(Re SailorsservesTempboatssnamesult !”!”p=
❖ Can also define Tempboats using union! (How?)
❖ What happens if is replaced by in this query?Ú Ù
Database Management, GMU, Dr. Brodsky, Module 4 26
Find sailors who’ve reserved a red and a green boat
❖ Previous approach won’t work! Must identify
sailors who’ve reserved red boats, sailors
who’ve reserved green boats, then find the
intersection (note that sid is a key for Sailors):
)Re))(
”
(( servesBoats
redcolorsid
Tempred !”
=
= sp
))((Re SailorsTempgreenTempredsnamesult !ӂ=p
)Re))(
”
(( servesBoats
greencolorsid
Tempgreen !”
=
= sp
Database Management, GMU, Dr. Brodsky, Module 4 27
Find the names of sailors who’ve reserved all boats
❖ Uses division; schemas of the input relations
to / must be carefully chosen:
))((/))(Re
,
(Tempsids Boats
bid
serves
bidsid
pp=
)(Result SailorsTempsidssname !”p=
❖ To find sailors who’ve reserved all ‘Interlake’ boats:
))(
”
(/ Boats
Interlakebnamebid =
sp…..
Database Management, GMU, Dr. Brodsky, Module 4 28
Summary
❖ The relational model has rigorously defined
query languages that are simple and
powerful.
❖ Relational algebra is more operational; useful
as internal representation for query
evaluation plans.
❖ Several ways of expressing a given query; a
query optimizer should choose the most
efficient version.