RELATIONAL ALGEBRA
CS 564- Fall 2021
ACKs: , Jignesh Patel, An
WHAT IS THIS LECTURE ABOUT?
• RelationalAlgebra
– query language for relations
• BasicOperations
– selection, projection
– difference, union
– cross-product, renaming
• DerivedOperations
– join, natural join, equi-join, division
CS 564 [Fall 2021] – Paris Koutris 2
RELATIONAL QUERY LANGUAGES
• allowthemanipulationandretrievalofdatafroma database
• twotypesofquerylanguages:
– Declarative: describe what a user wants, rather
than how to compute it
• Tuple Relational Calculus
• Domain Relational Calculus
– Procedural: operational, useful for representing execution plans
• Relational Algebra
CS 564 [Fall 2021] – Paris Koutris 3
WHAT IS RELATIONAL ALGEBRA? • algebra:mathematicalsystemconsistingof
– operands: variables or values from which new values can be constructed
– operators: symbols denoting procedures that construct new values from given values
• relationalalgebra:analgebrawhoseoperands are relations or variables that represent relations
– operators do the most common things that we need to do with relations in a database
– can be used as a query language for relations
CS 564 [Fall 2021] – Paris Koutris 4
RELATIONAL ALGEBRA: PRELIM
• Query:
– Input: relational instances
– Output: relational instances
– specified using the schemas
• mayproducedifferentresultsfordifferentinstances
• theschemaoftheresultisfixed
• therearetwotypesofnotationforattributes: – positional (e.g. 2, 4)
– named-field (e.g. C.name, Person.SSN)
CS 564 [Fall 2021] – Paris Koutris 5
RELATIONAL ALGEBRA: PRELIM
• Basicoperations:
– Selection {𝜎}: selects a subset of rows
– Projection {𝜋}: deletes columns
– Cross-product {×}: combines two relations
– Set-difference {−}
– Union {⋃}
• When the relations have named fields: – Renaming {𝜌}
• Additionaloperations:
– Intersection, join, division
CS 564 [Fall 2021] – Paris Koutris 6
KEEP IN MIND!
• SQLusesmultisets,howeverinRelationalAlgebra we will consider relations as sets
• Wewillconsiderthenamedperspective,where every attribute must have a unique name
The attribute order in a relation does not matter!
CS 564 [Fall 2021] – Paris Koutris 7
BASIC OPERATIONS
CS 564 [Fall 2021] – Paris Koutris 8
SELECTION
Notation: 𝜎!(𝑅)
• CisaconditionthatreferstotheattributesofR
• outputstherowsofRthatsatisfyC
• outputschema:sameasinputschema
Example
• 𝜎!”#$%& 𝑃𝑒𝑟𝑠𝑜𝑛
• 𝜎!”#$%& !'( !”#)%*(𝑃𝑒𝑟𝑠𝑜𝑛)
• 𝜎!”#$%& !'( ‘!+#,“.!/01”(𝑃𝑒𝑟𝑠𝑜𝑛)
CS 564 [Fall 2021] – Paris Koutris 9
SELECT *
FROM Person WHERE age > 24 ;
SELECTION: EXAMPLE Person
SSN
name
age
phoneNumber
934729837
Paris
24
608-374-8422
934729837
Paris
24
603-534-8399
123123645
John
30
608-321-1163
384475687
Arun
25
206-473-8221
𝜎!”#$%& 𝑃𝑒𝑟𝑠𝑜𝑛
SSN
name
age
phoneNumber
123123645
John
30
608-321-1163
384475687
Arun
25
206-473-8221
CS 564 [Fall 2021] – Paris Koutris 10
PROJECTION
Notation: 𝜋”!,””,…,”# (𝑅)
• outputsonlythecolumns𝐴!,𝐴”,…,𝐴# • removesanyduplicatetuples
• outputschema:𝑅(𝐴!,𝐴”,…,𝐴#)
Example
• 𝜋334,!”# 𝑃𝑒𝑟𝑠𝑜𝑛
• 𝜋334,678’#49+:#/,!”# (𝑃𝑒𝑟𝑠𝑜𝑛)
SELECT DISTINCT SSN,age FROM Person ;
CS 564 [Fall 2021] – Paris Koutris 11
PROJECTION: EXAMPLE Person
SSN
name
age
phoneNumber
934729837
Paris
24
608-374-8422
934729837
Paris
24
603-534-8399
123123645
John
30
608-321-1163
384475687
Arun
20
206-473-8221
𝜋334,’!+# 𝑃𝑒𝑟𝑠𝑜𝑛
SSN
name
934729837
Paris
123123645
John
384475687
S 564 [Fall 2021] – Paris Koutris 12
RA OPERATORS ARE COMPOSITIONAL
SELECT DISTINCT SSN,age FROM Person
WHERE age > 24 ;
Two logically equivalent expressions in RA:
• 𝜋334,!”# 𝜎!”#$%& 𝑃𝑒𝑟𝑠𝑜𝑛 • 𝜎!”#$%&(𝜋334,!”# 𝑃𝑒𝑟𝑠𝑜𝑛 )
CS 564 [Fall 2021] – Paris Koutris 13
UNION
Notation: 𝑅% ∪ 𝑅&
• outputs all tuples in 𝑅!or 𝑅”
• bothrelationsmusthavethesameschema! • outputschema:sameasinput
∪=
A
B
a1
b1
a2
b1
a2
b2
a3
b1
a4
b4
A
B
a1
b1
a2
b1
a2
b2
A
B
a1
b1
a3
b1
a4
b4
CS 564 [Fall 2021] – Paris Koutris 14
DIFFERENCE
Notation: 𝑅% − 𝑅&
• outputsalltuplesin𝑅!andnotin𝑅”
• bothrelationsmusthavethesameschema! • outputschema:sameasinput
−=
A
B
a1
b1
a2
b1
a2
b2
A
B
a1
b1
a3
b1
a4
b4
A
B
a2
b1
a2
b2
CS 564 [Fall 2021] – Paris Koutris 15
CROSS-PRODUCT
Notation: 𝑅%× 𝑅&
• matcheseachtuplesin𝑅!witheachtuplein𝑅”
• input schema: 𝑅! 𝐴!,𝐴”,…,𝐴# , 𝑅” 𝐵!,𝐵”,…,𝐵$ • output schema: 𝑅 𝐴!,…,𝐴#,𝐵!,…,𝐵$
Example
• 𝑃𝑒𝑟𝑠𝑜𝑛×𝐷𝑒𝑝𝑎𝑟𝑡𝑚𝑒𝑛𝑡
CS 564 [Fall 2021] – Paris Koutris 16
SELECT *
FROM Person,Department;
CROSS-PRODUCT: EXAMPLE
Person
Dependent
SSN
name
934729837
Paris
123123645
John
depSSN
depname
934729837
Helen
934729837
Bob
𝑃𝑒𝑟𝑠𝑜𝑛 × 𝐷𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡
SSN
name
depSSN
depname
934729837
Paris
934729837
Helen
123123645
John
934729837
Bob
934729837
Paris
934729837
Bob
123123645
John
934729837
S 564 [Fall 2021] – Paris Koutris 17
RENAMING
Notation: 𝜌”!,””,…,”#(𝑅)
• doesnotchangetheinstance,onlytheschema! • inputschema:𝑅 𝐵!,𝐵”,…,𝐵#
• outputschema:𝑅𝐴!,…,𝐴#
Why is it necessary?
named perspective: when joining relations, we need to distinguish between attributes with the same name!
CS 564 [Fall 2021] – Paris Koutris 18
RENAMING: EXAMPLE Person
Dependent
SSN
name
934729837
Paris
123123645
SN
name
934729837
Helen
934729837
Bob
𝑃𝑒𝑟𝑠𝑜𝑛 × 𝜌(#6334,(#6′!+# (𝐷𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡)
SSN
name
depSSN
depname
934729837
Paris
934729837
Helen
123123645
John
934729837
Bob
934729837
Paris
934729837
Bob
123123645
John
934729837
S 564 [Fall 2021] – Paris Koutris 19
DERIVED OPERATIONS
CS 564 [Fall 2021] – Paris Koutris 20
INTERSECTION
Notation: 𝑅% ∩ 𝑅&
• outputs all tuples in 𝑅!and 𝑅”
• outputschema:sameasinput
• canbeexpressedas:𝑅!−(𝑅!−𝑅”)
RS
∩
SELECT R.A, R.B FROM R,S
WHERE R.A = S.A AND R.B = S.B;
A
B
a1
b1
a2
b1
a2
b2
A
B
a1
b1
a3
b1
a4
b4
A
B
a1
b1
=
CS 564 [Fall 2021] – Paris Koutris 21
JOIN (THETA JOIN)
Notation: 𝑅%⋈’ 𝑅& = 𝜎'(𝑅%× 𝑅&)
• cross-productfollowedbyaselection
• θcanbeanyboolean-valuedcondition
• mighthavelesstuplesthanthecross-product!
SELECT * FROM R1, R2 WHERE θ;
CS 564 [Fall 2021] – Paris Koutris 22
THETA JOIN: EXAMPLE
Person
Dependent
SSN
name
age
934729837
Paris
26
123123645
John
22
dSSN
dname
dage
934729837
Helen
23
934729837
Bob
28
𝑃𝑒𝑟𝑠𝑜𝑛 ⋈.#/18′.!”#$<#6#'(#'=.(!"# 𝐷𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡
SSN
name
age
dSSN
dname
dage
934729837
Paris
26
934729837
Helen
23
CS 564 [Fall 2021] - Paris Koutris 23
EQUI-JOIN
Notation: 𝑅%⋈' 𝑅&
• specialcaseofjoinwheretheconditionθcontains
only equalities between attributes
• outputschema:sameasthecross-product
Examplefor𝑅 𝐴,𝐵 ,𝑆(𝐶,𝐷)
• 𝑅 ⋈%&' 𝑆
• outputschema:𝑇(𝐴,𝐵,𝐶,𝐷)
SELECT *
FROM R, S
WHERE R.B = S.C;
CS 564 [Fall 2021] - Paris Koutris 24
NATURAL JOIN
Notation: 𝑅! ⋈ 𝑅"
• equi-join on all the common fields
• the output schema has one copy of each common attribute
Person
Dependent
SSN
name
age
934729837
Paris
26
123123645
John
22
SSN
dname
934729837
Helen
934729837
Bob
𝑃𝑒𝑟𝑠𝑜𝑛 ⋈ 𝐷𝑒𝑝𝑒𝑛𝑑𝑒𝑛𝑡
SELECT SSN,name,age,dname FROM Person P,
Department D
WHERE P.SSN = D.SSN ;
SSN
name
age
dname
934729837
Paris
26
Helen
934729837
Paris
26
S 564 [Fall 2021] - Paris Koutris 25
NATURAL JOIN
Natural Join 𝑅 ⋈ 𝑆
• Input schema: 𝑅 𝐴,𝐵,𝐶,𝐷 ,𝑆 𝐴,𝐶,𝐸
– Output schema: T(A, B, C, D, E)
• Input schema: 𝑅 𝐴,𝐵,𝐶 ,𝑆 𝐷,𝐸
– Output schema: T(A, B, C, D, E)
• Input schema: 𝑅 𝐴,𝐵,𝐶 ,𝑆 𝐴,𝐵,𝐶
– Output schema? T(A, B, C,)
CS 564 [Fall 2021] - Paris Koutris 26
SEMI-JOIN
Notation: 𝑅%⋉ 𝑅&
• naturaljoinfollowedbyprojectiononthe
attributes of R1
Example:
• 𝑅 𝐴,𝐵,𝐶 ,𝑆(𝐵,𝐷)
• 𝑅 ⋉ 𝑆 = 𝜋(,%,' 𝑅 ⋈ 𝑆
• outputschema:T(A,B,C)
SELECT A,B,C
FROM R, S
WHERE R.B = S.B ;
CS 564 [Fall 2021] - Paris Koutris 27
DIVISION
Notation: 𝑅%/𝑅&
• suppose𝑅!(𝐴,𝐵)and 𝑅" 𝐵
• theoutputcontainsallvaluesasuchthatforevery tuple (b) in R2, tuple (a, b) is in R1
• outputschema:R(A)
CS 564 [Fall 2021] - Paris Koutris 28
DIVISION: EXAMPLE
A
B1 B2
A
B
a1
b1
a1
b2
a1
b3
a2
b1
B
b2
b3
b1
B
b1
A
a1
A
a1
a2
𝑨 /𝑩𝟏
𝑨 /𝑩𝟐
CS 564 [Fall 2021] - Paris Koutris 29
EXTENDING RELATIONAL ALGEBRA
CS 564 [Fall 2021] - Paris Koutris 30
GROUP BY AGGREGATE
• ispartoftheso-calledextendedRA
• helpsustocomputecounts,sums,min,max,...
Examples
• Whatistheaverageageofthecustomers? • HowmanypeopleboughtaniPad?
CS 564 [Fall 2021] - Paris Koutris 31
GROUP BY AGGREGATE
Notation: 𝛾(,")) * (𝑅)
• groupbytheattributesinX
• aggregatetheattributeinY
– SUM, COUNT, AVG (average), MIN, MAX
• Outputschema:X+anextra(numerical)attribute
CS 564 [Fall 2021] - Paris Koutris 32
EXAMPLE Person
SSN
name
age
934729837
Paris
24
123123645
John
30
384475687
Arun
21
𝑃𝑒𝑟𝑠𝑜𝑛
AVG(age)
25
SELECT AVG(age) FROM Person ;
CS 564 [Fall 2021] - Paris Koutris 33
EXAMPLE Person
SSN
name
age
phoneNumber
934729837
Paris
24
608-374-8422
934729837
Paris
24
603-534-8399
123123645
John
30
608-321-1163
384475687
Arun
21
206-473-8221
𝛾334,EFG4H(678'#49+:#/) 𝑃𝑒𝑟𝑠𝑜𝑛
SELECT SSN, COUNT(phoneNumber)
FROM Person GROUP BY SSN;
SSN
COUNT(phoneNumber)
934729837
2
123123645
1
384475687
1
CS 564 [Fall 2021] - Paris Koutris 34
CONSTRUCTING RA QUERIES
CS 564 [Fall 2021] - Paris Koutris 35
COMBINING RA OPERATORS
• Wecanbuildmorecomplexqueriesbycombining RA operators together
e.g. standard algebra: 𝑥 + 1
• Thereare3differentnotations:
– sequence of assignment statements
– expressions with operators – expression trees
∗ 𝑦 − 𝑧"
CS 564 [Fall 2021] - Paris Koutris
36
COMBINING RA OPERATORS
Input schema: 𝑅 𝐵, 𝐶 , 𝑆(𝐴, 𝐵) • expressionswithoperators
𝑅 ⋈𝑆)
𝜋@ ⋈
• sequenceofassignmentstatements 𝑅J = 𝜎E,I 𝑅
𝑅JJ = 𝑅′ ⋈ 𝑆 S 𝑅JJJ =
𝜎E,I
• expressiontrees
R
CS 564 [Fall 2021] - Paris Koutris
37
EXPRESSIVE POWER OF RA
• RAcannotexpresstransitiveclosure! Edges
Transitive closure computes all pairs of nodes connected by a directed path
From
To
a
b
b
c
a
d
c
d
CS 564 [Fall 2021] - Paris Koutris 38