CS3402
1
CS3402 : Chapter 7 Relational Algebra
Semester B, 2020/2021
Relational Algebra
Relational algebra: a formal language for the relational model
The operations in relational algebra enable a user to specify basic
retrieval requests (or queries)
Relational algebra consists of a set of operations on relations to
generate relations
The result of an operation is a new relation
They can be further manipulated using operations
A sequence of relational algebra operations forms a relational
algebra expression
CS3402 2
Importance of Relational Algebra
Relational algebra provides a formal foundation for relational model It is used as a basis for implementing and optimizing queries in
query processing and optimization
Its concepts are incorporated into SQL standard language for
CS3402
3
relational database management systems
The internal modules of most commercial RDBMS are based on relational algebra
Relational Algebra Overview
Relational algebra consists of several groups of operations Unary Relational Operations
CS3402
4
SELECT (symbol: σ (sigma)) PROJECT (symbol: π (pi)) RENAME (symbol: ρ (rho))
Binary Relational Operations
JOIN (several variations of JOIN exist) DIVISION
Relational algebra operations from set theory
UNION ( ∪ ), INTERSECTION ( ∩ ), DIFFERENCE (or MINUS, – ) CARTESIAN PRODUCT ( x )
Additional Relational Operations
OUTER JOINS, OUTER UNION
AGGREGATE FUNCTIONS (These compute summary of information: for example, SUM, COUNT, AVG, MIN, MAX)
Database State for COMPANY
All examples discussed below refer to the COMPANY database shown here
CS3402
Slide 8- 5
5
The following query results refer to this database state
CS3402 6
Unary Relational Operations: SELECT
The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition.
The selection condition acts as a filter
Keeps only those tuples that satisfy the qualifying condition
Horizontal partitioning
Tuples satisfying the condition are selected whereas the other tuples are discarded (filtered out)
The general form of the select operation is: σ
CS3402
7
Unary Relational Operations: SELECT
Example 1:
Select the EMPLOYEE tuples whose department number is 4:
Equivalent to: SELECT *
FROM EMPLOYEE WHERE DNO=4
CS3402
8
σ DNO = 4 (EMPLOYEE)
Unary Relational Operations: SELECT
Example 2:
Select the employee tuples whose department number is 4 and salary is greater than $25,000 or department number is 5 and salary is greater than $30,000:
Equivalent to: SELECT *
FROM EMPLOYEE
WHERE (Dno=4 AND Salary>25,000) OR (Dno=5 AND Salary>30,000)
σ (Dno =4 AND Salary>25,000 ) OR (Dno=5 AND Salary> 30 ,000)(EMPLOYEE)
CS3402 9
Unary Relational Operations: SELECT
SELECT Operation Properties
The SELECT operation σ
S that has the same schema (same attributes) as R SELECT σ is commutative:
σ
CS3402
10
applied in any order:
σ
Unary Relational Operations: SELECT
SELECT Operation Properties:
A cascade of SELECT operations may be replaced by a single
selection with a conjunction (and) of all the conditions: σ
The number of tuples in the result of a SELECT is less than (or
equal to) the number of tuples in the input relation R
The fraction of tuples selected by a selection condition is called
the selectivity of the condition
CS3402 11
Unary Relational Operations: PROJECT
PROJECT Operation is denoted by π (pi)
This operation keeps certain attributes from a relation and discards
the other attributes
PROJECT creates a vertical partitioning
The list of specified attributes is kept in each tuple The other attributes in each tuple are discarded
The general form of the project operation is: π
π (pi) is the symbol used to represent the project operation
CS3402 12
Unary Relational Operations: PROJECT
The project operation removes any duplicate tuples
This is because the result of the project operation must be a set of
tuples
Mathematical sets do not allow duplicate elements Example: πSex,Salary(EMPLOYEE)
CS3402 Duplicated tuples 13
Unary Relational Operations: PROJECT
PROJECT Operation Properties
CS3402
14
The number of tuples in the result of projection π(R) is always less (duplicates are removed) or equal (unique values) to the number of tuples in R
If the list of attributes includes a key of R, then the number of tuples in the result of PROJECT is equal to the number of tuples in R
PROJECT is not commutative
π
π
the attributes in
List1 = LNAME, FNAME
List2 = LNAME, FNAME, SALARY
Single expression versus sequence of relational operations
We may want to apply several relational algebra operations one after the other
Either we can write the operations as a single relational algebra expression by nesting the operations, or
We can apply one operation at a time and create intermediate result relations
In the latter case, we must give names (rename) to the relations that hold the intermediate results
CS3402 15
Single expression versus sequence of relational operations
Example: To retrieve the first name, last name, and salary of all employees who work in department number 5, we must apply a select and a project operation
We can write a single relational algebra expression as follows: πFNAME, LNAME, SALARY(σ DNO=5(EMPLOYEE))
OR We can explicitly show the sequence of operations, giving a name to each intermediate relation:
DEP5_EMPS ← σ DNO=5(EMPLOYEE) TEMP ← π FNAME, LNAME, SALARY (DEP5_EMPS)
Note: the ← symbol is an assignment operator
CS3402 16
Unary Relational Operations: RENAME
The RENAME operator is denoted by ρ (rho)
The RENAME operation ρ can be used to rename either the relation
CS3402
17
name or the attribute names, or both: ρS(R) changes:
the relation name only to S ρ(B1, B2, …, Bn)(R) changes:
the attribute names only to B1, B2, …..Bn ρS (B1, B2, …, Bn )(R) changes both:
the relation name to S, and
the attribute names to B1, B2, …..Bn
Unary Relational Operations: RENAME
For convenience, we also use a shorthand for renaming attributes in an intermediate relation:
If we write: TEMPS ← π FNAME, LNAME, SALARY (DEP5_EMPS)
• Note: TEMPS has the same attribute names as DEP5_EMPS
If we write: ρ R (First_name , Last_name , Salary)(TEMPS)
• The 3 attributes of TEMPS are renamed to First_name ,
CS3402
18
Last_name and Salary, respectively; and • R is the name of the result relation.
Example of applying multiple operations and RENAME
CS3402
Slide 8- 19
19
Relational Algebra Operations from Set Theory
Type compatibility of operands is required for the binary set operation UNION ∪, INTERSECTION ∩, and SET DIFFERENCE –
R1(A1, A2, …, An) and R2(B1, B2, …, Bn) are type compatible if:
they have the same number of attributes, and
the domains of corresponding attributes are type compatible (i.e. dom(Ai)=dom(Bi) for i=1, 2, …, n)
The resulting relation for R1∪R2 (also for R1∩R2, or R1–R2) has the same attribute names as the first operand relation R1
CS3402 20
Relational Algebra Operations from Set Theory: UNION
UNION Operation
Binary operation, denoted by ∪
The result of R ∪ S, is a relation that includes all tuples that are either in R or in S or in both R and S
Duplicate tuples are eliminated
The two operand relations R and S must be “type compatible”
CS3402 21
Relational Algebra Operations from Set Theory: UNION
Example:
CS3402
22
To retrieve the social security numbers of all employees who either work in department 5 (RESULT1 below) or directly supervise an employee who works in department 5 (RESULT2 below)
We can use the UNION operation as follows: DEP5_EMPS ← σDNO=5 (EMPLOYEE)
RESULT1 ← π SSN(DEP5_EMPS) RESULT2(SSN) ← πSUPERSSN(DEP5_EMPS) RESULT ← RESULT1 ∪ RESULT2
The union operation produces the tuples that are in either RESULT1 or RESULT2 or both
Figure 8.3 Result of the UNION operation RESULT ← RESULT1 ∪ RESULT2.
CS3402
23
RESULT ← RESULT1 ∪ RESULT2
Relational Algebra Operations from Set Theory: INTERSECTION
INTERSECTION is denoted by ∩
The result of the operation R ∩ S, is a relation that includes all tuples
that are in both R and S
The attribute names in the result will be the same as the attribute names in R
The two operand relations R and S must be “type compatible”
CS3402 24
Relational Algebra Operations from Set Theory: SET DIFFERENCE
SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by – The result of R – S, is a relation that includes all tuples that are in R
but not in S
The attribute names in the result will be the same as the attribute names in R
The two operand relations R and S must be “type compatible” R∩ S=(R∪S)–(R–S))–(S–R)
CS3402 25
Example to illustrate the result of UNION
CS3402 26
Example to illustrate the result of INTERSECT
CS3402 27
Example to illustrate the result of SET DIFFERENCE
CS3402 28
Example to illustrate the result of DIFFERENCE
CS3402 29
Properties of UNION, INTERSECT, and DIFFERENCE
Notice that both union and intersection are commutative operations; that is
R ∪ S = S ∪ R, and R ∩ S = S ∩ R
Both union and intersection can be treated as n-ary operations applicable to any number of relations as both are associative operations; that is
R ∪ (S ∪ T) = (R ∪ S) ∪ T R ∩ (S ∩ T) = (R ∩ S) ∩ T
The minus operation is not commutative R – S ≠ S – R
CS3402 30
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
CARTESIAN (or CROSS) PRODUCT Operation
This operation is used to combine tuples from two relations in a
combinatorial fashion
Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm) Result is a relation Q with degree n + m attributes:
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order
The resulting relation state has one tuple for each combination
of tuples – one from R and one from S
Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS
tuples, then R x S will have nR * nS tuples
The two operands do NOT have to be “type compatible”
CS3402 31
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
Generally, CROSS PRODUCT is not a meaningful operation Some relations do not exist in the mini-world
Can become meaningful when followed by other operations
Example (not meaningful):
(1) FEMALE_EMPS ← σ SEX=’F’(EMPLOYEE)
(2) EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS) (3) EMP_DEPENDENTS ← EMPNAMES x DEPENDENT
EMP_DEPENDENTS will contain every combination of EMPNAMES and DEPENDENT
whether or not they are actually related
CS3402 32
Figure 8.5 The CARTESIAN PRODUCT (CROSS PRODUCT) operation
CS3402 33
Figure 8.5 The CARTESIAN PRODUCT (CROSS PRODUCT) operation
CS3402 34
Figure 8.5 The CARTESIAN PRODUCT (CROSS PRODUCT) operation
CS3402 35
Relational Algebra Operations from Set Theory: CARTESIAN PRODUCT
To keep only combinations where the DEPENDENT is related to the EMPLOYEE, we add a SELECT operation as follows
Example (meaningful):
(1) FEMALE_EMPS ← σ SEX=’F’(EMPLOYEE)
(2) EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS) (3) EMP_DEPENDENTS ← EMPNAMES x DEPENDENT (4) ACTUAL_DEPS ← σ SSN=ESSN(EMP_DEPENDENTS)
(5) RESULT ← π FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)
RESULT will now contain the name of female employees and their
dependents
CS3402 36
The CARTESIAN PRODUCT (CROSS PRODUCT) operation
CS3402 37
Binary Relational Operations: JOIN
JOIN Operation (denoted by )
CS3402
38
The sequence of CARTESIAN PRODECT followed by SELECT is used quite commonly to identify and select related tuples from two relations
A special operation, called JOIN combines this sequence into a single operation
The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:
R
R and S can be any relations that result from general relational algebra expressions
Binary Relational Operations: JOIN
Example: Suppose that we want to retrieve the name of the manager of each department
To get the manager’s name, we need to combine each DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple.
DEPT_MGR ← DEPARTMENT MGRSSN=SSN EMPLOYEE
CS3402
39
MGRSSN=SSN is the join condition: Combines each department record with the employee who manages the department
Figure 8.6 Result of the JOIN operation
DEPT_MGR ← DEPARTMENT Mgr_ssn=SsnEMPLOYEE
CS3402
40
Some properties of JOIN
Consider the following JOIN operation: R(A1, A2, . . ., An)
S(B1, B2, . . ., Bm)
CS3402
41
R.Ai=S.Bj
Result is a relation Q with degree n + m attributes:
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order
The resulting relation state has one tuple for each combination of tuples: r from R and s from S, but only if they satisfy the join condition r[Ai]=s[Bj]
Hence, if R has nR tuples, and S has nS tuples, then the join result will generally have less than nR x nS tuples
Theta-join
The general case of JOIN operation is called a Theta-join: R
A general join condition is of the form
Each
Ai is an attribute of R
Bj is an attribute of S
Ai and Bj have the same domain
θ is one of the comparison operators {=, <, ≤, >, ≥, ≠}
CS3402 42
EQUIJOIN
The most common use of join involves join conditions with equality comparisons only
Such a join, where the only comparison operator used is =, is called an EQUIJOIN
In the result of an EQUIJOIN we always have one or more pairs of attributes (whose names need not be identical) that have identical values in every tuple
The JOIN seen in the previous example was an EQUIJOIN
DEPT_MGR ← DEPARTMENT Mgr_ssn=SsnEMPLOYEE CS3402
43
NATURAL JOIN Operation
NATURAL JOIN: denoted by * was created to get rid of the superfluous attributes in an EQUIJOIN condition
One of each pair of attributes with identical values is superfluous
The standard definition of natural join requires that the two join attributes, or each pair of corresponding join attributes, have the same name in both relations.
E.g. Q ← R(A,B,C,D) * S(C,D,E)
The implicit join condition includes each pair of attributes with
CS3402
44
the same name, “AND”ed together: R.C=S.C AND R.D=S.D Result keeps only one attribute of each such pair:
Q(A,B,C,D,E)
NATURAL JOIN
Example: Combine each PROJECT tuple with the DEPARTMENT.
We first rename the Dnumber attribute of DEPARTMENT to Dnum, so that it has the same name as the Dnum attribute in PROJECT, and then we apply NATURAL JOIN.
DEPT ← ρ(Dname, Dnum, Mgr_ssn, Mgr_start_date)(DEPARTMENT) PROJ_DEPT ← PROJECT * DEPT
The attribute Dnum is called the join attribute for NATURAL JOIN, because it is the attribute with the same name in both relations.
Only one join attribute value is kept.
CS3402 45
Example of NATURAL JOIN operation
PROJ_DEPT ← PROJECT * DEPT
DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS
CS3402 46
Binary Relational Operations: DIVISION
Q. How to retrieve the Social Security numbers of employees who work on all the projects that ‘John Smith’ works on?
First, retrieve the list of project numbers that ‘John Smith’ works on in the intermediate relation SMITH_PNOS:
SMITH ← σ Fname=‘John’ AND Lname=‘Smith’ (EMPLOYEE) SMITH_PNOS ← π Pno(WORKS_ON Essn=SsnSMITH)
Next, we create a relation that includes a tuple
SSN_PNOS ← π Essn, Pno(WORKS_ON)
Finally, apply the DIVISION operation to the two relations, which
gives the desired employees’ Social Security numbers:
CS3402 SSNS(Ssn) ← SSN_PNOS ÷ SMITH_PNOS 47
Binary Relational Operations: DIVISION
CS3402 48
Binary Relational Operations: DIVISION
DIVISION Operation
The division operation is applied to two relations
R(Z) ÷ S(X), where X subset Z
Let Y = Z – X (and hence Z = X ∪ Y); that is, let Y be the set of attributes of R that are not attributes of S
The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear in R with tR [Y] = t, and with
tR [X] = ts for every tuple ts in S
For a tuple t to appear in the result T of the DIVISION, the
values in t must appear in R in combination with every tuple in S
CS3402 49
Example of DIVISION
SSNS(Ssn) ← SSN_PNOS ÷SMITH_PNOS
•R= SSN_PNOS, S= SMITH_PNOS, T= SSNS.
Z={Essn,Pno}, Y = {Essn}, X ={Pno}
•E.g., 123456789 is in SSNS because tuples tR appear in R with tR[Y]=123456789, and with tR[X]= ts for every tuple ts (i.e., 1 and 2) in S
48
CS3402 50
Complete Set of Relational Operations
The set of operations including SELECT σ, PROJECT π , UNION ∪, DIFFERENCE − , RENAME ρ, and CARTESIAN PRODUCT X is called a complete set because any other relational algebra expression can be expressed by a combination of these six operations
For example:
R ∩ S = (R ∪ S ) – ((R − S) ∪ (S − R))
CS3402
51
R
Table 8.1 Operations of Relational Algebra
CS3402 52
Table 8.1 Operations of Relational Algebra
CS3402 53
References
7e
Ch. 8
CS3402 54