1/59
Week 8 Workshop – Query Processing and Optimisation
2/59
Housekeeping
1 Assignment 2 (Database Theory) for both COMP2400/6240 students:
The submission deadline is 23:59, Oct 12, 2021.
This assignment must be done individually (no group work). Please
join the special drop-in sessions if you need any clarifications.
2 All the labs on Oct 4 (Monday, public holiday) in Week 9 will be moved to the
same timeslots on Oct 11 (Monday) in Week 10.
3 Lab 8 is optional (no associated with any assessment items)
We will open a separate sign-up page on Wattle at 12pm Oct 6.
All the optional labs will be scheduled from Oct 12 to Oct 15.
Three options are available
(1) Database Programming with Java
(2) Database Programming with Python
(3) Database Exercises on IMDB
2/59
Housekeeping
1 Assignment 2 (Database Theory) for both COMP2400/6240 students:
The submission deadline is 23:59, Oct 12, 2021.
This assignment must be done individually (no group work). Please
join the special drop-in sessions if you need any clarifications.
2 All the labs on Oct 4 (Monday, public holiday) in Week 9 will be moved to the
same timeslots on Oct 11 (Monday) in Week 10.
3 Lab 8 is optional (no associated with any assessment items)
We will open a separate sign-up page on Wattle at 12pm Oct 6.
All the optional labs will be scheduled from Oct 12 to Oct 15.
Three options are available
(1) Database Programming with Java
(2) Database Programming with Python
(3) Database Exercises on IMDB
2/59
Housekeeping
1 Assignment 2 (Database Theory) for both COMP2400/6240 students:
The submission deadline is 23:59, Oct 12, 2021.
This assignment must be done individually (no group work). Please
join the special drop-in sessions if you need any clarifications.
2 All the labs on Oct 4 (Monday, public holiday) in Week 9 will be moved to the
same timeslots on Oct 11 (Monday) in Week 10.
3 Lab 8 is optional (no associated with any assessment items)
We will open a separate sign-up page on Wattle at 12pm Oct 6.
All the optional labs will be scheduled from Oct 12 to Oct 15.
Three options are available
(1) Database Programming with Java
(2) Database Programming with Python
(3) Database Exercises on IMDB
3/59
Query Processing – Example
SELECT name FROM Person WHERE age<21; High-level language
(SQL)
⇓
πname(σage<21(Person))
Low-level language
(Relational Algebra)
⇓
Person
21σage<
nameπ
Execution plan
(Query tree)
⇓
name
Rickon
Bran
Query result
4/59
From SQL to RA Expressions
Students(matNr, firstName, lastName, email)
Exams(matNr, crsNr, result, semester)
Courses(crsNr, title, unit)
SELECT lastName, result, title
FROM Students, Exams, Courses
WHERE Students.matNr=Exams.matNr AND
Exams.crsNr=Courses.crsNr AND result≤1.3;
RA Expressions:
1 πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams)
./σExams.crsNr=Courses.crsNr Courses))
2 πlastName,result,title(σresult≤1.3(σEXAMS.crsNr=Courses.crsNr
(σStudents.matNr=Exams.matNr(Students× Exams× Courses))))
3 πlastName,result,title((Students ./Students.matNr=Exams.matNr
(σresult≤1.3(Exams))) ./Exams.crsNr=Courses.crsNr Courses)
4 . . .
4/59
From SQL to RA Expressions
Students(matNr, firstName, lastName, email)
Exams(matNr, crsNr, result, semester)
Courses(crsNr, title, unit)
SELECT lastName, result, title
FROM Students, Exams, Courses
WHERE Students.matNr=Exams.matNr AND
Exams.crsNr=Courses.crsNr AND result≤1.3;
RA Expressions:
1 πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams)
./σExams.crsNr=Courses.crsNr Courses))
2 πlastName,result,title(σresult≤1.3(σEXAMS.crsNr=Courses.crsNr
(σStudents.matNr=Exams.matNr(Students× Exams× Courses))))
3 πlastName,result,title((Students ./Students.matNr=Exams.matNr
(σresult≤1.3(Exams))) ./Exams.crsNr=Courses.crsNr Courses)
4 . . .
5/59
From RA Expressions to Query Trees
Each RA expression can be represented as a query tree:
leaf nodes represent the input relations;
internal nodes represent the intermediate result;
the root node represents the resulting relation.
Example:
πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams)
./Exams.crsNr=Courses.crsNr Courses))
6/59
Query Tree Example
For each query tree, computation proceeds bottom-up:
child nodes must be executed before their parent nodes;
but there can exist multiple methods of executing sibling nodes.
Example:
πlastName,result,title(σresult≤1.3(σExams.crsNr=Courses.crsNr
(σStudents.matNr=Exams.matNr(Students× Exams× Courses))))
. .σstudents matNr exams matNr=
lastName, result, titleπ
×
courses
exams
×
students
. .σexams crsNr courses crsNr=
1.3σresult≤
7/59
Equivalent Query Trees (Query Optimisation)
. .σstudents matNr exams matNr=
lastName, result, titleπ
×
courses
exams
×
students
. .σexams crsNr courses crsNr=
1.3σresult≤
8/59
Execution Plan
(Slide 8-27 will not be assessed in our course)
A query execution plan consists of an (extended) query tree with
additional annotation at each node indicating:
(1) the access method to use for each table, and
(2) the implementation method for each RA operator.
9/59
Execution Plan
Materialized: The intermediate result of an operator may be saved in a
temporary table for processing by the next operator.
Pipelined: the intermediate result of an operator is directly sent to another
operator without creating a temporary table (also called on-the-fly).
Note: Pipelined evaluation may have significant saving on I/O cost, while
materialized evaluation can avoid repeated computations.
10/59
Execution Plan
Question: Which execution plan is “optimal” in terms of processing
efficiency?
This is determined by the query optimiser using a variety of algorithms
(Fact: there is no true optimal solution in general!).
Realistically, we cannot expect to always find the best plan, but we expect to
consistently find a plan that is good.
The performance of different execution plans for the same query may differ
considerably (e.g., seconds vs. hours vs. days):
different but equivalent RA expressions;
different algorithms for each RA operator.
10/59
Execution Plan
Question: Which execution plan is “optimal” in terms of processing
efficiency?
This is determined by the query optimiser using a variety of algorithms
(Fact: there is no true optimal solution in general!).
Realistically, we cannot expect to always find the best plan, but we expect to
consistently find a plan that is good.
The performance of different execution plans for the same query may differ
considerably (e.g., seconds vs. hours vs. days):
different but equivalent RA expressions;
different algorithms for each RA operator.
10/59
Execution Plan
Question: Which execution plan is “optimal” in terms of processing
efficiency?
This is determined by the query optimiser using a variety of algorithms
(Fact: there is no true optimal solution in general!).
Realistically, we cannot expect to always find the best plan, but we expect to
consistently find a plan that is good.
The performance of different execution plans for the same query may differ
considerably (e.g., seconds vs. hours vs. days):
different but equivalent RA expressions;
different algorithms for each RA operator.
11/59
Execution Plan
Basic ideas of algorithms used for RA operators
Selection: If there is no index, we have to scan the table. Otherwise,
we scan the indexes to retrieve matching tuples and apply remaining
selection conditions to further restrict the tuples.
Projection retrieves a subset of attributes from each tuple of the table
(similar to selection). If requiring duplicate elimination, then we have to
do sorting additionally (expensive part!)
Join: We may use nested loops join, or sort-merge join, hash joins,
block nested loops join, etc.
Group by and order by are typically implemented using sorting.
Aggregation operators use temporary counters in main memory
when retrieving tuples.
Set operators can use the same approach as projection to eliminate
duplicates.
11/59
Execution Plan
Basic ideas of algorithms used for RA operators
Selection: If there is no index, we have to scan the table. Otherwise,
we scan the indexes to retrieve matching tuples and apply remaining
selection conditions to further restrict the tuples.
Projection retrieves a subset of attributes from each tuple of the table
(similar to selection). If requiring duplicate elimination, then we have to
do sorting additionally (expensive part!)
Join: We may use nested loops join, or sort-merge join, hash joins,
block nested loops join, etc.
Group by and order by are typically implemented using sorting.
Aggregation operators use temporary counters in main memory
when retrieving tuples.
Set operators can use the same approach as projection to eliminate
duplicates.
11/59
Execution Plan
Basic ideas of algorithms used for RA operators
Selection: If there is no index, we have to scan the table. Otherwise,
we scan the indexes to retrieve matching tuples and apply remaining
selection conditions to further restrict the tuples.
Projection retrieves a subset of attributes from each tuple of the table
(similar to selection). If requiring duplicate elimination, then we have to
do sorting additionally (expensive part!)
Join: We may use nested loops join, or sort-merge join, hash joins,
block nested loops join, etc.
Group by and order by are typically implemented using sorting.
Aggregation operators use temporary counters in main memory
when retrieving tuples.
Set operators can use the same approach as projection to eliminate
duplicates.
11/59
Execution Plan
Basic ideas of algorithms used for RA operators
Selection: If there is no index, we have to scan the table. Otherwise,
we scan the indexes to retrieve matching tuples and apply remaining
selection conditions to further restrict the tuples.
Projection retrieves a subset of attributes from each tuple of the table
(similar to selection). If requiring duplicate elimination, then we have to
do sorting additionally (expensive part!)
Join: We may use nested loops join, or sort-merge join, hash joins,
block nested loops join, etc.
Group by and order by are typically implemented using sorting.
Aggregation operators use temporary counters in main memory
when retrieving tuples.
Set operators can use the same approach as projection to eliminate
duplicates.
11/59
Execution Plan
Basic ideas of algorithms used for RA operators
Selection: If there is no index, we have to scan the table. Otherwise,
we scan the indexes to retrieve matching tuples and apply remaining
selection conditions to further restrict the tuples.
Projection retrieves a subset of attributes from each tuple of the table
(similar to selection). If requiring duplicate elimination, then we have to
do sorting additionally (expensive part!)
Join: We may use nested loops join, or sort-merge join, hash joins,
block nested loops join, etc.
Group by and order by are typically implemented using sorting.
Aggregation operators use temporary counters in main memory
when retrieving tuples.
Set operators can use the same approach as projection to eliminate
duplicates.
11/59
Execution Plan
Basic ideas of algorithms used for RA operators
Selection: If there is no index, we have to scan the table. Otherwise,
we scan the indexes to retrieve matching tuples and apply remaining
selection conditions to further restrict the tuples.
Projection retrieves a subset of attributes from each tuple of the table
(similar to selection). If requiring duplicate elimination, then we have to
do sorting additionally (expensive part!)
Join: We may use nested loops join, or sort-merge join, hash joins,
block nested loops join, etc.
Group by and order by are typically implemented using sorting.
Aggregation operators use temporary counters in main memory
when retrieving tuples.
Set operators can use the same approach as projection to eliminate
duplicates.
11/59
Execution Plan
Basic ideas of algorithms used for RA operators
Selection: If there is no index, we have to scan the table. Otherwise,
we scan the indexes to retrieve matching tuples and apply remaining
selection conditions to further restrict the tuples.
Projection retrieves a subset of attributes from each tuple of the table
(similar to selection). If requiring duplicate elimination, then we have to
do sorting additionally (expensive part!)
Join: We may use nested loops join, or sort-merge join, hash joins,
block nested loops join, etc.
Group by and order by are typically implemented using sorting.
Aggregation operators use temporary counters in main memory
when retrieving tuples.
Set operators can use the same approach as projection to eliminate
duplicates.
12/59
Estimating Query Costs - Example
Which movies got a non-US award for one of its actors playing an ‘agent’ ?
πtitle,production year (σrole description=′agent′(ROLE ./ ACTOR AWARD ./ (AWARD
−σaward country=‘USA′(AWARD))))
12/59
Estimating Query Costs - Example
Which movies got a non-US award for one of its actors playing an ‘agent’ ?
πtitle,production year (σrole description=′agent′(ROLE ./ ACTOR AWARD ./ (AWARD
−σaward country=‘USA′(AWARD))))
12/59
Estimating Query Costs - Example
Which movies got a non-US award for one of its actors playing an ‘agent’ ?
πtitle,production year (σrole description=′agent′(ROLE ./ ACTOR AWARD ./ (AWARD
−σaward country=‘USA′(AWARD))))
13/59
Size of Relations
How to determine the size of a relation r over R(A1, . . . ,Ak )?
Let n denote the average number of tuples in r , and `j the the average
space (e.g., in bits) for attribute Aj .
R
A1 A2 . . . Ak
1 . . . . . . . . . . . .
. . . . . . . . . . . . . . .
n . . . . . . . . . . . .
`1 `2 . . . `k
Then, n ·
k∑
j=1
`j is the size of the relation r .
We use this formula to assign sizes to leaf nodes in the query tree.
13/59
Size of Relations
How to determine the size of a relation r over R(A1, . . . ,Ak )?
Let n denote the average number of tuples in r , and `j the the average
space (e.g., in bits) for attribute Aj .
R
A1 A2 . . . Ak
1 . . . . . . . . . . . .
. . . . . . . . . . . . . . .
n . . . . . . . . . . . .
`1 `2 . . . `k
Then, n ·
k∑
j=1
`j is the size of the relation r .
We use this formula to assign sizes to leaf nodes in the query tree.
14/59
Estimating Query Costs - Example (Relation Sizes)
AWARD(Award name:varchar(30),Institution:varchar(50),Award country:
varchar(20))
Estimate the average number of tuples as 15.
Estimate the average space for attributes:
Award name: 8 · 20 = 160 bits (the mean length is 20);
Institution: 8 · 30 = 240 bits (the mean length is 30);
Award country: 8 · 10 = 80 bits (the mean length is 10).
The average size of a tuple is 160 + 80 + 240 = 480 bits.
The average size of a relation is estimated to be 15 · 480 = 7, 200 bits.
14/59
Estimating Query Costs - Example (Relation Sizes)
AWARD(Award name:varchar(30),Institution:varchar(50),Award country:
varchar(20))
Estimate the average number of tuples as 15.
Estimate the average space for attributes:
Award name: 8 · 20 = 160 bits (the mean length is 20);
Institution: 8 · 30 = 240 bits (the mean length is 30);
Award country: 8 · 10 = 80 bits (the mean length is 10).
The average size of a tuple is 160 + 80 + 240 = 480 bits.
The average size of a relation is estimated to be 15 · 480 = 7, 200 bits.
14/59
Estimating Query Costs - Example (Relation Sizes)
AWARD(Award name:varchar(30),Institution:varchar(50),Award country:
varchar(20))
Estimate the average number of tuples as 15.
Estimate the average space for attributes:
Award name: 8 · 20 = 160 bits (the mean length is 20);
Institution: 8 · 30 = 240 bits (the mean length is 30);
Award country: 8 · 10 = 80 bits (the mean length is 10).
The average size of a tuple is 160 + 80 + 240 = 480 bits.
The average size of a relation is estimated to be 15 · 480 = 7, 200 bits.
14/59
Estimating Query Costs - Example (Relation Sizes)
AWARD(Award name:varchar(30),Institution:varchar(50),Award country:
varchar(20))
Estimate the average number of tuples as 15.
Estimate the average space for attributes:
Award name: 8 · 20 = 160 bits (the mean length is 20);
Institution: 8 · 30 = 240 bits (the mean length is 30);
Award country: 8 · 10 = 80 bits (the mean length is 10).
The average size of a tuple is 160 + 80 + 240 = 480 bits.
The average size of a relation is estimated to be 15 · 480 = 7, 200 bits.
14/59
Estimating Query Costs - Example (Relation Sizes)
AWARD(Award name:varchar(30),Institution:varchar(50),Award country:
varchar(20))
Estimate the average number of tuples as 15.
Estimate the average space for attributes:
Award name: 8 · 20 = 160 bits (the mean length is 20);
Institution: 8 · 30 = 240 bits (the mean length is 30);
Award country: 8 · 10 = 80 bits (the mean length is 10).
The average size of a tuple is 160 + 80 + 240 = 480 bits.
The average size of a relation is estimated to be 15 · 480 = 7, 200 bits.
14/59
Estimating Query Costs - Example (Relation Sizes)
AWARD(Award name:varchar(30),Institution:varchar(50),Award country:
varchar(20))
Estimate the average number of tuples as 15.
Estimate the average space for attributes:
Award name: 8 · 20 = 160 bits (the mean length is 20);
Institution: 8 · 30 = 240 bits (the mean length is 30);
Award country: 8 · 10 = 80 bits (the mean length is 10).
The average size of a tuple is 160 + 80 + 240 = 480 bits.
The average size of a relation is estimated to be 15 · 480 = 7, 200 bits.
15/59
Estimating Query Costs - Example (Relation Sizes)
ROLE(Id:char(8), Title:varchar(40), Production year:number(4),
Role description:varchar(100),Credits:varchar(40))
Estimate the average number of tuples as 500.
Estimate the average space for attributes:
Id: 8 · 8 = 64 bits (as the domain is char(8));
Title: 8 · 25 = 200 bits (the mean length is 25);
Production year: 13 bits (as the domain is number(4));
Role description: 8 · 50 = 400 bits (the mean length is 50);
Credits: 8 · 20 = 160 bits (the mean length is 20).
The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits
The average size of a relation is to be 500 · 837 = 418, 500 bits
15/59
Estimating Query Costs - Example (Relation Sizes)
ROLE(Id:char(8), Title:varchar(40), Production year:number(4),
Role description:varchar(100),Credits:varchar(40))
Estimate the average number of tuples as 500.
Estimate the average space for attributes:
Id: 8 · 8 = 64 bits (as the domain is char(8));
Title: 8 · 25 = 200 bits (the mean length is 25);
Production year: 13 bits (as the domain is number(4));
Role description: 8 · 50 = 400 bits (the mean length is 50);
Credits: 8 · 20 = 160 bits (the mean length is 20).
The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits
The average size of a relation is to be 500 · 837 = 418, 500 bits
15/59
Estimating Query Costs - Example (Relation Sizes)
ROLE(Id:char(8), Title:varchar(40), Production year:number(4),
Role description:varchar(100),Credits:varchar(40))
Estimate the average number of tuples as 500.
Estimate the average space for attributes:
Id: 8 · 8 = 64 bits (as the domain is char(8));
Title: 8 · 25 = 200 bits (the mean length is 25);
Production year: 13 bits (as the domain is number(4));
Role description: 8 · 50 = 400 bits (the mean length is 50);
Credits: 8 · 20 = 160 bits (the mean length is 20).
The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits
The average size of a relation is to be 500 · 837 = 418, 500 bits
15/59
Estimating Query Costs - Example (Relation Sizes)
ROLE(Id:char(8), Title:varchar(40), Production year:number(4),
Role description:varchar(100),Credits:varchar(40))
Estimate the average number of tuples as 500.
Estimate the average space for attributes:
Id: 8 · 8 = 64 bits (as the domain is char(8));
Title: 8 · 25 = 200 bits (the mean length is 25);
Production year: 13 bits (as the domain is number(4));
Role description: 8 · 50 = 400 bits (the mean length is 50);
Credits: 8 · 20 = 160 bits (the mean length is 20).
The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits
The average size of a relation is to be 500 · 837 = 418, 500 bits
15/59
Estimating Query Costs - Example (Relation Sizes)
ROLE(Id:char(8), Title:varchar(40), Production year:number(4),
Role description:varchar(100),Credits:varchar(40))
Estimate the average number of tuples as 500.
Estimate the average space for attributes:
Id: 8 · 8 = 64 bits (as the domain is char(8));
Title: 8 · 25 = 200 bits (the mean length is 25);
Production year: 13 bits (as the domain is number(4));
Role description: 8 · 50 = 400 bits (the mean length is 50);
Credits: 8 · 20 = 160 bits (the mean length is 20).
The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits
The average size of a relation is to be 500 · 837 = 418, 500 bits
15/59
Estimating Query Costs - Example (Relation Sizes)
ROLE(Id:char(8), Title:varchar(40), Production year:number(4),
Role description:varchar(100),Credits:varchar(40))
Estimate the average number of tuples as 500.
Estimate the average space for attributes:
Id: 8 · 8 = 64 bits (as the domain is char(8));
Title: 8 · 25 = 200 bits (the mean length is 25);
Production year: 13 bits (as the domain is number(4));
Role description: 8 · 50 = 400 bits (the mean length is 50);
Credits: 8 · 20 = 160 bits (the mean length is 20).
The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits
The average size of a relation is to be 500 · 837 = 418, 500 bits
16/59
Estimating Query Costs - Example (Relation Sizes)
ACTOR AWARD(Title:varchar(40),Production year:number(4),
Role description:varchar(100),Award name:varchar(30),
Year of award:number(4),Category:varchar(100),Result:varchar(20))
Estimate the average number of tuples as 40
Estimate the average space for attributes:
Title: 200 bits (as before);
Production year: 13 bits (as before);
Role description: 400 bits (as before);
Award name: 160 bits (as before);
Year of award: 13 bits (as the domain is number(4));
Category: 8 · 40 = 320 bits (the mean length is 40);
Result: 8 · 7 = 56 bits (the mean length is 7).
The average size of a tuple is 200 + 13 + 400 + 160 + 13 + 320 + 56
= 1, 162 bits.
The average size of a relation is 40 · 1162 = 46, 480 bits.
17/59
Estimating Query Costs - Example (Query Tree)
18/59
Size of Selection Node
Selection σϕ is linear in the number n of tuples of the involved relation:
Scan the relation one tuple after another (if there is no index);
Check for each tuple, whether the condition ϕ is satisfied or not;
Keep exactly those tuples satisfying ϕ.
Let s be the size of its single relevant node.
The size of a selection node σϕ is
aϕ · s ,
where aϕ is the average percentage of tuples satisfying ϕ.
18/59
Size of Selection Node
Selection σϕ is linear in the number n of tuples of the involved relation:
Scan the relation one tuple after another (if there is no index);
Check for each tuple, whether the condition ϕ is satisfied or not;
Keep exactly those tuples satisfying ϕ.
Let s be the size of its single relevant node.
The size of a selection node σϕ is
aϕ · s ,
where aϕ is the average percentage of tuples satisfying ϕ.
19/59
Estimating Query Costs - Example (Selection)
For selection σaward country=‘USA’ assume aϕ = 0.4 (i.e., 40% of the movie
awards from the USA). Hence, we have: aϕ · s = 0.4 · 7,200 = 2,880.
20/59
Size of Difference Node
Let s1 and s2 be the sizes of the two relevant nodes.
Again, we need to consider the probability that tuples occur in both
relations.
The size of a difference node is
s1 · (1− p)
where (1− p) is the probability that tuples from s1 does not occur in s2.
20/59
Size of Difference Node
Let s1 and s2 be the sizes of the two relevant nodes.
Again, we need to consider the probability that tuples occur in both
relations.
The size of a difference node is
s1 · (1− p)
where (1− p) is the probability that tuples from s1 does not occur in s2.
21/59
Estimating Query Costs - Example (Difference)
Since 40% of the movie awards from the USA, the probability of an award to
be a US-award is p = 0.4. We have: s1 · (1− p)= 7,200 · (1-0.4) = 4,320.
22/59
Size of Natural Join Node
Let s1 and s2 be the sizes of the two relevant nodes, and r1 and r2 be the
size of a tuple in these two nodes.
s1
r1
and
s2
r2
are the estimated number of
tuples in these two nodes.
The size of a natural join node is
s1
r1
· p ·
s2
r2
· (r1 + r2 − r) ,
where r is the size of a tuple over the common attributes, and p is the
matching probability (for any tuple of the first relevant node to match with
any tuples in the second relevant relation). Note that r1 + r2 − r is the size of
a tuple after the natural join operation.
22/59
Size of Natural Join Node
Let s1 and s2 be the sizes of the two relevant nodes, and r1 and r2 be the
size of a tuple in these two nodes.
s1
r1
and
s2
r2
are the estimated number of
tuples in these two nodes.
The size of a natural join node is
s1
r1
· p ·
s2
r2
· (r1 + r2 − r) ,
where r is the size of a tuple over the common attributes, and p is the
matching probability (for any tuple of the first relevant node to match with
any tuples in the second relevant relation). Note that r1 + r2 − r is the size of
a tuple after the natural join operation.
23/59
Estimating Query Costs - Example (Natural Join)
For join with ACTOR AWARD assume p = 0.08, i.e., 8% of the actor awards
are non-US awards. By
s1
r1
· p ·
s2
r2
· (r1 + r2 − r), we have:
46, 480
1, 162
· 0.08 ·
4, 320
480
· (1, 162 + 480− 160) = 42, 682.
24/59
Estimating Query Costs - Example (Natural Join)
Assume p = 0.002. By
s1
r1
· p ·
s2
r2
· (r1 + r2 − r), we have:
418, 500
837
· 0.002 ·
42, 682
1, 482
· (837 + 1, 482− 200− 400− 13) = 49, 133.
25/59
Estimating Query Costs - Example (Selection)
For selection σrole description=’agent’ assume aϕ = 0.05 (i.e., non-US awards for
“agent” roles are 5%). Hence, we have: aϕ · s = 0.05 · 49,133 = 2,457.
26/59
Size of Projection Node
Projection π{A1, . . . ,An}:
Project each tuple to the attributes in {A1, . . . ,An}
Eliminate duplicates (Note: SQL does not eliminate tuples unless
DISTINCT is used).
Let s be the size of its single relevant node with s = n · r for its average
number n of tuples and its average size r of a tuple.
The size of a projection node πA1,...,An is
(1− pi) · s ·
ri
r
,
where ri is the average size of a tuple over {A1, . . . ,An}, and pi is the
probability that two tuples coincide on A1, . . . ,An (i.e., the same values on
all attributes A1, . . . ,An).
26/59
Size of Projection Node
Projection π{A1, . . . ,An}:
Project each tuple to the attributes in {A1, . . . ,An}
Eliminate duplicates (Note: SQL does not eliminate tuples unless
DISTINCT is used).
Let s be the size of its single relevant node with s = n · r for its average
number n of tuples and its average size r of a tuple.
The size of a projection node πA1,...,An is
(1− pi) · s ·
ri
r
,
where ri is the average size of a tuple over {A1, . . . ,An}, and pi is the
probability that two tuples coincide on A1, . . . ,An (i.e., the same values on
all attributes A1, . . . ,An).
27/59
Estimating Query Costs - Example (Projection)
For projection πtitle, production year assume that there are 1% of duplicates, i.e.,
pi = 0.01. By (1− pi) · s ·
ri
r
, we have (1− 0.01) · 2, 457
213
1706
= 304
28/59
Query Optimisation
29/59
Relational Algebra ⇒ Query Optimisation
SQL query
RA query 1
RA query 2
RA query n
……
Va
ri
ed
p
er
fo
rm
an
ce
Logically equivalent
Which RA query should be chosen for a given SQL query?
Who choose? Query optimiser!
How to choose?
– Semantic query optimisation
– Rule-based optimisation
– Cost-based optimisation
29/59
Relational Algebra ⇒ Query Optimisation
SQL query
RA query 1
RA query 2
RA query n
……
Va
ri
ed
p
er
fo
rm
an
ce
Logically equivalent
Which RA query should be chosen for a given SQL query?
Who choose?
Query optimiser!
How to choose?
– Semantic query optimisation
– Rule-based optimisation
– Cost-based optimisation
29/59
Relational Algebra ⇒ Query Optimisation
SQL query
RA query 1
RA query 2
RA query n
……
Va
ri
ed
p
er
fo
rm
an
ce
Logically equivalent
Which RA query should be chosen for a given SQL query?
Who choose? Query optimiser!
How to choose?
– Semantic query optimisation
– Rule-based optimisation
– Cost-based optimisation
29/59
Relational Algebra ⇒ Query Optimisation
SQL query
RA query 1
RA query 2
RA query n
……
Va
ri
ed
p
er
fo
rm
an
ce
Logically equivalent
Which RA query should be chosen for a given SQL query?
Who choose? Query optimiser!
How to choose?
– Semantic query optimisation
– Rule-based optimisation
– Cost-based optimisation
29/59
Relational Algebra ⇒ Query Optimisation
SQL query
RA query 1
RA query 2
RA query n
……
Va
ri
ed
p
er
fo
rm
an
ce
Logically equivalent
Which RA query should be chosen for a given SQL query?
Who choose? Query optimiser!
How to choose?
– Semantic query optimisation
– Rule-based optimisation
– Cost-based optimisation
30/59
Query Optimisation
In practice, query optimisers incorporate elements of the following three
optimisation approaches:
Semantic query optimisation
Use application specific semantic knowledge to transform a query into
the one with a lower cost (they return the same answer).
Rule-based query optimisation
Use heuristic rules to transform a relational algebra expression into an
equivalent one with a possibly lower cost.
Cost-based query optimisation
Use a cost model to estimate the costs of plans, and then select the
most cost-effective plan. This will not be assessed in our course.
30/59
Query Optimisation
In practice, query optimisers incorporate elements of the following three
optimisation approaches:
Semantic query optimisation
Use application specific semantic knowledge to transform a query into
the one with a lower cost (they return the same answer).
Rule-based query optimisation
Use heuristic rules to transform a relational algebra expression into an
equivalent one with a possibly lower cost.
Cost-based query optimisation
Use a cost model to estimate the costs of plans, and then select the
most cost-effective plan. This will not be assessed in our course.
30/59
Query Optimisation
In practice, query optimisers incorporate elements of the following three
optimisation approaches:
Semantic query optimisation
Use application specific semantic knowledge to transform a query into
the one with a lower cost (they return the same answer).
Rule-based query optimisation
Use heuristic rules to transform a relational algebra expression into an
equivalent one with a possibly lower cost.
Cost-based query optimisation
Use a cost model to estimate the costs of plans, and then select the
most cost-effective plan. This will not be assessed in our course.
30/59
Query Optimisation
In practice, query optimisers incorporate elements of the following three
optimisation approaches:
Semantic query optimisation
Use application specific semantic knowledge to transform a query into
the one with a lower cost (they return the same answer).
Rule-based query optimisation
Use heuristic rules to transform a relational algebra expression into an
equivalent one with a possibly lower cost.
Cost-based query optimisation
Use a cost model to estimate the costs of plans, and then select the
most cost-effective plan. This will not be assessed in our course.
31/59
Semantic Query Optimisation
Example:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
WRITER(id, title, production year, credits) where
[id] ⊆ PERSON[id]
[title, production year] ⊆ MOVIE [title, production year]
List the ids of the writers who have written movies produced in 2000.
πidσproduction year=2000(WRITER ./ PERSON ./ MOVIE)
πidσproduction year=2000(WRITER ./ PERSON)
πidσproduction year=2000(WRITER ./ MOVIE)
πidσproduction year=2000WRITER ←− the optimised RA
31/59
Semantic Query Optimisation
Example:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
WRITER(id, title, production year, credits) where
[id] ⊆ PERSON[id]
[title, production year] ⊆ MOVIE [title, production year]
List the ids of the writers who have written movies produced in 2000.
πidσproduction year=2000(WRITER ./ PERSON ./ MOVIE)
πidσproduction year=2000(WRITER ./ PERSON)
πidσproduction year=2000(WRITER ./ MOVIE)
πidσproduction year=2000WRITER ←− the optimised RA
31/59
Semantic Query Optimisation
Example:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
WRITER(id, title, production year, credits) where
[id] ⊆ PERSON[id]
[title, production year] ⊆ MOVIE [title, production year]
List the ids of the writers who have written movies produced in 2000.
πidσproduction year=2000(WRITER ./ PERSON ./ MOVIE)
πidσproduction year=2000(WRITER ./ PERSON)
πidσproduction year=2000(WRITER ./ MOVIE)
πidσproduction year=2000WRITER ←− the optimised RA
32/59
Rule-based Query Optimisation
A rule-based optimisation transforms the RA expression by using a set of
heuristic rules that typically improve the execution performance.
Key ideas: apply the most restrictive operation before other operations,
which can reduce the size of intermediate results:
Push-down selection:
Apply as early as possible to reduce the number of tuples;
Push-down projection:
Apply as early as possible to reduce the number of attributes.
But we must ensure that the resulting query tree gives the same result as
the original query tree, i.e., the equivalence of RA expressions.
32/59
Rule-based Query Optimisation
A rule-based optimisation transforms the RA expression by using a set of
heuristic rules that typically improve the execution performance.
Key ideas: apply the most restrictive operation before other operations,
which can reduce the size of intermediate results:
Push-down selection:
Apply as early as possible to reduce the number of tuples;
Push-down projection:
Apply as early as possible to reduce the number of attributes.
But we must ensure that the resulting query tree gives the same result as
the original query tree, i.e., the equivalence of RA expressions.
32/59
Rule-based Query Optimisation
A rule-based optimisation transforms the RA expression by using a set of
heuristic rules that typically improve the execution performance.
Key ideas: apply the most restrictive operation before other operations,
which can reduce the size of intermediate results:
Push-down selection:
Apply as early as possible to reduce the number of tuples;
Push-down projection:
Apply as early as possible to reduce the number of attributes.
But we must ensure that the resulting query tree gives the same result as
the original query tree, i.e., the equivalence of RA expressions.
32/59
Rule-based Query Optimisation
A rule-based optimisation transforms the RA expression by using a set of
heuristic rules that typically improve the execution performance.
Key ideas: apply the most restrictive operation before other operations,
which can reduce the size of intermediate results:
Push-down selection:
Apply as early as possible to reduce the number of tuples;
Push-down projection:
Apply as early as possible to reduce the number of attributes.
But we must ensure that the resulting query tree gives the same result as
the original query tree, i.e., the equivalence of RA expressions.
32/59
Rule-based Query Optimisation
A rule-based optimisation transforms the RA expression by using a set of
heuristic rules that typically improve the execution performance.
Key ideas: apply the most restrictive operation before other operations,
which can reduce the size of intermediate results:
Push-down selection:
Apply as early as possible to reduce the number of tuples;
Push-down projection:
Apply as early as possible to reduce the number of attributes.
But we must ensure that the resulting query tree gives the same result as
the original query tree, i.e., the equivalence of RA expressions.
32/59
Rule-based Query Optimisation
A rule-based optimisation transforms the RA expression by using a set of
heuristic rules that typically improve the execution performance.
Key ideas: apply the most restrictive operation before other operations,
which can reduce the size of intermediate results:
Push-down selection:
Apply as early as possible to reduce the number of tuples;
Push-down projection:
Apply as early as possible to reduce the number of attributes.
But we must ensure that the resulting query tree gives the same result as
the original query tree, i.e., the equivalence of RA expressions.
33/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(σψ(R)) ≡ σϕ∧ψ(R);
πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
σϕ(R1 × R2) ≡ R1 ./ϕ R2;
σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1;
33/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(σψ(R)) ≡ σϕ∧ψ(R);
πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
σϕ(R1 × R2) ≡ R1 ./ϕ R2;
σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1;
34/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(σψ(R)) ≡ σϕ∧ψ(R);
σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY )
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
STUDY
UID CourseNo Hours
111 COMP2400 120
111 BUSN2011 110
111 ECON2102 120
STUDY
UID CourseNo Hours
111 COMP2400 120
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
STUDY
UID CourseNo Hours
111 COMP2400 120
34/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(σψ(R)) ≡ σϕ∧ψ(R);
σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY )
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
STUDY
UID CourseNo Hours
111 COMP2400 120
111 BUSN2011 110
111 ECON2102 120
STUDY
UID CourseNo Hours
111 COMP2400 120
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
STUDY
UID CourseNo Hours
111 COMP2400 120
34/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(σψ(R)) ≡ σϕ∧ψ(R);
σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY )
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
STUDY
UID CourseNo Hours
111 COMP2400 120
111 BUSN2011 110
111 ECON2102 120
STUDY
UID CourseNo Hours
111 COMP2400 120
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
STUDY
UID CourseNo Hours
111 COMP2400 120
34/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(σψ(R)) ≡ σϕ∧ψ(R);
σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY )
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
STUDY
UID CourseNo Hours
111 COMP2400 120
111 BUSN2011 110
111 ECON2102 120
STUDY
UID CourseNo Hours
111 COMP2400 120
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
STUDY
UID CourseNo Hours
111 COMP2400 120
34/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(σψ(R)) ≡ σϕ∧ψ(R);
σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY )
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
STUDY
UID CourseNo Hours
111 COMP2400 120
111 BUSN2011 110
111 ECON2102 120
STUDY
UID CourseNo Hours
111 COMP2400 120
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
STUDY
UID CourseNo Hours
111 COMP2400 120
35/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
πUID(πUID,CourseNo(Study)) v.s. πUID(Study)
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
UID CourseNo
111 COMP2400
222 COMP2400
333 STAT2001
111 BUSN2011
111 ECON2102
333 BUSN2011
UID
111
222
333
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
UID
111
222
333
35/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
πUID(πUID,CourseNo(Study)) v.s. πUID(Study)
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
UID CourseNo
111 COMP2400
222 COMP2400
333 STAT2001
111 BUSN2011
111 ECON2102
333 BUSN2011
UID
111
222
333
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
UID
111
222
333
35/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
πUID(πUID,CourseNo(Study)) v.s. πUID(Study)
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
UID CourseNo
111 COMP2400
222 COMP2400
333 STAT2001
111 BUSN2011
111 ECON2102
333 BUSN2011
UID
111
222
333
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
UID
111
222
333
35/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
πUID(πUID,CourseNo(Study)) v.s. πUID(Study)
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
UID CourseNo
111 COMP2400
222 COMP2400
333 STAT2001
111 BUSN2011
111 ECON2102
333 BUSN2011
UID
111
222
333
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
UID
111
222
333
35/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
πUID(πUID,CourseNo(Study)) v.s. πUID(Study)
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
UID CourseNo
111 COMP2400
222 COMP2400
333 STAT2001
111 BUSN2011
111 ECON2102
333 BUSN2011
UID
111
222
333
STUDY
UID CourseNo Hours
111 COMP2400 120
222 COMP2400 115
333 STAT2001 120
111 BUSN2011 110
111 ECON2102 120
333 BUSN2011 130
⇒
(without any intermediate relation)
UID
111
222
333
36/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
σCourse.No=Enrol.CoureNo(Course × Enrol)
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active
BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active
36/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
σCourse.No=Enrol.CoureNo(Course × Enrol)
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active
BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active
36/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
σCourse.No=Enrol.CoureNo(Course × Enrol)
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active
BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active
36/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
σCourse.No=Enrol.CoureNo(Course × Enrol)
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active
BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active
36/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
σCourse.No=Enrol.CoureNo(Course × Enrol)
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active
BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active
37/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
σCourse.No=Enrol.CoureNo(Course × Enrol)
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active
BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active
38/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
Course ./Course.No=Enrol.CourseNo Enrol
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
Inner Join on Course.No=Enrol.CourseNo (no intermediate Cartesian product)
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
38/59
Rule-based Optimisation
Can they be executed in one go? ↪→ Merging RA operators.
σϕ(R1 × R2) ≡ R1 ./ϕ R2
Course ./Course.No=Enrol.CourseNo Enrol
COURSE
No Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
Inner Join on Course.No=Enrol.CourseNo (no intermediate Cartesian product)
No Cname Unit StudentID CourseNo Semester Status
COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active
COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active
BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active
39/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σϕ1∧ϕ2(R1 ./ R2) ≡ σϕ1(R1) ./ σϕ2(R2), if ϕ1 contains only attributes
in R1 and ϕ2 contains only attributes in R2;
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition contains
attributes not in X , where Xi contains attributes both in Ri and X , and
ones both in R1 and R2;
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , where Xi contains attributes both in Ri and X , and
ones both in R1 and R2;
39/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σϕ1∧ϕ2(R1 ./ R2) ≡ σϕ1(R1) ./ σϕ2(R2), if ϕ1 contains only attributes
in R1 and ϕ2 contains only attributes in R2;
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition contains
attributes not in X , where Xi contains attributes both in Ri and X , and
ones both in R1 and R2;
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , where Xi contains attributes both in Ri and X , and
ones both in R1 and R2;
40/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNoNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
40/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNoNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
40/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNoNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
40/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNoNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
41/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course) ./ Enrol
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
COURSE
CourseNo Cname Unit
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
41/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course) ./ Enrol
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
COURSE
CourseNo Cname Unit
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
41/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course) ./ Enrol
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
COURSE
CourseNo Cname Unit
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
41/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course) ./ Enrol
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
COURSE
CourseNo Cname Unit
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
41/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1;
σCname=‘ManagementAccounting′(Course) ./ Enrol
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
COURSE
CourseNo Cname Unit
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
BUSN2011 Management Accounting 6 111 2016 S1 active
42/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
42/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
42/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
42/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
42/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
42/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
43/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only
attributes in X , how could we derive X1 and X2?
πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
44/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
44/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
44/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
44/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
44/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
44/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
44/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
CourseNo Cname Unit StudentID Semester Status
COMP2400 Relational Databases 6 222 2016 S1 active
COMP2400 Relational Databases 6 111 2016 S2 active
BUSN2011 Management Accounting 6 111 2016 S1 active
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
45/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)?
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCnameCOURSE
Cname
Relational
Management
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πStudentID ENROL
StudentID
111
222
Is πCname(Course) ./ πStudentID(Enrol) our desired result?
No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol)
45/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)?
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCnameCOURSE
Cname
Relational
Management
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πStudentID ENROL
StudentID
111
222
Is πCname(Course) ./ πStudentID(Enrol) our desired result?
No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol)
45/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)?
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCnameCOURSE
Cname
Relational
Management
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πStudentID ENROL
StudentID
111
222
Is πCname(Course) ./ πStudentID(Enrol) our desired result?
No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol)
45/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)?
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCnameCOURSE
Cname
Relational
Management
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πStudentID ENROL
StudentID
111
222
Is πCname(Course) ./ πStudentID(Enrol) our desired result?
No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol)
45/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)?
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCnameCOURSE
Cname
Relational
Management
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πStudentID ENROL
StudentID
111
222
Is πCname(Course) ./ πStudentID(Enrol) our desired result?
No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol)
45/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)?
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCnameCOURSE
Cname
Relational
Management
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πStudentID ENROL
StudentID
111
222
Is πCname(Course) ./ πStudentID(Enrol) our desired result?
No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol)
45/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)?
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCnameCOURSE
Cname
Relational
Management
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πStudentID ENROL
StudentID
111
222
Is πCname(Course) ./ πStudentID(Enrol) our desired result?
No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol)
46/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol))
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
46/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol))
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
46/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol))
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
46/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol))
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
46/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol))
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
46/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol))
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
46/59
Rule-based Optimisation
Can join be executed last? ↪→ Push select/project before join.
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves
attributes outside X , how could we derive X1 and X2?
πCname,StudentID(Course ./ Enrol)
πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol))
COURSE
CourseNo Cname Unit
COMP2400 Relational Databases 6
BUSN2011 Management Accounting 6
πCourseNo,CnameCOURSE
CourseNo Cname
COMP2400 Relational Databases
BUSN2011 Management Accounting
ENROL
StudentID CourseNo Semester Status
111 BUSN2011 2016 S1 active
222 COMP2400 2016 S1 active
111 COMP2400 2016 S2 active
πCourseNo,StudentID ENROL
StudentID CourseNo
111 BUSN2011
222 COMP2400
111 COMP2400
CourseNo Cname StudentID
COMP2400 Relational Databases 222
COMP2400 Relational Databases 111
BUSN2011 Management Accounting 111
Cname StudentID
Relational Databases 222
Relational Databases 111
Management Accounting 111
47/59
Heuristic Rules and Query Trees
(1) σϕ(σψ(R)) ≡ σϕ∧ψ(R);
(2) πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
47/59
Heuristic Rules and Query Trees
(1) σϕ(σψ(R)) ≡ σϕ∧ψ(R);
(2) πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
48/59
Heuristic Rules
(3) σϕ(R1 × R2) ≡ R1 ./ϕ R2
(4) σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1
48/59
Heuristic Rules
(3) σϕ(R1 × R2) ≡ R1 ./ϕ R2
(4) σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1
49/59
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’∧
first name=‘Tom’∧last name=‘Cruise’(MOVIE × (PERSON ./ ROLE))))
Question: Can we apply the following rule to optimise the query?
σϕ1∧ϕ2(R1 × R2) ≡ σϕ1(R1)× σϕ2(R2) if ϕ1 contains only attributes in R1
and ϕ2 contains only attributes in R2
We would have
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE)
×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
49/59
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’∧
first name=‘Tom’∧last name=‘Cruise’(MOVIE × (PERSON ./ ROLE))))
Question: Can we apply the following rule to optimise the query?
σϕ1∧ϕ2(R1 × R2) ≡ σϕ1(R1)× σϕ2(R2) if ϕ1 contains only attributes in R1
and ϕ2 contains only attributes in R2
We would have
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE)
×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
49/59
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’∧
first name=‘Tom’∧last name=‘Cruise’(MOVIE × (PERSON ./ ROLE))))
Question: Can we apply the following rule to optimise the query?
σϕ1∧ϕ2(R1 × R2) ≡ σϕ1(R1)× σϕ2(R2) if ϕ1 contains only attributes in R1
and ϕ2 contains only attributes in R2
We would have
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE)
×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
50/59
Push-down Selection – Example
51/59
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE)
×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
Can we apply σϕ(R1 × R2) ≡ R1 ./ϕ R2?
We would have
πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year (
σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
51/59
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE)
×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
Can we apply σϕ(R1 × R2) ≡ R1 ./ϕ R2?
We would have
πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year (
σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
51/59
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE)
×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
Can we apply σϕ(R1 × R2) ≡ R1 ./ϕ R2?
We would have
πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year (
σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
52/59
Push-down Selection – Example
53/59
Push-down Projection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year (
σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
Question: Can we apply the following rule to optimise the query?
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)),
where Xi contains attributes both in Ri and X , and ones both in R1 and R2
53/59
Push-down Projection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year (
σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
Question: Can we apply the following rule to optimise the query?
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)),
where Xi contains attributes both in Ri and X , and ones both in R1 and R2
54/59
Push-down Projection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year (
σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
We would have:
πtitle,production year (πtitle,production year (σmajor genre=‘war ’(MOVIE))
./title=mtitle∧production year=mprod year
(πmtitle,mprod year (σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))))
We further apply some rules to optimise the query ...
54/59
Push-down Projection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
MOVIE(title, production year, country, run time, major genre)
ROLE(id, mtitle, mprod year, description, credits)
Query: List all war movies that are performed by ‘Tom Cruise’.
πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year (
σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))
We would have:
πtitle,production year (πtitle,production year (σmajor genre=‘war ’(MOVIE))
./title=mtitle∧production year=mprod year
(πmtitle,mprod year (σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))))
We further apply some rules to optimise the query ...
55/59
Push-down Projection – Example
56/59
Cost-based Optimisation (not assessed)
Consider CHARTS={Rank, Artist, Song} with 100 tuples and 3 attributes.
Rank Artist Song
1 Chingy Right Thurr
2 Scribe Stand up
3 Aguilera and Kim Can’t hold us down
4 Evanescence Going under
5 Justin Timberlake Senorita
6 Brooke Fraser Better
7 Black Eyed Peas Where is the love?
... ... ...
... ... ...
Compare two strategies of evaluating “Who is top of the pops?”:
σRank=1(πRank, Artist(CHARTS))
πRank, Artist(σRank=1(CHARTS))
Selection before Projection is preferred.
56/59
Cost-based Optimisation (not assessed)
Consider CHARTS={Rank, Artist, Song} with 100 tuples and 3 attributes.
Rank Artist Song
1 Chingy Right Thurr
2 Scribe Stand up
3 Aguilera and Kim Can’t hold us down
4 Evanescence Going under
5 Justin Timberlake Senorita
6 Brooke Fraser Better
7 Black Eyed Peas Where is the love?
... ... ...
... ... ...
Compare two strategies of evaluating “Who is top of the pops?”:
σRank=1(πRank, Artist(CHARTS))
πRank, Artist(σRank=1(CHARTS))
Selection before Projection is preferred.
57/59
Cost-based Optimisation (not assessed)
Consider CHARTS={Rank, Artist, ...} with 100 tuples and 50 attributes:
Rank Artist Song ... ... ...
1 Chingy Right Thurr ... ... ...
2 Scribe Stand up ... ... ...
3 Aguilera and Kim Can’t hold us down ... ... ...
4 Evanescence Going under ... ... ...
5 Justin Timberlake Senorita ... ... ...
6 Brooke Fraser Better ... ... ...
7 Black Eyed Peas Where is the love? .. ... ...
... ... ... ... ... ...
Compare two strategies of evaluating?
σRank > 10(πRank, Artist(CHARTS))
πRank, Artist(σRank > 10(CHARTS))
Projection before Selection is preferred.
57/59
Cost-based Optimisation (not assessed)
Consider CHARTS={Rank, Artist, …} with 100 tuples and 50 attributes:
Rank Artist Song … … …
1 Chingy Right Thurr … … …
2 Scribe Stand up … … …
3 Aguilera and Kim Can’t hold us down … … …
4 Evanescence Going under … … …
5 Justin Timberlake Senorita … … …
6 Brooke Fraser Better … … …
7 Black Eyed Peas Where is the love? .. … …
… … … … … …
Compare two strategies of evaluating?
σRank > 10(πRank, Artist(CHARTS))
πRank, Artist(σRank > 10(CHARTS))
Projection before Selection is preferred.
58/59
Query Optimisation
SQL query
RA query 1
RA query 2
RA query n
……
Va
rie
d
pe
rf
or
m
an
ce
Logically equivalent
SQL query RA query
……
Query Optimiser
Heuristic
rules
Cost
model
Database statistics
RA query 4
RA query n
RA query 2
RA query 3
RA query 1
……
Execution planInput
Trade-off:
Time for executing a RA query vs Time for finding a better RA query
59/59
(credit cookie) memorising vs understanding