CS计算机代考程序代写 SQL python database Java ER algorithm 1/59

1/59

Week 8 Workshop – Query Processing and Optimisation

2/59

Housekeeping

1 Assignment 2 (Database Theory) for both COMP2400/6240 students:

The submission deadline is 23:59, Oct 12, 2021.
This assignment must be done individually (no group work). Please
join the special drop-in sessions if you need any clarifications.

2 All the labs on Oct 4 (Monday, public holiday) in Week 9 will be moved to the
same timeslots on Oct 11 (Monday) in Week 10.

3 Lab 8 is optional (no associated with any assessment items)

We will open a separate sign-up page on Wattle at 12pm Oct 6.
All the optional labs will be scheduled from Oct 12 to Oct 15.
Three options are available
(1) Database Programming with Java
(2) Database Programming with Python
(3) Database Exercises on IMDB

2/59

Housekeeping

1 Assignment 2 (Database Theory) for both COMP2400/6240 students:

The submission deadline is 23:59, Oct 12, 2021.
This assignment must be done individually (no group work). Please
join the special drop-in sessions if you need any clarifications.

2 All the labs on Oct 4 (Monday, public holiday) in Week 9 will be moved to the
same timeslots on Oct 11 (Monday) in Week 10.

3 Lab 8 is optional (no associated with any assessment items)

We will open a separate sign-up page on Wattle at 12pm Oct 6.
All the optional labs will be scheduled from Oct 12 to Oct 15.
Three options are available
(1) Database Programming with Java
(2) Database Programming with Python
(3) Database Exercises on IMDB

2/59

Housekeeping

1 Assignment 2 (Database Theory) for both COMP2400/6240 students:

The submission deadline is 23:59, Oct 12, 2021.
This assignment must be done individually (no group work). Please
join the special drop-in sessions if you need any clarifications.

2 All the labs on Oct 4 (Monday, public holiday) in Week 9 will be moved to the
same timeslots on Oct 11 (Monday) in Week 10.

3 Lab 8 is optional (no associated with any assessment items)

We will open a separate sign-up page on Wattle at 12pm Oct 6.
All the optional labs will be scheduled from Oct 12 to Oct 15.
Three options are available
(1) Database Programming with Java
(2) Database Programming with Python
(3) Database Exercises on IMDB

3/59

Query Processing – Example

SELECT name FROM Person WHERE age<21; High-level language (SQL) ⇓ πname(σage<21(Person)) Low-level language (Relational Algebra) ⇓ Person 21σage< nameπ Execution plan (Query tree) ⇓ name Rickon Bran Query result 4/59 From SQL to RA Expressions Students(matNr, firstName, lastName, email) Exams(matNr, crsNr, result, semester) Courses(crsNr, title, unit) SELECT lastName, result, title FROM Students, Exams, Courses WHERE Students.matNr=Exams.matNr AND Exams.crsNr=Courses.crsNr AND result≤1.3; RA Expressions: 1 πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams) ./σExams.crsNr=Courses.crsNr Courses)) 2 πlastName,result,title(σresult≤1.3(σEXAMS.crsNr=Courses.crsNr (σStudents.matNr=Exams.matNr(Students× Exams× Courses)))) 3 πlastName,result,title((Students ./Students.matNr=Exams.matNr (σresult≤1.3(Exams))) ./Exams.crsNr=Courses.crsNr Courses) 4 . . . 4/59 From SQL to RA Expressions Students(matNr, firstName, lastName, email) Exams(matNr, crsNr, result, semester) Courses(crsNr, title, unit) SELECT lastName, result, title FROM Students, Exams, Courses WHERE Students.matNr=Exams.matNr AND Exams.crsNr=Courses.crsNr AND result≤1.3; RA Expressions: 1 πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams) ./σExams.crsNr=Courses.crsNr Courses)) 2 πlastName,result,title(σresult≤1.3(σEXAMS.crsNr=Courses.crsNr (σStudents.matNr=Exams.matNr(Students× Exams× Courses)))) 3 πlastName,result,title((Students ./Students.matNr=Exams.matNr (σresult≤1.3(Exams))) ./Exams.crsNr=Courses.crsNr Courses) 4 . . . 5/59 From RA Expressions to Query Trees Each RA expression can be represented as a query tree: leaf nodes represent the input relations; internal nodes represent the intermediate result; the root node represents the resulting relation. Example: πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams) ./Exams.crsNr=Courses.crsNr Courses)) 6/59 Query Tree Example For each query tree, computation proceeds bottom-up: child nodes must be executed before their parent nodes; but there can exist multiple methods of executing sibling nodes. Example: πlastName,result,title(σresult≤1.3(σExams.crsNr=Courses.crsNr (σStudents.matNr=Exams.matNr(Students× Exams× Courses)))) . .σstudents matNr exams matNr= lastName, result, titleπ × courses exams × students . .σexams crsNr courses crsNr= 1.3σresult≤ 7/59 Equivalent Query Trees (Query Optimisation) . .σstudents matNr exams matNr= lastName, result, titleπ × courses exams × students . .σexams crsNr courses crsNr= 1.3σresult≤ 8/59 Execution Plan (Slide 8-27 will not be assessed in our course) A query execution plan consists of an (extended) query tree with additional annotation at each node indicating: (1) the access method to use for each table, and (2) the implementation method for each RA operator. 9/59 Execution Plan Materialized: The intermediate result of an operator may be saved in a temporary table for processing by the next operator. Pipelined: the intermediate result of an operator is directly sent to another operator without creating a temporary table (also called on-the-fly). Note: Pipelined evaluation may have significant saving on I/O cost, while materialized evaluation can avoid repeated computations. 10/59 Execution Plan Question: Which execution plan is “optimal” in terms of processing efficiency? This is determined by the query optimiser using a variety of algorithms (Fact: there is no true optimal solution in general!). Realistically, we cannot expect to always find the best plan, but we expect to consistently find a plan that is good. The performance of different execution plans for the same query may differ considerably (e.g., seconds vs. hours vs. days): different but equivalent RA expressions; different algorithms for each RA operator. 10/59 Execution Plan Question: Which execution plan is “optimal” in terms of processing efficiency? This is determined by the query optimiser using a variety of algorithms (Fact: there is no true optimal solution in general!). Realistically, we cannot expect to always find the best plan, but we expect to consistently find a plan that is good. The performance of different execution plans for the same query may differ considerably (e.g., seconds vs. hours vs. days): different but equivalent RA expressions; different algorithms for each RA operator. 10/59 Execution Plan Question: Which execution plan is “optimal” in terms of processing efficiency? This is determined by the query optimiser using a variety of algorithms (Fact: there is no true optimal solution in general!). Realistically, we cannot expect to always find the best plan, but we expect to consistently find a plan that is good. The performance of different execution plans for the same query may differ considerably (e.g., seconds vs. hours vs. days): different but equivalent RA expressions; different algorithms for each RA operator. 11/59 Execution Plan Basic ideas of algorithms used for RA operators Selection: If there is no index, we have to scan the table. Otherwise, we scan the indexes to retrieve matching tuples and apply remaining selection conditions to further restrict the tuples. Projection retrieves a subset of attributes from each tuple of the table (similar to selection). If requiring duplicate elimination, then we have to do sorting additionally (expensive part!) Join: We may use nested loops join, or sort-merge join, hash joins, block nested loops join, etc. Group by and order by are typically implemented using sorting. Aggregation operators use temporary counters in main memory when retrieving tuples. Set operators can use the same approach as projection to eliminate duplicates. 11/59 Execution Plan Basic ideas of algorithms used for RA operators Selection: If there is no index, we have to scan the table. Otherwise, we scan the indexes to retrieve matching tuples and apply remaining selection conditions to further restrict the tuples. Projection retrieves a subset of attributes from each tuple of the table (similar to selection). If requiring duplicate elimination, then we have to do sorting additionally (expensive part!) Join: We may use nested loops join, or sort-merge join, hash joins, block nested loops join, etc. Group by and order by are typically implemented using sorting. Aggregation operators use temporary counters in main memory when retrieving tuples. Set operators can use the same approach as projection to eliminate duplicates. 11/59 Execution Plan Basic ideas of algorithms used for RA operators Selection: If there is no index, we have to scan the table. Otherwise, we scan the indexes to retrieve matching tuples and apply remaining selection conditions to further restrict the tuples. Projection retrieves a subset of attributes from each tuple of the table (similar to selection). If requiring duplicate elimination, then we have to do sorting additionally (expensive part!) Join: We may use nested loops join, or sort-merge join, hash joins, block nested loops join, etc. Group by and order by are typically implemented using sorting. Aggregation operators use temporary counters in main memory when retrieving tuples. Set operators can use the same approach as projection to eliminate duplicates. 11/59 Execution Plan Basic ideas of algorithms used for RA operators Selection: If there is no index, we have to scan the table. Otherwise, we scan the indexes to retrieve matching tuples and apply remaining selection conditions to further restrict the tuples. Projection retrieves a subset of attributes from each tuple of the table (similar to selection). If requiring duplicate elimination, then we have to do sorting additionally (expensive part!) Join: We may use nested loops join, or sort-merge join, hash joins, block nested loops join, etc. Group by and order by are typically implemented using sorting. Aggregation operators use temporary counters in main memory when retrieving tuples. Set operators can use the same approach as projection to eliminate duplicates. 11/59 Execution Plan Basic ideas of algorithms used for RA operators Selection: If there is no index, we have to scan the table. Otherwise, we scan the indexes to retrieve matching tuples and apply remaining selection conditions to further restrict the tuples. Projection retrieves a subset of attributes from each tuple of the table (similar to selection). If requiring duplicate elimination, then we have to do sorting additionally (expensive part!) Join: We may use nested loops join, or sort-merge join, hash joins, block nested loops join, etc. Group by and order by are typically implemented using sorting. Aggregation operators use temporary counters in main memory when retrieving tuples. Set operators can use the same approach as projection to eliminate duplicates. 11/59 Execution Plan Basic ideas of algorithms used for RA operators Selection: If there is no index, we have to scan the table. Otherwise, we scan the indexes to retrieve matching tuples and apply remaining selection conditions to further restrict the tuples. Projection retrieves a subset of attributes from each tuple of the table (similar to selection). If requiring duplicate elimination, then we have to do sorting additionally (expensive part!) Join: We may use nested loops join, or sort-merge join, hash joins, block nested loops join, etc. Group by and order by are typically implemented using sorting. Aggregation operators use temporary counters in main memory when retrieving tuples. Set operators can use the same approach as projection to eliminate duplicates. 11/59 Execution Plan Basic ideas of algorithms used for RA operators Selection: If there is no index, we have to scan the table. Otherwise, we scan the indexes to retrieve matching tuples and apply remaining selection conditions to further restrict the tuples. Projection retrieves a subset of attributes from each tuple of the table (similar to selection). If requiring duplicate elimination, then we have to do sorting additionally (expensive part!) Join: We may use nested loops join, or sort-merge join, hash joins, block nested loops join, etc. Group by and order by are typically implemented using sorting. Aggregation operators use temporary counters in main memory when retrieving tuples. Set operators can use the same approach as projection to eliminate duplicates. 12/59 Estimating Query Costs - Example Which movies got a non-US award for one of its actors playing an ‘agent’ ? πtitle,production year (σrole description=′agent′(ROLE ./ ACTOR AWARD ./ (AWARD −σaward country=‘USA′(AWARD)))) 12/59 Estimating Query Costs - Example Which movies got a non-US award for one of its actors playing an ‘agent’ ? πtitle,production year (σrole description=′agent′(ROLE ./ ACTOR AWARD ./ (AWARD −σaward country=‘USA′(AWARD)))) 12/59 Estimating Query Costs - Example Which movies got a non-US award for one of its actors playing an ‘agent’ ? πtitle,production year (σrole description=′agent′(ROLE ./ ACTOR AWARD ./ (AWARD −σaward country=‘USA′(AWARD)))) 13/59 Size of Relations How to determine the size of a relation r over R(A1, . . . ,Ak )? Let n denote the average number of tuples in r , and `j the the average space (e.g., in bits) for attribute Aj . R A1 A2 . . . Ak 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . n . . . . . . . . . . . . `1 `2 . . . `k Then, n · k∑ j=1 `j is the size of the relation r . We use this formula to assign sizes to leaf nodes in the query tree. 13/59 Size of Relations How to determine the size of a relation r over R(A1, . . . ,Ak )? Let n denote the average number of tuples in r , and `j the the average space (e.g., in bits) for attribute Aj . R A1 A2 . . . Ak 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . n . . . . . . . . . . . . `1 `2 . . . `k Then, n · k∑ j=1 `j is the size of the relation r . We use this formula to assign sizes to leaf nodes in the query tree. 14/59 Estimating Query Costs - Example (Relation Sizes) AWARD(Award name:varchar(30),Institution:varchar(50),Award country: varchar(20)) Estimate the average number of tuples as 15. Estimate the average space for attributes: Award name: 8 · 20 = 160 bits (the mean length is 20); Institution: 8 · 30 = 240 bits (the mean length is 30); Award country: 8 · 10 = 80 bits (the mean length is 10). The average size of a tuple is 160 + 80 + 240 = 480 bits. The average size of a relation is estimated to be 15 · 480 = 7, 200 bits. 14/59 Estimating Query Costs - Example (Relation Sizes) AWARD(Award name:varchar(30),Institution:varchar(50),Award country: varchar(20)) Estimate the average number of tuples as 15. Estimate the average space for attributes: Award name: 8 · 20 = 160 bits (the mean length is 20); Institution: 8 · 30 = 240 bits (the mean length is 30); Award country: 8 · 10 = 80 bits (the mean length is 10). The average size of a tuple is 160 + 80 + 240 = 480 bits. The average size of a relation is estimated to be 15 · 480 = 7, 200 bits. 14/59 Estimating Query Costs - Example (Relation Sizes) AWARD(Award name:varchar(30),Institution:varchar(50),Award country: varchar(20)) Estimate the average number of tuples as 15. Estimate the average space for attributes: Award name: 8 · 20 = 160 bits (the mean length is 20); Institution: 8 · 30 = 240 bits (the mean length is 30); Award country: 8 · 10 = 80 bits (the mean length is 10). The average size of a tuple is 160 + 80 + 240 = 480 bits. The average size of a relation is estimated to be 15 · 480 = 7, 200 bits. 14/59 Estimating Query Costs - Example (Relation Sizes) AWARD(Award name:varchar(30),Institution:varchar(50),Award country: varchar(20)) Estimate the average number of tuples as 15. Estimate the average space for attributes: Award name: 8 · 20 = 160 bits (the mean length is 20); Institution: 8 · 30 = 240 bits (the mean length is 30); Award country: 8 · 10 = 80 bits (the mean length is 10). The average size of a tuple is 160 + 80 + 240 = 480 bits. The average size of a relation is estimated to be 15 · 480 = 7, 200 bits. 14/59 Estimating Query Costs - Example (Relation Sizes) AWARD(Award name:varchar(30),Institution:varchar(50),Award country: varchar(20)) Estimate the average number of tuples as 15. Estimate the average space for attributes: Award name: 8 · 20 = 160 bits (the mean length is 20); Institution: 8 · 30 = 240 bits (the mean length is 30); Award country: 8 · 10 = 80 bits (the mean length is 10). The average size of a tuple is 160 + 80 + 240 = 480 bits. The average size of a relation is estimated to be 15 · 480 = 7, 200 bits. 14/59 Estimating Query Costs - Example (Relation Sizes) AWARD(Award name:varchar(30),Institution:varchar(50),Award country: varchar(20)) Estimate the average number of tuples as 15. Estimate the average space for attributes: Award name: 8 · 20 = 160 bits (the mean length is 20); Institution: 8 · 30 = 240 bits (the mean length is 30); Award country: 8 · 10 = 80 bits (the mean length is 10). The average size of a tuple is 160 + 80 + 240 = 480 bits. The average size of a relation is estimated to be 15 · 480 = 7, 200 bits. 15/59 Estimating Query Costs - Example (Relation Sizes) ROLE(Id:char(8), Title:varchar(40), Production year:number(4), Role description:varchar(100),Credits:varchar(40)) Estimate the average number of tuples as 500. Estimate the average space for attributes: Id: 8 · 8 = 64 bits (as the domain is char(8)); Title: 8 · 25 = 200 bits (the mean length is 25); Production year: 13 bits (as the domain is number(4)); Role description: 8 · 50 = 400 bits (the mean length is 50); Credits: 8 · 20 = 160 bits (the mean length is 20). The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits The average size of a relation is to be 500 · 837 = 418, 500 bits 15/59 Estimating Query Costs - Example (Relation Sizes) ROLE(Id:char(8), Title:varchar(40), Production year:number(4), Role description:varchar(100),Credits:varchar(40)) Estimate the average number of tuples as 500. Estimate the average space for attributes: Id: 8 · 8 = 64 bits (as the domain is char(8)); Title: 8 · 25 = 200 bits (the mean length is 25); Production year: 13 bits (as the domain is number(4)); Role description: 8 · 50 = 400 bits (the mean length is 50); Credits: 8 · 20 = 160 bits (the mean length is 20). The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits The average size of a relation is to be 500 · 837 = 418, 500 bits 15/59 Estimating Query Costs - Example (Relation Sizes) ROLE(Id:char(8), Title:varchar(40), Production year:number(4), Role description:varchar(100),Credits:varchar(40)) Estimate the average number of tuples as 500. Estimate the average space for attributes: Id: 8 · 8 = 64 bits (as the domain is char(8)); Title: 8 · 25 = 200 bits (the mean length is 25); Production year: 13 bits (as the domain is number(4)); Role description: 8 · 50 = 400 bits (the mean length is 50); Credits: 8 · 20 = 160 bits (the mean length is 20). The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits The average size of a relation is to be 500 · 837 = 418, 500 bits 15/59 Estimating Query Costs - Example (Relation Sizes) ROLE(Id:char(8), Title:varchar(40), Production year:number(4), Role description:varchar(100),Credits:varchar(40)) Estimate the average number of tuples as 500. Estimate the average space for attributes: Id: 8 · 8 = 64 bits (as the domain is char(8)); Title: 8 · 25 = 200 bits (the mean length is 25); Production year: 13 bits (as the domain is number(4)); Role description: 8 · 50 = 400 bits (the mean length is 50); Credits: 8 · 20 = 160 bits (the mean length is 20). The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits The average size of a relation is to be 500 · 837 = 418, 500 bits 15/59 Estimating Query Costs - Example (Relation Sizes) ROLE(Id:char(8), Title:varchar(40), Production year:number(4), Role description:varchar(100),Credits:varchar(40)) Estimate the average number of tuples as 500. Estimate the average space for attributes: Id: 8 · 8 = 64 bits (as the domain is char(8)); Title: 8 · 25 = 200 bits (the mean length is 25); Production year: 13 bits (as the domain is number(4)); Role description: 8 · 50 = 400 bits (the mean length is 50); Credits: 8 · 20 = 160 bits (the mean length is 20). The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits The average size of a relation is to be 500 · 837 = 418, 500 bits 15/59 Estimating Query Costs - Example (Relation Sizes) ROLE(Id:char(8), Title:varchar(40), Production year:number(4), Role description:varchar(100),Credits:varchar(40)) Estimate the average number of tuples as 500. Estimate the average space for attributes: Id: 8 · 8 = 64 bits (as the domain is char(8)); Title: 8 · 25 = 200 bits (the mean length is 25); Production year: 13 bits (as the domain is number(4)); Role description: 8 · 50 = 400 bits (the mean length is 50); Credits: 8 · 20 = 160 bits (the mean length is 20). The average size of a tuple is 64 + 200 + 13 + 400 + 160 = 837 bits The average size of a relation is to be 500 · 837 = 418, 500 bits 16/59 Estimating Query Costs - Example (Relation Sizes) ACTOR AWARD(Title:varchar(40),Production year:number(4), Role description:varchar(100),Award name:varchar(30), Year of award:number(4),Category:varchar(100),Result:varchar(20)) Estimate the average number of tuples as 40 Estimate the average space for attributes: Title: 200 bits (as before); Production year: 13 bits (as before); Role description: 400 bits (as before); Award name: 160 bits (as before); Year of award: 13 bits (as the domain is number(4)); Category: 8 · 40 = 320 bits (the mean length is 40); Result: 8 · 7 = 56 bits (the mean length is 7). The average size of a tuple is 200 + 13 + 400 + 160 + 13 + 320 + 56 = 1, 162 bits. The average size of a relation is 40 · 1162 = 46, 480 bits. 17/59 Estimating Query Costs - Example (Query Tree) 18/59 Size of Selection Node Selection σϕ is linear in the number n of tuples of the involved relation: Scan the relation one tuple after another (if there is no index); Check for each tuple, whether the condition ϕ is satisfied or not; Keep exactly those tuples satisfying ϕ. Let s be the size of its single relevant node. The size of a selection node σϕ is aϕ · s , where aϕ is the average percentage of tuples satisfying ϕ. 18/59 Size of Selection Node Selection σϕ is linear in the number n of tuples of the involved relation: Scan the relation one tuple after another (if there is no index); Check for each tuple, whether the condition ϕ is satisfied or not; Keep exactly those tuples satisfying ϕ. Let s be the size of its single relevant node. The size of a selection node σϕ is aϕ · s , where aϕ is the average percentage of tuples satisfying ϕ. 19/59 Estimating Query Costs - Example (Selection) For selection σaward country=‘USA’ assume aϕ = 0.4 (i.e., 40% of the movie awards from the USA). Hence, we have: aϕ · s = 0.4 · 7,200 = 2,880. 20/59 Size of Difference Node Let s1 and s2 be the sizes of the two relevant nodes. Again, we need to consider the probability that tuples occur in both relations. The size of a difference node is s1 · (1− p) where (1− p) is the probability that tuples from s1 does not occur in s2. 20/59 Size of Difference Node Let s1 and s2 be the sizes of the two relevant nodes. Again, we need to consider the probability that tuples occur in both relations. The size of a difference node is s1 · (1− p) where (1− p) is the probability that tuples from s1 does not occur in s2. 21/59 Estimating Query Costs - Example (Difference) Since 40% of the movie awards from the USA, the probability of an award to be a US-award is p = 0.4. We have: s1 · (1− p)= 7,200 · (1-0.4) = 4,320. 22/59 Size of Natural Join Node Let s1 and s2 be the sizes of the two relevant nodes, and r1 and r2 be the size of a tuple in these two nodes. s1 r1 and s2 r2 are the estimated number of tuples in these two nodes. The size of a natural join node is s1 r1 · p · s2 r2 · (r1 + r2 − r) , where r is the size of a tuple over the common attributes, and p is the matching probability (for any tuple of the first relevant node to match with any tuples in the second relevant relation). Note that r1 + r2 − r is the size of a tuple after the natural join operation. 22/59 Size of Natural Join Node Let s1 and s2 be the sizes of the two relevant nodes, and r1 and r2 be the size of a tuple in these two nodes. s1 r1 and s2 r2 are the estimated number of tuples in these two nodes. The size of a natural join node is s1 r1 · p · s2 r2 · (r1 + r2 − r) , where r is the size of a tuple over the common attributes, and p is the matching probability (for any tuple of the first relevant node to match with any tuples in the second relevant relation). Note that r1 + r2 − r is the size of a tuple after the natural join operation. 23/59 Estimating Query Costs - Example (Natural Join) For join with ACTOR AWARD assume p = 0.08, i.e., 8% of the actor awards are non-US awards. By s1 r1 · p · s2 r2 · (r1 + r2 − r), we have: 46, 480 1, 162 · 0.08 · 4, 320 480 · (1, 162 + 480− 160) = 42, 682. 24/59 Estimating Query Costs - Example (Natural Join) Assume p = 0.002. By s1 r1 · p · s2 r2 · (r1 + r2 − r), we have: 418, 500 837 · 0.002 · 42, 682 1, 482 · (837 + 1, 482− 200− 400− 13) = 49, 133. 25/59 Estimating Query Costs - Example (Selection) For selection σrole description=’agent’ assume aϕ = 0.05 (i.e., non-US awards for “agent” roles are 5%). Hence, we have: aϕ · s = 0.05 · 49,133 = 2,457. 26/59 Size of Projection Node Projection π{A1, . . . ,An}: Project each tuple to the attributes in {A1, . . . ,An} Eliminate duplicates (Note: SQL does not eliminate tuples unless DISTINCT is used). Let s be the size of its single relevant node with s = n · r for its average number n of tuples and its average size r of a tuple. The size of a projection node πA1,...,An is (1− pi) · s · ri r , where ri is the average size of a tuple over {A1, . . . ,An}, and pi is the probability that two tuples coincide on A1, . . . ,An (i.e., the same values on all attributes A1, . . . ,An). 26/59 Size of Projection Node Projection π{A1, . . . ,An}: Project each tuple to the attributes in {A1, . . . ,An} Eliminate duplicates (Note: SQL does not eliminate tuples unless DISTINCT is used). Let s be the size of its single relevant node with s = n · r for its average number n of tuples and its average size r of a tuple. The size of a projection node πA1,...,An is (1− pi) · s · ri r , where ri is the average size of a tuple over {A1, . . . ,An}, and pi is the probability that two tuples coincide on A1, . . . ,An (i.e., the same values on all attributes A1, . . . ,An). 27/59 Estimating Query Costs - Example (Projection) For projection πtitle, production year assume that there are 1% of duplicates, i.e., pi = 0.01. By (1− pi) · s · ri r , we have (1− 0.01) · 2, 457 213 1706 = 304 28/59 Query Optimisation 29/59 Relational Algebra ⇒ Query Optimisation SQL query RA query 1 RA query 2 RA query n …… Va ri ed p er fo rm an ce Logically equivalent Which RA query should be chosen for a given SQL query? Who choose? Query optimiser! How to choose? – Semantic query optimisation – Rule-based optimisation – Cost-based optimisation 29/59 Relational Algebra ⇒ Query Optimisation SQL query RA query 1 RA query 2 RA query n …… Va ri ed p er fo rm an ce Logically equivalent Which RA query should be chosen for a given SQL query? Who choose? Query optimiser! How to choose? – Semantic query optimisation – Rule-based optimisation – Cost-based optimisation 29/59 Relational Algebra ⇒ Query Optimisation SQL query RA query 1 RA query 2 RA query n …… Va ri ed p er fo rm an ce Logically equivalent Which RA query should be chosen for a given SQL query? Who choose? Query optimiser! How to choose? – Semantic query optimisation – Rule-based optimisation – Cost-based optimisation 29/59 Relational Algebra ⇒ Query Optimisation SQL query RA query 1 RA query 2 RA query n …… Va ri ed p er fo rm an ce Logically equivalent Which RA query should be chosen for a given SQL query? Who choose? Query optimiser! How to choose? – Semantic query optimisation – Rule-based optimisation – Cost-based optimisation 29/59 Relational Algebra ⇒ Query Optimisation SQL query RA query 1 RA query 2 RA query n …… Va ri ed p er fo rm an ce Logically equivalent Which RA query should be chosen for a given SQL query? Who choose? Query optimiser! How to choose? – Semantic query optimisation – Rule-based optimisation – Cost-based optimisation 30/59 Query Optimisation In practice, query optimisers incorporate elements of the following three optimisation approaches: Semantic query optimisation Use application specific semantic knowledge to transform a query into the one with a lower cost (they return the same answer). Rule-based query optimisation Use heuristic rules to transform a relational algebra expression into an equivalent one with a possibly lower cost. Cost-based query optimisation Use a cost model to estimate the costs of plans, and then select the most cost-effective plan. This will not be assessed in our course. 30/59 Query Optimisation In practice, query optimisers incorporate elements of the following three optimisation approaches: Semantic query optimisation Use application specific semantic knowledge to transform a query into the one with a lower cost (they return the same answer). Rule-based query optimisation Use heuristic rules to transform a relational algebra expression into an equivalent one with a possibly lower cost. Cost-based query optimisation Use a cost model to estimate the costs of plans, and then select the most cost-effective plan. This will not be assessed in our course. 30/59 Query Optimisation In practice, query optimisers incorporate elements of the following three optimisation approaches: Semantic query optimisation Use application specific semantic knowledge to transform a query into the one with a lower cost (they return the same answer). Rule-based query optimisation Use heuristic rules to transform a relational algebra expression into an equivalent one with a possibly lower cost. Cost-based query optimisation Use a cost model to estimate the costs of plans, and then select the most cost-effective plan. This will not be assessed in our course. 30/59 Query Optimisation In practice, query optimisers incorporate elements of the following three optimisation approaches: Semantic query optimisation Use application specific semantic knowledge to transform a query into the one with a lower cost (they return the same answer). Rule-based query optimisation Use heuristic rules to transform a relational algebra expression into an equivalent one with a possibly lower cost. Cost-based query optimisation Use a cost model to estimate the costs of plans, and then select the most cost-effective plan. This will not be assessed in our course. 31/59 Semantic Query Optimisation Example: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) WRITER(id, title, production year, credits) where [id] ⊆ PERSON[id] [title, production year] ⊆ MOVIE [title, production year] List the ids of the writers who have written movies produced in 2000. πidσproduction year=2000(WRITER ./ PERSON ./ MOVIE) πidσproduction year=2000(WRITER ./ PERSON) πidσproduction year=2000(WRITER ./ MOVIE) πidσproduction year=2000WRITER ←− the optimised RA 31/59 Semantic Query Optimisation Example: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) WRITER(id, title, production year, credits) where [id] ⊆ PERSON[id] [title, production year] ⊆ MOVIE [title, production year] List the ids of the writers who have written movies produced in 2000. πidσproduction year=2000(WRITER ./ PERSON ./ MOVIE) πidσproduction year=2000(WRITER ./ PERSON) πidσproduction year=2000(WRITER ./ MOVIE) πidσproduction year=2000WRITER ←− the optimised RA 31/59 Semantic Query Optimisation Example: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) WRITER(id, title, production year, credits) where [id] ⊆ PERSON[id] [title, production year] ⊆ MOVIE [title, production year] List the ids of the writers who have written movies produced in 2000. πidσproduction year=2000(WRITER ./ PERSON ./ MOVIE) πidσproduction year=2000(WRITER ./ PERSON) πidσproduction year=2000(WRITER ./ MOVIE) πidσproduction year=2000WRITER ←− the optimised RA 32/59 Rule-based Query Optimisation A rule-based optimisation transforms the RA expression by using a set of heuristic rules that typically improve the execution performance. Key ideas: apply the most restrictive operation before other operations, which can reduce the size of intermediate results: Push-down selection: Apply as early as possible to reduce the number of tuples; Push-down projection: Apply as early as possible to reduce the number of attributes. But we must ensure that the resulting query tree gives the same result as the original query tree, i.e., the equivalence of RA expressions. 32/59 Rule-based Query Optimisation A rule-based optimisation transforms the RA expression by using a set of heuristic rules that typically improve the execution performance. Key ideas: apply the most restrictive operation before other operations, which can reduce the size of intermediate results: Push-down selection: Apply as early as possible to reduce the number of tuples; Push-down projection: Apply as early as possible to reduce the number of attributes. But we must ensure that the resulting query tree gives the same result as the original query tree, i.e., the equivalence of RA expressions. 32/59 Rule-based Query Optimisation A rule-based optimisation transforms the RA expression by using a set of heuristic rules that typically improve the execution performance. Key ideas: apply the most restrictive operation before other operations, which can reduce the size of intermediate results: Push-down selection: Apply as early as possible to reduce the number of tuples; Push-down projection: Apply as early as possible to reduce the number of attributes. But we must ensure that the resulting query tree gives the same result as the original query tree, i.e., the equivalence of RA expressions. 32/59 Rule-based Query Optimisation A rule-based optimisation transforms the RA expression by using a set of heuristic rules that typically improve the execution performance. Key ideas: apply the most restrictive operation before other operations, which can reduce the size of intermediate results: Push-down selection: Apply as early as possible to reduce the number of tuples; Push-down projection: Apply as early as possible to reduce the number of attributes. But we must ensure that the resulting query tree gives the same result as the original query tree, i.e., the equivalence of RA expressions. 32/59 Rule-based Query Optimisation A rule-based optimisation transforms the RA expression by using a set of heuristic rules that typically improve the execution performance. Key ideas: apply the most restrictive operation before other operations, which can reduce the size of intermediate results: Push-down selection: Apply as early as possible to reduce the number of tuples; Push-down projection: Apply as early as possible to reduce the number of attributes. But we must ensure that the resulting query tree gives the same result as the original query tree, i.e., the equivalence of RA expressions. 32/59 Rule-based Query Optimisation A rule-based optimisation transforms the RA expression by using a set of heuristic rules that typically improve the execution performance. Key ideas: apply the most restrictive operation before other operations, which can reduce the size of intermediate results: Push-down selection: Apply as early as possible to reduce the number of tuples; Push-down projection: Apply as early as possible to reduce the number of attributes. But we must ensure that the resulting query tree gives the same result as the original query tree, i.e., the equivalence of RA expressions. 33/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(σψ(R)) ≡ σϕ∧ψ(R); πX (πY (R)) ≡ πX (R) if X ⊆ Y ; σϕ(R1 × R2) ≡ R1 ./ϕ R2; σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1; 33/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(σψ(R)) ≡ σϕ∧ψ(R); πX (πY (R)) ≡ πX (R) if X ⊆ Y ; σϕ(R1 × R2) ≡ R1 ./ϕ R2; σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1; 34/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(σψ(R)) ≡ σϕ∧ψ(R); σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY ) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 STUDY UID CourseNo Hours 111 COMP2400 120 111 BUSN2011 110 111 ECON2102 120 STUDY UID CourseNo Hours 111 COMP2400 120 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) STUDY UID CourseNo Hours 111 COMP2400 120 34/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(σψ(R)) ≡ σϕ∧ψ(R); σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY ) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 STUDY UID CourseNo Hours 111 COMP2400 120 111 BUSN2011 110 111 ECON2102 120 STUDY UID CourseNo Hours 111 COMP2400 120 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) STUDY UID CourseNo Hours 111 COMP2400 120 34/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(σψ(R)) ≡ σϕ∧ψ(R); σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY ) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 STUDY UID CourseNo Hours 111 COMP2400 120 111 BUSN2011 110 111 ECON2102 120 STUDY UID CourseNo Hours 111 COMP2400 120 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) STUDY UID CourseNo Hours 111 COMP2400 120 34/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(σψ(R)) ≡ σϕ∧ψ(R); σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY ) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 STUDY UID CourseNo Hours 111 COMP2400 120 111 BUSN2011 110 111 ECON2102 120 STUDY UID CourseNo Hours 111 COMP2400 120 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) STUDY UID CourseNo Hours 111 COMP2400 120 34/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(σψ(R)) ≡ σϕ∧ψ(R); σCourseNo=‘COMP2400′ (σUID=111(STUDY )) v.s. σ(Course=‘COMP2400′)∧(UID=111)(STUDY ) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 STUDY UID CourseNo Hours 111 COMP2400 120 111 BUSN2011 110 111 ECON2102 120 STUDY UID CourseNo Hours 111 COMP2400 120 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) STUDY UID CourseNo Hours 111 COMP2400 120 35/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. πX (πY (R)) ≡ πX (R) if X ⊆ Y ; πUID(πUID,CourseNo(Study)) v.s. πUID(Study) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 UID CourseNo 111 COMP2400 222 COMP2400 333 STAT2001 111 BUSN2011 111 ECON2102 333 BUSN2011 UID 111 222 333 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) UID 111 222 333 35/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. πX (πY (R)) ≡ πX (R) if X ⊆ Y ; πUID(πUID,CourseNo(Study)) v.s. πUID(Study) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 UID CourseNo 111 COMP2400 222 COMP2400 333 STAT2001 111 BUSN2011 111 ECON2102 333 BUSN2011 UID 111 222 333 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) UID 111 222 333 35/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. πX (πY (R)) ≡ πX (R) if X ⊆ Y ; πUID(πUID,CourseNo(Study)) v.s. πUID(Study) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 UID CourseNo 111 COMP2400 222 COMP2400 333 STAT2001 111 BUSN2011 111 ECON2102 333 BUSN2011 UID 111 222 333 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) UID 111 222 333 35/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. πX (πY (R)) ≡ πX (R) if X ⊆ Y ; πUID(πUID,CourseNo(Study)) v.s. πUID(Study) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 UID CourseNo 111 COMP2400 222 COMP2400 333 STAT2001 111 BUSN2011 111 ECON2102 333 BUSN2011 UID 111 222 333 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) UID 111 222 333 35/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. πX (πY (R)) ≡ πX (R) if X ⊆ Y ; πUID(πUID,CourseNo(Study)) v.s. πUID(Study) STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 UID CourseNo 111 COMP2400 222 COMP2400 333 STAT2001 111 BUSN2011 111 ECON2102 333 BUSN2011 UID 111 222 333 STUDY UID CourseNo Hours 111 COMP2400 120 222 COMP2400 115 333 STAT2001 120 111 BUSN2011 110 111 ECON2102 120 333 BUSN2011 130 ⇒ (without any intermediate relation) UID 111 222 333 36/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 σCourse.No=Enrol.CoureNo(Course × Enrol) COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active 36/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 σCourse.No=Enrol.CoureNo(Course × Enrol) COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active 36/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 σCourse.No=Enrol.CoureNo(Course × Enrol) COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active 36/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 σCourse.No=Enrol.CoureNo(Course × Enrol) COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active 36/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 σCourse.No=Enrol.CoureNo(Course × Enrol) COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active 37/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 σCourse.No=Enrol.CoureNo(Course × Enrol) COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 111 BUSN2011 2016 S1 active COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active BUSN2011 Management Accounting 6 222 COMP2400 2016 S1 active BUSN2011 Management Accounting 6 111 COMP2400 2016 S2 active 38/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 Course ./Course.No=Enrol.CourseNo Enrol COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active Inner Join on Course.No=Enrol.CourseNo (no intermediate Cartesian product) No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active 38/59 Rule-based Optimisation Can they be executed in one go? ↪→ Merging RA operators. σϕ(R1 × R2) ≡ R1 ./ϕ R2 Course ./Course.No=Enrol.CourseNo Enrol COURSE No Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active Inner Join on Course.No=Enrol.CourseNo (no intermediate Cartesian product) No Cname Unit StudentID CourseNo Semester Status COMP2400 Relational Databases 6 222 COMP2400 2016 S1 active COMP2400 Relational Databases 6 111 COMP2400 2016 S2 active BUSN2011 Management Accounting 6 111 BUSN2011 2016 S1 active 39/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σϕ1∧ϕ2(R1 ./ R2) ≡ σϕ1(R1) ./ σϕ2(R2), if ϕ1 contains only attributes in R1 and ϕ2 contains only attributes in R2; πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition contains attributes not in X , where Xi contains attributes both in Ri and X , and ones both in R1 and R2; πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , where Xi contains attributes both in Ri and X , and ones both in R1 and R2; 39/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σϕ1∧ϕ2(R1 ./ R2) ≡ σϕ1(R1) ./ σϕ2(R2), if ϕ1 contains only attributes in R1 and ϕ2 contains only attributes in R2; πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition contains attributes not in X , where Xi contains attributes both in Ri and X , and ones both in R1 and R2; πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , where Xi contains attributes both in Ri and X , and ones both in R1 and R2; 40/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNoNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 40/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNoNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 40/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNoNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 40/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNoNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 41/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course) ./ Enrol COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 COURSE CourseNo Cname Unit BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 41/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course) ./ Enrol COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 COURSE CourseNo Cname Unit BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 41/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course) ./ Enrol COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 COURSE CourseNo Cname Unit BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 41/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course) ./ Enrol COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 COURSE CourseNo Cname Unit BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 41/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1; σCname=‘ManagementAccounting′(Course) ./ Enrol COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 COURSE CourseNo Cname Unit BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status BUSN2011 Management Accounting 6 111 2016 S1 active 42/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 42/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 42/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 42/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 42/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 42/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 43/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), if the join condition involves only attributes in X , how could we derive X1 and X2? πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 44/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 44/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 44/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 44/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 44/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 44/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 44/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active CourseNo Cname Unit StudentID Semester Status COMP2400 Relational Databases 6 222 2016 S1 active COMP2400 Relational Databases 6 111 2016 S2 active BUSN2011 Management Accounting 6 111 2016 S1 active Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 45/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)? COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCnameCOURSE Cname Relational Management ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πStudentID ENROL StudentID 111 222 Is πCname(Course) ./ πStudentID(Enrol) our desired result? No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol) 45/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)? COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCnameCOURSE Cname Relational Management ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πStudentID ENROL StudentID 111 222 Is πCname(Course) ./ πStudentID(Enrol) our desired result? No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol) 45/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)? COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCnameCOURSE Cname Relational Management ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πStudentID ENROL StudentID 111 222 Is πCname(Course) ./ πStudentID(Enrol) our desired result? No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol) 45/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)? COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCnameCOURSE Cname Relational Management ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πStudentID ENROL StudentID 111 222 Is πCname(Course) ./ πStudentID(Enrol) our desired result? No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol) 45/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)? COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCnameCOURSE Cname Relational Management ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πStudentID ENROL StudentID 111 222 Is πCname(Course) ./ πStudentID(Enrol) our desired result? No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol) 45/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)? COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCnameCOURSE Cname Relational Management ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πStudentID ENROL StudentID 111 222 Is πCname(Course) ./ πStudentID(Enrol) our desired result? No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol) 45/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) = πCname(Course) ./ πStudentID(Enrol)? COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCnameCOURSE Cname Relational Management ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πStudentID ENROL StudentID 111 222 Is πCname(Course) ./ πStudentID(Enrol) our desired result? No. πCname,StudentID(Course ./ Enrol) 6= πCname(Course) ./ πStudentID(Enrol) 46/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 46/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 46/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 46/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 46/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 46/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 46/59 Rule-based Optimisation Can join be executed last? ↪→ Push select/project before join. πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), if the join condition involves attributes outside X , how could we derive X1 and X2? πCname,StudentID(Course ./ Enrol) πCname,StudentID(πCourseNo,Cname(Course) ./ πCourseNo,StudentID(Enrol)) COURSE CourseNo Cname Unit COMP2400 Relational Databases 6 BUSN2011 Management Accounting 6 πCourseNo,CnameCOURSE CourseNo Cname COMP2400 Relational Databases BUSN2011 Management Accounting ENROL StudentID CourseNo Semester Status 111 BUSN2011 2016 S1 active 222 COMP2400 2016 S1 active 111 COMP2400 2016 S2 active πCourseNo,StudentID ENROL StudentID CourseNo 111 BUSN2011 222 COMP2400 111 COMP2400 CourseNo Cname StudentID COMP2400 Relational Databases 222 COMP2400 Relational Databases 111 BUSN2011 Management Accounting 111 Cname StudentID Relational Databases 222 Relational Databases 111 Management Accounting 111 47/59 Heuristic Rules and Query Trees (1) σϕ(σψ(R)) ≡ σϕ∧ψ(R); (2) πX (πY (R)) ≡ πX (R) if X ⊆ Y ; 47/59 Heuristic Rules and Query Trees (1) σϕ(σψ(R)) ≡ σϕ∧ψ(R); (2) πX (πY (R)) ≡ πX (R) if X ⊆ Y ; 48/59 Heuristic Rules (3) σϕ(R1 × R2) ≡ R1 ./ϕ R2 (4) σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1 48/59 Heuristic Rules (3) σϕ(R1 × R2) ≡ R1 ./ϕ R2 (4) σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1 49/59 Push-down Selection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’∧ first name=‘Tom’∧last name=‘Cruise’(MOVIE × (PERSON ./ ROLE)))) Question: Can we apply the following rule to optimise the query? σϕ1∧ϕ2(R1 × R2) ≡ σϕ1(R1)× σϕ2(R2) if ϕ1 contains only attributes in R1 and ϕ2 contains only attributes in R2 We would have πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE) ×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) 49/59 Push-down Selection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’∧ first name=‘Tom’∧last name=‘Cruise’(MOVIE × (PERSON ./ ROLE)))) Question: Can we apply the following rule to optimise the query? σϕ1∧ϕ2(R1 × R2) ≡ σϕ1(R1)× σϕ2(R2) if ϕ1 contains only attributes in R1 and ϕ2 contains only attributes in R2 We would have πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE) ×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) 49/59 Push-down Selection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’∧ first name=‘Tom’∧last name=‘Cruise’(MOVIE × (PERSON ./ ROLE)))) Question: Can we apply the following rule to optimise the query? σϕ1∧ϕ2(R1 × R2) ≡ σϕ1(R1)× σϕ2(R2) if ϕ1 contains only attributes in R1 and ϕ2 contains only attributes in R2 We would have πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE) ×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) 50/59 Push-down Selection – Example 51/59 Push-down Selection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE) ×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) Can we apply σϕ(R1 × R2) ≡ R1 ./ϕ R2? We would have πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year ( σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) 51/59 Push-down Selection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE) ×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) Can we apply σϕ(R1 × R2) ≡ R1 ./ϕ R2? We would have πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year ( σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) 51/59 Push-down Selection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σtitle=mtitle∧production year=mprod year (σmajor genre=‘war ’(MOVIE) ×σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) Can we apply σϕ(R1 × R2) ≡ R1 ./ϕ R2? We would have πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year ( σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) 52/59 Push-down Selection – Example 53/59 Push-down Projection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year ( σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) Question: Can we apply the following rule to optimise the query? πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), where Xi contains attributes both in Ri and X , and ones both in R1 and R2 53/59 Push-down Projection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year ( σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) Question: Can we apply the following rule to optimise the query? πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), where Xi contains attributes both in Ri and X , and ones both in R1 and R2 54/59 Push-down Projection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year ( σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) We would have: πtitle,production year (πtitle,production year (σmajor genre=‘war ’(MOVIE)) ./title=mtitle∧production year=mprod year (πmtitle,mprod year (σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))) We further apply some rules to optimise the query ... 54/59 Push-down Projection – Example Given the relation schemas: PERSON(id, first name, last name, year born) MOVIE(title, production year, country, run time, major genre) ROLE(id, mtitle, mprod year, description, credits) Query: List all war movies that are performed by ‘Tom Cruise’. πtitle,production year (σmajor genre=‘war ’(MOVIE) ./title=mtitle∧production year=mprod year ( σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE))) We would have: πtitle,production year (πtitle,production year (σmajor genre=‘war ’(MOVIE)) ./title=mtitle∧production year=mprod year (πmtitle,mprod year (σfirst name=‘Tom’∧last name=‘Cruise’(PERSON ./ ROLE)))) We further apply some rules to optimise the query ... 55/59 Push-down Projection – Example 56/59 Cost-based Optimisation (not assessed) Consider CHARTS={Rank, Artist, Song} with 100 tuples and 3 attributes. Rank Artist Song 1 Chingy Right Thurr 2 Scribe Stand up 3 Aguilera and Kim Can’t hold us down 4 Evanescence Going under 5 Justin Timberlake Senorita 6 Brooke Fraser Better 7 Black Eyed Peas Where is the love? ... ... ... ... ... ... Compare two strategies of evaluating “Who is top of the pops?”: σRank=1(πRank, Artist(CHARTS)) πRank, Artist(σRank=1(CHARTS)) Selection before Projection is preferred. 56/59 Cost-based Optimisation (not assessed) Consider CHARTS={Rank, Artist, Song} with 100 tuples and 3 attributes. Rank Artist Song 1 Chingy Right Thurr 2 Scribe Stand up 3 Aguilera and Kim Can’t hold us down 4 Evanescence Going under 5 Justin Timberlake Senorita 6 Brooke Fraser Better 7 Black Eyed Peas Where is the love? ... ... ... ... ... ... Compare two strategies of evaluating “Who is top of the pops?”: σRank=1(πRank, Artist(CHARTS)) πRank, Artist(σRank=1(CHARTS)) Selection before Projection is preferred. 57/59 Cost-based Optimisation (not assessed) Consider CHARTS={Rank, Artist, ...} with 100 tuples and 50 attributes: Rank Artist Song ... ... ... 1 Chingy Right Thurr ... ... ... 2 Scribe Stand up ... ... ... 3 Aguilera and Kim Can’t hold us down ... ... ... 4 Evanescence Going under ... ... ... 5 Justin Timberlake Senorita ... ... ... 6 Brooke Fraser Better ... ... ... 7 Black Eyed Peas Where is the love? .. ... ... ... ... ... ... ... ... Compare two strategies of evaluating? σRank > 10(πRank, Artist(CHARTS))
πRank, Artist(σRank > 10(CHARTS))

Projection before Selection is preferred.

57/59

Cost-based Optimisation (not assessed)

Consider CHARTS={Rank, Artist, …} with 100 tuples and 50 attributes:

Rank Artist Song … … …
1 Chingy Right Thurr … … …
2 Scribe Stand up … … …
3 Aguilera and Kim Can’t hold us down … … …
4 Evanescence Going under … … …
5 Justin Timberlake Senorita … … …
6 Brooke Fraser Better … … …
7 Black Eyed Peas Where is the love? .. … …
… … … … … …

Compare two strategies of evaluating?

σRank > 10(πRank, Artist(CHARTS))
πRank, Artist(σRank > 10(CHARTS))

Projection before Selection is preferred.

58/59

Query Optimisation

SQL query

RA query 1

RA query 2

RA query n
……

Va
rie

d
pe

rf
or

m
an

ce

Logically equivalent

SQL query RA query
……

Query Optimiser

Heuristic
rules

Cost
model

Database statistics

RA query 4
RA query n

RA query 2
RA query 3

RA query 1

……

Execution planInput

Trade-off:

Time for executing a RA query vs Time for finding a better RA query

59/59

(credit cookie) memorising vs understanding