CS3402
1
CS3402 Chapter 12 Query Optimization
Semester B, 2020/2021
Overview
Query optimization:
The process of choosing a suitable execution strategy for
CS3402
2
processing a query
It may not be optimal but is a reasonably efficient strategy
A query, e.g., a SQL, first be scanned, parsed and validated The scanner identifies the query tokens, e.g., the keywords,
attribute names and relation names
The parser checks the query syntax
The validation checks that all attributes and relation names are valid
The Query Go Through…
CS3402
3
3
Using Heuristics in Query Optimization
Heuristic rule may be applied to modify the internal representation of a query (e.g., a query tree) to improve performance
Process for heuristics optimization:
1. The parser generates an initial internal representation
2. Apply heuristics rules to optimize the internal representation
3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query
CS3402 4
Using Heuristics in Query Optimization
The main heuristic is to apply the operations that reduce the size of intermediate results first
E.g., SELECT and PROJECT before JOIN or other binary operations
SELECT and PROJECT operations reduce the size of a file
The size of the resulting file from a binary operation (e.g., JOIN) is usually a multiplicative function of the sizes of the input files
CS3402 5
Using Heuristics in Query Optimization
Example:
σ Salary>30000 ( EMPLOYEE SSN=MGRSSN DEPT )
(σ Salary>30000 (EMPLOYEE)) SSN=MGRSSN DEPT
Q. Which one is better if most of the employees in the company has
salary below 30000?
CS3402 6
Example
Example query: For every project located in “Stafford”, retrieve the project number, the controlling department number, and the manager’s last name, address and birthdate
Relational algebra expression:
SQL query:
CS3402 7
Query Tree
A query tree is a tree data structure that corresponds to a relational algebra expression
leaf nodes represent the input relations.
internal tree nodes represent the relational algebra operations of the expression.
order of execution of operations: starts at the leaf nodes and ends at the root node.
CS3402
8
Fig: Query tree corresponding to the relational algebra expression for Q2.
Heuristic Optimization of Query Trees
In general, many different relational algebra expressions can be semantically equivalent (i.e., they can represent the same query and produce the same results)
Hence many different query trees are semantically equivalent.
The query parser will typically generate a standard initial query tree to
correspond to an SQL query, without doing any optimization.
CS3402
9
Fig: Initial (canonical) query tree for SQL query Q2.
Heuristic Optimization of Query Trees
Such a canonical query tree is very inefficient if executed directly, because of the CARTESIAN PRODUCT (×) operations.
E.g. PROJECT, DEPARTMENT, and EMPLOYEE relations had record sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively =>
CARTESIAN PRODUCT contains 10 million tuples of record size 300 bytes each. This canonical query tree in a simple standard form that can be easily
created from the SQL query. But it will never be executed.
The heuristic query optimizer will transform this initial query tree into
an equivalent final query tree that is efficient to execute.
CS3402
10
Fig: Initial (canonical) query tree for SQL query Q2.
Transformation Rules (Selections)
The optimizer must include rules for equivalence among extended relational algebra expressions that can be applied to transform the initial tree into the final, optimized query tree.
There are many rules for transforming relational algebra operations into equivalent ones.
1.
Cascade of σ : A conjunctive selection condition can be broken up into a cascade (sequence) of individual σ operations:
2.
Commutativity of σ : The σ operation is commutative:
CS3402 11
Transformation Rules (Projections)
3. Cascade of π : In a sequence of π operations, all but the last one can be ignored:
CS3402 12
Transformation Rules (Joins & Cartesian Products)
5. Commutativity of Joins & Cartesian Products:
9. Associativity of Joins & Cartesian Products: × × ≡(×)×
⋈ ⋈ ≡(⋈)⋈
CS3402 13
Transformation Rules (Commuting σ)
4. Commuting σ with π : If the selection condition c involves only the attributes A1, …, An in the projection list, the two operations can be commuted:
6. Commuting σ with (or X): If all the attributes in the selection condition c involve only the attributes of one of the relations being joined—say, R—the two operations can be commuted as follows:
Alternatively, if the selection condition c can be written as (c1 AND c2), where condition c1 involves only the attributes of R and condition c2 involves only the attributes of S, the operations commute as follows:
CS3402 14
Transformation Rules (Commuting π)
7. Commuting π with (or X): Suppose:
Projection list is L = {A1, …, An, B1, …, Bm},
A1, …, An are attributes of R
B1, …, Bm are attributes of S.
If the join condition c involves only attributes in L, the two
operations can be commuted as follows:
CS3402 15
Transformation Rules (Others)
CS3402 16
Transformation Rules (Others)
CS3402 17
Outline of a Heuristic Algebraic Optimization Algorithm
(Step i) Using Rule 1, break up any SELECT operations with conjunctive conditions into a cascade of SELECT operations
This permits a greater degree of freedom in moving SELECT operations down different branches of the tree
(Step ii) Using Rules 2, 4, 6, and 10, 13, 14 (commutativity of SELECT with other operations), move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition.
CS3402 18
Example
CS3402
19
19
Applying Step i & ii of algorithm.
Outline of a Heuristic Algebraic Optimization Algorithm
(Step iii) Using Rules 5 and 9 concerning commutativity and associativity of binary operations, rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive SELECT operations are executed first in the query tree representation.
Most restrictive SELECT can mean either the ones that produce a relation with the fewest tuples or with the smallest absolute size.
CS3402 20
Example
(b)
Step iii of algo: Applying the more restrictive SELECT operation first.
CS3402
21
21
(c)
Outline of a Heuristic Algebraic Optimization Algorithm
(Step iv) Using Rule 12, combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation whose condition represents a join condition into a JOIN operation.
CS3402 22
Example
(c)
CS3402
23
23
Step iv of algo: Replacing (CARTESIAN PRODUCT and SELECT) with JOIN operations.
Outline of a Heuristic Algebraic Optimization Algorithm
(Step v) Using Rules 3, 4, 7, and 11 (cascading of PROJECT and the commuting of PROJECT with other operations)
break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed.
Only those attributes needed in the query result and in subsequent operations in the query tree should be kept after each PROJECT operation.
(Step vi) Identify subtrees that represent groups of operations that can be executed by a single algorithm
CS3402 24
Example
CS3402
25
25
Step v of algo: Moving PROJECT operations down the query tree.
Summary of Heuristic Optimization
The main heuristic is to first apply the operations that reduce the size of intermediate results:
CS3402
26
This includes performing SELECT operations as early as possible to reduce the number of tuples, and
Performing PROJECT operations as early as possible to reduce the number of attributes
This is done by moving SELECT and PROJECT operations as far down the tree as possible
In addition, the SELECT and JOIN operations that are most restrictive—that is, result in relations with the fewest tuples or with the smallest absolute size—should be executed before other similar operations.
This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately
References
7e
Chapter 19
CS3402 27