CS计算机代考程序代写 SQL data structure algorithm CS3402

CS3402
1
CS3402 Chapter 12 Query Optimization
Semester B, 2020/2021

Overview
 Query optimization:
The process of choosing a suitable execution strategy for
CS3402
2
processing a query
It may not be optimal but is a reasonably efficient strategy
 A query, e.g., a SQL, first be scanned, parsed and validated The scanner identifies the query tokens, e.g., the keywords,
attribute names and relation names
The parser checks the query syntax
The validation checks that all attributes and relation names are valid

The Query Go Through…
CS3402
3
3

Using Heuristics in Query Optimization
 Heuristic rule may be applied to modify the internal representation of a query (e.g., a query tree) to improve performance
 Process for heuristics optimization:
1. The parser generates an initial internal representation
2. Apply heuristics rules to optimize the internal representation
3. A query execution plan is generated to execute groups of operations based on the access paths available on the files involved in the query
CS3402 4

Using Heuristics in Query Optimization
 The main heuristic is to apply the operations that reduce the size of intermediate results first
 E.g., SELECT and PROJECT before JOIN or other binary operations
 SELECT and PROJECT operations reduce the size of a file
 The size of the resulting file from a binary operation (e.g., JOIN) is usually a multiplicative function of the sizes of the input files
CS3402 5

Using Heuristics in Query Optimization
 Example:
σ Salary>30000 ( EMPLOYEE SSN=MGRSSN DEPT )
(σ Salary>30000 (EMPLOYEE)) SSN=MGRSSN DEPT
Q. Which one is better if most of the employees in the company has
salary below 30000?
CS3402 6

Example
 Example query: For every project located in “Stafford”, retrieve the project number, the controlling department number, and the manager’s last name, address and birthdate
 Relational algebra expression:
 SQL query:
CS3402 7

Query Tree
 A query tree is a tree data structure that corresponds to a relational algebra expression
 leaf nodes represent the input relations.
 internal tree nodes represent the relational algebra operations of the expression.
 order of execution of operations: starts at the leaf nodes and ends at the root node.
CS3402
8
Fig: Query tree corresponding to the relational algebra expression for Q2.

Heuristic Optimization of Query Trees
 In general, many different relational algebra expressions can be semantically equivalent (i.e., they can represent the same query and produce the same results)
 Hence many different query trees are semantically equivalent.
 The query parser will typically generate a standard initial query tree to
correspond to an SQL query, without doing any optimization.
CS3402
9
Fig: Initial (canonical) query tree for SQL query Q2.

Heuristic Optimization of Query Trees
 Such a canonical query tree is very inefficient if executed directly, because of the CARTESIAN PRODUCT (×) operations.
 E.g. PROJECT, DEPARTMENT, and EMPLOYEE relations had record sizes of 100, 50, and 150 bytes and contained 100, 20, and 5,000 tuples, respectively =>
 CARTESIAN PRODUCT contains 10 million tuples of record size 300 bytes each.  This canonical query tree in a simple standard form that can be easily
created from the SQL query. But it will never be executed.
 The heuristic query optimizer will transform this initial query tree into
an equivalent final query tree that is efficient to execute.
CS3402
10
Fig: Initial (canonical) query tree for SQL query Q2.

Transformation Rules (Selections)

The optimizer must include rules for equivalence among extended relational algebra expressions that can be applied to transform the initial tree into the final, optimized query tree.

There are many rules for transforming relational algebra operations into equivalent ones.
1.
Cascade of σ : A conjunctive selection condition can be broken up into a cascade (sequence) of individual σ operations:
2.
Commutativity of σ : The σ operation is commutative:
CS3402 11

Transformation Rules (Projections)
3. Cascade of π : In a sequence of π operations, all but the last one can be ignored:
CS3402 12

Transformation Rules (Joins & Cartesian Products)
5. Commutativity of Joins & Cartesian Products:
9. Associativity of Joins & Cartesian Products: × × ≡(×)×
⋈ ⋈ ≡(⋈)⋈
CS3402 13

Transformation Rules (Commuting σ)
4. Commuting σ with π : If the selection condition c involves only the attributes A1, …, An in the projection list, the two operations can be commuted:
6. Commuting σ with (or X): If all the attributes in the selection condition c involve only the attributes of one of the relations being joined—say, R—the two operations can be commuted as follows:
Alternatively, if the selection condition c can be written as (c1 AND c2), where condition c1 involves only the attributes of R and condition c2 involves only the attributes of S, the operations commute as follows:
CS3402 14

Transformation Rules (Commuting π)
7. Commuting π with (or X): Suppose:
 Projection list is L = {A1, …, An, B1, …, Bm},
A1, …, An are attributes of R
B1, …, Bm are attributes of S.
 If the join condition c involves only attributes in L, the two
operations can be commuted as follows:
CS3402 15

Transformation Rules (Others)
CS3402 16

Transformation Rules (Others)
CS3402 17

Outline of a Heuristic Algebraic Optimization Algorithm
(Step i) Using Rule 1, break up any SELECT operations with conjunctive conditions into a cascade of SELECT operations
This permits a greater degree of freedom in moving SELECT operations down different branches of the tree
(Step ii) Using Rules 2, 4, 6, and 10, 13, 14 (commutativity of SELECT with other operations), move each SELECT operation as far down the query tree as is permitted by the attributes involved in the select condition.
CS3402 18

Example
CS3402
19
19
Applying Step i & ii of algorithm.

Outline of a Heuristic Algebraic Optimization Algorithm
(Step iii) Using Rules 5 and 9 concerning commutativity and associativity of binary operations, rearrange the leaf nodes of the tree so that the leaf node relations with the most restrictive SELECT operations are executed first in the query tree representation.
 Most restrictive SELECT can mean either the ones that produce a relation with the fewest tuples or with the smallest absolute size.
CS3402 20

Example
(b)
Step iii of algo: Applying the more restrictive SELECT operation first.
CS3402
21
21
(c)

Outline of a Heuristic Algebraic Optimization Algorithm
(Step iv) Using Rule 12, combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation whose condition represents a join condition into a JOIN operation.
CS3402 22

Example
(c)
CS3402
23
23
Step iv of algo: Replacing (CARTESIAN PRODUCT and SELECT) with JOIN operations.

Outline of a Heuristic Algebraic Optimization Algorithm
(Step v) Using Rules 3, 4, 7, and 11 (cascading of PROJECT and the commuting of PROJECT with other operations)
 break down and move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as needed.
 Only those attributes needed in the query result and in subsequent operations in the query tree should be kept after each PROJECT operation.
(Step vi) Identify subtrees that represent groups of operations that can be executed by a single algorithm
CS3402 24

Example
CS3402
25
25
Step v of algo: Moving PROJECT operations down the query tree.

Summary of Heuristic Optimization
 The main heuristic is to first apply the operations that reduce the size of intermediate results:
CS3402
26
 This includes performing SELECT operations as early as possible to reduce the number of tuples, and
 Performing PROJECT operations as early as possible to reduce the number of attributes
 This is done by moving SELECT and PROJECT operations as far down the tree as possible
 In addition, the SELECT and JOIN operations that are most restrictive—that is, result in relations with the fewest tuples or with the smallest absolute size—should be executed before other similar operations.
 This is done by reordering the leaf nodes of the tree among themselves and adjusting the rest of the tree appropriately

References
 7e
 Chapter 19
CS3402 27