Chapter 7: Relational Database Design
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Basic Steps in Query Processing (Cont.)
Parsing and translation
Parser checks syntax, verifies relations
Translator translates the query (e.g., SQL) into its internal form (e.g., relational algebra)
A SQL query can be translated into several relational algebra expressions.
E.g., select salary from instructor where salary < 75000;
salary75000(salary(instructor)), or
salary(salary75000(instructor))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Basic Steps in Query Processing (Cont.)
Each relational algebra operation can be evaluated using one of several different algorithms
E.g., can search every tuple in instructor to find tuples with salary < 75000, or
can use an index on salary to find instructors with salary < 75000
Correspondingly, a relational algebra expression can be evaluated in many ways.
Annotated expression specifying detailed evaluation strategy is called an evaluation plan (or execution plan).
Evaluation
The query execution engine takes a query evaluation plan, executes that plan, and returns the answers to the query.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Basic Steps in Query Processing (Cont.)
Query Optimization: Amongst all equivalent evaluation plans, choose the one with lowest cost.
Cost is estimated using statistical information from the
database catalog
e.g. number of tuples in each relation, size of tuples, etc.
In this lecture, we study
how to measure query costs
algorithms for evaluating relational algebra operations
how to combine algorithms for individual operations in order to evaluate a complete expression
how to optimize queries, that is, how to find an evaluation plan with lowest estimated cost
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Measures of Query Cost
Cost can be measured based on response time, i.e., total elapsed time for answering query
Many factors contribute to time cost
disk access, CPU time, …
Typically disk access is the predominant cost, and is also relatively easy to estimate.
Real systems do take CPU cost into account
Disk cost can be estimated as number of block transfers from disk and number of seeks
tT – time to transfer one block
tS – time for one seek
Cost for b block transfers plus S seeks
b * tT + S * tS
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Measures of Query Cost (Cont.)
We do not include cost of writing output to disk in our cost formulae
The output of an operation may be sent to the next operation without being written to disk.
Worst case estimates
No data is initially in buffer, i.e., data must be read from disk initially
Only the minimum amount of memory needed for the operation is available
Several algorithms can reduce disk I/O by using extra buffer space
Amount of real memory available to buffer depends on other concurrent queries and OS processes, known only during execution
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Selection Operation
File scan
Algorithm A1 (linear search). Scan each file block (stored contiguously) and test all records to see whether they satisfy the selection condition.
Cost estimate = tS + br * tT
br denotes number of blocks in relation r
If selection is on a key attribute, can stop on finding record
average cost = tS + (br /2)* tT
Linear search can be applied regardless of
selection condition, or
ordering of records in the file, or
availability of indices
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Selections Using Indices
Index scan – search algorithms that use an index (B+-tree)
selection condition must be on search-key of index.
A2 (primary index, equality on key). Retrieve a single record that satisfies the corresponding equality condition
Cost = (hi + 1) * (tT + tS)
hi = height of the B+-tree index
traverse the height of the tree + one I/O to fetch the record; each requires a seek and a block transfer
A3 (primary index, equality on nonkey). Retrieve multiple records on consecutive blocks.
Cost = hi * (tT + tS) + tS + b * tT
b = number of blocks containing matching records
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Selections Using Indices (Cont.)
A4 (secondary index, equality).
Retrieve a single record if the search-key is a candidate key
Cost = (hi + 1) * (tT + tS)
Retrieve multiple records if search-key is not a candidate key
each of n matching records may be on a different block
Cost = (hi + n) * (tT + tS)
Can be very expensive!
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Selections Involving Comparisons
Can implement selections of the form AV (r) or A V(r) by using
a linear file scan,
or by using indices in the following ways:
A5 (primary index, comparison). (Relation is sorted on A)
For A V(r), use index to find first tuple V and scan relation sequentially from there (cost estimate is identical to A3)
For AV (r), use file scan till first tuple > V; do not use index
A6 (secondary index, comparison).
For A V(r), use index to find first index entry V and scan leaf index nodes sequentially from there, to find pointers to records (cost estimate is identical to A4, equality on nonkey)
For AV (r), just scan leaf index nodes finding pointers to records, till first entry > V
In either case, retrieve records that are pointed to
requires an I/O for each record
linear file scan may be cheaper
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Implementation of Complex Selections
Conjunction: 1 2. . . n(r)
A7 (conjunctive selection using one index).
Select a combination of i and algorithms A1 through A6 that results in the least cost for i (r).
Test other conditions on tuple after fetching it into memory buffer.
A8 (conjunctive selection using composite index).
Use appropriate composite (multiple-key) index if available.
A9 (conjunctive selection by intersection of identifiers).
Requires indices with record pointers.
Use corresponding index for each condition, and take intersection of all the obtained sets of record pointers.
Then fetch records from file.
If some conditions do not have appropriate indices, apply test in memory.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
14
Algorithms for Complex Selections
Disjunction:1 2 . . . n (r).
A10 (disjunctive selection by union of identifiers).
Applicable if all conditions have available indices.
Otherwise use linear scan.
Use corresponding index for each condition, and take union of all the obtained sets of record pointers.
Then fetch records from file.
Negation: (r)
Use linear scan on file.
If very few records satisfy , and an index is applicable to
Find satisfying records using index and fetch from file.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
15
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Sorting
Sorting is important because
SQL queries can specify that the output be sorted, and,
for query processing, several of the relational operations, such as joins, can be implemented efficiently if the input relations are first sorted.
For relations that fit in memory, techniques like quicksort can be used. For relations that don’t fit in memory, external
sort-merge is a good choice.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
External Sort-Merge
Create sorted runs. Let i be 0 initially.
Repeatedly do the following till the end of the relation:
(a) Read M blocks of relation into memory
(b) Sort the in-memory blocks
(c) Write sorted data to run Ri; increment i.
Let the final value of i be N
Merge the runs (next slide)…..
Let M denote memory size (in pages=blocks).
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
External Sort-Merge (Cont.)
Merge the runs (N-way merge). We assume (for now) that N < M.
Use N blocks of memory to buffer input runs, and 1 block to buffer output. Read the first block of each run into its buffer page
repeat
Select the first record (in sort order) among all buffer pages
Write the record to the output buffer. If the output buffer is full, write it to disk.
Delete the record from its input buffer page.
If the buffer page becomes empty then
read the next block (if any) of the run into the buffer.
until all input buffer pages are empty
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
External Sort-Merge (Cont.)
If N M, several merge passes are required.
In each pass, contiguous groups of M - 1 runs are merged to get a single run for the next pass.
A pass reduces the number of runs by a factor of M -1, and creates runs longer by the same factor.
E.g. If M=11, and there are 90 runs, one pass reduces the number of runs to 9, each 10 times the size of the initial runs
Repeated passes are performed till all runs have been merged into one.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Example: External Sorting Using Sort-Merge
memory size: 3 blocks
block size: one tuple
For merging
two input blocks
one output block
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
External Merge Sort (Cont.)
Cost analysis:
Initial number of runs: br / M
Using bb buffer blocks per run during merge, i.e., read/write bb blocks at a time, can merge M / bb - 1 runs in one pass
Total number of merge passes required: logM / bb–1 (br / M).
Block transfers for initial run creation and in each pass is 2br
for final pass, we don’t count write cost
Reminder: ignore final write cost
Thus total number of block transfers for external sorting:
br ( 2 log M / bb–1(br / M) + 1)
What is the number of block transfers in the example shown on the last slide?
Seeks: next slide
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
External Merge Sort (Cont.)
Cost of seeks
During run generation: one seek to read each run and one seek to write each run
2 br / M
During the merge phase
Need 2 br / bb seeks for each merge pass
except the final one which does not require a write
Total number of seeks:
2 br / M + br / bb (2 log M / bb–1(br / M) -1)
What is the cost of seeks in the example shown on the last slide?
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Join Operation
Several different algorithms to implement joins
Nested-loop join
Block nested-loop join
Indexed nested-loop join
Merge-join
Hash-join
Choice based on cost estimate
Examples use the following information
Number of records (n) of student: 5,000 takes: 10,000
Number of blocks (b) of student: 100 takes: 400
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Nested-Loop Join
To compute the theta join r ⨝ s
for each tuple tr in r do begin
for each tuple ts in s do begin
test pair (tr, ts) to see if they satisfy the join condition
if they do, add tr • ts to the result.
end
end
r is called the outer relation and s the inner relation of the join.
Requires no indices and can be used with any kind of join condition.
Expensive since it examines every pair of tuples in the two relations.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Nested-Loop Join (Cont.)
In the worst case, if there is enough memory only to hold one block of each relation, we have to perform a complete scan on s for each record in r and the estimated cost is
(nr bs + br) block transfers + (nr + br) seeks
where n: no. of tuples, b: no. of blocks
If the smaller relation fits entirely in memory, use that as the inner relation.
Reduces cost to br + bs block transfers and 2 seeks
Example
Assume worst case memory availability, cost estimate is
with student as outer relation:
5000 400 + 100 = 2,000,100 block transfers,
5000 + 100 = 5100 seeks
with takes as the outer relation
10000 100 + 400 = 1,000,400 block transfers and 10,400 seeks
If smaller relation (student) fits entirely in memory, the cost estimate will be 500 block transfers and 2 seeks
Block nested-loops algorithm (next slide) is preferable.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Block Nested-Loop Join
Variant of nested-loop join in which every block of inner relation is paired with every block of outer relation.
for each block Br of r do begin
for each block Bs of s do begin
for each tuple tr in Br do begin
for each tuple ts in Bs do begin
Check if (tr, ts) satisfy the join condition
if they do, add tr • ts to the result.
end
end
end
end
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Block Nested-Loop Join (Cont.)
Worst case estimate: br bs + br block transfers + 2 br seeks
Each block in the inner relation s is read once for each block in the outer relation
More efficient to use the smaller relation as the outer relation
Best case: br + bs block transfers + 2 seeks.
Example
Assuming worst case memory availability, cost estimate is
with student as outer relation:
100 400 + 100 = 40,100 block transfers,
2 100 = 200 seeks
The best-case cost remains the same.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Indexed Nested-Loop Join
Index lookups can replace file scans if an index is available on the inner relation’s join attribute.
For each tuple tr in the outer relation r, use the index to look up tuples in s that satisfy the join condition with tuple tr.
Worst case: buffer has space for only one block of r and one block of the index.
Cost of the join: br (tT + tS) + nr c
where c is the cost of traversing index and fetching all matching s tuples for one tuple of r
c can be estimated as cost of a single selection on s using the join condition.
If indices are available on join attributes of both r and s,
use the relation with fewer tuples as the outer relation.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Example of Nested-Loop Join Costs
Compute student takes, with student as the outer relation.
Let takes have a primary B+-tree index on the attribute ID, which contains 20 entries in each index node.
Since takes has 10,000 tuples, the height of the tree is 4, and one more access is needed to find the actual data
student has 5000 tuples
Cost of block nested loops join
100 * 400 + 100 = 40,100 block transfers,
2 * 100 = 200 seeks
Cost of indexed nested loops join
100 + 5000 * 5 = 25,100 block transfers and seeks.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Merge-Join
Sort both relations on their join attribute (if not already sorted on the join attributes).
Merge the sorted relations to join them
Similar to the merge stage of the sort-merge algorithm.
A group of tuples of one relation with the same join-attribute values is read into a set
Skip tuples of another relation with the join-attribute values smaller than the current join-attribute value of those tuples in the set
Join every tuple in the set with the tuples of another relation with the same join-attribute values
Detailed algorithm in book
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Merge-Join
A group of tuples of one relation with the same value on the join attributes is read into a set before matching.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Merge-Join (Cont.)
Can be used only for equi-joins and natural joins
Each block needs to be read only once
Thus the cost of merge join is (assuming bb buffer blocks are allocated to each relation):
br + bs block transfers + br / bb + bs / bb seeks + the cost of sorting if relations are unsorted.
Example
Assuming worst case memory availability and the relations are already sorted on the join attribute, cost estimate
400 + 100 = 500 block transfers,
400 + 100 = 500 seeks (assuming bb =1)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Hash-Join
Applicable for equi-joins and natural joins.
A hash function h is used to partition tuples of both relations into sets that have the same hash value on the join attributes.
h maps JoinAttrs values to {0, 1, ..., n-1}, where JoinAttrs denotes the common attributes of r and s used in the natural join.
r0, r1, . . ., rn-1 denote partitions of r tuples
Each tuple tr r is put in partition ri where i = h(tr [JoinAttrs]).
s0, s1,. . ., sn-1 denote partitions of s tuples
Each tuple ts s is put in partition si where i = h(ts [JoinAttrs]).
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Hash-Join (Cont.)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Hash-Join (Cont.)
An r tuple and an s tuple that satisfy the join condition will have the same value for the join attributes.
If that value is hashed to some value i, the r tuple has to be in ri and the s tuple in si.
As a result, r tuples in ri need only to be compared with s tuples in si
Need not be compared with s tuples in any other partition.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Hash-Join Algorithm
1. Partition the relation s using hash function h on the join attributes. When partitioning a relation, one block of memory is reserved as the output buffer for each partition.
2. Partition r similarly.
3. For each i: // perform indexed nested-loop join on each partition
(a) Load si into memory and build an in-memory hash index on it using the join attribute. This hash index uses a different hash function than the earlier one h.
(b) Read the tuples in ri from the disk block by block. For each tuple tr, locate each matching tuple ts in si using the in-memory hash index. Output the concatenation of their attributes.
The hash-join of r and s is computed as follows.
Relation s is called the build input and r is called the probe input.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Hash-Join Algorithm (Cont.)
The value n and the hash function h is chosen such that each si should fit in memory for build and probe.
The probe relation partitions need not fit in memory
Typically, M > bs/n n > bs/M
Note that n < M for partitioning
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Cost of Hash-Join
Cost of hash join is
for partitioning, a complete reading and a subsequent writing back of both relations:
2(br + bs) block transfers +
2(br / bb + bs / bb) seeks
for build and probe, a reading of each of the partitions once:
(br + bs) block transfers + 2 n seeks
total:
3(br + bs) block transfers +
2(br / bb + bs / bb) + 2 n seeks
If the entire build input can be kept in main memory, no partitioning is required
Cost estimate goes down to br + bs block transfers + 2 seeks
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Example of Cost of Hash-Join
Assume that memory size is 22 blocks
bstudent = 100 and btakes = 400.
student is to be used as build input. Partition it into five partitions, each of size 20 blocks.
Similarly, partition takes into five partitions, each of size 80.
Therefore total cost, assuming 3 blocks are allocated to the input and each of the 5 outputs during partitioning (bb=3):
3(100 + 400) = 1500 block transfers +
2( 100/3 + 400/3) + 25 = 346 seeks
takes student
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Complex Joins
Join with a conjunctive condition:
r ⨝ 1 2... n s
Either use nested loops/block nested loops, or
Compute the result of one of the simpler joins r ⨝ i s
final result comprises those tuples in the intermediate result that satisfy the remaining conditions
1 . . . i –1 i +1 . . . n
Join with a disjunctive condition
r ⨝ 1 2 ... n s
Either use nested loops/block nested loops, or
Compute as the union of the records in individual joins r ⨝ i s:
(r ⨝ 1 s) (r ⨝ 2 s) . . . (r ⨝ n s)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
42
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Other Operations: Duplicate elimination
Duplicate elimination can be implemented via hashing or sorting.
On sorting, duplicates will come adjacent to each other, and all but one copy can be deleted
In external sort merge, duplicates can be deleted during run generation and at intermediate merge steps.
On hashing, duplicates will come into the same bucket
relation is partitioned on the basis of a hash function on the whole tuple
read in each partition, construct an in-memory hash index and insert a tuple only if it is not already present, otherwise, discard it.
Worst-case cost estimate is the same as that for sorting or hash join.
Because of the relatively high cost, SQL requires an explicit request to remove duplicates.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Other Operations :
Projection and Aggregation
Projection:
perform projection on each tuple
followed by duplicate elimination.
Aggregation can be implemented in a manner similar to duplicate elimination, but based on the grouping attributes.
Sorting or hashing can be used to bring tuples in the same group together, and then the aggregate functions can be applied on each group.
The cost estimate is the same as that of duplicate elimination.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Other Operations : Set Operations
Set operations (, and ): can either use variant of merge-join after sorting, or variant of hash-join.
E.g., Set operations using hashing:
Partition both relations using the same hash function
Process each partition i as follows.
Using a different hashing function, build an in-memory hash index on ri.
Process si as follows (next slide)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Other Operations : Set Operations (Cont.)
Process si as follows
r s:
Add tuples in si to the hash index if they are not already in it.
At end of si, add the tuples in the hash index to the result.
r s:
Output tuples in si to the result if they are already there in the hash index
r – s:
For each tuple in si, if it is there in the hash index, delete it from the index.
At end of si, add remaining tuples in the hash index to the result.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Evaluation of Expressions
So far: we have seen algorithms for individual operations
Alternatives for evaluating an entire expression
Materialization
evaluate one operation at a time in an appropriate order
materialize (store) the result of each evaluation in a temporary relation for subsequent use
Pipelining
evaluate several operations simultaneously in a pipeline
pass on the results of one operation to the next, without the need to store a temporary relation
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Materialization
Materialized evaluation: evaluate one operation at a time, starting at the lowest-level. Use intermediate results materialized into temporary relations to evaluate next-level operations.
E.g., the following operator tree computes and stores
then computes and stores its join with instructor, and finally computes the projection on name.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Materialization (Cont.)
Materialized evaluation is always applicable
Cost of writing results to disk and reading them back can be quite high
Our cost formulas for operations ignore cost of writing results to disk, so
Overall cost = Sum of costs of individual operations +
cost of writing intermediate results to disk
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Pipelining
Pipelined evaluation : evaluate several operations simultaneously, passing the results of one operation on to the next.
E.g., in previous operator tree, don’t store result of
instead, pass tuples directly to the join. Similarly, don’t store result of join, pass tuples directly to projection.
Much cheaper than materialization: no need to store a temporary relation to disk.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Pipelining (Cont.)
However, pipelining may not always be possible.
Some blocking operations, e.g., sort, may not be able to output any results until all tuples from their inputs have been examined.
Other operations, such as join, are not inherently blocking but specific algorithms may be blocking.
Hash-join requires both its inputs to be fully partitioned before it outputs any tuples.
Indexed nested-loop join can output result tuples as it gets tuples for the outer relation (pipelined on its outer relation).
Merge join can be pipelined if both inputs are sorted on the join attribute and the join condition is an equi-join.
For pipelining to be effective, use evaluation algorithms that generate output tuples even as tuples are received for inputs to the operation.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Blocking Operations
Blocking operations: cannot generate any output until all inputs are consumed
E.g. sorting, aggregation, …
But, some blocking operators can often consume inputs from a pipeline, or produce outputs to a pipeline
Such operations actually execute in stages and blocking actually happens between two stages of the operation.
For sort: run generation and merge
For hash join: partitioning and build-probe
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Pipeline Stages
Pipeline stages:
All operations in a stage run concurrently
A stage can start only after preceding stages have completed execution
Example: Hash-join is a blocking operation since it requires both its inputs to be fully partitioned before it outputs any tuples.
The partitioning step for each input can be pipelined with its input
The build-probe step can be pipelined with its output
The build-probe step can start only after partitioning has been completed on both inputs
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Optimization
Reminder:
Query Optimization: Amongst all equivalent evaluation plans, choose the one with lowest cost.
Annotated expression specifying detailed evaluation strategy is called an evaluation plan (or execution plan).
Alternative ways of evaluating a given query
Different algorithms for each operation
Equivalent expressions
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Optimization
Example: Find the names of all instructors in the Music department together with the course title of all the courses that the instructors teach.
instructor(ID, name, dept_name, salary)
teaches(ID, course_id, sec_id, semester, year)
course(course_id, title, dept_name, credits)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Optimization (Cont.)
An evaluation plan defines exactly what algorithm is used for each operation, and how the execution of the operations is coordinated.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Optimization (Cont.)
Cost difference between evaluation plans for a query can be enormous
E.g. seconds vs. days in some cases
Steps in cost-based query optimization
Generate logically equivalent expressions using equivalence rules
Annotate resultant expressions to get alternative query plans
Choose the cheapest plan based on estimated cost
Estimation of plan cost based on:
Statistical information about relations.
Examples: no. of tuples, index depths
Cost formulae for algorithms, computed using statistics
Statistics estimation for intermediate results
to compute cost of complex expressions
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Transformation of Relational Expressions
Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of tuples on every legal database instance
Note: order of tuples is irrelevant
An equivalence rule says that expressions of two forms are equivalent
Can replace expression of first form by second, or vice versa
However, the rules do not say that one is better than the other.
Notation:
: predicate
L : list of attributes
E : relation or relational-algebra expression
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Equivalence Rules
Conjunctive selection operations can be deconstructed into a sequence of individual selections.
σ1 2 (E) ≡ σ1 (σ2 (E))
Selection operations are commutative.
σ1(σ2(E)) ≡ σ2 (σ1(E))
Only the last in a sequence of projection operations is needed, the others can be omitted.
L1( L2(…( Ln(E))…)) ≡ L1(E)
where L1 ⊆ L2 … ⊆ Ln
Selections can be combined with Cartesian products and theta joins.
σ (E1 x E2) ≡ E1 ⨝ E2
σ 1 (E1 ⨝2 E2) ≡ E1 ⨝ 1∧2 E2
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
62
Equivalence Rules (Cont.)
Theta-join operations (and natural joins) are commutative.
E1 ⨝ E2 ≡ E2 ⨝ E1
6. (a) Natural join operations are associative:
(E1 ⨝ E2) ⨝ E3 ≡ E1 ⨝ (E2 ⨝ E3)
(b) Theta joins are associative in the following manner:
(E1 ⨝ 1 E2) ⨝ 2 3 E3 ≡ E1 ⨝1 3 (E2 ⨝ 2 E3)
where 2 involves attributes from only E2 and E3.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
63
Equivalence Rules (Cont.)
The selection operation distributes over the theta join operation under the following two conditions:
(a) When all the attributes in 0 involve only the attributes of one of the expressions (E1) being joined.
0 E1 ⨝ E2) ≡ (0(E1)) ⨝ E2
(b) When 1 involves only the attributes of E1 and 2 involves
only the attributes of E2.
1 2 E1 ⨝ E2) ≡ (1(E1)) ⨝ (2(E2))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
64
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation as follows:
(a) Let L1 and L2 be sets of attributes from E1 and E2, respectively, if involves only attributes from L1 L2:
L1 L2(E1 ⨝ E2) ≡ L1(E1) ⨝ L2(E2)
(b) Consider a join E1 ⨝ E2.
Let L1 and L2 be sets of attributes from E1 and E2, respectively.
Let L3 be attributes of E1 that are involved in join condition , but are not in L1 L2, and
let L4 be attributes of E2 that are involved in join condition , but are not in L1 L2.
L1 L2(E1 ⨝ E2) ≡ L1 L2( L1 L3(E1) ⨝ L2 L4(E2))
Similar equivalences hold for outerjoin operations: ⟕, ⟖, and ⟗
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Equivalence Rules (Cont.)
The set operations union and intersection are commutative
E1 E2 ≡ E2 E1
E1 E2 ≡ E2 E1
(set difference is not commutative).
10. Set union and intersection are associative.
(E1 E2) E3 ≡ E1 (E2 E3)
(E1 E2) E3 ≡ E1 (E2 E3)
The selection operation distributes over , and –.
a. (E1 E2) ≡ (E1) (E2)
b. (E1 E2) ≡ (E1) (E2)
c. (E1 – E2) ≡ (E1) – (E2)
d. (E1 E2) ≡ (E1) E2
e. (E1 – E2) ≡ (E1) – E2
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Equivalence Rules (Cont.)
12. The projection operation distributes over union
L(E1 E2) ≡ (L(E1)) (L(E2))
13. Selection distributes over aggregation as below
(G𝛾A(E)) ≡ G𝛾A((E))
provided only involves attributes in G
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Transformation Example: Pushing Selections
Query: Find the names of all instructors in the Music department, along with the titles of the courses that they teach
name, title(dept_name= “Music”
(instructor ⨝ (teaches ⨝ course_id, title (course))))
Transformation using rule 7a.
name, title((dept_name= “Music”(instructor)) ⨝
(teaches ⨝ course_id, title (course)))
Performing the selection as early as possible reduces the size of the relation to be joined.
instructor(ID, name, dept_name, salary)
teaches(ID, course_id, sec_id, semester, year)
course(course_id, title, dept_name, credits)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Example with Multiple Transformations
Query: Find the names of all instructors in the Music department who have taught a course in 2009, along with the titles of the courses that they taught
name, title(dept_name= “Music”year = 2009
(instructor ⨝ (teaches ⨝ course_id, title (course))))
Transformation using join associativity (Rule 6a):
name, title(dept_name= “Music”year = 2009
((instructor ⨝ teaches) ⨝ course_id, title (course)))
Second form provides an opportunity to apply the “perform selections early” rule (Rule 7b), resulting in the subexpression
dept_name = “Music” (instructor) ⨝ year = 2009 (teaches)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Multiple Transformations (Cont.)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Transformation Example: Pushing Projections
Consider: name, title((dept_name= “Music” (instructor) ⨝ teaches)
⨝ course_id, title (course))
When we compute
(dept_name = “Music” (instructor) ⨝ teaches)
we obtain a relation whose schema is:
(ID, name, dept_name, salary, course_id, sec_id, semester, year)
Push projections using equivalence rule 8b; eliminate unneeded attributes from intermediate results to get:
name, title((name, course_id
((dept_name= “Music” (instructor)) ⨝ teaches))
⨝ course_id, title (course))
Performing the projection as early as possible reduces the size of the relation to be joined.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Join Ordering Example
A good join ordering is important for reducing the size of temporary results.
For all relations r1, r2, and r3,
(r1 ⨝ r2) ⨝ r3 = r1 ⨝ (r2 ⨝ r3 )
(Join Associativity)(Rule 6a)
If r2 ⨝ r3 is quite large and r1 ⨝ r2 is small, we choose
(r1 ⨝ r2) ⨝ r3
so that we compute and store a smaller temporary relation.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Join Ordering Example (Cont.)
Consider the expression
name, title((dept_name= “Music” (instructor)) ⨝ teaches
⨝ course_id, title (course))
Could compute
teaches ⨝ course_id, title (course)
first but the result is likely to be a large relation because it contains one tuple for every course.
Only a small fraction of the university’s instructors are likely to be from the Music department, it is better to compute
dept_name= “Music” (instructor) ⨝ teaches
first.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Statistics Estimation
Reminder: Cost of each operator needs statistics of input relations
The database-system catalog stores the following statistical information about database relations.
nr : number of tuples in a relation r.
br : number of blocks containing tuples of r.
lr : size of a tuple of r.
fr : blocking factor of r, i.e., the number of tuples of r that fit into one block.
V(A, r) : number of distinct values that appear in r for attribute A; same as the size of A(r).
If tuples of r are stored together physically in a file, then:
Others: heights of B+-tree indices, no. of leaf pages in the indices
However, inputs can be results of sub-expressions
Need to estimate statistics of expression results
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Statistics Estimation (Cont.)
A query-evaluation plan that has the lowest estimated execution cost may NOT have the lowest actual execution cost because the estimates are based on assumptions that may not hold exactly.
However, real-world experience has shown that the plans with the lowest estimated costs usually have actual execution costs that are either the lowest, or are close to the lowest.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Selection Size Estimation
A=v(r)
Assume uniform distribution (though may not be correct)
nr / V(A,r) : number of records that will satisfy the selection
Equality condition on a key attribute: size estimate = 1
AV(r) (case of A V(r) is symmetric)
Let c denote the estimated number of tuples satisfying the condition.
If min(A,r) and max(A,r) are available in catalog
c = 0 if v < min(A,r)
c = nr if v max(A,r)
c = , otherwise
If the value v is not available, c is assumed to be nr / 2.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Size Estimation of Complex Selections
The selectivity of a condition i is the probability that a tuple in the relation r satisfies i .
If si is the number of satisfying tuples in r, the selectivity of i is given by si /nr.
Conjunction: 1 2. . . n (r). Assuming independence, estimate of tuples in the result is:
Disjunction:1 2 . . . n (r). Estimated number of tuples:
Negation: (r). Estimated number of tuples:
nr – size((r))
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
78
Estimation of the Size of Joins
The Cartesian product r x s contains nr ns tuples; each tuple occupies lr + ls bytes.
If R S = , then r ⨝ s is the same as r x s.
If R S is a key for R, then a tuple of s will join with at most one tuple from r
therefore, the number of tuples in r ⨝ s is no greater than the number of tuples in s.
If R S in S is a foreign key in S referencing R, then the number of tuples in r ⨝ s is exactly the same as the number of tuples in s.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Estimation of the Size of Joins (Cont.)
If R S = {A} is not a key for R or S.
If we assume that every tuple t in R produces tuples in R ⨝ S, the number of tuples in R ⨝ S is estimated to be:
If the reverse is true, the estimate obtained will be:
The lower of these two estimates is probably the more accurate one.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Join Operation Example
Running example:
student ⨝ takes
Catalog information for join examples:
nstudent = 5,000.
fstudent = 50, which implies that bstudent =5000/50 = 100.
ntakes = 10000.
ftakes = 25, which implies that btakes = 10000/25 = 400.
V(ID, student) = 5000 (primary key!)
V(ID, takes) = 2500, which implies that on average, each student who has taken a course has taken 4 courses.
Attribute ID in takes is a foreign key referencing student.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Join Operation Example (Cont.)
Example: student takes, ID in takes is a foreign key referencing student
hence, the result has exactly ntakes tuples, which is 10000
Example: compute the size estimates for student ⨝ takes without using information about foreign keys:
V(ID, takes) = 2500, and V(ID, student) = 5000
The two estimates are 5000 * 10000/2500 = 20,000 and 5000 * 10000/5000 = 10000
We choose the lower estimate, which in this case, is the same as our earlier computation using foreign keys.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Size Estimation for Other Operations
Projection: estimated size of A(r) = V(A,r), duplicates eliminated
Aggregation: estimated size of G𝛾F(r) = V(G,r), one tuple in G𝛾F(r) for each distinct value of G.
Set operations
For unions/intersections of selections on the same relation: rewrite and use size estimate for selections
E.g. 1 (r) 2 (r) can be rewritten as 12 (r)
For operations on different relations:
estimated size of r s = size of r + size of s.
estimated size of r s = minimum size of r and size of s.
estimated size of r – s = r.
All the three estimates may be quite inaccurate, but provide upper bounds on the sizes.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Estimation of Number of Distinct Values
Selections: (r)
If forces A to take a specified value:
V(A, (r)) = 1.
e.g., A = 3
If forces A to take on one of a specified set of values:
V(A, (r)) = number of specified values.
e.g., (A = 1 A = 3 A = 4)
If the selection condition is of the form A op v, where op is a comparison operator:
estimated V(A, (r)) = V(A, r) * s
where s is the selectivity of the selection.
In all the other cases: use approximate estimate of
min(V(A,r), n (r) )
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
84
Estimation of Distinct Values (Cont.)
Joins: r ⨝ s
If all attributes in A are from r
estimated V(A, r ⨝ s) = min (V(A, r), n r ⨝ s)
If A contains attributes A1 from r and A2 from s, then
estimated V(A, r ⨝ s) =
min(V(A1, r)*V(A2 – A1, s), V(A1 – A2, r)*V(A2, s), nr ⨝ s)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
85
Estimation of Distinct Values (Cont.)
Estimation of distinct values are straightforward for projections.
They are the same in A (r) as in r.
The same holds for grouping attributes of aggregation.
For aggregated values
For min(A) and max(A), the number of distinct values can be estimated as min(V(A,r), V(G,r)) where G denotes grouping attributes
For other aggregates, assume all values are distinct, and use V(G,r)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
86
Query Processing and Optimization
Overview
Measures of Query Cost
Selection Operation
Sort Operation
Join Operation
nested-loop, block nested-loop, indexed nested-loop, merge-join and hash join
Other Operations
Evaluation of Expressions
Transformation of Relational Expressions
Statistics Estimation
Choices of Evaluation Plans
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Choice of Evaluation Plans
Choosing the cheapest algorithm for each operation independently may not yield best overall algorithm.
Must consider the interaction of evaluation techniques when choosing evaluation plans
merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost for an outer level aggregation.
nested-loop join may provide opportunity for pipelining
Practical query optimizers incorporate elements of the following two broad approaches:
1. Search all the plans and choose the best plan in a cost-based fashion.
2. Uses heuristics to choose a plan (at the potential risk of not finding the optimal plan).
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Cost-Based Optimization
Consider finding the best join-order for r1 ⨝ r2 ⨝ . . . ⨝ rn.
There are (2(n – 1))!/(n – 1)! different join orders for above expression.
n = 3, the number is 12
n = 7, the number is 665280
n = 10, the number is greater than 176 billion!
No need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of {r1, r2, . . . rn} can be computed only once and stored for future use.
Example: Find the best join order of the form (r1 ⨝ r2 ⨝ r3) ⨝ r4 ⨝ r5
There are 12 different join orders for computing r1 ⨝ r2 ⨝ r3 and 12 orders for computing the join of this result with r4 and r5.
We first find the best join order for the subset of relations {r1, r2, r3}, we can use that order for further joins with r4 and r5, and ignore all more expensive join orders of r1 ⨝ r2 ⨝ r3.
Thus, only 12 + 12 choices, instead of 144 join orders are examined.
Read pseudo-code in book for details.
Cost-based optimization is expensive, but worthwhile for queries on large datasets (typical queries have small n, generally < 10)
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Heuristic Optimization
Cost-based optimization is expensive.
Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
Heuristic optimization transforms the query-tree by using a set of rules that typically (but not in all cases) improve execution performance:
Perform selection early (reduces the number of tuples)
Perform projection early (reduces the number of attributes)
Perform most restrictive selection and join operations (i.e. with smallest result size) before other similar operations.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Heuristic Optimization (Cont.)
Many optimizers consider only left-deep join orders (instead of all join orders).
Right-hand-side input for each join is a relation, not the result of an intermediate join
Convenient for pipelined evaluation since the right operand is a stored relation
Reduces optimization complexity
For n=3, there are 6 ways.
Some versions of Oracle, consider n evaluation plans in a left-deep join order for an n-way join
Starting from each of the n relations
Repeatedly pick “best” relation to join next on the basis of a ranking of the available access paths
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Heuristic Optimization (Cont.)
Search for optimal plan is terminated when the optimization cost budget is exceeded and the best plan found so far is returned
Optimizers usually first apply cheap heuristics to find a plan
Then, start a full cost-based optimization with a budget based on the heuristically chosen plan
Plan caching reuses previous optimal query plan if query is resubmitted (with different constants in query).
Example: a query to find the courses for which a student has registered in a university application.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
Concluding Remarks
Real-life observations
The difference in execution time between a good plan and a bad one may be huge.
The added cost of cost-based query optimization is usually more than offset by the saving at query-execution time, which is dominated by slow disk accesses.
The achieved saving is magnified in applications that run on a regular basis, where a query can be optimized once, and the selected query plan can be used many times.
©Silberschatz, Korth and Sudarshan
‹#›
Database System Concepts
)
(
"
Watson
"
department
building
=
s
ú
ú
ú
ú
ù
ê
ê
ê
ê
é
=
r
f
r
n
r
b
)
,
min(
)
,
max(
)
,
min(
.
r
A
r
A
r
A
v
n
r
-
-
n
r
n
r n
sss
n
∗∗∗
∗
. . . 21
n
r
n
r
n
sss
n
***
*
. . .
21
⎟⎟
⎠
⎞
⎜⎜
⎝
⎛
−∗∗−∗−−∗ )1(...)1()1(1 21
r
n
rr
r n
s
n
s
n
s
n
÷
÷
ø
ö
ç
ç
è
æ
-**-*--* )1(...)1()1(1
21
r
n
rr
r
n
s
n
s
n
s
n
)
,
(
s
A
V
n
n
s
r
*
)
,
(
r
A
V
n
n
s
r
*
/docProps/thumbnail.jpeg