Query Optimization 2
1.3
Once a parse tree has been constructed for a query it is converted to a logical query plan
▪ A logical query plan consists of relational algebra operators and relations
▪ Nodes and components of the parse tree are replaced by relational algebra operators
The relational algebra plan is then modified ▪ To an expression that is expected to result in an
efficient physical query plan
John Edgar 3
A set of rules allow parse trees to be transformed into
relational algebra
For our simplified SQL subset
▪ Replace a
The relational algebra expression consists of
▪ The product of all the relations in the
(c (r1 r2 r3))
▪ A projection L where L consists of the attributes in the
r1 r2 r3
▪ A selection c where C is the
an argument to argument to
a1,a2 (c (r1 r2 r3))
John Edgar
4
SELECT ownerAcc, amount
FROM Transaction, Account
WHERE balance = 1000000 AND trans_accID = accID
John Edgar 5
SELECT
WHERE
SELECT ownerAcc, amount
trans_accID
FROM Transaction, Account
AND
WHERE balance = 1000000 AND trans_accID = accID
amount
Account
1000000
John Edgar
6
SELECT
WHERE
,
,
AND
1000000
John Edgar
7
SELECT
WHERE
Account
Account
Transaction
,
ownerAcc, amount
trans_accID
amount
Transaction
AND
balance = 1000000 AND trans_accID= accID
1000000
John Edgar
8
ownerAcc, amount
balance = 1000000 AND trans_accID= accID
Transaction
Account
John Edgar
9
Some parse trees include a
▪ Sub-queries add complexity to the translation Sub-queries are replaced by a selection and
other relational algebra operators
▪ Different types of sub-query require different rules to replace them
▪ IN, EXISTS, ANY, ALL, …
John Edgar 10
Consider sub-queries of the form t IN S
▪ Where t is a tuple made up of some attributes of R ▪ And S is a sub-query
Sub-queries with IN are usually uncorrelated ▪ They can be replaced by the expression tree for S
▪ If S might contain duplicates they are removed ()
▪ A selection where the condition equates t to the
corresponding attribute of S, and ▪ The Cartesian product of R and S
John Edgar 11
SELECT trans_accID, amount
FROM Transaction
WHERE trans_accID IN(
SELECT accID
FROM Account
WHERE balance = 1000000)
John Edgar 12
SELECT
trans_accID
WHERE
trans_accID
SELECT
accID
FROM
Account
WHERE
= 1000000
John Edgar
13
trans_accID, amount
Transaction
trans_accID
IN accID balance = 1000000
Account
Replace IN with the product of the two relations and an equality selection comparing the attributes
John Edgar
14
trans_accID, amount trans_accID= accID
Transaction
accID balance = 1000000 Account
John Edgar
15
Replace IN with the product of the two relations and an equality selection comparing the attributes
A correlated sub-query contains a reference to the outer query in the sub-query
▪ The sub-query cannot be translated in isolation ▪ It must be processed once for each outer query row
▪ The sub-query is usually replaced with a query that joins the sub-query and outer query relations
▪ The process is otherwise similar to that of uncorrelated queries
SELECT msp, email FROM Patient P WHERE EXISTS (
SELECT * FROM Operation O WHERE P.msp = O.msp AND … )
John Edgar 16
Once an expression tree has been created the plan can be rewritten
▪ Using the algebraic laws Next section …
▪ The initial plan could differ based on the SQL to
relational algebra conversion
▪ This will not be considered except for the issues relating to the order of joins
There are a number of transformations that commonly improve plans
John Edgar 17
Selections are pushed down as far as possible ▪ Selections with AND clauses can be split and the
components pushed down the tree
▪ This may reduce the size of intermediate relations
Projections should also be pushed down
▪ Additional projections may be added
Duplicate eliminations may be moved
Selections can be combined with Cartesian
products to create equijoins
John Edgar 18
It may be possible to group a sub-tree into a single node
▪ If it consists of nodes with the same associative and commutative operators
▪ Group the nodes into a single node with multiple children
Then consider which order to perform the operation in later
⋈
⋈⋈
VWX R
ST
⋈
VWX
RST
John Edgar
19
If an operator is commutative the order of its arguments do not matter
▪ e.g. + (x + y = y + x), but not – (x – y ≠ y – x)
If an operator is associative then two uses of it may
be grouped from the left or the right
▪ e.g. + (x + y) + z = x + (y + z)
If an operator is associative and commutative its
operands may be grouped and ordered in any way
John Edgar 21
Set
A
123
B
234
C
345
BC
2345
A (B C)
23
AB
23
AC
3
(A B) (A C)
23
SQL queries result in bags, not sets
▪ A bag may contain duplicates but sets cannot
▪ Some set-theoretic laws apply to sets but not to bags
The distributive law of intersection over union
▪ A (B C) (A B) (A C) ▪ Does not apply to bags
Bag
A
123
B
234
C
345
BC
233445
A (B C)
23
AB
23
AC
3
(A B) (A C)
233
John Edgar 22
Unions are both commutative and associative
▪RSSR
▪ R (S T) (R S) T
order does not matter
Intersections are both commutative and associative
▪RSSR
▪ R (S T) (R S) T
order does not matter
Set difference is neither commutative nor associative ▪R−SS−R
▪ R − (S − T) (R − S) − T John Edgar
order does matter
23
Cartesian product and joins are commutative ▪ e.g. R ⋈ S S ⋈ R
Cartesian products and joins are associative ▪ e.g. R (S T) (R S) T
Relations may therefore be joined in any order
John Edgar 24
A selection and Cartesian product can be combined to form a join
▪c(RS)R⋈c S
▪ e.g.P.msp=O.msp(PatientOperation)Patient⋈Operation
This may have an impact on the cost of a query ▪ Some join algorithms are much more efficient than
computing a Cartesian product
John Edgar 25
The order in which joins and Cartesian products are made affects the size of intermediate relations
▪ Which, in turn, affects the time taken to process a query Consider these three relations:
▪ Customer = {sin, fn, ln, age} – 1,000 records
▪ Account = {acc, type, balance} – 1,200 records
▪ Owns = {sinfkCustomer, accfkAccount} – 1,400 records
Owns ⋈ (Customer ⋈ Account)
▪ Intermediate relation – 1,000 * 1,200 = 1,200,000 records
(Owns ⋈ Customer) ⋈ Account
▪ Intermediate relation – 1,400 records
John Edgar 26
Pushing selections down the query plan tree
reduces the size of intermediate relations
Conjunctions can be split into a cascading selection
▪ c1 c2 … cn(R) c1(c2( … (cn(R))))
▪ dob<1970 name="Abe"(Patient) dob<1970 (name="Abe"(Patient))
Selections are commutative ▪ c1(c2(R)) c2(c1(R))
▪ dob<1970(name="Abe"(Patient)) name="Abe" ( dob<1970(Patient)) Disjunctive selections can be replaced by unions
▪ c1 c2(R) c1(R) c2(R) John Edgar
But only if R is a set – not a bag
27
A selection can be pushed through a union, and must be applied to both arguments
▪ c(R S) c(S) c(R)
A selection can be pushed through an intersection,
and need only be applied to one argument
▪ c(R S) c(S) (R)
A selection can be pushed through a set difference,
and must be applied to the first argument ▪ c(R − S) c(R) − (S)
John Edgar 28
A selection can be pushed through a Cartesian product, and is only required in one argument
▪ c(R S) c(R) S
▪ If the selection involves attributes of only one relation
This relationship can be stated more generally
▪ Replace c with: cRS (with attributes of both R and S), cR
(with attributes just of R) and cS (with attributes just of S): ▪ c(R S) cRS(cR(R) cS(S))
▪ dob<1970 P.msp=O.msp desc="lobotomy"(Patient Operation) P.msp=O.msp( dob<1970(Patient) desc="lobotomy"(Operation))
John Edgar 29
dob<197o P.msp=O.msp desc="lobotomy"(Patient Operation)
Pushing selections as far down as possible
dob<197o P.msp=O.msp desc="lobotomy"
Patient
Operation
dob<197o Patient
description = "lobotomy" Operation
result
⋈
John Edgar
30
Only the final projection in a series of projections is required
▪ a1 (R) a1(a2((an(R)))) ▪ where ai ai+1
For example:
▪ city(Patient) city(city,fName(city,fName,lName(Patient)))
John Edgar 31
Projections can be pushed through unions, and must be applied to both arguments
▪ a(R S) a(R) a(S)
Projections can not be pushed through intersections
or set difference
▪ a(R S) a(R) a(S)
Imagine both tables have sin as primary key ▪ lname(PatientDoctor)lname(Patient)lname(Doctor)
▪ a(R − S) a(R) − a(S)
▪ lname(Patient−Doctor)lname(Patient)−lname(Doctor)
Last names of patients who are not doctors
Patient last names that are not the last names of doctors
John Edgar 32
Projections can be pushed through Cartesian products
▪ a(R S) aR(R) aS(S)
▪ Let the attribute list a be made up of aR (attributes of R),
and aS (attributes of S)
▪ e.g.P.msp,fName,lName,description,O.msp(PatientOperation) msp,fName,lname(Patient) description,msp(Operation)
▪ In this example a selection could then be made to extract patient and operations records that relate to each other
John Edgar 33
Projections can be pushed through joins
▪ If the join condition attributes are all in the projection ▪ e.g.msp,dob,description(Patient⋈Operation)
▪ msp,dob(Patient)⋈msp,description(Operation)
More generally
▪ Let aR contains the attributes of R that appear in c or a,
and aS contains the attributes of S that appear in c or a: ▪ a(R ⋈c S) a(aR(R) ⋈c aS(S))
▪ e.g.fName,lName,acc(Acccount⋈C.sin=A.sinbalance>incomeCustomer)
▪ fName,lName,acc(acc,balance,sin(Account)⋈C.sin=A.sinbalance>income
sin,fName,lName,income(Customer))
John Edgar 34
Selections and Projections are commutative if the selection only involves attributes of the projection
▪ a(c(R)) c(a(R))
▪ e.g. msp,fName,lName(dob > 1970(Patient))
▪ is not equivalent to
▪ dob > 1970(msp,fName,lName(Patient))
In other words, don’t project out attributes that are required for downstream selections!
John Edgar 35
Duplicate removal may be pushed through several operators
▪ Selections, Cartesian products and joins
Duplicate removal can be moved to either or
both the arguments of an intersection
But cannot generally be pushed through
unions, set difference or projections
John Edgar 36
There are a number of transformations that may be applied to queries with aggregates
Some of the transformations depend on the aggregation
▪ The projection of attributes not included in the aggregation may be pushed down the tree
▪ Duplicate removal may be pushed through MIN and MAX, but not SUM, or COUNT, or AVG
John Edgar 37