程序代写代做代考 algorithm C AI Query Optimization 2

Query Optimization 2

1.3

 Once a parse tree has been constructed for a query it is converted to a logical query plan
▪ A logical query plan consists of relational algebra operators and relations
▪ Nodes and components of the parse tree are replaced by relational algebra operators
 The relational algebra plan is then modified ▪ To an expression that is expected to result in an
efficient physical query plan
John Edgar 3

 A set of rules allow parse trees to be transformed into
relational algebra
For our simplified SQL subset
▪ Replace a with a but no sub-queries by a relational algebra expression
 The relational algebra expression consists of
▪ The product of all the relations in the , which is
(c (r1 r2 r3))
▪ A projection L where L consists of the attributes in the
r1 r2 r3
▪ A selection c where C is the , which is an
an argument to argument to

a1,a2 (c (r1  r2  r3))
John Edgar
4

SELECT ownerAcc, amount
FROM Transaction, Account
WHERE balance = 1000000 AND trans_accID = accID
John Edgar 5

SELECT FROM ,
,
WHERE


SELECT ownerAcc, amount
trans_accID Transaction

FROM Transaction, Account
AND
WHERE balance = 1000000 AND trans_accID = accID
amount
Account


= balance
1000000
= trans_accID
accID
John Edgar
6


SELECT FROM

WHERE

trans_accID
,
amount

Transaction
,
Account

AND
= balance
1000000
= trans_accID
accID
John Edgar
7


SELECT FROM ,

WHERE

Account
Account




Transaction
,

ownerAcc, amount
trans_accID

amount
Transaction
AND
balance = 1000000 AND trans_accID= accID
= balance
1000000
= trans_accID
accID
John Edgar
8

ownerAcc, amount
balance = 1000000 AND trans_accID= accID

Transaction
Account
John Edgar
9

 Some parse trees include a with a sub-query
▪ Sub-queries add complexity to the translation  Sub-queries are replaced by a selection and
other relational algebra operators
▪ Different types of sub-query require different rules to replace them
▪ IN, EXISTS, ANY, ALL, …
John Edgar 10

 Consider sub-queries of the form t IN S
▪ Where t is a tuple made up of some attributes of R ▪ And S is a sub-query
 Sub-queries with IN are usually uncorrelated ▪ They can be replaced by the expression tree for S
▪ If S might contain duplicates they are removed ()
▪ A selection where the condition equates t to the
corresponding attribute of S, and ▪ The Cartesian product of R and S
John Edgar 11

SELECT trans_accID, amount
FROM Transaction
WHERE trans_accID IN(
SELECT accID
FROM Account
WHERE balance = 1000000)
John Edgar 12

SELECT FROM ,
trans_accID amount
Transaction
WHERE IN
trans_accID

SELECT

accID
FROM
Account
WHERE
balance

= 1000000

John Edgar
13

trans_accID, amount 
Transaction


trans_accID
IN accID balance = 1000000
Account
Replace IN with the product of the two relations and an equality selection comparing the attributes
John Edgar
14

trans_accID, amount trans_accID= accID

Transaction
accID balance = 1000000 Account
John Edgar
15
Replace IN with the product of the two relations and an equality selection comparing the attributes

 A correlated sub-query contains a reference to the outer query in the sub-query
▪ The sub-query cannot be translated in isolation ▪ It must be processed once for each outer query row
▪ The sub-query is usually replaced with a query that joins the sub-query and outer query relations
▪ The process is otherwise similar to that of uncorrelated queries
SELECT msp, email FROM Patient P WHERE EXISTS (
SELECT * FROM Operation O WHERE P.msp = O.msp AND … )
John Edgar 16

 Once an expression tree has been created the plan can be rewritten
▪ Using the algebraic laws Next section …
▪ The initial plan could differ based on the SQL to
relational algebra conversion
▪ This will not be considered except for the issues relating to the order of joins
 There are a number of transformations that commonly improve plans
John Edgar 17

 Selections are pushed down as far as possible ▪ Selections with AND clauses can be split and the
components pushed down the tree
▪ This may reduce the size of intermediate relations
 Projections should also be pushed down
▪ Additional projections may be added
 Duplicate eliminations may be moved
 Selections can be combined with Cartesian
products to create equijoins
John Edgar 18

 It may be possible to group a sub-tree into a single node
▪ If it consists of nodes with the same associative and commutative operators
▪ Group the nodes into a single node with multiple children
 Then consider which order to perform the operation in later

⋈⋈
VWX R
ST

VWX
RST
John Edgar
19

 If an operator is commutative the order of its arguments do not matter
▪ e.g. + (x + y = y + x), but not – (x – y ≠ y – x)
 If an operator is associative then two uses of it may
be grouped from the left or the right
▪ e.g. + (x + y) + z = x + (y + z)
 If an operator is associative and commutative its
operands may be grouped and ordered in any way
John Edgar 21

Set
A
123
B
234
C
345
BC
2345
A  (B  C)
23
AB
23
AC
3
(A  B)  (A  C)
23
 SQL queries result in bags, not sets
▪ A bag may contain duplicates but sets cannot
▪ Some set-theoretic laws apply to sets but not to bags
 The distributive law of intersection over union
▪ A  (B  C)  (A  B)  (A  C) ▪ Does not apply to bags
Bag
A
123
B
234
C
345
BC
233445
A  (B  C)
23
AB
23
AC
3
(A  B)  (A  C)
233
John Edgar 22

 Unions are both commutative and associative
▪RSSR
▪ R  (S  T)  (R  S)  T
order does not matter
 Intersections are both commutative and associative
▪RSSR
▪ R  (S  T)  (R  S)  T
order does not matter
 Set difference is neither commutative nor associative ▪R−SS−R
▪ R − (S − T)  (R − S) − T John Edgar
order does matter
23

 Cartesian product and joins are commutative ▪ e.g. R ⋈ S  S ⋈ R
 Cartesian products and joins are associative ▪ e.g. R  (S  T)  (R  S)  T
 Relations may therefore be joined in any order
John Edgar 24

 A selection and Cartesian product can be combined to form a join
▪c(RS)R⋈c S
▪ e.g.P.msp=O.msp(PatientOperation)Patient⋈Operation
 This may have an impact on the cost of a query ▪ Some join algorithms are much more efficient than
computing a Cartesian product
John Edgar 25

 The order in which joins and Cartesian products are made affects the size of intermediate relations
▪ Which, in turn, affects the time taken to process a query  Consider these three relations:
▪ Customer = {sin, fn, ln, age} – 1,000 records
▪ Account = {acc, type, balance} – 1,200 records
▪ Owns = {sinfkCustomer, accfkAccount} – 1,400 records
 Owns ⋈ (Customer ⋈ Account)
▪ Intermediate relation – 1,000 * 1,200 = 1,200,000 records
 (Owns ⋈ Customer) ⋈ Account
▪ Intermediate relation – 1,400 records
John Edgar 26

 Pushing selections down the query plan tree
reduces the size of intermediate relations
 Conjunctions can be split into a cascading selection
▪ c1  c2  …  cn(R)  c1(c2( … (cn(R))))
▪ dob<1970  name="Abe"(Patient)   dob<1970 (name="Abe"(Patient))  Selections are commutative ▪ c1(c2(R))  c2(c1(R)) ▪  dob<1970(name="Abe"(Patient))  name="Abe" ( dob<1970(Patient))  Disjunctive selections can be replaced by unions ▪ c1  c2(R)  c1(R)  c2(R) John Edgar But only if R is a set – not a bag 27  A selection can be pushed through a union, and must be applied to both arguments ▪ c(R  S)  c(S)  c(R)  A selection can be pushed through an intersection, and need only be applied to one argument ▪ c(R  S)  c(S)  (R)  A selection can be pushed through a set difference, and must be applied to the first argument ▪ c(R − S)  c(R) − (S) John Edgar 28  A selection can be pushed through a Cartesian product, and is only required in one argument ▪ c(R  S)  c(R)  S ▪ If the selection involves attributes of only one relation  This relationship can be stated more generally ▪ Replace c with: cRS (with attributes of both R and S), cR (with attributes just of R) and cS (with attributes just of S): ▪ c(R  S)  cRS(cR(R)  cS(S)) ▪  dob<1970  P.msp=O.msp  desc="lobotomy"(Patient  Operation)  P.msp=O.msp( dob<1970(Patient)  desc="lobotomy"(Operation)) John Edgar 29 dob<197o  P.msp=O.msp  desc="lobotomy"(Patient  Operation) Pushing selections as far down as possible dob<197o  P.msp=O.msp  desc="lobotomy"  Patient Operation dob<197o Patient description = "lobotomy" Operation result ⋈ John Edgar 30  Only the final projection in a series of projections is required ▪ a1 (R)  a1(a2((an(R)))) ▪ where ai  ai+1  For example: ▪ city(Patient)  city(city,fName(city,fName,lName(Patient))) John Edgar 31  Projections can be pushed through unions, and must be applied to both arguments ▪ a(R  S)  a(R)  a(S)  Projections can not be pushed through intersections or set difference ▪ a(R  S)  a(R)  a(S) Imagine both tables have sin as primary key ▪ lname(PatientDoctor)lname(Patient)lname(Doctor) ▪ a(R − S)  a(R) − a(S) ▪ lname(Patient−Doctor)lname(Patient)−lname(Doctor) Last names of patients who are not doctors Patient last names that are not the last names of doctors John Edgar 32  Projections can be pushed through Cartesian products ▪ a(R  S)  aR(R)  aS(S) ▪ Let the attribute list a be made up of aR (attributes of R), and aS (attributes of S) ▪ e.g.P.msp,fName,lName,description,O.msp(PatientOperation) msp,fName,lname(Patient)  description,msp(Operation) ▪ In this example a selection could then be made to extract patient and operations records that relate to each other John Edgar 33  Projections can be pushed through joins ▪ If the join condition attributes are all in the projection ▪ e.g.msp,dob,description(Patient⋈Operation) ▪ msp,dob(Patient)⋈msp,description(Operation)  More generally ▪ Let aR contains the attributes of R that appear in c or a, and aS contains the attributes of S that appear in c or a: ▪ a(R ⋈c S)  a(aR(R) ⋈c aS(S)) ▪ e.g.fName,lName,acc(Acccount⋈C.sin=A.sinbalance>incomeCustomer)
▪ fName,lName,acc(acc,balance,sin(Account)⋈C.sin=A.sinbalance>income
sin,fName,lName,income(Customer))
John Edgar 34

 Selections and Projections are commutative if the selection only involves attributes of the projection
▪ a(c(R))  c(a(R))
▪ e.g. msp,fName,lName(dob > 1970(Patient))
▪ is not equivalent to
▪ dob > 1970(msp,fName,lName(Patient))
 In other words, don’t project out attributes that are required for downstream selections!
John Edgar 35

 Duplicate removal may be pushed through several operators
▪ Selections, Cartesian products and joins
 Duplicate removal can be moved to either or
both the arguments of an intersection
 But cannot generally be pushed through
unions, set difference or projections
John Edgar 36

 There are a number of transformations that may be applied to queries with aggregates
 Some of the transformations depend on the aggregation
▪ The projection of attributes not included in the aggregation may be pushed down the tree
▪ Duplicate removal may be pushed through MIN and MAX, but not SUM, or COUNT, or AVG
John Edgar 37