Query Optimisation
Query Optimisation
In practice, query optimisers incorporate elements of the following three
optimisation approaches:
Semantic query optimisation
Use application specific semantic knowledge to transform a query into
the one with a lower cost (they return the same answer).
Rule-based query optimisation
Use heuristic rules to transform a relational algebra expression into an
equivalent one with a possibly lower cost.
Cost-based query optimisation
Use a cost model to estimate the costs of plans, and then select the
most cost-effective plan.
Semantic Query Optimisation
Can we use semantic information stored in a database (such as integrity
constraints) to optimise queries?
semantics: “meaning”.
Recall that, integrity constraints in the relational model include:
key constraints
entity integrity constraints
referential integrity constraints
domain constraints
. . .
user-defined integrity constraints
Key idea: Integrity constraints may not only be utilized to enforce
consistency of a database, but may also optimise user queries.
Semantic Query Optimisation
Example 1:
Constraint: The relation Employee has the primary key {ssn}.
Query: SELECT DISTINCT ssn FROM Employee;
We can avoid extra costs for duplicate elimination if the existing
constraint tells us that tuples in the result will be unique.
Semantic Query Optimisation
Example 2:
Constraint: No employee can earn more than 200000.
Query: SELECT name
FROM Employee
WHERE salary > 300000;
We do not need to execute a query if the existing constraint tells us
that the result will be empty.
Semantic Query Optimisation
Example 3:
Constraints: The relation WORKS ON has the foreign keys:
[ssn]⊆EMPLOYEE[ssn] and [pno]⊆PROJECT[pnumber]
Query: SELECT DISTINCT ssn
FROM Works on INNER JOIN Project
on Works on.pno=Project.pnumber;
We can reduce the number of joins by executing the following query
since both queries always return the same result.
SELECT DISTINCT ssn
FROM Works on ;
Rule-based Query Optimisation
A rule-based optimisation transforms the RA expression by using a set of
heuristic rules that typically improve the execution performance.
Key ideas: apply the most restrictive operation before other operations,
which can reduce the size of intermediate results:
Push-down selection:
Apply as early as possible to reduce the number of tuples;
Push-down projection:
Apply as early as possible to reduce the number of attributes.
Re-ordering joins:
Apply restrictive joins first to reduce the size of the result.
But we must ensure that the resulting query tree gives the same result as
the original query tree, i.e., the equivalence of RA expressions.
Heuristic Rules
Staff(sid, fname, lname, salary, position, branchNo)
Branch(branchNo, name, street, suburb, city)
There are many heuristic rules for transforming RA expressions, utilized
by the query optimiser, such as:
(1) σϕ(σψ(R)) ≡ σϕ∧ψ(R);
σbranchNo=′1′(σsalary>60000(Staff )) = σbranchNo=′1′∧salary>60000(Staff )
(2) πX (πY (R)) ≡ πX (R) if X ⊆ Y ;
πsalary (πbranchNo,salary (Staff )) = πsalary (Staff )
(3) σϕ(R1 × R2) ≡ R1 ./ϕ R2
σStaff .branchNo=Branch.branchNo(Staff × Branch) =
(Staff ) ./Staff .branchNo=Branch.branchNo (Branch)
Heuristic Rules
Staff(sid, fname, lname, salary, position, branchNo)
Branch(branchNo, name, street, suburb, city)
(4) σϕ1(R1 ./ϕ2 R2) ≡ R2 ./ϕ1∧ϕ2 R1
σsalary>60000(Staff ./Staff .branchNo=Branch.branchNo (Branch)) =
(Staff ) ./Staff .branchNo=Branch.branchNo∧salary>60000 (Branch)
(5) σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1
σsalary>60000(Staff ./ Branch) = σsalary>60000(Staff ) ./ Branch
(6) σϕ1∧ϕ2(R1 ./ R2) ≡ σϕ1(R1) ./ σϕ2(R2) if ϕ1 contains only
attributes in R1 and ϕ2 contains only attributes in R2.
σsalary>60000∧city=′Canberra′(Staff ./ Branch) =
(σsalary>60000(Staff )) ./ (σcity=′Canberra′(Branch))
Heuristic Rules
Staff(sid, fname, lname, salary, position, branchNo)
Branch(branchNo, name, street, suburb, city)
(7) If the join condition involves only attributes in X , we have
πX (R1 ./ R2) ≡ πX1(R1) ./ πX2(R2), where Xi contains attributes in
both R1 and R2, and ones in both Ri and X , and
πbranchNo,position,city (Staff ./ Branch) =
πbranchNo,position(Staff ) ./ (πbranchNo,city (Branch))
(8) If the join condition contains attributes not in X , we have
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)), where Xi contains attributes in
both in R1 and R2, and ones in both Ri and X
πposition,city (Staff ./ Branch) =
πposition,city (πbranchNo,position(Staff ) ./ (πbranchNo,city (Branch)))
……
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
DIRECTOR(id, title, production year)
MOVIE AWARD(title, production year, award name, year of award)
Query: List the first and last names of the directors who have directed
a movie that has won an ’Oscar’ movie award
πfirst name,last name(σaward name=‘Oscar ’((PERSON ./ DIRECTOR) ./
MOVIE AWARD))
Question: Can we apply the following rule to optimise the query?
σϕ(R1 ./ R2) ≡ σϕ(R1) ./ R2, if ϕ contains only attributes in R1
Push-down Selection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
DIRECTOR(id, title, production year)
MOVIE AWARD(title, production year, award name, year of award)
Query: List the first and last names of the directors who have directed
a movie that has won an ’Oscar’ movie award
πfirst name,last name(σaward name=‘Oscar ’((PERSON ./ DIRECTOR) ./
MOVIE AWARD))
We would have:
πfirst name,last name((PERSON ./ DIRECTOR) ./ σaward name=‘Oscar ’(MOVIE AWARD))
Push-down Projection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
DIRECTOR(id, title, production year)
MOVIE AWARD(title, production year, award name, year of award)
Query: List the first and last names of the directors who have directed
a movie that has won an ’Oscar’ movie award
πfirst name,last name((PERSON ./ DIRECTOR) ./ σaward name=‘Oscar ’(MOVIE AWARD))
Question: Can we apply the following rule to optimise the query?
πX (R1 ./ R2) ≡ πX (πX1(R1) ./ πX2(R2)),
where Xi contains attributes in both in R1 and R2, and ones in both Ri and X
Push-down Projection – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
DIRECTOR(id, title, production year)
MOVIE AWARD(title, production year, award name, year of award)
Query: List the first and last names of the directors who have directed
a movie that has won an ’Oscar’ movie award
πfirst name,last name((PERSON ./ DIRECTOR) ./ σaward name=‘Oscar ’(MOVIE AWARD))
we would have:
πfirst name,last name(πfirst name,last name,title,production year (PERSON ./
DIRECTOR) ./ πtitle,production year (σaward name=‘Oscar ’(MOVIE AWARD)))
A Common Query Pattern (Be Careful)
A common query pattern is join-select-project involving three steps:
(1) join all the relevant relations,
(2) select the desired tuples, and
(3) project on the required attributes.
This query pattern can be expressed as an RA expression
πA1,…,An (σϕ(R1 × · · · × Rk )),
or as an equivalent SQL statement
SELECT DISTINCT A1, . . . ,An FROM R1, . . . ,Rk WHERE ϕ;
Queries falling into this pattern can be very inefficient, which may yield
huge intermediate result for the joined relations.
A Common Query Pattern (Be Careful)
push-down selection and push-down projection.
πA1,…,An(σϕ(R1 × · · · × Rk )),
Re-ordering Joins – Example
Given the relation schemas:
PERSON(id, first name, last name, year born)
Suppose that it has 10000 tuples.
DIRECTOR(id, title, production year) with
[title, production year ] ⊆ MOVIE AWARD[title, production year ];
[id ] ⊆ PERSON[id ] and
Suppose that it has 100 tuples.
MOVIE AWARD(title, production year, award name, year of award)
Suppose that it has 1000 tuples.
Example: Consider the following two RA queries. Which one is better?
PERSON ./ MOVIE AWARD ./ DIRECTOR
PERSON ./ DIRECTOR ./ MOVIE AWARD
Cost-based Query Optimisation
A query optimiser does not depend solely on heuristic optimisation. It
estimates and compares the costs of different plans.
It estimates and compares the costs of executing a query using different
execution strategies and chooses one with the lowest cost estimate.
The query optimiser needs to limit the number of execution strategies to
be considered for improving efficiency.
Summary
In general, there are many ways of executing a query in a database.
The user expects the result to be returned promptly, i.e., the query should
be processed as fast as possible.
But, the burden of optimising queries should not be put on the user’s
shoulder. The DBMSs need to do the job!
Nonetheless, SQL is not a suitable query language in which queries can be
optimised automatically.
Instead, SQL queries are transformed into their corresponding RA
queries and optimised subsequently.
A major advantage of relational algebra is to make alternative forms of a
query easy to explore.
Query Optimisation