File Structures
HEURISTIC QUERY OPTIMIZATION
Database Theory & Applications (M)
Dr Chris Anagnostopoulos
ROADMAP
Transformation of the canonical Tree to the optimal Tree
Re-writing equivalent trees
Heuristic inferential rules
Examples
Selection-Projection-Join Queries
Aggregation Analytics Queries
2
HEURISTIC QUERY OPTIMIZATION
Input: SQL query is issued by the user.
Step 1: Parser creates a naïve (canonical) tree T of query.
Step 2: Optimizer applies inferential rules to transform tree T to an
equivalent and efficient tree T*
i.e., T* produces exactly the same results as T after re-arranging
algebra operators.
Step 3: Generate code for optimal tree T*…within 2-2.5 ms
Step 4: Execute the operators with the order dictated in T*
Step 5: Return/show the results to the user.
3
CANONICAL TREE FROM PARSER
SELECT P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate
FROM PROJECT P, DEPARTMENT D, EMPLOYEE E
WHERE P.Dnum=D.Dnumber AND D.mgr_ssn=E.ssn AND P.Plocation=‘Stafford’
4
TREE TRANSFORMATION
Canonical Tree A: simplistic, very expensive, involving Cartesian
products, having intermediate results whose size is huge!
Optimizer transforms Tree A to an equivalent & efficient Tree B
B
A
5
HEURISTIC RULES
Rule 1: Cascade of Selection : σc1 AND c2 AND c3 AND c4 (R) = σc1(σc2(σc3(σc4 (R))))
Split a conjunction of selections into nested individual selections.
SELECT *
FROM EMPLOYEE
WHERE DNO=5 AND Salary = 10000 AND Age = 45
σDNO=5 AND Age=45 AND Salary = 10000(EMPLOYEE)
equivalent with:
σDNO=5( σAge=45 (σSalary = 10000(EMPLOYEE)))
6
Rule 2: Commutativity of Selection: σc1 (σc2 (R))) = σc2 (σc1 (R)))
Reorder nested selections; the result is the same.
σDNO=5( σAge=45 (σSalary = 10000(EMPLOYEE)))
equivalent with:
σSalary=10000(σAge=45 (σDNO=5(EMPLOYEE)))
…
7
HEURISTIC RULES
HEURISTIC RULES
Rule 3: Cascade of Projection: πlist1 (πlist2 (πlist3 … (πlistn(R)…) = πlist1(R)
Retrieve only the required columns; reduce dimensions.
πSSN(πSSN, NAME(EMPLOYEE))
equivalent with:
πSSN(EMPLOYEE)
given than all nested (inner) lists contain the attributes of the outer list.
8
HEURISTIC RULES
Rule 4: Commuting σ with π: σc (πx (R))) = πx (σc (R)))
First reduce size (select) and then reduce dimensions (project)
πDNO(σSalary = 10000(EMPLOYEE))
equivalent with:
σSalary=10000 (πSalary, DNO(EMPLOYEE))
Rule 5: Commutativity of ⋈ and ×: R ⋈c S = S ⋈c R; R × S = S × R
EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT
equivalent with:
DEPARTMENT ⋈DNUMBER=DNO EMPLOYEE
9
Rule 6: Commuting σ over ⋈ (or ×): σc(R ⋈S) = (σc(R)) ⋈S
Reduce tuples in a relation before joining via applying selection
SELECT E.LNAME
FROM DEPARTMENT D, EMPLOYEE E
WHERE E.SSN = D.MGR_SSN AND E.SALARY > 10000
σE.SALARY>10000(EMPLOYEE ⋈E.SSN=D.MGR_SSN DEPERTMENT)
equivalent with:
(σE.SALARY>10000(EMPLOYEE))⋈E.SSN=D.MGR_SSN DEPERTMENT)
10
HEURISTIC RULES
Rule 6: Commuting σ over ⋈ (or ×)
σ(c1 AND c2)(R ⋈S) = (σc1(R)) ⋈ (σc2(S))
Reduce tuples in both relations before joining via splitting selections.
SELECT E.LNAME
FROM DEPARTMENT D, EMPLOYEE E
WHERE E.SSN = D.MGR_SSN AND E.SALARY > 10000 AND D.DNUMBER IN (1,2,3)
σE.SALARY>10000 AND D.DNUMBER IN (1,2,3)(EMPLOYEE ⋈E.SSN=D.MGR_SSN DEPERTMENT)
equivalent with:
(σE.SALARY>10000(EMPLOYEE))⋈E.SSN=D.MGR_SSN (σD.DNUMBER IN (1,2,3)(DEPERTMENT))
11
HEURISTIC RULES
HEURISTIC RULES
Rule 7: Commuting π over ⋈ (or ×): π(x1 ∪ x2)(R ⋈c S) = (πx1 (R)) ⋈c (πx2 (S))
Reduce dimensions first, and then join.
SELECT E.LNAME, D.DNAME
FROM DEPARTMENT D, EMPLOYEE E
WHERE E.SSN = D.MGR_SSN
πE.LNAME, D.DNAME(EMPLOYEE ⋈ E.SSN=D.MGR_SSNDEPARTMENT)
equivalent with:
(πE.LNAME, E.SSN(EMPLOYEE))⋈E.SSN=D.MGR_SSN (πD.DNAME,D.MGR_SSN(DEPARTMENT))
12
HEURISTIC RULES
Rule 8: Associating ⋈ (or ×):
(R ⋈S) ⋈T = R ⋈ (S ⋈T)
(R × S) × T = S × (R × T)
Re-arrange the execution order of joins/cartesian products.
(EMPLOYEE ⋈ SSN=MGR_SSNDEPARTMENT) ⋈ SSN=ESSNDEPENDENT
equivalent with:
(EMPLOYEE ⋈ SSN=ESSNDEPENDENT) ⋈ SSN=MGR_SSNDEPARTMENT
13
HEURISTIC RULES
Rule 9: Converting (σ, X) into ⋈: σc(R × S) = R ⋈c S
Always convert a selection over cartesian product to a join for efficiency!
σE.SSN=D.MGR_SSN(EMPLOYEE x DEPERTMENT)
equivalent with:
EMPLOYEE ⋈E.SSN=D.MGR_SSN DEPERTMENT
14
OPTIMIZATION IN PRACTICE
Find the last names of employees born after 1957 working on the project ‘Aquarius’
SELECT E.LNAME
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.PNAME = ‘Aquarius’ AND P.PNUMBER = W.PNO
AND E.ESSN = W.SSN AND E.BDATE > ’1957-12-31’
Canonical Tree
15
16
Rule 1: Split up
conjunctions of selections
into separate selections.
Rule 9: Selection of a
product is a join…
17
Rule 2: Since PNAME is
unique, we would benefit
if we execute first that
selection!
Idea: Swap the order of
selections in light of
reducing intermediate
result size at the
beginning.
Idea: Check if ‘Aquarius’
exists ☺
Note: First select on
BDATE to reduce tuples,
and then join…(reduction
of tuples)
18
Note: Now, the first
operation to be performed
is the selection on PNAME
Does ‘Aquarius’ exist?
Note: The intermediate
result is much smaller,
due to the product of
WORKS_ON and
PROJECT with only one
tuple!
19
Rule 9: Replace selection
over cross product with a
join, thus, filtering out
irrelevant tuples during
the join!
(reduction of tuples)
Rule 4: Cascade projections in
order to retrieve those
columns that we need!
(reduction of attributes)
20
21
HEURISTIC CHECK LIST
Break up conjunctions of selections into separate selections
Move the broken selections as far down the tree as possible
Check: Selections should be moved close to leaves reducing the tuples;
Check: PK=FK joins should replace selections over products
Rearrange the leaves so that start the selection with the smallest size first.
Ensure that no Cartesian product exists in the tree.
Inject projections after selections and before joins thus retrieving only the
attributes we need!
22
[A1] WORKED EXAMPLE
Movie(ActorName, Title, Producer, URL)
Actor(Name, YearBirth, Address, City)
SELECT M.Title
FROM Movie M, Actor A
WHERE M.ActorName = A.Name AND A.YearBirth = 1955
Task 1: Draw the canonical tree
Task 2: Optimize the query
23
[A1] WORKED EXAMPLE
Canonical Tree
SELECT M.Title
FROM Movie M, Actor A
WHERE M.ActorName = A.Name AND A.YearBirth = 1955
24
M A
X
σActorName=Name AND YearBirth = 1955
πTitle
[A1] WORKED EXAMPLE
SELECT M.Title
FROM Movie M, Actor A
WHERE M.ActorName = A.Name AND A.YearBirth = 1955
25
M A
πName
σYearBirth = 1955
πActorName, Title
⋈ActorName=Name
πTitle
M A
σYearBirth = 1955
⋈ActorName=Name
πTitle
[A2] WORKED EXAMPLE
EMPLOYEE(SSN, SUPER_SSN, LNAME, …)
DEPARTMENT(DNUMBER, MGR_SSN, …)
SELECT E1.LNAME
FROM EMPLOYEE E1, EMPLOYEE E2, DEPARTMENT D
WHERE E1.SUPER_SSN = E2.SSN
AND E2.SSN = D.MGR_SSN AND D.DNUMBER = 5
Context: Each supervisor is supervising up to 10 employees. There
are 100 employees.
Task 1: Draw the canonical tree
Task 2: Optimize the query 26
[A2] WORKED EXAMPLE
SELECT E1.LNAME
FROM EMPLOYEE E1, EMPLOYEE E2, DEPARTMENT D
WHERE E1.SUPER_SSN = E2.SSN
AND E2.SSN = D.MGR_SSN AND D.DNUMBER = 5
D E2 E1
x
x
σE1.SUPER_SSN=E2.SSN AND E2.SSN=D.MGR_SSN AND D.DNUMBER = 5
πE1.NAME
27
D E2
E1
x
x
σE1.SUPER_SSN=E2.SSN AND E2.SSN=D.MGR_SSN AND D.DNUMBER = 5
πE1.NAME
D E2
πE1.LNAME
σD.DNUMBER=5
D.MGR_SSN= E2.SSN
E2.SSN =E1.SUPER_SSN
E1
1 tuple
1 tuple
up to 10 tuples
Context: Each supervisor is supervising up to 10 employees.
28
D E2
πE1.LNAME
σD.DNUMBER=5
D.MGR_SSN= E2.SSN
E2.SSN =E1.SUPER_SSN
E1
D E2
σD.DNUMBER=5
D.MGR_SSN= E2.SSN
E2.SSN =E1.SUPER_SSN
E1
πE1.LNAME
πE1.LNAME,
E1.SUPER_SSN
πE2.SSN
πD.MGR_SSN
πE2.SSN
29
E1 E2
E1.SUPER_SSN= E2.SSN
E2.SSN =D.MGR_SSN
D
πE1.LNAME
πE2.SSN
D E2
σD.DNUMBER=5
E2.SSN =D.MGR_SSN
E2.SSN =E1.SUPER_SSN
E1
πE1.LNAME
πE1.LNAME,
E1.SUPER_SSN
πE2.SSN
πD.MGR_SSN
πE1.LNAME, E1.SUPER_SSN
πE1.LNAME,E2.SSN
σD.DNUMBER=5
πD.MGR_SSN
One manager per
department!
Up to 10 employees are
supervised by Dept 5’s manager
All pairs
(employees, supervisors)
For each
supervisor,
check if they
are Dept 5’
manager
πE2.SSN
up to 100 tuples
up to 10 tuples
Firstly, start
the selection
with the
smallest
result set!
1 tuple
1 tuple
up to 10 tuples
30
[A3] WORKED EXAMPLE
SELECT D.MGRSTARTDATE
FROM DEPARTMENT D
WHERE D.MGR_SSN IN (SELECT E.SSN
FROM DEPENDENT L, EMPLOYEE E
WHERE L.ESSN = E.SSN AND E.AGE > 45)
AND D.DNAME IN (‘Research’, ‘HR’)
Task 1: Optimize the query…
Task 2: Which is the expected number of tuples retrieved?
Context:
On average, each employee over 45 has 2 dependents.
There are 100 employees over 45.
The DNAME (department name) is unique.
31
SELECT D.MGRSTARTDATE
FROM DEPARTMENT D
WHERE D.MGR_SSN IN (SELECT E.SSN
FROM DEPENDENT L, EMPLOYEE E
WHERE L.ESSN = E.SSN AND E.AGE > 45)
AND D.DNAME IN (‘Research’, ‘HR’)
32
DEPENDENT EMPLOYEE
L E
σΕ.AGE>45
πL.ESSN
⋈L.ESSN = E.SSN
πΕ.SSN
DEPARTMENT
D
σD.DNAME IN (‘Research’, ‘HR’)
⋈D.MGR_SSN = E.SSN
πD.MGR_SSN, D. MGRSTARTDATE
πD.MGRSTARTDATE
πE.SSN
2 tuples
~ 200 tuples
100 tuples
up to 2 tuples
OPTIMIZING ANALYTICS QUERIES (OPTIONAL)
SELECT DNO, DNAME, COUNT(*), AVG(Salary)
FROM EMPLOYEE, DEPARTMENT
WHERE DNO = DNUMBER AND LNAME LIKE ‘%Chris%’
GROUP BY DNO
HAVING AVG(Salary) > 10000
σ
AVG(Salary)>10000
(
DNOγCOUNT(*), DNAME, AVG(Salary)(
DEPARTMENT ⋈DNUMBER=DNO(σLNAME LIKE ‘%Chris%’(EMPLOYEE))
)
)
33
OPTIMIZING ANALYTICS QUERIES (OPTIONAL)
σ
AVG(Salary)>10000
(DNOγDNAME, COUNT(*), AVG(Salary)
(DEPARTMENT ⋈DNUMBER=DNO (σLNAME LIKE ‘%Chris%’(EMPLOYEE))))
E
πE.DNO, E.Salary
σE.LNAME LIKE ‘%Chris%’
E.DNOγCOUNT(*),DNAME, AVG(E.Salary)
σAVG(Salary)>10000
D
⋈DNUMBER=DNO
πD.DNAME, D.DNUMBER
34