CS计算机代考程序代写 SQL database File Structures

File Structures

HEURISTIC QUERY OPTIMIZATION

Database Theory & Applications (M)

Dr Chris Anagnostopoulos

ROADMAP

 Transformation of the canonical Tree to the optimal Tree

Re-writing equivalent trees

 Heuristic inferential rules

 Examples

 Selection-Projection-Join Queries

 Aggregation Analytics Queries

2

HEURISTIC QUERY OPTIMIZATION

 Input: SQL query is issued by the user.

 Step 1: Parser creates a naïve (canonical) tree T of query.

 Step 2: Optimizer applies inferential rules to transform tree T to an

equivalent and efficient tree T*

 i.e., T* produces exactly the same results as T after re-arranging

algebra operators.

 Step 3: Generate code for optimal tree T*…within 2-2.5 ms

 Step 4: Execute the operators with the order dictated in T*

 Step 5: Return/show the results to the user.

3

CANONICAL TREE FROM PARSER

SELECT P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate

FROM PROJECT P, DEPARTMENT D, EMPLOYEE E

WHERE P.Dnum=D.Dnumber AND D.mgr_ssn=E.ssn AND P.Plocation=‘Stafford’

4

TREE TRANSFORMATION

Canonical Tree A: simplistic, very expensive, involving Cartesian

products, having intermediate results whose size is huge!

Optimizer transforms Tree A to an equivalent & efficient Tree B

B

A
5

HEURISTIC RULES

Rule 1: Cascade of Selection : σc1 AND c2 AND c3 AND c4 (R) = σc1(σc2(σc3(σc4 (R))))

Split a conjunction of selections into nested individual selections.

SELECT *

FROM EMPLOYEE

WHERE DNO=5 AND Salary = 10000 AND Age = 45

σDNO=5 AND Age=45 AND Salary = 10000(EMPLOYEE)

equivalent with:

σDNO=5( σAge=45 (σSalary = 10000(EMPLOYEE)))

6

Rule 2: Commutativity of Selection: σc1 (σc2 (R))) = σc2 (σc1 (R)))

Reorder nested selections; the result is the same.

σDNO=5( σAge=45 (σSalary = 10000(EMPLOYEE)))

equivalent with:

σSalary=10000(σAge=45 (σDNO=5(EMPLOYEE)))

7

HEURISTIC RULES

HEURISTIC RULES

Rule 3: Cascade of Projection: πlist1 (πlist2 (πlist3 … (πlistn(R)…) = πlist1(R)

Retrieve only the required columns; reduce dimensions.

πSSN(πSSN, NAME(EMPLOYEE))

equivalent with:

πSSN(EMPLOYEE)

given than all nested (inner) lists contain the attributes of the outer list.

8

HEURISTIC RULES

Rule 4: Commuting σ with π: σc (πx (R))) = πx (σc (R)))

First reduce size (select) and then reduce dimensions (project)

πDNO(σSalary = 10000(EMPLOYEE))

equivalent with:

σSalary=10000 (πSalary, DNO(EMPLOYEE))

Rule 5: Commutativity of ⋈ and ×: R ⋈c S = S ⋈c R; R × S = S × R

EMPLOYEE ⋈DNO=DNUMBER DEPARTMENT

equivalent with:

DEPARTMENT ⋈DNUMBER=DNO EMPLOYEE

9

Rule 6: Commuting σ over ⋈ (or ×): σc(R ⋈S) = (σc(R)) ⋈S

Reduce tuples in a relation before joining via applying selection

SELECT E.LNAME

FROM DEPARTMENT D, EMPLOYEE E

WHERE E.SSN = D.MGR_SSN AND E.SALARY > 10000

σE.SALARY>10000(EMPLOYEE ⋈E.SSN=D.MGR_SSN DEPERTMENT)

equivalent with:

(σE.SALARY>10000(EMPLOYEE))⋈E.SSN=D.MGR_SSN DEPERTMENT)

10

HEURISTIC RULES

Rule 6: Commuting σ over ⋈ (or ×)

σ(c1 AND c2)(R ⋈S) = (σc1(R)) ⋈ (σc2(S))

Reduce tuples in both relations before joining via splitting selections.

SELECT E.LNAME

FROM DEPARTMENT D, EMPLOYEE E

WHERE E.SSN = D.MGR_SSN AND E.SALARY > 10000 AND D.DNUMBER IN (1,2,3)

σE.SALARY>10000 AND D.DNUMBER IN (1,2,3)(EMPLOYEE ⋈E.SSN=D.MGR_SSN DEPERTMENT)

equivalent with:

(σE.SALARY>10000(EMPLOYEE))⋈E.SSN=D.MGR_SSN (σD.DNUMBER IN (1,2,3)(DEPERTMENT))

11

HEURISTIC RULES

HEURISTIC RULES

Rule 7: Commuting π over ⋈ (or ×): π(x1 ∪ x2)(R ⋈c S) = (πx1 (R)) ⋈c (πx2 (S))

Reduce dimensions first, and then join.

SELECT E.LNAME, D.DNAME

FROM DEPARTMENT D, EMPLOYEE E

WHERE E.SSN = D.MGR_SSN

πE.LNAME, D.DNAME(EMPLOYEE ⋈ E.SSN=D.MGR_SSNDEPARTMENT)

equivalent with:

(πE.LNAME, E.SSN(EMPLOYEE))⋈E.SSN=D.MGR_SSN (πD.DNAME,D.MGR_SSN(DEPARTMENT))

12

HEURISTIC RULES

Rule 8: Associating ⋈ (or ×):

(R ⋈S) ⋈T = R ⋈ (S ⋈T)

(R × S) × T = S × (R × T)

Re-arrange the execution order of joins/cartesian products.

(EMPLOYEE ⋈ SSN=MGR_SSNDEPARTMENT) ⋈ SSN=ESSNDEPENDENT

equivalent with:

(EMPLOYEE ⋈ SSN=ESSNDEPENDENT) ⋈ SSN=MGR_SSNDEPARTMENT

13

HEURISTIC RULES

Rule 9: Converting (σ, X) into ⋈: σc(R × S) = R ⋈c S

Always convert a selection over cartesian product to a join for efficiency!

σE.SSN=D.MGR_SSN(EMPLOYEE x DEPERTMENT)

equivalent with:

EMPLOYEE ⋈E.SSN=D.MGR_SSN DEPERTMENT

14

OPTIMIZATION IN PRACTICE

Find the last names of employees born after 1957 working on the project ‘Aquarius’

SELECT E.LNAME

FROM EMPLOYEE E, WORKS_ON W, PROJECT P

WHERE P.PNAME = ‘Aquarius’ AND P.PNUMBER = W.PNO

AND E.ESSN = W.SSN AND E.BDATE > ’1957-12-31’

Canonical Tree

15

16

Rule 1: Split up

conjunctions of selections

into separate selections.

Rule 9: Selection of a

product is a join…

17

Rule 2: Since PNAME is

unique, we would benefit

if we execute first that

selection!

Idea: Swap the order of

selections in light of

reducing intermediate

result size at the

beginning.

Idea: Check if ‘Aquarius’

exists ☺

Note: First select on

BDATE to reduce tuples,

and then join…(reduction

of tuples)

18
Note: Now, the first

operation to be performed

is the selection on PNAME

Does ‘Aquarius’ exist?

Note: The intermediate

result is much smaller,

due to the product of

WORKS_ON and

PROJECT with only one

tuple!

19

Rule 9: Replace selection

over cross product with a

join, thus, filtering out

irrelevant tuples during

the join!

(reduction of tuples)

Rule 4: Cascade projections in

order to retrieve those

columns that we need!

(reduction of attributes)

20

21

HEURISTIC CHECK LIST

 Break up conjunctions of selections into separate selections

 Move the broken selections as far down the tree as possible

 Check: Selections should be moved close to leaves reducing the tuples;

 Check: PK=FK joins should replace selections over products

 Rearrange the leaves so that start the selection with the smallest size first.

 Ensure that no Cartesian product exists in the tree.

 Inject projections after selections and before joins thus retrieving only the

attributes we need!

22

[A1] WORKED EXAMPLE

Movie(ActorName, Title, Producer, URL)

Actor(Name, YearBirth, Address, City)

SELECT M.Title

FROM Movie M, Actor A

WHERE M.ActorName = A.Name AND A.YearBirth = 1955

Task 1: Draw the canonical tree

Task 2: Optimize the query
23

[A1] WORKED EXAMPLE

Canonical Tree

SELECT M.Title

FROM Movie M, Actor A

WHERE M.ActorName = A.Name AND A.YearBirth = 1955

24

M A

X

σActorName=Name AND YearBirth = 1955

πTitle

[A1] WORKED EXAMPLE

SELECT M.Title

FROM Movie M, Actor A

WHERE M.ActorName = A.Name AND A.YearBirth = 1955

25

M A

πName

σYearBirth = 1955
πActorName, Title

⋈ActorName=Name

πTitle

M A

σYearBirth = 1955

⋈ActorName=Name

πTitle

[A2] WORKED EXAMPLE

EMPLOYEE(SSN, SUPER_SSN, LNAME, …)

DEPARTMENT(DNUMBER, MGR_SSN, …)

SELECT E1.LNAME

FROM EMPLOYEE E1, EMPLOYEE E2, DEPARTMENT D

WHERE E1.SUPER_SSN = E2.SSN

AND E2.SSN = D.MGR_SSN AND D.DNUMBER = 5

Context: Each supervisor is supervising up to 10 employees. There

are 100 employees.

Task 1: Draw the canonical tree

Task 2: Optimize the query 26

[A2] WORKED EXAMPLE

SELECT E1.LNAME

FROM EMPLOYEE E1, EMPLOYEE E2, DEPARTMENT D

WHERE E1.SUPER_SSN = E2.SSN

AND E2.SSN = D.MGR_SSN AND D.DNUMBER = 5

D E2 E1

x

x

σE1.SUPER_SSN=E2.SSN AND E2.SSN=D.MGR_SSN AND D.DNUMBER = 5

πE1.NAME

27

D E2
E1

x

x

σE1.SUPER_SSN=E2.SSN AND E2.SSN=D.MGR_SSN AND D.DNUMBER = 5

πE1.NAME

D E2

πE1.LNAME

σD.DNUMBER=5

D.MGR_SSN= E2.SSN

E2.SSN =E1.SUPER_SSN

E1

1 tuple

1 tuple

up to 10 tuples

Context: Each supervisor is supervising up to 10 employees.

28

D E2

πE1.LNAME

σD.DNUMBER=5

D.MGR_SSN= E2.SSN

E2.SSN =E1.SUPER_SSN

E1

D E2

σD.DNUMBER=5

D.MGR_SSN= E2.SSN

E2.SSN =E1.SUPER_SSN

E1

πE1.LNAME

πE1.LNAME,

E1.SUPER_SSN

πE2.SSN

πD.MGR_SSN

πE2.SSN

29

E1 E2

E1.SUPER_SSN= E2.SSN

E2.SSN =D.MGR_SSN

D

πE1.LNAME

πE2.SSN

D E2

σD.DNUMBER=5

E2.SSN =D.MGR_SSN

E2.SSN =E1.SUPER_SSN

E1

πE1.LNAME

πE1.LNAME,

E1.SUPER_SSN

πE2.SSN

πD.MGR_SSN

πE1.LNAME, E1.SUPER_SSN

πE1.LNAME,E2.SSN

σD.DNUMBER=5

πD.MGR_SSN

One manager per

department!

Up to 10 employees are

supervised by Dept 5’s manager

All pairs

(employees, supervisors)

For each

supervisor,

check if they

are Dept 5’

manager

πE2.SSN

up to 100 tuples

up to 10 tuples

Firstly, start

the selection

with the

smallest

result set!

1 tuple

1 tuple

up to 10 tuples

30

[A3] WORKED EXAMPLE

SELECT D.MGRSTARTDATE

FROM DEPARTMENT D

WHERE D.MGR_SSN IN (SELECT E.SSN

FROM DEPENDENT L, EMPLOYEE E

WHERE L.ESSN = E.SSN AND E.AGE > 45)

AND D.DNAME IN (‘Research’, ‘HR’)

Task 1: Optimize the query…

Task 2: Which is the expected number of tuples retrieved?

Context:

 On average, each employee over 45 has 2 dependents.

 There are 100 employees over 45.

 The DNAME (department name) is unique.
31

SELECT D.MGRSTARTDATE

FROM DEPARTMENT D

WHERE D.MGR_SSN IN (SELECT E.SSN

FROM DEPENDENT L, EMPLOYEE E

WHERE L.ESSN = E.SSN AND E.AGE > 45)

AND D.DNAME IN (‘Research’, ‘HR’)

32

DEPENDENT EMPLOYEE

L E

σΕ.AGE>45

πL.ESSN

⋈L.ESSN = E.SSN

πΕ.SSN

DEPARTMENT

D

σD.DNAME IN (‘Research’, ‘HR’)

⋈D.MGR_SSN = E.SSN

πD.MGR_SSN, D. MGRSTARTDATE

πD.MGRSTARTDATE

πE.SSN

2 tuples

~ 200 tuples

100 tuples

up to 2 tuples

OPTIMIZING ANALYTICS QUERIES (OPTIONAL)

SELECT DNO, DNAME, COUNT(*), AVG(Salary)

FROM EMPLOYEE, DEPARTMENT

WHERE DNO = DNUMBER AND LNAME LIKE ‘%Chris%’

GROUP BY DNO

HAVING AVG(Salary) > 10000

σ
AVG(Salary)>10000

(

DNOγCOUNT(*), DNAME, AVG(Salary)(

DEPARTMENT ⋈DNUMBER=DNO(σLNAME LIKE ‘%Chris%’(EMPLOYEE))

)

)

33

OPTIMIZING ANALYTICS QUERIES (OPTIONAL)

σ
AVG(Salary)>10000

(DNOγDNAME, COUNT(*), AVG(Salary)
(DEPARTMENT ⋈DNUMBER=DNO (σLNAME LIKE ‘%Chris%’(EMPLOYEE))))

E

πE.DNO, E.Salary

σE.LNAME LIKE ‘%Chris%’

E.DNOγCOUNT(*),DNAME, AVG(E.Salary)

σAVG(Salary)>10000

D

⋈DNUMBER=DNO

πD.DNAME, D.DNUMBER

34