CS计算机代考程序代写 SQL database algorithm Relational Algebra

Relational Algebra

RELATIONAL ALGEBRA

Database Theory & Applications (M)

Dr Chris Anagnostopoulos

ROADMAP

 Objective: explore what is inside a Database System

 …from what to do going to how to do

 List different strategies for SQL processing

 Relational Algebra

 Challenge: Transform SQL into Relational Expression

 Challenge: Transform Relational Expression into Relational Tree

 Background for Query Optimization

Notation: Cardinality of a set R, notated by |R|, is the number of

elements in R

 R = {a, b, c} then |R| = 3

 R = { } then |R| = 0
2

RELATIONAL ALGEBRA

Algebra: set of operations: select, project, join, aggregate, grouping

SELECT E.DNO, D.DNAME, E.AVG(Salary)

FROM EMPLOYEE E, DEPARTMENT D

WHERE E.DNO = D.DNUMBER

GROUP BY E.DNO

Relational Expression: is a sequence of operations which defines

the execution order, i.e., algorithm.

Objective: Find the optimal sequence of operations that minimizes

the computational complexity of a SQL query.

What to do How to do How to optimally do

3

RELATIONAL ALGEBRA SYMBOLS

 Unary operations

 SELECT: σ (sigma) //selecting tuples

 PROJECT: π (pi) //selecting columns

 Binary operations

 JOIN (symbol: ): join two relations w.r.t., FK = PK

 Cartesian Product (symbol: X ): combine two relations

 Aggregation operations

 Aggregation functions: γ (gamma), e.g., COUNT, AVG, …

 Visit: https://en.wikipedia.org/wiki/Relational_algebra

4

SELECT ()

SELECT:  (sigma); select tuples w.r.t. condition.

SELECT *

FROM EMPLOYEE

WHERE DNO = 4

DNO=4(EMPLOYEE)

SELECT *

FROM EMPLOYEE

WHERE Salary < 30000 AND Salary > 10000

 Salary<30000 AND Salary>1000(EMPLOYEE)

Syntax: (R)

5

NESTED SELECTIONS ()

 σ is commutative, i.e., we can change the selection order

(< condition2> (R)) =  (< condition1> (R))

 Salary>30000(DNO=4(EMPLOYEE)) =  DNO=4(Salary>30000(EMPLOYEE))

Q: Which one do you prefer most?

A: Depends on size of the intermediate result

Objective: choose the sequence with the minimum intermediate results

size.

Challenge: Without executing a query, can you predict the size of an

intermediate result? [*]

[*] Anagnostopoulos, C. et al. (2017) Query-driven learning for predictive analytics of data

subspace cardinality. ACM Transactions on Knowledge Discovery from Data, 11(4), 47.

6

Lemma 1: 0 ≤ |(R)|≤|R|

|DNO=4(EMPLOYEE)| ≤ |EMPLOYEE|

Selectivity: percentage (%) of tuples retrieved:

|(R) |/|R|  [0%, 100%]

Challenge: For any query, predict the selectivity without query execution*

Decision: choose the strategy with the minimum predicted size:

  Salary>30000(DNO=4(EMPLOYEE))

  DNO=4(Salary>30000(EMPLOYEE))

[*] Anagnostopoulos, C. et al. (2017) Efficient Scalable Accurate Regression Queries in In-DBMS

Analytics. In: IEEE International Conference on Data Engineering (ICDE), CA, USA, Apr 2017

SELECT ()

7

[A1] WORKED EXAMPLE

Task 1: SQL → Algebra

 JobType=‘Faculty’(EMPLOYEE)

 JobType=‘Faculty’ AND Dept=‘CS’(EMPLOYEE)

 (EMPLOYEE)

Task 2: Selectivity of each SQL.

• 2/4 = 50%

• 1/4 = 25%

• 4/4 = 100%

SELECT *

FROM EMPLOYEE

WHERE JobType = ‘Faculty’

SELECT *

FROM EMPLOYEE

WHERE JobType = ‘Faculty’

AND Dept = ‘CS’

SELECT *

FROM EMPLOYEE

8

PROJECT ()

PROJECT:  (pi) show only specified columns.

SELECT LNAME, Salary

FROM EMPLOYEE

LNAME, Salary(EMPLOYEE)

Syntax: (R)

9

[A2] WORKED EXAMPLE

Task 1: SQL→ Algebra.

 ID(EMPLOYEE)

 ID, Dept(EMPLOYEE)

Task 2: Selectivity of each SQL.

• 100%

• 100%

SELECT ID

FROM EMPLOYEE

SELECT ID, Dept

FROM EMPLOYEE

10

SELECTION & PROJECTION EXPRESSION

SELECT FNAME, LNAME, Salary, DNO

FROM EMPLOYEE

WHERE DNO = 5;

FNAME, LNAME, SALARY,DNO(DNO=5(EMPLOYEE)) //select-then-project

 R1  DNO=5(EMPLOYEE)

 R2  FNAME, LNAME, SALARY,DNO(R1)

DNO=5(FNAME, LNAME, SALARY,DNO(EMPLOYEE)) //project-then-select

 R1  FNAME, LNAME, SALARY,DNO(EMPLOYEE)

 R2  DNO=5(R1)

11

PROJECT OR SELECT, FIRST?

Strategy 1: first select employees in DNO=5, then project onto the attributes

 i.e., reduce number of retrieved tuples, and then reduce dimensions

 FNAME, LNAME, SALARY,DNO(DNO=5(EMPLOYEE))

Q: Size of intermediate result?

Strategy 2: first project onto attributes, then select the employees in DNO=5

 i.e., reduce dimensions, and then reduce number of tuples

DNO=5(FNAME, LNAME, SALARY,DNO(EMPLOYEE))

 Q: Size of intermediate results?

 What if only one tuple satisfies DNO = 5?

 Strategy 1 is better than Strategy 2, why?

 What if all employees satisfy DNO = 5?

 Strategy 1 is as good as Strategy 2, why?

12

[A3] WORKED EXAMPLE

SELECT W.ESSN, W.HOURS

FROM WORKS_ON AS W

WHERE W.PNO = 1

 Task 1: Propose two alternative strategies.

 Task 2: Propose the best alternative for each Case.

13

ESSN PNO HOURS

1 1 20

1 2 10

2 2 5

3 2 30

ESSN PNO HOURS

1 1 20

1 2 10

2 1 5

3 1 30

Case I Case II

[A3] WORKED EXAMPLE

SELECT W.ESSN, W.HOURS

FROM WORKS_ON AS W

WHERE W.PNO = 1

A1: W.ESSN, W.HOURS(W.PNO=1(WORKS_ON))

A2: W.PNO=1(W.ESSN, W.HOURS(WORKS_ON))

14

ESSN PNO HOURS

1 1 20

1 2 10

2 2 5

3 2 30

ESSN PNO HOURS

1 1 20

1 2 10

2 1 5

3 1 30

Case I Case II

LESSON LEARNT SO FAR…

 Heuristic Rule 1: first select and then project π(σ)

15

JOIN & CARTESIAN PRODUCT

JOIN operation ( ) concatenates only associated tuples from R and S:

R FK = PK S

Cartesian product ( X ) concatenates all tuples from R and S: R X S

SELECT *

FROM EMPLOYEE, DEPARTMENT

WHERE MGR_SSN = SSN

EMPLOYEE SSN = MGR_SSN DEPARTMENT

SELECT *

FROM EMPLOYEE, DEPARTMENT

EMPLOYEE X DEPARTMENT

Lemma 2: |R PK = FK S| ≤ |R|
. |S| = |R X S|

Why? Because, only associated tuples appear in the result.
16

REASONING ABOUT JOIN

Lemma 3: R FK = PK S = σFK=PK (R X S)

 Strategy 1: EMPLOYEE SSN = MGR_SSNDEPARTMENT

 Strategy 2: σSSN=MGR_SSN(EMPLOYEE X DEPARTMENT)

Context:

 EMPLOYEE: 1000 tuples. Some employees are managers of departments.

 DEPARTMENT : 5 tuples. Each department is managed by one manager.

Q1: Which is the expected number of tuples in Strategy 1?

A1: Only 5 employees will be retrieved; we have only 5 departments

Q2: Which is the expected size of the intermediate result in Strategy 2?

A2: 1000 x 5 = 5000 combined tuples (including fictitious)

 But: only 5 tuples satisfy the condition, i.e., 5/5000 = 0.1% of tuples should
have been matched…rest 99.9% is noise 

17

LESSON LEARNT SO FAR…

 Heuristic Rule 1: first select and then project π(σ)

 Heuristic Rule 2: always use over FK and PK to

join two tables and never the Cartesian product.

18

SELECT FNAME, LNAME

FROM EMPLOYEE, DEPARTMENT

WHERE DNAME = ‘Research’ AND DNUMBER = DNO

◼ There are 10 departments and ‘Research’ has 15 employees

◼ There are 1000 employees

Strategy 1:

FNAME,LNAME(σDNAME=‘Research’ AND DNUMBER=DNO(DEPARTMENT X EMPLOYEE))

//10,000 tuples

Strategy 2:

FNAME, LNAME(σDNAME=‘Research’ (DEPARTMENT DNUMBER=DNOEMPLOYEE))

//1,000 tuples

R1  DEPARTMENT DNUMBER=DNO EMPLOYEE

R2  σDNAME=‘Research’ (R1)

RESULT  π FNAME, LNAME (R2)

PUT ALL TOGETHER

19

SELECT FNAME, LNAME

FROM EMPLOYEE, DEPARTMENT

WHERE DNAME = ‘Research’ AND DNUMBER = DNO

◼ There are 10 departments and ‘Research’ has 15 employees

◼ There are 1000 employees

Strategy 3:

FNAME, LNAME(EMPLOYEE DNO = DNUMBER (σDNAME = ‘Research’ (DEPARTMENT)))

//1 + 15 = 16 tuples

//Optimal is: 15 tuples…very close!

R1  σDNAME=‘Research’(DEPARTMENT)

R2  (R1 DNUMBER= DNOEMPLOYEE)

RESULT  πFNAME, LNAME(R2)

10,000 tuples → 1,000 tuples (90% reduction) → 16 tuples (99.84% reduction)

PUT ALL TOGETHER

20

LESSON LEARNT SO FAR…

 Heuristic Rule 1: first select and then project π(σ)

 Heuristic Rule 2: always use over FK and PK to

join two tables and never the Cartesian product.

 Heuristic Rule 3: apply the select operation before

joining tables to significantly reduce the

intermediate result size. 21

[A4] WORKED EXAMPLE

Task 1: Propose a good strategy/expression

PAPER(AID, PAPERID, URL)

AUTHOR(ID,NAME)

//PAPER.AID references AUTHOR.ID

Context:

 There are 500 authors and 10,000 papers

 Chris is unique and has written 100 papers

 Each author has written at least one paper

SELECT P.URL

FROM PAPER AS P, AUTHOR AS A

WHERE A.NAME = ‘Chris’ AND A.ID = P.AID;
22

[A4] WORKED EXAMPLE

SELECT P.URL

FROM PAPER AS P, AUTHOR AS A

WHERE A.NAME = ‘Chris’ AND A.ID = P.AID;

π
P.URL


Α.ΝΑΜΕ=‘Chris’AND A.ID = P.AID

(PAPER X AUTHOR))

//500 x 10,000 = 5,000,000 tuples

π
P.URL


Α.ΝΑΜΕ=‘Chris’

(PAPER
P.AID=A.ID

AUTHOR))

//retrieve papers of each author; search for Chris’ papers’ URLs

//10,000 tuples

π
P.URL

(PAPER
P.AID=A.ID


Α.ΝΑΜΕ=‘Chris’

(AUTHOR))

//retrieve only Chris; get his papers’ URLs.

//1 + 100 = 101 tuples; 99.998% reduction of complexity! 23

RELATIONAL TREE

 A data-structure representing a query via algebra:
 Tree Nodes: intermediate results, e.g., selection, projection, join,

 Leaf Nodes: base relations.

 Root: the final query result.

 Interpretation:
 visual feel of the complexity of the query

 execution order of the operations—start from leaves to root

 Transform a tree into another optimal tree: Heuristic Optimization 24

SELECT E.DNO, E.LNAME

FROM EMPLOYEE AS E

WHERE E.DNO = 5;

EMPLOYEE

E

σE.DNO=5

πE.LNAME, E.DNOroot

leaves

operator

πE.LNAME,E.DNO(σE.DNO=5(EMPLOYEE))

σE.DNO=5(πE.LNAME,E.DNO(EMPLOYEE))

EMPLOYEE

E

πE.LNAME, E.DNO

σE.DNO=5

IN-CLASS EXAMPLE: TREES

25

SELECT FNAME, LNAME

FROM EMPLOYEE E, DEPARTMENT D

WHERE D.DNAME=‘Research’ AND D.DNUMBER=E.DNO

FNAME, LNAME(σDNAME=‘Research’ (DEPARTMENT DNUMBER=DNOEMPLOYEE))

DEPARTMENT

D

EMPLOYEE

E

D.DNUMBER=E.DNO

σD.DNAME= ‘Research’

πFNAME, LNAME
root

leaves

operator

operator

26

[A5] WORKED EXAMPLE

Task 1: Draw the two relational trees of the SQL:

SELECT P.URL

FROM PAPER AS P, AUTHOR AS A

WHERE A.NAME = ‘Chris’ AND A.ID = P.AID;

π
P.URL


Α.ΝΑΜΕ=‘Chris’

(PAPER
P.AID=A.ID

AUTHOR))

π
P.URL

(PAPER
P.AID=A.ID


Α.ΝΑΜΕ=‘Chris’

(AUTHOR))

27

[A5] WORKED EXAMPLE

SELECT P.URL

FROM PAPER AS P, AUTHOR AS A

WHERE A.NAME = ‘Chris’ AND A.ID = P.AID;

28

PAPER

P

AUTHOR

A

P.AID = A.ID

σA.Name= ‘Chris’

πP.URL

PAPER

P

AUTHOR

A

P.AID = A.ID

σA.Name= ‘Chris’

πP.URL

first join and the select… first select and the join…