Query Processing
Query Processing – Overview
1 Users submit SQL queries to a DBMS.
2 The DBMS processes and executes them in a database.
Note: SQL is a declarative language, so it is the task of DBMSs to decide
how SQL queries should be executed.
Query Processing – Example
From:
SELECT name FROM Person WHERE age<21; To: name Rickon Bran Questions: How does a relational DBMS process this? How can a relational DBMS process this efficiently? Query Processing – Example SELECT name FROM Person WHERE age<21; High-level language (SQL) ⇓ πname(σage<21(Person)) Low-level language (Relational Algebra) ⇓ Person 21σage< nameπ Execution plan (Query tree) ⇓ name Rickon Bran Query result Query Processing – Example Query Processing Steps Query parser and translator 1 Check the syntax of SQL queries 2 Verify that the relations do exist 3 Transform into relational algebra expressions Query optimiser 1 Transform into the best possible execution plan 2 Specify the implementation of each operator in the execution plan Evaluation engine 1 Evaluate the query execution plan 2 Return the result to the user Query Processing – Parser The parser checks the syntax of the query: Validation of table names, attributes, data types, access permission ...; Either the query is executable or an error message is generated. Query Processing – Parser Consider the relation schema: Person(id:integer, name:string, age:integer, address:string) Note: System catalog (also called data dictionary) is used at this stage, which contains the information about data managed by the DBMS. Example: attr name rel name type position id Person integer 1 name Person string 2 age Person integer 3 address Person string 4 . . . . . . . . . . . . Question: Can the following query be accepted by the parser? SELECT fname, lname FROM Person WHERE address<21; Query Processing – Parser Consider the relation schema: Person(id:integer, name:string, age:integer, address:string) Question: Can the following query be accepted by the parser? SELECT fname, lname FROM Person WHERE address<21; Answer: The query would be rejected because 1 The attributes fname and lname are not defined; 2 The attribute address is not comparable with 21. Query Processing – Translator The translator translates queries into RA expressions (not necessarily equivalent due to duplicates): A query is first decomposed into query blocks. Each query block is translated into an RA expression. Recall: RA and SQL Queries RA operators selection σϕ projection πA1,...,An Cartesian product R1 × R2 join R1 ./ϕ R2 and R1 ./ R2 renaming ρR(A1,...,An) union R1 ∪ R2 intersection R1 ∩ R2 difference R1 − R2 SQL statement SELECT attribute_list FROM table_list [WHERE condition] [GROUP BY attribute_list [HAVING group_condition]] [ORDER BY attribute_list]; σϕ(R)⇔ SELECT * FROM R WHERE ϕ; πA1,...,An (R)⇔ SELECT DISTINCT A1, . . . ,An FROM R; R1 × R2 ⇔ SELECT DISTINCT * FROM R1, R2; . . . Aggregate operations in SQL require extended RA expressions. Recall: RA and SQL Queries Nested subqueries are decomposed into separate query blocks. Example: SELECT Lname, Fname FROM Employee WHERE Salary > (SELECT Salary
FROM Employee
WHERE ssn=5);
Outer query block
SELECT Lname, Fname FROM Employee WHERE
Salary > c
⇓ translated
πLname,Fname(σSalary>c (EMPLOYEE))
Inner query block
(SELECT Salary FROM Employee WHERE
ssn=5)
⇓ translated
πSalary (σssn=5(EMPLOYEE))
Query Processing – Query Optimiser
1 Transform into the best possible execution plan
There are different possible relational algebra expressions for a single
query!
(will be covered in this course)
2 Specify the implementation of each operator in the execution plan
There are different possible implementations for a relational algebra
operator!
(will not be covered in this course)
Query Processing – Query Optimiser
SQL queries only specify what data to be retrieved and not how to
retrieve data.
There are many possible execution plans for a SQL query.
Query optimiser is responsible for identifying an efficient execution plan:
1 enumerating alternative plans (typically, a subset of all possible plans);
2 choosing the one with the least estimated cost.
Query optimisation is one of the most important tasks of a relational DBMS.
A good DBMS must have a good query optimiser!
Equivalent RA Expressions
Suppose that we have:
Students(matNr, firstName, lastName, email)
Exams(matNr, crsNr, result, semester)
Courses(crsNr, title, unit)
SELECT lastName, result, title
FROM Students, Exams, Courses
WHERE Students.matNr=Exams.matNr AND
Exams.crsNr=Courses.crsNr AND result≤1.3;
Question:
How many equivalent RA expressions for this SQL query can you find?
Equivalent RA Expressions
Students(matNr, firstName, lastName, email)
Exams(matNr, crsNr, result, semester)
Courses(crsNr, title, unit)
SELECT lastName, result, title
FROM Students, Exams, Courses
WHERE Students.matNr=Exams.matNr AND
Exams.crsNr=Courses.crsNr AND result≤1.3;
Answer:
1 πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams)
./σExams.crsNr=Courses.crsNr Courses))
2 πlastName,result,title(σresult≤1.3(σEXAMS.crsNr=Courses.crsNr
(σStudents.matNr=Exams.matNr(Students× Exams× Courses))))
3 πlastName,result,title((Students ./Students.matNr=Exams.matNr
(σresult≤1.3(Exams))) ./σExams.crsNr=Courses.crsNr Courses)
4 . . .
Query Trees
Each RA expression can be represented as a query tree:
leaf nodes represent the input relations;
internal nodes represent the intermediate result;
the root node represents the resulting relation.
Example:
πlastName,result,title(σresult≤1.3((Students ./Students.matNr=Exams.matNr Exams)
./σExams.crsNr=Courses.crsNr Courses))
Query Trees
Exercise: Can you draw the query tree for the following RA expression?
πlastName,result,title(σresult≤1.3(σExams.crsNr=Courses.crsNr
(σStudents.matNr=Exams.matNr(Students× Exams× Courses))))
. .σstudents matNr exams matNr=
lastName, result, titleπ
×
courses
exams
×
students
. .σexams crsNr courses crsNr=
1.3σresult≤
Query Trees
For each query tree, computation proceeds bottom-up:
child nodes must be executed before their parent nodes;
but there can exist multiple methods of executing sibling nodes, e.g.,
process sequentially;
process in parallel.
Execution Plan
A query execution plan consists of an (extended) query tree with
additional annotation at each node indicating:
(1) the access method to use for each table, and
(2) the implementation method for each RA operator.
Query Processing – Evaluation Engine
The evaluation engine executes an execution plan, and returns the query
answer to the user.
Query Processing