2021/4/28 DBMS Overview
DBMS Overview
DBMSs
Query Evaluation
Mapping SQL to RA Mapping Example
Query Cost Estimation Implementations of RA Ops DBMS Architecture
>>
COMP9315 21T1 ♢ DBMS Overview ♢ [0/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
1/13
2021/4/28 DBMS Overview
❖ DBMSs
DBMS = DataBase Management System
Our view of the DBMS so far …
∧ >>
A machine to process SQL queries.
COMP9315 21T1 ♢ DBMS Overview ♢ [1/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
2/13
2021/4/28 DBMS Overview
❖ DBMSs (cont)
One view of DB engine: “relational algebra virtual machine” Machine code for such a machine:
<< ∧ >>
selection (σ) union (∪) sort
projection (π) intersection (∩) insert
join (⨝, ×) difference (-) delete
For each of these operations:
various data structures and algorithms are available DBMSs may provide only one, or may provide a choice
COMP9315 21T1 ♢ DBMS Overview ♢ [2/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
3/13
2021/4/28 DBMS Overview
❖ Query Evaluation
The path of a query through its evaluation:
<< ∧ >>
COMP9315 21T1 ♢ DBMS Overview ♢ [3/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
4/13
2021/4/28 DBMS Overview
❖ Mapping SQL to RA Mapping SQL to relational algebra, e.g.
— schema: R(a,b) S(c,d)
select a as x
from R join S on (b=c)
where d = 100
— could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S Tmp2(a,b,c,d) = Sel[d=100](Tmp1) Tmp3(a) = Proj[a](Tmp2)
Res(x) = Rename[Res(x)](Tmp3)
In general:
SELECT clause becomes projection
WHERE condition becomes selection or join FROM clause becomes join
COMP9315 21T1 ♢ DBMS Overview ♢ [4/11]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
5/13
2021/4/28 DBMS Overview
❖ Mapping Example Consider the database schema:
Person(pid, name, address, …)
Subject(sid, code, title, uoc, …)
Terms(tid, code, start, end, …)
Courses(cid, sid, tid, …)
Enrolments(cid, pid, mark, ..)
and the query: Courses with more than 100 students in them?
which can be expressed in SQL as
select s.sid, s.code
from Course c join Subject s on (c.sid=s.sid)
join Enrolment e on (c.cid=e.cid)
group by s.sid, s.code
having count(*) > 100;
COMP9315 21T1 ♢ DBMS Overview ♢ [5/11]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
6/13
2021/4/28 DBMS Overview
❖ Mapping Example (cont) The SQL might be compiled to
Tmp1(cid,sid,pid) = Course Join[c.cid = e.cid] Enrolment
Tmp2(cid,code,pid) = Tmp1 Join[t1.sid = s.sid] Subject
Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2)
Res(cid,code) = Sel[nstu > 100](Tmp3)
or, equivalently
Tmp1(cid,code) = Course Join[c.sid = s.sid] Subject
Tmp2(cid,code,pid) = Tmp1 Join[t1.cid = e.cid] Enrolment
Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2)
Res(cid,code) = Sel[nstu > 100](Tmp3)
Which is better?
COMP9315 21T1 ♢ DBMS Overview ♢ [6/11]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
7/13
2021/4/28 DBMS Overview
<< ∧ >>
❖ Query Cost Estimation
The cost of evaluating a query is determined by
the operations speci ed in the query execution plan
sizeofrelations (databaserelationsandtemporaryrelations)
accessmechanisms (indexing,hashing,sorting,joinalgorithms)
size/numberofmainmemorybuffers (andreplacement strategy)
Analysis of costs involves estimating:
the size of intermediate results
then, based on this, cost of secondary storage accesses
Accessing data from disk is the dominant cost in query evaluation
COMP9315 21T1 ♢ DBMS Overview ♢ [7/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
8/13
2021/4/28 DBMS Overview
<< ∧ >>
❖ Query Cost Estimation (cont)
Consider execution plans for: σc (R ⨝d S ⨝e T) where
R(c,d), S(d,e), T(e)
Tmp1(c,d,e) := hash_join[d](R,S)
Tmp2(c,d,e) := sort_merge_join[e](tmp1,T)
Res(c,d,e) := binary_search[c](Tmp2)
or
Tmp1(d,e) := sort_merge_join[e](S,T)
Tmp2(c,d,e) := hash_join[d](R,Tmp1)
Res(c,d,e) := linear_search[c](Tmp2)
or
Tmp1(c,d) := btree_search[c](R)
Tmp2(c,d,e) := hash_join[d](Tmp1,S)
Res(c,d,e) := sort_merge_join[e](Tmp2,T)
All produce same result, but have different costs.
COMP9315 21T1 ♢ DBMS Overview ♢ [8/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
9/13
2021/4/28 DBMS Overview
❖ Implementations of RA Ops Sorting (quicksort,etc.arenotapplicable)
externalmergesort (costO(NlogBN)withBmemorybuffers)
Selection (differenttechniquesdevelopedfordifferentquerytypes) sequentialscan (worstcase,costO(N))
index-based (e.g.B-trees,costO(logN),treenodesarepages) hash-based (O(1)bestcase,onlyworksforequalitytests)
Join (fastjoinsarecriticaltosuccessofrelationalDBMSs) nested-loopjoin (costO(N.M),bufferingcanreducetoO(N+M)) sort-mergejoin (costO(NlogN+MlogM))
hash-join (bestcasecostO(N+M.N/B),withBmemorybuffers)
COMP9315 21T1 ♢ DBMS Overview ♢ [9/11]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
10/13
2021/4/28 DBMS Overview
❖ DBMS Architecture
Most RDBMSs are client-server systems:
<< ∧ >>
COMP9315 21T1 ♢ DBMS Overview ♢ [10/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
11/13
2021/4/28 DBMS Overview
❖ DBMS Architecture (cont) Layers within the DBMS server:
<< ∧
COMP9315 21T1 ♢ DBMS Overview ♢ [11/11]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
12/13
2021/4/28 DBMS Overview
Produced: 15 Feb 2021
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/dbms-overview/slides.html
13/13