CS计算机代考程序代写 algorithm SQL data structure database >>

>>
DBMS Overview

• DBMSs
• Query Evaluation
• Mapping SQL to RA
• Mapping Example
• Query Cost Estimation
• Implementations of RA Ops
• DBMS Architecture
COMP9315 21T1 ♢ DBMS Overview ♢ [0/11]
∧ >>
❖ DBMSs

DBMS = DataBase Management System
Our view of the DBMS so far …


A machine to process SQL queries.
COMP9315 21T1 ♢ DBMS Overview ♢ [1/11]
<< ∧ >>
❖ DBMSs (cont)

One view of DB engine: “relational algebra virtual machine”
Machine code for such a machine:
selection (σ)

projection (π)

join (⨝, ×)
union (∪)

intersection (∩)

difference (-)
sort

insert

delete

For each of these operations:
• various data structures and algorithms are available
• DBMSs may provide only one, or may provide a choice
COMP9315 21T1 ♢ DBMS Overview ♢ [2/11]
<< ∧ >>
❖ Query Evaluation

The path of a query through its evaluation:

COMP9315 21T1 ♢ DBMS Overview ♢ [3/11]
<< ∧ >>
❖ Mapping SQL to RA

Mapping SQL to relational algebra, e.g.

— schema: R(a,b) S(c,d)
select a as x
from R join S on (b=c)
where d = 100
— could be mapped to
Tmp1(a,b,c,d) = R Join[b=c] S
Tmp2(a,b,c,d) = Sel[d=100](Tmp1)
Tmp3(a) = Proj[a](Tmp2)
Res(x) = Rename[Res(x)](Tmp3)
In general:
• SELECT clause becomes projection
• WHERE condition becomes selection or join
• FROM clause becomes join
COMP9315 21T1 ♢ DBMS Overview ♢ [4/11]
<< ∧ >>
❖ Mapping Example

Consider the database schema:

Person(pid, name, address, …)
Subject(sid, code, title, uoc, …)
Terms(tid, code, start, end, …)
Courses(cid, sid, tid, …)
Enrolments(cid, pid, mark, ..)

and the query: Courses with more than 100 students in them?
which can be expressed in SQL as

select s.sid, s.code
from Course c join Subject s on (c.sid=s.sid)
join Enrolment e on (c.cid=e.cid)
group by s.sid, s.code
having count(*) > 100;

COMP9315 21T1 ♢ DBMS Overview ♢ [5/11]
<< ∧ >>
❖ Mapping Example (cont)

The SQL might be compiled to

Tmp1(cid,sid,pid) = Course Join[c.cid = e.cid] Enrolment
Tmp2(cid,code,pid) = Tmp1 Join[t1.sid = s.sid] Subject
Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2)
Res(cid,code) = Sel[nstu > 100](Tmp3)
or, equivalently

Tmp1(cid,code) = Course Join[c.sid = s.sid] Subject
Tmp2(cid,code,pid) = Tmp1 Join[t1.cid = e.cid] Enrolment
Tmp3(cid,code,nstu) = GroupCount[cid,code](Tmp2)
Res(cid,code) = Sel[nstu > 100](Tmp3)
Which is better?
COMP9315 21T1 ♢ DBMS Overview ♢ [6/11]
<< ∧ >>
❖ Query Cost Estimation

The cost of evaluating a query is determined by
• the operations specified in the query execution plan
• size of relations   (database relations and temporary relations)
• access mechanisms   (indexing, hashing, sorting, join algorithms)
• size/number of main memory buffers   (and replacement strategy)
Analysis of costs involves estimating:
• the size of intermediate results
• then, based on this, cost of secondary storage accesses

Accessing data from disk is the dominant cost in query evaluation
COMP9315 21T1 ♢ DBMS Overview ♢ [7/11]
<< ∧ >>
❖ Query Cost Estimation (cont)

Consider execution plans for:    σc (R  ⨝d  S  ⨝e  T)     where R(c,d), S(d,e), T(e)

Tmp1(c,d,e) := hash_join[d](R,S)
Tmp2(c,d,e) := sort_merge_join[e](tmp1,T)
Res(c,d,e) := binary_search[c](Tmp2)
or

Tmp1(d,e) := sort_merge_join[e](S,T)
Tmp2(c,d,e) := hash_join[d](R,Tmp1)
Res(c,d,e) := linear_search[c](Tmp2)
or

Tmp1(c,d) := btree_search[c](R)
Tmp2(c,d,e) := hash_join[d](Tmp1,S)
Res(c,d,e) := sort_merge_join[e](Tmp2,T)
All produce same result, but have different costs.
COMP9315 21T1 ♢ DBMS Overview ♢ [8/11]
<< ∧ >>
❖ Implementations of RA Ops

Sorting   (quicksort, etc. are not applicable)
• external merge sort   (cost O(NlogBN) with B memory buffers)
Selection   (different techniques developed for different query types)
• sequential scan   (worst case, cost O(N))
• index-based   (e.g. B-trees, cost O(logN), tree nodes are pages)
• hash-based   (O(1) best case, only works for equality tests)
Join   (fast joins are critical to success of relational DBMSs)
• nested-loop join   (cost O(N.M), buffering can reduce to O(N+M))
• sort-merge join   (cost O(NlogN+MlogM))
• hash-join   (best case cost O(N+M.N/B), with B memory buffers)
COMP9315 21T1 ♢ DBMS Overview ♢ [9/11]
<< ∧ >>
❖ DBMS Architecture

Most RDBMSs are client-server systems:

COMP9315 21T1 ♢ DBMS Overview ♢ [10/11]
<< ∧ ❖ DBMS Architecture (cont) Layers within the DBMS server:  COMP9315 21T1 ♢ DBMS Overview ♢ [11/11] Produced: 15 Feb 2021