代写代考 INFO20003 Database Systems

INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 12
Query Processing Part II

INFO20003 Database Systems © University of Melbourne

Remember this? Components of a DBMS
Will briefly touch upon …
Query processing module
Parser/ Compiler
Optimizer Executor
TODAY Joins
Concurrency control module
Transaction mgr.
Crash recovery module
Concurrency
control module
Storage module
File and access methods mgr.
Buffer pool mgr. Disk space mgr.
This is one of several possible architectures; each system has its own slight variations.
Index files Heap files
INFO20003 Database Systems © University of Melbourne

• Nested loops join • Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems © University of Melbourne

Are very common and can be very expensive (cross product in the worst case)
There are many implementation techniques for join operations
Join techniques we will cover:
1. Nested-loops join 2. Sort-mergejoin 3. Hashjoin
INFO20003 Database Systems © University of Melbourne

Equality Joins With One Join Column
Example: SELECT *
FROM Reserves R1, Sailors S1
WHERE R1.sid=S1.sid
• In algebra: RS. They are very common and need to be carefully optimized.
• R X S is large; so, R X S followed by a selection is inefficient.
Left / Outer Right / Inner
• Join is associative and commutative: − AxB == BxA
− Ax(BxC)==(AxB)xC
• Cost metric : Number of pages; Number of I/O
15,80 15,44
INFO20003 Database Systems © University of Melbourne

Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer, day: dates, rname: string)
• Sailors (S):
–80 tuples per page, 500 pages
–NPages(S) = 500, NTuplesPerPage(S) = 80 –NTuples(S) = 500*80 = 40000
• Reserves (R):
–100 tuples per page, 1000 pages –NPages(R) = 1000, NTuplesPerPage(R) =100 –NTuples(R) = 100000
INFO20003 Database Systems © University of Melbourne

Simple Nested Loops Join
• For each tuple in the outer relation R, we scan the entire inner relation S
Pseudo code:
foreach tuple r in R do foreach tuple s in S do
if ri == sj then add to result
15,80 15,44
Cost (SNJL) = NPages(Outer) + NTuples(Outer) * NPages(Inner)
• Our example:
Cost (SNLJ)= 1000+ 100*1000*500
= 50001000 (I/O)
INFO20003 Database Systems © University of Melbourne

Page-Oriented Nested Loops Join
• For each page of R
–get each page of S
–write out matching pairs of tuples , where r is in R-page and S is in S-page
Pseudo code:
foreach page bR in R do foreach page bS in S do
foreach tuple r in bR do foreach tuple s in bSdo
if ri == sj then add to result
• Our example:
Cost (PNLJ)= 1000+1000*500 = 501000 (I/O)
15,80 15,44
Cost (PNJL) = NPages(Outer) + NPages(Outer) * NPages(Inner)
INFO20003 Database Systems © University of Melbourne

Block Nested Loops Join
• Page-oriented NL doesn’t exploit extra memory buffers • Alternative approach:
–Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold ‘block’ of outer R
• For each matching tuple r in R-block, s in S-page, add to result. Then read next R-block, scan S, etc
R & S Join Result
block of R tuples (B-2 pages)
Input buffer for S Output buffer
INFO20003 Database Systems © University of Melbourne

Block Nested Loops Join Cost
Cost (BNJL) = NPages(Outer) + NBlocks(Outer) * NPages(Inner)
𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁(𝑂𝑂𝑂𝑂𝑂𝑂𝑁𝑁𝑂𝑂) • NBlocks(Outer) = 𝐵𝐵−2
• Our example:
Let’s say we have 102 pages of space in memory, and consider
Reserves (R) as the outer and Sailors (S) as the inner table.
NBlocks(R) = 1000/(102-2) = 10 Cost(BNLJ) = 1000 + 10* 500 = 6000 I/O
15,80 15,44
INFO20003 Database Systems © University of Melbourne

Query Processing: Joins
• Nested loops join • Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems © University of Melbourne

Sort-Merge Join (R S) i=j
• Sort R and S on the join column, then scan them to do a merge (on join column), and output result tuples
• Sorted R is scanned once;
• Each S group of the same key values is scanned once per matching R tuple (typically means Sorted S is scanned once too).
• Useful when:
–one or both inputs are already sorted on join attribute(s) –output is required to be sorted on join attributes(s)
INFO20003 Database Systems © University of Melbourne

Sort-Merge Join Cost
Cost (SMJ) = Sort(Outer) + Sort(Inner)
+ NPages(Outer) + NPages(Inner)
Sort inputs Merge inputs
Sort(R) = External Sort Cost = 2*NumPasses*NPages(R)
Our example:
Let’s say that both Reserves and Sailors can be sorted in 2 passes, then:
Cost(SMJ) = Sort R + Sort S + NPages(R) + NPages(S) = 2*2*NPages(R)+ 2*2*NPages(S)
+ NPages(R) + NPages (S)
= 5*1000 + 5* 500 = 7500 I/O
INFO20003 Database Systems © University of Melbourne

• Partition both relations using hash function h: R tuples in partition l will only match S tuples in partition I
• Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, probe hash table for matches
Original Relation
Partitions
INPUT hash
hash fn h2
B-1 Join Result
Disk B main memory buffers Disk
Partitions of R & S
Hash table for partition Ri (k < B-1 pages) Input buffer Output B main memory buffers Disk INFO20003 Database Systems © University of Melbourne Hash-Join Cost 1. Inpartitioningphase,weread+writebothrelations 2. Inmatchingphase,wereadbothrelations Cost (HJ) = 2 * NPages(Outer) + 2* NPages(Inner) Create partitions + NPages(Outer) + NPages(Inner) Match partitions • Our example: Cost(HJ) = 2*NPages(R) + 2*NPages(S) + NPages(R) + NPages(S) = 3 * 1000 + 3* 500 = 4500 I/Os INFO20003 Database Systems © University of Melbourne Watch this video if you are confused https://www.youtube.com/watch?v=o1dMJ6-CKzU INFO20003 Database Systems © University of Melbourne Query Processing: Joins • Nested loops join • Sort-merge join • Hash join • General joins Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems INFO20003 Database Systems © University of Melbourne General Join Conditions • Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname): –For Sort-Merge and Hash Join, sort/partition on combination of the two join columns • Inequality conditions (e.g., R.rname < S.sname): –Hash Join, Sort Merge Join not applicable –Block NL quite likely to be the best join method here INFO20003 Database Systems © University of Melbourne • A virtue of relational DBMSs: – Queries are composed of a few basic operators – Implementation of operators can be carefully tuned – Important to do this • Many alternative implementations for each operator –No universally superior technique for most operators • Must consider alternatives for each operation in a query and choose best one based on system statistics... –Part of the broader task of optimizing a query composed of several operations INFO20003 Database Systems © University of Melbourne What’s examinable • Understand alternatives for join operator implementations –Be able to calculate the cost of alternatives • Important for Assignment 3 as well INFO20003 Database Systems © University of Melbourne Next Lecture • Query optimization ‒ How does a DBMS pick a good query plan? INFO20003 Database Systems © Univers-it-y of Melbourne 程序代写 CS代考加微信: powcoder QQ: 1823890830 Email: powcoder@163.com

Related Posts