INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 12
Query Processing Part II
Copyright By PowCoder代写 加微信 powcoder
INFO20003 Database Systems © University of Melbourne
Remember this? Components of a DBMS
Will briefly touch upon …
Query processing module
Parser/ Compiler
Optimizer Executor
TODAY Joins
Concurrency control module
Transaction mgr.
Crash recovery module
Concurrency
control module
Storage module
File and access methods mgr.
Buffer pool mgr. Disk space mgr.
This is one of several possible architectures; each system has its own slight variations.
Index files Heap files
INFO20003 Database Systems © University of Melbourne
• Nested loops join • Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems © University of Melbourne
Are very common and can be very expensive (cross product in the worst case)
There are many implementation techniques for join operations
Join techniques we will cover:
1. Nested-loops join 2. Sort-mergejoin 3. Hashjoin
INFO20003 Database Systems © University of Melbourne
Equality Joins With One Join Column
Example: SELECT *
FROM Reserves R1, Sailors S1
WHERE R1.sid=S1.sid
• In algebra: RS. They are very common and need to be carefully optimized.
• R X S is large; so, R X S followed by a selection is inefficient.
Left / Outer Right / Inner
• Join is associative and commutative: − AxB == BxA
− Ax(BxC)==(AxB)xC
• Cost metric : Number of pages; Number of I/O
15,80 15,44
INFO20003 Database Systems © University of Melbourne
Schema for Examples
Sailors (sid: integer, sname: string, rating: integer, age: real) Reserves (sid: integer, bid: integer, day: dates, rname: string)
• Sailors (S):
–80 tuples per page, 500 pages
–NPages(S) = 500, NTuplesPerPage(S) = 80 –NTuples(S) = 500*80 = 40000
• Reserves (R):
–100 tuples per page, 1000 pages –NPages(R) = 1000, NTuplesPerPage(R) =100 –NTuples(R) = 100000
INFO20003 Database Systems © University of Melbourne
Simple Nested Loops Join
• For each tuple in the outer relation R, we scan the entire inner relation S
Pseudo code:
foreach tuple r in R do foreach tuple s in S do
if ri == sj then add
15,80 15,44
Cost (SNJL) = NPages(Outer) + NTuples(Outer) * NPages(Inner)
• Our example:
Cost (SNLJ)= 1000+ 100*1000*500
= 50001000 (I/O)
INFO20003 Database Systems © University of Melbourne
Page-Oriented Nested Loops Join
• For each page of R
–get each page of S
–write out matching pairs of tuples
Pseudo code:
foreach page bR in R do foreach page bS in S do
foreach tuple r in bR do foreach tuple s in bSdo
if ri == sj then add
• Our example:
Cost (PNLJ)= 1000+1000*500 = 501000 (I/O)
15,80 15,44
Cost (PNJL) = NPages(Outer) + NPages(Outer) * NPages(Inner)
INFO20003 Database Systems © University of Melbourne
Block Nested Loops Join
• Page-oriented NL doesn’t exploit extra memory buffers • Alternative approach:
–Use one page as an input buffer for scanning the inner S, one page as the output buffer, and use all remaining pages to hold ‘block’ of outer R
• For each matching tuple r in R-block, s in S-page, add
R & S Join Result
block of R tuples (B-2 pages)
Input buffer for S Output buffer
INFO20003 Database Systems © University of Melbourne
Block Nested Loops Join Cost
Cost (BNJL) = NPages(Outer) + NBlocks(Outer) * NPages(Inner)
𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁𝑁(𝑂𝑂𝑂𝑂𝑂𝑂𝑁𝑁𝑂𝑂) • NBlocks(Outer) = 𝐵𝐵−2
• Our example:
Let’s say we have 102 pages of space in memory, and consider
Reserves (R) as the outer and Sailors (S) as the inner table.
NBlocks(R) = 1000/(102-2) = 10 Cost(BNLJ) = 1000 + 10* 500 = 6000 I/O
15,80 15,44
INFO20003 Database Systems © University of Melbourne
Query Processing: Joins
• Nested loops join • Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems © University of Melbourne
Sort-Merge Join (R S) i=j
• Sort R and S on the join column, then scan them to do a merge (on join column), and output result tuples
• Sorted R is scanned once;
• Each S group of the same key values is scanned once per matching R tuple (typically means Sorted S is scanned once too).
• Useful when:
–one or both inputs are already sorted on join attribute(s) –output is required to be sorted on join attributes(s)
INFO20003 Database Systems © University of Melbourne
Sort-Merge Join Cost
Cost (SMJ) = Sort(Outer) + Sort(Inner)
+ NPages(Outer) + NPages(Inner)
Sort inputs Merge inputs
Sort(R) = External Sort Cost = 2*NumPasses*NPages(R)
Our example:
Let’s say that both Reserves and Sailors can be sorted in 2 passes, then:
Cost(SMJ) = Sort R + Sort S + NPages(R) + NPages(S) = 2*2*NPages(R)+ 2*2*NPages(S)
+ NPages(R) + NPages (S)
= 5*1000 + 5* 500 = 7500 I/O
INFO20003 Database Systems © University of Melbourne
Query Processing: Joins
• Nested loops join • Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems © University of Melbourne
• Partition both relations using hash function h: R tuples in partition l will only match S tuples in partition I
• Read in a partition of R, hash it using h2 (<> h!). Scan matching partition of S, probe hash table for matches
Original Relation
Partitions
INPUT hash
hash fn h2
B-1 Join Result
Disk B main memory buffers Disk
Partitions of R & S
Hash table for partition Ri (k < B-1 pages)
Input buffer Output
B main memory buffers Disk
INFO20003 Database Systems © University of Melbourne
Hash-Join Cost
1. Inpartitioningphase,weread+writebothrelations 2. Inmatchingphase,wereadbothrelations
Cost (HJ) = 2 * NPages(Outer) + 2* NPages(Inner) Create partitions
+ NPages(Outer) + NPages(Inner)
Match partitions
• Our example:
Cost(HJ) = 2*NPages(R) + 2*NPages(S) + NPages(R) + NPages(S)
= 3 * 1000 + 3* 500 = 4500 I/Os
INFO20003 Database Systems © University of Melbourne
Watch this video if you are confused
https://www.youtube.com/watch?v=o1dMJ6-CKzU
INFO20003 Database Systems © University of Melbourne
Query Processing: Joins
• Nested loops join • Sort-merge join
• Hash join
• General joins
Readings: Chapter 14, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems © University of Melbourne
General Join Conditions
• Equalities over several attributes (e.g., R.sid=S.sid AND R.rname=S.sname):
–For Sort-Merge and Hash Join, sort/partition on combination of the two join columns
• Inequality conditions (e.g., R.rname < S.sname): –Hash Join, Sort Merge Join not applicable
–Block NL quite likely to be the best join method here
INFO20003 Database Systems © University of Melbourne
• A virtue of relational DBMSs:
– Queries are composed of a few basic operators
– Implementation of operators can be carefully tuned
– Important to do this
• Many alternative implementations for each operator –No universally superior technique for most operators
• Must consider alternatives for each operation in a query and choose best one based on system statistics...
–Part of the broader task of optimizing a query composed of several operations
INFO20003 Database Systems © University of Melbourne
What’s examinable
• Understand alternatives for join operator implementations
–Be able to calculate the cost of alternatives • Important for Assignment 3 as well
INFO20003 Database Systems © University of Melbourne
Next Lecture
• Query optimization
‒ How does a DBMS pick a good query plan?
INFO20003 Database Systems © Univers-it-y of Melbourne
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com