2021/4/28 Nested-loop Join
Nested-loop Join
Join Example
Nested Loop Join
Block Nested Loop Join Cost on Example Query Block Nested Loop Join Index Nested Loop Join
>>
COMP9315 21T1 ♢ Nested-loop Join ♢ [0/8]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 1/10
2021/4/28 Nested-loop Join
❖ Join Example
SQL query on student/enrolment database:
select E.subj, S.name
from Student S join Enrolled E on (S.id = E.stude)
order by E.subj
And its relational algebra equivalent: Sort[subj] ( Project[subj,name] ( Join[id=stude]
(Student,Enrolled) ) )
Database: rS = 20000, cS = 20, bS = 1000, rE =
80000, cE = 40, bE = 2000
We are interested only in the cost of Join, with N buffers
COMP9315 21T1 ♢ Nested-loop Join ♢ [1/8]
∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 2/10
2021/4/28 Nested-loop Join
❖ Nested Loop Join Basic strategy (R.a ⨝ S.b):
Result = {}
for each page i in R {
pageR = getPage(R,i)
for each page j in S {
pageS = getPage(S,j)
for each pair of tuples tR,tS
from pageR,pageS { if (tR.a == tS.b)
Result = Result ∪ (tR:tS)
}}}
Needs input buffers for R and S, output buffer for “joined” tuples
Terminology: R is outer relation, S is inner relation Cost=bR.bS … ouch!
COMP9315 21T1 ♢ Nested-loop Join ♢ [2/8]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 3/10
2021/4/28 Nested-loop Join
❖ Block Nested Loop Join Method (for N memory buffers):
read N-2-page chunk of R into memory buffers for each S page
check join condition on all (tR,tS) pairs in buffers
repeat for all N-2-page chunks of R
<< ∧ >>
COMP9315 21T1 ♢ Nested-loop Join ♢ [3/8]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 4/10
2021/4/28 Nested-loop Join
<< ∧ >> ❖ Block Nested Loop Join (cont)
Best-case scenario: bR ≤ N-2
read bR pages of relation R into buffers
while whole R is buffered, read bS pages of S Cost = bR+bS
Typical-case scenario: bR > N-2
read ceil(bR/(N-2)) chunks of pages from R foreachchunk,readbS pagesofS
Cost = bR+bS.ceil(bR/N-2)
Note:alwaysrequiresrR.rS checksofthejoin condition
COMP9315 21T1 ♢ Nested-loop Join ♢ [4/8]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 5/10
2021/4/28 Nested-loop Join
<< ∧ >>
❖ Cost on Example Query
With N = 12 buffers, and S as outer and E as inner
Cost = bS + bE.ceil(bS/(N-2)) = 1000 + 2000.ceil(1000/10) = 201000
With N = 12 buffers, and E as outer and S as inner
Cost = bE + bS.ceil(bE/(N-2)) = 2000 + 1000.ceil(2000/10) = 202000
With N = 102 buffers, and S as outer and E as inner
Cost = bS + bE.ceil(bS/(N-2)) = 1000 + 2000.ceil(1000/100) = 21000
With N = 102 buffers, and E as outer and S as inner
Cost = bE + bS.ceil(bE/(N-2)) = 2000 + 1000.ceil(2000/100) = 22000
COMP9315 21T1 ♢ Nested-loop Join ♢ [5/8]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 6/10
2021/4/28 Nested-loop Join
<< ∧ >>
❖ Block Nested Loop Join
Why block nested loop join is actually useful in
practice …
Many queries have the form
select *
from R join S on (R.i = S.j)
where R.x = K
This would typically be evaluated as
Tmp = Sel[x=K](R)
Res = Join[i=j](Tmp, S)
If Tmp is small ⇒ may t in memory (in small #buffers) COMP9315 21T1 ♢ Nested-loop Join ♢ [6/8]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 7/10
2021/4/28 Nested-loop Join
❖ Index Nested Loop Join
A problem with nested-loop join:
needs repeated scans of entire inner relation S
If there is an index on S, we can avoid such repeated scanning.
Consider Join[i=j](R,S) :
for each tuple r in relation R {
use index to select tuples
from S where s.j = r.i
for each selected tuple s from S {
add (r,s) to result
}}
COMP9315 21T1 ♢ Nested-loop Join ♢ [7/8]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 8/10
2021/4/28 Nested-loop Join
❖ Index Nested Loop Join (cont) This method requires:
one scan of R relation (bR)
only one buffer needed, since we use R tuple- at-a-time
for each tuple in R (rR), one index lookup on S cost depends on type of index and number of
results
best case is when each R.i matches few S tuples
Cost = bR + rR.SelS (SelS is the cost of performing a select on S).
Typical SelS = 1-2 (hashing) .. bq (unclustered index) Trade-off: rR.SelS vs bR.bS, where bR ≪ rR and SelS
≪bS
COMP9315 21T1 ♢ Nested-loop Join ♢ [8/8]
<< ∧
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 9/10
2021/4/28 Nested-loop Join
Produced: 28 Mar 2021
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/join-nested-loop/slides.html 10/10