2021/4/28 Implementing Projection
Implementing Projection
The Projection Operation Sort-based Projection
Cost of Sort-based Projection Hash-based Projection
Cost of Hash-based Projection Projection on Primary Key Index-only Projection
Comparison of Projection Methods Projection in PostgreSQL
>>
COMP9315 21T1 ♢ Implementing Projection ♢ [0/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
1/17
2021/4/28 Implementing Projection
❖ The Projection Operation
Consider the query:
select distinct name,age from Employee; If the Employee relation has four tuples such as:
(94002, John, Sales, Manager, 32)
(95212, Jane, Admin, Manager, 39)
(96341, John, Admin, Secretary, 32)
(91234, Jane, Admin, Secretary, 21)
then the result of the projection is:
(Jane, 21) (Jane, 39) (John, 32) Note that duplicate tuples (e.g. (John,32)) are
eliminated.
COMP9315 21T1 ♢ Implementing Projection ♢ [1/15]
∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
2/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ The Projection Operation (cont)
ReliesonfunctionTuple projTuple(AttrList, Tuple)
rst arg is list of attributes
second arg is a tuple containing those attributes (and more)
return value is a new tuple containing only those attributes
Examples,usingtuplesoftype(id:int, name:text, degree:int)
projTuple([id], (1234,’John’,3778)) returns (id=1234)
projTuple([name,degree]), (1234,’John’,3778)) returns (name=’John’,degree=3778)
COMP9315 21T1 ♢ Implementing Projection ♢ [2/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
3/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ The Projection Operation (cont)
Without distinct, projection is straightforward
// attrs = [attr1,attr2,…]
bR = nPages(Rel)
for i in 0 .. bR-1 {
P = read page i
for j in 0 .. nTuples(P)-1 {
T = getTuple(P,j)
T’ = projTuple(attrs, T)
if (outBuf is full) write and clear
append T’ to outBuf
} }
if (nTuples(outBuf) > 0) write
Typically, bOutFile < bInFile (same number of tuples, but tuples are smaller)
COMP9315 21T1 ♢ Implementing Projection ♢ [3/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
4/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ The Projection Operation (cont)
With distinct, the projection operation needs to: 1. scan the entire relation as input
already seen how to do scanning
2. create output tuples containing only requested attributes
implementation depends on tuple internal structure
essentially, make a new tuple with fewer attributes and where the values may be computed from existing attributes
3. eliminate any duplicates produced two approaches: sorting or hashing
COMP9315 21T1 ♢ Implementing Projection ♢ [4/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
5/17
2021/4/28 Implementing Projection
❖ Sort-based Projection
<< ∧ >>
COMP9315 21T1 ♢ Implementing Projection ♢ [5/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
6/17
2021/4/28 Implementing Projection
❖ Sort-based Projection (cont) Requires a temporary le/relation .
for each tuple T in RelFile {
T’ = projTuple([attr1,attr2,…],T)
add T’ to TempFile
}
sort TempFile on [attrs]
for each tuple T in TempFile {
if (T == Prev) continue
write T to Result
Prev = T
}
Reminder:”for each tuple” meanspage-by-page,
tuple-by-tuple
COMP9315 21T1 ♢ Implementing Projection ♢ [6/15]
<< ∧ >>
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
7/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ Cost of Sort-based Projection
The costs involved are (assuming B=n+1 buffers for sort): scanningoriginalrelationRel: bR (withcR)
writing Temp relation: bT (smaller tuples, cT > cR, sorted)
sorting Temp relation:
2.bT.ceil(lognb0) whereb0=ceil(bT/B)
scanningTemp,removingduplicates: bT
writing the result relation: bOut (maybe less tuples)
Cost = sum of above = bR + bT + 2.bT.ceil(lognb0) + bT + bOut
COMP9315 21T1 ♢ Implementing Projection ♢ [7/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
8/17
2021/4/28 Implementing Projection
❖ Hash-based Projection Partitioning phase:
<< ∧ >>
COMP9315 21T1 ♢ Implementing Projection ♢ [8/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
9/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ Hash-based Projection (cont)
Duplicate elimination phase:
COMP9315 21T1 ♢ Implementing Projection ♢ [9/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
10/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ Hash-based Projection (cont)
Algorithm for both phases:
for each tuple T in relation Rel {
T’ = mkTuple(attrs,T)
H = h1(T’, n)
B = buffer for partition[H]
if (B full) write and clear B
insert T’ into B
}
for each partition P in 0..n-1 {
for each tuple T in partition P {
H = h2(T, n)
B = buffer for hash value H
if (T not in B) insert T into B // assumes B never gets full
}
write and clear all buffers
}
Reminder:”for each tuple” meanspage-by-page, tuple-by-tuple
COMP9315 21T1 ♢ Implementing Projection ♢ [10/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
11/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ Cost of Hash-based Projection
The total cost is the sum of the following:
scanningoriginalrelationR: bR writingpartitions: bP (bRvsbP?) re-readingpartitions: bP writingtheresultrelation: bOut
Cost=bR+2bP+bOut
To ensure that n is larger than the largest partition …
use hash functions (h1,h2) with uniform spread allocate at least sqrt(bR)+1 buffers
if insuf cient buffers, signi cant re-reading overhead
COMP9315 21T1 ♢ Implementing Projection ♢ [11/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
12/17
2021/4/28 Implementing Projection
<< ∧ >>
❖ Projection on Primary Key
No duplicates, so simple approach from above works:
bR = nPages(Rel)
for i in 0 .. bR-1 {
P = read page i
for j in 0 .. nTuples(P) {
T = getTuple(P,j)
T’ = projTuple([pk], T)
if (outBuf is full) write and clear
append T’ to outBuf
}
}
if (nTuples(outBuf) > 0) write
COMP9315 21T1 ♢ Implementing Projection ♢ [12/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
13/17
2021/4/28 Implementing Projection
<< ∧ >>
❖ Index-only Projection
Can do projection without accessing data le iff …
relation is indexed on (A1,A2,…An) (indexes described later)
projected attributes are a pre x of (A1,A2,…An)
Basic idea:
scan through index le (which is already sorted on attributes)
duplicates are already adjacent in index, so easy to skip
Cost analysis …
indexhasbipages (wherebi≪bR)
Cost = bi reads + bOut writes
COMP9315 21T1 ♢ Implementing Projection ♢ [13/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
14/17
2021/4/28 Implementing Projection
<< ∧ >> ❖ Comparison of Projection Methods
Dif cult to compare, since they make different assumptions:
index-only: needs an appropriate index
hash-based: needs buffers and good hash functions
sort-based: needs only buffers ⇒ use as default Best case scenario for each (assuming n+1 in-memory
buffers):
index-only: bi + bOut ≪ bR + bOut
hash-based: bR+2.bP+bOut
sort-based: bR+bT+2.bT.ceil(lognb0)+bT+bOut
We normally omit bOut … each method produces the same result
COMP9315 21T1 ♢ Implementing Projection ♢ [14/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
15/17
2021/4/28 Implementing Projection
<< ∧
❖ Projection in PostgreSQL
Code for projection forms part of execution iterators:
include/nodes/execnodes.h backend/executor/execQual.c
Types:
ProjectionInfo { type, pi_state, pi_exprContext }
ExprState { tag, flags, resnull, resvalue, ... }
Functions:
ExecProject(projInfo,...) ...extractsprojected data
check_sql_fn_retval(...) ...evaluatesattribute value
COMP9315 21T1 ♢ Implementing Projection ♢ [15/15]
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
16/17
2021/4/28 Implementing Projection
Produced: 6 Mar 2021
https://cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/projection/slides.html
17/17