Sample Solutions to Midterm #3
You can ask the TAs or me (Ed) for more detailed explanations during office hours; however, all mark changes must go through me.
Page 2
Q1. a,c
Q2. a, d, e (We accepted either T or F for choice “a”.) Q3. a,b,c,d,e
Page 3 Q4.
floor( 4096 bytes/page / (4 + 4 + 10) bytes/data entry ) = 227 data entries / page
ceiling(5,000,000D.E./227D.E./leafpage)=22,027leafpages
ceiling ( 22,027 pointers / (227 + 1 ) pointers / parent page ) = 97 parent
pages … in other words, 97 level-1 pages
ceiling ( 97 pointers / (227 + 1 ) pointers / parent page ) = 1 parent page … in other words, 1 root page
1
Page 4
Q5a. Compute the cost in pages (i.e., # of page I/Os).
cost = (probe – 1) + ceiling( expected # of leaf pages to be referenced ) + + ceiling( expected # of data pages to be referenced)
We note that the number of data pages to be referenced is the same as the # of rows that qualify—when using an unclustered index. Recall that the worst-case scenario is 1 page per qualifying row.
= (3 – 1) + ceiling( RF * # of leaf pages ) + ceiling( RF * # of rows ) = (3 – 1) + ceiling( (1/500) * (1/1000) * 22,027 )
+ ceiling( (1/500) * (1/1000) * 5,000,000)
= 2 + ceiling( 1/500,000 * 22,027 ) + ceiling( 1/500,000 * 5,000,000) = 2 + 1 + 10
= 13 page I/Os
Note: For Q5a and Q5b, we allowed students to replace “22,027” with the answer they got from Question 4, as long as it didn’t trivialize the answer.
Q5b. Compute the cost in pages (i.e., # of page I/Os).
cost = (probe – 1) + ceiling( expected # of leaf pages to be referenced ) +
+ ceiling( expected # of data pages to be referenced)
= (3 – 1) + ceiling( RF * # of leaf pages ) + ceiling( RF * # of rows )
= (3 – 1) + ceiling( 1/500 * 22,027 ) + ceiling( 1/500 * 50,000 )
… where 50,000 is the number of pages in the Reserves table, which is easily computed from: ceiling( 5,000,000 / 100 ) = 50,000 pages
= 2 + 45 + 100 = 147 page I/Os
2
Page 5
Q6. 3 passes are required to sort the Sailors table:
ceiling( 640 / 16 ) = 40 ceiling( 40/(16–1)) =3 ceiling( 3 / (16 – 1) ) = 1
5 passes are required to sort the Reserves table:
ceiling( 64,000 / 16 ) ceiling( 4,000/(16–1)) ceiling( 267 / (16 – 1) ) ceiling( 19 / (16 – 1) ) ceiling( 2 / (16 – 1) )
SMJ cost = 640(2)(3) + 64,000(2)(5)
= 4,000 = 267 = 19 =2
=1
+ (640 + 64,000) = 708,480 page I/Os
Page 6
Q7. BNL formula: We want 750 + ceiling( 750 / (B – 2) ) * 75,000 ≤ 301,000.
Solve for B.
ceiling( 750 / (B – 2) ) * 75,000 ≤ 300,250
ceiling( 750 / (B – 2) )
So, we want:
ceiling(750/(B–2)) ≤ 4 750/4 ≤ B–2
189.5 ≤ B Therefore, B = 189 or B = 190.
≤ 4.003
Test each:
o B=189ceiling(750/(189–2))=ceiling(4.01)=5…notgood o B=190ceiling(750/(190–2))=ceiling(3.99)=4…good
Conclusion: Use B = 190 buffer pages. B = 189 will not work.
3
Page 7
Q8a. Hash Join
Partitioning Phase:
# of pages per partition of Sailors = ceiling( 300 / (20 – 1) ) = 16
# of pages per partition of Reserves = ceiling( 60,000 / (20 – 1) ) =
3158
Note that 16 pages of each Sailors partition fits into B – 2 = 20 – 2 = 18 buffer pages. So, we advance to the join phase.
Join Phase:
Cost = 3*(M + N) = 3 * (300 + 60,000) = 180,900 page I/Os
Q8b. Hash Join (cont.)
Tofinishin3(M+N)pageI/Os,werequire:ceiling(x/(B–1))= B–2
We already know what B is; therefore, we need to solve for x.
ceiling(x/(50–1))= 50–2
ceiling(x/ 49 ) = 48
x=48(49)=2,352pages
To verify (not required):
o ceiling(2,352/(50–1))=48=(B–2)…whichisgood
o But,ceiling(2,353/(50–1))=49>(B–2)…whichisbad
o Therefore,themaximumsizeoftheSailorstableis2,352pages.
Q8c. Hash Join (cont.)
Note that ceiling( 64,000 / (1000 – 1 ) ) = ceiling( 64.1 ) = 65
So, the partitions of Reserves already fit into member with a lot of buffer
pages left over.
4
Therefore, Sailors can be arbitrarily large (“infinite”) and HJ will still finish in 3*(M + N) page I/Os.
Page 8
Q9. Using the appropriate reduction factors, we get:
Cost =(1/75+1/75)*(1/365)*2,000,000 = (2 / 75) * (1/365) * 2,000,000
= 146 records
Page 9
Q10.
Left child’s cost
= (4 – 1) + ceiling( 1/75 * # of leaf pages ) + ceiling( 1/75 * # of data pages )
= 3 + ceiling( 1/75 * 19,608 ) + ceiling( 1/75 * 50,000)
where the 50,000 comes from:
floor( 4,096 bytes/page / 100 bytes/record) = 40 rec/page
and
ceiling( 2,000,000 rows / 40 rec/page ) = 50,000 pages
= 3 + 262 + 667 page I/Os = 932 page I/Os
= (# of qualifying records) * (1.2 + 1) = 1/75 * 2,000,000 * (1.2 + 1)
= 26,667 * 2.2
= 58,667 page I/Os
= 59,599 page I/Os
Right child’s cost
Total cost
5
Page 10 Q11 (a)
π aName,aTown
σpSport = “Tennis” (on-the-fly)
►◄
INL join on aID (on-the-fly)
Participation
= cost of a table scan of Athlete
where floor(4096 bytes/page / 200 bytes/record) = 20 rec/page
and ceiling( 60,000 records / 20 records/page ) = 3,000 pages = 3000 page I/Os
= (# of qualifying records) * (1.2 + # Participation pages per Athletes tuple)
= ( 1/75 * 60,000 ) * ( 1.2 + 2,000,000 / 60,000 ) = 800 * ( 1.2 + 33.3 )
= 800 * 34.5
= 27,600 page I/Os
= 3,000 + 27,600 = 30,600 page I/Os (or close to it)
(pipeline)
σ aAge = 25
Athlete
(b)
Left child’s cost
Right child’s cost
Total cost
6