CS5481 Data Engineering Assignment 2
COMMON MISTAKES
Included unnecessary projections in the relational algebra expressions.
Failed to specify the algorithms used for each operation in the evaluation plan.
Failed to sort the input relations of merge join on the join attributes
Allocated memory blocks more than the available memory at one time.
Failed to correctly estimate the size of joins.
Failed to correctly estimate the size of disjunctive selection.
Failed to correctly calculate the blocking factor for intermediate results.
Failed to include the cost of writing intermediate results to disk for materialization.
CS5481 Data Engineering Assignment 2
INCOMPLETE SHORT ANSWERS
(a) [5%]
𝜎 age <= 30 ∧ age >= 20 ∧ gender=’male’ ∧ type=’E2’ ∧ (department = ‘dermatology’ or ‘endocrinology’) (
((Citizen ⋈ citizen‐id = c‐id Hospital‐record)
⋈ i-id insurance-id Insurance) ⋈ plan-no p-no Medical‐care‐plan)
(b) [10%]
18000000 * 1/2 * 1/10 * (1‐ (1‐1/59) * (1‐1/60)) * 1/10 = 3000
(c) [65%]
One possible evaluation plan using left‐deep join order and materialization:
⨝ plan‐no = p‐no Nested‐loop join
⨝ i‐id = insurance‐id Merge join
⨝ citizen‐id = c‐id Hash join
Sort i‐id
σ type = ‘E2’
linear search
σ department = ‘dermatology’
∨ department = ‘endocrinology’
linear search
σ age >= 20 ∧ age <= 30 ∧
gender = ‘male’
linear search
Insurance Medical‐care‐plan Hospital‐record Citizen
Memory allocation
Selection: 1 block for input and 59 blocks for output
Nested‐loop join: 1 block for each of the two inputs and 58 blocks for output
Sort (merge stage): 10 blocks for each of the inputs and output
Merge join: 25 blocks for each of the two inputs and 10 blocks for output
Hash join (partitioning): 12 blocks for input and 1 block for each of the 48
partitions
Statistic estimation
Number of tuples at the output of Selection of Citizen: 300,000
Number of blocks at the output of Selection of Citizen: 4,000
Number of tuples at the output of Selection of Hospital‐record: 600,000
CS5481 Data Engineering
Assignment 2
Number of blocks at the output of Selection of Hospital‐record: 6,000
Number of tuples at the output of Selection of Medical‐care‐plan: 60
Number of blocks at the output of Selection of Medical‐care‐plan: 1
Number of tuples at the output of Nested‐loop join: 900,000
Number of blocks at the output of Nested‐loop join: 27,273
Number of tuples at the output of Merge join: 60,000
Number of blocks at the output of Merge join: 2,400
Number of tuples at the output of Hash join: 3,000
Number of blocks at the output of Hash join: 167
Cost estimation (in number of block transfers (tT) and number of disk seeks (tS))
Selection of Citizen: 84000 tT + 136 tS
Selection of Hospital‐record: 186000 tT + 204 tS
Selection of Medical‐care‐plan: 11 tT + 2 tS
Nested‐loop join: (120000 + 1 + 27273) tT + (942 + 1) tS
Sort: 48000 tT + 3800 tS
Merge join: (27273 + 6000 + 2400) tT + 1571 tS
Hash join: ((2400+4000)*3) tT + ((2400/12+2400)+(4000/12+4000) + 48*2) tS
Total cost: 520158 tT + 13686 tS ~ 107 seconds
(d) [15%]
One possible better evaluation plan using any join orders:
⨝ i‐id = insurance‐id Merge join
Sort i‐id
⨝ plan‐no = p‐no Nested‐loop join
⨝ citizen‐id = c‐id Merge join
Sort c‐id
σ type = ‘E2’
linear search
σ department = ‘dermatology’
∨ department = ‘endocrinology’
linear search
σ age >= 20 ∧ age <= 30 ∧
gender = ‘male’
linear search
Insurance Medical‐care‐plan Hospital‐record Citizen
(e) [5%]