CS计算机代考程序代写 SQL algorithm COMP9315 21T1

COMP9315 21T1
Exercises 04
Implementing Sorting and Projection
DBMS Implementation
[Show with no answers]   [Show with all answers]
1.
2. You have an unsorted heap file containing 4500 records and a select query is asked that requires the file to be sorted. The DBMS uses an external merge-sort that makes efficient use of the available buffer space. 
Assume that: records are 48-bytes long (including a 4-byte sort key); the page size is 512-bytes; each page has 12 bytes of control information in it; 4 buffer pages are available.
a. How many sorted subfiles will there be after the initial pass of the sort algorithm? How long will each subfile be? 
[show answer]


b. How many passes (including the initial pass considered above) will be required to sort this file? 
[show answer]


c. What will be the total I/O cost for sorting this file? 
[show answer]


d. What is the largest file, in terms of the number of records, that you can sort with just 4 buffer pages in 2 passes? How would your answer change if you had 257 buffer pages? 
[show answer]


3.
4. For each of these scenarios:
i. a file with 10,000 pages and 3 available buffer pages.
ii. a file with 20,000 pages and 5 available buffer pages.
iii. a file with 2,000,000 pages and 17 available buffer pages.
5. answer the following questions assuming that external merge-sort is used to sort each of the files:
a. How many runs will you produce on the first pass?
[show answer]


b. How many passes will it take to sort the file completely?
[show answer]


c. What is the total I/O cost for sorting the file? (measured in #pages read/written)
[show answer]


6.
7. Consider processing the following SQL projection query: 
 select distinct course from Students;
8. 
where there are only 10 distinct values (0..9) for Student.course, and student enrolments are distributed over the courses as follows:
9. cid
10. course
11. #students
12. cid
13. course
14. #students
15. 0
16. BSc
17. 5,000
18. 1
19. BA
20. 4,000
21. 2
22. BE
23. 5,000
24. 3
25. BCom
26. 3,000
27. 4
28. BAppSc
29. 2,000
30. 5
31. LLB
32. 1,000
33. 6
34. MA
35. 1,000
36. 7
37. MSc
38. 1,000
39. 8
40. MEng
41. 2,000
42. 9
43. PhD
44. 1,000
45. 
Show the flow of records among the pages (buffers and files) when a hash-based implementation of projection is used to eliminate duplicates in this relation. 
Assume that:
◦ each page (data file page or in-memory buffer) can hold 1000 records
◦ the partitioning phase uses 1 input buffer, 3 output buffers and the hash function (x mod 3)
◦ the duplicate elimination phase uses 1 input buffer, 4 output buffers and the hash function (x mod 4)
46. 
[show answer]


47.
48. Consider processing the following SQL projection query: 
select distinct title,name from Staff;
49. 
You are given the following information:
◦ the relation Staff has attributes id, name, title, dept, address
◦ except for id, all are string fields of the same length
◦ the id attribute is the primary key
◦ the relation contains 10,000 pages
◦ there are 10 records per page
◦ there are 10 memory buffer pages available for sorting
50. Consider an optimised version of the sorting-based projection algorithm: The initial sorting pass reads the input and creates sorted runs of tuples containing only the attributes name and title. Subsequent merging passes eliminate duplicates while merging the initial runs to obtain a single sorted result (as opposed to doing a separate pass to eliminate duplicates from a sorted result).
a. How many sorted runs are produced on the first pass? What is the average length of these runs? 
[show answer]


b. How many additional passes will be required to compute the final result of the projection query? What is the I/O cost of these additional passes? 
[show answer]


c. Suppose that a clustered B+ tree index on title is available. Is this index likely to offer a cheaper alternative to sorting? Would your answer change if the index were unclustered? 
[show answer]


d. Suppose that a clustered B+ tree index on name is available. Is this index likely to offer a cheaper alternative to sorting? Would your answer change if the index were unclustered? 
[show answer]


e. Suppose that a clustered B+ tree index on (name,title) is available. Is this index likely to offer a cheaper alternative to sorting? Would your answer change if the index were unclustered? 
[show answer]