W4112 Database Systems Implementation, Spring 2022
Homework 1, due 2/16/2022.
Homework should be submitted electronically using gradescope by 11:59pm on 2/16.
1. Find the technical specifications of:
Copyright By PowCoder代写 加微信 powcoder
• a Barracuda magnetic hard drive from Seagate (http://www.seagate.com). • a Barracuda SSD from Seagate (http://www.seagate.com).
• an Intel Optane SSD
Compare the devices according to price, capacity, sequential I/O bandwidth, and random I/O band- width. You might need to calculate/approximate these numbers from other parameters in the product specifications. If you can’t find exact pricing information from the manufacturer, estimate the price based on information from retailers, resellers, reviewers, etc. for the same or similar models.
From these numbers, estimate the following derived metrics: (a) Cost per terabyte; (b) Cost per MB/second for sequential I/O; (c) Cost per MB/second for random I/O. Based on your comparison, describe what kind of customer would prefer each kind of drive.
2. Given a set S of items to insert into an extandable hash table, does the order of insertion matter? In other words, could we end up with different hash table sizes (measured in total disk blocks used) if we insert the items in S in different orders? Either explain why the structure will always have the same size, or provide an example of two orderings of a set of insertions that result in different table sizes. The internal organization of individual buckets is not important for this question.
3. Answer the following questions about the FAST tree (see the additional readings on courseworks).
(a) The paper describes a SIMD width of 128 bits. Since that paper was written, SIMD widths using AVX512 have grown to 512 bits, which matches the size (64 bytes) of a cache line. How does this change the structure of the FAST tree?
(b) Suppose that the next generation of SIMD instructions uses 1024 bits, but that machines still have a cache line size of 64 bytes and there is no change to the maximum number of simultaneous outstanding cache misses. Would it be a good idea to extend the cache-line blocking structure to 2 cache lines rather than one, to match the new SIMD width? What components of the index lookup performance improve relative to a 512-bit SIMD scheme? In what ways might the performance worsen?
4. The DASH Tree uses a fingerprinting technique (Section 4.2 of the paper in the additional readings) to avoid some unnecessary work for nonmatching keys. Why do you think the authors used this data structure rather than a Bloom Filter?
5. Table 2 of the Learned Indexes paper shows that the studied learned indexes are not only faster than B-trees or FAST trees, but also significantly smaller. Explain at a high level where these space savings come from. (2 paragraphs should be enough here.)
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com