程序代写代做 algorithm go database C arm ER Problem Set 4

Problem Set 4
Note: For simplicity, you may assume that KB, MB, and GB refer to 103, 106, and 109 bytes,
respectively.
Problem 1
Assume the following simple Btree with n4:
This tree consists of only a root node and three leaf nodes. Recall that the root node must have between 2 and n child pointers basically RIDs and between 1 and n1 key values that separate the subtrees. In this case, the values 8 and 16 mean that to search for a value strictly less than 8 you visit the leftmost child, for at least 8 and strictly less than 16, you visit the second child, and otherwise the third child. Each internal node none in this figure has between 2 and 4 child pointers and between 1 and 3 key values always one less than the number of pointers, and each leaf node has between 2 and 3 key values, each with an associated RID to its left pointing to a record with that key value in the underlying indexed table. Sketch the state of the tree after each step in the following sequence of insertions and deletions:
Delete 14, Delete 4, Insert 9, Insert 21, Insert 7, Insert 17, Insert 12, Delete 8
Note that for insertions, there are two algorithms, one that splits a full node without trying to offload data to a direct neighbor, and one that first tries to balance with a direct neighbor in the case of a full node. Please use the first algorithm! For deletion, try to first merge with a sibling in the case of underflow, and only balance with a neighbor if you cannot merge with any sibling.
December 3, 2019

Problem 2
You are given a sequence of 8 key values and their 8bit hash values that need to be inserted into an extendible hash table where each hash bucket holds at most two entries. The sequence is presented in Table 1 below. You do not need to know what function was used to compute the hashes, since the resulting hashes are already given. In Figure 1 you can see the state of the hash table after inserting the first two keys, where we only use the first leftmost bit of each hash to organize the buckets. Now insert the remaining six keys k2 to k7 in the order given. Sketch the bucket address table and the buckets after each insertion.
keys
Hash values
k0 k1 k2 k3 k4 k5 k6 k7
10001101
01000100
01010101
01001010
10011001
11010001
10101010
10000111
Table 1: Sequence of 8 keys and their corresponding 8bit hashes.
Figure 1: Hash Table state after the insertion of the first two keys
Problem 3:
In the following, assume the latencytransferrate model of disk performance, where we estimate disk access times by allowing blocks that are consecutive on disk to be fetched with a single seek time and rotational latency cost as shown in class. Also, we use the term RID Record ID to refer to an 8byte logical pointer that can be used to locate a record tuple in a table.
You are given the following very simple database schema that models a database keeping track of airline passengers and the flights they take. A passenger is identified by a pID and has a name and a city and country of residence. Airlines have an aID, name, and country they are based on. Airports are identified by a 3letter airport code e.g., JFK or LAX, and have a nearby city e.g., New York for JFK and a country. A city can of course have several airports. A flight has a unique fID, departure and

arrival timestamps time plus date and airports, and is run by a particular airline; we only model direct flights and if a flight happens every day, there will be a separate entry for each day in the Flight table. For simplicity, we assume that every flight arrives on the same day that it departs no redeye flights that go overnight. Finally, Passengers take flights. The details of the schema are as follows:
Passenger pID, pName, pCity, pCountry
AirlineaID, aName, aCountry
AirportapCode, apCity, apCountry
FlightfID, aID, deptTime, arrTime, departAp, destinAp, flightdate
aID refs Airline; departAp and destinAp ref Airport TakespID, fID pID refs Passenger and fID refs Flight
a Draw an ER diagram that models this relational schema, identifying weak entities and the cardinalities of the relationships.
Now assume there are 1 billion passengers, 1000 airlines, 10000 airports, 20 million flights, and 2 billion records in the Takes table. Each Takes tuple is of size 20 bytes, and all other tuples are 100 bytes. For simplicity, assume that pCity and apCity take 20 bytes each, and each airport code takes 4 bytes. Consider the following queries:
SELECT P.pID, P.pName FROM Passenger P
WHERE P.pcity New York
SELECT P.pID, P.pName
FROM Passenger P JOIN Takes T JOIN Flight F WHERE F.destinAp BAR
SELECT P.pID, P.pName
FROM Passenger P JOIN Takes T JOIN Flight F, Airport A
WHERE F.destinAp A.apCode AND P.pcity Dingleboro AND A.apCity Detroit
b For each query, describe in one sentence what it does. That is, what task does it perform?
In the following, to describe how a query could be best executed, draw a query plan tree and state what algorithms should be used for the various selections and joins. Provide estimates of the running times, assuming these are dominated by disk accesses.
c Assume that there are no indexes on any of the relations, and that all relations are unclustered not sorted in any way. Describe how a database system would best execute all three queries in this

case, given that 1GB of main memory is available for query processing, and assuming a hard disk with 10ms for seek time plus rotational latency i.e., a random access requires 10ms to find the right position on disk and a maximum transfer rate of 100 MBs. Assume that 1 of all passengers live in New York, but only 50 passengers live in Dingleboro. Also, assume that 1 of all flights had an airport in Detroit as their destination, while only 10 flights ever went to BAR. Otherwise, assume independence between different attributes.
c Consider a sparse clustered Btree index on pid in the Passenger table, and a dense unclustered Btree index on fid in the Taken table. For each index, what is the height and the size of the tree? How long does it take to fetch a single record with a particular key value using these indexes? How long would it take to fetch all, say, 50 records matching a particular fid value in the case of the second index?
d Suppose that for each query, you could create just one index structures to make the query faster. What index structures would you create, and how would this change the evaluation plans and running times? In other words, redo b for each query, using your best choice of index for that query.
Problem 4:
a Consider a hard disk with 2400 RPM and 2 doublesided platters. Each surface has 100,000 tracks and 3000 sectors per track. For simplicity, we assume that the number of sectors per track does not vary between the outer and inner area of the disk. Each sector has 512 bytes. What is the capacity of the disk? What is the maximum rate at which data can be read from disk, assuming that we can only read data from one surface at a time? What is the average rotational latency?
b Suppose we have the same disk as in a, where the average seek time time for moving the readwrite arm is 5ms. How long does it take to read a file of size 40 KB? How about a file of 4 MB? Use both the block model 4KB per block and the latencytransferrate model, and compare.
c Suppose you have a file of size 50 GB that must be sorted, and you have only 500 MB of main memory to do the sort plus unlimited disk space. Estimate the running time of the IOefficient merge sort algorithm from the class on this data, using the hard disk from part b. Use the latencytransferrate model of disk performance, and ignore CPU performance. Assume that in the merge phase, all sorted runs from the initial phase are merged together in a single merge pass.
d Suppose you use two instead of one merge phases in the scenario in c, a 10way merge followed by another 10way merge. Estimate the running time.