CS计算机代考程序代写 SQL Functional Dependencies data structure database concurrency algorithm ExamWithSolution

ExamWithSolution

CS 4320 Fall 2019 Final Exam Page � of �2 15

Part A) SQL Queries. (20 points)

Consider the database schema created by the following SQL commands:

CREATE TABLE Sailors (sid integer PRIMARY KEY, 

sname varchar(20), rating integer, age real);

CREATE TABLE Boats (bid integer PRIMARY KEY, 

bname varchar(20), color varchar(20));

CREATE TABLE Reserves(sid integer, bid integer, 

day date, PRIMARY KEY (sid, bid, day), 

FOREIGN KEY (sid) references Sailors (sid), 

FOREIGN KEY (bid) references Boats (bid));

The database stores information on sailors (table Sailors), on boats (table
Boats), and reservations for boats by sailors (table Reserves). Duplicates in
the query result are okay (e.g., if the question asks to retrieve sailor names, it is
okay if the query returns the same name multiple times). You may assume that the
database contains no NULL values and that no table is empty.

A.1) Write an SQL query retrieving pairs of boats and sailors (i.e., bid and sid)
that satisfy all of the following conditions: the sailor made a reservation for the
boat, the sailor never made a reservation for a different boat, and the boat was
never reserved by another sailor. (5 points)

Select R.Sid, R.bid from Reserves R where not exists (select * from Reserves R2
where R2.sid = R.sid and r2.bid <> R.bid) and not exists (select * from Reserves
R3 where r3.bid = R.bid and R3.sid <> R.sid);

http://r2.bid
http://R.bid
http://r3.bid
http://R.bid
Brian

Brian

CS 4320 Fall 2019 Final Exam Page � of �3 15

A.2) Write an SQL query retrieving sailors (attribute sid) who made two
consecutive reservations for the same boat (i.e., the sailor did not make any
reservation for different boats between the two reservations for the same boat).
Note that the day attribute in Reserves stores the reservation date. You may
assume that sailors reserve at most one boat per day. 

(5 points)

Select Sid from Reserves R where exists (select * from Reserves R2 where R2.sid
= R.sid and R2.bid = R.bid and R2.date>R.date and not exists (select * from
Reserves R3 where R3.sid=R.sid and R3.bid=R.bid and R.date T.count and T2.sid =
T.sid);

Brian

Brian

CS 4320 Fall 2019 Final Exam Page � of �5 15

Part B) Processing Cost. (15 points)

Note: we do not consider the cost of writing out the final result in the following.

B.1) We perform a block nested loops join between relations R and S with R as
outer relation. R consumes x pages on disk and S consumes 2*x pages on disk
(where x is an integer). We use 12 buffer pages (total, including input and output
buffers) for the join. The join cost is 30 page reads. How large is x? Justify by
showing your calculations. 

(7.5 points)

After subtracting one page for the input buffer for the inner operand and one output
buffer, we can store blocks of size 10 pages. The join cost is x + ceil(x/10)*2*x.
Assume x<=10 which means that we have one block. Then the join cost simplifies to x+2*x. Setting x=10 leads to a cost of 30 page reads. Using a smaller x leads to smaller cost while using a larger x leads to larger cost. Hence, x=10 is the unique solution. Brian Brian CS 4320 Fall 2019 Final Exam Page � of �6 15 B.2) We perform an index nested loops join between relations R and S with R as outer relation. Each tuple of R consumes 10 bytes and R has 5,000 tuples. Each page stores 1,000 bytes. We have an index for S on the join column and enough main memory to execute the join (the precise amount does not matter nor does the size of S). How expensive can each index access be (measured by the number of page reads per access, assuming that this number is an integer) such that the total join cost is at most 10,000 page reads? Justify by showing your calculations. (5 points) For R, each page stores 100 tuples. Hence, R consumes 5,000/100 = 50 pages in total. The join cost is therefore 50 + 5,000 * x page reads where x is the number of page reads per index access. If 50+5,000 * x<=10,000 and x is integer then x can be one at most. B.3) We use a hard disk for which page writes take twice as long as page reads. Propose a new cost formula for the hash join for this scenario. The formula should depend on parameters m and n, representing the number of pages in the two input relations. Assume that enough main memory is available to partition data in one single pass. Justify your formula in one to two sentences. 
 (2.5 points) The formula (m+n)*4 (as opposed to (m+n)*3) integrates the fact that writing out data after the first partitioning pass is twice as expensive as reading data. 
 Brian Brian CS 4320 Fall 2019 Final Exam Page � of �7 15 Part C) Database Design and Normalization. (20 points) C.1) Consider the set of functional dependencies F={A→C, D→E, A→D, B→A}. Relation R has attributes A, B, C, D, and E. Which of those five attributes are keys for R? Justify by calculating the attribute closure. 
 (5 points) The attribute closure of A is ACDE. A is therefore no key. The attribute closure of B is BACDE, it is therefore a key. The attribute closure of C is C, it is no key. The attribute closure of D is DE, it is no key. The attribute closure of E is E, it is no key. Brian Brian CS 4320 Fall 2019 Final Exam Page � of �8 15 C.2) Consider the set of functional dependencies F={A→B, A→C}. Relation R has attributes A, B, and C. Which decompositions of R (into two relations such that exactly one attribute appears in both) are lossless join decompositions? Justify for each decomposition in one or two sentences. 
 (5 points) Assume attribute A appears in both relations then we decompose into relations (AB) and (AC) or into (ABC), (A). A is a key in all those relations (as ABC is its attribute closure), hence the intersection of attributes contains a key for at least one decomposed relation. Hence, all decompositions are lossless join decompositions. Assume attribute B appears in both relations then we decompose into relations (BA), (BC) or (BCA), B. The attribute closure of B is only B so only the second possibility is a lossless join decomposition (since the intersection of attributes, B, is a key for relation with attribute B). Assume attribute C appears in both relations then we decompose into relations (CA), (CB) or (CAB), C. The attribute closure of C is only C so only the second possibility is a lossless join decomposition (since the intersection of attributes, C, is a key for relation with attribute C). Brian Brian Brian CS 4320 Fall 2019 Final Exam Page � of �9 15 C.3) Consider the database created by the following SQL commands: CREATE TABLE Employee (eid integer PRIMARY KEY, 
 ename varchar(20)); CREATE TABLE Customer (cid integer PRIMARY KEY, 
 cname varchar(20), customerRepresentative integer, FOREIGN KEY (customerRepresentative) references Employee(eid)); CREATE TABLE Orders (oid integer PRIMARY KEY, 
 volume integer, orderedBy integer NOT NULL, 
 FOREIGN KEY (orderedBy) references Customer(cid)); CREATE TABLE Manages(managerID integer, subordinateID integer, PRIMARY KEY (managerID, subordinateID),
 FOREIGN KEY (managerID) references Employee (eid), 
 FOREIGN KEY (subordinateID) references Employee (eid)); Draw an entity relationship diagram which translates into the SQL commands above when using the techniques seen in class. Make sure that all SQL constraints are represented in the diagram. Multiple valid solutions are possible. 
 (10 points) 
 Brian Brian CS 4320 Fall 2019 Final Exam Page � of �10 15 
 Employee Customer Orders Eid Ename Cid Cname CustomerRep OrderedBy Manager Subordinate Manages Oid Volume Brian CS 4320 Fall 2019 Final Exam Page � of �11 15 Part D) Concurrency Control. (20 points) In the following, we ask you to write out schedules with certain properties. Use the notation seen in the lecture (i.e., WT(A) means transaction number T writes object A, RT(A) means transaction T reads object A, CT means transaction T commits, AT means transaction T aborts). D.1) Propose a schedule involving two transactions that is not recoverable but conflict-serializable. Justify why it is non-recoverable in one to two sentences. Justify why it is conflict-serializable by drawing the conflict graph. 
 (5 points) W1(A) R2(A) W2(B) C2 A1. This schedule is non-recoverable since transaction 2 commits after reading data from an uncommitted transaction (which aborts here). Nevertheless, the conflict graph has no cycles and only one node for T2 (aborted transactions are not taken into account) which means the schedule is conflict serializable. D.2) Propose a schedule involving two transactions that avoids cascading aborts but is not strict. Justify in one to two sentences why it avoids cascading aborts and why it is not strict. 
 (5 points) W1(A) W2(A) C2 C1. The schedule avoids cascading aborts since no transaction reads uncommitted data. The schedule is not strict since uncommitted data is overridden (transaction 2 overrides the uncommitted version of A, written by transaction 1). Brian CS 4320 Fall 2019 Final Exam Page � of �12 15 D.3) Propose a schedule that has the following conflict graph: 
 (5 points) W1(A) R2(A) W3(B) R2(B) W3(C) R4(C) W2(D) R4(D). D.4) Propose a schedule that could not have been generated when using any variant of two-phase locking. Justify in one to two sentences. 
 (5 points) W1(A) R2(A) W1(A) This schedule is not conflict-serializable (cycle via conflicts on A). As two-phase locking only generates conflict-serializable schedules, the schedule could not have been generated by 2PL. 
 1 2 3 4 Brian CS 4320 Fall 2019 Final Exam Page � of �13 15 Part E) Logging and Recovery. (20 points) The ARIES algorithm maintains various data structures during execution: - A dirty page table, containing entries of the form (pageID, recLSN) referring to a data page and the log entry at which it became dirty, - A transaction table, containing entries of the form (transactionID, status, lastLSN) referring to a transaction, its status, and its last action, - An "in-memory" pageLSN for each data page in main memory, - An "on-disk" pageLSN for each data page stored on hard disk, - the flushedLSN counter indicating up to which point the log has been written to disk. For the following log entries, indicate precisely which data structures are changed and how. For changes to dirty page table or transaction table, indicate each field for newly inserted entries or indicate precisely which fields of existing entries change. Assume that data structures are only changed when necessary (e.g., flushedLSN is only updated when required by the rules of write-ahead logging). Assume that dirty page table and transaction table are initially empty, all pageLSN and flushedLSN counters are initialized to zero. Page steals happen only if explicitly mentioned. After each entry on the following page, describe precisely how data structures are updated after the entry has been processed (e.g., write "in-memory pageLSN for page P10 changes to 40 and transaction T5 is inserted into the transaction table with status running and lastLSN 40"). Brian CS 4320 Fall 2019 Final Exam Page � of �14 15 10 T1 writes P3 in-memory pageLSN for P3 changes to 10 and transaction T1 is inserted with status running and lastLSN 10, P3 with recLSN 10 is inserted into the DPT. 20 T2 writes P2 in-memory pageLSN for P2 changes to 20 and transaction T2 is inserted with status running and lastLSN 20, P2 is inserted with recLSN 20 into the DPT. 30 T1 writes P5 - P2 gets written to disk due to page stealing in-memory pageLSN for P5 changes to 30, T1 lastLSN is updated to 30, P2 is taken out of DPT, on-disk pageLSN for P2 changes to 20, P5 is inserted into DPT with recLSN 30, flushedLSN is updated to 20. 40 T3 writes P1 in-memory pageLSN for P1 changes to 40, T3 inserted into transaction table with status running and lastLSN 40, P1 inserted into DPT with recLSN 40. 50 T1 Commits flushedLSN is updated to 50, status of T1 changed to committed in transaction table and lastLSN changed to 50 60 T1 End T1 is taken out of transaction table 70 T3 writes P6 in-memory page LSN for P6 changes to 70, lastLSN for T3 is updated to 70, P6 is inserted into DPT with recLSN 70 80 T2 Aborts Status of transaction T2 is updated to aborted, lastLSN set to 80 90 CLR T2 P2 (Undoing LSN 20) P2 is inserted into DPT with recLSN 90, in-memory pageLSN for P2 updated to 90, lastLSN of T2 changed to 90 Brian