Information Management
February 26, 2021
Question 1)
1. Clearly illustrate the concepts of data-centric and client-centric consistency, describing the differences between the two concepts, and clarifying in which scenario each consistency concept can (or should) be adopted.
2. Consider the following schedule of operations performed by four processes over one variable (initially set to zero):
P1: R(X=1) R(X=4) R(X=3)
P2: R(X=1) W(X=2) R(X=3) R(X=4)
P3: W(X=1) W(X=3) R(X=2)
P4: R(X=3) R(X=2) W(X=4)
(a) Is the schedule sequentially consistent?
If sequential consistency is satisfied, add the minimum number of operations needed to make the schedule not sequentially consistent.
Otherwise, indicate the minimum number of operations (and which ones) that should be removed to guarantee sequential consistency.
(b) Is the schedule causally consistent?
List all the causal dependencies in the schedule.
If the schedule is causally consistent, add the minimum number of operations needed to make the schedule not causally consistent.
If the schedule is not causally consistent, indicate the operation(s) that should be removed to guarantee causal consistency.
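As an aid for checking an answer to point (a), the following is a minimal sketch of a brute-force test for sequential consistency on a single variable. It assumes the usual definition: the schedule is sequentially consistent if some interleaving that preserves each process's program order makes every read return the most recently written value. The function name `sequentially_consistent` and the tuple encoding of operations are illustrative choices, not part of the exam text.

```python
def sequentially_consistent(processes, initial=0):
    """Return True if some program-order-preserving interleaving explains all reads.

    processes: list of per-process operation lists, each operation being
    ("R", value) or ("W", value) on the single shared variable.
    """
    n = len(processes)
    seen = set()  # (positions, value) states already shown to lead nowhere

    def search(positions, value):
        if all(positions[i] == len(processes[i]) for i in range(n)):
            return True  # every operation has been placed in the interleaving
        state = (positions, value)
        if state in seen:
            return False
        seen.add(state)
        for i in range(n):
            p = positions[i]
            if p == len(processes[i]):
                continue
            kind, v = processes[i][p]
            if kind == "R" and v != value:
                continue  # this read cannot be scheduled now
            nxt = positions[:i] + (p + 1,) + positions[i + 1:]
            if search(nxt, v if kind == "W" else value):
                return True
        return False

    return search((0,) * n, initial)


# Two tiny examples with an initial value of 0.
ok = sequentially_consistent([[("W", 1)], [("R", 0), ("R", 1)]])    # True
bad = sequentially_consistent([[("W", 1)], [("R", 1), ("R", 0)]])   # False

# The schedule from the question, encoded in the same way.
schedule = [
    [("R", 1), ("R", 4), ("R", 3)],             # P1
    [("R", 1), ("W", 2), ("R", 3), ("R", 4)],   # P2
    [("W", 1), ("W", 3), ("R", 2)],             # P3
    [("R", 3), ("R", 2), ("W", 4)],             # P4
]
result = sequentially_consistent(schedule)
```

The search memoizes on the pair (per-process positions, current value), which is enough because with one variable nothing else about the past affects future reads.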
Question 2)
1. What is a bitmap index? Clearly explain its advantages and disadvantages and how to insert/remove values from it.
2. Build the bitmap index for attribute PRODUCT and the bitmap index for attribute CITY.
Write the condition operating on bitmap indexes to filter sales in Milan for products P1 and P3.
1 2 3 4 5 6 7 8
Venice Milan Venice Milan Venice Turin
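The kind of condition requested above can be sketched in code. Because the table is only partially legible here, the rows below are hypothetical and do not reproduce the exam data; only the shape of the condition, CITY = Milan AND (PRODUCT = P1 OR PRODUCT = P3), is taken from the question. The helper names are illustrative.

```python
# Hypothetical (PRODUCT, CITY) rows; NOT the table from the exam.
rows = [
    ("P1", "Venice"),
    ("P2", "Milan"),
    ("P3", "Milan"),
    ("P1", "Milan"),
    ("P3", "Turin"),
    ("P2", "Venice"),
]


def build_bitmap_index(values):
    """Map each distinct value to a bit vector with a 1 at every row holding it."""
    index = {}
    for pos, value in enumerate(values):
        index.setdefault(value, [0] * len(values))[pos] = 1
    return index


def bit_or(a, b):
    return [x | y for x, y in zip(a, b)]


def bit_and(a, b):
    return [x & y for x, y in zip(a, b)]


product_idx = build_bitmap_index([p for p, _ in rows])
city_idx = build_bitmap_index([c for _, c in rows])

# CITY = Milan AND (PRODUCT = P1 OR PRODUCT = P3)
selected = bit_and(city_idx["Milan"],
                   bit_or(product_idx["P1"], product_idx["P3"]))
matching_rows = [i + 1 for i, bit in enumerate(selected) if bit]  # 1-based row ids
```

On these hypothetical rows, `selected` is `[0, 0, 1, 1, 0, 0]`, so rows 3 and 4 satisfy the condition.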
Question 3)
1. Describe and discuss the association rule mining and the frequent itemset mining problems and the relationship between these two problems.
2. Considering the table below and assuming min_sup = 0.75, identify all the frequent itemsets using the Apriori algorithm.
3 A,B,C,D,E 4 A,B,C,D
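A level-wise Apriori search of the kind the question asks for can be sketched as follows. Only transactions 3 and 4 are legible in the table above, so transactions 1 and 2 below are invented placeholders, and the resulting itemsets are not the answer to the exam's actual table. The function name `apriori` and the data layout are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_sup):
    """Return {itemset: support} for every itemset with support >= min_sup."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    items = sorted({item for t in transactions for item in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= min_sup]
    frequent = {}
    k = 1
    while level:
        for itemset in level:
            frequent[itemset] = support(itemset)
        k += 1
        prev = set(level)
        # Join step: merge frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c) >= min_sup]
    return frequent


transactions = [
    frozenset("ABC"),    # hypothetical placeholder for transaction 1
    frozenset("ABD"),    # hypothetical placeholder for transaction 2
    frozenset("ABCDE"),  # transaction 3, as listed above
    frozenset("ABCD"),   # transaction 4, as listed above
]
frequent = apriori(transactions, 0.75)
```

With these placeholder rows, an itemset must appear in at least 3 of the 4 transactions; {C, D} appears in only 2, so {A, C, D}, {B, C, D} and {A, B, C, D} are pruned without being counted.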
Question 4)
Assume a distributed database for a company, where each warehouse keeps track, for each product, of the flows of items during the year in a relation having schema FLOW(Id, ProductId, Date, Quantity).
Note that attribute Quantity has a positive value in case of input flow, and a negative value in case of output flow.
How would you use a MapReduce framework to compute, for each product, the overall flow (positive or negative quantity)?
How would you define the map and the reduce functions?
Illustrate your solution through an example with small tables.
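One possible shape for the map and reduce functions is sketched below, simulated in plain Python rather than on a real MapReduce framework. The FLOW tuples are made-up sample data, and the names `map_flow`, `reduce_flow` and `run_mapreduce` are illustrative assumptions. The idea is that map emits (ProductId, Quantity) pairs and reduce sums the quantities per product.

```python
from collections import defaultdict

# Hypothetical FLOW tuples (Id, ProductId, Date, Quantity); values are made up.
flow = [
    (1, "P1", "2020-01-10", 50),
    (2, "P2", "2020-01-12", 20),
    (3, "P1", "2020-02-01", -30),
    (4, "P2", "2020-02-03", -25),
    (5, "P1", "2020-03-15", 10),
]


def map_flow(record):
    """Map: emit one (ProductId, Quantity) pair per FLOW tuple."""
    _id, product_id, _date, quantity = record
    yield product_id, quantity


def reduce_flow(product_id, quantities):
    """Reduce: sum all quantities observed for one product."""
    return product_id, sum(quantities)


def run_mapreduce(records):
    """Simulate the shuffle phase locally by grouping mapped values by key."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_flow(record):
            groups[key].append(value)
    return dict(reduce_flow(key, values) for key, values in groups.items())


overall = run_mapreduce(flow)
```

On this sample, `overall` is `{"P1": 30, "P2": -5}`. Since addition is associative and commutative, the same reduce function could also serve as a combiner at each warehouse before the shuffle.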
Question 5) only for students who did not attend the database course with Prof. Samarati
1. Illustrate the idempotency property for UNDO and REDO operations in log management.
2. Given the following log:
DUMP, B(T1), B(T2), B(T3), I(T1,O1,A1), I(T2,O2,A2), C(T1), B(T4), D(T4,O3,B3), CK(…), U(T2,O4,B4,A4), D(T3,O5,B5), B(T5), A(T4), U(T5,O6,B6,A6), CK(. . . ), D(T5,O7,B7), C(T2), B(T6), I(T6,O8,A8), A(T5), C(T3), FAILURE
(a) write, for each checkpoint record, active transactions;
(b) illustrate in detail the steps of a warm restart to recover from the failure.
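The bookkeeping behind points (a) and (b) can be sketched as a small log scanner. This is a sketch under stated assumptions, not a definitive recovery procedure: B, C, A and CK are read as begin, commit, abort and checkpoint; a transaction counts as active from its B record until its C or A record; and, following one common textbook convention, a transaction that aborts after the last checkpoint stays in the UNDO set. The object written as 08 in the log is assumed to be O8. Function names are illustrative.

```python
# The log from the question; checkpoint contents are elided in the source.
LOG = [
    "DUMP", "B(T1)", "B(T2)", "B(T3)", "I(T1,O1,A1)", "I(T2,O2,A2)", "C(T1)",
    "B(T4)", "D(T4,O3,B3)", "CK(…)", "U(T2,O4,B4,A4)", "D(T3,O5,B5)",
    "B(T5)", "A(T4)", "U(T5,O6,B6,A6)", "CK(…)", "D(T5,O7,B7)", "C(T2)",
    "B(T6)", "I(T6,O8,A8)", "A(T5)", "C(T3)",
]


def parse(record):
    """Split 'U(T2,O4,B4,A4)' into ('U', ['T2', 'O4', 'B4', 'A4'])."""
    kind, _, rest = record.partition("(")
    return kind, (rest.rstrip(")").split(",") if rest else [])


def checkpoint_active_sets(log):
    """Return, for each CK record in order, the transactions active at that point."""
    active, result = set(), []
    for record in log:
        kind, args = parse(record)
        if kind == "B":
            active.add(args[0])
        elif kind in ("C", "A"):
            active.discard(args[0])
        elif kind == "CK":
            result.append(sorted(active))
    return result


def undo_redo_sets(log):
    """Build the UNDO and REDO sets by scanning forward from the last checkpoint."""
    last_ck = max(i for i, r in enumerate(log) if parse(r)[0] == "CK")
    undo = set(checkpoint_active_sets(log[: last_ck + 1])[-1])
    redo = set()
    for record in log[last_ck + 1:]:
        kind, args = parse(record)
        if kind == "B":
            undo.add(args[0])
        elif kind == "C":
            undo.discard(args[0])
            redo.add(args[0])
        # Assumption: an A record leaves the transaction in the UNDO set.
    return sorted(undo), sorted(redo)


active_sets = checkpoint_active_sets(LOG)
undo_set, redo_set = undo_redo_sets(LOG)
```

Under these assumptions the remaining steps of a warm restart would be to scan the log backward undoing the actions of the transactions in `undo_set`, then scan it forward redoing the actions of those in `redo_set`.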