QUESTION 1 10
1. Illustrate a Big Data application scenario in Smart Cities according to the five V’s. Outline how it can benefit from the MapReduce model.
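A minimal sketch of how such a scenario can map onto MapReduce. It assumes a hypothetical traffic-monitoring feed in which each record is a "segment_id,speed_kmh" string:

1. The map step emits (road segment, speed) pairs.
2. The shuffle step groups the pairs by segment.
3. The reduce step computes the average speed per segment.

The three phases are simulated in plain Python on one machine. A real deployment would run them on a framework such as Hadoop across many nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, float]]:
    """Map: parse one hypothetical 'segment_id,speed_kmh' record into a key/value pair."""
    segment, speed = line.split(",")
    yield segment, float(speed)


def shuffle(pairs: Iterable[tuple[str, float]]) -> dict[str, list[float]]:
    """Shuffle: group all values emitted for the same key."""
    groups: dict[str, list[float]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[float]) -> tuple[str, float]:
    """Reduce: average speed for one road segment."""
    return key, sum(values) / len(values)


def run_job(lines: list[str]) -> dict[str, float]:
    mapped = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    readings = ["S1,42.0", "S2,15.5", "S1,38.0", "S2,20.5", "S3,60.0"]
    print(run_job(readings))
```

Map and reduce calls are independent per record and per key. That independence is what lets the volume and velocity of city-wide sensor data be spread over many workers.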
QUESTION 2 10
1. Choose between a public, private or hybrid cloud solution for each of the following application domains: (i) Smart Grid power utilities and (ii) Municipalities. Discuss and justify the requirements and implications for costs and privacy in each case.
QUESTION 3 10
1. Which database type would you choose to support Big Data analysis of the spread of the coronavirus, and why? Sketch an example of what data would be stored in this database, how they are structured and what knowledge can be extracted.
QUESTION 4 10
1. Consider the scenario of collecting smartphone sensor data, such as GPS and accelerometer readings, from several users. The sensor data are stored in a Big Data infrastructure. (i) Choose a database type for storing the sensor data and explain the rationale of your choice and the trade-offs involved. (ii) Provide an example of what the data records in the database look like.
QUESTION 5 8
1. Why do simple Bloom filters produce false positives but never false negatives? Can you identify design features that influence the likelihood of false positives?
QUESTION 6 8
1. Illustrate an application scenario of a simple Bloom filter. Explain the following: (i) What information do you add? (ii) For what purpose are membership queries used? (iii) What performance gains does the simple Bloom filter provide to the application?
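For Questions 5 and 6, here is a minimal Python sketch of a simple Bloom filter. It assumes the k bit positions are derived by double hashing from one SHA-256 digest; this is one common construction, not the only one.

- **No false negatives.** Bits are only ever set, never cleared, so every added item still finds all of its k bits set.
- **False positives.** An absent item can find all of its bits already set by other items.
- **What drives the rate.** The approximate rate (1 - e^(-kn/m))^k shows how the likelihood depends on three design parameters: the bit-array size m, the number of hash functions k and the number of stored elements n.

```python
import hashlib
import math


class BloomFilter:
    """Simple Bloom filter: an m-bit array probed at k positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str) -> list[int]:
        # Double hashing (an assumed construction): k indices from two 64-bit
        # halves of a single SHA-256 digest; h2 is forced odd to vary the stride.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if all k bits are set. Added items always pass (bits are
        # never cleared); absent items may pass too if other items set the bits.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def expected_fp_rate(m: int, n: int, k: int) -> float:
    """Standard approximation of the false positive probability: (1 - e^(-kn/m))^k."""
    return (1.0 - math.exp(-k * n / m)) ** k


if __name__ == "__main__":
    bf = BloomFilter(m=1000, k=7)
    added = [f"user{i}" for i in range(100)]
    for x in added:
        bf.add(x)
    assert all(bf.might_contain(x) for x in added)  # never a false negative
    probes = [f"other{i}" for i in range(10_000)]
    observed = sum(bf.might_contain(x) for x in probes) / len(probes)
    print(f"observed FP rate {observed:.4f}, expected {expected_fp_rate(1000, 100, 7):.4f}")
```

Consider an application such as checking a hypothetical set of already-crawled URLs before a costly disk or network lookup. A negative answer lets the lookup be skipped with certainty, while a positive answer still needs to be confirmed.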
QUESTION 7 8
1. To improve the performance of a Big Data system, you introduce a simple Bloom filter. Calculate the size, in KB, that must be allocated to the Bloom filter to store 250,000 elements with a false positive probability of 0.00003, assuming the optimal number of hash functions.
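The standard sizing formula for a simple Bloom filter with the optimal number of hash functions is m = -n ln p / (ln 2)^2 bits. Below is a short Python check of the arithmetic for n = 250,000 and p = 0.00003. The question does not say whether 1 KB means 1024 or 1000 bytes, so both are printed.

```python
import math


def bloom_size_bits(n: int, p: float) -> float:
    """Optimal Bloom filter size in bits: m = -n * ln(p) / (ln 2)^2."""
    return -n * math.log(p) / math.log(2) ** 2


n, p = 250_000, 0.00003
m_bits = bloom_size_bits(n, p)
m_bytes = m_bits / 8
print(f"m = {m_bits:,.0f} bits")
print(f"  = {m_bytes / 1024:,.1f} KB (1 KB = 1024 bytes)")
print(f"  = {m_bytes / 1000:,.1f} KB (1 KB = 1000 bytes)")
```

This gives roughly 5.42 million bits. That is about 661.5 KB with 1 KB = 1024 bytes, or about 677.4 KB with 1 KB = 1000 bytes.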
QUESTION 8 3
1. How many hash functions should be used in the Bloom filter of the previous question? Show your calculation.
Calculate any adjustment to the Bloom filter size needed so that the number of hash functions is an integer rather than a decimal value.
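With the optimal size, k = (m / n) ln 2, which simplifies to k = -log2(p), about 15.02 here. One way to remove the decimals is sketched below, as an assumption about what the question intends: round k to an integer, then recompute the size that makes that integer optimal, m = k n / ln 2.

- **Round down to 15.** The filter shrinks slightly, but the false positive probability rises to about 2^-15 ≈ 3.05e-5, just above the 0.00003 target.
- **Round up to 16.** The filter grows, and the false positive probability stays below the target.

```python
import math

LN2 = math.log(2)


def optimal_k(n: int, m: float) -> float:
    """Optimal number of hash functions: k = (m / n) * ln 2."""
    return m / n * LN2


def size_for_integer_k(n: int, k: int) -> float:
    """Bloom filter size (bits) for which an integer k is exactly optimal: m = k * n / ln 2."""
    return k * n / LN2


n, p = 250_000, 0.00003
m = -n * math.log(p) / LN2 ** 2   # optimal size from the previous question
k = optimal_k(n, m)               # equals -log2(p)
print(f"k = {k:.3f}")
for k_int in (math.floor(k), math.ceil(k)):
    m_adj = size_for_integer_k(n, k_int)
    p_adj = 0.5 ** k_int          # at the optimum, p is approximately (1/2)^k
    print(f"k = {k_int}: m = {m_adj:,.0f} bits ({m_adj - m:+,.0f} bits), p = {p_adj:.2e}")
```

k = 15 corresponds to about 5,410,106 bits, and k = 16 to about 5,770,780 bits.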
QUESTION 9 3
1. Assume a Big Data system that stores 64*n bits of data in an array, where n is the number of elements added to the array. Calculate how many times smaller the storage of a simple Bloom filter is, given that a false positive probability of 0.005 can be tolerated and the optimal number of hash functions is used.
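With the optimal number of hash functions, a simple Bloom filter needs m/n = -ln p / (ln 2)^2 bits per element. This is independent of n, so n cancels in the ratio with the 64n-bit array. A short Python check:

```python
import math


def bloom_bits_per_element(p: float) -> float:
    """Bits per stored element of an optimally configured Bloom filter."""
    return -math.log(p) / math.log(2) ** 2


ARRAY_BITS_PER_ELEMENT = 64
bf_bits = bloom_bits_per_element(0.005)
ratio = ARRAY_BITS_PER_ELEMENT / bf_bits
print(f"Bloom filter: {bf_bits:.2f} bits/element; about {ratio:.2f} times smaller")
```

This gives about 11.03 bits per element, so the Bloom filter is roughly 5.8 times smaller than the 64n-bit array.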
QUESTION 10 10
1. Model an automated and decentralized load balancer for Hadoop cluster computing infrastructures using the EPOS system. Specifically, determine the following: what the agents and plans represent, what the global (system-wide) and local (agent) objectives are, what the agents' preference trade-offs between the two objectives could be, and how these preferences are modeled.
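One way to make the trade-off concrete rests on hypothetical assumptions:

- **Agents and plans.** Each agent is a job scheduler that owns a few alternative plans. A plan is a load vector over three cluster nodes plus a local cost; here the local cost is an invented penalty for losing data locality.
- **Global objective.** Minimize the variance of the aggregate load across nodes.
- **Agent preference.** A weight `lam` in [0, 1] balances the global cost against the agent's local cost.

Plans are selected with a plain sequential greedy loop. It illustrates only this cost model, not EPOS's own decentralized coordination mechanism.

```python
from statistics import pvariance

# Hypothetical plans per agent: (load per cluster node, local cost).
# Plan 0 keeps data locality (cost 0) but piles all load onto node 0.
AGENT_PLANS = [
    [([3, 0, 0], 0.0), ([0, 3, 0], 1.0), ([0, 0, 3], 1.0)],
    [([3, 0, 0], 0.0), ([0, 3, 0], 1.0), ([0, 0, 3], 1.0)],
    [([3, 0, 0], 0.0), ([0, 3, 0], 1.0), ([0, 0, 3], 1.0)],
]


def aggregate(vectors):
    """Element-wise sum of load vectors: the system-wide load per node."""
    return [sum(col) for col in zip(*vectors)]


def select_plans(agent_plans, lam, rounds=5):
    """Greedy loop: each agent in turn picks the plan minimising
    (1 - lam) * variance(total load) + lam * local cost.
    lam = 0 favours pure load balance; lam = 1 favours pure agent preference."""
    choice = [0] * len(agent_plans)
    for _ in range(rounds):
        for a, plans in enumerate(agent_plans):
            others = [agent_plans[b][choice[b]][0] for b in range(len(agent_plans)) if b != a]
            choice[a] = min(
                range(len(plans)),
                key=lambda i: (1 - lam) * pvariance(aggregate(others + [plans[i][0]]))
                + lam * plans[i][1],
            )
    return choice


if __name__ == "__main__":
    for lam in (0.0, 1.0):
        chosen = select_plans(AGENT_PLANS, lam)
        load = aggregate([plans[c][0] for plans, c in zip(AGENT_PLANS, chosen)])
        print(f"lam={lam}: plans {chosen}, load per node {load}")
```

With lam = 0 the agents spread their load evenly. With lam = 1 they all keep their locality-preserving plan and overload node 0.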
QUESTION 11 10
1. Illustrate an application scenario of decentralized in-network aggregation, such as that of DIAS, the Dynamic Intelligent Aggregation Service. Determine: (i) Which users/devices are the data producers and consumers? (ii) What do the possible states represent and how can they be generated? (iii) Which aggregation functions are computed and what useful information do they provide?
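The sketch below shows an aggregate being computed in-network rather than at a central collector, using a generic pairwise gossip averaging scheme. This is a swapped-in textbook technique, not the DIAS protocol, and the smart-meter states are hypothetical. Each exchange replaces two nodes' estimates with their mean, which preserves the total (up to floating-point rounding). Every node's estimate therefore converges to the global AVG, and SUM follows if the number of nodes is known.

```python
import random


def gossip_average(values, steps=200, seed=0):
    """Pairwise gossip: at each step two random nodes replace their estimates
    with the pair's mean. The total of all estimates is preserved, so every
    estimate converges to the global average without a central collector."""
    rng = random.Random(seed)
    est = list(values)
    for _ in range(steps):
        i, j = rng.sample(range(len(est)), 2)
        est[i] = est[j] = (est[i] + est[j]) / 2
    return est


# Hypothetical states: current consumption level (kW) of four smart meters.
states = [1.5, 0.5, 3.0, 1.5]
estimates = gossip_average(states)
print("AVG estimate per node:", [round(e, 3) for e in estimates])
# SUM = AVG * number of nodes; assumes every node knows the node count.
print("SUM estimate per node:", [round(e * len(states), 3) for e in estimates])
```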
QUESTION 12 10
1. Assume a network of 4 nodes {A, B, C, D}. Each node has a value as follows: {A=1, B=2, C=4, D=3}. Each node estimates the SUM aggregation function using as input its own value and the values it receives from the other nodes in the network. Each node randomly connects to other nodes via directed links as follows: A to {B}, B to {A,C,D}, C to {A,D} and D to {B}. Calculate the average relative estimation error over all nodes in the network.
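A short Python check of this calculation rests on two assumptions. First, a directed link X to Y means X sends its value to Y in a single exchange, with no forwarding. Second, the relative error of a node is |estimate - true SUM| / true SUM.

```python
VALUES = {"A": 1, "B": 2, "C": 4, "D": 3}
# Assumption: a directed link X -> Y means X sends its own value to Y (one hop only).
LINKS = {"A": ["B"], "B": ["A", "C", "D"], "C": ["A", "D"], "D": ["B"]}


def sum_estimates(values, links):
    est = dict(values)  # each node starts from its own value
    for src, dsts in links.items():
        for dst in dsts:
            est[dst] += values[src]
    return est


def avg_relative_error(values, links):
    true_sum = sum(values.values())
    est = sum_estimates(values, links)
    return sum(abs(e - true_sum) / true_sum for e in est.values()) / len(est)


print(sum_estimates(VALUES, LINKS))  # {'A': 7, 'B': 6, 'C': 6, 'D': 9}
print(avg_relative_error(VALUES, LINKS))
```

Under this reading the estimates are A=7, B=6, C=6 and D=9 against a true SUM of 10. The relative errors are 0.3, 0.4, 0.4 and 0.1, for an average of 0.3. If a link is instead read as X receiving from Y, the estimates become A=3, B=10, C=8 and D=5, and the average error is 0.35.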