Raft Programming
In this project, you will build a distributed service based on Raft. Raft organizes client requests into a sequence, called the log, and ensures that all the replica servers see the same log. Each replica executes client requests in log order, applying them to its local copy of the service’s state.
You are required to:
Q1. Choose and run one Raft implementation. The following are two examples of production-grade Raft implementations. You can use any one of them or choose other codebases.
1. hazelcast/hazelcast: Open Source In-Memory Data Grid
2. sofastack/sofa-jraft: A production-grade Java implementation of the RAFT consensus algorithm
Q2. Configure and run a Raft replication service with at least three replicas using any codebase on your own computer or AWS. You can either compile from the source code or directly run the bin/Jar.
Q3. Store a distributed counter in a Raft group of multiple nodes (at least three). The counter supports increment, decrement, and read operations while remaining consistent across all nodes. It must continue to provide the following three external services when a minority of nodes fails (e.g., one node in a three-replica cluster):
1. incrementAndGet(delta): increments the value by delta and returns the incremented value.
2. decrementAndGet(delta): decrements the value by delta and returns the decremented value.
3. get(): gets the latest value.
If your codebase already provides these high-level interfaces (e.g., incrementAndGet(), decrementAndGet()), do not use them directly; instead, implement your own distributed counter using the basic Get() and Set() APIs.
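As a starting point, the counter can be sketched in plain Java against a minimal key-value interface. The KvClient interface below is a hypothetical stand-in for your chosen codebase's Get()/Set() client; the method names and the in-memory store are assumptions for illustration. Note the sketch assumes each read-modify-write is serialized through the Raft leader (or submitted as a single replicated command); with concurrent writers you would need a compare-and-set or an atomic replicated operation.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for your Raft codebase's client API.
// Adapt get()/set() to the actual Get()/Set() calls it exposes.
interface KvClient {
    long get(String key);
    void set(String key, long value);
}

// Counter built only on Get/Set. Safe only if each read-modify-write
// is serialized (e.g., routed through the Raft leader as one command).
class DistributedCounter {
    private final KvClient kv;
    private final String key;

    DistributedCounter(KvClient kv, String key) {
        this.kv = kv;
        this.key = key;
    }

    long incrementAndGet(long delta) {
        long next = kv.get(key) + delta;
        kv.set(key, next);
        return next;
    }

    long decrementAndGet(long delta) {
        return incrementAndGet(-delta);
    }

    long get() {
        return kv.get(key);
    }
}

public class CounterDemo {
    public static void main(String[] args) {
        // In-memory stand-in for the replicated store, for illustration only.
        Map<String, Long> store = new HashMap<>();
        KvClient kv = new KvClient() {
            public long get(String key) { return store.getOrDefault(key, 0L); }
            public void set(String key, long value) { store.put(key, value); }
        };
        DistributedCounter counter = new DistributedCounter(kv, "counter");
        System.out.println(counter.incrementAndGet(5));  // 5
        System.out.println(counter.decrementAndGet(2));  // 3
        System.out.println(counter.get());               // 3
    }
}
```

In your actual submission, the KvClient would wrap the RPC client of your chosen Raft implementation, so every get/set is replicated through the log.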
Submission:
1. Source code of the distributed counter.
2. Project report (including both your distributed counter setup procedure and runtime screenshots).
Spark-AWS Programming
Background
We suggest you read/review the following resources before starting this question:
● Install Spark: https://www.tutorialspoint.com/apache_spark/apache_spark_installation.htm
● Deploying your Spark cluster on AWS: https://spark.apache.org/docs/0.7.0/ec2-scripts.html
● An easy way to deploy your Spark-AWS cluster with Spark-EC2: https://github.com/amplab/spark-ec2
● Introduction to Scala, if it's new to you:
○ https://docs.scala-lang.org/tour/tour-of-scala.html
○ https://learnxinyminutes.com/docs/scala/
● Deploying and running a Spark application: https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/spark_develop_run.html
Question
Here is a stock price file (stock_prices.csv) that contains the daily closing price of a few stocks. You have to write programs to answer the three questions below:
Q1. Load the stock price file as a DataFrame, Dataset, or RDD. Write a Spark program to compute the average daily return of all stocks for every date, and print all results to the screen. In other words, your output should have the columns:
Date        average_return
YYYY-MM-DD  average return of all stocks on that date
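The core of Q1 can be sketched in plain Java before translating it into Spark operations: each stock's daily return is (close_t - close_{t-1}) / close_{t-1}, and average_return averages those returns over all stocks trading on that date. The Row fields and sample prices below are made up for illustration; in your submission the same logic would be expressed with Spark (e.g., a per-ticker window lag(), then groupBy(date).avg()).

```java
import java.util.*;
import java.util.stream.*;

// Plain-Java sketch of Q1's computation, on hypothetical sample data.
public class AverageReturn {
    // One CSV row: ticker, date (sortable yyyy-MM-dd), closing price.
    record Row(String ticker, String date, double close) {}

    static Map<String, Double> averageDailyReturns(List<Row> rows) {
        Map<String, List<Double>> byDate = new TreeMap<>();
        // Group rows by ticker and sort each ticker's series by date.
        Map<String, List<Row>> byTicker = rows.stream()
                .collect(Collectors.groupingBy(Row::ticker));
        for (List<Row> series : byTicker.values()) {
            series = new ArrayList<>(series);
            series.sort(Comparator.comparing(Row::date));
            for (int i = 1; i < series.size(); i++) {
                double prev = series.get(i - 1).close();
                double ret = (series.get(i).close() - prev) / prev;
                byDate.computeIfAbsent(series.get(i).date(),
                        d -> new ArrayList<>()).add(ret);
            }
        }
        // Average the per-stock returns for each date.
        Map<String, Double> avg = new TreeMap<>();
        byDate.forEach((date, rets) -> avg.put(date,
                rets.stream().mapToDouble(Double::doubleValue).average().orElse(0)));
        return avg;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("AAA", "2020-01-01", 100), new Row("AAA", "2020-01-02", 110),
                new Row("BBB", "2020-01-01", 50),  new Row("BBB", "2020-01-02", 45));
        // AAA: +10%, BBB: -10% -> average_return on 2020-01-02 is 0.0
        averageDailyReturns(rows).forEach((d, r) ->
                System.out.printf("%s %.4f%n", d, r));
    }
}
```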
Q2. Which stock was traded most frequently – as measured by closing price * volume – on average?
Q3. Which stock was the most volatile as measured by the annualized standard deviation of daily returns?
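Q3's metric can likewise be sketched in plain Java: annualized volatility is the sample standard deviation of a stock's daily returns scaled by sqrt(252). The 252-trading-days-per-year factor is a common convention, not something stated in this handout; adjust it if your data or course uses a different scale.

```java
import java.util.List;

// Sketch of Q3's metric: annualized volatility of one stock's
// daily-return series = sample std dev * sqrt(252).
public class Volatility {
    static double annualizedStdDev(List<Double> dailyReturns) {
        int n = dailyReturns.size();
        double mean = dailyReturns.stream()
                .mapToDouble(Double::doubleValue).average().orElse(0);
        double variance = dailyReturns.stream()
                .mapToDouble(r -> (r - mean) * (r - mean))
                .sum() / (n - 1);                    // sample variance
        return Math.sqrt(variance) * Math.sqrt(252); // annualize
    }

    public static void main(String[] args) {
        // A stock alternating +1% / -1% daily returns (made-up data).
        List<Double> rets = List.of(0.01, -0.01, 0.01, -0.01);
        System.out.printf("%.4f%n", annualizedStdDev(rets));
    }
}
```

The most volatile stock is then the one maximizing this value; in Spark you would group returns by ticker and apply the same formula (e.g., with stddev_samp) per group.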
We provide a Scala version template for your reference.
Notes:
1. You must first deploy a Spark cluster on AWS and provide screenshots showing your steps in the project report (README).
2. You can choose whatever programming language you like to complete this task, such as Java, Scala, or Python.
3. No dividend adjustments are necessary; use only the closing price to determine returns.
4. If a price is missing on a given date, you can compute returns from the closest available date.
5. Return can be trivially computed as the % difference between two prices.
Submission:
You have to submit two files for this question: the program file (covering all three questions; we will test it in Spark on AWS) and the project report (including both your AWS Spark setup procedure and your answers to Q2 & Q3).
How do we grade this exam?
● Correctness is the most important.
● Elegance and style of the program are the next most important.
● Performance is the last criterion but can count against you if your implementation is unreasonably costly.
● Each student must work on the final project on their own. No discussions among students are allowed. (Note that this is different from programming assignments or tutorials, where students are encouraged to discuss but must work out solutions individually. This is a final project, like a final exam, so no discussion among students is allowed; if we find that submitted final projects contain similar solutions or ideas, we will look into those cases closely and apply penalties.) A mark of zero (0) will be assigned for the assessment in which plagiarism is found to occur.
● You must not search Google for solutions either (of course, you may look up relevant technical materials, such as explanations of Raft from different angles); if we find such solutions, we will apply penalties as well.
● Technical questions (e.g., "can you give me some ideas to solve my programming task?") related to the final project will not be answered by the TAs or the teacher during the project period, because the final project is treated as a final exam. They will only answer potentially ambiguous wording in the final project doc via email.