1 Organisation
Course Project
The course project consists of 5 parts. The first part is simply an introduction to Kompics and is optional if you have worked with Kompics before. The remaining parts will be summarised in a final project report, which is graded at the end of the course and forms the basis for the lab part of the course.
1.1 Goals
The goal of the project is to implement and test a simple partitioned, distributed in-memory key-value store with linearisable operation semantics. You have significant freedom in your choice of implementation, but you will need to motivate all your decisions in the report. Some of the issues to consider are:
• Networking Protocol
• Bootstrapping
• Group Membership
• Failure Detector
• Routing Protocol
• Replication Algorithm
You will also have to write verification scenarios for your implementation. Distributed algorithms are notoriously difficult to test and verify, so do not take this part of the task lightly.
Note: For team tasks, it is usually a good idea to separate the work into chunks that can be worked on individually after agreeing on APIs (in our case usually Kompics Ports). In this case separating testing scenarios and implementation work may also prove advantageous.
1.2 Requirements
For this lab you will need the following:
• Java SDK, version 8 or newer. Oracle Java is recommended but OpenJDK should work as well.
• Maven or SBT
• An IDE such as NetBeans, Eclipse, or IntelliJ is recommended, but any simple text editor will suffice.
2 Tasks
2.0 Introduction to Kompics (0 Points)
Implement all the PingPong examples from the Kompics tutorial at: https://kompics.github.io/kompics-scala/ and/or complete Programming Exercise 1 in Canvas.
This task is optional and does not give any points. However, if you haven’t worked with Kompics before you should most definitely do it. If you have questions you can ask them during the tutorial exercise session.
2.1 Infrastructure (4 Points)
For this task you have to design the basic infrastructural layers for the key-value store. Your system should support a partitioned key-space of some kind (e.g. hash-partitioned strings or range-partitioned integers). The partitions need to be distributed over the available nodes such that all values are replicated with a specific replication degree δ. You are free to keep δ as a configuration value or hardcode it, as long as it fulfils the requirements of your chosen replication algorithm (task 2.2), so plan ahead.
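As a concrete illustration of one possible scheme (a sketch, not a requirement of the assignment), string keys could be hash-partitioned over a sorted node list, with each key replicated on the next δ consecutive nodes. The class name, node names, and the way δ is passed in below are all assumptions made for the sketch:

```java
import java.util.*;

// Sketch: hash-partitioned key-space with replication degree delta.
// The layout (delta consecutive nodes on a sorted list) is one of
// many valid choices -- motivate your own in the report.
public class PartitionTable {
    private final List<String> nodes; // sorted for deterministic assignment
    private final int delta;          // replication degree

    public PartitionTable(Collection<String> nodes, int delta) {
        if (nodes.size() < delta)
            throw new IllegalArgumentException("need at least delta nodes");
        this.nodes = new ArrayList<>(nodes);
        Collections.sort(this.nodes);
        this.delta = delta;
    }

    // The replication group for a key: delta consecutive nodes,
    // starting at the node the key hashes onto.
    public List<String> replicationGroup(String key) {
        int start = Math.floorMod(key.hashCode(), nodes.size());
        List<String> group = new ArrayList<>(delta);
        for (int i = 0; i < delta; i++)
            group.add(nodes.get((start + i) % nodes.size()));
        return group;
    }
}
```

Note that with this layout any lookup can be routed from any node that knows the membership list, which simplifies the routing protocol.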
For this task you need to be able to set up the system, assign nodes to partitions and replication groups, look up (preloaded) values¹ from an external system (client), and detect (but not necessarily handle) failures of nodes. Additionally, you should be able to broadcast information within replication groups, and possibly across them.
On the verification side you have to write simulator scenarios that test all the features mentioned above. You should also be able to run in a simple deployment (it’s fine to run multiple JVMs on the same machine and consider them separate nodes).
For the report describe and motivate all your decisions and tests.
2.2 KV-Store (4 Points)
After you have gotten the basic infrastructure working, you have to add a PUT(key, value) operation, which updates (or adds) the value at a key, to the GET from the previous task. As mentioned in the goals, the operations should fulfil linearisability, so make sure you choose the right replication algorithm.
For full points, also implement a compare-and-swap (CAS²) operation that compares the current value at the key to a given reference value and only updates it with the new value if the old value and the reference value are the same.
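The sequential semantics that the replicated operations must appear to follow could be sketched as below. This is a single-node specification only; the replication layer around it is your own design, and the class and method names are illustrative assumptions:

```java
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the per-key operation semantics the store must provide.
// This is only the sequential specification -- making a replicated
// system appear to follow it (linearisability) is the actual task.
public class KVSpec {
    private final Map<String, String> store = new ConcurrentHashMap<>();

    public String get(String key) { return store.get(key); }

    public void put(String key, String value) { store.put(key, value); }

    // Returns the old value; updates only if it equals referenceValue.
    public String cas(String key, String referenceValue, String newValue) {
        String old = store.get(key);
        if (Objects.equals(old, referenceValue))
            store.put(key, newValue);
        return old;
    }
}
```

Note that CAS returns the old value either way, so the client can tell from the response whether the swap took effect.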
As before, be sure to write test scenarios for the simulator that check the correctness of your implementation. In particular, explain carefully how you verify the linearisability of the store.
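One way to check linearisability of recorded simulation traces is a brute-force search over orderings, feasible only for the small histories a scenario produces. The sketch below handles a single key treated as one register; the record layout is an assumption, not a prescribed interface:

```java
import java.util.*;

// Brute-force linearisability check for one register's history:
// does some total order of the operations exist that respects
// real-time precedence and sequential register semantics?
public class LinCheck {
    public static final class Op {
        final boolean isPut; final String value; // PUT writes value; GET returned value
        final long invoke, respond;              // real-time interval of the operation
        public Op(boolean isPut, String value, long invoke, long respond) {
            this.isPut = isPut; this.value = value;
            this.invoke = invoke; this.respond = respond;
        }
    }

    public static boolean linearisable(List<Op> history) {
        return search(new ArrayList<>(history), null);
    }

    private static boolean search(List<Op> remaining, String state) {
        if (remaining.isEmpty()) return true;
        for (int i = 0; i < remaining.size(); i++) {
            Op o = remaining.get(i);
            // o may only go next if no other remaining op finished before o began
            boolean minimal = true;
            for (Op p : remaining)
                if (p != o && p.respond < o.invoke) { minimal = false; break; }
            if (!minimal) continue;
            // a GET must observe the current register value
            if (!o.isPut && !Objects.equals(o.value, state)) continue;
            List<Op> rest = new ArrayList<>(remaining);
            rest.remove(i);
            if (search(rest, o.isPut ? o.value : state)) return true;
        }
        return false;
    }
}
```

The search is exponential in the worst case, so keep checked histories short, or look into established checkers for larger traces.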
2.3 Reconfiguration (4 Points)
At this point your store is fairly static and can’t really deal with node failures apart from complaining about them. For this task you should implement reconfiguration support for your replication groups and your routing protocol. You should be able to deal with both nodes leaving the system (treat a voluntary leave the same way as a failure, for simplicity) and new nodes joining the system. There are many ways to interpret the semantics of this, including making some of the partitions unavailable while they are under-replicated. Any approach you take is acceptable as long as you document it properly in the report. However, you have to make sure that reconfiguration does not violate linearisability for correct nodes. To get full points for this task you’ll have to write test scenarios that reconfigure the system and verify that for all correct nodes the operations are still linearisable.
¹ This is effectively a GET(key) : value operation.
² CAS(key, referenceValue, newValue) : oldValue
2.4 Bonus Tasks
In order to be awarded bonus points in the project, implement one (or both) of the following more challenging tasks.
2.4.1 Leader Leases (2 Points)
Most workloads are > 90% reads, so having a read-optimised store is typically a good choice for performance. Implement a lease-based mechanism for allowing reads (GETs) to be processed without involving a majority of replicas. Implement this mechanism on the version of your code that does not feature Reconfiguration! Demonstrate the performance improvement of this implementation with a simple benchmark.
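A minimal sketch of the lease-validity check at the leader, assuming a known upper bound ε on relative clock drift (the class name, durations, and the value of ε are illustrative, not part of the assignment):

```java
// Sketch: a read lease that stays safe under bounded clock drift.
// The leader shortens its own view of the lease by the assumed
// drift bound epsilon, so it never serves a local read after the
// replicas may already consider the lease expired.
public class ReadLease {
    private final long durationMs;  // lease length granted by the replicas
    private final double epsilon;   // assumed maximum relative clock drift
    private long grantedAt = Long.MIN_VALUE; // leader-local grant timestamp

    public ReadLease(long durationMs, double epsilon) {
        this.durationMs = durationMs;
        this.epsilon = epsilon;
    }

    public void granted(long nowMs) { grantedAt = nowMs; }

    // Serve a GET locally only while the lease is guaranteed unexpired
    // even if the leader's clock runs fast by a factor of epsilon.
    public boolean canServeLocalRead(long nowMs) {
        return nowMs - grantedAt < durationMs * (1.0 - epsilon);
    }
}
```

While `canServeLocalRead` holds, a GET can be answered from the leader's local state without a majority round-trip; once it fails, reads fall back to the normal replicated path until the lease is renewed.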
To get full points for this task you’ll have to write test scenarios that interleave reads with writes during temporary partitions and verify that the operations are still linearisable.
2.4.2 Reconfigurable Leader Leases (2 Points)
Enhance your implementation by allowing dynamic reconfiguration of the lease-based variant of your system.
Again, verify that the operations are still linearisable.
Note: These tasks are fairly open-ended and can get quite difficult, depending on the choices you have made before. You will be awarded partial points for demonstrating in the report that you have understood what a proper implementation involves. For full points, the expectation is to have something demonstrably working in the code, though not a perfect solution to all the previously described issues.