Simplified Distributed Streaming Platform, Part I (200 points)
Programming Assignment #1
On machine with IP address 104.130.67.11, do:
$ server itu_server2
itu_server2 at IP address 104.130.67.11 and port number: 3571 On any machine do:
$ client client_1
client_1> add (name=itu_server_1 ip=23.253.20.67 port=9998) (name=itu_server_2 ip=104.130.67.11 port=3571)
client_1> create (topic=topic_1 partitions=2) (topic=t2)
Due date: midnight February 5, 2016
Please implement a simplified distributed streaming platform using Python, C/C++ or
Java using client-server network programming. In a distributed streaming platform, a producer can publish a stream of records to one or more topics, a consumer (with a unique name) can subscribe one or more topics, the platform stores stream of records in a reliable fault-tolerant way, and process (i.e., transform the input streams to output streams) stream of records as they occur in a queue for each partition of each topic. Each record consists of a key, a value and a timestamp. To simplify your work, the key is a string and the value is an integer, but there is no need to support timestamp. Each consumer owns a partition for one or more topics it subscribed, and the platform will keep the offset for each consumer and a “now” offset for each topic. If the number of consumers is less than or equal to the number of partitions, the consumers share all partitions. If the number of consumers is equal to the number of partitions, then any new consumer needs to wait until other completed. We use replication factor of two. Part I of this assignment implement the basic functions of the platform with assumption that no server will be down, and part II supports redundancy and fault-tolerant.
To simplify your work, both server and client run on the Linux machines. You should first run a server program on every machine which may join the platform, and the server will find an available port automatically and display the corresponding port number. Then run P1 with a unique name and IP address of the server, and use subcommand (add/delete hosts, publish, subscribe). E.g.,
On machine with IP address 23.253.20.67, do:
$ server itu_server1
itu_server1 at IP address 23.253.20.67 and port number: 9998
(topic=topic_1)
topic_1 and can
t2 and can get partition 0
client_1> subscribe
client_1 subscribed
client_1>
On another machine, do:
$ client client_2
client_2> subscribe
client_2 subscribed
client_2 subscribed
At this time, the client_1 should get the notification as below: client_1 subscribed topic_1 and can get partition 0 Continue on client_2, do:
client_2> publish (topic=topic_1 partition=2 key=def value=1) put (“def”, 1) to topic_1 on partition 1 on itu_server2 client_2> publish (topic=topic_1 key=abc value=3)
put (“abc”, 3) to topic_1 on partition 0 on itu_server1 and partition 1 on itu_server2
client_2> get (topic=topic_1 partition=1)
get (“def”, 1) from topic1 and partition 1 on itu_server2 client_2> get (topic=t2 partition=0)
Error: topic t2 has no data left
client_2>
On client_1, do:
client_1> get (topic=topic_1 partition=0)
get (“abc”, 3) from topic1 and partition 0 on itu_server1 client_1>
The add hosts will run server program on the hosts and report available port numbers. All those information should be saved on the hosts’ /tmp/<login>/stream/ <name>/info and all record should store in /tmp/<login>/stream/<name>/data. Make the /tmp/<login>, /tmp/<login>/stream and /tmp/<login>/stream/<name> mode 777, and make info and data files mode 666. Which host to store the topic/partition is decided by hashing, e.g., “md5sum – <<< “<string>”, or echo “<string>” | md5sum. The <string> is the topic with optionally concatenated with the partition number.
get partition 0 and 1
(topic=t2)
get partition 1
The command are server and client, and the subcommands are:
- 1) <add|delete> {(<host name>, <ip address>, <port number>)}
- 2) <create|remove> {(topic=<topic> [partition=<number>])} # default <number>=1
- 3) <subscribe/unsubscribe> {(topic=<topic>)}
- 4) publish (topic=<topic> [partition=<number>])
- 5) get (topic=<topic> partition=<number>)
Where […] means optional, {…} means can be one or more times. And we use <key>=<value> for parameter without worry the ordering. The delete, remove, and unsubscribe is for part II.
Student Name: ID:
Score:
Correctness and boundary condition (60%): Error Handling (5%):
Automatic available port finding and support both host name and IP address (5%):
Display output on both server and client windows whenever there is an event happens (10%):
Modular design, file/directory organizing, showing input, documentation, coding standards, sympathy/typing point with README (20%):
Subtotal:
Late penalty (20% per day):
Special service penalty (5%):
Total score: