Microsoft Word – streaming1.docx
Programming Assignment #1
COEN 317 Distributed System
Department of Computer Engineering
Santa Clara University
Dr. Ming-Hwa Wang Spring Quarter 2017
Phone: (408) 526-4844 Email address: mwang2@engr.scu.edu
Course website: http://www.cse.scu.edu/~mwang2/distributed/
Office Hours: Monday 9:30-10:00pm
Due date: midnight February 5, 2016
Simplified Distributed Streaming Platform, Part I (200 points)
Please implement a simplified distributed streaming platform using Python, C/C++ or
Java using client-server network programming. In a distributed streaming platform,
a producer can publish a stream of records to one or more topics, a consumer (with
a unique name) can subscribe one or more topics, the platform stores stream of
records in a reliable fault-tolerant way, and process (i.e., transform the input streams
to output streams) stream of records as they occur in a queue for each partition of
each topic. Each record consists of a key, a value and a timestamp. To simplify your
work, the key is a string and the value is an integer, but there is no need to support
timestamp. Each consumer owns a partition for one or more topics it subscribed, and
the platform will keep the offset for each consumer and a “now” offset for each topic.
If the number of consumers is less than or equal to the number of partitions, the
consumers share all partitions. If the number of consumers is equal to the number of
partitions, then any new consumer needs to wait until other completed. We use
replication factor of two. Part I of this assignment implement the basic functions of
the platform with assumption that no server will be down, and part II supports
redundancy and fault-tolerant.
To simplify your work, both server and client run on the Linux machines. You should
first run a server program on every machine which may join the platform, and the
server will find an available port automatically and display the corresponding port
number. Then run P1 with a unique name and IP address of the server, and use
subcommand (add/delete hosts, publish, subscribe). E.g.,
On machine with IP address 23.253.20.67, do:
$ server itu_server1
itu_server1 at IP address 23.253.20.67 and port number: 9998
On machine with IP address 104.130.67.11, do:
$ server itu_server2
itu_server2 at IP address 104.130.67.11 and port number: 3571
On any machine do:
$ client client_1
client_1> add (name=itu_server_1 ip=23.253.20.67 port=9998)
(name=itu_server_2 ip=104.130.67.11 port=3571)
client_1> create (topic=topic_1 partitions=2) (topic=t2)
client_1> subscribe (topic=topic_1)
client_1 subscribed topic_1 and can get partition 0 and 1
client_1>
On another machine, do:
$ client client_2
client_2> subscribe (topic=topic_1) (topic=t2)
client_2 subscribed topic_1 and can get partition 1
client_2 subscribed t2 and can get partition 0
At this time, the client_1 should get the notification as below:
client_1 subscribed topic_1 and can get partition 0
Continue on client_2, do:
client_2> publish (topic=topic_1 partition=2 key=def value=1)
put (“def”, 1) to topic_1 on partition 1 on itu_server2
client_2> publish (topic=topic_1 key=abc value=3)
put (“abc”, 3) to topic_1 on partition 0 on itu_server1 and
partition 1 on itu_server2
client_2> get (topic=topic_1 partition=1)
get (“def”, 1) from topic1 and partition 1 on itu_server2
client_2> get (topic=t2 partition=0)
Error: topic t2 has no data left
client_2>
On client_1, do:
client_1> get (topic=topic_1 partition=0)
get (“abc”, 3) from topic1 and partition 0 on itu_server1
client_1>
The add hosts will run server program on the hosts and report available port
numbers. All those information should be saved on the hosts’ /tmp/
Make the /tmp/
mode 777, and make info and data files mode 666. Which host to store the
topic/partition is decided by hashing, e.g., “md5sum – <<< “
“
partition number.
The command are server and client, and the subcommands are:
1)
2)
3)
4) publish (topic=
5) get (topic=
Where […] means optional, {…} means can be one or more times. And we use
unsubscribe is for part II.
Student Name:
ID:
Score:
Correctness and boundary condition (60%):
Error Handling (5%):
Automatic available port finding and support both host name and IP address
(5%):
Display output on both server and client windows whenever there is an event
happens (10%):
Modular design, file/directory organizing, showing input, documentation, coding
standards, sympathy/typing point with README (20%):
Subtotal:
Late penalty (20% per day):
Special service penalty (5%):
Total score: