Department of
Computer Science
 
[ Home | Description | Lectures | Labs | Programming | Piazza ]
Spring 2020
CSCI 353
Programming Assignment #4
(100 points total)
Network of Servers in C/C++, Part 1 – Link Layer and Network Layer
Due 11:45PM 4/10/2020 (firm)
IMPORTANT: If you have a working PA3, you can use that as a starting point for this assignment. If you don’t have a working PA3, you can start this assignment from scratch (or start from your code for the labs mentioned below). Other than the code given to everyone as part of the course material, all the code you write to implement this assignment must be your own. We will ABSOLUTELY NOT give you a solution to PA3 to start this assignment! Please note that if you start with your PA3 code, you probably have to delete a lot of code from it and do some major reorganization. Therefore, it’s probably best to start this assignment from your lab code.
IMPORTANT: Lab 9, Lab 10, and Lab 11 are designed to get you going on this assignment. It’s highly recommended that you finish those labs as early as possible. When you are done with all these labs, you just need to clean up your code and implement logging and you are pretty much done!
Introduction
(Please check out the PA4 FAQ before sending your questions to the the course producers or the instructor.)
In the real world, layer 4 and below are implemented inside the operating system. Since we are not doing kernel hacking in this class, we will implement these functionalities in an overlay network, i.e., an application-level network, layered on top of the Internet (also known as “over-the-top network” mentioned in Ch 2). In this assignment, we will impmlement layer 2 (linke layer) and layer 3 (network layer) functionalities in such an overlay network. We will use graph terminology to describe the topology of an overlay network. An end-system in an overlay network will be referred to as a node. In Ch 2, we talked about the peer-to-peer architecture, which can also be considered as an overlay network.
Conceptually, we can start with your PA3 web server and turn it into an autonomous and independently operated node in an overlay network. Nodes in this network are part server, part client, and part router. We will refer to this overlay network as 353NET. A pair of directly connected nodes are referred to as neighbors and the “direct connection” between them is a persistent TCP connection (i.e., you can send multiple messages over such a connection). We will use standard graph terminology to describe our network of nodes where a vertex is a node and an edge is a connection between a pair of neighboring nodes. Not all nodes in the 353NET may be up and running. With some nodes being down, the 353NET can be partitioned and that is not unusual.
Neighboring nodes in a 353NET exchange messages by following the 353NET protocol described in the rest of this spec. The main goal we would like to achieve in this assignment is that every node in a 353NET knows the topology of the network partition it’s in and maintains that information in an adjacency list data structure (similar to the one used in your PA1 assignment) and use this information to perform message routing for networking applications that would run on top of the 353NET in PA5. This needs to be achieved without an active central authority (although we will provide every node in the 353NET with an identical “network map”).
When you start a node, you must specify a configuration file to be used for that node and each node must use a different configuration file. Inside each configuration file is a map of the entire 353NET (for this assignment, this map is guaranteed to be identical in all the configuration files). Using this network map, a node can figure out which nodes are supposed to be its neighbors. When functioning as a router, the main job of this node is to make connections to all of its neighbors, if they are up and running. If every node does that and if all the nodes in the 353NET are up and running, the network that’s formed should be identical to the network map. If some nodes are not running or died, the network formed should be identical to the original network map with those nodes and links to those nodes removed.
Please note that the next programming assignment (PA5) builds on top of this assignment. In order to get any credit for PA5, your implementation of PA4 needs to be reasonably functioning.
Electronic submissions only.
Although you won’t have to implement routing and forwarding until PA5, if you are not sure about these concepts, please review Ch 1 of the textbook.
Disclaimer: The 353NET is a toy network. It’s not terribly efficient. It’s designed as a programming exercise in an introductory networking class.
Compiling
You must use a Makefile so that when the grader simply enters:
make pa4
(or simply “make”), an executable named pa4 is created and the compiler command that gets run must start with “g++ -g -Wall -std=c++11”. Please make sure that your submission conforms to the general compilation requirements and README requirements.
The grader will be using the above command to create your executable on a standard 32-bit Ubuntu 16.04 system running inside VirtualBox as mentioned in the general compilation requirements.
Commandline Syntax
The commandline syntax (also known as “usage information”) for pa4 is as follows:
pa4 CONFIGFILE
The CONFIGFILE is a configuration file that contains all the information your node needs in order to function in the 353NET.
Unless otherwise specified, output of your program must go to cout and error messages must go to stderr/cerr.
Network Formation
We want to build a network of nodes that “works” even when the entire network is partitioned. As long as a node can reach another node through a network path, they should be able communicate and exchange messages. The design of the 353NET is to make sure that even when the network is partitioned, nodes within a partition can still work together to perform networking functions. Here is the basic scheme.
When a node starts, let’s call this node X, it must read its CONFIGFILE to figure out which nodes are supposed to be its neighbors. Node X will attempt to connect to all of its neighbors. If a TCP connection can be made to say, node Y, the very first message that must be sent on this connection is a hello message where node X will introduce itself to node Y. When node Y received a hello message from node X, it must check to see if there is already a connection to node X. If there is, node Y should simply shutdown and close the newly established connection. If there isn’t, node Y should add this connection to its list of connections and send a hello message back to node X using the same connection (in this case, the first message from Y to X is also a hello message). After a pair of hello message has been exchanged on a particular connection, that connection is said to be active.
The link state of a node is a list of its active neighbors. Once node X has active connections, it must send link state update messages to the entire network via flooding every time its link state has changed to tell the rest of the network who its active neighbors are. Clearly, these messages can only reach nodes that are in the same network partition as node X. A node should represent its knowledge of the current topology of the 353NET in an adjacency list data structure. When a node received a link state update message, it must use the information in the message to update the adjacency list data structure. We can view the adjacency list data structure as a list of “link states”. Using the adjacency list data structure, a node can run the Breadth-First-Search (BFS) algorithm or Dijkstra’s Shortest Path algorithm to construct a forwarding table and use it to forward 353NET application messages. Although in this assignment, you will not implementing “forwarding”.
Maintaining Network State – Soft State vs. Hard State
Since any node can die at any time, a particular link state in the adjacency list data structure may be invalid. One approach to make sure that the adjacency list data structure is up to date is to treat link state information as a “soft state”, meaning that if a link state information is not refreshed, you must delete after a timeout. Even though this approach is taken in the Internet in some protocols, we will not take this approach because we will end up with too many messages. Also, it will make your code very difficult to debug. For this assignment, you must take the “hard state” approach in Lab 11. Basically, for a node, its link state information about other nodes is only removed or updated when certain external messages are received.
KeepAlive – Maintaining Neighbor Connection
A TCP connection has a built-in keepalive mechanism to make sure that the TCP connection is up and running. Therefore, even if a node is unresponsive, the connection to that node would appear to be healthy. In order to detect a unresponsive neighbor, a keepalive timer can be used to timeout a unresponsive neighbor. This is important in a real network where nodes are heterogeneous. But we will not implement this since all your nodes in your network are implemented by you!
GoodBye – Leaving 353NET Nicely
When you disconnect from your neighbor, it would be nice to be polite to say “good-bye”. Although typically, there is no time to say “good-bye”, and therefore, we will not say “good-bye”.
353NET Protocol Message Format
The 353NET protocol message format is similar in design to the HTTP protocol message format. All 353NET protocol messages has a message header which is followed by an empty line and an optional message body. The first line of a message must be in the following format:
353NET/1.0 MSGTYPE NONE/1.0\r\n
The first field specifies the protocol name and version to which the message conforms. For this assignment, the protocol name must be “353NET” and we are using version 1.0 of the 353NET protocol. The 2nd field is a message type (the rest of this section specifies the format for each type of messages). The 3rd field specifies the application-level protocol and version to which the message conforms (and it’s always “NONE/1.0” for this assignment and you can either ignore this field or not implement it).
The message header consists of lines of text, each of which is terminated by a “\r\n” sequence. Each line in the header (except for the first line) has the following format (similar to HTTP):
KEY:VALUE\r\n
where VALUE may have leading and trailing space characters. The end of the message header is an empty line (i.e., only contains “\r\n”).
One of the lines in the message header must have the “Content-Length” key. The corresponding value specifies the exact number of bytes in the message body which immediately follows the empty line after the message header. Since we are using persistent TCP connections, within a connection, multiple messages can be sent. At the end of a message body, another message begins immediately. You must treat the message body as binary data when you are detecting message boundaries. You must reliably detect message boundaries or your node can get very confused.
You should be able to easily adapt your code from PA3 to read and parse a 353NET message since the basic structure of 353NET messages is the same as HTTP messages.
There can be various reasons why a message is considered malformed. For example, if the first line in the header has an unknown MSGTYPE or the rest of the line does not contain the expected information, then the message is considered malformed. If some lines (except for the first line) in the header is missing a colon, the message is considered malformed. If the header does not contain a “Content-Length” key, the message is considered malformed. And so on. If a node receives a malformed message, it must immediately shutdown and close the socket and delete the connection. But since your node only has to work with other nodes running exactly the same code and written by yourself, your node should never see a malformed message!
Below are the details of each of the 353NET protocol messages.
Hello Message (MSGTYPE: SAYHELLO) :
A SAYHELLO message header, followed by an empty line, must look like the following:
353NET/1.0 SAYHELLO NONE/1.0\r\n
TTL: 1\r\n
Flood: 0\r\n
From: NODEID\r\n
Content-Length: 0\r\n
\r\n
TTL must be 1 for this message. NODEID is the NodeID of the sending node (which must be a neighbor). This message has no message body.
Link State Update Message (MSGTYPE: LSUPDATE) :
An LSUPDATE message header, followed by an empty line, must look like the following:
353NET/1.0 LSUPDATE NONE/1.0\r\n
TTL: NUMBER\r\n
Flood: 1\r\n
MessageID: MSGID\r\n
From: ORIGIN_NODEID\r\n
OriginStartTime: ORIGIN_START_TIME\r\n
Content-Length: LENGTH\r\n
\r\n
where NUMBER is an integer. For the node that initiated this message, MSGID must be a freshly created MessageID. ORIGIN_NODEID is the NodeID of the node that initiated this message. ORIGIN_START_TIME is the time ORIGIN_NODEID was started. Please see the GetObjID() function below regarding how to create both the MSGID and ORIGIN_START_TIME. LENGTH is the number of bytes in the message body, which immediately follows the empty line after the message header.
The message body are simply a list of NodeIDs separated by “,” characters between any two consecutive NodeIDs. These are the list of neighbors of the node that initiated this message. There must be no “\r”, “\n”, or “\0” in the message body.
When a node receives a LSUPDATE message, it must follow the flooding protocol to flood the message to rest of the 353NET.
Unicast Application Message (MSGTYPE: UCASTAPP) :
Please note that this is ONLY for PA5. Please do not implement this message in this assignment.
An UCASTAPP message header, followed by an empty line, must look like the following:
353NET/1.0 UCASTAPP NONE/1.0\r\n
TTL: NUMBER\r\n
Flood: 0\r\n
MessageID: MSGID\r\n
From: SRC_NODEID\r\n
To: DEST_NODEID\r\n
Next-Layer: NEXT_LAYER\r\n
Content-Length: LENGTH\r\n
\r\n
where NUMBER is an integer. For the node that initiated this message, MSGID must be a freshly created MessageID. SRC_NODEID is the NodeID of the node that initiated this message. DEST_NODEID is the NodeID of the node that this message is trying to reach (i.e., the target node). NEXT_LAYER is for demultiplexing and it can be either 1 (for “udt”) or 2 (for “rdt”). LENGTH is the number of bytes in the message body, which immediately follows the empty line after the message header.
Please note that LENGTH is a byte count. Please note that there must be no null character (i.e., “\0”) in the message body for this message. If you don’t handle LENGTH properly in your code, you will not be able to detect message boundary correctly. So, it’s very important to implement this correctly.
Multithreading
You are required to create one console thread to interact with the user and two networking threads (developed in Lab 9) to handle each connection with a neighbor (one for reading from the socket and one for writing to the socket). You need a neighbors thread (developed in Lab 10) to make sure that you are connected to all your neighbors who are up and running. You should also have a reaper thread (developed in Lab 9) to clean up dead connections and a timer thread to timeout messages in the message cache. You may need to create other threads if needed.
User Console
Each node must have an interactive commandline interface. Your program must use “NODEID> ” (i.e., the NodeID, followed by a “greater than” symbol, followed by a space character) as the command prompt to tell that user which node they are on and that you are expecting the user to enter a line of command. Unless the user shutdowns the node using the quit command, the node should run forever.
The commands and their meanings are:
Name
Arguments
Description
neighbors
(Please ignore arguments if they are present.) List the NodeIDs of your active neighbors.
For example, if SELF is the NodeID of the node whose console you are typing into and it has N active neighbors whose NodeIDs are NodeID_1, NodeID_2, …, NodeID_N (in arbitrary order), you must print the following to the console:
Active neighbors of SELF:\n
\tNodeID_1,NodeID_2,…,NodeID_N\n
Please note that the list of active neighbors of your node is, by definition, the link state of your node!
If SELF does not have active neighbors, you must print the following to the console:
SELF has no active neighbors\n
netgraph
(Please ignore arguments if they are present.) Print the adjacency list representation of the network graph stored in the node whose console you are typing into.
If NodeID_1 is the NodeID of the node in the adjacency list with N1 neighbors whose NodeIDs are NodeID_11, NodeID_12, …, NodeID_1N1, NodeID_2 is the NodeID of the node in the adjacency list with N2 neighbors whose NodeIDs are NodeID_21, NodeID_22, …, NodeID_2N2, etc., you must print the following to the console:
NodeID_1: NodeID_11,NodeID_12,…,NodeID_1N1\n
NodeID_2: NodeID_21,NodeID_22,…,NodeID_2N2\n
…
Let SELF be the NodeID of the node whose console you are typing into, please note that when you print the adjacency list, you must include the neighbor information about SELF as well. If SELF does not have active neighbors, you must print the following to the console for the line that corresponds to SELF:
[BC: paragraph added 4/3/2020]
Before you print the network graph, you must run BFS (or Dijkstra’s algorithm) to clean up the adjacency list since there may be unreachable nodes in your adjacency list data structure.
SELF has no active neighbors\n
linkdown
neighbor
(Please ignore additional arguments if they are present.) Block connection to and from a node with the neighbor NodeID to simulate a broken link. [BC: updated 4/2/2020] Close connection with neighbor if you currently have a connection with it. If the link with neighbor is not blocked, you must print the following to the console:
Link to NODE_ID is now down\n
where NODE_ID is the NodeID of neighbor.
If you are currently connected with neighbor, you must disconnect from neighbor and print the following to the console:
Disconnected from NODE_ID\n
If the link with neighbor is already down because you have previously issued a linkdown command with neighbor, you must print the following to the console:
Link to NODE_ID is already down\n
If neighbor is not a valid neighbor, you must print the following to the console:
NODE_ID is not a valid neighbor\n
Once the link is down with neighbor, you must not initiate a connection with neighbor. Also, when you get a SAYHELLO message from neighbor, you must close the connection immediately and not say hello back.
linkup
neighbor
(Please ignore additional arguments if they are present.) Unblock connection to and from a node with the neighbor NodeID to simulate that a link has been repaired.
Link to NODE_ID is now up\n
where NODE_ID is the NodeID of neighbor.
If the link with neighbor is not down, you must print the following to the console:
Link to NODE_ID is already up\n
If neighbor is not a valid neighbor, you must print the following to the console:
NODE_ID is not a valid neighbor\n
listdownlinks
(Please ignore arguments if they are present.) List the links that are down (i.e., blocked by the linkdown commands you have issued).
If the node whose console you are typing into and it has N blocked neighbors whose NodeIDs are NodeID_1, NodeID_2, …, NodeID_N (in arbitrary order), you must print the following to the console:
Links to the following neighbors are down:\n
\tNodeID_1,NodeID_2,…,NodeID_N\n
If the node does not have any blocked neighbors, you must print the following to the console:
No links are down\n
quit
Exit your program gracefully. (Please ignore arguments if they are present.)
You must ask all threads to self-terminate. Afterwards, your console thread must print the following to the console and then self-terminate:
Console thread terminated\n
You should shutdown and close the master socket that your main thread is calling accept() on. This way, your main thread can wait for the console thread to die and the main thread can exit.
Please note that in Unix, you can create the end-of-input condition (also known as the EOF condition) in stdin by typing the
If the command name is not one of the above, you must print the following to the console:
Command not recognized. Valid commands are:\n
\tlinkdown neighbor\n
\tlinkup neighbor\n
\tlistdownlinks\n
\tneighbors\n
\tnetgraph\n
\tquit\n
If a particular command is malformed (such as wrong number of arguments), please give the correct usage information for that command.
Logging
Logging is extremely important in a networking application like this one. It should be very useful to help you debug this assignment. It’s also extremely important because part of the grading will be done by looking at the log file your node produced. Without looking at the log, it would be very difficult for the grader to tell what your node is doing. The bottomline is that if you don’t log all the required messages correctly, you could end up losing a lot of points!
Every message coming into a node and every message going out of a node must be logged. The format for a line of log entry is as follows:
[TIMESTAMP] {r|i|d} MSGTYPE NEIGHBOR TTL FLOOD CONTENT_LENGTH msg-dependent-data
TIMESTAMP is the current time in the same format as a PA2 timestamp. The next field is referred to as the “category” field. It’s a single character and it can have 3 possible values:
• You must use “r” if the message was “received” by this node.
• You must use “i” if the message was sent and “initiated” by this node.
• You must use “d” if the message was sent due to “flooding” by this node (and not initiated by this node).
If the “category” is “r”, then the NEIGHBOR field is the NodeID of the neighbor from which you received the corresponding message.
If the “category” is “i”, “d”, or “f”, then the NEIGHBOR field is the NodeID of the neighbor to which you sent the corresponding message.
The TTL field corresponds to the value of the TTL key in every message header.
The FLOOD field must be either “F” (if the value of the Flood key in the message header is 1) or the “dash” character (if the value of the Flood key in the message header is 0).
The CONTENT_LENGTH field corresponds to the value of the Content-Length key in every message header. If CONTENT_LENGTH is 0 (such as the case for a SAYHELLO message), there would be no msg-dependent-data. Otherwise, please see the following table regarding what to log for each message type for msg-dependent-data. For the meaning of the symbols in the 3rd column, please click on the corresponding link in the 2nd column.
MSGTYPE
Description
msg-dependent-data
LSUPDATE
Link State Update
MSGID ORIGIN_START_TIME ORIGIN_NODEID (NodeIDs)
The “(NodeIDs)” for the LSUPDATE message is a list of comma-separated NodeIDs from the message body of the respective message, surrounded by a pair of parentheses. You must print the pair of parentheses and there must be no space character anywhere between the left parenthesis and the right parenthesis.
PID File
This part is almost identical to PA2. The only difference is that there is no default PID file name and you must use the PID file name specified in the CONFIGFILE. You should delete this file (using the unlink() system call) before your node shuts down so that we can use the presence of this file to know whether your node is running or not (and if your node is running, we know what PID to use to kill the node in case your node is not responding).
Configuration File Format
A configuration file is a file in the INI format in Lab 5. Please read the details about the INI file format there.
Below is an example of a configuration file that you would use with this assignment:
[startup]
host=localhost
port=12000
pidfile=pa4data/12000.pid
logfile=pa4data/12000.log
[params]
max_ttl=9
msg_lifetime=8
neighbor_retry_interval=4
[topology]
localhost:12000=localhost:12002
localhost:12002=localhost:12000,localhost:12004,localhost:12010,localhost:12012
localhost:12004=localhost:12002,localhost:12006,localhost:12012
localhost:12006=localhost:12004,localhost:12008
localhost:12008=localhost:12006,localhost:12010
localhost:12010=localhost:12002,localhost:12008
localhost:12012=localhost:12002,localhost:12004
[map]
; +——-+
; /–+ 12010 +————————–\
; | +——-+ |
; | |
; +——-+ +—+—+ +——-+ +——-+ +—+—+
; | 12000 +—+ 12002 +—–+ 12004 +—+ 12006 +—+ 12008 |
; +——-+ +—+—+ +—+—+ +——-+ +——-+
; | |
; | +——-+ |
; \–+ 12012 +–/
; +——-+
[logging]
SAYHELLO=1
LSUPDATE=0
UCASTAPP=1
The “[startup]” section is required and it must contain the following key=value lines (can come in any order and additional keys can be ignored):
• host=hostname – hostname is name of the machine your server must serve on. Just like all the other assignments, we will only use localhost as the hostname.
• port=number – number is the Control Port Number your server must listen on. When you call getaddrinfo(), you must use the string representation of the number as the 2nd argument. If that caused getaddrinfo() to fail, you must print an error message and quit your program.
• pidfile=pidfilename – pidfilename is the file name of a file for storing the process ID of your server.
• logfile=logfilename – logfilename is the file name of a file for logging messages.
All the other sections of the configuration files are identical for all nodes in the 353NET. The only section that’s different in all the configuration files is the “[startup]” section.
The “[params]” section is required and it must contain the following key=value lines (can come in any order and additional keys can be ignored):
• max_ttl=number – This is referred to as the Max TTL of your node. number is the TTL value your node must put into a message header when it initiates a flooded or routed message and it must be an integer > 0 and < 256.
• msg_lifetime=number - This is referred to as the Message Life Time of your node. number is the number of seconds your node must use to timeout an object in the message cache and it must be an integer > 0 and ≤ 60. Since we are assuming that all nodes in 353NET have the same keys and values in the “[params]” section of its configuration files, this timeout value can be used by 353NET applications.
• neighbor_retry_interval=number – This is referred to as the Neighbor Retry Interval of your node. number is the number of seconds your node must wait before it tries to connect to a neighbor that was previously down and it must be an integer > 0 and ≤ 60.
The “[topology]” section is required and it contains key=value lines where a key is a NodeID and value is a list of comma-separated NodeIDs to mean that the node named in key should be neighbors with all the nodes listed in the corresponding value.
The “[map]” section is optional and it’s empty (since everything in it are comments). This section a graphic representation of the network topology when all the nodes in the 353NET are up and running and connected to all its neighbors.
The “[logging]” section is required and it must contain the following key=value lines (can come in any order and additional keys can be ignored):
• SAYHELLO=number – If number is 1, you must log all SAYHELLO messages your node sends and receives into the log file. If number is 0, you must print the information you would log to cout instead.
• LSUPDATE=number – If number is 1, you must log all LSUPDATE messages your node sends, receives, and forward into the log file. If number is 0, you must print the information you would log to cout instead.
• UCASTAPP=number – If number is 1, you must log all UCASTAPP messages your node sends, receives, and forwar into the log file. If number is 0, you must print the information you would log to cout instead.
Input and Output Examples
Please download pa4data.tar.gz and put it in the same directory of your pa4 executable file. Then cd (change directory) into that directory and type:
gunzip pa4data.tar.gz
tar xvf pa4data.tar
This should create a subdirectory called “pa4data” with a bunch of files in it.
Below is an example similar to part (A) of the grading guidelines (with “netgraph” commands added).
Stay in the same directory as your pa4 executable file and type:
./pa4 pa4data/pa4AB-12000.ini
You should see the following prompt:
localhost:12000>
Start another Terminal in the same directory and type:
./pa4 pa4data/pa4AB-12002.ini
You should see the following prompt in the 2nd window:
localhost:12002>
These two nodes should exchange SAYHELLO messages with each other soon after the 2nd node got started. In the first window, you should see the following two log messages (more than two is fine and order is not important):
[TIMESTAMP] i SAYHELLO localhost:12002 1 – 0
[TIMESTAMP] r SAYHELLO localhost:12002 1 – 0
Of course, TIMESTAMP depends on the current time.
In the 2nd window, you should see the following two log messages (more than two is fine and order is not important):
[TIMESTAMP] r SAYHELLO localhost:12000 1 – 0
[TIMESTAMP] i SAYHELLO localhost:12000 1 – 0
In the 1st window, type “neighbors” and should see the following:
Active neighbors of localhost:12000:
localhost:12002
In the 2nd window, type “neighbors” and should see the following:
Active neighbors of localhost:12002:
localhost:12000
In the 1st window, type “netgraph” and should see the following (order is not important):
localhost:12000: localhost:12002
localhost:12002: localhost:12000
In the 2nd window, type “netgraph” and should see the following (order is not important):
localhost:12002: localhost:12000
localhost:12000: localhost:12002
In the 1st window, type “linkdown localhost:12002” and should see the following:
Link to localhost:12002 is now down
Disconnected from localhost:12002
In the 2nd window, you should see the following every 4 seconds:
[TIMESTAMP] i SAYHELLO localhost:12000 1 – 0
Wait long enough to see at least 3 of the above messages. Check timestamps to see that they should be a little over 4 seconds apart.
In the 1st window, type “listdownlinks” and should see the following:
Links to the following neighbors are down:
localhost:12002
In the 1st window, type “linkup localhost:12002” and should see the following:
Link to localhost:12002 is now up
Wait for SAYHELLO messages to be exchanged in both windows, then type “neighbors” and “netgraph” in both windows to see that they are connected and they have the correct information.
In the 1st window, type “listdownlinks” and should see the following:
No links are down
In the 1st window, type “quit” and node localhost:12000 should self-terminate.
In the 2nd window, type “neighbors” and should see the following:
localhost:12002 has no active neighbors
In the 2nd window, type “quit” and node localhost:12002 should self-terminate.
Below are the printouts in two windows for your reference. Please note that they look a little mixed up because they are log messages can interleave console input and output.
script.12000.txt
script.12002.txt
Implementation Issues
There are some major difference between this assignment and previous networking assignments.
Full Duplex Connection
A TCP connection is a full duplex connection, i.e., you can send a message and receive a message simultaneously. In previous networking assignments, we have implemented the client/server model where the client sends a request (while the server waits for a request then reads from the socket), then the server processes the request (while the client waits for a response), then the server sends a response (while the client reads from the socket). If you look at the connection between the server and the client, either data is going from the server to the client or from the client to the server. It’s never the case that both are sending data to each other. Why is that? Well, it’s the nature of the protocol we are using for the client/server model. Because of that, on the server side, you just need one thread and this thread is either reading from the socket, processing data, or writing to the socket.
For this assignment, a node is part client, part server, and part router. There can be simultaneous activities going over a connection. How do you handle such a connection using one thread? Well, it can be done, but it can get messy. The easier solution is to use two threads per connection. One thread stays in an infinite loop, reading from the socket, processing the data, then go back and loop again. Another thread stays in an infinite loop, waiting for things to send. When it has things to send, it writes to the socket, then go back and loop again.
Event-driven Design
When a node receives a message to be flooded, what’s a good way to implement this (and why does this require a solution)? When a node receives a message from a neighbor, it’s one of the connection thread that receives this message. After this thread process the message, it knows that it needs to send a copy of this message to each of its other neighbors. This requires interacting with other threads. So, if you have N heighbors, each thread that’s reading from a socket would need to “communicate” with N-1 threads that writes to sockets. That sounds like a big mess!
The typical solution for this is to use an event-drive design, which is a “centralized” approach. Instead of having each socket-reading thread “communicate” with many other threads, each of these threads will only communicate with one thread, and this thread is the “event thread” that handles the event loop.
The event thread stays in an infinite loop, waiting for event to be added to the event queue. When an event gets added to the event queue, it will dequeue it from the event queue and dispatch the event by processing the event. Then it goes back to the loop. What’s an “event” anyway? Well, that’s up to you. If you go with this programming model, you need to design all the event data structures. An event is a self-describing object that contains all the necessary information that’s necessary for the event thread to process the event. When a connection-reading thread has finished processing an incoming message, it will create an appropriate event and add it to the event queue and trusts that the event thread will do the right things when it processes the event. The event thread would be the one that “communicates” with all the socket-writing thread by creating message based on an event and add the message to a queue that will be read by a socket-writing thread.
Timeouts
In networking programming, there are often lots of timeouts that needs to be handled. For this assignment, when you cache a flooded message, you need to timeout that message. That’s a lot of timers if you have to use a timer for each cached message! Is there a less painful way to deal with this?
An important observation is that, unlike throttling in PA3, it is not critical that you timeout at exactly the expiration time! If a object is supposed to be timed out and deleted 5 seconds from now, what damange would it do if you actually delete the object 5.999 seconds later instead? If the answer is that nothing bad would really happen, then you can implement it in a cleaner way! Since all timeout intervals in the assignment are specified as integers, you can use a timer that wakes up every 0.25 second. If you want to timeout in 5 seconds, you just have to start a counter and initialize it to 20. Every time the 0.25 second timer goes off, you decrement the counter by one. When the counter reaches 0, you create an appropriate timer-expiration event and add it to the event queue and let the event thread worry about what to do.
So, what you can do is that for all the timeout events you want to handle, you create a timeout event. Inside each event, you initialize a counter properly and add the event to the timer queue. Every time the timer goes off, it will talk down the timer queue and decrement each counter. If a counter reaches zero, it will simply remove that event from the timer queue and add it to the event queue.
Duplicate Connections
If nodes X and node Y simultaneously connect with each other, you may end up with two active connections between nodes X and Y. Please follow the solution proposed in Lab 10 to avoid having two connections between a pair of neighboring nodes.
Glossary of Terms
Here are some of the important terms used in this spec:
Active Connection :
A connection over which a pair of SAYHELLO messages have been successfully exchanged.
Active Neighbor :
A neighbor with which you have an active connection.
Link State :
The “link state” of a node is the information about its active neighbors (i.e., list of NodeIDs of its active neighbors).
Every node knows its link state. When its link state changes (i.e., when it gains or loses a neighbor), it must tell the rest of the 353NET that its link state has changed by flooding a LSUPDATE message so all other nodes in the network can update their information about this node’s link state.
MessageID :
Due to the way messages are routed in the 353NET, each message needs to be uniquely identified by its MessageID. In order to guarantee uniqueness, a MessageID must be generated in a very specific way. You must use the GetObjID() function with obj_category being “msg”.
The GetObjID() function returns an object ID (which is a hexstring, i.e., a null-terminated ASCII string, of length 40) When you need to include a MessageID (also referred as MSGID) in a message or in a log file, you must use this hexstring.
Neighbors :
A pair of nodes are referred to as neighbors if they are supposed to be connected with each other when both of them are up and running. This information can be obtained by reading the CONFIGFILE. The connection between them is a persistent TCP connection.
Network Graph :
This is the topology of the 353NET. If you combine all the link state information of all the nodes in the 353NET and store them in a adjacency list data structure, you have a graph representation of the 353NET.
When a node “hears” that another node’s link state have changed, it must update the adjacency list data structure to reflect the change.
NodeID :
This is a unique identifier for a node, which is simply the concatenation of its hostname and its port number, with a colon (i.e., ‘:’) character inserted between them. The hostname must be the value of the “host” key in the “[startup]” section of the configuration file. The port number must be the value of the “port” key in the “[startup]” section of the configuration file. For example, if the hostname is “localhost” and port number is “12345”, the corresponding NodeID must be “localhost:12345”. Please note that the number of characters in a NodeID must be ≤ 39 (therefore, use a 40-byte buffer to store a NodeID is guaranteed to work).
TTL :
TTL stands for “time to live”. As a message travels through the 353NET, the TTL in a message is decrement when the message is received. After the TTL is decremented, if the TTL value is greater than zero, the message must be either “routed” towards its final destination or continue to be flooded. Please note that we are not doing “message routing” in this assignment. We are only doing message flooding in this assignment.
TTL limits the number of hops a message can travel. Therefore, it’s very important to implement it correctly or you may end up with a message that lives forever if there is a routing loop in your network.
Grading Guidelines
The grading guidelines has been made available. Please understand that the grading guidelines is part of the spec and the grader will stick to it when grading. It is possible that there are bugs in the guidelines. If you find bugs, please let the instructor know as soon as possible so they can be fixed.
The grading guidelines is the only grading procedure we will use to grade your program and the grader must grade your submission on a standard 32-bit Ubuntu 16.04 system running inside VirtualBox. No other grading procedure will be used. If your program only runs on some other platform, you will get no credit for it. Therefore, it’s imperative that you run your code through the grading guidelines on a standard 32-bit Ubuntu 16.04 system running inside VirtualBox. To make sure that you didn’t hard-code anything so that your code would only work with what’s explicitly and exactly mentioned in the grading guidelines, we will change the testing data (including file names and commandline arguments) when grading. We will try our test to keep the basic commands the same (although we may make minor changes if we discover bugs in the script or things that we forgot to test.) Finally, please understand that for the same mistake in your code, you may get points deducted over and over again (i.e., do expect “double jeopardy”, “triple jeopardy”, etc.).
To save typing all those commands in the grading guidelines, you can use tmux. Below are scripts that uses tmux so you can easily start with the configuration in each section of the grading guidelines.
tmux-pa4-A.txt
tmux-pa4-B.txt
tmux-pa4-C.txt
tmux-pa4-D.txt
Plesae note that they are provided for your convenience (i.e., to save typing) and it may not be exactly the same as the grading guidelines.
Before you run these tmux scripts, please read the comment section in the script regarding how to set it up and run it. Please see PA2 FAQ regarding tmux installation instructions.
Miscellaneous Requirements and Hints
• Please take this spec and the grading guidelines very seriously. Please understand that graders must follow the grading guidelines when grading and they are not permitted to use a different grading procedure to grade your submission (due to our fairness policy). Please understand that you won’t get credit for simply coding, i.e., we cannot give you any partial credit for your “effort”. We can also give you credit for “results”. It is also imperative that you test your code against the grading guidelines on a standard 32-bit Ubuntu 16.04 system running inside VirtualBox. Even if your code runs perfectly on some other platform, we cannot take that into consideration for grading.
• Please read the programming FAQ if you need a refresher on C/C++ file I/O and bit/byte manipulications.
• You must not look at, copy, or share any code fragments from your classmates or previous semesters to implement this assignment.
• When you perform numerical calculation, please make sure that you never use or print any value that’s “NaN” (not-a-number). If you see a “NaN” on the screen at any time, you have a bug in your code and you must get it fixed!
• It’s important that every byte of your data is read and written correctly. You will lose a lot of points if one byte of data is generated incorrectly! The grading of this assignment will be harsh and you must make your code work according to the spec and the posted grading guidelines.
• If you are programming in C, I/O functions such as fgets(), scanf(), and printf() are really meant for inputing/outputing ASCII strings. Do not use them to input/output binary data!
• Start working on this early! This assignment is tedious and it would take quite a bit of time to get it done correctly.
Submission
All assignments are to be submitted electronically (including the required “README” file). To submit your work, you must first tar all the files you want to submit into a “tarball” and gzip it to create a gzipped tarfile named “pa4.tar.gz”. Then you upload “pa4.tar.gz” to the Bistro system. The command you can use to create a gzipped tarfile is:
tar cvzf pa4.tar.gz MYLISTOFFILES
ls -l pa4.tar.gz
Where MYLISTOFFILES is a list of file names that you are submitting (you can also use wildcard characters if you are sure that it will pick up only the right files). DO NOT submit your compiled code, just your source code and README file. Two point will be deducted for each binary file included in your submission (e.g., pa4, .o, .gch, core, etc.). The last command shows you how big the created “pa4.tar.gz” file is. If “pa4.tar.gz” is larger than 1MB in size, the submission server will not accept it.
Please note that the 2nd commandline argument of the tar command above is the output filename of the tar command. So, if you omit pa4.tar.gz above, you may accidentally replace one of your files with the output of the tar command and there is no way to recover the lost file (unless you have made a backup copy). So, please make sure that the first commandline argument is cvzf and the 2nd commandline argument is pa4.tar.gz.
A pa4-README.txt template file is provided here. You must save it as your “pa4-README.txt” file and follow the instructions in it to fill it out with appropriate information and include it in your submission. Please make sure that you satisfy all the README requirements.
Here is an example commands for creating your pa4.tar.gz file:
tar cvzf pa4.tar.gz Makefile *.c* *.h pa4-README.txt
If you use an IDE, the IDE may put your source code in subdirectories. In that case, you need to modify the commands above so that you include ALL the necessary source files and subdirectories (and don’t include any binary files) and make sure you have not forgotten to include a necessary file in order for the grader to compile your code on a “clean” system using just your Makefile (please remember that the grader is not permitted to use an IDE to compile or run your code).
You should read the output of the above commands carefully to make sure that “pa4.tar.gz” is created properly. If you don’t understand the output of the above commands, you need to learn how to read it! It’s your responsibility to ensure that “pa4.tar.gz” is created properly.
To check the body of “pa4.tar.gz”, you can use the following command:
tar tvf pa4.tar.gz
Please read the output of the above command carefully to see what files were included in “pa4.tar.gz” and what are their file sizes and make sure that they make sense.
Please enter your USC e-mail address and your submission PIN below. Then click on the Browse button and locate and select your submission file (i.e., “pa4.tar.gz”). Then click on the Upload button to submit your “pa4.tar.gz”. (Be careful what you click! Do NOT submit the wrong file!) If you see an error message, please read the dialogbox carefully and fix what needs to be fixed and repeat the procedure. If you don’t know your submission PIN, please visit this web site to have your PIN e-mailed to your USC e-mail address.
USC E-mail:
@usc.edu
Submission PIN:
Event ID (read-only):
merlot.usc.edu_80_1557931083_34
Submission File Full Path:
If the command is executed successfully and if everything checks out, a ticket will be issued to you to let you know “what” and “when” your submission made it to the Bistro server. The next web page you see would display such a ticket and the ticket should look like the sample shown in the submission web page (of course, the actual text would be different, but the format should be similar). Also, an e-mail (showing the ticket) will be sent to your USC e-mail address. Please read the ticket carefully to know exactly “what” and “when” your submission made it to the Bistro server. If there are problems, please contact the instructor.
It is extreme important that you also verify your submission after you have submitted “pa4.tar.gz” electronically to make sure that every you have submitted is everything you wanted us to grade. If you don’t verify your submission and you ended up submit the wrong files, please understand that due to our fairness policy, there’s absolutely nothing we can do.
Finally, please be familiar with the Electronic Submission Guidelines and information on the bsubmit web page.
[Last updated Fri Apr 24 2020] [Please see copyright regarding copying.]
[ Home | Description | Lectures | Labs | Programming | Piazza ]