Updates
Feb 4: Added detail about the max depth. Specifying that the depth argument is “non-inclusive”: if the max-depth is N, then it prints the tree to depth N, not including the nodes at depth N.
Introduction
The goal of this assignment is to practice system calls and dynamic memory allocation. You’ll be writing a program that explores the /proc virtual file system to generate (and display) a tree of the processes (the executing programs) on the system. To do so, you will need to dynamically allocate and manipulate nodes in a tree structure. The end result will be output that describes the children of a given process, much like the pstree command does:
$ pstree -p 1123 sshd(1123)---sshd(25914)---sshd(25987)---bash(25988)---pstree(1243) |-sshd(27395)---sshd(27468)---bash(27469) |-sshd(29462)---sshd(29498)---bash(29531) |-sshd(29499)---sshd(29529)---sftp-server(29530) |-sshd(29874)---sshd(29912)---bash(29913) |-sshd(30932)---sshd(30962)---sftp-server(30963) | |-sftp-server(30964) | |-sftp-server(30965) |-sshd(31511)---sshd(31547)---bash(31548)
Please remember that we are using testing scripts to evaluate your work. As a result, it’s very important that (a) all of your files are named correctly and located in the specified locations in your repository and (b) your output matches the expected output precisely.
The assignment is initially due at 10:00 p.m. on Wednesday, February 14. We will test your code on a linux lab machine immediately after the deadline and will send you the results of our tests. Second submissions are due at 10:00 p.m. on Friday, February 16, immediately before reading week.
Getting Started
Pull your git repository. There should be a new folder named a2. All of your code for this assignment should be located in this folder and includes a Makefile; print_ptree.c, which contains the main function and will need some additional argument handling code; test_print.c, which contains a main function for testing a single function; ptree.h, which contains function prototypes and a struct for a tree node; and ptree.c, which contains the signatures for the two functions to be written. There is also a directory for testing with simulated data.
Like Assignment 1, I recommend that you start by doing a walkthrough of the Background section and then, as you explore, create tests that you can use to verify your program. You may want to create a few proc-like directory trees that you can create for testing. Then, you can write a bash script that executes your code on the tests so that you can check the output.
Testing
Your Makefile contains an option to help you with testing. Open it with a text editor. You should see two lines that define FLAGS, one of which is commented out:
FLAGS = -Wall -Werror -std=c99 # FLAGS = -Wall -Werror -g -std=c99 -DTEST
If you wish to run tests using your tests/ directory, rather than the actual /proc directory for the system, comment out the first line and uncomment the second. Then, when you compile your code, your program will look for process information in the tests/ directory rather than in /proc. You can add additional examples to your tests/ directory. We have included the processes necessary to reproduce the example at the top of this handout.
Important Note: The tests in the tests/ directory are likely to have small differences from the entries in /proc. For example, the exe file in our tests is a regular file, rather than a link, and none of the cmdline files include anything other than just the executable name. You should also test on a linux machine with /proc, but it’s useful to have a clean, replicable example to test on (and if you are on a non-linux machin, you don’t have a proc/ at all) which is why tests/ will be useful.
While creating new tests — or while looking for real processes in /proc to test with — you’ll find the pscommand useful. In particular, if you type “ps -u your_username“, you will get a listing of the processes running on the system that are owned by you. You can open up a second terminal and run commands or open a web browser to get additional processes to test.
Background
A process is an executing program. Every time that you run a program on the command line, you are creating a process. The processes on a Unix system are naturally organized in a tree structure, since each process has a parent. When you execute a progam in a shell (on the command line), the shell process is the parent. The shell process also has a parent (often the login that created the shell). Each process knows the identify of its parent as well as some information about itself.
The operating system must track processes to make sure that they are getting the resources they need. For many flavors of linux, the information about processes is stored in the directory /proc. (Mac users: note that OS X does not have a /proc directory since it’s based on bsd, which uses a different interface.) /proc is very special in that it is a virtual filesystem. It doesn’t contain “real” files; rather, it contains runtime system information that can be accessed as if it were stored in files. Many system utilities are simply programs that access data from the “files” in this directory. Your task is also to access the data in /proc to get information about the processes running on the system.
/proc Structure
Every directory in /proc that we care about is a number. (There are quite a few non-numeric directories. Those refer to mounted devices, memory information, and other system data. We’ll ignore them for this assignment.) That number is a process ID (PID) — a unique identifier for a process on the system. Every process has a PID, and every process running on the system has an entry in the /proc directory. In this document, we’ll refer to the entry for a single PID as the directory /proc/PID.
You will need to access three files inside of /proc/PID: exe, cmdline, and children. /proc/PID/exe is the file to check to make sure the entry is for a running program. If it exists, then PID is a valid running program. /proc/PID/cmdline contains information about how the program was called. In particular, you’ll need it to get the name of the executable. Finally, /proc/PID/task/PID/children contains the PIDs of child processes. The PIDs are space (or NULL) separated.
In addition, you may find the file /proc/PID/status to be useful for verification purposes. It contains the parent process’s ID (PPID).
Take a few minutes to log onto a linux machine and browse through its /proc. Use cat and grep to view and search through these files to make sure you know how they are formatted, so you can begin to think about extracting information from them. You’ll likely find that you don’t have access to the contents of one file; you can check if that link exists but cannot open the file it refers to. (Why?) You’ll find that the other files are sometimes empty. (What does that suggest?)
Useful C Functions
To access the data in /proc, you’ll need to use a number of system calls that access the filesystem. A filesystem is an operating system module that encapsulates the data structures used to track files on a disk and the functions used to manipulate those data structures. For this course, we don’t need to know much about those structures to use files (but just wait until CSC369!). Instead, we simply need to understand the system calls that the operating system exposes for users to access the file system and the macros that C provides to make using those system calls easier.
Here is a set of system and library calls that we think you’ll need for this assignment. You should look them up, so you know what they provide for you. From the filesystem, we expect you will need fopen, fread, and fclose to access and read data from files. You will need lstat to determine that a file exists without opening it. You will also need the library call malloc to dynamically create nodes for the tree structure you will be creating to track process information. You may find the call sprintf helpful for creating filenames and various functions from the string library, like strchr and strtok, useful for parsing the contents of files.
You may not use the system function or functions like it that allow you to execute commands on the shell. The intent of the assignment is to have you working with system calls — not to use shell programs that will invoke those system calls for you.
Requirements
You must write two functions, generate_ptree and print_ptree, that are located in ptree.c. You must also write code in print_ptree.c to handle the expected command line arguments and to emit appropriate errors.
print_ptree.c
Your program should emit an appropriate error message (which we are not specifying) to stderr and return 1if an error is detected in the command line arguments used. The arguments may be incorrect if an insufficient or too many arguments are provided or if an option is provided that is not supported. (The only option we will support is “-d”.) Your program should return 0 if it is able to run without error. If there is some sort of error that is not a command line error, return 2 (for example, if a library call or generate_ptree were to fail).
The expected command line usage is already provided in the starter code. The program takes one optional flag, “-d”, which requires an argument. This sets the maximum depth of the process ID tree to be displayed. If the depth is not provided, it is assumed to be 0 (which means “print the entire tree”). The program requires one argument, a process ID, which must be an integer.
While it’s not required, it’s strongly recommended that you use getopt to parse the command line.
print_ptree
You should complete the print_ptree function before the other required function, since you can test it by manually creating small trees for it to print. (See the starter code file test_print.c for an example of how to test the print function separately.) Then, you can test the other function, generate_ptree, by printing the trees it generates.
This function should be recursive. It accepts the root of a tree, as described in the section about generate_ptreeand a depth argument. The function prints (to stdout) the tree to the desired depth, not including nodes at the specified depth. (i.e., Nodes from depth 0 to the specified depth are printed, not including the latter endpoint.) If the depth argument is 0, then the entire tree should be printed.
The function will print one process per line. The root of the tree will be printed flush to the left. Each child process will be indented two spaces to the right with respect to its parent. Child processes should be displayed in the order in which they occur in the list of children. Each line will contain the process’s ID (its PID), a colon and a space, and the name of the executable. If the executable is not named, then the line will just contain the PID (no colon, no space, no executable name).
generate_ptree
This function may need a recursive helper function. It takes a PID as an argument and creates a tree that describes that process and all of its children. The root of the tree is “returned” by assigning it to the parameter root. The entire tree, including the name strings in each tree node, should be dynamically allocated using malloc, and it must be built as described below.
Each node in tree is a struct TreeNode. It contains the PID of the process it is describing, as well as the name of the executable associated with that PID. If there is no executable name (from /proc/pid/cmdline), then the name pointer should be NULL. The node also contains pointers to build a linked list. This linked list is used to store the children of this node. Why do we use a list for the child nodes?
In CSC148, you probably built binary trees: trees with two child nodes. In those cases, it makes sense to create links to just those two nodes (often named “left” and “right”). However, in this case, each node in your tree can have an arbitrary number of children: it’s not binary. Therefore, we have to use a different method for storing children: a list. If the child_procs pointer has value NULL, then the node has no children. If child_procs is non-null, then the first element of the list is the first (left-most, if you prefer) child. The node it points to, using the next_sibling pointer, is the second child. If the next_sibling pointer is NULL, then the node is the last child of its parent. The children should appear in the list in the same order as they were observed in the children file.
In the example we’ve been using throughout this handout, the node associated with PID 1123 is the root of our tree. Its next_sibling pointer is NULL. Its child_procs pointer refers to the node for PID 25914. The node for PID 25914 has a child_procs pointer that refers to the node for PID 25987. PID 25914’s node’s next_siblingpointer refers to the node for PID 27395. This handout provides a second example relating a process structure to a tree.
generate_ptree should return 0 if the tree is built successfully and 1 otherwise. In particular, it should return 1 if any of the system or library calls fail. Your function will also return 1 if /proc/PID doesn’t exist for the specified PID or if /proc/PID/exe doesn’t exist, indicating that the process was not executing. In both of these cases, no node should be created for the PID. If your function returns a 1, then the root parameter should still contain as much of the tree as could be built. The tree should be as valid as possible: it should contain no dangling or uninitialized pointers and should contain a node for every PID that could be successfully processed.
Note: You will need to create paths to various files in order to view them. The starter code contains an example that uses sprintf to generate paths. The example also relies on a constant, MAX_PATH_LENGTH, which contains an upper bound on the length of paths we will test with. In general, defining such an upper bound is tricky, but we guarantee that our tests will not exceed this bound, so you can safely use it to define arrays for storing paths.
Submission
Please commit to your repository often, so that you don’t lose your work. We will mark the last submission to your repository that occurs before the deadline. We will look at print_ptree.c and ptree.c, so those files must be pushed for us to successfully test your code. You may want to clone your repository a second time to verify that these files have been successfully committed and pushed.
You may add test files to your repository, if you wish. When we test, we will be copying your .c and .h files to a testing folder. We will compile them using our original Makefile and then run a series of tests using the executable. We may also compile your helper functions (in ptree.c) with our own main programs to test them separately, so it’s important that they have the required signature.
Remember to also test your code on a lab machine before your final submission. Your program must compile cleanly (without any warning or error messages) on the lab machines using our original Makefile and may not crash (seg fault, bus error, abort trap, etc.) on our tests. Programs that do not compile or which crash will be assigned a 0 and will need to be resubmitted.