Computer Science 3357 Fall 2015 – Assignment 1

Assignment 1, Part 1 – Client (35%)

Overview

In the first part of this assignment, you’ll lay down the foundation for what will become the Hooli Drive client — that is, the program that will run on a user’s computer to sync all of his/her files with the Hooli Drive server.

Because we need a little more background knowledge before we get into network programming, you won’t be doing any network programming yet. Instead, you’ll be writing the non-networking code that you’ll need when it comes time to communicate over the network.

This is good practice in any event. Socket (network) programming adds a layer of complexity to a program. Where possible, it is ideal to isolate your networking code from the rest of the program. This way, you can develop and test the two largely independently of one another.

Hence, in this assignment, you’ll write a relatively simple program that recursively scans a directory tree, computing CRC-32 checksums for each file found in the tree. Your program will then dump a list of the files and their corresponding checksums to a comma-separated file. Eventually, this information will be sent over the network to the Hooli Drive server.

Prerequisites

  • Ensure that you have set up the course VM and cloned the course starter repository. See the Lab Manual for more details.
  • Read the “Overview of Fall 2015 Assignments” document to familiarize yourself with what we’re building this semester.

Task

Notice the directory client in the root of your repository. In this directory, create a file client.c that performs the following tasks:

  • Accepts two command-line arguments: a directory and an output filename.
  • Scans the specified directory recursively, computing a CRC-32 checksum for each file in the directory or any of its subdirectories.
  • Writes a comma-separated list of files and their corresponding CRC-32 checksums to the file specified on the command line.
  • The filenames must be relative to the specified directory (see the sample output below).
  • If any arguments are missing or invalid, prints an appropriate error message to stderr and exits with code EXIT_FAILURE (see stdlib.h).
  • If no errors occur, your program should print nothing and exit with code EXIT_SUCCESS.
  • Your program may ignore symbolic links — it need only consider regular files.
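One way to handle the argument requirements above is a small validation function that main calls before doing anything else. The sketch below is hypothetical (check_args is not a required name). It only checks that exactly two arguments were given and that the first names an existing directory; a bad output filename is easiest to detect later, when fopen() on it fails.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/*
 * Validate the arguments of `client <directory> <output-file>`.
 * Returns 0 if they are usable, or -1 after printing an error to stderr.
 */
int check_args(int argc, char *argv[])
{
    struct stat st;   /* metadata for the directory argument */

    /* Exactly two arguments are expected after the program name. */
    if (argc != 3) {
        fprintf(stderr, "Usage: %s <directory> <output-file>\n",
                argc > 0 ? argv[0] : "client");
        return -1;
    }

    /* The first argument must name an existing directory. */
    if (stat(argv[1], &st) != 0 || !S_ISDIR(st.st_mode)) {
        fprintf(stderr, "%s: '%s' is not a directory\n", argv[0], argv[1]);
        return -1;
    }

    return 0;
}
```

main could then start with `if (check_args(argc, argv) != 0) return EXIT_FAILURE;`, which satisfies the "print to stderr and exit with EXIT_FAILURE" rule in one place.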

Sample output

Assume we have a directory tree /home/vagrant/hooli with the following files/structure:

$ tree ~/hooli

/home/vagrant/hooli
|-- code
|   |-- bin
|   |   `-- hello.o
|   `-- hello.c
|-- documents
|   `-- resume.pdf
`-- tax-returns
    |-- 2013-tax-return.pdf
    `-- 2014-tax-return.pdf

Executing the program on ~/hooli:

$ ./client ~/hooli files.txt
$ cat files.txt
code/bin/hello.o,12345678
code/hello.c,87654321
documents/resume.pdf,11111111
tax-returns/2013-tax-return.pdf,22222222
tax-returns/2014-tax-return.pdf,33333333

Observe that the files listed in the text file are relative to ~/hooli — they do not include the prefix /home/vagrant/hooli.

Note: the order of the files in the text file is not important, as long as every file is listed with its correct checksum.
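Producing relative names like those above mostly comes down to carrying a relative prefix through the recursion and joining it with each entry name. A minimal sketch with hypothetical helper names follows. The sample output does not pin down how checksums are printed, so the sketch ASSUMES eight lowercase hex digits; confirm the expected format (for example against support/crc32.rb) before relying on it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Join a relative directory prefix and an entry name into `out`.
 * An empty prefix means the entry sits directly in the scanned directory:
 *   ("", "documents")  -> "documents"
 *   ("code", "bin")    -> "code/bin"
 * Returns 0 on success, or -1 if the result does not fit in `size` bytes.
 */
int join_relative(char *out, size_t size, const char *prefix, const char *name)
{
    int n;   /* length snprintf() needed, excluding the terminator */

    if (prefix[0] == '\0')
        n = snprintf(out, size, "%s", name);
    else
        n = snprintf(out, size, "%s/%s", prefix, name);

    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/*
 * Write one "relative/path,checksum" line to `out`.
 * ASSUMPTION: the checksum is printed as 8 lowercase hex digits; confirm
 * the required format before relying on this.
 */
int write_entry(FILE *out, const char *relpath, unsigned long crc)
{
    return fprintf(out, "%s,%08lx\n", relpath, crc) < 0 ? -1 : 0;
}
```

Starting the recursion with an empty prefix is what keeps the /home/vagrant/hooli part out of the output: the absolute root is only ever used to open files, never written to the text file.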

Important notes

  • You MUST create a file Makefile in the client directory that is capable of building your code simply by typing make.
    • See the hdb directory for an example.
  • The executable generated MUST be named client.
  • You MUST use the following compilation flags: -Wall and -Werror. -Wall turns on most common warnings, and -Werror treats all warnings as errors. Repeat this mantra: WARNINGS ARE ERRORS. WARNINGS ARE ERRORS. WARNINGS ARE ERRORS.
    • Yes, your favourite open-source software probably emits a slew of warnings during compilation.
    • When you become a prolific open-source developer managing a high-profile project, you can allow all the compiler warnings you like.
    • Until then, we code to a higher standard. 🙂
  • You MAY use other compilation flags, but you MUST NOT disable any warnings or errors.
  • You MAY use the -std=gnu11 option to enable C11 features and GNU extensions. It’s not 1980 anymore. Let’s join the year 2015.
  • You SHOULD use the -g option to compile your program with debugging symbols. This allows you to use GDB to debug your code, and allows valgrind (more on this later) to print the line numbers from which memory errors originated.
  • You SHOULD compile your code with -O0 during debugging to ensure that the compiler does not optimize out any lines, leading to confusion when running your code through GDB.

For example, you might invoke gcc as follows:

gcc -Wall -Werror -g -O0 -std=gnu11 -o client client.c -lz

  • You MUST comment your code (I know, I know), including header comments at the top of each file, comments at the top of each function, and inline comments. Each variable declaration should also be commented, briefly describing its purpose.
  • Your final submission MUST NOT include any .o files, any executables, test data, or other binary files. Keep your repository clean (this is worth marks). See the discussion of the .gitignore file in the Git lab. Use this file to your advantage.

Tips

  • To scan the directory tree, look up the functions opendir, readdir, and closedir.
  • Use the zlib library to compute CRC-32 checksums. Do not reinvent the wheel.
    • Install zlib and its header files in your VM by typing sudo apt-get install zlib1g-dev
    • See the crc32 function in the zlib manual: http://www.zlib.net/manual.html#Checksum
    • You’ll need to #include <zlib.h> in your program.
    • You’ll need to add -lz at the end of your gcc command to link with the zlib library.
  • Check your executable with valgrind, a memory-leak detector (among other things):
    • Install it: sudo apt-get install valgrind
    • Test your code: valgrind --leak-check=full --show-leak-kinds=all ./client ~/hooli files.txt
    • Be sure to fix any memory leaks found, as they will lose you marks.
    • #1 cause of memory leaks: forgetting to free memory that was malloc-ed.
    • valgrind will tell you exactly which line number allocated the memory that was not freed.
  • Modularize your code, where possible
    • client.c is where the main function should be found, but should not necessarily be the only file. Use additional .c and .h files, when appropriate.
    • Putting all your code in main or in 1 or 2 functions will earn you a low grade.
    • Your functions should be short and tight. Large functions performing all sorts of tasks will lose you marks.
    • In general, each function should perform a single task. Refactor, refactor, refactor. Refactor until you have a collection of short, cohesive functions.
    • Finding that your functions aren’t cohesive (i.e. they don’t really relate to each other)? Ask yourself if some of them should be grouped in a different file.
  • See the support directory:
    • There is a file crc32.rb in this directory that you can use to compute a CRC-32 checksum on a file, to verify that your program is calculating checksums correctly.
      • You may NOT simply shell out to this program (e.g. with system()) from your code; it is only there to help you validate the correctness of your output.
      • Run the Ruby program as follows: ruby crc32.rb /path/to/some/file
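Putting the opendir/readdir/closedir tip into practice, a recursive walk might look like the sketch below. It is one possible design, not the required one: walk_tree, file_visitor, and print_relative are hypothetical names, and the visitor callback is where a real client would compute the checksum and write the output line. It uses lstat() rather than stat() so that symbolic links are seen as links and skipped, as the task permits, instead of being followed.

```c
#define _POSIX_C_SOURCE 200809L
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Callback invoked once per regular file: `full` is a path usable with
   fopen(), `rel` is the path relative to the scanned root. */
typedef int (*file_visitor)(const char *full, const char *rel, void *ctx);

/*
 * Recursively scan the directory root/rel (rel is "" at the top level),
 * calling visit() for every regular file. Symbolic links and other
 * non-regular entries are skipped. Returns 0, or -1 on the first error.
 */
int walk_tree(const char *root, const char *rel, file_visitor visit, void *ctx)
{
    char dir_path[PATH_MAX];   /* path of the directory being read */
    DIR *dir;                  /* open directory stream */
    struct dirent *entry;      /* current directory entry */
    int status = 0;            /* first error seen, or 0 */
    int n;                     /* length reported by snprintf() */

    n = rel[0] ? snprintf(dir_path, sizeof dir_path, "%s/%s", root, rel)
               : snprintf(dir_path, sizeof dir_path, "%s", root);
    if (n < 0 || (size_t) n >= sizeof dir_path)
        return -1;
    if ((dir = opendir(dir_path)) == NULL)
        return -1;

    while (status == 0 && (entry = readdir(dir)) != NULL) {
        char full[PATH_MAX];    /* path of this entry, for lstat()/fopen() */
        char child[PATH_MAX];   /* path of this entry relative to root */
        struct stat st;         /* entry metadata (links are not followed) */
        int nf, nc;             /* lengths reported by snprintf() */

        /* "." and ".." would make the recursion loop forever. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        nf = snprintf(full, sizeof full, "%s/%s", dir_path, entry->d_name);
        nc = rel[0] ? snprintf(child, sizeof child, "%s/%s", rel, entry->d_name)
                    : snprintf(child, sizeof child, "%s", entry->d_name);

        if (nf < 0 || (size_t) nf >= sizeof full || nc < 0 ||
            (size_t) nc >= sizeof child || lstat(full, &st) != 0)
            status = -1;
        else if (S_ISDIR(st.st_mode))
            status = walk_tree(root, child, visit, ctx);   /* recurse */
        else if (S_ISREG(st.st_mode))
            status = visit(full, child, ctx);
        /* Anything else (symbolic links, sockets, ...) is ignored. */
    }

    closedir(dir);   /* always release the stream, even after an error */
    return status;
}

/* Example visitor: writes each relative path on its own line to the
   FILE * passed as ctx. A real client would checksum `full` here. */
int print_relative(const char *full, const char *rel, void *ctx)
{
    (void) full;
    return fprintf((FILE *) ctx, "%s\n", rel) < 0 ? -1 : 0;
}
```

Passing a callback keeps the directory-walking code independent of what is done with each file, which fits the advice above about short, single-purpose functions in separate files.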

Assignment 1, Part 2 – Server (65%)

Overview

In the second part of this assignment, you’ll lay down the foundation for what will become the Hooli Drive server — the program to which Hooli Drive clients will upload files.

As noted in the “Overview of Fall 2015 Assignments”, the server will need to store information about each uploaded file in some sort of data store. At a minimum, for each file, we need to store:

  • The username of the user that uploaded the file
  • The filename
  • The CRC-32 checksum of the file

In this part of the assignment, you’ll write a static library called libhdb (hdb = Hooli Database). This library will contain a collection of functions that your server program will use in Assignment 2 to store metadata received from the client.

This course has no prerequisite courses that teach SQL databases. Hence, we’ll forego the use of an SQL database, and instead use the super-cool new kid on the block, Redis. Redis bills itself as a data structure store, allowing you to store data in a variety of data structures, including lists, sets, hashes, bitmaps — even a crazy structure called a hyperloglog.

Because Redis is a server accessed over a network, our library will be acting as a client of the Redis server. Hence, when we incorporate our library in our Hooli Drive server, our server will be both a server (to Hooli Drive clients) as well as a client (of Redis).

Rather than doing low-level socket programming to access the Redis server, we’ll instead communicate with it using the libhiredis library, which provides a collection of functions for interacting with a Redis server.

Prerequisites

  • Install Redis server:

    vagrant@cs3357:~$ sudo apt-get install redis-server

  • Install libhiredis and its header files. This is the library you’ll use to interact with the Redis server:

    vagrant@cs3357:~$ sudo apt-get install libhiredis-dev

  • If you haven’t already, install valgrind for memory leak detection (discussed later):

    vagrant@cs3357:~$ sudo apt-get install valgrind

  • Finally, install the check library, the unit test library being used in the provided code:

    vagrant@cs3357:~$ sudo apt-get install check

Task

  • See the hdb directory in your repository. Notice that there are two files: hdb.c and hdb.h.
  • Your task is to implement the functions in hdb.c. Read over the comments in both files.
  • Confused? Don’t freak out — we will discuss this library in more detail in class on September 23.
  • First, read over the comments. Then, follow the steps in the Strategy section below.
  • Feel free to ask questions — don’t be shy.

The basic idea is this:

  • We have a number of users using the Hooli Drive service.
  • For each user, we have a set of files that he/she has uploaded.
  • For each file, we have a checksum associated with the file’s contents.

We therefore want to write a set of functions that allows us to store users, their files, and their associated checksums in the database, as well as retrieve, update, and delete that information.

Why are we implementing a library? There are a few reasons:

  1. One can envision that there might be multiple programs on the server-side of a service like Hooli Drive that might need to interact with the database. Implementing a library allows these programs to simply link with the library and use its functions, rather than re-inventing code to communicate with the database.
  2. For practice, to illustrate how we can package code in a library.
  3. For practical reasons: it’s easy for the course staff to link your library into our test code, run some tests on it, and get it marked quickly for you.

Strategy

The library code might seem intimidating at first, but it’s really not very difficult. My library turned out to be about 150 lines of code. You were writing 150-line programs when you were in diapers, so this should be a breeze!

  • Figure out Redis. It’s super easy. Check out the list of commands at http://redis.io/commands. Click on HSET. Notice that a description is given, followed by a return value, followed by examples. Also, notice the Related commands list on the right side. You’ll probably find this useful.
  • Click into the Examples field. Note that you can interact with a test Redis server directly from the Web page. Try out the command a few times.
  • Look up the following commands and play with them in the same way until you understand them:
    • HEXISTS
    • HGET
    • HSET
  • Play with HSET. Hint: we want to associate a checksum with a file, and a file with a user.
  • In your terminal, start up redis-cli. This is one way to interact with the Redis server installed in your VM. Play around with the commands some more until you’re comfortable with them.
  • Look at the list of functions you have to implement for libhdb. Try to figure out if any of the commands you’ve seen already might be useful for some of these functions.
  • Look at the commands list at http://redis.io/commands. Try to figure out the rest of the commands you might need for the remaining libhdb functions.
  • Implement your library code. See https://github.com/redis/hiredis/blob/master/examples/example.c for an example of how to interact with the Redis server from your code.
  • Be sure to create and use helper functions (this will be worth marks). For example, you might create a set of helper functions to execute Redis queries. One function might execute a query that returns a string, another that returns an integer, another that returns nothing, etc. Using helper functions, the code for my library was about 150 lines.
  • As you code up functions, write a test program to test them (don’t hand in your test code). In another window, start up redis-cli and check the keys stored in your Redis server as you run your test code and work with the server. This way, you’ll be able to examine what your program is storing. Useful commands might include:
    • KEYS
    • EXISTS
    • HEXISTS
    • HGET
  • I have provided a series of unit tests that you can run against your library, once you are finished coding it. Please do not get used to this — this will not be done for every assignment. If you want tests in future assignments, you should write them yourself! 🙂 Testing is an extremely important skill to learn.

    This should not be viewed as a comprehensive set of unit tests, and we may run additional tests on your library. However, the tests should give you an idea of whether or not your functions are working properly.

    To run the unit tests, run make test.

  • If you have failing tests, you’ll want to debug them with GDB. The unit test library in use runs each test in a separate process, which makes debugging with GDB difficult.

    Fortunately, you can tell it not to do so by running GDB as follows:

    CK_FORK=no gdb ./testlibhdb
    

    Suppose the unit test suite reported that the test test_correct_checksum_returned_for_file was failing. You would therefore launch GDB and set a breakpoint in this function as follows:

    vagrant@cs3357:~$ CK_FORK=no gdb ./testlibhdb
    ...
    ...
    (gdb) b test_correct_checksum_returned_for_file
    Breakpoint 1 at 0x401bc3: file testlibhdb.c, line 100.
    (gdb) r
    

    You could then use commands like n(ext), s(tep), p(rint), etc.

Memory Leaks

You MUST ensure that your program does not have any memory leaks. These will lose you a good chunk of marks.

How do memory leaks happen? From memory being allocated and not freed. Make sure you free any memory that you malloc. Make sure you free any Redis replies you obtain, using the appropriate function from libhiredis.

To check your program for memory leaks, you can use the tool valgrind. Normally, to test a program for memory leaks, we would run the following:

vagrant@cs3357:~$ valgrind --leak-check=full --show-leak-kinds=all ./programToCheck

Once again, however, since the unit test suite forks to run each test, you’ll need to use CK_FORK=no to tell it not to do this:

vagrant@cs3357:~$ make test  # builds ./testlibhdb
vagrant@cs3357:~$ CK_FORK=no valgrind --leak-check=full --show-leak-kinds=all ./testlibhdb

If valgrind reports any memory not freed, or any memory still reachable, you’ve got problems. It will tell you where in the source code the problems lie, so just run it, check out the problematic functions, and fix up the errors.