README Logistics
Deadlines Part 1: Tuesday, May 1st 11:59 PM Part 2: Wednesday, May 9th 11:59 PM (Last day of classes)
WARNING: Even though Part 1 is due Tuesday, May 1, you will likely not finish Part 2 unless you start before before May 1st.
Project Overview
Treedisk may be unlike any filesystem you have learned about in class, so we have provided slides on the course webpage (found in the Schedule section, but a quick link to the slides is this: http://www.cs.cornell.edu/courses/cs4410/2018sp/schedule/slides/block-stores.pdf). We HIGHLY RECOMMEND that you look at these slides ASAP. These slides go over the details of treedisk, layered filesystems, and A4.
This directory contains an implementation of a layered block store. A block store is much like a file, except that the unit of access is a block, not a byte. Each block store has the following simple interface:
#include "block_store.h" Defines BLOCK_SIZE, the size of a block. Also has typedefs for:
block_store_t: a block store interface block_t: a block of size BLOCK_SIZE block_no: an offset into a block store
int nblocks = (*block_store->nblocks)(block_store_t *block_store); Returns the size of the block store in #blocks, or -1 if error.
int (*block_store->read)(block_store, block_no offset, OUT block_t *block); Reads the block at the given offset. Return 0 upon success, -1 upon error.
int (*block_store->write)(block_store, block_no offset, IN block_t *block); Writes the block at the given offset. Some block stores support automatically growing. Returns -1 upon error.
int (*block_store->setsize)(block_store, block_no size); Set the size of the block store to 'size' blocks. May either truncate or grow the underlying block store. Not all sizes may be supported. Returns the old size, or -1 upon error.
void (*block_store->destroy)(block_store);
Clean up the block store. In case of a low-level storage (an actual disk), will typically leave the blocks intact so it can be reloaded again.
To create a block store, you need the block stores init function. The simplest two block stores are the following:
block_store_t *disk_init(char *file_name, block_no nblocks); Implements a block store with 'nblocks' blocks in the POSIX file 'file_name'. The file simply stores the list of blocks.
block_store_t *ramdisk_init(block_t *blocks, block_no nblocks); Implements a block store with 'nblocks' blocks in the provided memory, pointed to by 'blocks'.
For example, if you want caching, you can invoke (once you implement cachedisk):
block_store_t *cachedisk_init(block_store_t *below, block_t *blocks, block_no nblocks);
This adds a caching layer to ‘below’ with ‘nblocks’ of cache, pointed to by ‘blocks’, but they use different algorithms. So if you run:
block_store_t *lower = disk_init("file", 100); block_t cache[10]; block_store_t *higher = cachedisk_init(lower, cache, 10);
then ‘higher’ is a block store that stores its blocks in “file” and caches its blocks in the given (write-through) cache. One can still read and write ‘lower’, by-passing the cache (but particularly writing would be dangerous—the cache would not reflect the latest content). If you like, you can dump caching statistics:
void cachedisk_dump_stats(block_store_t *this_bs);
There’s a disk layer that does nothing but count operations:
block_store_t *higher = statdisk_init(lower);
You can dump the stats using:
void statdisk_dump_stats(block_store_t *this_bs);
A handy debugging tool is:
block_store_t *debugdisk_init(block_store_t *below, char *descr); This prints any invocation to and result from nblocks(), read(), and write() along with the string 'descr'.
Another useful layer is the “check disk” that checks if what’s read is what has been written:
block_store_t *checkdisk_init(block_store_t *below, char *descr);
One can also virtualize the underlying block store, creating multiple virtual block stores on a single underlying block store. Currently, there is one such module available:
void treedisk_create(block_store_t *below, unsigned int n_inodes) Create a file system on the block store 'below'. The number of inodes (virtual block stores) is n_inodes. Each such virtual block store is initially empty (0 blocks), but grows dynamically as blocks are
written.
int treedisk_check(block_store_t *below) Checks the integrity of a tree virtual block store. Returns 0 if the block store is broken, and 1 if it's in good shape. Useful for testing.
block_store_t *treedisk_init(block_store_t *below, unsigned int inode_no) Return a block store interface to the virtual block store identified by the inode number.
For example:
block_t cache[10]; block_store_t *disk = disk_init("filesystem", 100); block_store_t *cdisk = cachedisk_init(disk, cache, 10); treedisk_create(cdisk, 2); block_store_t *file0 = treedisk_init(cdisk, 0); block_store_t *file1 = treedisk_init(cdisk, 1);
This creates two virtual block stores file0 (associated with inode 0) and file1 (associated with inode 1) on top of a single cached block store stored in “filesystem”.
A “trace disk” is a top-level block store (does not support layers on top of it) that generates a load on the underlying layers. It is supposed to run over a treedisk-virtualized block store. The load is read from a file that is given as the first argument. The file contains entries of the following form:
command:inode-number:block-number
There are 4 supported commands:
R: read a block of the given inode W: write a block of the given inode S: set the size of the given inode N: check the size of the given inode
To use a tracedisk, run
block_store_t *tracedisk_init(block_store_t *below, char *trace, unsigned int n_inodes);
'trace' specifies the name of the trace file 'n_inodes' specifies the number of inodes in the underlying file
A tracedisk automatically adds treedisk layers as needed for each inode, and also adds a checkdisk layer for each to check that the content read is the same as the last content written.
TODO
Your project to-do list consists of the following:
1) read README to learn about the available layers. stay up to date with the a4 FAQ on piazza.
2) run “make” in order to compile the sources. This should generate an executable called “trace” that can be used for testing your software. The syntax of trace is as follows:
./trace [trace-file [cache-size]]
The default trace-file is “trace.txt”, and we have included an example. The optional cache-size lets you set the size of the cache.
3) run “./trace”. The output will likely look like this:
blocksize: 512 refs/block: 128 !!TDERR: setsize not yet supported !!ERROR: tracedisk_run: setsize(1, 0) failed !!CHKSIZE 10: nblocks 1: 0 != 2 !$STAT: #nnblocks: 0 !$STAT: #nsetsize: 0 !$STAT: #nread: 32 !$STAT: #nwrite: 20
4) look in treedisk.c and look for the function treedisk_setsize(). It is currently incomplete and causes the error that you saw in the previous step. Implement the missing code. You only need to support setting the size to 0. If you run ./trace again, the output should look more like:
blocksize: 512 refs/block: 128 !$STAT: #nnblocks: 0 !$STAT: #nsetsize: 0 !$STAT: #nread: 36 !$STAT: #nwrite: 24
5) look in cachedisk.c. It is supposed to cache blocks, but currently it simply forwards read and write operations to the layer below. You are to implement a caching algorithm. You can use any algorithm that appeared in the literature (LRU, LFU, MRU, …) or design your own. Note that it is supposed to be a “write-through” cache: the write operation cannot buffer writes until a later time.
6) we are going to run a contest! You are to provide your own version of the file “trace.txt”. We will run everybody’s trace against everybody’s cache layer. In particular, we will run ./trace with your treedisk and caching layer (configured with 16 blocks) against everybody’s trace, and count up the total number of read operations that your caching layer performs. We will rank contestants by this total.
Note that you will get disqualified if your treedisk layer or caching layer does not work. You will also get disqualified if you cache any user data in a place other than the memory that is passed to the caching layer, or if you change the layout of the treedisk file system. The outcome of this contest does not count toward your Project grade—however, there will be prizes.
Your trace.txt file has the following limitations:
it can have at most 10,000 lines/commands
inode numbers must range between 0 and 127 inclusive
block numbers must range between 0 and (1<<27)-1 inclusive To win, you can use the following strategies or a combination:
- defensive approach: modify treedisk and/or cachedisk so as to minimize the number of read operations below.
- offensive approach: create a trace.txt that makes it very hard for the underlying layers (except yours of course) to cache anything.
Things to submit
Submit the following files on CMS, but be sure you use your git repository wisely so as to protect your self against any unexpected life or scholastic events.
Part 1
1. trace.txt — A valid trace file. Just overwrite the original trace.txt 2. alias.txt — the name by which you shall be known for the contest
Part 2
1. treedisk.c 2. cachedisk.c 3. trace.txt
4. alias.txt