README

Logistics

DeadlinesPart 1: Tuesday, May 1st 11:59 PMPart 2: Wednesday, May 9th 11:59 PM (Last day of classes)

WARNING: Even though Part 1 is due Tuesday, May 1, you will likely not finish Part 2 unless you start before before May 1st.

Project Overview

Treedisk may be unlike any filesystem you have learned about in class, so we have provided slides on the course webpage (found in the Schedule section,but a quick link to the slides is this: http://www.cs.cornell.edu/courses/cs4410/2018sp/schedule/slides/block-stores.pdf).We HIGHLY RECOMMEND that you look at these slides ASAP.These slides go over the details of treedisk, layered filesystems, and A4.

This directory contains an implementation of a layered block store.A block store is much like a file, except that the unit of accessis a block, not a byte. Each block store has the following simpleinterface:

#include "block_store.h"
    Defines BLOCK_SIZE, the size of a block.  Also has typedefs
    for:
        block_store_t: a block store interface
        block_t:  a block of size BLOCK_SIZE
        block_no: an offset into a block store

int nblocks = (*block_store->nblocks)(block_store_t *block_store);
    Returns the size of the block store in #blocks, or -1 if error.

int (*block_store->read)(block_store, block_no offset, OUT block_t *block);
    Reads the block at the given offset.  Return 0 upon success,
    -1 upon error.

int (*block_store->write)(block_store, block_no offset, IN block_t *block);
    Writes the block at the given offset.  Some block stores support
    automatically growing.  Returns -1 upon error.

int (*block_store->setsize)(block_store, block_no size);
    Set the size of the block store to 'size' blocks.  May either
    truncate or grow the underlying block store.  Not all sizes
    may be supported.  Returns the old size, or -1 upon error.

void (*block_store->destroy)(block_store);
    Clean up the block store.  In case of a low-level storage (an
    actual disk), will typically leave the blocks intact so it can
    be reloaded again.

To create a block store, you need the block stores init function.The simplest two block stores are the following:

block_store_t *disk_init(char *file_name, block_no nblocks);
    Implements a block store with 'nblocks' blocks in the POSIX
    file 'file_name'.  The file simply stores the list of blocks.

block_store_t *ramdisk_init(block_t *blocks, block_no nblocks);
    Implements a block store with 'nblocks' blocks in the provided
    memory, pointed to by 'blocks'.

For example, if you want caching, you can invoke (once you implement cachedisk):

block_store_t *cachedisk_init(block_store_t *below, block_t *blocks, block_no nblocks);

This adds a caching layer to ‘below’ with ‘nblocks’ of cache, pointed to by ‘blocks’,but they use different algorithms. So if you run:

block_store_t *lower = disk_init("file", 100);
block_t cache[10];
block_store_t *higher = cachedisk_init(lower, cache, 10);

then ‘higher’ is a block store that stores its blocks in “file” andcaches its blocks in the given (write-through) cache. One can stillread and write ‘lower’, by-passing the cache (but particularly writingwould be dangerous—the cache would not reflect the latest content).If you like, you can dump caching statistics:

void cachedisk_dump_stats(block_store_t *this_bs);

There’s a disk layer that does nothing but count operations:

block_store_t *higher = statdisk_init(lower);

You can dump the stats using:

void statdisk_dump_stats(block_store_t *this_bs);

A handy debugging tool is:

block_store_t *debugdisk_init(block_store_t *below, char *descr);
    This prints any invocation to and result from nblocks(), read(),
    and write() along with the string 'descr'.

Another useful layer is the “check disk” that checks if what’s read is whathas been written:

block_store_t *checkdisk_init(block_store_t *below, char *descr);

One can also virtualize the underlying block store, creating multiplevirtual block stores on a single underlying block store. Currently, thereis one such module available:

void treedisk_create(block_store_t *below, unsigned int n_inodes)
    Create a file system on the block store 'below'.  The number of inodes 
    (virtual block stores) is n_inodes.  Each such virtual block store is 
    initially empty (0 blocks), but grows dynamically as blocks are written.

int treedisk_check(block_store_t *below)
    Checks the integrity of a tree virtual block store.  Returns
    0 if the block store is broken, and 1 if it's in good shape.
    Useful for testing.

block_store_t *treedisk_init(block_store_t *below, unsigned int inode_no)
    Return a block store interface to the virtual block store identified
    by the inode number.

For example:

block_t cache[10];
block_store_t *disk = disk_init("filesystem", 100);
block_store_t *cdisk = cachedisk_init(disk, cache, 10);
treedisk_create(cdisk, 2);
block_store_t *file0 = treedisk_init(cdisk, 0);
block_store_t *file1 = treedisk_init(cdisk, 1);

This creates two virtual block stores file0 (associated with inode 0)and file1 (associated with inode 1) on top of a single cached blockstore stored in “filesystem”.

A “trace disk” is a top-level block store (does not support layerson top of it) that generates a load on the underlying layers.It is supposed to run over a treedisk-virtualized block store.The load is read from a file that is given as the first argument.The file contains entries of the following form:

command:inode-number:block-number

There are 4 supported commands:

R: read a block of the given inode
W: write a block of the given inode
S: set the size of the given inode
N: check the size of the given inode

To use a tracedisk, run

block_store_t *tracedisk_init(block_store_t *below, char *trace, unsigned int n_inodes);
    'trace' specifies the name of the trace file
    'n_inodes' specifies the number of inodes in the underlying file

A tracedisk automatically adds treedisk layers as needed for each inode,and also adds a checkdisk layer for each to check that the content readis the same as the last content written.

TODO

Your project to-do list consists of the following:

1) read README to learn about the available layers. stay up to date with the a4 FAQ on piazza.

2) run “make” in order to compile the sources. This should generate an executable called “trace” that can be used for testing your software. The syntax of trace is as follows:

./trace [trace-file [cache-size]]

The default trace-file is “trace.txt”, and we have included an example. The optional cache-size lets you set the size of the cache.

3) run “./trace”. The output will likely look like this:

blocksize:  512
refs/block: 128
!!TDERR: setsize not yet supported
!!ERROR: tracedisk_run: setsize(1, 0) failed
!!CHKSIZE 10: nblocks 1: 0 != 2
!$STAT: #nnblocks:  0
!$STAT: #nsetsize:  0
!$STAT: #nread:     32
!$STAT: #nwrite:    20

4) look in treedisk.c and look for the function treedisk_setsize(). It is currently incomplete and causes the error that you saw in the previous step. Implement the missing code. You only need to support setting the size to 0. If you run ./trace again, the output should look more like:

    blocksize:  512
    refs/block: 128
    !$STAT: #nnblocks:  0
    !$STAT: #nsetsize:  0
    !$STAT: #nread:     36
    !$STAT: #nwrite:    24

5) look in cachedisk.c. It is supposed to cache blocks, but currently it simply forwards read and write operations to the layer below. You are to implement a caching algorithm. You can use any algorithm that appeared in the literature (LRU, LFU, MRU, …) or design your own. Note that it is supposed to be a “write-through” cache: the write operation cannot buffer writes until a later time.

6) we are going to run a contest! You are to provide your own version of the file “trace.txt”. We will run everybody’s trace against everybody’s cache layer. In particular, we will run ./trace with your treedisk and caching layer (configured with 16 blocks) against everybody’s trace, and count up the total number of read operations that your caching layer performs. We will rank contestants by this total.

Note that you will get disqualified if your treedisk layer or caching layer does not work. You will also get disqualified if you cache any user data in a place other than the memory that is passed to the caching layer, or if you change the layout of the treedisk file system. The outcome of this contest does not count toward your Project grade—however, there will be prizes.

Your trace.txt file has the following limitations:

it can have at most 10,000 lines/commands
- inode numbers must range between 0 and 127 inclusive
  - block numbers must range between 0 and (1<<27)-1 inclusive

To win, you can use the following strategies or a combination:

    - defensive approach: modify treedisk and/or cachedisk so as
      to minimize the number of read operations below.
    - offensive approach: create a trace.txt that makes it very
      hard for the underlying layers (except yours of course) to
      cache anything.

Things to submit

Submit the following files on CMS, but be sure you use your gitrepository wisely so as to protect your self against any unexpectedlife or scholastic events.

Part 1

trace.txt — A valid trace file. Just overwrite the original trace.txt
alias.txt — the name by which you shall be known for the contest

Part 2

treedisk.c
cachedisk.c
trace.txt
alias.txt