数据库代写:Catalog HeapFiles

Overview

In this assignment, you will first construct the basic building block of a table: Tuples. Next, you will implement a simple Catalog that keeps track of what tables are currently in the database and provides a way to access them. Finally, you will implement HeapFiles which are the physical representation of the data in our database.

Tuples

Your first task is to complete the TupleDesc and Tuple classes. TupleDesc is a class that specifies a schema for a Tuple. Examine the methods and come up with a design before beginning your implementation.

Some implementation hints for TupleDesc:

* For the purposes of our database, we will only be using two datatypes: int and string (char). ints should be 4 byte signed values. Strings can be up to 128 characters in length. The first byte of a string will contain the length of the string, followed by the bytes containing the content of the string (one byte per character). If a string is less than 128 characters, the leftover bytes will be filled with zeroes until it reaches a size of 128.

* The hashCode method is not required, but if you intend to hash TupleDesc objects (for use with a HashMap for example, then you will need to implement it.

* When implementing toString please use the format specified in the comments, since we will be using this for testing purposes.

Once you have TupleDesc implemented, it is time to focus on the Tuple class. A Tuple is responsible for storing data and making sure that it conforms to the schema specified by a TupleDesc.

Some implementation hints for Tuple:

* You will need to decide on a data structure that can be used to map field names to their content.

Once you have finished these two classes, your code should pass the TupleDescTest and TupleTest tests.

Catalog

Next, we will shift our focus to tables. Tables will consist of many Tuples. Before we construct the tables themselves, we will implement a Catalog that will keep track of what tables currently exist and provide access to those tables. Examine the Catalog class and come up with a design before implementing it.

Some implementation hints for Catalog:

* You will likely need to create a structure to hold relevant information for a table (such as its name, the location of the file, and its primary key). The best way to do this is probably to create a private inner class.

* Many of the methods in this class rely on a table id. This id is generated by a HeapFile which you have not implemented yet. It may be worthwhile to take a look at that method (it is very simple) before testing your Catalog implementation.

Heap Files

The actual physical storage of our data is called a HeapFile. HeapFiles consist of many HeapPages. The primary purpose of a HeapFile is to manage access to the HeapPages and also to stream the data out to disk. The primary function of a HeapPage is to manage the various tuples – making sure that each tuple is written to a proper location and accessing tuples as necessary.

It is highly recommended that you design and implement the HeapPage class first, then design and implement the HeapFile class.

Some implementation hints for the HeapPage class:

* Each HeapPage will have a header and some slots. Slots are used to store the actual data – the size of each slot is the same as the size of the tuples contained in the HeapPage. The header is a binary encoding of which slots are occupied. Each slot is assigned one bit in the header. If that bit is a one, then that slot is occupied, and contains data that can be accessed. If the bit is a zero, then that slot contains no data and can have new data written to it. The header format should track slot occupancy starting with the least significant bit. In other words, the header bit for slot 0 would be in byte 0, bit 0 of the header. Slot 1 would be in byte 0, bit 1. Slot 8 would be in byte 1, bit 0, etc.

* It is important to consider how much space the header will require. All pages are of constant size (see the static variable in HeapPage) so the space that we are using for the header will decrease the amount of space that we can use for tuples. Headers must be a whole number of bytes (no partial bytes).

* Any space that is not taken up by the header can be used to store tuples. Tuples should be stored immediately following the header. No partial tuples can be stored in a HeapPage. If there is any space leftover, the HeapPage will be padded with zeroes until it reaches its defined size.

* When adding or deleting a tuple from a HeapPage be sure to update the header to indicate the occupied status of the modified slot.

Here are some implementation hints for the HeapFile class:

* To create an id for your HeapFile it is sufficient to create some sort of hash based on the File.

* When reading or writing a page, it is strongly recommended that you use a RandomAccessFile for file access.

* When adding a tuple, you must first find a page with an empty slot. If there are no empty slots, create a new heap page instead. Feel free to create some additional methods in HeapPage to help with this if necessary. You should then pass the new tuple to the HeapPage. Finally, be sure to write the modified HeapPage to disk.

* When deleting a tuple, you should be able to find the page id and the tuple id (slot number) from the Tuple object. Be sure to write the modified HeapPage to disk.

* It will be helpful to use the iterator from your HeapPage class to aggregate all of the tuples in the getAllTuples()method.

Example

The following example will load a table from a file, then display all of the tuples contained in that table. In other words, it is equivallent to SELECT * FROM test;. The .txt file contains the schema and the catalog looks for the data in a file called test.dat.

Catalog c = Database.getCatalog();

c.loadSchema(“testfiles/test.txt”);

 

int tableId = c.getTableId(“test”);

td = c.getTupleDesc(tableId);

System.out.println(td);

 

hf = c.getDbFile(tableId);

ArrayList tups = hf.getAllTuples();

Iterator it = tups.iterator();

 

while(it.hasNext()) {

System.out.println(it.next());

}

 

Right now we aren’t properly reading in any SQL code or processing any queries. In future assignments we will add this functionality to our database.

Testing

You have been provided some tests that you can use to check basic functionality of each of the objects. These tests are NOT comprehensive. Just because you pass the tests does not mean that you will get a perfect score on this assignment. Please feel free to write some of your own tests to check your implementation. In particular, notice that none of the tests cover HeapFiles with multiple pages.

Your tests will not be part of your grade. When we do grade your assignment, we will replace the existing tests with some more thorough tests and record the output.

Rubric

The below rubric is intended as a guideline and will be followed as closely as possible, however submissions that deviate too much from the specification may be graded using an alternate rubric.

TupleDesc (15 points): instance variables and getters and setters (5 points), toString (5 points), field access methods (5 points)

Tuple (15 points): instance variables and getters and setters (5 points), toString (5 points), field access methods (5 points)

Catalog (15 points): instance variables and getters and setters (5 points), implementation of inner class (5 points), table management methods (5 points)

HeapPage (25 points): instance variables and getters and setters (5 points), numSlots and header size (5 points), adding and deleting tuples (10 points), iterator (5 points)

HeapFile (30 points): instance variables and getters and setters (5 points), adding and deleting tuples (10 points), read and write page (10 points), getAllTuples() (5 points)

I reserve the right to deduct points for code that is poorly formatted or poorly documented.