CS44800 Project 2
The RDBMS Lower-Level Layers
Spring 2017
Due: February 27, 2017, 11:59PM
(There will be a 10% penalty for each late day up to four days. The assignment will not
be accepted afterwards.)
Notes:
● This project should be carried out in groups of two. Choose a partner as soon as possible.
● This project will be based on Minibase, a small relational DBMS, structured into several layers. In
this project, you are required to implement parts of the lower-level layers: BM – the buffer
manager layer and HFPage – the heap file layer. Layers should be tested and turned in
separately.
PART 1: Buffer Management
1. Introduction
You are required to implement a simplified version of the Buffer Manager layer without support
for concurrency control or recovery. You will be given the code for the lower layer: the Disk
Space Manager.
You should begin by reading the chapter on Disks and Files to get an overview of buffer
management. In addition, HTML documentation is available for Minibase. There is a link to the
Minibase home page here. In particular, you should read the description of the DB class, which
you will call extensively in this assignment. The Java documentation for the diskmgr package
can be found here. You should also read the code under diskmgr/ carefully to learn how the
package is declared and how exceptions are handled in Minibase.
2. The Buffer Manager Interface
The simplified Buffer Manager interface that you will implement in this assignment allows a
client (a higher level program that calls the Buffer Manager) to allocate/de-allocate pages on
disk, to bring a disk page into the buffer pool and pin it, and to unpin a page in the buffer pool.
The methods that you have to implement are described below:
http://research.cs.wisc.edu/coral/mini_doc/intro/single_user.html
https://www.cs.purdue.edu/homes/pearso29/MinibaseJavadoc/index.html
/**
* Create the BufMgr object.
* Allocate pages (frames) for the buffer pool in main memory and
* make the buffer manage aware that the replacement policy is
* specified by replacerArg (e.g., LH, Clock, LRU, MRU, LFU, etc.).
*
* @param numbufs number of buffers in the buffer pool
* @param lookAheadSize: Please ignore this parameter
* @param replacementPolicy Name of the replacement policy, that parameter will be set to “MRU” (you
can safely ignore this parameter as you will implement only one policy)
*/
public BufMgr(int numbufs, int lookAheadSize, String replacementPolicy) {};
/**
* Pin a page.
* First check if this page is already in the buffer pool.
* If it is, increment the pin_count and return a pointer to this
* page.
* If the pin_count was 0 before the call, the page was a
* replacement candidate, but is no longer a candidate.
* If the page is not in the pool, choose a frame (from the
* set of replacement candidates) to hold this page, read the
* page (using the appropriate method from {\em diskmgr} package) and pin it.
* Also, must write out the old page in chosen frame if it is dirty
* before reading new page.__ (You can assume that emptyPage==false for
* this assignment.)
*
* @param pageno page number in the Minibase.
* @param page the pointer point to the page.
* @param emptyPage true (empty page); false (non-empty page)
*/
public void pinPage(PageId pageno, Page page, boolean emptyPage) {};
/**
* Unpin a page specified by a pageId.
* This method should be called with dirty==true if the client has
* modified the page.
* If so, this call should set the dirty bit
* for this frame.
* Further, if pin_count>0, this method should
* decrement it.
*If pin_count=0 before this call, throw an exception
* to report error.
*(For testing purposes, we ask you to throw
* an exception named PageUnpinnedException in case of error.)
*
* @param pageno page number in the Minibase.
* @param dirty the dirty bit of the frame
*/
public void unpinPage(PageId pageno, boolean dirty) {};
/**
* Allocate new pages.
* Call DB object to allocate a run of new pages and
* find a frame in the buffer pool for the first page
* and pin it. (This call allows a client of the Buffer Manager
* to allocate pages on disk.) If buffer is full, i.e., you
* can’t find a frame for the first page, ask DB to deallocate
* all these pages, and return null.
*
* @param firstpage the address of the first page.
* @param howmany total number of allocated new pages.
*
* @return the first page id of the new pages.__ null, if error.
*/
public PageId newPage(Page firstpage, int howmany) {};
/**
* This method should be called to delete a page that is on disk.
* This routine must call the method in diskmgr package to
* deallocate the page.
*
* @param globalPageId the page number in the data base.
*/
public void freePage(PageId globalPageId) {};
/**
* Used to flush a particular page of the buffer pool to disk.
* This method calls the write_page method of the diskmgr package.
*
* @param pageid the page number in the database.
*/
public void flushPage(PageId pageid) {};
/**
* Used to flush all dirty pages in the buffer pool to disk
*
*/
public void flushAllPages() {};
/**
* Returns the total number of buffer frames.
*/
public int getNumBuffers() {}
/**
* Returns the total number of unpinned buffer frames.
*/
public int getNumUnpinned() {}
3. Internal Design
The buffer pool is a collection of frames (page-sized sequence of main memory bytes) that is
managed by the Buffer Manager. Page objects are implemented as an array of bytes and deal
with the buffer pool at the byte array level. Note that the size of the Minibase pages is defined in
the interface GlobalConst of the global package. Before jumping into coding, please make sure
that you understand how the Page object is defined and manipulated in Java Minibase.
In addition, you should maintain an array bufDescr[numbuf] of descriptors, one per frame. Each
descriptor is a record with the following fields:
page number, pin_count, dirtybit.
The pin_count field is an integer, page number is a PageId object, and dirtybit is a boolean. This
describes the page that is stored in the corresponding frame. You are free to extend this
descriptor to hold additional information if necessary for your implementation. A page is
identified by a page number that is generated by the DB class when the page is allocated, and
is unique over all pages in the database. The PageId type is defined as an integer type in global
package.
A simple hash table should be used to figure out which frame a given disk page occupies. You
should implement your own hash table class and not use existing HashTable Java classes. The
hash table should be implemented (entirely in main memory) by using an array of pointers to
lists of
pairs is called a bucket. Given a page number, you should apply a hash function to find the
directory entry pointing to the bucket that contains the frame number for this page, if the page is
in the buffer pool. If you search the bucket and do not find a pair containing this page number,
the page is not in the pool. If you find such a pair, it will tell you the frame in which the page
resides.
The hash function must distribute values in the domain of the search field uniformly over the
collection of buckets. If we have HTSIZE buckets, numbered 0 through M-1, a hash function h of
the form h(value) = (a*value+b) mod HTSIZE works well in practice. HTSIZE should be chosen
to be a prime number.
When a page is requested, the buffer manager should do the following:
1. Check the buffer pool (by using the hash table) to see if it contains the requested page. If the
page is not in the pool, it should be brought in as follows:
(a) Choose an unused frame. If all frames are occupied (i.e., the buffer pool is full),
choose a frame for replacement using the MRU policy as described below.
(b) If the frame chosen for replacement is dirty, flush it (i.e., write out the page that it
contains to disk, using the appropriate DB class method).
(c) Read the requested page (again, by calling the DB class) into the frame chosen for
replacement; the pin_count and dirtybit for the frame should be initialized to 0 and
FALSE, respectively.
(d) Delete the entry for the old page from the Buffer Manager’s hash table and insert an
entry for the new page. Also, update the entry for this frame in the bufDescr array to
reflect these changes.
2. Pin the requested page by incrementing the pin_count in the descriptor for this frame and
return a pointer to the page to the requestor.
The Most Recently Used (MRU) Replacement Policy
The MRU replacement policy keeps track of how recently buffered pages have been accessed.
When the buffer pool is full and requires more room to host an unbuffered page, MRU will select
the most recently used page as a victim (i.e., will be evicted from the buffer pool to allow
buffering a new page).
4. Error protocol
Though the Throwable class in Java contains a snapshot of the execution stack of its thread at
the time it was created and also a message string that gives more information about the error
(exception), we have decided to maintain a copy of our own stack to have more control over the
error handling.
We provide the chainexception package to handle the Minibase exception stack. Every
exception created in your bufmgr package should extend the ChainException class. The
exceptions are thrown according to the following rule:
Error caused by an exception caused by another layer:
For example: (when try to pin page from diskmgr layer)
try {
Minibase.BufferManager.pinPage(pageId, page, true);
}
catch (Exception e) {
throw new DiskMgrException(e, “DB.java: pinPage() failed”);
}
Error not caused by exceptions of others, but needs to be acknowledged:
For example: (when try to unpin a page that is not pinned)
if (pin\_count == 0) {
throw new PageUnpinnedException (null, “BUFMGR: PAGE_NOT_PINNED.”);
}
Basically, the ChainException class keeps track of all the exceptions thrown across the different
layers. Any exceptions that you decide to throw in your bufmgr package should extend the
ChainException class.
For testing purposes, we ask you to throw the following exceptions in case of error (use the
exact same name, please):
BufferPoolExceededException
Throw a BufferPoolExceededException when an attempt is made to pin a page to the buffer
pool with no unpinned frame left.
PagePinnedException
Throw a PagePinnedException when an attempt is made to free a page that is still pinned.
HashEntryNotFoundException
Throw a HashEntryNotFoundException when the page specified by PageId is not found in the
buffer pool.
Feel free to throw other new exceptions as you see fit. The provided test cases can help you
figure out where and which exceptions need to be thrown. But make sure that you follow the
error protocol when you catch an exception. Also, think carefully about what else you need to do
in the catch phase. Sometimes you do need to unroll the previous operations when failure
happens.
5. Where to Find Makefiles, Code, etc.
Please copy this file (proj21.zip) into your own Unix local directory. The directory lib contains the
jar file that implements the disk manager, you will use this file to compile your code. The content
of the directory src is:
under bufmgr directory
You should make all your code for bufmgr package implemented under this directory.
under diskmgr directory
The code for the diskmgr package has been available for you to study in order to understand
what is happening when the Buffer Manager is making calls to the Disk Manager. The Javadoc
also gives information on Disk Manager behavior. You should not have to make any changes to
code here.
under tests directory
TestDriver.java, BMTest.java: Buffer manager test driver program.
The provided test cases are given to you in order to give you a starting place. They are not
going to be the test cases used for grading. You are encouraged to add your own test cases in
order to test for corner cases.
https://www.cs.purdue.edu/homes/pearso29/cs448_S17/project2/proj21.zip
We provide a sample Makefile to compile your project. You will have to set up any
dependencies by editing this file. You may need to design your own Makefile. Whatever you do,
please make sure that the classpath is correct.
You can find other useful information by reading the java documentation on other packages,
such as the global and diskmgr package.
6. What to Turn in
You should turn in copies of your code together with the Makefile and a readme file. All files
need to be zipped in a file named: your_career_login1_your_career_login2_bufmgr.zip.
In the readme file, put the name of your group members, and the feature you would like the TAs
to know. The TAs should be able to compile/run your program using make on the university
Linux machines. You do not need to include the library file and other files provided.
The directory structure of your zip file should be identical to the directory structure of the
provided zip file (i.e., having the directory src, the Makefile, …), except the top-level name
(should be your career login above). Your grade may be reduced 5% off if you do not follow this.
In the readme file, make sure to state the roles of each member in the group (i.e., who did
what). Only one member should submit the project on behalf of the other member (please
indicate in the read me file who is the one submitting).
PART 2: Heap Files
1. Introduction
In this part, you will implement the Heap file layer. You will be given the documentation for the
lower layers (Buffer Manager and Disk Space Manager), as well as the documents for
managing records on a Heap file page. You can find the package index for the above here.
This assignment has three parts. You have to do the following:
1. Familiarize yourself with the HeapFile, HFPage, Buffer Manager and Disk Space Manager
interfaces.
2. Implement the HeapFile class. You can ignore concurrency control and recovery issues, and
concentrate on implementing the interface routines. You should deal with free space
intelligently, using either a page directory to identify pages with room for records. When a record
is deleted, you must update your information about available free space, and when a record is
inserted, you must try to use free space on existing pages before allocating a new page.
3. Run the tests provided.
The major methods of HeapFile.java that you will implement include:
https://www.cs.purdue.edu/homes/pearso29/MinibaseJavadoc/index.html
public HeapFile(String name)
public RID insertRecord(byte[] record)
public Tuple getRecord(RID rid)
public boolean updateRecord(RID rid, Tuple newRecord)
public boolean deleteRecord(RID rid)
public int getRecCnt() //get number of records in the file
public HeapScan openScan()
Note: For the purposes of this assignment, we have slightly different requirements than the full
implementation of Minibase. The high-level descriptions in the Javadoc of how methods perform
is the same. However, for any conflicts between the Javadoc and this provided interface in
terms of return types or parameters, refer to this interface. The provided test cases can also be
used to check which version is correct.
2. Available Documentation
You should begin by reading the chapter on Disks and Files, to get an overview of the HF layer
and buffer management.
3. Page Directory Requirements
In order to track free space on pages in the file and be able to efficiently implement the insertRecord,
updateRecord, and deleteRecord methods, you will need to use a Page Directory data structure. These
operations should be implemented in O(N). All other operations should take at most O(1). However, your
HeapFile will also have to efficiently handle variable-length records. Although a Linked List
implementation to track free space would technically satisfy the O(N) requirements, this performs
relatively poorly for variable-length records. A Linked List implementation will therefore not receive full
credit compared to a Page Directory implementation.
In addition, your Page Directory needs to handle the case where the Page Directory is too large to fit in
one page. You must make use of Pages to store the Page Directory on-disk. It is not sufficient to store the
Page Directory as an in-memory data structure.
4. Representing Records
Records in the database are represented using the Tuple class. Although the Minibase
documentation provided to you does have a specification for a Tuple class, this is implemented
at a higher level of the database than we need for this project.
You will implement your own, simple Tuple class for the purposes of this project. The Tuple
class essentially needs to just provide accessor methods for the data stored in the Tuple. You
should not implement any sort of validation logic in the Tuple class, just return the data.
Validation and error handling should be done in the relevant HeapFile methods.
The basic interface the Tuple class needs to satisfy is as follows:
public Tuple()
public Tuple(byte[] data, int offset, int length)
public int getLength()
public byte[] getTupleByteArray()
You can ignore the offset variable. It will not be used in this assignment.
5. Classes to Familiarize Yourself With First
There are three main packages with which you should familiarize yourself: heap, bufmgr,
diskmgr. Note that part of the heap package contains implementation for HFPage. The java
documentation of HFPage is provided to you. A HeapFile is seen as a collection of records.
Internally, records are stored on a collection of HFPage objects.
6. Compiling Your Code and Running the Tests
Copy this file (proj22.zip) to your own local directory and study them carefully. The files are:
– In directory src/heap: Again, the java documentation for package bufmgr, diskmgr and heap
are online.
Note that you DO NOT have to throw the same exceptions as documented for the heap
package. However, for testing purposes, we DO ask you to use InvalidUpdateException to
signal any illegal operations on the record. In other error situations, you should throw exceptions
as you see fit following the error protocol introduced in the Buffer Manager Assignment.
Please check that you are not redefining an exception already defined in the provided
heapAssign.jar library. If an exception you want to create is already defined, just use the
version provided in the library. All required exceptions that the test cases are expecting
in order to compile should be included in the provided library.
– In directory src/tests: This directory contains the source code of the test. The provided test
cases are given to you in order to give you a starting place. They are not going to be the test
cases used for grading. You are encouraged to add your own test cases in order to test for
corner cases.
– In directory lib:heapAssign.jar: this is a library that has the implementation of the Disk
Manager and Buffer Manager layers. As you know, you will develop the Buffer Manager layer in
the first part of this project. We are providing this library to help you start working with the
second part as soon as possible and also to help you test your code comparing the behavior of
your code with the one obtained used the library.
6. What to Turn in
Part 2 & 3 should be compiled together. See Part 3 to know what to turn in.
https://www.cs.purdue.edu/homes/pearso29/cs448_S17/project2/proj22.zip
PART 3: Heap Scans
After completing the buffer manager and heap file layers, you will be able to implement a basic
scan of a given file. A heap scan is simply an iterator that traverses the file’s directory and data
pages to return all records. The major methods of HeapScan.java that you will implement include
the following (please see the javadoc for detailed descriptions):
protected HeapScan(HeapFile hf)
protected void finalize() throws Throwable
public void close()
public boolean hasNext()
public Tuple getNext(RID rid)
Internally, each heap scan should consist of (at least) the following:
Directory Position
The current directory page and/or entry in the scan.
Record Position
The current data page and/or RID in the scan.
The Tuple class is the wrapper of an array of data. Internally it should contain a declaration
byte[] data and other methods called by the test driver.
Your implementation should have at most one directory page and/or at most one data page
pinned at any given time.
What to Turn in
Part 2 & 3 should be compiled together. You should turn in copies of your code together with the
Makefile and a readme file. All need to be zipped in a file
named:your_career_login1_your_career_login2_heapfile.zip.
In the readme file, put the name of your group members, and the feature you would like me to
know. I should be able to compile/run your program using make. You don’t need to include the
library file and other files provided. In the readme file, you should also include the roles of each
group member in the project, i.e., who implemented what.
The directory structure of your zip file should be identical to the directory structure of the
provided zip file (i.e., having the directory src, the Makefile, …), except the top-level name
(should be your Purdue login above). Your grade may be deduced 5% off if you don’t follow this.
Only one member is allowed to submit Part 2 & 3 (the same person who submitted Part 1).
You should upload all your zip files using Blackboard.
https://www.cs.purdue.edu/homes/pearso29/MinibaseJavadoc/index.html