Project 2: Buffer Manager
Instructor: Davood Rafiei
Introduction
You are given the skeleton code for a simple buffer manager, which is also copied to your course github folder, and are asked to extend its functionality. Your code will implement a buffer manager that allows a client program (another program that calls the functions in your buffer manager) to call the functions of the buffer manager to create and operate on a heap file. Your job is to extend the support of the buffer manager to a heap file as well as to implement a new frame replacement policy. The provided skeleton is based on a public repository (referred to as index-BMP) that implements a buffer manager on top of a B+ tree index.
This assignment is meant to be completed individually or in pairs, under the consultation model of collaboration as per the Computing Science Department Course Policies. Your submission cannot include any binary or large data file. That repository already has the folder structure for the assignment. DO NOT MODIFY that directory structure. Submit the URL of your repository through eClass before the deadline.
The learning objectives of this project are (1) gaining a better understanding of database internals, (2) learning the working of a buffer manager, and (3) getting acquainted with the issues and the implementation details of a buffer manager. The relevant reading for this project is our lecture coverage of the buffer manager, and Sections 13.3 (Organization of records in files) and 13.5 (Database buffer) of our recommended textbook. There are also plenty of resources on the Web that can be consulted. Cite all such resources in your submission. You should check out the code and get a good understanding of how the code implements the functions of a buffer manager and how client programs work with the buffer manager.
Part 0 – Understanding the code
You need to understand the provided code and how different functions are implemented. Answer the following questions by studying the provided skeleton code.
Q1 — What information is kept for each file (in file.c and file.h) and why? Explain the structure of headerPage_t and freePage_t as defined in file.h.
Q2 — Answer the following questions based on buffer.c and buffer.h:
· (a) What happens when a page is requested? How is the LRU replacement policy implemented?
· (b) What is involved in opening/closing a table?
· (c) What happens when a page is not needed? What does the application do and what actions does the buffer manager take?
Part 1 – Building a buffered interface to a heap file
Your job in this part is to extend the buffer manager to support heap files. Your program will allow a client program (another program that calls the functions in the buffer manager) to operate on the pages of a table organized as a heap file, i.e. bringing the pages into the buffer, pushing the buffer pages to disk and supporting searches, updates, pinning, unpinng, etc. Each table is stored as a file on the secondary storage. For simplicity, each table is in the form of key-value pairs organized as a heap file with gaps, and the gap is always at the end of each page. The key is an 8-byte integer whereas the value can be of any length. The first 4 bytes of the value field is an integer that keeps the size of the rest of the value part. For example, if the value part is a character string of length 20 bytes, then it will occupy 24 bytes and the first 4 bytes will be set to 20. The sum of sizes of key and value fields cannot exceed the page size minus 10, i.e. 4086 bytes.
Buffer manager interface
Your code will implement the following methods in C programming language. Your code will be named as heap.c and heap.h and these will be stored in src and include directories. Your code with the existing code will compile into a library named libdbBM.a, and the library will provide the buffer manager API for client programs. The generated library will be in the lib directory.
· int open_table(char *filePath); // This opens an existing file or creates one if the file does not exist. It returns table_id as a positive integer if successful, or -1 if not successful. If open_table is called on a table that is open (i.e. it is opened more than once before being closed), the same id should be returned.
· int close_table(int table_id); // This writes all dirty frames of the file to disk and unpins any pinned frame. Return 0 if successful and -1 if not.
· int db_insert(int table_id, int64_t key, char *val, int trn_id); // Insert a key-value pair. The argument trn_id can be ignored at this time. It is not used in this project. The row cannot be inserted if the key exists in the file. The return value is 0 on success and -1 on failure.
· int db_find(int table_id, int64_t key, char *val, int trn_id); // Find the row matching the key and return the relevant data in val. The return value is 0 on success and -1 on failure.
· int db_delete(table_id, int64_t key, int trn_id); // Delete the row matching the key (if exists) and return 0 on success and -1 on failure.
· int db_update(int table_id, int64_t key, char *val , int trn_id); // Update the row in place. The new value will have the same length as the old value.
· int print_buffer(); // Print the content of the buffer in a readable format. The result should include for each frame, the frame number, the table id, the dirty bit, the pin count, the reference bit, and a listing of the records. For each record, return the key and up to the first 6 characters of the value field. This and the next functions are helper functions that may be used for debugging.
· int print_buffer_brief); // Print the content of the buffer same as print_buffer() except that the keys and values are not returned. Instead, for each frame, the number of key/value pairs are returned.
· int init_db(int buf_size); // Initialize the buffer pool and allocate buf_size frames.
· int shutdown_db(); // Shut down the buffer manager.
Internal design
Your buffer pool consists of a set of frames stored in main memory. The number of frames is set when the buffer manager is initialized using the method init_db(). Each frame has enough space to store a whole disk page plus additional information as discussed next. We assume the disk page size is fixed and is 4096 bytes, which is the standard page size under Unix/Linux. For each disk page, the first 2 bytes is a short integer that gives the offset where the gap starts, and the last 8 bytes keeps the address of the next page. Each buffer frame has the following fields:
· tableId: Each table or file has a unique id, which is assigned when it is opened and is used when it is accessed.
· pageNum: This is the page number inside the file. Pages are numbered as 0, 1, 2, etc. Page 0 refers to the first 4096 bytes in the file, Page 1 refers to the next 4096 bytes, etc.
· isPinned: This is an integer that shows the number of client programs or transactions that currently use the frame. It is zero if the frame is not currently used. In our case, it is expected to be either 0 or 1.
· isDirty: This is set to 1 if the buffer frame is dirty (updated); otherwise, it is zero.
· LRU next and/or prev: The buffer frames will be organized as a queue to implement the LRU replacement policy.
· page: This is a copy of a disk page, 4096 bytes.
You may keep a table in main memory that has some information about each opened table including (but not limited to), table id, file Path, and file descriptor (returned by fopen or open in C). When a buffer page is requested, it is pinned by the buffer manager. The program requesting the buffer frame is responsible for unpinning it. Each buffer page is unpinned after use (see index.c in index-BPM for examples). We are using the LRU replacement policy for selecting frames for replacement. Unpinned frames are added to a queue, which is implemented as a linked list. Every unpinned frame is added to the end of the queue. When choosing a frame for replacement, the frame at the front of the queue should be chosen.
All our interface functions operate over the buffer pool. The find command will go over the pages of the file, bringing one page at a time to buffer (if not there) until the key is found or we reach the end of the file. The insert command will insert a row into the table. The table is checked first and the insert is rejected if the key already exists in the table. Otherwise, the row is inserted into the last page of the table (in the linked list of allocated pages). If the file is empty or the last page does not have enough space for the record, a new page is allocated from the list of free pages. If the linked list of free pages is empty, a page is allocated at the end of the file. We assume the length of a row cannot exceed the page size minus 10 (with 2 bytes in each page allocated for the gap offset and 8 bytes allocated for the address of the next page). The delete and update commands will run the find command to find the key first before deleting or updating the row. Deleting a row will create gaps, and the gap is always pushed to the end of pages. If a page has no record after a delete, the page is removed from the linked list of allocated pages and is added to the linked list of free pages. The updates are done in place if the page has enough space. Otherwise, the old record is deleted and the new record is inserted at the end. After each delete and update, the dirty field is set, the gap offset may be updated (if needed) and all the gaps are pushed to the end of the page (if not there already).
Part 2 – Clock replacement policy
The provided code implements an LRU replacement policy to evict pages from the buffer when it is full. Your job in this part is to implement the clock replacement policy, which will replace the LRU policy in the code. Your changes will be done in file buffer.c. Everything else in the code will operate as before. The clock policy has less overhead (hence more efficient) since the unpinned pages are not constantly pushed to the end of the queue. Under the clock policy, the buffer frames are arranged in a circle, and a hand points to one frame. Each frame has a reference flag, which can be 0 or 1. When a page is read into buffer or accessed in buffer, its reference bit is set to 1. Buffer pages with reference bit 0 can be evicted whereas those with reference bit 1 are not.
When the buffer is full and a page is requested, the frame pointed by the hand is examined. If its reference bit is 0, its content is thrown out (written to disk if dirty), the frame is allocated for the new request, the reference bit is set to 1, and the hand is moved clockwise to the next frame. If the reference bit is 1, it is set to zero and the hand is moved to the next frame. As noted, a page is evicted only if it is not accessed for the time it takes for the hand to make a rotation and set it to 0 and then make another full rotation to find the reference bit of the frame still 0.
Submission and testing
Your solution for Part (0) will be added to file Part0.txt, for Part (1) will be implemented in heap.c and heap.h, and your solution for Part (2) will be implemented in buffer.c and buffer.h. Your solution for Part (1) should work with both the existing buffer.c that implements LRU and the new buffer.c that implements clock. Your submission must compile withe the provided makefile and create a library named libdbBM.a.
You are provided with some basic test cases in heapTest.c, and are expected to develop your extensive set of test cases.
Grading
The project mark is broken as follows. 10% of the mark is allocated for Part 0, 60% for Part 1, and 30% for Part 2. For Parts 1 and 2, we will be checking the correctness and code quality. The correctness will be tested using some test cases and manually. The code quality will be in terms of the clarity, comments in the code, checking for possible errors and avoiding memory leaks (Valgrind is a useful tool for checking memory leaks).
© 2021 Davood Rafiei
Content licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, except where indicated otherwise.
/docProps/thumbnail.jpeg