2021/4/28 File Management
File Management
File Management
DBMS File Organisation Single- le DBMS
Single- le Storage Manager Example: Scanning a Relation Single-File Storage Manager Multiple- le Disk Manager DBMS File Parameters
>>
COMP9315 21T1 ♢ File Management ♢ [0/19]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
1/21
2021/4/28 File Management
❖ File Management
Aims of le management subsystem:
organise layout of data within the lesystem
handle mapping from database ID to le address
transfer blocks of data between buffer pool and lesystem also attempts to handle le access error problems (retry)
Builds higher-level operations on top of OS le operations.
∧ >>
COMP9315 21T1 ♢ File Management ♢ [1/19]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
2/21
2021/4/28 File Management
<< ∧ >>
❖ File Management (cont)
Typical le operations provided by the operating system:
fd = open(fileName,mode)
// open a named file for reading/writing/appending
close(fd)
// close an open file, via its descriptor
nread = read(fd, buf, nbytes)
// attempt to read data from file into buffer
nwritten = write(fd, buf, nbytes)
// attempt to write data from buffer to file
lseek(fd, offset, seek_type)
// move file pointer to relative/absolute file offset
fsync(fd)
// flush contents of file buffers to disk
COMP9315 21T1 ♢ File Management ♢ [2/19]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
3/21
2021/4/28 File Management
❖ DBMS File Organisation
How is data for DB objects arranged in the le system? Different DBMSs make different choices, e.g.
by-pass the le system and use a raw disk partition have a single very large le containing all DB data have several large les, with tables spread across them have multiple data les, one for each table
have multiple les for each table etc.
COMP9315 21T1 ♢ File Management ♢ [3/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
4/21
2021/4/28 File Management
<< ∧ >>
❖ Single- le DBMS
Consider a single le for the entire database (e.g. SQLite)
Objects are allocated to regions (segments) of the le.
If an object grows too large for allocated segment, allocate an extension.
What happens to allocated space when objects are removed?
COMP9315 21T1 ♢ File Management ♢ [4/19]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
5/21
2021/4/28 File Management
❖ Single- le DBMS (cont) Allocating space in Unix les is easy:
simply seek to the place you want and write the data if nothing there already, data is appended to the le if something there already, it gets overwritten
If the seek goes way beyond the end of the le:
Unix does not (yet) allocate disk space for the “hole” allocates disk storage only when data is written there
With the above, a disk/ le manager is easy to implement.
COMP9315 21T1 ♢ File Management ♢ [5/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
6/21
2021/4/28 File Management
<< ∧ >>
❖ Single- le Storage Manager
Consider the following simple single- le DBMS layout:
E.g.
SpaceMap = [ (0,10,U), (10,10,U), (20,600,U), (620,100,U), (720,20,F) ]
TableMap = [ (“employee”,20,500), (“project”,620,40) ]
COMP9315 21T1 ♢ File Management ♢ [6/19]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
7/21
2021/4/28 File Management
❖ Single- le Storage Manager (cont)
Each le segment consists of a number xed-size blocks The following data/constant de nitions are useful
<< ∧ >>
#define PAGESIZE 2048
typedef long PageId;
typedef char *Page;
Typical PAGESIZE values: 1024, 2048, 4096, 8192
COMP9315 21T1 ♢ File Management ♢ [7/19]
// bytes per page
// PageId is block index
// pageOffset=PageId*PAGESIZE
// pointer to page/block buffer
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
8/21
2021/4/28 File Management
<< ∧ >>
❖ Single- le Storage Manager (cont)
Possible storage manager data structures for opened DBs &
Tables
typedef struct DBrec {
char *dbname;
int fd;
SpaceMap map;
TableMap names; // map names to areas + sizes
} *DB;
typedef struct Relrec {
char *relname;
int start;
int npages;
…
} *Rel;
COMP9315 21T1 ♢ File Management ♢ [8/19]
// copy of table name
// page index of start of table data
// number of pages of table data
// copy of database name
// the database file
// map of free/used areas
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
9/21
2021/4/28 File Management
❖ Example: Scanning a Relation
With the above disk manager, a query like
select name from Employee
might be implemented as
DB db = openDatabase(“myDB”);
Rel r = openRelation(db,”Employee”);
Page buffer = malloc(PAGESIZE*sizeof(char));
for (int i = 0; i < r->npages; i++) {
PageId pid = r->start+i;
get_page(db, pid, buffer);
for each tuple in buffer {
get tuple data and extract name
add (name) to result tuples
}
}
COMP9315 21T1 ♢ File Management ♢ [9/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
10/21
2021/4/28 File Management
❖ Single-File Storage Manager
// start using DB, buffer meta-data
DB openDatabase(char *name) {
DB db = new(struct DBrec);
db->dbname = strdup(name);
db->fd = open(name,O_RDWR);
db->map = readSpaceTable(db->fd);
db->names = readNameTable(db->fd);
return db;
}
// stop using DB and update all meta-data
void closeDatabase(DB db) {
writeSpaceTable(db->fd,db->map);
writeNameTable(db->fd,db->map);
fsync(db->fd);
close(db->fd);
free(db->dbname);
free(db);
}
COMP9315 21T1 ♢ File Management ♢ [10/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
11/21
2021/4/28 File Management
❖ Single-File Storage Manager (cont)
// set up struct describing relation
Rel openRelation(DB db, char *rname) { Rel r = new(struct Relrec); r->relname = strdup(rname);
// get relation data from map tables r->start = …;
r->npages = …;
return r; }
// stop using a relation
void closeRelation(Rel r) {
free(r->relname);
free(r);
}
COMP9315 21T1 ♢ File Management ♢ [11/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
12/21
2021/4/28 File Management
❖ Single-File Storage Manager (cont) // assume that Page = byte[PageSize]
// assume that PageId = block number in file
// read page from file into memory buffer
void get_page(DB db, PageId p, Page buf) {
lseek(db->fd, p*PAGESIZE, SEEK_SET);
read(db->fd, buf, PAGESIZE);
}
// write page from memory buffer to file
void put_page(Db db, PageId p, Page buf) {
lseek(db->fd, p*PAGESIZE, SEEK_SET);
write(db->fd, buf, PAGESIZE);
}
COMP9315 21T1 ♢ File Management ♢ [12/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
13/21
2021/4/28 File Management
❖ Single-File Storage Manager (cont)
Managing contents of space mapping table can be complex:
// assume an array of (offset,length,status) records
// allocate n new pages
PageId allocate_pages(int n) {
if (no existing free chunks are large enough) {
int endfile = lseek(db->fd, 0, SEEK_END);
addNewEntry(db->map, endfile, n);
} else {
grab “worst fit” chunk
split off unused section as new chunk
}
// note that file itself is not changed
}
COMP9315 21T1 ♢ File Management ♢ [13/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
14/21
2021/4/28 File Management
❖ Single-File Storage Manager (cont) Similar complexity for freeing chunks
// drop n pages starting from p
void deallocate_pages(PageId p, int n) {
if (no adjacent free chunks) {
markUnused(db->map, p, n);
} else {
merge adjacent free chunks
compress mapping table
}
// note that file itself is not changed
}
Changes take effect when closeDatabase() executed.
COMP9315 21T1 ♢ File Management ♢ [14/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
15/21
2021/4/28 File Management
❖ Multiple- le Disk Manager
Most DBMSs don’t use a single large le for all data. They typically provide:
multiple les partitioned physically or logically
mapping from DB-level objects to les (e.g. via catalog meta- data)
Precise le structure varies between individual DBMSs. Using multiple les (one le per relation) can be easier, e.g.
adding a new relation
extending the size of a relation computing page offsets within a relation
COMP9315 21T1 ♢ File Management ♢ [15/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
16/21
2021/4/28 File Management
❖ Multiple- le Disk Manager (cont) Example of single- le vs multiple- le:
<< ∧ >>
Consider how you would compute le offset of page[i] in table[1] …
COMP9315 21T1 ♢ File Management ♢ [16/19]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
17/21
2021/4/28 File Management
❖ Multiple- le Disk Manager (cont)
Structure of PageId for data pages in such systems … If system uses one le per table, PageId contains:
relation indenti er (which can be mapped to lename) page number (to identify page within the le)
If system uses several les per table, PageId contains: relation identi er
le identi er (combined with relid, gives lename) page number (to identify page within the le)
COMP9315 21T1 ♢ File Management ♢ [17/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
18/21
2021/4/28 File Management
❖ DBMS File Parameters Our view of relations in DBMSs:
a relation is a set of r tuples, with average size R bytes
the tuples are stored in b data pages on disk
each page has size B bytes and contains up to c tuples
data is transferred disk↔memory in whole pages
costofdisk↔memorytransferTr,Tw dominatesother costs
COMP9315 21T1 ♢ File Management ♢ [18/19]
<< ∧ >>
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
19/21
2021/4/28 File Management
❖ DBMS File Parameters (cont) Typical DBMS/table parameter values:
<< ∧
Quantity total # tuples
record size total # pages
page size
# tuples per page page read/write time
cost to process
one page in memory
Symbol r
R b
B
c
Tr ,Tw
-
E.g. Value
106
128 bytes
105
8192 bytes 60
10 msec
≅ 0
COMP9315 21T1 ♢ File Management ♢ [19/19]
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
20/21
2021/4/28 File Management
Produced: 21 Feb 2021
cgi.cse.unsw.edu.au/~cs9315/21T1/lectures/files/slides.html
21/21