Record and Page Format Files
John Edgar 2
4.1
A database is made up of tables
▪ A table is generally represented by a single
file
Tables are collections of records
▪ A record describes an entity or relationship ▪ And is defined by its attribute values
Attributes also referred to as fields
▪ Each attribute is associated with a type
▪ Alsoknownasadomain
▪ Information about fields is stored in the
system catalog metadata SQL Server types John Edgar
database
id
name
dob
rate
1
bob
12/3/1974
21.35
2
kate
21/10/2001
17.34
3
sue
11/11/1999
19.22
name
type
size
…
id
int
4
name
varchar
255
dob
datetime
8
rate
real
8
4
page 123
slot 1
record (bob)
slot 2
record (kate)
slot 3
record (sue)
free space
header
header information
Records are stored in pages
▪ Fixed-size abstract units of storage
▪ Generally map to a block on a disk
▪ Or a frame in main memory
Page contents
▪ Page header – administrative data
▪ Slots – each slot contains a record Record IDs
▪ Records are uniquely identified by record IDs (RID) ▪ Record IDs are page number : slot number pairs
RID: 123,2
John Edgar 5
Records are either fixed or variable length
▪ Fixed length records have some important advantages
▪ Easier to rearrange records
▪ Less overhead for header data
▪ More efficient search
Fixed length records
▪ Both the number of fields and the field length is fixed
▪ Given the address of a record a field can be found
▪ By referring to the field size in the system catalog
▪ This format is sometimes referred to as relative location
e.g. find field 3 = base address + size(field1) + size (field2)
header
field 1
field 2
field 3
…
field n
John Edgar
6
In the relational model each record contains the same number of fields
▪ However fields may be of variable length ▪ e.g. VARCHAR
▪ Two tables could be stored in the same file – a mixed file
▪ Allows fast access to the results of a join between two related tables
Alternative 1 – separate attributes with delimiters ▪ Delimiters identify the end of a field
delimiters requires a sequential scan to find a field
header
field 1
field 2
field 3
field 4
John Edgar
7
There are other alternatives for recording variable length records
Store additional data in the record header ▪ An array containing the length of each field
▪ Or just the lengths of the variable length fields, then store all fixed length fields first
▪ Allows fields to be found by an offset calculation
If a record contains a variable number of fields other
organizations may be used ▪ Pointers to a list of fields
Variable length field sizes may grow / shrink Gets smaller – wastes space
▪ e.g. a mixed file
▪ Preceding the attribute by the attribute identifier
Gets larger – may not fit in its slot, or in its page, or may even exceed the page size
John Edgar 8
Some data types may not fit on a single page ▪ Large object data
▪ Text, images, video, sound, …
LOB data is stored as either binary or character data
▪ BLOB – unstructured binary data
▪ CLOB, NCLOB – character data
▪ BFILE – unstructured binary data in OS files
LOB data types store and manipulate large blocks of unstructured
data
▪ The maximum size of a LOB is large
▪ At least 8 terabytes in Oracle 10g
▪ LOB data must be processed by application programs
John Edgar 9
LOB’s have to be stored on a sequence of pages ▪ Ideally the pages should be contiguous for efficient
retrieval, or
▪ Store the LOB on a linked list of blocks
▪ Where each block contains a pointer to the next block
If fast retrieval of LOBs is required they can be striped across multiple disks for parallel access
It may be necessary to provide an index to a LOB ▪ For example indexing by seconds for a movie to allow
a client to request small portions of the movie
John Edgar 10
Packed
▪ Offset calculation to find record ▪ Move records up on deletion
Simple But
▪ Time consuming to move records ▪ External RIDs no longer valid if slot
number changes on deletion
Bit array
▪ Page is divided into slots
▪ Header includes a bit array which shows which slots contain records
Advantages
▪ No changes to external references ▪ No rearranging records
But slightly more administration
slot 1
record
slot 2
empty
slot 3
record
…
slot n
empty
header
1
0
1
…
0
N
slot 1
record
slot 2
record
slot 3
record
free space
header
N
John Edgar
11
Maintain a slot directory
▪ Each entry is a pair ▪ Recordoffset
▪ Recordlength
Records can be moved without affecting RID
▪ By modifying the slot entry Flexible
▪ But more complex than fixed record organization
N21 slot directory
length = 24 length = 16
length = 20
free space
16
…
24
20
N
John Edgar
12
4.2
A DB increases and decreases in size
▪ The disk space manager for the DBMS maps pages to blocks
▪ Over time gaps in sequences of allocated blocks may appear Free blocks need to be recorded so that they can be
allocated in the future, using either
▪ A linked list with the head pointing to the first free block
▪ A bitmap where each bit corresponds to a single block ▪ Allows for fast identification of
contiguous areas of free space
John Edgar 14
Pages containing related records are organized into files ▪ One file usually corresponds to one table
Files are expected to span several pages
▪ So must allow access to all the pages in the file
There are different file organizations
▪ With their own strengths and weaknesses
All file organizations support
▪ Opening and closing the file
▪ Searching for and reading records
▪ Inserting, deleting and modifying records ▪ Scanning the entire file
John Edgar 15
Heap files are not ordered in any way
▪ They guarantee that all of the records in a file can be retrieved by
repeatedly requesting the next record
Inserting new records into heap files is very efficient
▪ Searching for records is inefficient ▪ As it requires a linear search of the file
▪ Deletion and modification of records first requires they be found To support file operations it is necessary to
▪ Keep track of the pages in the file
▪ Keep track of which of those pages contain free space
Fast access to records in a heap file requires additional data structures Indices
John Edgar 16
Linked List
▪ Two doubly linked lists of pages
▪ Pages with free space
▪ Pages that are full
▪ Records page ID of list head Finding a page with enough
space may be time-consuming
▪ Variable-length records may make the list long
full page
full page
Directory
▪ File pages recorded in a directory ▪ Entries kept in page order
Directory entries record
▪ Whether or not the page is full
▪ The amount of free space
▪ Pages need not be visited to see if they
contain enough free space
page 2
header
entry for page 1
page with free space
(small) linked list of directory pages
page n
page with free space
John Edgar
17
Heap files do not provide efficient access to records based on (any) search criteria
find 81
Other file organizations support more efficient access to records
94
97
34
68
65
80
68
94
83
70
56
81
75
79
53
▪ Ordered files
▪ Ordered on some record field
▪ Hash files
▪ Files of mixed records
Discussed in the section on indices
▪ That contain records of more than one table ▪ Files organized as B or B+ trees
John Edgar 18