程序代写代做代考 flex database data structure  Record and Page Format  Files

 Record and Page Format  Files
John Edgar 2

4.1

 A database is made up of tables
▪ A table is generally represented by a single
file
 Tables are collections of records
▪ A record describes an entity or relationship ▪ And is defined by its attribute values
 Attributes also referred to as fields
▪ Each attribute is associated with a type
▪ Alsoknownasadomain
▪ Information about fields is stored in the
system catalog metadata SQL Server types John Edgar
database
id
name
dob
rate
1
bob
12/3/1974
21.35
2
kate
21/10/2001
17.34
3
sue
11/11/1999
19.22
name
type
size

id
int
4
name
varchar
255
dob
datetime
8
rate
real
8
4

page 123
slot 1
record (bob)
slot 2
record (kate)
slot 3
record (sue)
free space
header
header information
 Records are stored in pages
▪ Fixed-size abstract units of storage
▪ Generally map to a block on a disk
▪ Or a frame in main memory
 Page contents
▪ Page header – administrative data
▪ Slots – each slot contains a record  Record IDs
▪ Records are uniquely identified by record IDs (RID) ▪ Record IDs are page number : slot number pairs
RID: 123,2
John Edgar 5

 Records are either fixed or variable length
▪ Fixed length records have some important advantages
▪ Easier to rearrange records
▪ Less overhead for header data
▪ More efficient search
 Fixed length records
▪ Both the number of fields and the field length is fixed
▪ Given the address of a record a field can be found
▪ By referring to the field size in the system catalog
▪ This format is sometimes referred to as relative location
e.g. find field 3 = base address + size(field1) + size (field2)
header
field 1
field 2
field 3

field n
John Edgar
6

 In the relational model each record contains the same number of fields
▪ However fields may be of variable length ▪ e.g. VARCHAR
▪ Two tables could be stored in the same file – a mixed file
▪ Allows fast access to the results of a join between two related tables
 Alternative 1 – separate attributes with delimiters ▪ Delimiters identify the end of a field
delimiters requires a sequential scan to find a field
header
field 1
field 2
field 3
field 4
John Edgar
7

 There are other alternatives for recording variable length records
 Store additional data in the record header ▪ An array containing the length of each field
▪ Or just the lengths of the variable length fields, then store all fixed length fields first
▪ Allows fields to be found by an offset calculation
 If a record contains a variable number of fields other
organizations may be used ▪ Pointers to a list of fields
Variable length field sizes may grow / shrink Gets smaller – wastes space
▪ e.g. a mixed file
▪ Preceding the attribute by the attribute identifier
Gets larger – may not fit in its slot, or in its page, or may even exceed the page size
John Edgar 8

 Some data types may not fit on a single page ▪ Large object data
▪ Text, images, video, sound, …
 LOB data is stored as either binary or character data
▪ BLOB – unstructured binary data
▪ CLOB, NCLOB – character data
▪ BFILE – unstructured binary data in OS files
 LOB data types store and manipulate large blocks of unstructured
data
▪ The maximum size of a LOB is large
▪ At least 8 terabytes in Oracle 10g
▪ LOB data must be processed by application programs
John Edgar 9

 LOB’s have to be stored on a sequence of pages ▪ Ideally the pages should be contiguous for efficient
retrieval, or
▪ Store the LOB on a linked list of blocks
▪ Where each block contains a pointer to the next block
 If fast retrieval of LOBs is required they can be striped across multiple disks for parallel access
 It may be necessary to provide an index to a LOB ▪ For example indexing by seconds for a movie to allow
a client to request small portions of the movie
John Edgar 10

 Packed
▪ Offset calculation to find record ▪ Move records up on deletion
 Simple  But
▪ Time consuming to move records ▪ External RIDs no longer valid if slot
number changes on deletion
 Bit array
▪ Page is divided into slots
▪ Header includes a bit array which shows which slots contain records
 Advantages
▪ No changes to external references ▪ No rearranging records
 But slightly more administration
slot 1
record
slot 2
empty
slot 3
record

slot n
empty
header
1
0
1

0
N
slot 1
record
slot 2
record
slot 3
record
free space
header
N
John Edgar
11

 Maintain a slot directory
▪ Each entry is a pair ▪ Recordoffset
▪ Recordlength
 Records can be moved without affecting RID
▪ By modifying the slot entry  Flexible
▪ But more complex than fixed record organization
N21 slot directory
length = 24 length = 16
length = 20
free space
16

24
20
N
John Edgar
12

4.2

 A DB increases and decreases in size
▪ The disk space manager for the DBMS maps pages to blocks
▪ Over time gaps in sequences of allocated blocks may appear  Free blocks need to be recorded so that they can be
allocated in the future, using either
▪ A linked list with the head pointing to the first free block
▪ A bitmap where each bit corresponds to a single block ▪ Allows for fast identification of
contiguous areas of free space
John Edgar 14

 Pages containing related records are organized into files ▪ One file usually corresponds to one table
 Files are expected to span several pages
▪ So must allow access to all the pages in the file
 There are different file organizations
▪ With their own strengths and weaknesses
 All file organizations support
▪ Opening and closing the file
▪ Searching for and reading records
▪ Inserting, deleting and modifying records ▪ Scanning the entire file
John Edgar 15

 Heap files are not ordered in any way
▪ They guarantee that all of the records in a file can be retrieved by
repeatedly requesting the next record
 Inserting new records into heap files is very efficient
▪ Searching for records is inefficient ▪ As it requires a linear search of the file
▪ Deletion and modification of records first requires they be found  To support file operations it is necessary to
▪ Keep track of the pages in the file
▪ Keep track of which of those pages contain free space
Fast access to records in a heap file requires additional data structures Indices
John Edgar 16

 Linked List
▪ Two doubly linked lists of pages
▪ Pages with free space
▪ Pages that are full
▪ Records page ID of list head  Finding a page with enough
space may be time-consuming
▪ Variable-length records may make the list long
full page
full page
 Directory
▪ File pages recorded in a directory ▪ Entries kept in page order
 Directory entries record
▪ Whether or not the page is full
▪ The amount of free space
▪ Pages need not be visited to see if they
contain enough free space
page 2
header
entry for page 1
page with free space
(small) linked list of directory pages
page n
page with free space
John Edgar
17

 Heap files do not provide efficient access to records based on (any) search criteria
find 81
 Other file organizations support more efficient access to records
94
97
34
68
65
80
68
94
83
70
56
81
75
79
53
▪ Ordered files
▪ Ordered on some record field
▪ Hash files
▪ Files of mixed records
Discussed in the section on indices
▪ That contain records of more than one table ▪ Files organized as B or B+ trees
John Edgar 18