Storing Data: Disks and Files
1
11.1 Memory Hierarchy
• Primary Storage: main memory.
fast access, expensive.
• Secondary storage: hard disk.
slower access, less expensive.
• Tertiary storage: tapes, cd, etc.
slowest access, cheapest.
2
11.2 Disks
Characteristics of disks:
• collection of platters
• each platter = set of tracks
• each track = sequence of sectors (blocks)
• transfer unit: 1 block (e.g. 512B, 1KB)
• access time depends on proximity of heads to required block
access
• access via block address (p, t, s)
3
11.2 Disks
• Data must be in memory for the DBMS to operate on it.
• If a single record in a block is needed, the entire block is transferred.
4
11.2 Disks
Access time includes:
• seek time (find the right track, e.g. 10msec)
• rotational delay (find the right sector, e.g. 5msec)
• transfer time (read/write block, e.g. 10μsec)
èRandom access is dominated by seek time and
rotational delay
5
11.3 Disk Space Management
Disk space is managed by the disk space manager.
1. Improving Disk Access:
Use knowledge of data access patterns.
E.g. two records often accessed together
⇒ put them in the same block (clustering)
E.g. records scanned sequentially
⇒ place them in consecutive sectors on same track
6
11.3 Disk Space Management
2. Keeping Track of Free Blocks
– Maintain a list of free blocks.
– Use bitmap.
3. Using OS File System to Manage Disk Space
– extend OS facilities, but
– not rely on the OS file system.
(portability and scalability)
7
11.4 Buffer Management
• Buffer Manager
• Manages traffic between disk and memory by
maintaining a buffer pool in main memory.
• Buffer pool = collection of page slots (frames) which
can be filled with copies of disk block data.
8
11.4.1 Buffer Pool
Page requests from
DBMS upper levels
Buffer pool
Rel R
Block 0
Free
Rel R
Block 1
Free
Rel S
Block 6
Free
Rel S
Block 2
Free
Rel R
Block 5
Free
Free
Rel S
Block 4
Rel R
Block 9
Free Free
DB on disk
9
11.4.1 Buffer Pool
• The request_block operation replaces read block in all file access algorithms.
• If block is already in buffer pool:
– no need to read it again
– use the copy there (unless write-locked)
• If block is not already in buffer pool:
– need to read from hard disk into a free frame
– if no free frames, need to remove block using a buffer replacement policy.
• The release_block function indicates that block is no longer in use ⇒good
candidate for removal.
10
11.4.1 Buffer Pool
For each frame, we need to know:
• whether it is currently in use
• whether it has been modified since loading (dirty bit)
• how many transactions are currently using it (pin count)
• (maybe) time-stamp for most recent access
11
11.4.1 Buffer Pool
The request_block Operation
Method:
1. Check buffer pool to see if it already contains requested block.
If not, the block is brought in as follows:
(a) Choose a frame for replacement, using replacement policy
(b) If frame chosen is dirty, write block to disk
(c) Read requested page into now-vacant buffer frame (and set dirty = False and pinCount = 0)
2. Pin the frame containing requested block.
(This simply means updating the pin count.)
3. Return address of frame containing requested block.
12
11.4.1 Buffer Pool
The release_block Operation
Method:
1. Decrement pin count for specified page.
No real effect until replacement required.
The write_block Operation
Method:
1. Updates contents of page in pool
2. Set dirty bit on
Note: Doesn’t actually write to disk.
The force_block operation “commits” by writing to disk.
13
11.4.2 Buffer Replacement Policies
Several schemes are commonly in use:
• Least Recently Used (LRU)
– release the frame that has not been used for the longest period.
– intuitively appealing idea but can perform badly
• First in First Out (FIFO)
– need to maintain a queue of frames
– enter tail of queue when read in
• Most Recently Used (MRU): release the frame used most recently
• Random
No is guaranteed better than the other.
For DBMS, we may predict accesses better.
14
Example1:
Data pages: P1, P2, P3, P4
Q1: read P1; Q2: read P2;
Q3: read P3; Q4: read P1;
Q5: read P2; Q6: read P4;
Buffer:
15
P1 P2 P3Q4 Q5
Regarding Q6,
§ LRU: Replace P3
§ MRU: Replace P2
§ FIFO: Replace P1
§ Random: randomly
choose one buffer
to replace
Example 2:
Data pages: P1, P2, …, P11
10 buffer pages as in Example 1
Q1: read P1, P2,…, P11;
Q2, read P1, P2,…, P11;
Q3: Read P1, P2,…,P11
LRU/FIFO: I/O P1, P2, …, P11 for
each query.
MRU performs the best.
11.5 Record Formats
Records are stored within fixed-length blocks.
• Fixed-length: each field has a fixed length as well as the number of fields.
– Easy for intra-block space management.
– Possible waste of space.
• Variable-length: some field is of variable length.
– complicates intra-block space management
– does not waste (as much) space.
Record format info:
• best stored in data dictionary
• with dictionary memory-resident
16
11.5.1 Fixed-Length
Encoding scheme for fixed-length records:
• length + offsets stored in header
0 4 24 34 38Offsets
Record length
33357462 Neil Young Musician 0277Record
17
11.5.2 Variable-Length
Encoding schemes for variable-length records:
• Prefix each field by length
xxxx Neil Young Musician xxxx
• Terminate fields by delimiter
33357462/Neil Young/Musician/0277/
• Array of offsets
4 10 8 4
33357462 Neil Young Musician 0277
18
11.6 Block (Page) Formats
A block is a collection of slots.
Each slot contains a record.
A record is identified by rid =< page id, slot number >.
19
11.6.1 Fixed Length Records
For fixed-length records, use record slots:
Insertion: occupy first free slot; packed more efficient.
Deletion: (a) need to compact, (b) mark with 0; unpacked more efficient.
Packed
Slot 1
Slot 2
Slot 3
. . .
Slot N
Free
N
Unpacked, Bitmap
Slot 1
Slot 2 Free
Slot 3
Slot 4 Free
Slot 5
Free
Slot M
1 1 0 1 0 1 M
M 5 4 3 2 1
20
11.6.2 Variable-Length Records
For variable-length records, use slot directory.
Possibilities for handling free-space within block:
• compacted (one region of free space)
• fragmented (distributed free space)
In practice, probably use a combination:
• normally fragmented (cheap to maintain)
• compact when needed (e.g. record won’t fit)
21
11.6.2 Variable-Length Records
• Compacted free space:
• Note: “pointers” are implemented as offsets within block; allows block to
be loaded anywhere in memory.
Free Space
N
N 4 3 2 1
recN
rec1
rec2
Slot Directory
22
11.6.2 Variable-Length Records
• Fragmented free space:
free3
N
N 4 3 2 1
recN
rec2
rec1
free1
free2
Slot Directory
23
11.6.2 Variable-Length Records
Overflows
Some file structures (e.g. hashing) allocate records to specific blocks.
What happens if specified block is already full?
Need a place to store “excess” records.
Introduce notion of overflow blocks:
• located outside main file (don’t destroy block sequence of main file)
• connected to original block
• may have “chain” of overflow blocks
New blocks are always appended to file.
24
11.6.2 Variable-Length Records
• Overflow blocks in a separate file:
• Note: “pointers” are implemented as
file offsets.
Data Overflow
0
1
2
3
File #2
4
5
File #1
25
11.6.2 Variable-Length Records
• Overflow blocks in a single file:
• Not suitable if accessing blocks via
offset (e.g. hashing).
Data + overflows
0
1
2
Overflow
3
4
File #1 26
11.7 Files
A file consists of several data blocks.
Heap Files: unordered pages (blocks).
Two alternatives to maintain the block information:
• Linked list of pages.
• Directory of pages.
27
11.7.1 Linked List of Pages
• Maintain a heap file as a doubly linked list of pages.
• Disadvantage: all pages will virtually be on the free list of records if
records are of variable length. To insert a record, several pages may be
retrieved and examined.
Data
Page
Data
Page
Data
Page
Header
Page
Data
Page
Data
Page
Data
Page
pages with
free space
full pages
Organized by a Linked List
28
11.7.2 Directory of Pages
Maintain a directory of pages.
• Each directory entry identifies a page (or a sequence of pages) in the heap file.
• Each entry also maintains a bit to indicate if the corresponding page has any free
space.
Data
Page 1
Data
Page 2
Data
Page N
Organized with a Directory 29