程序代写代做代考 database SQL scheme IOS PowerPoint Presentation

PowerPoint Presentation

File Organizations
R & G – Chapter 9

Architecture of a DBMS
Completed
And We’ll Visit
You are Here

Database Management
System
Database
Query Parsing
& Optimization
Relational Operators
Files and Index Management
Buffer Management
Disk Space Management
SQL Client

Recall: Heap Files
Unordered collection of records
Recall API for higher layers of the DBMS.
Today we’ll ask: “How? At what cost?”
Insert/delete/modify record
Fetch a particular record by record id …
Record id is a pointer encoding pair of (pageID, location on page)
Scan all records
Possibly with some conditions on the records to be retrieved

Recall: Multiple File Organizations
Many alternatives exist, each good in some situations and less so in others.
This is a theme in DB systems work!
Heap Files: Suitable when typical access is a full scan of all records
Sorted Files: Best for retrieval in order, or when a range of records is needed
Clustered Files & Indexes: Group data into blocks to enable fast lookup and efficient modifications.
More on this soon …

Bigger Questions
What is the “best” file organization?
Depends on access patterns …
How? What are common access patterns anyway?
Can we be quantitative about tradeoffs?
If one is better … by how much?

Goals
Big picture overheads for data access
We’ll (overly) simplify performance models to provide insight, not to get perfect performance
Still, a bit of discipline:
Clearly identify assumptions up front
Then estimate cost in a principled way
Foundation for query optimization
Can’t choose the fastest scheme without an estimate of speed!

Cost Model and Analysis

Cost Model for Analysis
B: The number of data blocks in the file
R: Number of records per block
D: (Average) time to read/write disk block
Focus: Average case analysis for uniform random workloads
For now, we will ignore
Sequential vs Random I/O
Pre-fetching
Any in-memory costs
Good enough to show the overall trends

Block 1
Block 2
…
Block B

Record 1
Record 2
…
Record R

More Assumptions
Single record insert and delete
Equality selection – exactly one match
For Heap Files:
Insert always appends to end of file.
For Sorted Files:
Packed: Files compacted after deletions.
Sorted according to search key

Extra Challenge
After understanding these slides …
You should question all these assumptions and rework
Good exercise to study for tests, and generate ideas

Heap Files & Sorted Files
B: The number of data blocks = 5
R: Number of records per block = 2
D: (Average) time to read/write disk block = 5ms

2, 5
1, 6
4, 7
3, 10
8, 9
Heap File

1, 2
3, 4
5, 6
7, 8
9, 10
For illustration, records are just integers
Sorted File

Cost of Operations: Scan?
B: The number of data blocks = 5
R: Number of records per block = 2
D: (Average) time to read/write disk block = 5ms
Heap File Sorted File
Scan all records
Equality Search
Range Search
Insert
Delete

Scan All Records
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Pages touched: ?
Time to read the record: ?

2, 5
1, 6
4, 7
3, 10
8, 9
Heap File

1, 2
3, 4
5, 6
7, 8
9, 10
Sorted File

Cost of Operations: Scan Cost
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk bloc
Heap File Sorted File
Scan all records B*D B*D
Equality Search
Range Search
Insert
Delete

Cost of Operations: Equality Search?
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search
Range Search
Insert
Delete

Find Key 8: Heap File
P(i): Probability that key is on page i is 1/B
T(i): Number of pages touched if key on page i is i
Therefore the expected number of pages touched
Pages touched on average?
Heap File

2, 5
1, 6
4, 7
3, 10
8, 9

Find Key 8: Sorted File
Worst-case: Pages touched in binary search
log2B
Average-case: Pages touched in binary search
log2B?
Sorted File

1, 2
3, 4
5, 6
7, 8
9, 10

Average Case Binary Search
Expected Number of Reads: 1 (1 / B) + 2 ( 2 / B) + 3 (4 / B) + 4 (8 / B)

Average Case Binary Search cont
Expected Number of Reads: 1 (1 / B) + 2 ( 2 / B) + 3 (4 / B) + 4 (8 / B)
1 IO
2 IOs
3 IOs
4 IOs

Cost of Operations: Equation Search Cost
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search
Insert
Delete

Cost of Operations: Range Search?
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search
Insert
Delete

Find Keys Between 7 and 9: Heap File
Always touch all blocks. Why?
Heap File

2, 5
1, 6
4, 7
3, 10
8, 9

Find Keys Between 7 and 9: Comparison
Find beginning of range
Search for start of range
Scan right
Heap File

2, 5
1, 6
4, 7
3, 10
8, 9

Sorted File

1, 2
3, 4
5, 6
7, 8
9, 10

Cost of Operations: Range Search Cost
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search B*D ((log2B)+pages)*D
Insert
Delete

Cost of Operations: Insert?
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search B*D ((log2B)+pages)*D
Insert
Delete

Insert 4.5: Heap File
Stick at end of file
Cost = 2*D
Why 2?
Heap File
2, 5
1, 6
4, 7
3, 10
8, 9
4.5,_

Insert 4.5: Heap VS Sorted File
Read last page, append, write. Cost = 2*D
Find location for record. Cost = log2BD
Heap File
2, 5
1, 6
4, 7
3, 10
8, 9
4.5,
Sorted File

1, 2
3, 4
5, 6
7, 8
9, _

Insert 4.5: Heap Vs Sorted Pt 2
Read last page, append, write. Cost = 2*D
Find location for record. Cost = log2BD
Insert and shift rest of file

1, 2
3, 4
5, 6
7, 8
9, 10
2, 5
1, 6
4, 7
3, 10
8, 9
4.5,
10, _

Heap File
Sorted File
4.5, 5
6, 7
8, 9

Cost of Operations: Insert Cost
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search B*D ((log2B)+pages)*D
Insert 2*D ((log2B)+B)*D
Delete

Cost of Operations: Delete?
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search B*D ((log2B)+pages)*D
Insert 2*D ((log2B)+B)*D
Delete

Delete 4.5: Heap File
Average case to find the record: B/2 reads
Delete record from page
Cost = (B/2 + 1) * D
Why + 1?
Heap File

2, 5
1, 6
4, 7
3, 10
8, 9
4.5,

Delete 4.5: Heap File Vs Sorted File
Average case runtime: (B/2+1) * D
Find location for record: log2B
Delete record in page  Gap
Heap File

2, 5
1, 6
4, 7
3, 10
8, 9
Sorted File

1, 2
3, 4
4.5, 5
6, 7
8, 9
10, _
_

Delete 4.5: Heap File Vs Sorted File Pt 2
Average case runtime: (B/2+1) * D
Find location for record: log2B
Shift the rest by 1 record 2 * (B/2)
Heap File

2, 5
1, 6
4, 7
3, 10
8, 9
Sorted File

1, 2
3, 4
__, 5
6, 7
8, 9
10, _

7, 8
9, 10

Cost of Operations Complete
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search B*D ((log2B)+pages)*D
Insert 2*D ((log2B)+B)*D
Delete (0.5*B+1)*D ((log2B)+B)*D

Cost of Operations Complete Pt 2
B: The number of data blocks
R: Number of records per block
D: Average time to read/write disk block
Can we do better?
Indexes!

Heap File Sorted File
Scan all records B*D B*D
Equality Search 0.5*B*D (log2B)*D
Range Search B*D ((log2B)+pages)*D
Insert 2*D ((log2B)+B)*D
Delete (0.5*B+1)*D ((log2B)+B)*D

i=1

T(i)P(i) =
BX

i=1

i
1

B
=

B(B + 1)

2B
⇡

log2 BX

i=1

i
2

i�1

B
=

log2 BX

i=1

i2i�1 = log
2

B �
B � 1
B

/docProps/thumbnail.jpeg

Related Posts