CS计算机代考程序代写 File Organizations and Indexes

File Organizations and Indexes

1

COST MODEL
D: average time to read or write a disk page.
C : average time to process a record.
H : the time required to apply a hash function
to a record.

3 File Organizations:
Heap Files.
Sorted Files.
Hashed Files.

2

Operations to be investigated
Scan: fetch all records in a file.

Search with equality selection. (SWES) (“Find the students record
with sid = 23”)

Search with Range Selection. (SWRS)
(“Find all students with name alphabetically after ‘Smith’”)

Insert: Insert a given record into the file.

Delete: Delete a record with given rid.

Below, we examine the costs of these operations with respect to
the 3 different file organizations.

3

Heap Files
Scan:
B(D + RC ) where
• B is the number of pages, and
• R is the average number of records in a

page (block).

SWES:
• 0.5B(D + RC ) on average if the selection is

specified on a key.
• Otherwise B(D + RC ).

4

Heap Files
SWRS: B(D + RC ).
Insert: 2D + C . (Always insert to the end of the file)

Delete:
• Only one record is involved.

q The average cost is 0.5B(D + RC ) + D if rid is not
given;

q otherwise (D + C ) + D.
• Several records are involved. Expensive.

5

Sorted Files
Sorted on a search key – a combination of one or more
fields.

If the following query is made against the search key,
then:
1. Scan: B(D + RC ).
2. SWES:

• O(D log2 B + C log2 R) if single record.
• O(D log2 B + C log2 R + #matches).

3. SWRS: O (D log2 B + C log2 R + #matches).
4. Insert: expensive.

• Search cost plus 2 ∗ (0.5B(D + RC )).
5. Delete: expensive.

• Search cost plus 2 ∗ (0.5B(D + RC )).
6

Hashed Files
§ The pages in a file are grouped into buckets.
§ The buckets are defined by a hash function.
§ Pages are kept at about 80% occupancy.

Assume the data manipulation is based on the hash
key.
§ Scan: 1.25B(D + RC ).
§ SWES: H + D + 0.5RC if each hash bucket contains

only one page.
§ SWRS: 1.25B(D + RC ). (No help from the hash

structure)
§ Insert: Search cost plus C + D if one block involved.
§ Delete: Search cost plus C + D if one block involved.

7

Summary

File
Type Scan

Equality
Search

Range
Search

Insert Delete

8

Heap BD 0.5 BD BD Search + D
Search
+ D

Sorted B D D log B
D log B

+
# matches

Search
+ BD

Search
+ BD

Hashed
1.25 BD D 1.25 BD

2 D
Search
+ BD

A Comparison of I/O Costs

Indexes

Basic idea behind index is as for books.

9

INDEXES
aardvark 25,36 lion . . . . . . . 18
bat . . . . . . . 12 llama 17,21,22
cat . . . . 1,5,12 sloth . . . . . . 18
dog . . . . . . . . 3 tiger . . . . . . 18
elephant . . 17 wombat . . . 27
emu . . . . . . 28 zebra . . . . . 19

§ A table of key values, where each entry gives places where key
is used.

§ Aim: efficient access to records via key values.

Indexing Structure

10

k1
k2
k3
k4
k5
k6
k7
k8

Index

Data File


Indexing Structure
Index is collection of data entries k∗.

Each data entry k∗ contains enough information
to retrieve (one or more) records with search
key value k.

Indexing:
§ How are data entries organized in order to

support efficient retrieval of data entries with a
given search key value?

§ Exactly what is stored as a data entry?

11

Alternatives for Data Entries in an Index

§ A data entry k∗ is an actual data record (with
search key value k).

§ A data entry is (k, rid) pair (rid is the record id of
a data record with search key value k).

§ A data entry is a (k, rid − list) pair (rid − list is the
list of record ids of data records with search key
value k).

Example: (Xuemin Lin, page 12), (Xuemin Lin, page 100)

VS
(Xuemin Lin, page 12, page 100)

12

Clustered Index
§ Clustered: a file is organized of data records is the same as

or close to the ordering of data entries in some index.
§ Typically, the search key of file is the same as the search key

of index.

13

Index

Data File

Index Entries

Data Entries

Unclustered Index

14

……
Index

Data File

Index Entries

Data Entries

§ Clustered indexes are relatively expensive to maintain.
§ A data file can be clustered on at most one search key.

Dense VS Sparse Indexes

§ Dense: it contains (at least) one data entry
for every search key value.

§ Sparse: otherwise.

Q: Can we build a sparse index that is not
clustered?

15

16

Ashby

Cass

Smith

22

25

30

33

Ashby, 25, 6003

Basu, 33, 4003

Bristow, 30, 2007

Cass, 50, 5004

Daniels, 22, 6003

Dense, 44, 6003

Smith, 44, 3000

Tracy, 44, 5004

40

44

44

50

Sparse Index VS Dense Index

Primary and Secondary Indexes

§ Primary: Indexing fields include primary key.
§ Secondary: otherwise.

There may be at most one primary index for a
file.

Composite search keys: search key contains
several fields.

17