程序代写代做代考 concurrency database SQL compiler data structure INFO20003 Database Systems

INFO20003 Database Systems

INFO20003 Database Systems 1© University of Melbourne 2018

INFO20003 Database Systems

Lecture 10

Storage and Indexing

Semester 2 2018, Week 5

Dr Renata Borovica-Gajic

INFO20003 Database Systems 3© University of Melbourne 2018

What this subject is all about.

Remember this?

Database System

Store

Access

Process

SQL

Queries

Results

select val from sales

where id = max;

Organisational Description

and Problem Area
• An investment bank wants to have a database to provide it with the ability to store information about its trading operations. The bank essentially works with

customers by providing the capability for trading stocks, shares and other commodities. The bank has three branches in which exist a number of departments.

Departments have a department manager who supervises a number of staff within the department. A set of accounts are used to store information about the

currency of the organisations operations. Accounts can be customer accounts or internal “house” accounts, each of which allow trades to be made upon them.

There are a number of account types. There are many customers and customers may have one or more contacts. Customers have a facility for lending

money to pay for their purchases of stocks and commodities. Staff make deals on the behalf of their customers using a funding source and keeping track of

settlements on the deals being made. There are many types of deal to be made. Settlements are full or partial payments of the deals and are recorded

whenever a payment is made.

• Please note that this section is purely made up and by all means is a very short description of a real investment bank (although many details have been left out

and wide ranging assumptions have been made.

MODELLING

SQL ARCHITECTURE / INTERNAL WORKINGS

INFO20003 Database Systems 4© University of Melbourne 2018

Components of a DBMS

DBMS

Index files

Heap files

Database

Concurrency

control module

File and access

methods mgr.

Buffer pool mgr.

Disk space mgr.

Storage module Concurrency

control module

Transaction

mgr.

Lock mgr.

Crash recovery

module

Log mgr.

Query processing module

Parser/

Compiler
Optimizer Executor

This is one of several possible architectures;

each system has its own slight variations.

TODAY

INFO20003 Database Systems 5© University of Melbourne 2018

Coverage

• File organization (Heap & sorted files)

• Index files & indexes

• Index classification

Readings: Chapter 8, Ramakrishnan & Gehrke, Database Systems

INFO20003 Database Systems 6© University of Melbourne 2018

• FILE: A collection of pages, each containing a collection of

records.

• DBMS must support:

–insert/delete/modify record

–read a particular record (specified using record id)

–scan all records (possibly with some conditions on the

records to be retrieved)

Files (in a DBMS)

Page Page

Record
col1, col2, col3
col1, col2, col3 FILE

INFO20003 Database Systems 7© University of Melbourne 2018

Alternative File Organizations

• Many alternatives exist, each good for some situations, and

not so good in others:

1. Heap files: no particular order among records

–Suitable when typical access is a file scan retrieving all records

2. Sorted Files: pages and records within pages are ordered by some

condition

–Best for retrieval (of a range of records) in some order

3. Index File Organizations:

–Special data structure that has the fastest retrieval in some order

–Will cover shortly..

INFO20003 Database Systems 8© University of Melbourne 2018

1. Heap (Unordered) Files

• Simplest file structure, contains records in no

particular order

• As file grows and shrinks, disk pages are allocated

and de-allocated

–Fastest for inserts compared to other alternatives

12,20 12,10

11,80

13,75

16,20 14,10

14,80

18,75

22,20

21,10

20,80

36,75

…

Used to denote that pages are linked

Sorted files

• Similar structure like heap files (pages and records), but pages

and records are ordered

• Fast for range queries, but hard for maintenance (each insert

potentially reshuffles records)

• Example: A sorted file ordered by age

12,20

12,10

11,80

13,75

16,20

14,10

14,80

18,75

22,20

21,10

20,80

36,75

…

How does a DBMS decide which option is better?

• DBMS model the cost of all operations

• The cost is typically expressed in the number of page accesses (or disk

I/O operations)

–1 page access (on disk) == 1 I/O (used interchangeably)

• Example: If we have a table of 100 records, and each page can store 10

records, what would be the cost of accessing the entire file

• Answer: For 100 records we have 10 pages in total (100/10), thus the cost

to access the entire file is 10 I/O (or 10 pages)

• Heap file (no order) = B;

• Sorted file (exploit order) = log2 B

• Example: Find all records with ages between 20 and 30, for the file that

has B pages. Consider both alternative: having an unsorted and sorted

file. What would be the cheapest cost?

• 20 < age <30, num pages = B Which alternative is better? 12,20 12,10 11,80 13,75 16,20 14,10 14,80 18,75 22,20 21,10 20,80 36,7552,20 41,10 36,80 80,75 12,20 12,10 11,80 13,75 16,20 14,10 14,80 18,75 22,20 21,10 20,80 36,75 52,20 41,10 36,80 80,75 Heap file Sorted file INFO20003 Database Systems 13© University of Melbourne 2018 File Organization & Indexing • File organization (Heap & sorted files) • Index files & indexes • Index classification INFO20003 Database Systems 14© University of Melbourne 2018 Indexes • Sometimes, we want to retrieve records by specifying the values in one or more fields, e.g., –Find all students in the “CIS” department –Find all students with a gpa > 3

• An index is a data structure built on top of data pages used for
efficient search. The index is built over specific fields called
search key fields. E.g. we can build an index on GPA, or
department name.

–The index speeds up selections on the search key fields

–Any subset of the fields of a relation can be the search key for an index
on the relation

–Note: Search key is not the same as key (e.g., doesn’t have to be
unique)

Related Posts