程序代写代做代考 go concurrency database data structure compiler INFO20003 Database Systems

INFO20003 Database Systems
Dr Renata Borovica-Gajic
Lecture 10 Storage and Indexing
Week 5
1
INFO20003 Database Systems © University of Melbourne



SQL
An investment bank wants to have a database to provide it with the ability to store information about its trading operations. The bank essentially works with customers by providing the capability for trading stocks, shares and other commodities. The bank has three branches in which exist a number of departments. Departments have a department manager who supervises a number of staff within the department. A set of accounts are used to store information about the currency of the organisations operations. Accounts can be customer accounts or internal “house” accounts, each of which allow trades to be made upon them. There are a number of account types. There are many customers and customers may have one or more contacts. Customers have a facility for lending money to pay for their purchases of stocks and commodities. Staff make deals on the behalf of their customers using a funding source and keeping track of settlements on the deals being made. There are many types of deal to be made. Settlements are full or partial payments of the deals and are recorded whenever a payment is made.
Please note that this section is purely made up and by all means is a very short description of a real investment bank (although many details have been left out and wide ranging assumptions have been made.
INFO20003 Database Systems
© University of Melbourne
3
What this subject is all about. Remember this?
Organisational Description and Problem Area
SQL Queries
select val from sales where id = max;
MODELLING ARCHITECTURE / INTERNAL WORKINGS
Results
Process Access Store
Database System

Components of a DBMS
DBMS
Parser/ Compiler
Optimizer Executor
Concurrency control module
Transaction mgr.
Lock mgr.
Crash recovery module
Concurrency
control module
Log mgr.
Storage module
File and access methods mgr.
Buffer pool mgr. Disk space mgr.
This is one of several possible architectures; each system has its own slight variations.
Index files Heap files
Database
TODAY
Query processing module
INFO20003 Database Systems © University of Melbourne
4

Coverage
• File organization (Heap & sorted files) • Index files & indexes
• Index classification
Readings: Chapter 8, Ramakrishnan & Gehrke, Database Systems
INFO20003 Database Systems © University of Melbourne
5

Files (in a DBMS)
• FILE: A collection of pages, each containing a collection of records.
FILE
Page Page
Record
col1, col2, col3
col1, col2, col3
• DBMS must support:
–insert/delete/modify record
–read a particular record (specified using record id)
–scan all records (possibly with some conditions on the records to be retrieved)
INFO20003 Database Systems © University of Melbourne
6

Alternative File Organizations
• Many alternatives exist, each good for some situations, and not so good in others:
1. Heap files: no particular order among records
–Suitable when typical access is a file scan retrieving all records
2. Sorted Files: pages and records within pages are ordered by some condition
–Best for retrieval (of a range of records) in some order
3. Index File Organizations:
–Special data structure that has the fastest retrieval in some order –Will cover shortly..
INFO20003 Database Systems © University of Melbourne
7

1. Heap (Unordered) Files
• Simplest file structure, contains records in no particular order
• As file grows and shrinks, disk pages are allocated and de-allocated
–Fastest for inserts compared to other alternatives
Used to denote that pages are linked

11,80
16,20
12,20
21,10
14,80
14,10
12,10
18,75
20,80
13,75
22,20
36,75

INFO20003 Database Systems © University of Melbourne
8

Sorted files
• Similar structure like heap files (pages and records), but pages and records are ordered
• Fast for range queries, but hard for maintenance (each insert potentially reshuffles records)
• Example: A sorted file ordered by age
11,80
12,10
12,20
13,75
14,80
14,10
16,20
18,75
20,80
21,10
22,20
36,75


INFO20003 Database Systems © University of Melbourne
9

Storage hierarchy
• Data is typically stored in pages on Hard Disks (HDD).
• To be able to process and analyze it – data needs to be brought to Memory (RAM).
INFO20003 Database Systems © University of Melbourne
10

How does a DBMS decide which option is better?
• DBMS model the cost of all operations
• The cost is typically expressed in the number of page accesses (or disk
I/O operations – to bring data from disk to memory)
–1 page access (on disk) == 1 I/O (used interchangeably)
• Example: If we have a table of 100 records, and each page can store 10 records, what would be the cost of accessing the entire file
• Answer: For 100 records we have 10 pages in total (100/10), thus the cost to access the entire file is 10 I/O (or 10 pages)
INFO20003 Database Systems © University of Melbourne
11

Which alternative is better?
• Example: Find all records with ages between 20 and 30, for the file that has B pages. Consider both alternative: having an unsorted and sorted file. What would be the cheapest cost?
• 20 3
• An index is a data structure built on top of data pages used for efficient search. The index is built over specific fields called search key fields. E.g. we can build an index on GPA, or department name.
–The index speeds up selections on the search key fields
–Any subset of the fields of a relation can be the search key for an index
on the relation
–Note: Search key is not the same as key (e.g., doesn’t have to be unique)
INFO20003 Database Systems © University of Melbourne
15

Example: Simple Index on GPA
An index contains a collection of data entries, and supports efficient retrieval of data records matching a given search condition
2 2. Index on
3. Efficient to answer:
age=12 and sal = 10 age=12 and sal > 15
Data entries
11,80
12,10
12,20
13,75
name age sal
bob 12 10
cal 11 80
joe 12 20
sue 13 75

10,12
20,12
75,13
80,11

Data records sorted by name
INFO20003 Database Systems © University of Melbourne
26

Hash-based index
• Hash-based index:
–Represents index as a collection of buckets. Hash function
maps the search key to the corresponding bucket.
– h(r.search_key) = bucket in which record r belongs
–Good for equality selections
• Example: Hash-based index on (sal)
Bucket 1
Find Sal = 2007
2007 mod 4 = 3 go to Buck.4
Bucket 3
Bucket 1 …
Bucket 4 Index File (sal)
H=sal(mod 4)
Data File
INFO20003 Database Systems © University of Melbourne
27

Tree-based index
• Tree-based index:
–Underlying data structure is a binary (B+) tree. Nodes contain pointers to lower levels (search left for lower, right for higher). Leaves contain data entries sorted by search key values.
–Good for range selections
–So far we have shown those
• Example: Tree-based index on (age)
Find age > 39
Index File (age)
Data File
INFO20003 Database Systems © University of Melbourne
28

Summary
• Many alternative file organizations exist, each appropriate in some situation
• If selection queries are frequent, sorting the file or building an index is important
• Index is an additional data structure (i.e. file) introduced to quickly find entries with given key values
–Hash-based indexes only good for equality search
–Sorted files and tree-based indexes best for range search; also
good for equality search
–Files rarely kept sorted in practice (because of the cost of maintaining them); B+ tree index is better
INFO20003 Database Systems © University of Melbourne
29

What’s examinable
• Describe alternative file organizations
• What is an index, when do we use them • Index classification
INFO20003 Database Systems © University of Melbourne
30

Next Lecture
• Query processing part 1
‒ Selection and projection (execution, costs)
‒ Let’s demystify how DBMS perform work
INFO20003 Database Systems © Univers-it-y of Melbourne
31