www.cardiff.ac.uk/medic/irg-clinicalepidemiology
Physical database design
Copyright By PowCoder代写 加微信 powcoder
Information modelling
& database systems
in the last few lectures, we looked into database design issues
in particular, we studied functional dependencies and normalisation
in this lecture we consider a range of issues concerning physical database design, i.e. issues concerning database performance
Physical database design
physical database design involves:
translating a logical design (tables) into a technical specification for storing and retrieving data
creating a design that will provide adequate performance and insure database integrity and security
optimising both processing and space, but processing efficiency is usually more important
it is about making decisions, not just implementation
the decisions made at this stage will have a major impact on data accessibility, response time, etc.
selecting a data type for an attribute so that it
minimises storage space
represents all possible values
improves data integrity
supports all data manipulations
Data coding
an attribute with a limited number of possible values can be translated into a code
data coding is good for query performance and storage
Product# Description Finish
P1 chair C
P3 table C
… … …
Code Value
Data compression
data compression may also be used, e.g.
Roberton 0 – Roberton
Robertson 6 – son
Robertstone 7 – tone
Robinson 3 – inson
the relational model says that order does not matter
from a practical point of view (e.g. finding information fast), order is very important
indexes are to do with “ordering” data
book index is an ordered list of headings and associated pointers to pages where useful material relating to that heading is mentioned
we do not have to read the whole book page by page in order to find specific information
instead, using an index we can jump straight to the relevant page
an index on an attribute (or a set of attributes) of a table is a data structure that makes it efficient to access values of the attribute
a widely used indexing structure
B-tree is a self-balancing tree that keeps data sorted and allows searches, sequential access, insertions and deletions in logarithmic time
B–tree of order m
all leaf nodes are at the same level
each internal node has at least m/2 and at most m children
each leaf node has at least [m/2] and at most (m – 1) keys
each node with j children has got (j – 1) keys
the root is either a leaf or has at least 2 children
Example: B–tree of order 5
all leaf nodes are at the same level
each internal node has at least 3 and at most 5 children
each leaf node has at least 2 and at most 4 keys
each node with j children has got (j – 1) keys
the root is either a leaf or has at least 2 children
B–tree example
https://www.youtube.com/watch?v=YIROy2XFjQk
Search in a B–tree
the search for a record with key x as a recursive function search(node, x)
if node is null, return null // not found!
otherwise, iterate through records (k1, r1) …, (km, rm)
of the node
if x = ki, then return ri // found it!
if x < ki , then search(childi, x) // search left sub-tree
if ki < x, then search(childi+1 , x) // search right sub-tree
Example: find 40
Inserting into a B–tree
search the B–tree to find the node where the item is to be inserted
if the node is not full, then insert the item into the node and maintain order
if the node is full, then it has to be split
Inserting into a B–tree
Example: insert 12
Example: insert 12
Example: insert 53
node full!
Splitting a node
find middle value of old keys in the node and the new key, e.g. 52, 53, 55, 67, 89
keep records with keys smaller than middle (e.g. 52, 53) in the old node
put records with keys greater than middle (e.g. 67, 89) in the new node
push middle record (e.g. 55) up into parent node
Splitting a node
move to new node
Inserting into a B–tree
if pushing the middle record up into parent node makes it full (overflow), then split it the parent node and push its middle record upwards
continue doing this until either:
some space is found in an ancestor node, or
a new root node is created
Deleting from a B–tree
Deleting from a B–tree
search the B–tree to find the node where the item is to be deleted
delete the key and replace it by its in–order successor, which will be found in a leaf node
remove successor from its leaf node
this may cause underflow (fewer than m/2 records in the node)
depending on how many records the sibling has, this is fixed either by fusion or by transfer
Example: delete 37
Example: delete 37 – replace by 39
Example: delete 37 – fix underflow (fusion)
Example: delete 37 – fix underflow (fusion)
Example: delete 37 – fix underflow (fusion)
Example: delete 16
Example: delete 16 – fix underflow (transfer)
Example: delete 16 – fix underflow (transfer)
Example: delete 16 – fix underflow (transfer)
Example: delete 16 – fix underflow (transfer)
Efficiency
Efficiency of B–trees
space complexity: O(n)
time complexity
Operation Average Worst
search O(log(n)) O(log(n))
insert O(log(n)) O(log(n))
delete O(log(n)) O(log(n))
Is a B–tree index useful?
appropriate use of index can help data retrieval:
SELECT Student.S#, Module.title
FROM Student, Module
WHERE Student.S# = Module.S#
AND Student.name = 'John';
assumptions:
table Student has n tuples
table module has m tuples
tuples are unordered
Q: How many comparisons do we have to perform in order to answer the query?
(a) with or (b) without a B–tree index on S#
Conclusion
an index file is much smaller than a record file, thus more efficient to handle
if small enough, could be kept in the main memory
multiple indexes may be created for a single table
an index file may be ordered, but there is no need for a record file to be ordered
if a table is updated frequently, index may not be good
if a column contains few distinct values, then index like B–trees may not be useful
/docProps/thumbnail.jpeg
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com