代写代考 www.cardiff.ac.uk/medic/irg-clinicalepidemiology

www.cardiff.ac.uk/medic/irg-clinicalepidemiology

Physical database design

Copyright By PowCoder代写 加微信 powcoder

Information modelling
& database systems

in the last few lectures, we looked into database design issues
in particular, we studied functional dependencies and normalisation
in this lecture we consider a range of issues concerning physical database design, i.e. issues concerning database performance

Physical database design
physical database design involves:
translating a logical design (tables) into a technical specification for storing and retrieving data
creating a design that will provide adequate performance and insure database integrity and security
optimising both processing and space, but processing efficiency is usually more important
it is about making decisions, not just implementation
the decisions made at this stage will have a major impact on data accessibility, response time, etc.

selecting a data type for an attribute so that it
minimises storage space
represents all possible values
improves data integrity
supports all data manipulations

Data coding
an attribute with a limited number of possible values can be translated into a code

data coding is good for query performance and storage
Product# Description Finish
P1 chair C
P3 table C
… … …

Code Value

Data compression
data compression may also be used, e.g.
Roberton 0 – Roberton
Robertson 6 – son
Robertstone 7 – tone
Robinson 3 – inson

the relational model says that order does not matter
from a practical point of view (e.g. finding information fast), order is very important
indexes are to do with “ordering” data

book index is an ordered list of headings and associated pointers to pages where useful material relating to that heading is mentioned
we do not have to read the whole book page by page in order to find specific information
instead, using an index we can jump straight to the relevant page

an index on an attribute (or a set of attributes) of a table is a data structure that makes it efficient to access values of the attribute

a widely used indexing structure
B-tree is a self-balancing tree that keeps data sorted and allows searches, sequential access, insertions and deletions in logarithmic time

B–tree of order m
all leaf nodes are at the same level
each internal node has at least m/2 and at most m children
each leaf node has at least [m/2] and at most (m – 1) keys
each node with j children has got (j – 1) keys
the root is either a leaf or has at least 2 children

Example: B–tree of order 5
all leaf nodes are at the same level
each internal node has at least 3 and at most 5 children
each leaf node has at least 2 and at most 4 keys
each node with j children has got (j – 1) keys
the root is either a leaf or has at least 2 children

B–tree example

https://www.youtube.com/watch?v=YIROy2XFjQk

Search in a B–tree
the search for a record with key x as a recursive function search(node, x)
if node is null, return null // not found!
otherwise, iterate through records (k1, r1) …, (km, rm)
of the node
if x = ki, then return ri // found it!
if x < ki , then search(childi, x) // search left sub-tree if ki < x, then search(childi+1 , x) // search right sub-tree Example: find 40 Inserting into a B–tree search the B–tree to find the node where the item is to be inserted if the node is not full, then insert the item into the node and maintain order if the node is full, then it has to be split Inserting into a B–tree Example: insert 12 Example: insert 12 Example: insert 53 node full! Splitting a node find middle value of old keys in the node and the new key, e.g. 52, 53, 55, 67, 89 keep records with keys smaller than middle (e.g. 52, 53) in the old node put records with keys greater than middle (e.g. 67, 89) in the new node push middle record (e.g. 55) up into parent node Splitting a node move to new node Inserting into a B–tree if pushing the middle record up into parent node makes it full (overflow), then split it the parent node and push its middle record upwards continue doing this until either: some space is found in an ancestor node, or a new root node is created Deleting from a B–tree Deleting from a B–tree search the B–tree to find the node where the item is to be deleted delete the key and replace it by its in–order successor, which will be found in a leaf node remove successor from its leaf node this may cause underflow (fewer than m/2 records in the node) depending on how many records the sibling has, this is fixed either by fusion or by transfer Example: delete 37 Example: delete 37 – replace by 39 Example: delete 37 – fix underflow (fusion) Example: delete 37 – fix underflow (fusion) Example: delete 37 – fix underflow (fusion) Example: delete 16 Example: delete 16 – fix underflow (transfer) Example: delete 16 – fix underflow (transfer) Example: delete 16 – fix underflow (transfer) Example: delete 16 – fix underflow (transfer) Efficiency Efficiency of B–trees space complexity: O(n) time complexity Operation Average Worst search O(log(n)) O(log(n)) insert O(log(n)) O(log(n)) delete O(log(n)) O(log(n)) Is a B–tree index useful? appropriate use of index can help data retrieval: SELECT Student.S#, Module.title FROM Student, Module WHERE Student.S# = Module.S# AND Student.name = 'John'; assumptions: table Student has n tuples table module has m tuples tuples are unordered Q: How many comparisons do we have to perform in order to answer the query? (a) with or (b) without a B–tree index on S# Conclusion an index file is much smaller than a record file, thus more efficient to handle if small enough, could be kept in the main memory multiple indexes may be created for a single table an index file may be ordered, but there is no need for a record file to be ordered if a table is updated frequently, index may not be good if a column contains few distinct values, then index like B–trees may not be useful /docProps/thumbnail.jpeg 程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com