INFS5710 IT Infra. for BA Normalisation
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
Normalisation
Copyright By PowCoder代写 加微信 powcoder
Functional Dependencies Normal Forms
BCNF 4NF
De-normalisation
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
This week, we will be looking at more on Normalisation of Database Tables.
Normalisation and Normalisation are both fine – one is English and the other is American English – it is ‘s’ or ‘z’
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
This week, we will be looking at more on Normalisation of Database Tables.
Normalisation and Normalisation are both fine – one is English and the other is American English – it is ‘s’ or ‘z’
INFS5710 Week 1
Database Systems Infrastructure
Entity Relationship Model (ERM)
Normalisation
Prepared by , Feb. 2021
External Data (e.g. Excel)
ETL (Data Cleansing)
Star Schema (De-Normalised)
Data Warehouse
External Data (e.g. Excel)
Data (DW to BD, or vice versa or both)
Unstructured Data (Social Media)
Data Streaming
Structured Data (Internet of Things (IOT))
Data Streaming
Flat Files
Machine Learning
Relational Database
(“Not Normalised”)
Spark and NoSQL (and other tools)
Hadoop Distributed File System (HDFS) and MapReduce
Reporting (Business Intelligence and Visualisation) and Business Analysis (End Users)
Note: In-Memory Database (e.g. SAP Hana) is an alternative data model not shown here
©2017 Cengage Learning®. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part, except for use as permitted in a license distributed with a certain product or service or otherwise on a password-protected website or school-approved learning management system for classroom use.
We are looking at Normalisation. Normalisation and ERD go together when designing a database. We have look at conceptual data modelling, i.e. ERD.
Remember, Entity-Relationship (ER) modelling is Top-down approach. Begins by looking for the data groups in the system.
On the other hand, Normalisation is a Bottom-up approach. Begins by looking at the smallest individual items of data recorded by the system.
When I was in my doing my undergraduate, I asked myself who will use normalisation. Three weeks into my first job, my boss asked me to evaluate all the databases the company had at that point in time to see if we need to do anything about normalisation.
So normalisation is an important part of database design.
The Needs and Outcomes of Normalisation
You need the process of normalisation is when your design a new database structure
Analyzes the relationship among the attributes within each entity
Determines if the structure can be improved
Improves the existing data structure and creates an
appropriate database design
The outcome of normalisation will result in a well-
structured relation. A well-structured relation is:
a relation that contains minimal data redundancy
allows users to insert, delete, and update rows without causing data inconsistencies and anomalies, i.e. reduce data anomalies.
The key point of normalisation is to help building a database structure. So you will go from one normal form to the next normal form based on the rules.
The need for normalisation include identifying business rules, identifying and defining business and data constraints, defining functional dependencies, identifying entities and relationships and eliminating multivalued attributes
You need to Improve the existing data structure and create an appropriate database design.
The outcome is what you want from normalisation as stated on slide. You can insert, delete and update rows without any issue.
We will now cover in more depth.
Normalisation (1)
Normalisation is a process for evaluating and correcting table structures to minimize data redundancies, thereby reducing the likelihood of data anomalies.
• Normalisation is …
• a process for converting a relation to a standard (normal) form.
• a process that is accomplished in stages.
• a technique that is used to define “goodness” (or “badness”) of a relation.
• to minimise or eliminate redundancy (duplication of data).
• to prevent data inconsistencies from update, deletion, and insertion
anomalies.
• to decompose a relation/table into smaller components.
• to recapture the precise content of the original relation/table.
• to build data structures that have some desirable (“good”) properties.
• Based on paper: Codd (1971).
The key points of normalisation are shown.
The keys point are to reduce or minimised data redundancies, and prevent data inconsistencies. We examine the table structures to see if we could minimise or further minimise data redundancies.
We have to ensure the data redundancy is eliminated or minimised (we will talk why there might be data redundancy on the next slide).
Normalisation (2)
• Redundancy
• Redundancy occurs when data about a one entity is recorded more than
once in a database.
• Database designers aim to reduce redundancy (i.e., database should not
store same data several times) to save space and prevent problems.
• Evaluating and correcting table structures to minimise data redundancies
• Anomalies
• Insertion Anomaly – adding new rows forces user to create duplicate data
• Deletion Anomaly – deleting rows may cause a loss of data that would be needed for other future rows
• Modification (Update) Anomaly – changing data in a row forces changes to other rows because of duplication
Data redundancy can be generated across different tables if they are not controlled, i.e. normalised.
If data redundancy exists, then there will be anomalies.
For example, if tables are not normalised, then data redundancies will happen, and you will
have issues with anonmalies.
Let’s say, your have a private email attribute/column in your student table. You also have your private email address in course_enrol table for all the courses you have enrolled. Thus, you have data redundancy because you have entered private email address column in two tables, and you have to enter the private email address every time you enrol a new course.
Thus, an anomaly refers to the situation when there is a change of an employee information, there are many columns/rows about this employee that must be manually changed/updated.
There are three types of anomalies as stated on the slide.
If you change your private email address, then you have to change the student table and all the course records in the course_enrol table. This is an example of modification anomaly.
Normalisation (3)
A Normal Form…
• …is a certain state of a relation.
• …can be determined by applying rules regarding dependencies. • …uses a concept called functional dependency…
• Normal forms
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)
• Boyce-Code normal form (BCNF) • [Fourth normal form (4NF)]
normalisation
De-normalisation
Normalisation is a structural point of view of normal forms.
A Normal Form…
a) …is a certain state of a relation.
b) …can be determined by applying rules regarding dependencies. c) …uses a concept called functional dependency.
The last two points, (b) and (c), are the foundation of normalisation, which will be covered later.
A normal form is up to about sixth normal form and there are other normal forms such as BCNF. In this course, we only interested in the first 3 normal forms and Boyce-Codd normal form, and we also touch on fourth normal form. Higher normal forms are better than lower normal forms.
Data and business rules determine the normal form we will adopt.
To normalise the data, we go from 1NF to 3NF etc., that is to reduce data redundancies. Most of the normalised tables are in 3NF and BCNF.
Denormalisation, on the hand, we go the other way round from bottom to top. The data becomes more redundant.
Why? You might ask. The reason is when the data is denormalized, you can access the data much faster. Therefore, The first half of this course, we teach you how to reduce data redundancies, when it comes to data warehouse and big data, we will then teach about demormalisation to become greater data redundancy. There are good reasons why we want to take this action! Denormalisation helps to improve greater performance with greater data redundancy!
Sometimes, we might just do that for a few tables, purely to improve run-time performance.
Normal Forms
Table 6.2: Normal Forms
Normal Form
Characteristic
First normal form (1NF)
Table format, no repeating groups, and PK identified
Second normal form (2NF)
1NF and no partial dependencies
Third normal form (3NF)
2NF and no transitive dependencies
Boyce-Codd normal form (BCNF)
Every determinant is a candidate key (special case of 3NF)
Fourth normal form (4NF)
3NF and no independent multivalued dependencies
These are the rules of for Normal forms:
We will go through some of these terms such as partial dependencies and transitive dependencies on the next slide
1. In the First normal form (1NF), you will try to ensure the data is cleaned and table is formatted, no repeating groups, and PK identified
2. In the Second normal form (2NF), it has to be in 1NF and no partial dependencies
3. In the Third normal form (3NF), it has to be in 2NF and no transitive dependencies
4. In Boyce-Codd normal form (BCNF), every determinant is a candidate key, which is a special case of 3NF
5. In Fourth normal form (4NF), it has to be in 3NF and no independent multivalued dependencies
Functional Dependency
Functional Dependency (FD)
Functional Dependencies …
…are relationships between attributes in a relation. …are the semantics of the attributes in a relation.
…can be inferred in a systematic way by applying a set of inference rules (next slides).
As stated on slide …
We will now go through FD…
Table 6.3: Functional Dependence Concepts
Functional dependence
Definition
The attribute B is fully functionally dependent on the attribute A if each value of A determines one and only one value of B.
Example: PROJ_NUM S PROJ_NAME (read as PROJ_NUM functionally determines PROJ_NAME)
In this case, the attribute PROJ_NUM is known as the determinant attribute, and the attribute PROJ_NAME is known as the dependent attribute.
Functional dependence (generalized definition)
Attribute A determines attribute B (that is, B is functionally dependent on A) if all (generalized definition) of the rows in the table that agree in value for attribute A also agree in value for attribute B.
Fully functional dependence (composite key)
If attribute B is functionally dependent on a composite key A but not on any subset of that composite key, the attribute B is fully functionally dependent on A.
functionally determines AB
determinant
Now, we look at the term functional dependence (FD) As stated in the slide…
You can say,
A functionally determines B
B is functionally dependent on A
If AB, i.e. A determines B:
• if A then B; if the premise A holds, then the conclusion B holds;
• B can be inferred from A; A implies B.
In a relation R, an attribute A (or set of attributes) determines an attribute B (or set of attributes) if the values of A uniquely identify the values of B in all cases. In other words, B is “functionally dependent” on A (or A functionally determines B).
For example,
(1) Postcode State
i.e. “2052”“NSW”, but not “2052””VIC”
(1) PROJ_NUM -> PROJ_NAME
The attribute PROJ_NUM (e.g. 123) is known as the determinant attribute, and the attribute PROJ_NAME (“Project Terminator”) is known as the dependent attribute.
Functional Dependency & Normalisation
• Two types of functional dependencies:
• A partial dependency exists when there is a functional dependence in which
the determinant is only part of the primary key.
• For example, if (A, B) -> (C, D), B -> C, and (A, B) is the primary key, then the functional dependence B -> C is a partial dependency because only part of the primary key (B) is needed to determine the value of C. Partial dependencies tend to be straight-forward and easy to identify.
• A transitive dependency exists when there are functional dependencies such that X -> Y, Y -> Z, and X is the primary key. In that case, the dependency X -> Z is a transitive dependency because X determines the value of Z via Y.
• Unlike partial dependencies, transitive dependencies are more difficult to identify among a set of data.
• Fortunately, there is an effective way to identify transitive dependencies: they occur only when a functional dependence exists among nonprime attributes.
Partial dependency is a functional dependence in which the determinant is only part of the primary key.
For example, (A, B C, D) is Functional dependence whereas B C is Partial dependency because only part of the primary key (B) is needed to determine the value of C.
Transitive dependency is when an attribute, e.g. Z, functionally depends on another non- key attribute, e.g. Y., i.e. nonkey to nonkey)
For instance, X Y, Y Z, then X Z is transitive via a nonprime Y, so Y Z (i.e. nonkey to nonkey) is a transitive dependency (to be removed to get to 3NF)
Transitivity andTransitive Dependency
if X Y and Y Z then X Z
Example: if SuburbPostcode and PostcodeState then SuburbState
2052 Kensington and Kensington NSW, then 2052NSW
2052 Kensington and Kensington NSW, then 2052NSW
Normalisation
Normalisation Process
Objective is to ensure that each table conforms to the concept of well-formed relations
Eachtablerepresentsasinglesubject
No data item will be unnecessarily stored in more than one table
All nonprime attributes in a table are dependent on the primary key Eachtableisvoidofinsertion,update,anddeletionanomalies
Ensures that all tables are in at least 3NF (rule of thumb)
Works one relation at a time
Starts by:
Identifyingthedependenciesofarelation(table)
Progressivelybreakingtherelationintonewsetofrelations/tables
As stated in the slide…
In relational databases, tables are usually in 3NF or Boyce- F, and some are in 2NF (for performance).
In banks, it could be 4NF or 5NF, because of time of access to the database, such as from ATM. You might have a few person working on it to tune it up to a higher NF.
Lossless Decomposition and Normal Forms
• Our aim is to decompose relations/tables so to reduce size/redundancy.
• We use inferences rules for this decomposition process.
• We need to be sure that the decomposed components (tables/relations) have the lossless join property (i.e., decomposed components could be joined back together to the original table/relation).
As stated in the slide…
Construction Company Example
Scenario: database for reports for a construction company.
Building project has: Project number, Name, Employees assigned to the project. Employee has: Employee number, Name, Job classification.
The company charges its clients by billing the hours spent on each project.
The hourly billing rate is dependent on the employee’s position.
The following slide shows a table with contents correspond to the reporting requirements but is not “normalised.”
As stated in the slide…
Conversion to First Normal Form (1NF)
• Aim: creating a valid relation. • A relation / table is in 1NF if:
• The key attributes are defined, i.e. PK attributes are defined and not NULL (i.e., a valid PK).
• All attributes are dependent on the primary key
• There are no repeating groups in the table
• All attributes contain only atomic values (i.e., no multivalued attributes).
• Action to create/check 1NF:
• Step 1: Cleaning & dealing with Repeating Groups and Multi-valued Attributes • Step 2: Identify the Primary Key
• Step 3: Identify All Dependencies
All relational tables satisfy 1NF requirements as stated on the slide
It is OK if some tables contain partial dependencies in 1NF – subject to data redundancies and various anomalies
The actions to create/check 1NF are shown and discussed next.
Steps to Follow for 1NF
• Draw Dependency Diagram
• Step 1: Cleaning & dealing with Repeating Groups and Multi- valued Attributes
• Split multivalued attributes and split repeating groups of data (i.e., transform multivalued attributes in additional columns, or, better, additional rows).
• Add the appropriate entry in at least for the primary keys column(s).
• Step 2: Identify the Primary Key
• All attributes are dependent on PROJ_NUM + EMP_NUM
• Step 3: Identify All Dependencies
• Draw Dependency Diagram
• Partial dependency: attributes are dependent on only a part of a composite PK
• Transitive dependency: non-key (nonprime) attributes are dependent on another non-key attribute
Tips: Before going through steps of Normalisation, you want to remove all the derived attributes, i.e. calculated values.
Remove derived attributes
So, which are the PK and dependencies?
So what are the PKs and their dependencies?
The dependencies are built on data shown above. You have examine the data carefully and see if there are any association between them.
If you are good at spot the difference” between two pictures or numbers or Sudoku, then you should have no problem of picking up differences and similarities. Otherwise, not to worry, you still can do it but probably it will take you a bit longer.
Have a look at project number and project name – they are unique!
Have another look at:
(a) employee number and employee name;
(b) employee_number, job_class and charge/hour. (c) etc.
Examine the Similarities and Differences of the Data
ALL_IN_ONE (PROJ_NUM, PROJ_NAME, EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
If the data come in a spreadsheet, then you can examine the similarities and differences of the Data to build up your PK and dependencies using the functions and features under the Data tab in Excel.
First Normal Form (1NF) Dependency Diagram
(PROJ_NUM, EMP_NUM PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS)
Primary Key
All attributes depend on the primary key
Please note the notation
So, you draw like this, INF Dependency Diagram!
The primary composite key (PK) is Project Number (proj_num) and Employee Number
As stated in the slide…
Remember, Partial dependency is a functional dependence in which the determinant is only part of the primary key, in this case, Project Number is only part of PK Project Number and Employee Number
Same as Employee Number (emp_num), emp_name, job_class, and chg_hour
Remember Transitive dependency is an attribute, e.g. z, functionally depends on another non-key attribute, e.g. Y., i.e. nonkey to nonkey, so Job Class and Change Hour!
Explanation: First Normal Form (1NF) Dependency Diagram
Draw Dependency Diagram
Identify the Primary Key
• All attributes are dependent on PROJ_NUM + EM
程序代写 CS代考 加微信: powcoder QQ: 1823890830 Email: powcoder@163.com