CS计算机代考程序代写 Functional Dependencies database flex case study 2021/8/8 DBS - 211

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 1/12

DBS211

Week 9 – Normalization 1 (UNF and 1NF)

Table of Contents

Welcome

Database Design Fundamentals

DBDL

Introduction to Normalization

Un-Normalized Form

First Normal Form

Case Study

Summary

Reading Materials

Textbook, Chapter 12

Wikipedia – Database Normalization

Wikipedia – UNF

Wikipedia – 1NF

Welcome to Week 9 – Normalization

This week we dive into the process of designing a database from scratch when you have very little

to work from. You might only have a few paper forms given to you to use as a base and you are

expected to design a complete relational database. While utilizing the DBDL format and

understanding field dependencies with the primary keys, you will be able to come up with a very

satisfactory design for the database schema.

After this week, the student will understand what normalization is, apply DBDL to determine the

un-normalized and first normal forms of the database schema.

http://dbs211.ca/index.html

Chapter 12 Normalization

https://en.wikipedia.org/wiki/Database_normalization
https://en.wikipedia.org/wiki/Unnormalized_form
https://en.wikipedia.org/wiki/First_normal_form

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 2/12

Database Design Fundamentals

Throughout the course to this point, we have focused on the physical properties of the database

schema. We looked at several types of Keys, constraints, and the various forms that tables can

take and we learned about calculated or derived fields and how fields relate to one another. Now it

is time to take this a little further and learn about several different kinds of dependencies between

each field and the key field for the table in which it resides. This understanding will help you

ensure that fields are located in the right tables and that the tables required are all created.

Where to start…..

We will start by going back and reviewing several topics that we covered before and make sure

you grasp those concepts. All the below listed concepts are required for the database

normalization process. So go back and quickly review the following chapters:

Week 1 – Data Repetition and Redundancy

Week 1 – Data Anomalies

Week 2 – Various Types of Keys

Week 2 – Various Types of Tables

Now that we have a refresher on those topics, we can move forward….

DBDL

DBDL or DataBase Definition Language is a standardized way of describing entities in a relational

database in a written format. It is very similar to the entity format from ERDs, except written in line

as a paragraph of text would be written.

ENTITY_NAME [ KEY FIELD, field2, FK field3, (field4, field5, field6), … fieldn ]

The above is the basic syntax for a basic DBDL entity. A few points to note:

The entity name is written in ALL-CAPS outside the square brackets []

The Primary, or better referred to as Candidate, Key is underlined. If underlining is not

possible, we write either PK or CK in front of it indicating Primary or Candidate Key.

Fields that are related to fields in other entities are indicated using FK, for Foreign Key.

The Round Brackets are used to distinguish groups of related data that may result in

repeating groups (described in more detail below)

Once you start to obtain more entities, it may look something like this:

http://dbs211.ca/Weeks/Week01/index.html#File_Systems
http://dbs211.ca/Weeks/Week01/index.html#Data_Anomalies
http://dbs211.ca/Weeks/Week02/index.html#Keys
http://dbs211.ca/Weeks/Week02/index.html#Tables

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 3/12

PLAYERS [playerID, fname, lname, dob, preferred_position]
PLAYER_TEAM [FK playerID, FK teamID, shirt_number, primary_position]
TEAMS [teamID, team_name, home_location, manager_name]

Note in the above example, the PLAYER_TEAM entity has a Composite Candidate Primary Key and

both fields of that key are also Foreign Keys (child records) to the other tables respectively.

PLAYER_TEAM is an example of a junction or bridge entity.

Introduction to Normalization

Now we are ready to work our conceptual design into a format that can be translated into tables

for actual database implementation. Historically, this once had been a process of intuitive logic.

For many students this intuitive approach is often used. Without a LOT of experience the intuitive

process often leads to failure. As in the past the intuitive method often led to data redundancy,

and designs where relationships could not exist.

Normalization entails organizing the columns (attributes) and tables (entities) of a database to

ensure that their dependencies are properly enforced by database integrity constraints through

relationships. It is accomplished by applying some formal rules either by a process of synthesis

(creating a new database design) or decomposition (improving an existing database design).

There are 10 different forms of normalization today. However, most databases are considered well

designed if they comply with the first 4 forms. The additional forms are typically only needed in

advanced statistical analysis, machine learning and large data analysis processes. This course will

only concentrate on the first four forms.

UNF – Unnormalized Form – is the basic form simple utilized for obtaining and grouping

attributes that are required. Repeating groups of data are typically indicated using

parenthesis().

1NF – First Normal Form – contains all the features of UNF plus the elimination of multi-value

attributes or non-atomic fields.

2NF – Second Normal Form – builds on 1NF by eliminating partial functional dependencies

3NF – Third Normal Form – builds on 2NF by eliminating remaining transitive function

dependencies

The following table shows the first 4 normal forms and displays the basic rules for each. Each of

these are described further below and in next weeks content.

Property UNF 1NF 2NF 3NF

Each Entity has a Candidate Primary Key ✓ ✓ ✓ ✓

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 4/12

Property UNF 1NF 2NF 3NF

Repeating Groups Indicated by Parenthesis ✓ ✓ ✓ ✓

Repeating Groups Eliminated ✓ ✓ ✓

Atomic Fields Only ✓ ✓ ✓

No Partial Dependencies ✓ ✓

No Transitive Dependencies ✓

Let us now look at each normalized form in turn.

Un-Normalized Form (UNF)

The un-normalized form is often derived from having a single form or a source of requirements in

order to determine what attributes are included. But typically UNF will consist of a a simple list of

attributes and then indicate which groups of attributes may have repeating data. Each UNF

solution will clearly indicate the primary Entity of the subject matter and for each entity, indicate a

candidate key that could be used to determine uniqueness. This is probably best learned through

example.

Example:

The shown receipt is a simple typical receipt you would obtain from a small retail store. Let us use

this receipt to learn about UNF at this point. By looking at this receipt in detail we can determine a

significant amount of the attributes required and can use our knowledge to add additional ones

needed.

The first step is to determine the Primary Entity used. There is no one particular entity that is the

correct one, but some will make the process easier than others. Let us review potential entities

from this receipt and choose a single one to start with.

Bake Shop – since this appears to be an independent shop and not a franchise, there likely

will only be one location

Server – although the server’s name is Ken, he is not typically the focus of a receipt

Receipt – the receipt itself is an entity and maybe a good candidate to start with

Products – this person purchased some donuts and a Box O Joe (whatever that is??). The

products could also be a starting place, although might make it difficult to understand

starting with this one.

Customer – sometimes the focus can be placed on the customer, in membership or

commercial circumstances, but in this case, customers are likely anonymous a significant

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 5/12

amount of the time.

Let us choose the Receipt itself to be the main item here.

The choosing of the main entity can be confusing, but

realistically, it simply does not matter because as you go

through the process of normalization, the other entities will

appear anyways. This will become clear through example.

So since we chose the receipt for the main entity we can

write the starting DBDL:

RECEIPT [receiptID, company_name, purchase_date]

Let’s look at some other fields we can add from the

information on the receipt.

it is hard to tell, but the 10156 is the receipt ID, this is a

good candidate primary key.

RECEIPT [receiptID, company_name, purchase_date]

Ken was the server, so assumed there are more than

one server and therefore there will be a list of servers at

some point. There is also likely a PK for servers as we know

name is not a good candidate key, let us use employee_ID.

It looks like the 18 on the receipt may be Ken’s employee

number.

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name]

the receipt shows the products being sold, the quantity and the unit price.

the Amount, or sub-total, the taxes, a tip if provided and the total charged. It must be made

very clear that calculated fields are NEVER included in teh DBDL as they will never be in the final

database design. They are derived fields which are not included in a relational database design.

However, we need one piece of information in order to calculate all of these, and that is the rate of

tax that is charged. We will discuss a little later, but the tax rate is time sensitive and therefore we

need to record the tax rate on the date of the receipt, as it may change in the future.

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name, product_na

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 6/12

The payment method (Visa), card #, name on card and the method used to enter card (S-

swipe, T-Tap, I-Insert) along with the approval code returned from the payment processing

company.

and the receipt shows a space for a signature if required.

Repeating Groups

Repeating groups are a group of related fields will have values repeated many times within the

rows of the data. This is best described by viewing a sample table using the schema we have so

far.

in the above table it is seen that several rows of data repeat themselves as we allow for multiple

products to be included on each receipt. Storing repeated data goes against the fundamental

principles of database design. Therefore we must eliminate these repeated groups of data.

In UNF we must indicate the presence of these repeated groups of data by using parenthesis in

one of 2 ways. Use parenthesis around the groups of repeating data, or around the data that does

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name, product_na
unit_price, taxRate, payment_method, cCardNum, cCardName, ApprovalCode]

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name, product_na
unit_price, taxRate, payment_method, cCardNum, cCardName, ApprovalCode, i

ReceiptID company_name purchase_date employee_id server_name product_n

11056
Ma & Pa’s Bake

Shop
10/9/2018 18 Ken Dozen Do

11056
Ma & Pa’s Bake

Shop
10/9/2018 18 Ken Box O Joe

11057
Ma & Pa’s Bake

Shop
10/9/2018 18 Ken Choc. Cak

11058
Ma & Pa’s Bake

Shop
10/9/2018 14 Beth Dozen Do

11058
Ma & Pa’s Bake

Shop
10/9/2018 14 Beth 6 Bagels

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 7/12

not repeat. Either method will work, but it is easier to chose the groups that are smaller to start.

Completed UNF Solution

The DBDL shown, uses parenthesis to indicate the fields that do not repeat between rows. All

other fields repeat for each Candidate Primary Key: The ReceiptID. There is no need to separate

these fields at this point of UNF, but simply indicate them at this point.

First Normal Form (1NF)

The First Normal Form, or 1NF, is defined as being a UNF solution in addition to the elimination of

Non-Atomic Fields. An atomic field is a field that contains only one value per row. To demonstrate

this, let us consider the above table, but follow the principle that the candidate primary key must

be unique. Therefore, we will rewrite the table as:

Seeing the table rewritten like this, it is clear that the product name, quantity and unit prices have

more than one value per primary key. Therefore, these fields must be separated into new entities.

This new entity should have a name that combines the source and the new concept.

RECEIPT_PRODUCT would be an appropriate name.

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name, ( taxRate,
payment_method, cCardNum, cCardName, ApprovalCode, isSigned]

ReceiptID company_name purchase_date employee_id server_name product_n

11056
Ma & Pa’s Bake

Shop
10/9/2018 18 Ken

Dozen Do

Box O Joe

11057
Ma & Pa’s Bake

Shop
10/9/2018 18 Ken Choc. Cak

11058
Ma & Pa’s Bake

Shop
10/9/2018 14 Beth

Dozen Do

6 Bagels

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name, ta
cCardNum, cCardName, ApprovalCode, isSigned]
RECEIPT_PRODUCT [product_name, quantity, unit_price]

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 8/12

However, with two separate entities now, it is obvious that they came from the same UNF solution

and therefore must be related. This relationship MUST be maintained. Therefore we need to

include a field in the new entity that links this back to the original entity. The appropriate field

would be the primary key.

Lastly at this point, we need to ensure that the new solution is both UNF and 1NF. The UNF

property of each entity having an appropriate candidate primary key is required. In our last

iteration, the product_name is not a great candidate key. We therefore should add a new key that

we know will be needed in the future. To indicate unique products, we will need to add a

productID field. Therefore, our final 1NF solution will become:

Completed 1NF Solution

Time Sensitive Attributes

We will often run into some attributes that can change over time and can affect calculations if

they change. The list_price of products is one of these type of attributes, as is tax rate and other

surcharge amounts.

We can all image a situation where we have the above receipts and we would need to also have a

PRODUCTS entity to store the list of products available and this entity would include the current

list price for each product.

PRODUCTS [productSKU, product_name, list_price, on_hand]

In this case it is important to note that the list_price would be the current price if someone bought

that product today. But what if someone bought the product 7 days ago, and the product is now

on sale for less money.The list_price in the products table would change. But if this list_price was

used to calculate the resiept totals for the purchase 7 days ago, the totals would be different than

they were 7 days ago, because the producers list price changed. Therefore, it is critical that an

additional field be created to store the price it was on the date of the receipt, and this value is

used for calculations.

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name, ta
cCardNum, cCardName, ApprovalCode, isSigned]
RECEIPT_PRODUCT [product_name, FK receiptID, quantity, unit_price]

RECEIPT [receiptID, company_name, purchase_date, employee_id, server_name, ta
cCardNum, cCardName, ApprovalCode, isSigned]
RECEIPT_PRODUCT [productID, FK receiptID, product_name, quantity, unit_price]

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 9/12

in this example, the product price was split into 2 fields. The unit_price in the

RECEIPT_PRODUCTS entity is the price the product was on the date of the sale and the list_price

is the price of the product if it was purchased right now.

Time sensitive attributes often need to be split into 2 different fields to maintain historic

values.

Case Study

Let us do one more example that can demonstrate the process:

Soccer League

A local recreational soccer league is starting up and needs a database to maintain the league.

This database will include player data, team data in addition the league schedule, statistics and

standings.

The following form will be used to determine the database design through normalization.

UNF

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 10/12

Like before we will start the UNF process by determining the main entity of the form. In this case

we will use the team itself.

TEAMS []

Next we determine an appropriate candidate key for the entity of choice. The team # would be

unique and a good candidate key.

TEAMS [Team#, teamName]

Now we go through the process of adding all the fields we can obtain from the form, ensuring not

to include calculated or derived fields.

Coach and Captain Info:

The team specific information:

The Player Information:

The last step is to indicate the repeated groups of data by either choosing the items that are

repeating, or the items that differ and surrounding them with parenthesis. In this case, all the

fields are repeating data for different players, therefore let us surround the player information in

brackets.

Final UNF Solution

1NF

TEAMS [Team#, teamName, CoachName, CoachPhone, CoachEmail, CaptainName, CaptainPhone

TEAMS [Team#, teamName, TeamColor, HomeLocation, TeamURL, CoachName, CoachPhone, Coa
CaptainName, CaptainPhone, CaptainEmail]

TEAMS [Team#, teamName, TeamColor, HomeLocation, TeamURL, CoachName, CoachPhone, Coa
CaptainName, CaptainPhone, CaptainEmail, pNum, PfName, PlName, pDOB, pPhone,

TEAMS [Team#, teamName, TeamColor, HomeLocation, TeamURL, CoachName, CoachPhone, Coa
CaptainName, CaptainPhone, CaptainEmail, ( pNum, PfName, PlName, pDOB, pPhone, pP

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 11/12

The first step in the 1NF is to eliminate the repeating groups. This is done by creating a new entity,

moving the fields there and creating a connecting Foreign Key.

First let’s move the fields in parenthesis into a new entity called TEAM_PLAYERS.

In order to link the entities together, we need to bring a FK forward. Team# will be the field of

choice.

Next we need to determine a candidate primary key. This will differ based on a business rule. Can

a player play on more than one team. If yes, then players can be repeated as can team numbers. If

players can not play on more than one team, then the team id can be stored in the player table as

a foreign key. In this case scenario, we do not know the answer to this question, so we will choose

the more flexible answer to remove restrictions in the future. This is to assume they can play on

more than one team.

Therefore, the candidate primary key will be a composite key made up of two fields in which the

combination of values can not be repeated. This will be the combination of team and player.

We need to ensure there are no repeating groups of data remaining in either entity. By careful

observation we see that phone number creates a non-atomic field that can be considered a

repeating group, when viewing it from a different perspective. We will choose to look at it from a

non-atomic field perspective at this point.

Next we need to ensure that all fields are atomic: i.e. no one field has multiple values per primary

key. Looking at the sample, it is possible for players to have more than one phone number.

Therefore, the player phone number field must be separated.

TEAMS [Team#, teamName, TeamColor, HomeLocation, TeamURL, CoachName, CoachP
CaptainName, CaptainPhone, CaptainEmail]
TEAM_PLAYERS [pNum, PfName, PlName, pDOB, pPhone, pPhoneType, pEmail, pPosition, p

TEAMS [Team#, teamName, TeamColor, HomeLocation, TeamURL, CoachName, CoachP
CaptainName, CaptainPhone, CaptainEmail]
TEAM_PLAYERS [FK team#, pNum, PfName, PlName, pDOB, pPhone, pPhoneType, pEmail, pP

TEAMS [Team#, teamName, TeamColor, HomeLocation, TeamURL, CoachName, CoachP
CaptainName, CaptainPhone, CaptainEmail]
TEAM_PLAYERS [FK team#, pNum, PfName, PlName, pDOB, pEmail, pPosition, pNotes]

2021/8/8 DBS – 211

dbs211.ca/Weeks/Week10/index.html 12/12

We seem to have an issue with pNum as the relationship between TEAM_PLAYERS and

PLAYER_PHONES is through the pNum fields. But it is not an independent PK in either case,

therefore, we need to create an entity that has that as a single field PK.

The tricky part of the above, is knowing which fields go in which entity. For now we will take our

best guess. I know that sounds vague, but generally move fields that relate directly to the entity

(i.e. player name relates to the PLAYER, not the TEAM_PLAYERS and the player position depends

on both the player and the team. Ultimately, if you don’t get it quite right, that is okay at this point.

When solving 2NF and 3NF, these mistakes here will work themselves out anyways. The key thing

to ensure correct at this point is:

no repeating groups

all entities have an indicated PK or CK

all fields are atomic

that all tables are related through PK-FK relationships

Final 1NF Solution

PLAYER_PHONES [pNum, pPhone, pPhoneType]

TEAMS [Team#, teamName, TeamColor, HomeLocation, TeamURL, CoachName, CoachP
CaptainName, CaptainPhone, CaptainEmail]
TEAM_PLAYERS [FK team#, pNum, pPosition, pNotes]
PLAYERS [pNum, PfName, PlName, pDOB, pEmail]
PLAYER_PHONES [FK pNum, pPhone, pPhoneType]

Related Posts