2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 1/8
DBS211
Week 10 – Normalization 2 (2NF and 3NF)
Table of Contents
Welcome
2NF – Partial Dependencies
3NF – Transitive Dependencies
Exceptions To Normalization
Reading Materials
Textbook, Chapter 12
Wikipedia – Database Normalization
Wikipedia – 2NF
Wikipedia – 3NF
Welcome to Week 10
Continuing where we left off last week, we continue to path through normalization and look in
much more depth, the concept of attribute dependencies. Understanding the dependencies
between attributes and the primary key is essential to great database design. This week, we will
explore the area of both partial and transitive dependencies, how to find them and how to
eliminate them resulting in a final 3NF database design.
After this week the student will be able to:
identify the 2 variants of partial dependencies
eliminate partial dependencies and normalize a database design to 2NF
identify transitive dependencies and how to resolve them
eliminate transitive dependencies and normalize a database design to 3NF
2NF – Partial Dependencies
http://dbs211.ca/index.html
https://en.wikipedia.org/wiki/Database_normalization
https://en.wikipedia.org/wiki/Second_normal_form
https://en.wikipedia.org/wiki/Third_normal_form
2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 2/8
Second normal form always starts with a database already in 1NF and then identifies partial
dependencies. Partial dependencies are relationships where the attribute in focus is not
dependent on the entire primary key or is dependent on more than the primary key. To satisfy
2NF, each attribute in an entity must be wholly dependent on the entire primary key. For this
particular form, we need to review every single non-pk attribute for partial dependencies.
The 2 Variations of Partial Dependencies
An attribute partially dependent on a composite primary key
Let’s review our soccer league again to display an example of this. We will use the business rule
where teams have more than one player and each player can play on more than one team.
IN the 1NF solution above we will investigate all non-PK attributes to detect partial dependencies
✓teamName – is team specific and does not depend on any field other than teamID
✓ShirtColour – is team specific and depends only on teamID. All players on one team wear
the same colour shirt.
PlayerName – is player specific and depends only on playerID. However the TEAM_PLAYERS
entity has a composite key with teamID. PlayerName has no relation with teamID and
therefore has a partial dependency with the entity’s PK and must be moved.
PlayerDOB – same situation as PlayerName
✓ShirtNumber – The number on the back of the players shirt will depend on the player, as no
two players on the same team wear the same number, however, the player may wear a
different number when playing on a different team. Therefore, the ShirtNumber depends on
both the teamID and the PlayerID. This matches the composite primary key and is therefore in
the right place.
from above, we can see that PlayerName and PlayerDOB both have partial dependencies with the
PK of the TEAM_PLAYERS entity and therefore can not be in that entity. Since they depend only
on the playerID, we need a new entity that has a single field PK of playerID. Let us therefore create
a PLAYER entity and place those fields in it.
UNF
TEAMS [teamID, teamName, ShirtColour, (PlayerID, PlayerName, PlayerDOB, ShirtNu
1NF
TEAMS [teamID, teamName, ShirtColour]
TEAM_PLAYERS[FK teamID, PlayerID, PlayerName, PlayerDOB, ShirtNumber]
2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 3/8
TEAMS [teamID, teamName, ShirtColour]
TEAM_PLAYERS[FK teamID, FK PlayerID, ShirtNumber]
PLAYERS [PlayerID, PlayerName, PlayerDOB]
In this case, the playerID is also left behind in the TEAM_PLAYERS entity in order to maintain the
relationship, and will be a FK to the new entity, which now possess playerID as a PK.
Lastly, in order to satisfy 2NF, the solution must also be UNF and 1NF. Checking each attribute in
all entities, we can see these are satisfied, and therefore this is a good final 2NF solution.
An attribute dependent on more than one field, but only a single field PK exists
In this example we will again review the soccer league database to demonstrate this.
TEAMS [teamID, teamName, ShirtColour]
TEAM_PLAYERS[FK teamID, FK PlayerID]
PLAYERS [PlayerID, PlayerName, PlayerDOB, ShirtNumber]
in the UNF solution above, we included shirtNumber in the repeating group parenthesis as it
absolutely would go with the player. So it might seem natural to continue to keep the shirtNumber
with player.
Now we can revisit the business rule associated with players and teams. If the player can only play
on one team, then the shirt number would belong to the player and the team would not
necessarily determine the number directly. But, in the case where a player may play on more than
one team, then the player number can be different for each team and therefore the shirt number
depends on both the player id and the team id. Because it is in the Players entity, which only
contains the playerID, it has a partial dependency with the PK. Therefore it must move to an entity
where both the teamID and the playerID are together a composite primary key (the
TEAM_PLAYERS entity).
TEAMS [teamID, teamName, ShirtColour]
TEAM_PLAYERS[FK teamID, FK PlayerID, ShirtNumber]
PLAYERS [PlayerID, PlayerName, PlayerDOB]
The Use of Surrogate Keys and Normalization
Now is probably a good time to introduce the reason why we would not introduce surrogate keys,
such as auto generated ID type fields, to replace composite keys yet. Some of you ave heard that
composite keys are not usually left in database designs and they are often replaced with an
autonumber ID field. Although this is very true and a good practice, it should not be done just yet.
2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 4/8
When trying to determine things like partial and transitive dependencies, it must be made very
clear that these dependencies must be determined with respect to the intended, oro original key
fields. The surrogate keys are a replacement key for what was the original fields and therefore,
those original fields must be used for the dependency analysis. If a surrogate key is introduced
too early, then determining partial and transitive dependencies is greatly complicated.
For Example: if we were to replace the composite key in TEAM_PLAYERS with an autonumber field
it might look like this.
TEAMS [teamID, teamName, ShirtColour]
TEAM_PLAYERS[rosterID, FK teamID, FK PlayerID, ShirtNumber]
PLAYERS [PlayerID, PlayerName, PlayerDOB]
Now looking at the ShirtNumber attribute, it is not nearly as clear that it is not partially dependent
on the primary key. You have to remember that the surrogate key, rosterID, is a replacement for
the original composite key between playerID and teamID and those are the fields in which the
dependencies must be determined.
Don’t introduce the surrogate keys to replace composite keys until after normalization has
been completed.
3NF – Transitive Dependencies
Transitive dependencies are dependencies where an attribute depends on another non-pk
attribute in the same entity. This one is a little tricky to see, but absolutely needs to be taken care
of.
Let us look at the following example to visualize transitive dependencies.
Transitive dependencies can often be discovered by asking the question, if I change one non-key
field, does it force me to change another non-key field. If the answer is yes, you have a transitive
dependency.
Looking at each field in the above CUSTOMERS entity, and asking the question above, it becomes
obvious very quickly that by changing the SalesRepID, the SalesRepName would also have to
change. Neither of these attributes are PKs or CKs, therefore SalesRepName has a transitive
relationship with CustomerID, the PK for the entity. SalesRepName depends on SaleRepID, which
in turn depends on CustomerID. This is Transitive.
CUSTOMERS [CustomerID, CustFName, CustLName, CustPhone, SalesRepID, SaleRepName]
2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 5/8
To solve this, We move SalesRep details to another entity and only leave behind the ID as a FK to
the newly created PK of the new entity.
CUSTOMERS [CustomerID, CustFName, CustLName, CustPhone, FK SalesRepID]
SALESREPS [SalesRepID, SalesRepName]
Another More Complex Transitive Dependency Example
A good example of transitive dependencies are often in the automotive industry. A vehicle
typically is referred to by its’ make, model and version (example: Honda Civic LX) and each vehicle
is uniquely identified by its’ Vehicle Identification Number (VIN). The DBDL for the VEHICLE entity
might look something like this.
and a data table may look something like:
The above example is full of transitive dependencies that can be tricky to navigate, to let’s look at
one at a time working right to left.
Transmission – Any car can have a different transmission as this is an option when you buy a
car. You can buy a Honda Civic LX with either an automatic or a manual transmission and
therefore transmission is really independent of model of make and depends solely on the
individual car (i.e. the VIN). Therefore, this is not a transitive dependency.
✓ NumDoors – This is a strange one as it could go either way. But I would definitely say that
changing the version from LX to LX Hatchback would absolutely change the number of doors.
Therefore, I would say the number of doors has a transitive dependency, through version to
VIN, the PK.
VEHICLE [VIN, make, model, version, colour, NumDoors, Transmission, …… etc
VIN Make Model Version Colour NumDoors
FDSAGFSD8G68DG76SD Honda Civic EX Black 4
789FDSA6796G7D8AS68 Honda Civic
LX
Hatchback
Forest
Green
2
YUIS79S8DFS789789GSS Ford Mustang GT
Bumblebee
Yellow
2
2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 6/8
Colour – Any car can be painted any colour and therefore it is vehicle dependent, the VIN.
Therefore, not transitive.
✓ Version – The version of the car is another tricky one, as manufacturers often reuse version
codes in different models. For instance there are both Honda Civic EX as well as Honda
Accord EX, but there is no Honda Civic Gt. GT is a version of Ford products. Therefore,
changing the Make or Model will absolutely force the Version to change. Therefore, Version
has a transitive dependency with VIN.
✓ Model – If I change the Make of the car, the model will absolutely have to change as well.
For example, if in the first row above, I changed the Make from Honda to Toyota, the model
and version would also have to change, meaning model depends on Make first and the VIN
and therefore has a transitive dependency with VIN, the PK.
✓ Make – the make also has a transitive dependency with VIN, although not as strongly. If you
change Model, it is likely that the Make also has to change, but not every time. For instance, if
Civic is change to Accord, the Make would remain as Honda, but if Civic was changed to
Mustang, then the Make would have to change from Honda to Ford. This is still a transitive
dependency as it is not entirely directly related to the PK field (VIN).
Wow, so how do we fix this?
so first thing we do is determine one of the fields and move them. Let us start with NumDoors.
NumDoors depends on the Version, which depends on Model, which depends on Make. All
somehow indirectly relate to VIN, but through transitive relationships. If we are to work this puzzle
out, we first look at the field most directly related to VIN which can be unique enough. In this case
that is Version, let us start there and move all related fields with it.
Now we still have transitive dependencies with make and model. The Number of Doors depends
directly on the version of the car, so that is no longer transitive with its’ PK.
For Make and Model we can follow the dependency line. The version depends on the Model and
then the Make, so let’s separate both those through the model.
VEHICLE [VIN, FK versionID, colour, Transmission, …… etc.]
VERSIONS [versionID, versionName, FK Model, NumDoors]
MODEL [Model, Make]
if we introduce ID fields for make and model here, we would see this.
VEHICLE [VIN, FK versionID, colour, Transmission, …… etc.]
VERSIONS [versionID, versionName, Model, Make, NumDoors]
We introduced a versionID field here is Version by itself is not a good candidate key
2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 7/8
VEHICLE [VIN, FK versionID, colour, Transmission, …… etc.]
VERSIONS [versionID, versionName, FK ModelID, NumDoors]
MODEL [ModelID, ModelName, MakeID, MakeName]
Now MakeName is dependent on MakeID and not ModelID the PK in that entity, so it must move.
VEHICLE [VIN, FK versionID, colour, Transmission, …… etc.]
VERSIONS [versionID, versionName, FK ModelID, NumDoors]
MODEL [ModelID, ModelName, FK MakeID]
MAKE [MakeID, MakeName]
Now we have eliminated all transitive dependencies, verified there are no partial dependencies, all
fields are atomic, there are no repeating groups, and all entities have a good candidate primary
key. Therefore we have the design in 3NF.
Exceptions to Normalization
There are a few areas where normalization goes to far and will cause issues, such as data storage
growth or a lack of available information. One of the key examples of this is to do with physical
mailing addresses.
In this case it is not so obvious, but there are several transitive dependencies here.
Postal Codes are specific to countries and geographical areas
province is dependent on country
city is dependent on province
street is dependent on city
street house number is dependent on street
This gets extremely complicated and has some very specific requirements. First of all let us look
at the normalized version of an address.
CUSTOMERS [cID, name, phone, FK postal_code, house_number]
POSTAL_CODE [postal_code, FK streetID]
STREET [streetID, streetName, FK City]
CITY [city, FK provCode]
PROVINCE [provCode, provName, FK CountryCode]
COUNTRY [CountryCode, CountryName]
CUSTOMERS [cID, name, phone, address1, street, city, province, country, postal_code
2021/8/8 DBS – 211
dbs211.ca/Weeks/Week11/index.html 8/8
First off this seems absolutely ridiculous and it is. But in true 3NF form, this is what an address
would look like. But let us look at more consequences using an example of an online shopping site
that sells t-shirts:
remember that normalization results in FK-PK relationships which require referential integrity,
meaning data can not be added as a child record unless the parent record already exists.
example: We could not add a province unless the country already existed, we could not add a
street, unless the city already existed.
Either the store would have to restrict where their customers are located or they would end up
with a very large database before they even sold a single t-shirt. Even if the store was restricted
to Canada only, the database would need to contain every province, city, street, and postal code
in the entire country to enable customers to enter their information. Additionally, new construction
all the time means this database is growing and would need to be maintained with great effort.
This level of data storage and maintenance may be alright for very large companies, such as
FedEx, but not for most small and medium businesses in the country. Therefore, we often by-pass
the normalization process in order to save having to have to information pre-entered, the data
storage requirements. Note that this will absolutely lead to errors in the data as humans enter the
information, but the cost is worth it. Maybe the software can implement some checks using
external APIs to verify addresses before saving them in the database.
The point here is that, sometimes completing normalization to the extreme is sometimes more
costly than it is worth. So in some cases, we bypass normalization for simplicity.
If you choose to bypass normalization as part of an assignment, project, lab or test, it would
be best to confirm with your instructor before submitting your work.