CS3402
1
CS3402 – Chapter 3 Integrity Constraints
Database Modelling and Implementation
Ideas/requirements E/R design
Relational schema
Relational database
CS3402
2
Summary of ER-Diagram Notation
Symbol
Meaning
ENTITY TYPE
WEAK ENTITY TYPE
RELATIONSHIP TYPE
IDENTIFYING RELATIONSHIP TYPE ATTRIBUTE
(Primary) KEY ATTRIBUTE MULTIVALUED ATTRIBUTE
COMPOSITE ATTRIBUTE
DERIVED ATTRIBUTE
TOTAL PARTICIPATION OF E2 IN R CARDINALITY RATIO 1:N FOR E1:E2 IN R
STRUCTURAL CONSTRAINT (min, max) ON PARTICIPATION OF E IN R 3
E1
R R
R
N (min,max)
E2
E1
E2
E
CS3402
Database Modelling and Implementation
Ideas/requirements E/R design
Relational schema
Relational database
CS3402
4
Definition Summary
Informal Terms
Formal Terms
Table
Relation
Column Header
Attribute
All possible Column Values
Domain
Row
Tuple
Table Definition
Schema of a Relation
Populated Table
State of the Relation
CS3402 5
Integrity Constraints
A relational database schema is a set of relation scheme S = {R1, R2, …, Rn} and a set of integrity constraints IC
Integrity Constraints determine which values are permissible and which are not in the database (table)
Constraints are conditions that must hold on all valid relation states
Valid state Vs. invalid state
Invalid state: A database state that does not obey all the
integrity constraints
Valid state: a state that satisfies all the constraints in the defined set of integrity constraints
CS3402
7
Relational Integrity Constraints
They are of three main types of constraints:
Inherent or Implicit Constraints: These are based on the data model itself. (E.g., relational model does not allow multiple values for any attribute)
Schema-based or Explicit Constraints: They are expressed in the schema by using the facilities provided by the model. (E.g., max. cardinality ratio constraint in the ER model)
Application-based or Semantic constraints: These are beyond the expressive power of the model and must be specified and enforced by the application programs. (e.g. the salary of an employee should not exceed the salary of the employee’s
CS3402 supervisor) 8
Relational Integrity Constraints
There are four main types of schema-based constraints that can be expressed in the relational model:
Domain constraints
Key constraints
Entity integrity constraints
Referential integrity constraints
CS3402 9
Domain Constraints
Domain constraint: Every value in a tuple must be an atomic value from the domain of its attribute (or it could be null, if NULL is allowed for that attribute)
E.g.,
CS3402
10
C-Name: string of char (30) Balance: Number (6,2) …
Key Constraints
Superkey of R:
A set of attributes that contains a key is called a superkey
It is a set of attributes SK, e.g., {A1, A2} of R with the following condition (Key constraint):
No two tuples in any valid relation state r(R) will have the same value for SK
For any distinct tuples t1 and t2 in r(R), t1[SK] t2[SK]
(Candidate) Key of R:
A “minimal” superkey
A key is a superkey K such that removal of any attribute from K results in a set of attributes that is not a superkey (does not possess the superkey uniqueness property)
CS3402
11
Key Constraints
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year) CAR has two keys:
Key1 = {State, Reg#}
Key2 = {SerialNo}
Both are also superkeys of CAR
{SerialNo, Make} is a superkey but not a key
In general:
Any key is a superkey (but not vice versa)
Any set of attributes that includes a key is a superkey A minimal superkey is a key
CS3402 12
Key Constraints
If a relation has several candidate keys, one is chosen arbitrarily to be the primary key
The primary key attributes are underlined
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year) We chose SerialNo as the primary key
The primary key value is used to uniquely identify each tuple in a relation
General rule: Choose as primary key the smallest of the candidate keys (in terms of size)
CS3402 13
Entity Integrity Constraints
Entity Integrity:
The primary key attributes PK of each relation schema R cannot
have null values in any tuple of R
Primary key values are used to identify the individual tuples
t[PK] null for any tuple t in R
If PK has several attributes, null is not allowed in any of these attributes
Note: Other attributes of R may be constrained to disallow null values, even though they are not members of the primary key.
CS3402
15
Referential Integrity
Key, domain, and entity integrity constraints are specified on individual relations.
Referential integrity is a constraint involving two relations
To specify a relationship among tuples in two relations
The referencing relation and the referenced relation (R1 -> R2)
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that reference the primary key attributes PK of the referenced relation R2 if it satisfies:
The attributes in FK have the same domain(s) as the primary key attributes PK of R2
A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple t2 in the current state r2(R2) or is
CS3402 NULL. 16
Referential Integrity
For example, we designate Dno to be a foreign key of EMPLOYEE referencing the DEPARTMENT’s Dnumber
A value of Dno in any tuple t1 of the EMPLOYEE relation must match a value of the primary key Dnumber, in the tuple t2 of the DEPARTMENT relation, or the value of Dno can be NULL if the employee does not belong to a department or will be assigned to a department later.
Referential integrity constraints typically arise from the relationships among the entities represented by the relation
CS3402 17
Displaying a relational database
schema and its constraints
Each relation schema can be displayed as a row of attribute names
The name of the relation is written above the attribute names
The primary key attribute (or attributes) will be underlined
A foreign key (referential integrity) constraints is displayed as a directed arc (arrow) from the foreign key attributes to the primary key of the referenced relation.
Next slide shows the COMPANY relational schema diagram with referential integrity constraints
CS3402 19
Database State for COMPANY
All examples discussed below refer to the COMPANY database shown here
CS3402 Slide 8- 20 20
Populated database state for COMPANY
CS3402 21
Update Operations on Relations
INSERT a tuple
DELETE a tuple
MODIFY a tuple
Integrity constraints should not be violated by the update operations
Several update operations may have to be grouped together
Updates may propagate to cause other updates automatically. This may be necessary to maintain integrity constraints
CS3402 22
Possible violations for each operation
INSERT may violate any of the constraints:
Domain constraint: if one of the attribute values provided for the new tuple is not of the specified attribute domain
Key constraint: if the value of a key attribute in the new tuple already exists in another tuple in the relation
Entity integrity: if the primary key value is null in the new tuple
Referential integrity: if a foreign key value in the new tuple references a primary key value that does not exist in the referenced relation
CS3402
23
Possible violations for each
operation
Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, NULL, ‘1960-04-05’, ‘6357 Windy Lane, Katy,TX’, F, 28000, NULL, 4> into EMPLOYEE.
–violates the entity integrity constraint (NULL for the primary key Ssn)
Insert <‘Alicia’, ‘J’, ‘Zelaya’, ‘999887777’, ‘1960-04-05’, ‘6357 Windy
Lane, Katy, TX’, F, 28000, ‘987654321’, 4> into EMPLOYEE.
–violates the key constraint because another tuple with the same Ssn value already exists in the EMPLOYEE relation
Insert <‘Cecilia’, ‘F’, ‘Kolonsky’, ‘677678989’, ‘1960-04-05’, ‘6357 Windswept, Katy, TX’, F, 28000, ‘987654321’, 7> into EMPLOYEE.
–violates the referential integrity constraint specified on Dno in EMPLOYEE because no corresponding referenced tuple exists in
DEPARTMENT with Dnumber = 7
CS3402 24
Possible violations for each operation
MODIFY(UPDATE) may violate any of the constraints:
Key constraint: if the value of a key attribute in the modified tuple already exists in another tuple in the relation
Domain constraint: if one of the attribute values provided for the modified tuple is not of the specified attribute domain
Entity integrity: if the primary key value is null in the modified tuple
Referential integrity: if a foreign key value in the modified tuple references a primary key value that does not exist in the referenced relation.
CS3402
25
Possible violations for each operation
Update the Sex of the EMPLOYEE tuple with Ssn = ‘999887777’ to 55. –violates domain constraint, because domain of sex is character
Update the Dno of the EMPLOYEE tuple with Ssn = ‘999887777’ to 7.
–violates referential integrity constraint specified on Dno in EMPLOYEE because no corresponding referenced tuple exists in
DEPARTMENT with Dnumber = 7
Update the Ssn of the EMPLOYEE tuple with Ssn = ‘987654321’ to ‘999887777’.
— violates primary key constraint by repeating a value that already exists as a primary key in another tuple; violates referential integrity constraints because there are other relations that refer to the existing value of Ssn.
CS3402 26
Possible violations for each operation
DELETE may violate only referential integrity:
If the primary key value of the tuple being deleted is
referenced from other tuples in the database
Delete the EMPLOYEE tuple with Ssn = ‘999887777’.
–violates the referential integrity constraints, because there are tuples in WORKS_ON that refer to this tuple, if the tuple in EMPLOYEE is deleted, referential integrity violations will result.
CS3402 27
Integrity Constraints
In case of integrity violation, several actions can be taken:
cancel the operation that causes the violation perform the operation but inform the user of the
violation (e.g. ask the user to provide a valid value )
trigger additional updates so the violation is corrected (e.g. cascade the deletion by deleting tuples that reference the tuple being deleted)
execute a user-specified error-correction routine
CS3402 28
Functional Dependency
Functionaldependencyisaconstraintbetweentwosetsof attributes from the database
E.g., In DEPARTMENT, Dnumber and Dname
If you know the department number, you know the department
name, so we have Dnumber Dname
AfunctionaldependencydenotesbyXY,betweentwosetsof attributes X and Y that are subsets of a relation schema R specifies a constraint on the possible tuples that can form a relation state r of R
Theconstraintisthat,foranytwotuplest1andt2inrthathave t1[X] = t2[X], they must also have t1[Y] = t2[Y]
ThevaluesoftheYcomponentofatupleinrdependon,orare determined by the values of the X component
CS3402 32
Functional Dependency
Formal definition:
Let R be a relation scheme, and R, R (ie, and
are sets of R’s attribuets). We say:
if in any legal relation instance r(R), for all pairs of tuples t1 and t2 in r, we have:
(t1[] = t2[]) (t1[] = t2[])
CS3402
33
Functional Dependency
Some usages of FDs:
(1) to set constraints on legal relations (e.g. key constraints)
(2) to test relations to see if they are “legal” under a given set of FDs.
(3) to test the goodness of a database schema design (normalization).
CS3402
34
Functional Dependency: keys
A set of one or more attributes {A1, A2, …, An} is a (candidate) key for a relation if:
The attributes functionally determine all other attributes of the relation (superkey definition)
No proper subset of {A1, A2, …, An} functionally determines all other attributes of R, i.e., a key must be minimal
CS3402 35
Functional Dependency: properties
IfX→YinR,thisdoesnotsaywhetherornotY→XinR Ssn → Fname, can it be Fname → Ssn? No
IfX→Y, thenXZ→Y
Ssn→ Birthdate, can it be {Ssn, Fname} → Birthdate? Yes
SomeFDsare“trivial”,sincetheyarealwayssatisfiedbyall relations:
E.g., A A, AB A, (the right-hand side is a subset of the left-hand side)
E.g., {Fname, Sex} Fname
CS3402 37
Inference Rules for FDs
GivenasetofFDsF,wecaninferadditionalFDsthathold whenever the FDs in F hold
Armstrong’sinferencerules:
IR1. (Reflexive) If Y is a subset of X, then X Y IR2. (Augmentation) If X Y, then XZ YZ
(Notation: XZ stands for X U Z)
IR3. (Transitive) If X Y and Y Z, then X Z
IR1,IR2,IR3formasoundandcompletesetofinferencerules
Sound: These rules are true
Complete: All the other rules that are true can be deduced from these rules
CS3402
38
Inference Rules for FDs
Some additional inference rules that are useful: Decomposition: If X YZ, then X Y and X Z Union: If X Y and X Z, then X YZ
Pseudotransitivity: If X Y and WY Z, then WX Z
CS3402
39
Inference Rules for FDs
IR1 (reflective rule)
If X is a subset of Y, then X → Y
IR2 (augmentation rule)
If X Y, then XZ YZ
IR3 (transitive rule)
If X Y and Y Z, then X Z
IR4 (decomposition rule)
If X YZ, then X Y and X Z
IR5 (union rule)
If X Y and X Z, then X YZ
IR6 (pseudotransitive rule)
If X Y and WY Z, then WX Z
CS3402 40
Example
SupposewearegivenaschemaRwithattributesA,B,C,D,E,F and the FDs are:
A → BC
B→E
CD → EF
Show that the FD AD → F holds
1. A → BC (given)
2. A → C (1, decomposition)
3. AD → CD (2, augmentation)
4. CD → EF (given)
5. AD → EF (3 and 4, transitivity)
6. AD → F (5, decomposition)
CS3402 41
Inference Rules for FDs
Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
E.g.,supposewespecifythefollowingsetFofobviousfunctional dependencies
F = {Ssn →{Ename, Bdate, Address,Dnumber}, Dnumber→{Dname, Dmgr_ssn}}
Then,
Ssn → {Dname, Dmgr_ssn}
Ssn → Ssn
Dnumber → Dname
….. ….
F+ Including all FDs which can be inferred from F
CS3402
42
Equivalence of Sets of FDs
AsetoffunctionaldependenciesFissaidtocoveranothersetof functional dependency G if every FD in G is also in F+ (E is a subset
of F+)
TwosetsofFDsFandGareequivalentif:
Every FD in F can be inferred from G, and Every FD in G can be inferred from F
Hence, F and G are equivalent if F+ =G+
Example:
F: ABC;
G: AB, AC F+ = G+
CS3402 43
Inference Rules for FDs
Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
Note both X and X+ are a set of attributes
If X+ consists of all attributes of R, X is a superkey for R
From the value of X, we can determine the values the whole tuple
X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F
FromXtofindoutX+
CS3402 44
Example
SupposewearegivenaschemaRwithattributesA,B,C,D,E,F, and FDs
A→BC
E→CF
B→E
CD→EF
{A}+ =>
CS3402
45
{A}+ = {A,B,C,E,F} Not a superkey or a key
Example
SupposewearegivenaschemaRwithattributesSsn,Ename, Pname, Plocation, Pnumber and Hours, and a FD set F.
F={Ssn→ Ename
Pnumber → {Pname, Plocation} {Snn, Pnumber} → Hours}
ThefollowingclosuresetswithrespecttoF
{Ssn}+ = {Ssn, Ename}
{Pnumber}+ = {Pnumber, Pname, Plocation}
{Ssn, Pnumber}+ = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}
{Ssn, Pnumber} is a (candidate) key. CS3402
46
Summary
ClosureofasetFofFDs
The set F+ of all FDs that can be inferred from F
ClosureofasetofattributesXwithrespecttoF
The set X+ of all attributes that are functionally determined by X
AsetoffunctionaldependenciesFissaidtocoveranothersetof functional dependency G
If every FD in G is also in F+
TwosetsofFDsFandGareequivalent
If F and G are equivalent if and only if F+ =G+
CS3402 47
References
6e
Ch. 3, p. 63 – 70
CS3402 48