NOTE: The commands for creating and dropping indexes are not part of standard SQL.
Views are virtual or derived tables because they present the user with what appear to be tables;
however, the information in those tables is derived from previously defined tables.
Section 4 introduced the SQL ALTER TABLE statement, which is used for modifying
the database tables and constraints.
Table 2 summarizes the syntax (or structure) of various SQL statements. This sum-
mary is not meant to be comprehensive or to describe every possible SQL construct;
rather, it is meant to serve as a quick reference to the major types of constructs avail-
able in SQL. We use BNF notation, where nonterminal symbols are shown in angled
brackets <...>, optional parts are shown in square brackets […], repetitions are shown
in braces {…}, and alternatives are shown in parentheses (… | … | …).7
7The full syntax of SQL is described in many voluminous documents of hundreds of pages.
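For example, an entry for a basic retrieval statement might appear in this notation as follows (a condensed sketch, not the full grammar of the standard):
SELECT [ DISTINCT ] <attribute list>
FROM <table list>
[ WHERE <condition> ]
[ GROUP BY <grouping attributes> [ HAVING <group condition> ] ]
[ ORDER BY <column name> [ ( ASC | DESC ) ] {, <column name> [ ( ASC | DESC ) ]} ]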
1. Describe the six clauses in the syntax of an SQL retrieval query. Show what
type of constructs can be specified in each of the six clauses. Which of the six
clauses are required and which are optional?
2. Describe conceptually how an SQL retrieval query will be executed by speci-
fying the conceptual order of executing each of the six clauses.
3. Discuss how NULLs are treated in comparison operators in SQL. How are
NULLs treated when aggregate functions are applied in an SQL query? How
are NULLs treated if they exist in grouping attributes?
4. Discuss how each of the following constructs is used in SQL, and discuss the
various options for each construct. Specify what each construct is useful for.
a. Nested queries.
b. Joined tables and outer joins.
c. Aggregate functions and grouping.
d. Triggers.
e. Assertions and how they differ from triggers.
f. Views and their updatability.
g. Schema change commands.
5. Specify the following queries in SQL, and show the query results if each query is applied to the database in Figure A.2.
a. For each department whose average employee salary is more than
$30,000, retrieve the department name and the number of employees
working for that department.
b. Suppose that we want the number of male employees in each department
making more than $30,000, rather than all employees (as in Exercise 4a).
Can we specify this query in SQL? Why or why not?
6. Specify the following queries in SQL on the database schema in Figure A.5.
a. Retrieve the names and major departments of all straight-A students
(students who have a grade of A in all their courses).
b. Retrieve the names and major departments of all students who do not
have a grade of A in any of their courses.
7. In SQL, specify the following queries on the database in Figure A.1 using the
concept of nested queries and concepts described in this chapter.
a. Retrieve the names of all employees who work in the department that has
the employee with the highest salary among all employees.
b. Retrieve the names of all employees whose supervisor’s supervisor has
‘888665555’ for Ssn.
c. Retrieve the names of employees who make at least $10,000 more than
the employee who is paid the least in the company.
8. Specify the following views in SQL on the COMPANY database schema
shown in Figure A.1.
a. A view that has the department name, manager name, and manager
salary for every department.
b. A view that has the employee name, supervisor name, and employee
salary for each employee who works in the ‘Research’ department.
c. A view that has the project name, controlling department name, number
of employees, and total hours worked per week on the project for each
project.
d. A view that has the project name, controlling department name, number
of employees, and total hours worked per week on the project for each
project with more than one employee working on it.
9. Consider the following view, DEPT_SUMMARY, defined on the COMPANY
database in Figure A.2:
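A definition consistent with the column names used in the statements below (a sketch; the exact original definition is inferred from the EMPLOYEE relation of the COMPANY schema) is:
CREATE VIEW DEPT_SUMMARY (D, C, TOTAL_S, AVERAGE_S)
AS SELECT Dno, COUNT(*), SUM(Salary), AVG(Salary)   -- sketch: columns inferred from items a-e below
   FROM EMPLOYEE
   GROUP BY Dno;
State which of the following queries and updates would be allowed on the view. If a query or update would be allowed, show what the corresponding query or update on the base relations would look like.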
a. SELECT *
FROM DEPT_SUMMARY;
b. SELECT D, C
FROM DEPT_SUMMARY
WHERE TOTAL_S > 100000;
c. SELECT D, AVERAGE_S
FROM DEPT_SUMMARY
WHERE C > ( SELECT C FROM DEPT_SUMMARY WHERE D=4);
d. UPDATE DEPT_SUMMARY
SET D=3
WHERE D=4;
e. DELETE FROM DEPT_SUMMARY
WHERE C > 4;
There are many books that describe various aspects of SQL. For example, two refer-
ences that describe SQL-99 are Melton and Simon (2002) and Melton (2003).
Further SQL standards—SQL:2006 and SQL:2008—are described in a variety of
technical reports, but no standard references exist.
The following CREATE TABLE fragments illustrate a DEFAULT value for an attribute and a multi-attribute primary key with a foreign key:
( . . . ,
Mgr_ssn CHAR(9) NOT NULL DEFAULT '888665555',
. . . );
( . . . ,
PRIMARY KEY (Dnumber, Dlocation),
FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber) );
In this chapter we discuss the two formal languages for the relational model: the relational algebra and the
relational calculus. In contrast, the SQL standard is the practical language for the
relational model. Historically, the relational algebra and calculus
were developed before the SQL language. In fact, in some ways, SQL is based on
concepts from both the algebra and the calculus, as we shall see. Because most rela-
tional DBMSs use SQL as their language, we presented the SQL language first.
Recall that a data model must include a set of operations to manipulate the data-
base, in addition to the data model’s concepts for defining the database’s structure
and constraints. The basic set of operations for the relational model is the relational
algebra. These operations enable a user to specify basic retrieval requests as
relational algebra expressions. The result of a retrieval is a new relation, which may
have been formed from one or more relations. The algebra operations thus produce
new relations, which can be further manipulated using operations of the same alge-
bra. A sequence of relational algebra operations forms a relational algebra expres-
sion, whose result will also be a relation that represents the result of a database
query (or retrieval request).
The relational algebra is very important for several reasons. First, it provides a for-
mal foundation for relational model operations. Second, and perhaps more impor-
tant, it is used as a basis for implementing and optimizing queries in the query
processing and optimization modules that are integral parts of relational database
management systems (RDBMSs). Third, some of its concepts are incorporated into
the SQL standard query language for RDBMSs. Although most commercial
RDBMSs in use today do not provide user interfaces for relational algebra queries,
the core operations and functions in the internal modules of most relational sys-
tems are based on relational algebra operations. We will define these operations in
detail in Sections 1 through 4 of this chapter.
The relational algebra is often considered to be an integral part of the relational data
model. Its operations can be divided into two groups. One group includes set oper-
ations from mathematical set theory; these are applicable because each relation is
defined to be a set of tuples in the formal relational model. Set operations include
UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN PRODUCT (also
known as CROSS PRODUCT). The other group consists of operations developed
specifically for relational databases—these include SELECT, PROJECT, and JOIN,
among others. First, we describe the SELECT and PROJECT operations in Section 1
because they are unary operations that operate on single relations. Then we discuss
set operations in Section 2. In Section 3, we discuss JOIN and other complex binary
operations, which operate on two tables by combining related tuples (records)
based on join conditions. The COMPANY relational database shown in Figure A.1 in
Appendix: Figures at the end of this chapter is used for our examples.
In Sections 6 and 7 we describe the other main formal language for relational data-
bases, the relational calculus. There are two variations of relational calculus. The
tuple relational calculus is described in Section 6 and the domain relational calculus
is described in Section 7. Some of the SQL constructs are based on the tuple rela-
tional calculus. The relational calculus is a formal language, based on the branch of
1SQL is based on tuple relational calculus, but also incorporates some of the operations from the rela-
tional algebra and its extensions.
mathematical logic called predicate calculus.2 In tuple relational calculus, variables
range over tuples, whereas in domain relational calculus, variables range over the
domains (values) of attributes. Section 8 summarizes the chapter.
For the reader who is interested in a less detailed introduction to formal relational
languages, Sections 4, 6, and 7 may be skipped.
2In this chapter no familiarity with first-order predicate calculus—which deals with quantified variables
and values—is assumed.
3The SELECT operation is different from the SELECT clause of SQL. The SELECT operation chooses
tuples from a table, and is sometimes called a RESTRICT or FILTER operation.
The result is shown in Figure 1(a).
Notice that all the comparison operators in the set {=, <, ≤, >, ≥, ≠} can apply to
attributes whose domains are ordered values, such as numeric or date domains.
Domains of strings of characters are also considered to be ordered based on the col-
lating sequence of the characters. If the domain of an attribute is a set of unordered
values, then only the comparison operators in the set {=, ≠} can be used. An exam-
ple of an unordered domain is the domain Color = { ‘red’, ‘blue’, ‘green’, ‘white’, ‘yel-
low’, …}, where no order is specified among the various colors. Some domains allow
additional types of comparison operators; for example, a domain of character
strings may allow the comparison operator SUBSTRING_OF.
In general, the result of a SELECT operation can be determined as follows. The
selection condition is applied independently to each individual tuple t in R. This
is done by substituting each occurrence of an attribute Ai in the selection condition
with its value in the tuple t[Ai]. If the condition evaluates to TRUE, then tuple t is
selected. All the selected tuples appear in the result of the SELECT operation. The
Boolean conditions AND, OR, and NOT have their normal interpretation, as follows:
■ (cond1 AND cond2) is TRUE if both (cond1) and (cond2) are TRUE; otherwise, it is FALSE.
■ (cond1 OR cond2) is TRUE if (cond1) or (cond2) or both are TRUE; otherwise, it is FALSE.
■ (NOT cond) is TRUE if cond is FALSE; otherwise, it is FALSE.
The SELECT operator is unary; that is, it is applied to a single relation. Moreover,
the selection operation is applied to each tuple individually; hence, selection condi-
tions cannot involve more than one tuple. The degree of the relation resulting from
a SELECT operation—its number of attributes—is the same as the degree of R. The
number of tuples in the resulting relation is always less than or equal to the number
of tuples in R. That is, |σC(R)| ≤ |R| for any selection condition C. The fraction of tuples
selected by a selection condition is referred to as the selectivity of the condition.
Hence, a sequence of SELECTs can be applied in any order. In addition, we can
always combine a cascade (or sequence) of SELECT operations into a single
SELECT operation with a conjunctive (AND) condition; that is,
σ<cond1>(σ<cond2>(…(σ<condn>(R))…)) = σ<cond1> AND <cond2> AND … AND <condn>(R)
In SQL, the SELECT condition is typically specified in the WHERE clause of a query.
For example, the following operation:
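For instance (a sketch on the EMPLOYEE relation of Figure A.1), selecting the employees who either work in department 4 and earn more than $25,000, or work in department 5 and earn more than $30,000, can be written as
σ(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)(EMPLOYEE)
and has the SQL counterpart
SELECT *
FROM EMPLOYEE
WHERE (Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000);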
The resulting relation is shown in Figure 1(b). The general form of the PROJECT
operation is π<attribute list>(R), where π (pi) is the symbol used to represent the PROJECT operation, and <attribute list> is the desired sublist of attributes from the attributes of relation R.
If the attribute list includes only nonkey attributes of R, duplicate tuples are likely to
occur. The PROJECT operation removes any duplicate tuples, so the result of the
PROJECT operation is a set of distinct tuples, and hence a valid relation. This is
known as duplicate elimination. For example, consider the following PROJECT
operation: πSex, Salary(EMPLOYEE).
The result is shown in Figure 1(c). Notice that the tuple <‘F’, 25000> appears only
once in Figure 1(c), even though this combination of values appears twice in the
EMPLOYEE relation. Duplicate elimination involves sorting or some other tech-
nique to detect duplicates and thus adds more processing. If duplicates are not elim-
inated, the result would be a multiset or bag of tuples rather than a set. This was not
permitted in the formal relational model, but is allowed in SQL.
In SQL, the PROJECT attribute list is specified in the SELECT clause of a query. For
example, the following operation:
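Continuing the πSex, Salary(EMPLOYEE) example above, a corresponding SQL statement (a sketch; DISTINCT is added because SQL does not eliminate duplicates by default) would be:
SELECT DISTINCT Sex, Salary
FROM EMPLOYEE;
Several operations can also be combined into a single in-line relational algebra expression. For example, retrieving the first name, last name, and salary of all employees who work in department number 5 might be written as
πFname, Lname, Salary(σDno=5(EMPLOYEE))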
Figure 2(a) shows the result of this in-line relational algebra expression.
Alternatively, we can explicitly show the sequence of operations, giving a name to
each intermediate relation, as follows:
It is sometimes simpler to break down a complex sequence of operations by specify-
ing intermediate result relations than to write a single relational algebra expression.
We can also use this technique to rename the attributes in the intermediate and
result relations. This can be useful in connection with more complex operations
such as UNION and JOIN, as we shall see. To rename the attributes in a relation, we
simply list the new attribute names in parentheses, as in the following example:
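For instance (a sketch of such a sequence on the COMPANY schema), the department 5 query above can be broken into two steps with renamed result attributes:
TEMP ← σDno=5(EMPLOYEE)
R(First_name, Last_name, Salary) ← πFname, Lname, Salary(TEMP)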
These two operations are illustrated in Figure 2(b).
If no renaming is applied, the names of the attributes in the resulting relation of a
SELECT operation are the same as those in the original relation and in the same
order. For a PROJECT operation with no renaming, the resulting relation has the
same attribute names as those in the projection list and in the same order in which
they appear in the list.
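In the notation of this chapter, the general RENAME operation applied to a relation R of degree n can be written in any of the following three forms:
ρS(B1, B2, …, Bn)(R)     ρS(R)     ρ(B1, B2, …, Bn)(R)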
where the symbol ρ (rho) is used to denote the RENAME operator, S is the new rela-
tion name, and B1, B2, …, Bn are the new attribute names. The first expression
renames both the relation and its attributes, the second renames the relation only,
and the third renames the attributes only. If the attributes of R are (A1, A2, …, An) in
that order, then each Ai is renamed as Bi.
In SQL, a single query typically represents a complex relational algebra expression.
Renaming in SQL is accomplished by aliasing using AS, as in the following example:
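For instance (a sketch on the EMPLOYEE relation of Figure A.1), both the relation and the result attributes can be aliased:
SELECT E.Fname AS First_name, E.Lname AS Last_name, E.Salary AS Salary
FROM EMPLOYEE AS E
WHERE E.Dno = 5;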
4As a single relational algebra expression, this becomes Result ← πSsn (σDno=5 (EMPLOYEE) ) ∪
πSuper_ssn (σDno=5 (EMPLOYEE))
Several set theoretic operations are used to merge the elements of two sets in vari-
ous ways, including UNION, INTERSECTION, and SET DIFFERENCE (also called
MINUS or EXCEPT). These are binary operations; that is, each is applied to two sets
(of tuples). When these operations are adapted to relational databases, the two rela-
tions on which any of these three operations are applied must have the same type of
tuples; this condition has been called union compatibility or type compatibility. Two
relations R(A1, A2, …, An) and S(B1, B2, …, Bn) are said to be union compatible (or
type compatible) if they have the same degree n and if dom(Ai) = dom(Bi) for 1 ≤ i
≤ n. This means that the two relations have the same number of attributes and each
corresponding pair of attributes has the same domain.
Figure 4 illustrates the three operations. The relations STUDENT and INSTRUCTOR
in Figure 4(a) are union compatible and their tuples represent the names of stu-
dents and the names of instructors, respectively. The result of the UNION operation
in Figure 4(b) shows the names of all students and instructors. Note that duplicate
tuples appear only once in the result. The result of the INTERSECTION operation
(Figure 4(c)) includes only those who are both students and instructors.
The resulting relations from this sequence of operations are shown in Figure 5. The
EMP_DEPENDENTS relation is the result of applying the CARTESIAN PRODUCT
operation to EMPNAMES from Figure 5 with DEPENDENT from Figure A.1. In
EMP_DEPENDENTS, every tuple from EMPNAMES is combined with every tuple
from DEPENDENT, giving a result that is not very meaningful (every dependent is
combined with every female employee). We want to combine a female employee
tuple only with her particular dependents—namely, the DEPENDENT tuples whose
Essn value matches the Ssn value of the EMPLOYEE tuple. The ACTUAL_DEPENDENTS
relation accomplishes this. The EMP_DEPENDENTS relation is a good example of
the case where relational algebra can be correctly applied to yield results that make
no sense at all. It is the responsibility of the user to make sure to apply only mean-
ingful operations to relations.
The first operation is illustrated in Figure 6. Note that Mgr_ssn is a foreign key of the
DEPARTMENT relation that references Ssn, the primary key of the EMPLOYEE rela-
tion. This referential integrity constraint plays a role in having matching tuples in
the referenced relation EMPLOYEE.
The general form of a JOIN operation on two relations5 R(A1, A2, …, An) and S(B1,
B2, …, Bm) is
The result of the JOIN is a relation Q with n + m attributes Q(A1, A2, …, An, B1, B2,
… , Bm) in that order; Q has one tuple for each combination of tuples—one from R
and one from S—whenever the combination satisfies the join condition. This is the
main difference between CARTESIAN PRODUCT and JOIN. In JOIN, only combina-
tions of tuples satisfying the join condition appear in the result, whereas in the
CARTESIAN PRODUCT all combinations of tuples are included in the result. The
join condition is specified on attributes from the two relations R and S and is evalu-
ated for each combination of tuples. Each tuple combination for which the join
condition evaluates to TRUE is included in the resulting relation Q as a single com-
bined tuple.
5Again, notice that R and S can be any relations that result from general relational algebra expressions.
The same query can be done in two steps by creating an intermediate table DEPT as
follows:
If the attributes on which the natural join is specified already have the same names in
both relations, renaming is unnecessary. For example, to apply a natural join on the
Dnumber attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write
6NATURAL JOIN is basically an EQUIJOIN followed by the removal of the superfluous attributes.
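For example, using * for NATURAL JOIN, DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS joins the two relations on their common attribute Dnumber; a corresponding SQL statement (a sketch, assuming the DEPT_LOCATIONS relation of Figure A.2) is:
SELECT *
FROM DEPARTMENT NATURAL JOIN DEPT_LOCATIONS;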
Notice that if no combination of tuples satisfies the join condition, the result of a
JOIN is an empty relation with zero tuples. In general, if R has nR tuples and S has nS
tuples, the result of a JOIN operation R ⋈c S will have between zero and
nR * nS tuples. The expected size of the join result divided by the maximum size nR *
nS leads to a ratio called join selectivity, which is a property of each join condition.
If there is no join condition, all combinations of tuples qualify and the JOIN degen-
erates into a CARTESIAN PRODUCT, also called CROSS PRODUCT or CROSS JOIN.
As we can see, a single JOIN operation is used to combine data from two relations so
that related information can be presented in a single table. These operations are also
known as inner joins, to distinguish them from a different join variation called
outer joins (see Section 4.4). Informally, an inner join is a type of match and com-
bine operation defined formally as a combination of CARTESIAN PRODUCT and
SELECTION. Note that sometimes a join may be specified between a relation and
itself, as we will illustrate in Section 4.3. The NATURAL JOIN or EQUIJOIN operation
can also be specified among multiple tables, leading to an n-way join. For example,
consider the following three-way join:
This combines each project tuple with its controlling department tuple into a single
tuple, and then combines that tuple with an employee tuple that is the department
manager. The net result is a consolidated relation in which each tuple contains this
project-department-manager combined information.
In SQL, JOIN can be realized in several different ways. The first method is to specify
the join conditions in the WHERE clause, along with any other selection condi-
tions. This is very common. The second way is to use a nested relation. Another way
is to use the concept of joined tables. The construct of joined tables was added to
SQL2 to allow the user to specify explicitly all the various types of joins, because the
other methods were more limited. It also allows the user to clearly distinguish join
conditions from the selection conditions in the WHERE clause.
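For example, the three-way join above (a sketch, assuming the COMPANY schema of Figure A.1) can be written either with the join conditions in the WHERE clause or with explicit joined tables:
SELECT *
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum = Dnumber AND Mgr_ssn = Ssn;

SELECT *
FROM (PROJECT JOIN DEPARTMENT ON Dnum = Dnumber)
     JOIN EMPLOYEE ON Mgr_ssn = Ssn;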
Finally, apply the DIVISION operation to the two relations, which gives the desired
employees’ Social Security numbers:
The preceding operations are shown in Figure 8(a).
In general, the DIVISION operation is applied to two relations R(Z) ÷ S(X), where
the attributes of S are a subset of the attributes of R; that is, X ⊆ Z. Let Y be the set
of attributes of R that are not attributes of S; that is, Y = Z – X (and hence Z = X ∪
Y ). The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear
in R with tR [Y] = t, and with tR [X] = tS for every tuple tS in S. This means that, for
a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in
combination with every tuple in S. Note that in the formulation of the DIVISION
operation, the tuples in the denominator relation S restrict the numerator relation R
by selecting those tuples in the result that match all values present in the denomina-
tor. It is not necessary to know what those values are as they can be computed by
another operation, as illustrated in the SMITH_PNOS relation in the above example.
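SQL has no direct DIVISION operator; a common way to express the same requirement (a sketch for the example above—employees who work on all the projects that 'John Smith' works on—assuming the COMPANY schema of Figure A.1) uses nested NOT EXISTS subqueries:
SELECT E.Lname, E.Fname
FROM EMPLOYEE E
WHERE NOT EXISTS
      ( SELECT W1.Pno                      -- projects that Smith works on
        FROM WORKS_ON W1, EMPLOYEE S
        WHERE S.Fname = 'John' AND S.Lname = 'Smith' AND W1.Essn = S.Ssn
          AND NOT EXISTS
              ( SELECT *                   -- ... on which E does not work
                FROM WORKS_ON W2
                WHERE W2.Essn = E.Ssn AND W2.Pno = W1.Pno ) );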
Figure 8(b) illustrates a DIVISION operation where X = {A}, Y = {B}, and Z = {A, B}.
Notice that the tuples (values) b1 and b4 appear in R in combination with all three
tuples in S; that is why they appear in the resulting relation T. All other values of B
in R do not appear with all the tuples in S and are not selected: b2 does not appear
with a2, and b3 does not appear with a1.
The DIVISION operation can be expressed as a sequence of π, ×, and – operations as
follows:
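One standard formulation, with Y = Z − X as above, is:
T1 ← πY(R)
T2 ← πY((S × T1) − R)
T ← T1 − T2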
A query tree is a tree data structure that corresponds to a relational algebra expres-
sion. It represents the input relations of the query as leaf nodes of the tree, and rep-
resents the relational algebra operations as internal nodes. An execution of the
query tree consists of executing an internal node operation whenever its operands
(represented by its child nodes) are available, and then replacing that internal node
by the relation that results from executing the operation. The execution terminates
when the root node is executed and produces the result relation for the query.
Figure 9 shows a query tree for Query 2: For every project located in ‘Stafford’, list the
project number, the controlling department number, and the department manager’s
last name, address, and birth date. This query is specified on the relational schema of
Figure A.2 and corresponds to the following relational algebra expression:
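One way to write this query as a single expression is
πPnumber, Dnum, Lname, Address, Bdate(((σPlocation='Stafford'(PROJECT)) ⋈Dnum=Dnumber (DEPARTMENT)) ⋈Mgr_ssn=Ssn (EMPLOYEE))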
In Figure 9, the three leaf nodes P, D, and E represent the three relations PROJECT,
DEPARTMENT, and EMPLOYEE. The relational algebra operations in the expression
are represented by internal tree nodes. The query tree signifies an explicit order of
execution in the following sense. In order to execute Q2, the node marked (1) in
Figure 9 must begin execution before node (2) because some resulting tuples of
operation (1) must be available before we can begin to execute operation (2).
Similarly, node (2) must begin to execute and produce results before node (3) can
start execution, and so on. In general, a query tree gives a good visual representation
and understanding of the query in terms of the relational operations it uses and is
recommended as an additional means for expressing queries in relational algebra.
The general form of the GENERALIZED PROJECTION operation is πF1, F2, …, Fn(R), where F1, F2, …, Fn are functions over the attributes in relation R and may involve
arithmetic operations and constant values. This operation is helpful when develop-
ing reports where computed values have to be produced in the columns of a query
result.
REPORT ← ρ(Ssn, Net_salary, Bonus, Tax)(πSsn, Salary−Deduction, 2000*Years_service, 0.25*Salary(EMPLOYEE)).
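Assuming EMPLOYEE carries the attributes Deduction and Years_service used in this expression (they are not part of the basic COMPANY schema), an SQL counterpart might be:
SELECT Ssn, Salary - Deduction AS Net_salary,           -- Deduction, Years_service assumed
       2000 * Years_service AS Bonus, 0.25 * Salary AS Tax
FROM EMPLOYEE;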
7There is no single agreed-upon notation for specifying aggregate functions. In some cases a “script A” is
used.
The general form of the AGGREGATE FUNCTION operation is <grouping attributes> ℑ <function list>(R), where <grouping attributes> is a list of attributes of R and <function list> is a list of (<function> <attribute>) pairs, with <function> being a function such as SUM, AVERAGE, MAXIMUM, MINIMUM, or COUNT. The
resulting relation has the grouping attributes plus one attribute for each element in
the function list. For example, to retrieve each department number, the number of
employees in the department, and their average salary, while renaming the resulting
attributes as indicated below, we write:
ρR(Dno, No_of_employees, Average_sal)(Dno ℑ COUNT Ssn, AVERAGE Salary (EMPLOYEE))
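The same request in SQL (a sketch on the COMPANY schema) is written with GROUP BY:
SELECT Dno, COUNT(Ssn) AS No_of_employees, AVG(Salary) AS Average_sal
FROM EMPLOYEE
GROUP BY Dno;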
If no grouping attributes are specified, the functions are applied to all the tuples in
the relation, so the resulting relation has a single tuple only. For example, Figure
10(c) shows the result of the following operation: ℑ COUNT Ssn, AVERAGE Salary(EMPLOYEE)
It is important to note that, in general, duplicates are not eliminated when an aggre-
gate function is applied; this way, the normal interpretation of functions such as
8Note that this is an arbitrary notation we are suggesting. There is no standard notation.
It is relatively straightforward in the relational algebra to specify all employees
supervised by e at a specific level by joining the table with itself one or more times.
However, it is difficult to specify all supervisees at all levels. For example, to specify
the Ssns of all employees e′ directly supervised—at level one—by the employee e
whose name is ‘James Borg’ (see Figure A.1), we can apply the following operation:
To retrieve all employees supervised by Borg at level 2—that is, all employees e″
supervised by some employee e′ who is directly supervised by Borg—we can apply
another JOIN to the result of the first query, as follows:
To get both sets of employees supervised at levels 1 and 2 by ‘James Borg’, we can
apply the UNION operation to the two results, as follows:
The results of these queries are illustrated in Figure 11. Although it is possible to
retrieve employees at each level and then take their UNION, we cannot, in general,
specify a query such as “retrieve the supervisees of ‘James Borg’ at all levels” without
utilizing a looping mechanism unless we know the maximum number of levels.10
An operation called the transitive closure of relations has been proposed to compute
the recursive relationship as far as the recursion proceeds.
9In SQL, the option of eliminating duplicates before applying the aggregate function is available by
including the keyword DISTINCT.
10The SQL3 standard includes syntax for recursive closure.
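As noted in footnote 10, later SQL standards added recursive query syntax. A sketch of the Borg query at all levels, using WITH RECURSIVE on the COMPANY schema (attribute and relation names as in Figure A.1; the name SUPERVISEES is chosen here for illustration), might look like:
WITH RECURSIVE SUPERVISEES (Ssn) AS
( SELECT E.Ssn                              -- level-1 supervisees of Borg
  FROM EMPLOYEE E, EMPLOYEE B
  WHERE E.Super_ssn = B.Ssn AND B.Fname = 'James' AND B.Lname = 'Borg'
  UNION
  SELECT E.Ssn                              -- supervisees of supervisees, at all levels
  FROM EMPLOYEE E, SUPERVISEES S
  WHERE E.Super_ssn = S.Ssn )
SELECT Ssn FROM SUPERVISEES;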
A set of operations, called outer joins, were developed for the case where the user
wants to keep all the tuples in R, or all those in S, or all those in both relations in the
result of the JOIN, regardless of whether or not they have matching tuples in the
other relation. This satisfies the need of queries in which tuples from two tables are to be combined by matching corresponding tuples when they exist, while unmatched tuples are still retained in the result.
The LEFT OUTER JOIN operation keeps every tuple in the first, or left, relation R in the result of the join with S; if no matching tuple is found in S, then the attributes of S in the join result are
filled or padded with NULL values. The result of these operations is shown in Figure
12.
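An operation of this kind—listing all employee names together with the name of the department they manage, if any—has an SQL counterpart along these lines (a sketch on the COMPANY schema):
SELECT E.Fname, E.Minit, E.Lname, D.Dname
FROM EMPLOYEE E LEFT OUTER JOIN DEPARTMENT D ON E.Ssn = D.Mgr_ssn;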
Those attributes that are not union compatible from either relation are also kept in the result relation T(X, Y, Z). It is therefore the same as a
FULL OUTER JOIN on the common attributes.
Two tuples t1 in R and t2 in S are said to match if t1[X] = t2[X]. These will be com-
bined (unioned) into a single tuple in T. Tuples in either relation that have no
matching tuple in the other relation are padded with NULL values. For example, an
OUTER UNION can be applied to two relations whose schemas are STUDENT(Name,
Ssn, Department, Advisor) and INSTRUCTOR(Name, Ssn, Department, Rank). Tuples
from the two relations are matched based on having the same combination of values
of the shared attributes—Name, Ssn, Department. The resulting relation,
STUDENT_OR_INSTRUCTOR, will have the following attributes: STUDENT_OR_INSTRUCTOR(Name, Ssn, Department, Advisor, Rank).
All the tuples from both relations are included in the result, but tuples with the same
(Name, Ssn, Department) combination will appear only once in the result. Tuples
appearing only in STUDENT will have a NULL for the Rank attribute, whereas tuples
appearing only in INSTRUCTOR will have a NULL for the Advisor attribute. A tuple
that exists in both relations, which represents a student who is also an instructor, will
have values for all its attributes.11
Notice that the same person may still appear twice in the result. For example, we
could have a graduate student in the Mathematics department who is an instructor
in the Computer Science department. Although the two tuples representing that
person in STUDENT and INSTRUCTOR will have the same (Name, Ssn) values, they
will not agree on the Department value, and so will not be matched. This is because
Department has two different meanings in STUDENT (the department where the per-
son studies) and INSTRUCTOR (the department where the person is employed as an
instructor). If we wanted to apply the OUTER UNION based on the same (Name, Ssn)
combination only, we should rename the Department attribute in each table to reflect
that they have different meanings and designate them as not being part of the
union-compatible attributes. For example, we could rename the attributes as
MajorDept in STUDENT and WorkDept in INSTRUCTOR.
The following are additional examples to illustrate the use of the relational algebra
operations. All examples refer to the database in Figure A.1. In general, the same
query can be stated in numerous ways using the various operations. We will state
each query in one way and leave it to the reader to come up with equivalent formu-
lations.
Query 1. Retrieve the name and address of all employees who work for the
‘Research’ department.
11Note that OUTER UNION is equivalent to a FULL OUTER JOIN if the join attributes are all the com-
mon attributes of the two relations.
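One possible formulation is:
RESEARCH_DEPT ← σDname='Research'(DEPARTMENT)
RESEARCH_DEPT_EMPS ← (RESEARCH_DEPT ⋈Dnumber=Dno EMPLOYEE)
RESULT ← πFname, Lname, Address(RESEARCH_DEPT_EMPS)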
This query could be specified in other ways; for example, the order of the JOIN and
SELECT operations could be reversed, or the JOIN could be replaced by a NATURAL
JOIN after renaming one of the join attributes to match the other join attribute
name.
Query 2. For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last name,
address, and birth date.
In this example, we first select the projects located in Stafford, then join them with
their controlling departments, and then join the result with the department man-
agers. Finally, we apply a project operation on the desired attributes.
Query 3. Find the names of employees who work on all the projects controlled
by department number 5.
In this query, we first create a table DEPT5_PROJS that contains the project numbers
of all projects controlled by department 5. Then we create a table EMP_PROJ that
holds (Ssn, Pno) tuples, and apply the division operation. Notice that we renamed
the attributes so that they will be correctly used in the division operation. Finally, we
join the result of the division, which holds only Ssn values, with the EMPLOYEE
table to retrieve the desired attributes from EMPLOYEE.
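Following the steps just described, one possible formulation is:
DEPT5_PROJS ← ρ(Pno)(πPnumber(σDnum=5(PROJECT)))
EMP_PROJ ← ρ(Ssn, Pno)(πEssn, Pno(WORKS_ON))
RESULT_EMP_SSNS ← EMP_PROJ ÷ DEPT5_PROJS
RESULT ← πLname, Fname(RESULT_EMP_SSNS * EMPLOYEE)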
Query 4. Make a list of project numbers for projects that involve an employee
whose last name is ‘Smith’, either as a worker or as a manager of the department
that controls the project.
Query 5. List the names of all employees with two or more dependents.
Query 6. Retrieve the names of employees who have no dependents.
We first retrieve a relation with all employee Ssns in ALL_EMPS. Then we create a
table with the Ssns of employees who have at least one dependent in
EMPS_WITH_DEPS. Then we apply the SET DIFFERENCE operation to retrieve
the Ssns of employees with no dependents in EMPS_WITHOUT_DEPS, and finally join this
with EMPLOYEE to retrieve the desired attributes. As a single in-line expression, this
query becomes:
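RESULT ← πLname, Fname((πSsn(EMPLOYEE) − ρSsn(πEssn(DEPENDENT))) * EMPLOYEE)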
Query 7. List the names of managers who have at least one dependent.
In this query, we retrieve the Ssns of managers in MGRS, and the Ssns of employees
with at least one dependent in EMPS_WITH_DEPS, then we apply the SET
INTERSECTION operation to get the Ssns of managers who have at least one
dependent.
As we mentioned earlier, the same query can be specified in many different ways in
relational algebra. In particular, the operations can often be applied in various orders.
In addition, some operations can be used to replace others; for example, the
INTERSECTION operation in Q7 can be replaced by a NATURAL JOIN. As an exercise, try
to do each of these sample queries using different operations.12 We showed how to
write queries as single relational algebra expressions for queries Q1, Q4, and Q6. Try to
write the remaining queries as single expressions. In Sections 6 and 7, we show how
these queries are written in other relational languages.
It has been shown that any retrieval that can be specified in the basic relational alge-
bra can also be specified in relational calculus, and vice versa; in other words, the
expressive power of the languages is identical. This led to the definition of the con-
cept of a relationally complete language. A relational query language L is considered
relationally complete if we can express in L any query that can be expressed in rela-
tional calculus. Relational completeness has become an important basis for compar-
ing the expressive power of high-level query languages. However, as we saw in
Section 4, certain frequently required queries in database applications cannot be
expressed in basic relational algebra or calculus. Most relational query languages are
relationally complete but have more expressive power than relational algebra or rela-
tional calculus because of additional operations such as aggregate functions, group-
ing, and ordering. As we mentioned in the introduction to this chapter, the
relational calculus is important for two reasons. First, it has a firm basis in mathe-
matical logic. Second, the standard query language (SQL) for RDBMSs has some of
its foundations in the tuple relational calculus.
12When queries are optimized, the system will choose a particular sequence of operations that corre-
sponds to an execution strategy that can be executed efficiently.
Our examples refer to the database shown in Figures A.1 and A.3. We will use the
same queries that were used in Section 5. Sections 6.6, 6.7, and 6.8 discuss dealing
with universal quantifiers and safety of expression issues. (Students interested in a
basic introduction to tuple relational calculus may skip these sections.)
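A simple tuple relational calculus query has the form {t | COND(t)}, where t is a tuple variable and COND(t) is a conditional (Boolean) expression involving t. For example, to find all employees whose salary is above $50,000, we can write
{t | EMPLOYEE(t) AND t.Salary>50000}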
The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is
EMPLOYEE. Each EMPLOYEE tuple t that satisfies the condition t.Salary>50000 will
be retrieved. Notice that t.Salary references attribute Salary of tuple variable t; this
notation resembles how attribute names are qualified with relation names or aliases
in SQL; t.Salary is the same as writing t[Salary].
The above query retrieves all attribute values for each selected EMPLOYEE tuple t. To
retrieve only some of the attributes—say, the first and last names—we write
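One way to write this is
{t.Fname, t.Lname | EMPLOYEE(t) AND t.Salary>50000}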
Informally, we need to specify the following information in a tuple relational calcu-
lus expression:
■ For each tuple variable t, the range relation R of t. This value is specified by
a condition of the form R(t). If we do not specify a range relation, then the
variable t will range over all possible tuples “in the universe” as it is not
restricted to any one relation.
■ A condition to select particular combinations of tuples. As tuple variables
range over their respective range relations, the condition is evaluated for
every possible combination of tuples to identify the selected combinations
for which the condition evaluates to TRUE.
■ A set of attributes to be retrieved, the requested attributes. The values of
these attributes are retrieved for each selected combination of tuples.
Before we discuss the formal syntax of tuple relational calculus, consider another
query.
Query 0. Retrieve the birth date and address of the employee (or employees)
whose name is John B. Smith.
In tuple relational calculus, we first specify the requested attributes t.Bdate and
t.Address for each selected tuple t. Then we specify the condition for selecting a
tuple following the bar (|)—namely, that t be a tuple of the EMPLOYEE relation
whose Fname, Minit, and Lname attribute values are ‘John’, ‘B’, and ‘Smith’, respectively.
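In this notation, Q0 can be written as
{t.Bdate, t.Address | EMPLOYEE(t) AND t.Fname='John' AND t.Minit='B' AND t.Lname='Smith'}
More generally, a tuple relational calculus expression has the form
{t1.Aj, t2.Ak, …, tn.Am | COND(t1, t2, …, tn, tn+1, tn+2, …, tn+m)}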
where t1, t2, …, tn, tn+1, …, tn+m are tuple variables, each Ai is an attribute of the rela-
tion on which ti ranges, and COND is a condition or formula of the tuple relational calculus. A formula is made up of predicate calculus atoms, which can be any one of the following:
1. An atom of the form R(ti), where R is a relation name and ti is a tuple vari-
able. This atom identifies the range of the tuple variable ti as the relation
whose name is R. It evaluates to TRUE if ti is a tuple in the relation R, and
evaluates to FALSE otherwise.
2. An atom of the form ti.A op tj.B, where op is one of the comparison opera-
tors in the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an attribute of
the relation on which ti ranges, and B is an attribute of the relation on which
tj ranges.
3. An atom of the form ti.A op c or c op tj.B, where op is one of the compari-
son operators in the set {=, <, ≤, >, ≥, ≠}, ti and tj are tuple variables, A is an
attribute of the relation on which ti ranges, B is an attribute of the relation
on which tj ranges, and c is a constant value.
Each of the preceding atoms evaluates to either TRUE or FALSE for a specific combi-
nation of tuples; this is called the truth value of an atom. In general, a tuple variable
t ranges over all possible tuples in the universe. For atoms of the form R(t), if t is
assigned to a tuple that is a member of the specified relation R, the atom is TRUE; oth-
erwise, it is FALSE. In atoms of types 2 and 3, if the tuple variables are assigned to
tuples such that the values of the specified attributes of the tuples satisfy the condi-
tion, then the atom is TRUE.
■ Rule 1: Every atom is a formula.
13Also called a well-formed formula, or WFF, in mathematical logic.
■ Rule 2: If F1 and F2 are formulas, then so are (F1 AND F2), (F1 OR F2), NOT (F1), and NOT (F2). The truth values of these formulas are derived from their component formulas F1 and F2 as follows:
a. (F1 AND F2) is TRUE if both F1 and F2 are TRUE; otherwise, it is FALSE.
b. (F1 OR F2) is FALSE if both F1 and F2 are FALSE; otherwise, it is TRUE.
c. NOT (F1) is TRUE if F1 is FALSE; it is FALSE if F1 is TRUE.
d. NOT (F2) is TRUE if F2 is FALSE; it is FALSE if F2 is TRUE.
■ An occurrence of a tuple variable in a formula F that is an atom is free in F.
■ All free occurrences of a tuple variable t in F are bound in a formula F′ of the
form F′ = (∃t)(F) or F′ = (∀t)(F). The tuple variable is bound to the quanti-
fier specified in F′. For example, consider the following formulas:
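(Illustrative formulas, assuming d ranges over DEPARTMENT and t over EMPLOYEE, as in Figure A.1:)
F1: d.Dname='Research'
F2: (∃t)(d.Dnumber=t.Dno)
F3: (∀d)(d.Mgr_ssn='888665555')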
The tuple variable d is free in both F1 and F2, whereas it is bound to the (∀) quan-
tifier in F3. Variable t is bound to the (∃) quantifier in F2.
We can now give Rules 3 and 4 for the definition of a formula we started earlier:
■ Rule 3: If F is a formula, then so is (∃t)(F), where t is a tuple variable. The
formula (∃t)(F) is TRUE if the formula F evaluates to TRUE for some (at least
one) tuple assigned to free occurrences of t in F; otherwise, (∃t)(F) is FALSE.
■ Rule 4: If F is a formula, then so is (∀t)(F), where t is a tuple variable. The
formula (∀t)(F) is TRUE if the formula F evaluates to TRUE for every tuple
(in the universe) assigned to free occurrences of t in F; otherwise, (∀t)(F) is
FALSE.
The (∃) quantifier is called an existential quantifier because a formula (∃t)(F) is
TRUE if there exists some tuple that makes F TRUE. For the universal quantifier,
(∀t)(F) is TRUE if every possible tuple that can be assigned to free occurrences of t
in F is substituted for t, and F is TRUE for every such substitution. It is called the uni-
versal or for all quantifier because every tuple in the universe of tuples must make F
TRUE to make the quantified formula TRUE.
Query 1. List the name and address of all employees who work for the
‘Research’ department.
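One way to write Q1 is
{t.Fname, t.Lname, t.Address | EMPLOYEE(t) AND (∃d)(DEPARTMENT(d) AND d.Dname='Research' AND d.Dnumber=t.Dno)}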
The only free tuple variables in a tuple relational calculus expression should be those
that appear to the left of the bar (|). In Q1, t is the only free variable; it is then bound
successively to each tuple. If a tuple satisfies the conditions specified after the bar in
Q1, the attributes Fname, Lname, and Address are retrieved for each such tuple. The
conditions EMPLOYEE(t) and DEPARTMENT(d) specify the range relations for t and
d. The condition d.Dname = ‘Research’ is a selection condition and corresponds to a
SELECT operation in the relational algebra, whereas the condition d.Dnumber =
t.Dno is a join condition and is similar in purpose to the (INNER) JOIN operation
(see Section 3).
Query 2. For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last name,
birth date, and address.
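One way to write Q2 is
{p.Pnumber, p.Dnum, m.Lname, m.Bdate, m.Address | PROJECT(p) AND EMPLOYEE(m) AND p.Plocation='Stafford' AND ((∃d)(DEPARTMENT(d) AND p.Dnum=d.Dnumber AND d.Mgr_ssn=m.Ssn))}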
In Q2 there are two free tuple variables, p and m. Tuple variable d is bound to the
existential quantifier. The query condition is evaluated for every combination of
tuples assigned to p and m, and out of all possible combinations of tuples to which
p and m are bound, only the combinations that satisfy the condition are selected.
Several tuple variables in a query can range over the same relation. For example, to
specify Q8—for each employee, retrieve the employee’s first and last name and the
first and last name of his or her immediate supervisor—we specify two tuple vari-
ables e and s that both range over the EMPLOYEE relation:
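One way to write Q8 is
{e.Fname, e.Lname, s.Fname, s.Lname | EMPLOYEE(e) AND EMPLOYEE(s) AND e.Super_ssn=s.Ssn}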
Query 3′. List the name of each employee who works on some project con-
trolled by department number 5. This is a variation of Q3 in which all is
changed to some. In this case we need two join conditions and two existential
quantifiers.
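One way to write Q3′ is
{e.Lname, e.Fname | EMPLOYEE(e) AND ((∃x)(∃w)(PROJECT(x) AND WORKS_ON(w) AND x.Dnum=5 AND w.Essn=e.Ssn AND x.Pnumber=w.Pno))}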
Query 4. Make a list of project numbers for projects that involve an employee
whose last name is ‘Smith’, either as a worker or as manager of the controlling
department for the project.
Compare this with the relational algebra version of this query in Section 5. The
UNION operation in relational algebra can usually be substituted with an OR con-
nective in relational calculus.
In the next section we discuss the relationship between the universal and existential
quantifiers and show how one can be transformed into the other.
Query 3. List the names of employees who work on all the projects controlled
by department number 5. One way to specify this query is to use the universal
quantifier as shown:
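{e.Lname, e.Fname | EMPLOYEE(e) AND ((∀x)(NOT(PROJECT(x)) OR NOT(x.Dnum=5) OR ((∃w)(WORKS_ON(w) AND w.Essn=e.Ssn AND x.Pnumber=w.Pno))))}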
We want to make sure that a selected employee e works on all the projects controlled
by department 5, but the definition of universal quantifier says that to make the
quantified formula TRUE, the inner formula must be TRUE for all tuples in the uni-
verse. The trick is to exclude from the universal quantification all tuples that we are
not interested in by making the condition TRUE for all such tuples. This is necessary
because a universally quantified tuple variable, such as x in Q3, must evaluate to
TRUE for every possible tuple assigned to it to make the quantified formula TRUE.
1. For the formula F′ = (∀x)(F) to be TRUE, we must have the formula F be
TRUE for all tuples in the universe that can be assigned to x. However, in Q3 we
are only interested in F being TRUE for all tuples of the PROJECT relation
that are controlled by department 5. Hence, the formula F is of the form
(NOT(PROJECT(x)) OR F1). The ‘NOT (PROJECT(x)) OR …’ condition is
TRUE for all tuples not in the PROJECT relation and has the effect of elimi-
nating these tuples from consideration in the truth value of F1. For every
tuple in the PROJECT relation, F1 must be TRUE if F′ is to be TRUE.
2. Using the same line of reasoning, we do not want to consider tuples in the
PROJECT relation that are not controlled by department number 5, since we
are only interested in PROJECT tuples whose Dnum=5. Therefore, we can
write:
3. Formula F1, hence, is of the form NOT(x.Dnum=5) OR F2. In the context of
Q3, this means that, for a tuple x in the PROJECT relation, either its Dnum≠5
or it must satisfy F2.
4. Finally, F2 gives the condition that we want to hold for a selected EMPLOYEE
tuple: that the employee works on every PROJECT tuple that has not been
excluded yet. Such employee tuples are selected by the query.
In English, Q3 gives the following condition for selecting an EMPLOYEE tuple e: For
every tuple x in the PROJECT relation with x.Dnum=5, there must exist a tuple w in
WORKS_ON such that w.Essn=e.Ssn and w.Pno=x.Pnumber. This is equivalent to
saying that EMPLOYEE e works on every PROJECT x in DEPARTMENT number 5.
(Whew!)
Using the general transformation from universal to existential quantifiers given in
Section 6.6, we can rephrase the query in Q3 as shown in Q3A, which uses a negated
existential quantifier instead of the universal quantifier:
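{e.Lname, e.Fname | EMPLOYEE(e) AND (NOT(∃x)(PROJECT(x) AND x.Dnum=5 AND (NOT(∃w)(WORKS_ON(w) AND w.Essn=e.Ssn AND x.Pnumber=w.Pno))))}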
We now give some additional examples of queries that use quantifiers.
Query 6. List the names of employees who have no dependents.
Query 7. List the names of managers who have at least one dependent.
This query is handled by interpreting managers who have at least one dependent as
managers for whom there exists some dependent.
A safe expression in relational calculus is one that is guaranteed to yield a finite number of tuples as its result; otherwise, the expression is called unsafe. For example, the expression {t | NOT(EMPLOYEE(t))} is unsafe because it yields all tuples in the universe that are not EMPLOYEE tuples,
which are infinitely numerous. If we follow the rules for Q3 discussed earlier, we will
get a safe expression when using universal quantifiers. We can define safe expres-
sions more precisely by introducing the concept of the domain of a tuple relational
calculus expression: This is the set of all values that either appear as constant values
in the expression or exist in any tuple in the relations referenced in the expression.
For example, the domain of {t | NOT(EMPLOYEE(t))} is the set of all attribute values
appearing in some tuple of the EMPLOYEE relation (for any attribute).
An expression is said to be safe if all values in its result are from the domain of the
expression. Notice that the result of {t | NOT(EMPLOYEE(t))} is unsafe, since it will,
in general, include tuples (and hence values) from outside the EMPLOYEE relation;
such values are not in the domain of the expression. All of our other examples are
safe expressions.
Domain calculus differs from tuple calculus in the type of variables used in formu-
las: Rather than having variables range over tuples, the variables range over single
values from domains of attributes. To form a relation of degree n for a query result,
we must have n of these domain variables—one for each attribute. An expression of
the domain calculus is of the form
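{x1, x2, …, xn | COND(x1, x2, …, xn, xn+1, xn+2, …, xn+m)}
where x1, x2, …, xn, xn+1, …, xn+m are domain variables that range over domains (of attributes), and COND is a condition or formula of the domain relational calculus.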
A formula is made up of atoms. The atoms of a formula are slightly different from
those for the tuple calculus and can be one of the following:
1. An atom of the form R(x1, x2, …, xj), where R is the name of a relation of
degree j and each xi, 1 ≤ i ≤ j, is a domain variable. This atom states that a list
of values <x1, x2, …, xj> must be a tuple in the relation whose name is R,
where xi is the value of the ith attribute of the tuple. To make a domain
calculus expression more concise, we can drop the commas in a list of vari-
ables; thus, we can write R(x1 x2 … xj) instead of R(x1, x2, …, xj).
2. An atom of the form xi op xj, where op is one of the comparison operators in
the set {=, <, ≤, >, ≥, ≠}, and xi and xj are domain variables.
3. An atom of the form xi op c or c op xj, where op is one of the comparison
operators in the set {=, <, ≤, >, ≥, ≠}, xi and xj are domain variables, and c is a
constant value.
As in tuple calculus, atoms evaluate to either TRUE or FALSE for a specific set of val-
ues, called the truth values of the atoms. In case 1, if the domain variables are
assigned values corresponding to a tuple of the specified relation R, then the atom is
TRUE. In cases 2 and 3, if the domain variables are assigned values that satisfy the
condition, then the atom is TRUE.
In a similar way to the tuple relational calculus, formulas are made up of atoms,
variables, and quantifiers, so we will not repeat the specifications for formulas here.
Some examples of queries specified in the domain calculus follow. We will use low-
ercase letters l, m, n, …, x, y, z for domain variables.
Query 0. List the birth date and address of the employee whose name is ‘John
B. Smith’.
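Assuming the ten EMPLOYEE attributes (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno) correspond to the domain variables q, r, s, t, u, v, w, x, y, z in that order, Q0 can be written as
{u, v | (∃q)(∃r)(∃s)(∃t)(∃w)(∃x)(∃y)(∃z)(EMPLOYEE(qrstuvwxyz) AND q='John' AND r='B' AND s='Smith')}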
An alternative shorthand notation, used in QBE, for writing this query is to assign
the constants ‘John’, ‘B’, and ‘Smith’ directly as shown in Q0A. Here, all variables not
appearing to the left of the bar are implicitly existentially quantified:15
Query 1. Retrieve the name and address of all employees who work for the
‘Research’ department.
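With the same variable conventions as in Q0, and l, m, n, o standing for the DEPARTMENT attributes (Dname, Dnumber, Mgr_ssn, Mgr_start_date), one way to write Q1 is
{q, s, v | (∃z)(EMPLOYEE(qrstuvwxyz) AND (∃l)(∃m)(DEPARTMENT(lmno) AND l='Research' AND m=z))}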
A condition relating two domain variables that range over attributes from two rela-
tions, such as m = z in Q1, is a join condition, whereas a condition that relates a
domain variable to a constant, such as l = ‘Research’, is a selection condition.
14Note that the notation of quantifying only the domain variables actually used in conditions and of
showing a predicate such as EMPLOYEE(qrstuvwxyz) without separating domain variables with commas
is an abbreviated notation used for convenience; it is not the correct formal notation.
15Again, this is not a formally accurate notation.
Query 2. For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last name,
birth date, and address.
Query 6. List the names of employees who have no dependents.
Q6 can be restated using universal quantifiers instead of the existential quantifiers,
as shown in Q6A:
Query 7. List the names of managers who have at least one dependent.
As we mentioned earlier, it can be shown that any query that can be expressed in the
basic relational algebra can also be expressed in the domain or tuple relational cal-
culus. Also, any safe expression in the domain or tuple relational calculus can be
expressed in the basic relational algebra.
In Sections 1 through 3, we introduced the basic relational algebra operations and
illustrated the types of queries for which each is used. First, we discussed the unary
relational operators SELECT and PROJECT, as well as the RENAME operation. Then,
we discussed binary set theoretic operations requiring that relations on which they
are applied be union (or type) compatible; these include UNION, INTERSECTION, and
SET DIFFERENCE. The CARTESIAN PRODUCT operation is a set operation that can
be used to combine tuples from two relations, producing all possible combinations. It
is rarely used in practice; however, we showed how CARTESIAN PRODUCT followed
by SELECT can be used to define matching tuples from two relations and leads to the
JOIN operation. Different JOIN operations called THETA JOIN, EQUIJOIN, and
NATURAL JOIN were introduced. Query trees were introduced as a graphical represen-
tation of relational algebra queries, which can also be used as the basis for internal
data structures that the DBMS can use to represent a query.
We discussed some important types of queries that cannot be stated with the basic
relational algebra operations but are important for practical situations. We intro-
duced GENERALIZED PROJECTION to use functions of attributes in the projection
list and the AGGREGATE FUNCTION operation to deal with aggregate types of sta-
tistical requests that summarize the information in the tables. We discussed recur-
sive queries, for which there is no direct support in the algebra but which can be
handled in a step-by-step approach, as we demonstrated. Then we presented the
OUTER JOIN and OUTER UNION operations, which extend JOIN and UNION and
allow all information in source relations to be preserved in the result.
The last two sections described the basic concepts behind relational calculus, which
is based on the branch of mathematical logic called predicate calculus. There are
two types of relational calculi: (1) the tuple relational calculus, which uses tuple
variables that range over tuples (rows) of relations, and (2) the domain relational
calculus, which uses domain variables that range over domains (columns of rela-
tions). In relational calculus, a query is specified in a single declarative statement,
without specifying any order or method for retrieving the query result. Hence, rela-
tional calculus is often considered to be a higher-level declarative language than the
relational algebra, because a relational calculus expression states what we want to
retrieve regardless of how the query may be executed.
We discussed the syntax of relational calculus queries using both tuple and domain
variables. We introduced query graphs as an internal representation for queries in
relational calculus. We also discussed the existential quantifier (∃) and the universal
quantifier (∀). We saw that relational calculus variables are bound by these quanti-
fiers. We described in detail how queries with universal quantification are written,
and we discussed the problem of specifying safe queries whose results are finite. We
also discussed rules for transforming universal into existential quantifiers, and vice
versa. It is the quantifiers that give expressive power to the relational calculus, mak-
ing it equivalent to the basic relational algebra. There is no analog to grouping and
aggregation functions in basic relational calculus, although some extensions have
been suggested.
2. What is union compatibility? Why do the UNION, INTERSECTION, and
DIFFERENCE operations require that the relations on which they are applied
be union compatible?
3. Discuss some types of queries for which renaming of attributes is necessary
in order to specify the query unambiguously.
4. Discuss the various types of inner join operations. Why is theta join
required?
5. What role does the concept of foreign key play when specifying the most
common types of meaningful join operations?
6. What is the FUNCTION operation? What is it used for?
7. How are the OUTER JOIN operations different from the INNER JOIN opera-
tions? How is the OUTER UNION operation different from UNION?
8. In what sense does relational calculus differ from relational algebra, and in
what sense are they similar?
9. How does tuple relational calculus differ from domain relational calculus?
10. Discuss the meanings of the existential quantifier (∃) and the universal
quantifier (∀).
11. Define the following terms with respect to the tuple calculus: tuple variable,
range relation, atom, formula, and expression.
12. Define the following terms with respect to the domain calculus: domain vari-
able, range relation, atom, formula, and expression.
13. What is meant by a safe expression in relational calculus?
14. When is a query language called relationally complete?
15. Show the result of each of the sample queries in Section 5 as it would apply to the database state in Figure A.1.
16. Specify the following queries on the COMPANY relational database schema
shown in Figure A.2, using the relational operators discussed in this chapter.
Also show the result of each query as it would apply to the database state in
Figure A.1.
a. Retrieve the names of all employees in department 5 who work more than
10 hours per week on the ProductX project.
b. List the names of all employees who have a dependent with the same first
name as themselves.
c. Find the names of all employees who are directly supervised by ‘Franklin
Wong’.
d. For each project, list the project name and the total hours per week (by all
employees) spent on that project.
e. Retrieve the names of all employees who work on every project.
f. Retrieve the names of all employees who do not work on any project.
g. For each department, retrieve the department name and the average
salary of all employees working in that department.
h. Retrieve the average salary of all female employees.
i. Find the names and addresses of all employees who work on at least one
project located in Houston but whose department has no location in
Houston.
j. List the last names of all department managers who have no dependents.
17. From the chapter “The Relational Data Model and Relational Database
Constraints,” consider the AIRLINE relational database schema shown in its
Figure 8, which was described in its Exercise 12. Specify the following queries
in relational algebra:
a. For each flight, list the flight number, the departure airport for the first leg
of the flight, and the arrival airport for the last leg of the flight.
b. List the flight numbers and weekdays of all flights or flight legs that
depart from Houston Intercontinental Airport (airport code ‘IAH’) and
arrive in Los Angeles International Airport (airport code ‘LAX’).
c. List the flight number, departure airport code, scheduled departure time,
arrival airport code, scheduled arrival time, and weekdays of all flights or
flight legs that depart from some airport in the city of Houston and arrive
at some airport in the city of Los Angeles.
d. List all fare information for flight number ‘CO197’.
e. Retrieve the number of available seats for flight number ‘CO197’ on
‘2009-10-09’.
18. Consider the LIBRARY relational database schema shown in Figure 14, which
is used to keep track of books, borrowers, and book loans. Referential
integrity constraints are shown as directed arcs in Figure 14. Write down
relational expressions for the following queries:
a. How many copies of the book titled The Lost Tribe are owned by the
library branch whose name is ‘Sharpstown’?
b. How many copies of the book titled The Lost Tribe are owned by each
library branch?
c. Retrieve the names of all borrowers who do not have any books checked
out.
d. For each book that is loaned out from the Sharpstown branch and whose
Due_date is today, retrieve the book title, the borrower’s name, and the
borrower’s address.
e. For each library branch, retrieve the branch name and the total number
of books loaned out from that branch.
f. Retrieve the names, addresses, and number of books checked out for all
borrowers who have more than five books checked out.
g. For each book authored (or coauthored) by Stephen King, retrieve the
title and the number of copies owned by the library branch whose name
is Central.
19. Specify the following queries in relational algebra on the database schema
given in Exercise 14 of the chapter “The Relational Data Model and
Relational Database Constraints”:
a. List the Order# and Ship_date for all orders shipped from Warehouse# W2.
b. List the WAREHOUSE information from which the CUSTOMER named
Jose Lopez was supplied his orders. Produce a listing: Order#, Warehouse#.
c. Produce a listing Cname, No_of_orders, Avg_order_amt, where the middle
column is the total number of orders by the customer and the last column
is the average order amount for that customer.
d. List the orders that were not shipped within 30 days of ordering.
e. List the Order# for orders that were shipped from all warehouses that the
company has in New York.
20. Specify the following queries in relational algebra on the database schema
given in Exercise 15 of the chapter “The Relational Data Model and
Relational Database Constraints”:
a. Give the details (all attributes of trip relation) for trips that exceeded
$2,000 in expenses.
b. Print the Ssns of salespeople who took trips to Honolulu.
c. Print the total trip expenses incurred by the salesperson with SSN = ‘234-
56-7890’.
21. Specify the following queries in relational algebra on the database schema
given in Exercise 16 of the chapter “The Relational Data Model and
Relational Database Constraints”:
a. List the number of courses taken by all students named John Smith in
Winter 2009 (i.e., Quarter=W09).
b. Produce a list of textbooks (include Course#, Book_isbn, Book_title) for
courses offered by the ‘CS’ department that have used more than two
books.
c. List any department that has all its adopted books published by ‘Pearson
Publishing’.
22. Consider the two tables T1 and T2 shown in Figure 15. Show the results of
the following operations:
a. T1 ⋈ T1.P = T2.A T2
b. T1 ⋈ T1.Q = T2.B T2
c. T1 ⟕ T1.P = T2.A T2 (left outer join)
d. T1 ⟖ T1.Q = T2.B T2 (right outer join)
e. T1 ∪ T2
f. T1 ⋈ (T1.P = T2.A AND T1.R = T2.C) T2
23. Specify the following queries in relational algebra on the database schema in
Exercise 17 of the chapter “The Relational Data Model and Relational
Database Constraints”:
a. For the salesperson named ‘Jane Doe’, list the following information for
all the cars she sold: Serial#, Manufacturer, Sale_price.
b. List the Serial# and Model of cars that have no options.
c. Consider the NATURAL JOIN operation between SALESPERSON and
SALE. What is the meaning of a left outer join for these tables (do not
change the order of relations)? Explain with an example.
d. Write a query in relational algebra involving selection and one set opera-
tion and say in words what the query does.
24. Specify queries a, b, c, e, f, i, and j of Exercise 16 in both tuple and domain
relational calculus.
25. Specify queries a, b, c, and d of Exercise 17 in both tuple and domain rela-
tional calculus.
26. Specify queries c, d, and f of Exercise 18 in both tuple and domain relational
calculus.
27. In a tuple relational calculus query with n tuple variables, what would be the
typical minimum number of join conditions? Why? What is the effect of
having a smaller number of join conditions?
28. Rewrite the domain relational calculus queries that followed Q0 in Section 7
in the style of the abbreviated notation of Q0A, where the objective is to min-
imize the number of domain variables by writing constants in place of vari-
ables wherever possible.
29. Consider this query: Retrieve the Ssns of employees who work on at least
those projects on which the employee with Ssn=123456789 works. This may
be stated as (FORALL x) (IF P THEN Q), where
■ x is a tuple variable that ranges over the PROJECT relation.
■ P ≡ EMPLOYEE with Ssn=123456789 works on PROJECT x.
■ Q ≡ EMPLOYEE e works on PROJECT x.
Express this query in tuple relational calculus, using the following rules:
■ (∀x)(P(x)) ≡ NOT(∃x)(NOT(P(x))).
■ (IF P THEN Q) ≡ (NOT(P) OR Q).
30. Show how you can specify the following relational algebra operations in
both tuple and domain relational calculus.
d. R(A, B, C) ∪ S(A, B, C)
e. R(A, B, C) ∩ S(A, B, C)
f. R(A, B, C) − S(A, B, C)
g. R(A, B, C) × S(D, E, F)
h. R(A, B) ÷ S(A)
31. Suggest extensions to the relational calculus so that it may express the fol-
lowing types of operations that were discussed in Section 4: (a) aggregate
functions and grouping; (b) OUTER JOIN operations; (c) recursive closure
queries.
32. A nested query is a query within a query. More specifically, a nested query is
a parenthesized query whose result can be used as a value in a number of
places, such as instead of a relation. Specify the following queries on the
database specified in Figure A.2 using the concept of nested queries and the
relational operators discussed in this chapter. Also show the result of each
query as it would apply to the database state in Figure A.1.
a. List the names of all employees who work in the department that has the
employee with the highest salary among all employees.
b. List the names of all employees whose supervisor’s supervisor has
‘888665555’ for Ssn.
c. List the names of employees who make at least $10,000 more than the
employee who is paid the least in the company.
33. State whether the following conclusions are true or false:
a. NOT (P(x) OR Q(x)) → (NOT (P(x)) AND NOT (Q(x)))
b. NOT (∃x)(P(x)) → (∀x)(NOT (P(x)))
c. (∃x)(P(x)) → (∀x)(P(x))
34. Specify and execute the following queries using the RA interpreter on the
COMPANY database schema in Figure A.2.
a. List the names of all employees in department 5 who work more than 10
hours per week on the ProductX project.
b. List the names of all employees who have a dependent with the same first
name as themselves.
c. List the names of employees who are directly supervised by Franklin
Wong.
d. List the names of employees who work on every project.
e. List the names of employees who do not work on any project.
f. List the names and addresses of employees who work on at least one proj-
ect located in Houston but whose department has no location in
Houston.
g. List the names of department managers who have no dependents.
35. Consider the following MAILORDER relational schema describing the data
for a mail order company.
a. Retrieve the names of parts that cost less than $20.00.
b. Retrieve the names and cities of employees who have taken orders for
parts costing more than $50.00.
c. Retrieve the pairs of customer number values of customers who live in
the same ZIP Code.
d. Retrieve the names of customers who have ordered parts from employees
living in Wichita.
e. Retrieve the names of customers who have ordered parts costing less than
$20.00.
f. Retrieve the names of customers who have not placed an order.
g. Retrieve the names of customers who have placed exactly two orders.
36. Consider the following GRADEBOOK relational schema describing the data
for a grade book of a particular instructor. (Note: The attributes A, B, C, and
D of COURSES store grade cutoffs.)
a. Retrieve the names of students enrolled in the Automata class during the
fall 2009 term.
b. Retrieve the Sid values of students who have enrolled in CSc226 and
CSc227.
c. Retrieve the Sid values of students who have enrolled in CSc226 or
CSc227.
d. Retrieve the names of students who have not enrolled in any class.
e. Retrieve the names of students who have enrolled in all courses in the
CATALOG table.
37. Consider a database that consists of the following relations.
a. Retrieve the part numbers that are supplied to exactly two projects.
b. Retrieve the names of suppliers who supply more than two parts to proj-
ect ‘J1’.
c. Retrieve the part numbers that are supplied by every supplier.
d. Retrieve the project names that are supplied by supplier ‘S1’ only.
e. Retrieve the names of suppliers who supply at least two different parts
each to at least two different projects.
38. Specify and execute the following queries for the database in Exercise 16 of
the chapter “The Relational Data Model and Relational Database
Constraints” using the RA interpreter.
a. Retrieve the names of students who have enrolled in a course that uses a
textbook published by Addison-Wesley.
b. Retrieve the names of courses in which the textbook has been changed at
least once.
c. Retrieve the names of departments that adopt textbooks published by
Addison-Wesley only.
d. Retrieve the names of departments that adopt textbooks written by
Navathe and published by Addison-Wesley.
e. Retrieve the names of students who have never used a book (in a course)
written by Navathe and published by Addison-Wesley.
39. Repeat Laboratory Exercises 34 through 38 in domain relational calculus
(DRC) by using the DRC interpreter.
Conceptual modeling is a very important phase in designing a successful database application.
Generally, the term database application refers to a particular database and the
associated programs that implement the database queries and updates. For exam-
ple, a BANK database application that keeps track of customer accounts would
include programs that implement database updates corresponding to customer
deposits and withdrawals. These programs provide user-friendly graphical user
interfaces (GUIs) utilizing forms and menus for the end users of the application—
the bank tellers, in this example. Hence, a major part of the database application will
require the design, implementation, and testing of these application programs.
Traditionally, the design and testing of application programs has been considered
to be part of software engineering rather than database design. In many software
design tools, the database design methodologies and software engineering method-
ologies are intertwined since these activities are strongly related.
In this chapter, we follow the traditional approach of concentrating on the database
structures and constraints during conceptual database design. The design of appli-
cation programs is typically covered in software engineering courses. We present the
modeling concepts of the Entity-Relationship (ER) model, which is a popular
high-level conceptual data model. This model and its variations are frequently used
for the conceptual design of database applications, and many database design tools
employ its concepts. We describe the basic data-structuring concepts and con-
straints of the ER model and discuss their use in the design of conceptual schemas
for database applications. We also present the diagrammatic notation associated
with the ER model, known as ER diagrams.
This chapter is organized as follows: Section 1 discusses the role of high-level con-
ceptual data models in database design. We introduce the requirements for a sample
database application in Section 2 to illustrate the use of concepts from the ER
model. This sample database is also used throughout the book. In Section 3 we pres-
ent the concepts of entities and attributes, and we gradually introduce the diagram-
matic technique for displaying an ER schema. In Section 4 we introduce the
concepts of binary relationships and their roles and structural constraints. Section 5
introduces weak entity types. Section 6 shows how a schema design is refined to
include relationships. Section 7 reviews the notation for ER diagrams, summarizes
the issues and common pitfalls that occur in schema design, and discusses how to
choose the names for database schema constructs. Section 8 introduces some UML
class diagram concepts, compares them to ER model concepts, and applies them to
the same database example. Section 9 discusses more complex types of relation-
ships. Section 10 summarizes the chapter.
The material in Sections 8 and 9 may be excluded from an introductory course.
Figure 1 shows a simplified overview of the database design process. The first step
shown is requirements collection and analysis. During this step, the database
designers interview prospective database users to understand and document their
data requirements. The result of this step is a concisely written set of users’ require-
ments. These requirements should be specified in as detailed and complete a form
as possible. In parallel with specifying the data requirements, it is useful to specify
1A class is similar to an entity type in many ways.
the known functional requirements of the application. These consist of the user-
defined operations (or transactions) that will be applied to the database, including
both retrievals and updates. In software design, it is common to use data flow dia-
grams, sequence diagrams, scenarios, and other techniques to specify functional
requirements. We will not discuss any of these techniques here; they are usually
described in detail in software engineering texts.
Once the requirements have been collected and analyzed, the next step is to create a
conceptual schema for the database, using a high-level conceptual data model. This
step is called conceptual design. The conceptual schema is a concise description of
the data requirements of the users and includes detailed descriptions of the entity
types, relationships, and constraints; these are expressed using the concepts pro-
vided by the high-level data model. Because these concepts do not include imple-
mentation details, they are usually easier to understand and can be used to
communicate with nontechnical users. The high-level conceptual schema can also
be used as a reference to ensure that all users’ data requirements are met and that the
requirements do not conflict. This approach enables database designers to concen-
trate on specifying the properties of the data, without being concerned with storage
and implementation details. This makes it easier to create a good conceptual data-
base design.
During or after the conceptual schema design, the basic data model operations can
be used to specify the high-level user queries and operations identified during func-
tional analysis. This also serves to confirm that the conceptual schema meets all the
identified functional requirements. Modifications to the conceptual schema can be
introduced if some functional requirements cannot be specified using the initial
schema.
The next step in database design is the actual implementation of the database, using
a commercial DBMS. Most current commercial DBMSs use an implementation
data model—such as the relational or the object-relational database model—so the
conceptual schema is transformed from the high-level data model into the imple-
mentation data model. This step is called logical design or data model mapping; its
result is a database schema in the implementation data model of the DBMS. Data
model mapping is often automated or semiautomated within the database design
tools.
The last step is the physical design phase, during which the internal storage struc-
tures, file organizations, indexes, access paths, and physical design parameters for
the database files are specified. In parallel with these activities, application programs
are designed and implemented as database transactions corresponding to the high-
level transaction specifications.
■ The company is organized into departments. Each department has a unique
name, a unique number, and a particular employee who manages the
department. We keep track of the start date when that employee began man-
aging the department. A department may have several locations.
The ER model describes data as entities, relationships, and attributes. In Section 3.1
we introduce the concepts of entities and their attributes. We discuss entity types
and key attributes in Section 3.2. Then, in Section 3.3, we specify the initial concep-
tual design of the entity types for the COMPANY database. Relationships are
described in Section 4.
2The Social Security number, or SSN, is a unique nine-digit identifier assigned to each individual in the
United States to keep track of his or her employment, benefits, and taxes. Other countries may have simi-
lar identification schemes, such as personal identification card numbers.
value for each of its attributes. The attribute values that describe each entity become
a major part of the data stored in the database.
Figure 3 shows two entities and the values of their attributes. The EMPLOYEE entity
e1 has four attributes: Name, Address, Age, and Home_phone; their values are ‘John
Smith,’ ‘2311 Kirby, Houston, Texas 77001’, ‘55’, and ‘713-749-2630’, respectively. The
COMPANY entity c1 has three attributes: Name, Headquarters, and President; their val-
ues are ‘Sunco Oil’, ‘Houston’, and ‘John Smith’, respectively.
Several types of attributes occur in the ER model: simple versus composite, single-
valued versus multivalued, and stored versus derived. First we define these attribute
types and illustrate their use via examples. Then we discuss the concept of a NULL
value for an attribute.
Composite versus Simple (Atomic) Attributes. Composite attributes can be
divided into smaller subparts, which represent more basic attributes with indepen-
dent meanings. For example, the Address attribute of the EMPLOYEE entity shown
in Figure 3 can be subdivided into Street_address, City, State, and Zip,3 with the val-
ues ‘2311 Kirby’, ‘Houston’, ‘Texas’, and ‘77001.’ Attributes that are not divisible are
called simple or atomic attributes. Composite attributes can form a hierarchy; for
example, Street_address can be further subdivided into three simple component
attributes: Number, Street, and Apartment_number, as shown in Figure 4. The value of
a composite attribute is the concatenation of the values of its component simple
attributes.
3Zip Code is the name used in the United States for a five-digit postal code, such as 76019, which can
be extended to nine digits, such as 76019-0015. We use the five-digit Zip in our examples.
need to subdivide it into component attributes. For example, if there is no need to
refer to the individual components of an address (Zip Code, street, and so on), then
the whole address can be designated as a simple attribute.
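As a purely illustrative aside, if such an entity type were later mapped to a relational
table, a composite Address could simply be flattened into its simple components as
separate columns. The table and column names below are hypothetical; the chapter
itself stays at the conceptual level.

    CREATE TABLE EMPLOYEE_ADDRESS_SKETCH (
        Ssn              CHAR(9) PRIMARY KEY,
        Street_number    VARCHAR(10),   -- components of Street_address
        Street           VARCHAR(40),
        Apartment_number VARCHAR(10),
        City             VARCHAR(30),
        State            VARCHAR(20),
        Zip              CHAR(5)        -- five-digit Zip, as in the example
    );

The composite value is then just the concatenation of these component values, as
described above.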
Single-Valued versus Multivalued Attributes. Most attributes have a single
value for a particular entity; such attributes are called single-valued. For example,
Age is a single-valued attribute of a person. In some cases an attribute can have a set
of values for the same entity—for instance, a Colors attribute for a car, or a
College_degrees attribute for a person. Cars with one color have a single value,
whereas two-tone cars have two color values. Similarly, one person may not have a
college degree, another person may have one, and a third person may have two or
more degrees; therefore, different people can have different numbers of values for
the College_degrees attribute. Such attributes are called multivalued. A multivalued
attribute may have lower and upper bounds to constrain the number of values
allowed for each individual entity. For example, the Colors attribute of a car may be
restricted to have between one and three values, if we assume that a car can have
three colors at most.
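For intuition only, a multivalued attribute such as Colors does not fit into a single
column of a flat relational table; one common rendering is a separate table holding
one row per value. The sketch below uses hypothetical names and is not part of the
ER model itself.

    CREATE TABLE CAR_SKETCH (
        Vehicle_id VARCHAR(20) PRIMARY KEY,
        Model      VARCHAR(20)
    );
    CREATE TABLE CAR_COLOR_SKETCH (
        Vehicle_id VARCHAR(20) REFERENCES CAR_SKETCH(Vehicle_id),
        Color      VARCHAR(15),
        PRIMARY KEY (Vehicle_id, Color)   -- each color recorded once per car
    );

A bound such as "at most three colors" would have to be enforced separately, for
example by a trigger or by application logic.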
Stored versus Derived Attributes. In some cases, two (or more) attribute val-
ues are related—for example, the Age and Birth_date attributes of a person. For a
particular person entity, the value of Age can be determined from the current
(today’s) date and the value of that person’s Birth_date. The Age attribute is hence
called a derived attribute and is said to be derivable from the Birth_date attribute,
which is called a stored attribute. Some attribute values can be derived from
related entities; for example, an attribute Number_of_employees of a DEPARTMENT
entity can be derived by counting the number of employees related to (working
for) that department.
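In a concrete system, a derived attribute is typically computed on demand rather
than stored. The rough SQL sketch below assumes an EMPLOYEE table with Name and
Birth_date columns; the year subtraction is only indicative, since it ignores whether
the birthday has already passed this year.

    SELECT Name,
           EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM Birth_date) AS Age
    FROM   EMPLOYEE;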
NULL Values. In some cases, a particular entity may not have an applicable value
for an attribute. For example, the Apartment_number attribute of an address applies
only to addresses that are in apartment buildings and not to other types of resi-
dences, such as single-family homes. Similarly, a College_degrees attribute applies
only to people with college degrees. For such situations, a special value called NULL
is created. An address of a single-family home would have NULL for its
Apartment_number attribute, and a person with no college degree would have NULL
for College_degrees. NULL can also be used if we do not know the value of an attrib-
ute for a particular entity—for example, if we do not know the home phone num-
ber of ‘John Smith’ in Figure 3. The meaning of the former type of NULL is not
applicable, whereas the meaning of the latter is unknown. The unknown category of
NULL can be further classified into two cases. The first case arises when it is known
that the attribute value exists but is missing—for instance, if the Height attribute of a
person is listed as NULL. The second case arises when it is not known whether the
attribute value exists—for example, if the Home_phone attribute of a person is NULL.
Complex Attributes. Notice that, in general, composite and multivalued attrib-
utes can be nested arbitrarily. We can represent arbitrary nesting by grouping com-
ponents of a composite attribute between parentheses () and separating the compo-
nents with commas, and by displaying multivalued attributes between braces { }.
Such attributes are called complex attributes. For example, if a person can have
more than one residence and each residence can have a single address and multiple
phones, an attribute Address_phone for a person can be specified as shown in Figure
5.4 Both Phone and Address are themselves composite attributes.
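To make the notation concrete, one plausible way to write the Address_phone attribute
just described is shown below; the component names are illustrative and may differ
slightly from those in Figure 5.

    {Address_phone( {Phone(Area_code, Phone_number)},
                    Address(Street_address(Number, Street, Apartment_number),
                            City, State, Zip) )}

The outer braces say that a person may have several residences; the inner braces say
that each residence may have several phones.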
Entity Types and Entity Sets. A database usually contains groups of entities that
are similar. For example, a company employing hundreds of employees may want to
store similar information concerning each of the employees. These employee entities
share the same attributes, but each entity has its own value(s) for each attribute. An
entity type defines a collection (or set) of entities that have the same attributes. Each
entity type in the database is described by its name and attributes. Figure 6 shows
two entity types: EMPLOYEE and COMPANY, and a list of some of the attributes for
4For those familiar with XML, we should note that complex attributes are similar to complex elements in
XML.
each. A few individual entities of each type are also illustrated, along with the values
of their attributes. The collection of all entities of a particular entity type in the data-
base at any point in time is called an entity set; the entity set is usually referred to
using the same name as the entity type. For example, EMPLOYEE refers both to a type
of entity and to the current set of all employee entities in the database.
An entity type is represented in ER diagrams5 (see Figure 2) as a rectangular box
enclosing the entity type name. Attribute names are enclosed in ovals and are
attached to their entity type by straight lines. Composite attributes are attached to
their component attributes by straight lines. Multivalued attributes are displayed in
double ovals. Figure 7(a) shows a CAR entity type in this notation.
An entity type describes the schema or intension for a set of entities that share the
same structure. The collection of entities of a particular entity type is grouped into
an entity set, which is also called the extension of the entity type.
Key Attributes of an Entity Type. An important constraint on the entities of an
entity type is the key or uniqueness constraint on attributes. An entity type usually
5We use a notation for ER diagrams that is close to the original proposed notation (Chen 1976). Many
other notations are in use; we illustrate some of them later in this chapter when we present UML class
diagrams.
has one or more attributes whose values are distinct for each individual entity in the
entity set. Such an attribute is called a key attribute, and its values can be used to
identify each entity uniquely. For example, the Name attribute is a key of the
COMPANY entity type in Figure 6 because no two companies are allowed to have the
same name. For the PERSON entity type, a typical key attribute is Ssn (Social
Security number). Sometimes several attributes together form a key, meaning that
the combination of the attribute values must be distinct for each entity. If a set of
attributes possesses this property, the proper way to represent this in the ER model
that we describe here is to define a composite attribute and designate it as a key
attribute of the entity type. Notice that such a composite key must be minimal; that
is, all component attributes must be included in the composite attribute to have the
uniqueness property. Superfluous attributes must not be included in a key. In ER
diagrammatic notation, each key attribute has its name underlined inside the oval,
as illustrated in Figure 7(a).
Some entity types have more than one key attribute. For example, each of the
Vehicle_id and Registration attributes of the entity type CAR (Figure 7) is a key in its
own right. The Registration attribute is an example of a composite key formed from
two simple component attributes, State and Number, neither of which is a key on its
own. An entity type may also have no key, in which case it is called a weak entity type
(see Section 5).
In our diagrammatic notation, if two attributes are underlined separately, then each
is a key on its own. Unlike the relational model, there is no concept of primary key in
the ER model that we present here; the primary key will be chosen during mapping
to a relational schema.
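Although the ER model itself does not single out a primary key, it may help to see how
two separately underlined keys could eventually surface after a mapping to a relational
schema. The sketch below is hypothetical and uses only uniqueness constraints,
mirroring the fact that neither key is "primary" at this stage.

    CREATE TABLE CAR_KEYS_SKETCH (
        Vehicle_id VARCHAR(20) NOT NULL UNIQUE,   -- a key in its own right
        State      CHAR(2)     NOT NULL,
        Number     VARCHAR(10) NOT NULL,
        UNIQUE (State, Number)                    -- composite Registration key
    );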
Value Sets (Domains) of Attributes. Each simple attribute of an entity type is
associated with a value set (or domain of values), which specifies the set of values
that may be assigned to that attribute for each individual entity. In Figure 6, if the
range of ages allowed for employees is between 16 and 70, we can specify the value
set of the Age attribute of EMPLOYEE to be the set of integer numbers between 16
and 70. Similarly, we can specify the value set for the Name attribute to be the set of
strings of alphabetic characters separated by blank characters, and so on. Value sets
are not displayed in ER diagrams, and are typically specified using the basic data
types available in most programming languages, such as integer, string, Boolean,
float, enumerated type, subrange, and so on. Additional data types to represent
common database types such as date, time, and other concepts are also employed.
Mathematically, an attribute A of entity set E whose value set is V can be defined as
a function from E to the power set6 P(V) of V: A : E → P(V).
We refer to the value of attribute A for entity e as A(e). The previous definition cov-
ers both single-valued and multivalued attributes, as well as NULLs. A NULL value is
represented by the empty set. For single-valued attributes, A(e) is restricted to being
a singleton set for each entity e in E, whereas there is no restriction on multivalued
attributes.7 For a composite attribute A, the value set V is the power set of the
Cartesian product of P(V1), P(V2), … , P(Vn), where V1, V2, … , Vn are the value sets
of the simple component attributes that form A: V = P(P(V1) × P(V2) × … × P(Vn)).
The value set provides all possible values. Usually only a small number of these
values exist in the database at a particular time; those values represent the data in
the current state of the miniworld.
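As a quick reading of the definition above, using the EMPLOYEE entity e1 of Figure 3:
the single-valued attribute Age yields the singleton Age(e1) = {55}; a multivalued
attribute such as College_degrees may yield a set with zero, one, or several values; and
a NULL is represented by the empty set { }.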
1. An entity type DEPARTMENT with attributes Name, Number, Locations,
Manager, and Manager_start_date. Locations is the only multivalued attribute.
We can specify that both Name and Number are (separate) key attributes
because each was specified to be unique.
2. An entity type PROJECT with attributes Name, Number, Location, and
Controlling_department. Both Name and Number are (separate) key attributes.
3. An entity type EMPLOYEE with attributes Name, Ssn, Sex, Address, Salary,
Birth_date, Department, and Supervisor. Both Name and Address may be com-
posite attributes; however, this was not specified in the requirements. We
must go back to the users to see if any of them will refer to the individual
components of Name—First_name, Middle_initial, Last_name—or of Address.
4. An entity type DEPENDENT with attributes Employee, Dependent_name, Sex,
Birth_date, and Relationship (to the employee).
6The power set P (V ) of a set V is the set of all subsets of V.
7A singleton set is a set with only one element (value).
So far, we have not represented the fact that an employee can work on several proj-
ects, nor have we represented the number of hours per week an employee works on
each project. This characteristic is listed as part of the third requirement in Section
2, and it can be represented by a multivalued composite attribute of EMPLOYEE
called Works_on with the simple components (Project, Hours). Alternatively, it can be
represented as a multivalued composite attribute of PROJECT called Workers with
the simple components (Employee, Hours). We choose the first alternative in Figure
8, which shows each of the entity types just described. The Name attribute of
EMPLOYEE is shown as a composite attribute, presumably after consultation with
the users.
In Figure 8 there are several implicit relationships among the various entity types. In
fact, whenever an attribute of one entity type refers to another entity type, some
relationship exists. For example, the attribute Manager of DEPARTMENT refers to an
employee who manages the department; the attribute Controlling_department of
PROJECT refers to the department that controls the project; the attribute Supervisor
of EMPLOYEE refers to another employee (the one who supervises this employee);
the attribute Department of EMPLOYEE refers to the department for which the
employee works; and so on. In the ER model, these references should not be repre-
sented as attributes but as relationships, which are discussed in this section. The
COMPANY database schema will be refined in Section 6 to represent relationships
explicitly. In the initial design of entity types, relationships are typically captured in
the form of attributes. As the design is refined, these attributes get converted into
relationships between entity types.
This section is organized as follows: Section 4.1 introduces the concepts of relation-
ship types, relationship sets, and relationship instances. We define the concepts of
relationship degree, role names, and recursive relationships in Section 4.2, and then
we discuss structural constraints on relationships—such as cardinality ratios and
existence dependencies—in Section 4.3. Section 4.4 shows how relationship types
can also have attributes.
Informally, each relationship instance ri in R is an association of entities, where the
association includes exactly one entity from each participating entity type. Each
such relationship instance ri represents the fact that the entities participating in ri
are related in some way in the corresponding miniworld situation. For example,
consider a relationship type WORKS_FOR between the two entity types EMPLOYEE
and DEPARTMENT, which associates each employee with the department for which
the employee works in the corresponding entity set. Each relationship instance in
the relationship set WORKS_FOR associates one EMPLOYEE entity and one
DEPARTMENT entity. Figure 9 illustrates this example, where each relationship
Degree of a Relationship Type. The degree of a relationship type is the number
of participating entity types. Hence, the WORKS_FOR relationship is of degree two.
A relationship type of degree two is called binary, and one of degree three is called
ternary. An example of a ternary relationship is SUPPLY, shown in Figure 10, where
each relationship instance ri associates three entities—a supplier s, a part p, and a
project j—whenever s supplies part p to project j. Relationships can generally be of
any degree, but binary relationships are the most common. Higher-degree
relationships are generally more complex than binary relationships; we characterize
them further in Section 9.
Relationships as Attributes. It is sometimes convenient to think of a binary
relationship type in terms of attributes, as we discussed in Section 3.3. Consider the
WORKS_FOR relationship type in Figure 9. One can think of an attribute called
Department of the EMPLOYEE entity type, where the value of Department for each
EMPLOYEE entity is (a reference to) the DEPARTMENT entity for which that
employee works. Hence, the value set for this Department attribute is the set of all
DEPARTMENT entities, which is the DEPARTMENT entity set. This is what we did in
Figure 8 when we specified the initial design of the entity type EMPLOYEE for the
COMPANY database. However, when we think of a binary relationship as an attrib-
ute, we always have two options. In this example, the alternative is to think of a mul-
tivalued attribute Employee of the entity type DEPARTMENT, whose value for each
DEPARTMENT entity is the set of EMPLOYEE entities who work for that department.
The value set of this Employee attribute is the power set of the EMPLOYEE entity set.
Either of these two attributes—Department of EMPLOYEE or Employee of
DEPARTMENT—can represent the WORKS_FOR relationship type. If both are repre-
sented, they are constrained to be inverses of each other.8
8This concept of representing relationship types as attributes is used in a class of data models called
functional data models. In object databases, relationships can be represented by reference attributes,
either in one direction or in both directions as inverses. In relational databases, foreign keys are a type of
reference attribute used to represent relationships.
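Footnote 8's remark about relational databases can be made concrete with a small,
hypothetical sketch: the Department reference attribute becomes a foreign key column,
and the inverse multivalued Employee attribute is simply the set of rows that point
back to a given department. The names below are illustrative only.

    CREATE TABLE DEPARTMENT_SKETCH (
        Dnumber INT PRIMARY KEY,
        Dname   VARCHAR(30)
    );
    CREATE TABLE EMPLOYEE_SKETCH (
        Ssn  CHAR(9) PRIMARY KEY,
        Name VARCHAR(50),
        Dno  INT REFERENCES DEPARTMENT_SKETCH(Dnumber)   -- the "Department" reference attribute
    );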
Role Names and Recursive Relationships. Each entity type that participates
in a relationship type plays a particular role in the relationship. The role name sig-
nifies the role that a participating entity from the entity type plays in each relation-
ship instance, and helps to explain what the relationship means. For example, in the
WORKS_FOR relationship type, EMPLOYEE plays the role of employee or worker and
DEPARTMENT plays the role of department or employer.
Role names are not technically necessary in relationship types where all the partici-
pating entity types are distinct, since each participating entity type name can be
used as the role name. However, in some cases the same entity type participates
more than once in a relationship type in different roles. In such cases the role name
becomes essential for distinguishing the meaning of the role that each participating
entity plays. Such relationship types are called recursive relationships. Figure 11
shows an example. The SUPERVISION relationship type relates an employee to a
supervisor, where both employee and supervisor entities are members of the same
EMPLOYEE entity set. Hence, the EMPLOYEE entity type participates twice in
SUPERVISION: once in the role of supervisor (or boss), and once in the role of
supervisee (or subordinate). Each relationship instance ri in SUPERVISION associates
two employee entities ej and ek, one of which plays the role of supervisor and the
other the role of supervisee. In Figure 11, the lines marked ‘1’ represent the supervi-
sor role, and those marked ‘2’ represent the supervisee role; hence, e1 supervises e2
and e3, e4 supervises e6 and e7, and e5 supervises e1 and e4. In this example, each rela-
tionship instance must be connected with two lines, one marked with ‘1’ (supervi-
sor) and the other with ‘2’ (supervisee).
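A hypothetical relational rendering makes the two roles of SUPERVISION visible: each
row plays the supervisee role, while the Super_ssn column refers to the same table in
the supervisor role. The table and column names are illustrative.

    CREATE TABLE EMPLOYEE_SUPERVISION_SKETCH (
        Ssn       CHAR(9) PRIMARY KEY,
        Name      VARCHAR(50),
        Super_ssn CHAR(9) REFERENCES EMPLOYEE_SUPERVISION_SKETCH(Ssn)   -- supervisor role
    );

Here, e1 supervising e2 would appear as e2's row carrying e1's Ssn in Super_ssn.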
Cardinality Ratios for Binary Relationships. The cardinality ratio for a binary
relationship specifies the maximum number of relationship instances that an entity
can participate in. For example, in the WORKS_FOR binary relationship type,
DEPARTMENT:EMPLOYEE is of cardinality ratio 1:N, meaning that each department
can be related to (that is, employs) any number of employees,9 but an employee can
be related to (work for) only one department. This means that for this particular
relationship WORKS_FOR, a particular department entity can be related to any
number of employees (N indicates there is no maximum number). On the other
hand, an employee can be related to a maximum of one department. The possible
cardinality ratios for binary relationship types are 1:1, 1:N, N:1, and M:N.
9N stands for any number of related entities (zero or more).
world rule is that an employee can work on several projects and a project can have
several employees.
Cardinality ratios for binary relationships are represented on ER diagrams by dis-
playing 1, M, and N on the diamonds as shown in Figure 2. Notice that in this nota-
tion, we can either specify no maximum (N) or a maximum of one (1) on
participation. An alternative notation (see Section 7.4) allows the designer to spec-
ify a specific maximum number on participation, such as 4 or 5.
Participation Constraints and Existence Dependencies. The participation
constraint specifies whether the existence of an entity depends on its being related
to another entity via the relationship type. This constraint specifies the minimum
number of relationship instances that each entity can participate in, and is some-
times called the minimum cardinality constraint. There are two types of participa-
tion constraints—total and partial—that we illustrate by example. If a company
policy states that every employee must work for a department, then an employee
entity can exist only if it participates in at least one WORKS_FOR relationship
instance (Figure 9). Thus, the participation of EMPLOYEE in WORKS_FOR is called
total participation, meaning that every entity in the total set of employee entities
must be related to a department entity via WORKS_FOR. Total participation is also
called existence dependency. In Figure 12 we do not expect every employee to man-
age a department, so the participation of EMPLOYEE in the MANAGES relationship
type is partial, meaning that some or part of the set of employee entities are related to
some department entity via MANAGES, but not necessarily all. We will refer to the
cardinality ratio and participation constraints, taken together, as the structural
constraints of a relationship type.
We will discuss constraints on higher-degree relationships in Section 9.
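For intuition, here is one hypothetical relational reading of these constraints: total
participation of EMPLOYEE in WORKS_FOR corresponds to a department reference that can
never be missing, whereas the partial participation of EMPLOYEE in MANAGES simply
means that most employees never appear in the manager column. The names below are
illustrative.

    CREATE TABLE DEPT_PARTICIPATION_SKETCH (
        Dnumber INT PRIMARY KEY,
        Mgr_ssn CHAR(9)    -- most Ssn values never appear here (partial participation)
    );
    CREATE TABLE EMP_PARTICIPATION_SKETCH (
        Ssn CHAR(9) PRIMARY KEY,
        Dno INT NOT NULL REFERENCES DEPT_PARTICIPATION_SKETCH(Dnumber)   -- total participation
    );

(A foreign key from Mgr_ssn back to the employee table is omitted only to keep the
sketch short.)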
Notice that attributes of 1:1 or 1:N relationship types can be migrated to one of the
participating entity types. For example, the Start_date attribute for the MANAGES
relationship can be an attribute of either EMPLOYEE or DEPARTMENT, although
conceptually it belongs to MANAGES. This is because MANAGES is a 1:1 relation-
ship, so every department or employee entity participates in at most one relationship
instance. Hence, the value of the Start_date attribute can be determined separately,
either by the participating department entity or by the participating employee
(manager) entity.
For a 1:N relationship type, a relationship attribute can be migrated only to the
entity type on the N-side of the relationship. For example, in Figure 9, if the
WORKS_FOR relationship also has an attribute Start_date that indicates when an
employee started working for a department, this attribute can be included as an
attribute of EMPLOYEE. This is because each employee works for only one depart-
ment, and hence participates in at most one relationship instance in WORKS_FOR.
In both 1:1 and 1:N relationship types, the decision where to place a relationship
attribute—as a relationship type attribute or as an attribute of a participating entity
type—is determined subjectively by the schema designer.
In ER diagrams, both a weak entity type and its identifying relationship are distin-
guished by surrounding their boxes and diamonds with double lines (see Figure 2).
The partial key attribute is underlined with a dashed or dotted line.
10The identifying entity type is also sometimes called the parent entity type or the dominant entity
type.
11The weak entity type is also sometimes called the child entity type or the subordinate entity type.
12The partial key is sometimes called the discriminator.
weak entity type representation if there are many attributes. If the weak entity par-
ticipates independently in relationship types other than its identifying relationship
type, then it should not be modeled as a complex attribute.
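As a hypothetical sketch of how a weak entity type and its partial key behave, consider
DEPENDENT identified through its owner EMPLOYEE: the partial key Dependent_name
distinguishes only dependents of the same employee, so full identification combines it
with the owner's key. The names below are illustrative.

    CREATE TABLE EMPLOYEE_OWNER_SKETCH (
        Ssn  CHAR(9) PRIMARY KEY,
        Name VARCHAR(50)
    );
    CREATE TABLE DEPENDENT_SKETCH (
        Essn           CHAR(9) REFERENCES EMPLOYEE_OWNER_SKETCH(Ssn),   -- identifying relationship
        Dependent_name VARCHAR(30),                                     -- partial key (discriminator)
        Birth_date     DATE,
        PRIMARY KEY (Essn, Dependent_name)
    );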
We can now refine the database design in Figure 8 by changing the attributes that
represent relationships into relationship types. The cardinality ratio and participa-
tion constraint of each relationship type are determined from the requirements
listed in Section 2. If some cardinality ratio or dependency cannot be determined
from the requirements, the users must be questioned further to determine these
structural constraints.
■ MANAGES, a 1:1 relationship type between EMPLOYEE and DEPARTMENT.
EMPLOYEE participation is partial. DEPARTMENT participation is not clear
from the requirements. We question the users, who say that a department
must have a manager at all times, which implies total participation.13 The
attribute Start_date is assigned to this relationship type.
■ CONTROLS, a 1:N relationship type between DEPARTMENT and PROJECT.
The participation of PROJECT is total, whereas that of DEPARTMENT is
determined to be partial, after consultation with the users indicates that
some departments may control no projects.
13The rules in the miniworld that determine the constraints are sometimes called the business rules,
since they are determined by the business or organization that will utilize the database.
type DEPENDENT. The participation of EMPLOYEE is partial, whereas that of
DEPENDENT is total.
After specifying the above six relationship types, we remove from the entity types in
Figure 8 all attributes that have been refined into relationships. These include
Manager and Manager_start_date from DEPARTMENT; Controlling_department from
PROJECT; Department, Supervisor, and Works_on from EMPLOYEE; and Employee
from DEPENDENT. It is important to have the least possible redundancy when we
design the conceptual schema of a database. If some redundancy is desired at the
storage level or at the user view level, it can be introduced later.
Figure 2 displays the COMPANY ER database schema as an ER diagram. We now
review the full ER diagram notation. Entity types such as EMPLOYEE,
DEPARTMENT, and PROJECT are shown in rectangular boxes. Relationship types
such as WORKS_FOR, MANAGES, CONTROLS, and WORKS_ON are shown in
diamond-shaped boxes attached to the participating entity types with straight lines.
Attributes are shown in ovals, and each attribute is attached by a straight line to its
entity type or relationship type. Component attributes of a composite attribute are
attached to the oval representing the composite attribute, as illustrated by the Name
attribute of EMPLOYEE. Multivalued attributes are shown in double ovals, as illus-
trated by the Locations attribute of DEPARTMENT. Key attributes have their names
underlined. Derived attributes are shown in dotted ovals, as illustrated by the
Number_of_employees attribute of DEPARTMENT.
In Figure 2 the cardinality ratio of each binary relationship type is specified by
attaching a 1, M, or N on each participating edge. The cardinality ratio of
DEPARTMENT:EMPLOYEE in MANAGES is 1:1, whereas it is 1:N for DEPARTMENT:
EMPLOYEE in WORKS_FOR, and M:N for WORKS_ON. The participation
constraint is specified by a single line for partial participation and by double lines
for total participation (existence dependency).
Figure 14 summarizes the conventions for ER diagrams. It is important to note that
there are many other alternative diagrammatic notations (see Section 7.4).
As a general practice, given a narrative description of the database requirements, the
nouns appearing in the narrative tend to give rise to entity type names, and the verbs
tend to indicate names of relationship types. Attribute names generally arise from
additional nouns that describe the nouns corresponding to entity types.
. . .
In general, the schema design process should be considered an iterative refinement
process, where an initial design is created and then iteratively refined until the most
suitable design is reached. Some of the refinements that are often used include the
following:
■ Section 9 discusses choices concerning the degree of a relationship.
In this section, we describe one alternative ER notation for specifying structural
constraints on relationships, which replaces the cardinality ratio (1:1, 1:N, M:N)
and single/double line notation for participation constraints. This notation involves
associating a pair of integer numbers (min, max) with each participation of an
entity type E in a relationship type R, where 0 ≤ min ≤ max and max ≥ 1. The num-
bers mean that for each entity e in E, e must participate in at least min and at most
14In some notations, particularly those used in object modeling methodologies such as UML, the (min,
max) is placed on the opposite sides to the ones we have shown. For example, for the WORKS_FOR
relationship in Figure 15, the (1,1) would be on the DEPARTMENT side, and the (4,N) would be on the
EMPLOYEE side. Here we used the original notation from Abrial (1974).
max relationship instances in R at any point in time. In this method, min = 0 implies partial participation,
whereas min > 0 implies total participation.
Figure 15 displays the COMPANY database schema using the (min, max) notation.14 Usually, one uses
either the cardinality ratio/single-line/double-line notation or the (min, max) notation. The (min, max)
. . .
notation is more precise, and we can use it to specify some structural constraints for
relationship types of higher degree. However, it is not sufficient for specifying some
key constraints on higher-degree relationships, as discussed in Section 9.
Figure 15 also displays all the role names for the COMPANY database schema.
In UML class diagrams, a class (similar to an entity type in ER) is displayed as a box
(see Figure 16) that includes three sections: The top section gives the class name
(similar to entity type name); the middle section includes the attributes; and the
last section includes operations that can be applied to individual objects (similar to
individual entities in an entity set) of the class. Operations are not specified in ER
diagrams. Consider the EMPLOYEE class in Figure 16. Its attributes are Name, Ssn,
Bdate, Sex, Address, and Salary. The designer can optionally specify the domain of
an attribute if desired, by placing a colon (:) followed by the domain name or
description, as illustrated by the Name, Sex, and Bdate attributes of EMPLOYEE in
Figure 16. A composite attribute is modeled as a structured domain, as illustrated
by the Name attribute of EMPLOYEE. A multivalued attribute will generally be mod-
eled as a separate class, as illustrated by the LOCATION class in Figure 16.
In UML, there are two types of relationships: association and aggregation.
Aggregation is meant to represent a relationship between a whole object and its
component parts, and it has a distinct diagrammatic notation. In Figure 16, we
modeled the locations of a department and the single location of a project as aggre-
gations. However, aggregation and association do not have different structural
properties, and the choice as to which type of relationship to use is somewhat sub-
jective. In the ER model, both are represented as relationships.
value. Association (relationship) names are optional in UML, and relationship
attributes are displayed in a box attached with a dashed line to the line representing
the association/aggregation (see Start_date and Hours in Figure 16).
The operations given in each class are derived from the functional requirements of
the application, as we discussed in Section 1. It is generally sufficient to specify the
operation names initially for the logical operations that are expected to be applied
to individual objects of a class, as shown in Figure 16. As the design is refined, more
details are added, such as the exact argument types (parameters) for each operation,
plus a functional description of each operation. UML has function descriptions and
sequence diagrams to specify some of the operation details, but these are beyond the
scope of our discussion.
Weak entities can be modeled using the construct called qualified association (or
qualified aggregation) in UML; this can represent both the identifying relationship
and the partial key, which is placed in a box attached to the owner class. This is illus-
trated by the DEPENDENT class and its qualified aggregation to EMPLOYEE in
Figure 16. The partial key Dependent_name is called the discriminator in UML ter-
minology, since its value distinguishes the objects associated with (related to) the
same EMPLOYEE. Qualified associations are not restricted to modeling weak enti-
ties, and they can be used to model other situations in UML.
This section is not meant to be a complete description of UML class diagrams, but
rather to illustrate one popular type of alternative diagrammatic notation that can
be used for representing ER modeling concepts.
In Section 4.2 we defined the degree of a relationship type as the number of partic-
ipating entity types and called a relationship type of degree two binary and a rela-
tionship type of degree three ternary. In this section, we elaborate on the differences
between binary and higher-degree relationships, when to choose higher-degree ver-
sus binary relationships, and how to specify constraints on higher-degree relation-
ships.
The ER diagram notation for a ternary relationship type is shown in Figure 17(a),
which displays the schema for the SUPPLY relationship type that was displayed at
the entity set/relationship set or instance level in Figure 10. Recall that the relation-
ship set of SUPPLY is a set of relationship instances (s, j, p), where s is a SUPPLIER
who is currently supplying a PART p to a PROJECT j. In general, a relationship type
R of degree n will have n edges in an ER diagram, one connecting R to each partici-
pating entity type.
some part to project j. The existence of three relationship instances (s, p), ( j, p), and
(s, j) in CAN_SUPPLY, USES, and SUPPLIES, respectively, does not necessarily imply
that an instance (s, j, p) exists in the ternary relationship SUPPLY, because the
meaning is different. It is often tricky to decide whether a particular relationship
should be represented as a relationship type of degree n or should be broken down
into several relationship types of smaller degrees. The designer must base this
decision on the semantics or meaning of the particular situation being represented.
The typical solution is to include the ternary relationship plus one or more of the
binary relationships, if they represent different meanings and if all are needed by the
application.
It is also possible to represent the ternary relationship as a regular entity type by
introducing an artificial or surrogate key. In this example, a key attribute Supply_id
could be used for the supply entity type, converting it into a regular entity type.
Three binary N:1 relationships relate SUPPLY to the three participating entity types.
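A hypothetical sketch of the surrogate-key alternative just described: SUPPLY becomes
a regular table keyed by Supply_id, with three references standing for the three N:1
relationships (a Quantity attribute is assumed purely for illustration, and all names
are hypothetical).

    CREATE TABLE SUPPLIER_SKETCH (Sname     VARCHAR(30) PRIMARY KEY);
    CREATE TABLE PROJECT_SKETCH  (Proj_name VARCHAR(30) PRIMARY KEY);
    CREATE TABLE PART_SKETCH     (Part_no   INT         PRIMARY KEY);
    CREATE TABLE SUPPLY_SKETCH (
        Supply_id INT PRIMARY KEY,                                 -- artificial (surrogate) key
        Sname     VARCHAR(30) NOT NULL REFERENCES SUPPLIER_SKETCH,
        Proj_name VARCHAR(30) NOT NULL REFERENCES PROJECT_SKETCH,
        Part_no   INT         NOT NULL REFERENCES PART_SKETCH,
        Quantity  INT
    );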
Another example is shown in Figure 18. The ternary relationship type OFFERS rep-
resents information on instructors offering courses during particular semesters;
hence it includes a relationship instance (i, s, c) whenever INSTRUCTOR i offers
COURSE c during SEMESTER s. The three binary relationship types shown in
Figure 18 have the following meanings: CAN_TEACH relates a course to the instruc-
tors who can teach that course, TAUGHT_DURING relates a semester to the instruc-
tors who taught some course during that semester, and OFFERED_DURING relates a
semester to the courses offered during that semester by any instructor. These ternary
and binary relationships represent different information, but certain constraints
should hold among the relationships. For example, a relationship instance (i, s, c)
should not exist in OFFERS unless an instance (i, s) exists in TAUGHT_DURING, an
instance (s, c) exists in OFFERED_DURING, and an instance (i, c) exists in
CAN_TEACH. However, the reverse is not always true; we may have instances (i, s), (s,
c), and (i, c) in the three binary relationship types with no corresponding instance
(i, s, c) in OFFERS. Note that in this example, based on the meanings of the relation-
ships, we can infer the instances of TAUGHT_DURING and OFFERED_DURING from
the instances in OFFERS, but we cannot infer the instances of CAN_TEACH; there-
fore, TAUGHT_DURING and OFFERED_DURING are redundant and can be left out.
Although in general three binary relationships cannot replace a ternary relationship,
they may do so under certain additional constraints. In our example, if the
CAN_TEACH relationship is 1:1 (an instructor can teach one course, and a course
can be taught by only one instructor), then the ternary relationship OFFERS can be
left out because it can be inferred from the three binary relationships CAN_TEACH,
TAUGHT_DURING, and OFFERED_DURING. The schema designer must analyze the
meaning of each specific situation to decide which of the binary and ternary rela-
tionship types are needed.
Notice that it is possible to have a weak entity type with a ternary (or n-ary) identi-
fying relationship type. In this case, the weak entity type can have several owner
entity types. An example is shown in Figure 19. This example shows part of a data-
base that keeps track of candidates interviewing for jobs at various companies, and
may be part of an employment agency database, for example. In the requirements, a
candidate can have multiple interviews with the same company (for example, with
different company departments or on separate dates), but a job offer is made based
on one of the interviews. Here, INTERVIEW is represented as a weak entity with two
owners CANDIDATE and COMPANY, and with the partial key Dept_date. An
INTERVIEW entity is uniquely identified by a candidate, a company, and the combi-
nation of the date and department of the interview.
There are two notations for specifying structural constraints on n-ary relationships,
and they specify different constraints. They should thus both be used if it is impor-
tant to fully specify the structural constraints on a ternary or higher-degree rela-
tionship. The first notation is based on the cardinality ratio notation of binary
relationships displayed in Figure 2. Here, a 1, M, or N is specified on each participa-
tion arc (both M and N symbols stand for many or any number).15 Let us illustrate
this constraint using the SUPPLY relationship in Figure 17.
Recall that the relationship set of SUPPLY is a set of relationship instances (s, j, p),
where s is a SUPPLIER, j is a PROJECT, and p is a PART. Suppose that the constraint
exists that for a particular project-part combination, only one supplier will be used
(only one supplier supplies a particular part to a particular project). In this case, we
place 1 on the SUPPLIER participation, and M, N on the PROJECT, PART participa-
tions in Figure 17. This specifies the constraint that a particular ( j, p) combination
can appear at most once in the relationship set because each such (PROJECT, PART)
combination uniquely determines a single supplier. Hence, any relationship
instance (s, j, p) is uniquely identified in the relationship set by its ( j, p) combina-
tion, which makes ( j, p) a key for the relationship set. In this notation, the participa-
tions that have a 1 specified on them are not required to be part of the identifying
key for the relationship set.16 If all three cardinalities are M or N, then the key will
be the combination of all three participants.
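To make the effect of placing a 1 on the SUPPLIER participation concrete, one
hypothetical relational reading is a uniqueness constraint over the PROJECT and PART
references alone, so that each (j, p) combination determines a single supplier. The
names below are illustrative.

    CREATE TABLE SUPPLY_KEY_SKETCH (
        Sname     VARCHAR(30) NOT NULL,
        Proj_name VARCHAR(30) NOT NULL,
        Part_no   INT         NOT NULL,
        UNIQUE (Proj_name, Part_no)   -- (j, p) acts as the key of the relationship set
    );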
The second notation is based on the (min, max) notation displayed in Figure 15 for
binary relationships. A (min, max) on a participation here specifies that each entity
is related to at least min and at most max relationship instances in the relationship
set. These constraints have no bearing on determining the key of an n-ary relation-
ship, where n > 2,17 but specify a different type of constraint that places restrictions
on how many relationship instances each entity can participate in.
15This notation allows us to determine the key of the relationship relation.
16This is also true for cardinality ratios of binary relationships.
17The (min, max) constraints can determine the keys for binary relationships, though.
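Unlike the cardinality ratio notation, a (min, max) restriction on an n-ary relationship does not follow from any key and would have to be stated separately if the schema were implemented. As a rough sketch only, suppose the SUPPLY table above had the additional hypothetical requirement that no supplier participate in more than ten SUPPLY instances; standard SQL could express this with an assertion (a construct few DBMSs actually support), while a minimum such as (1, …) would usually end up being enforced by triggers or application code.

CREATE ASSERTION SUPPLIER_MAX_PARTICIPATION
CHECK ( NOT EXISTS ( SELECT Sname
                     FROM   SUPPLY
                     GROUP  BY Sname
                     HAVING COUNT(*) > 10 ) );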
the various types of attributes, which can be nested arbitrarily to produce complex
attributes: simple (atomic) versus composite attributes, and single-valued versus
multivalued attributes.
We also briefly discussed stored versus derived attributes. Then we discussed the ER
model concepts at the schema or “intension” level: entity types and their
corresponding entity sets, key attributes of entity types, value sets (domains) of
attributes, relationship types and their corresponding relationship sets, and
participation roles of entity types in relationship types.
We presented two methods for specifying the structural constraints on relationship
types. The first method distinguished two types of structural constraints: cardinality
ratios (1:1, 1:N, M:N for binary relationships) and participation constraints (total,
partial).
We noted that, alternatively, another method of specifying structural constraints is
to specify minimum and maximum numbers (min, max) on the participation of
each entity type in a relationship type. We discussed weak entity types and the
related concepts of owner entity types, identifying relationship types, and partial
key attributes.
Entity-Relationship schemas can be represented diagrammatically as ER diagrams.
We showed how to design an ER schema for the COMPANY database by first defin-
ing the entity types and their attributes and then refining the design to include rela-
tionship types. We displayed the ER diagram for the COMPANY database schema.
We discussed some of the basic concepts of UML class diagrams and how they relate
to ER modeling concepts. We also described ternary and higher-degree relationship
types in more detail, and discussed the circumstances under which they are distin-
guished from binary relationships.
2. List the various cases where use of a NULL value would be appropriate.
3. Define the following terms: entity, attribute, attribute value, relationship
instance, composite attribute, multivalued attribute, derived attribute, complex
attribute, key attribute, and value set (domain).
4. What is an entity type? What is an entity set? Explain the differences among
an entity, an entity type, and an entity set.
5. Explain the difference between an attribute and a value set.
6. What is a relationship type? Explain the differences among a relationship
instance, a relationship type, and a relationship set.
7. What is a participation role? When is it necessary to use role names in the
description of relationship types?
8. Describe the two alternatives for specifying structural constraints on rela-
tionship types. What are the advantages and disadvantages of each?
9. Under what conditions can an attribute of a binary relationship type be
migrated to become an attribute of one of the participating entity types?
10. When we think of relationships as attributes, what are the value sets of these
attributes? What class of data models is based on this concept?
11. What is meant by a recursive relationship type? Give some examples of
recursive relationship types.
12. When is the concept of a weak entity used in data modeling? Define the
terms owner entity type, weak entity type, identifying relationship type, and
partial key.
13. Can an identifying relationship of a weak entity type be of a degree greater
than two? Give examples to illustrate your answer.
14. Discuss the conventions for displaying an ER schema as an ER diagram.
15. Discuss the naming conventions used for ER schema diagrams.
a. The university keeps track of each student’s name, student number, Social
Security number, current address and phone number, permanent address
and phone number, birth date, sex, class (freshman, sophomore, …, grad-
uate), major department, minor department (if any), and degree program
(B.A., B.S., …, Ph.D.). Some user applications need to refer to the city,
state, and ZIP Code of the student’s permanent address and to the stu-
dent’s last name. Both Social Security number and student number have
unique values for each student.
b. Each department is described by a name, department code, office num-
ber, office phone number, and college. Both name and code have unique
values for each department.
c. Each course has a course name, description, course number, number of
semester hours, level, and offering department. The value of the course
number is unique for each course.
d. Each section has an instructor, semester, year, course, and section num-
ber. The section number distinguishes sections of the same course that are
taught during the same semester/year; its values are 1, 2, 3, …, up to the
number of sections taught during each semester.
e. A grade report has a student, section, letter grade, and numeric grade (0,
1, 2, 3, or 4).
17. Composite and multivalued attributes can be nested to any number of levels.
Suppose we want to design an attribute for a STUDENT entity type to keep
track of previous college education. Such an attribute will have one entry for
each college previously attended, and each such entry will be composed of
college name, start and end dates, degree entries (degrees awarded at that
college, if any), and transcript entries (courses completed at that college, if
any). Each degree entry contains the degree name and the month and year
the degree was awarded, and each transcript entry contains a course name,
semester, year, and grade. Design an attribute to hold this information. Use
the conventions in Figure 5.
18. Show an alternative design for the attribute described in Exercise 17 that uses
only entity types (including weak entity types, if needed) and relationship
types.
19. Consider the ER diagram in Figure 20, which shows a simplified schema for
an airline reservations system. Extract from the ER diagram the require-
ments and constraints that produced this schema. Try to be as precise as pos-
sible in your requirements and constraints specification.
20. We can consider many entity types to describe database environment and
database users, such as DBMS, stored database, DBA, and catalog/data dic-
tionary. Try to specify all the entity types that can fully describe a database
system and its environment; then specify the relationship types among them,
and draw an ER diagram to describe such a general database environment.
21. Design an ER schema for keeping track of information about votes taken in
the U.S. House of Representatives during the current two-year congressional
session. The database needs to keep track of each U.S. STATE’s Name (e.g.,
‘Texas’, ‘New York’, ‘California’) and include the Region of the state (whose
domain is {‘Northeast’, ‘Midwest’, ‘Southeast’, ‘Southwest’, ‘West’}). Each
22. A database is being constructed to keep track of the teams and games of a
sports league. A team has a number of players, not all of whom participate in
each game. It is desired to keep track of the players participating in each
game for each team, the positions they played in that game, and the result of
the game. Design an ER schema diagram for this application, stating any
assumptions you make. Choose your favorite sport (e.g., soccer, baseball,
football).
23. Consider the ER diagram shown in Figure 21 for part of a BANK database.
Each bank can have multiple branches, and each branch can have multiple
accounts and loans.
a. List the strong (nonweak) entity types in the ER diagram.
b. Is there a weak entity type? If so, give its name, partial key, and identifying
relationship.
c. What constraints do the partial key and the identifying relationship of the
weak entity type specify in this diagram?
d. List the names of all relationship types, and specify the (min, max) con-
straint on each participation of an entity type in a relationship type.
Justify your choices.
e. List concisely the user requirements that led to this ER schema design.
f. Suppose that every customer must have at least one account but is
restricted to at most two loans at a time, and that a bank branch cannot
have more than 1,000 loans. How does this show up on the (min, max)
constraints?
24. Consider the ER diagram in Figure 22. Assume that an employee may work
in up to two departments or may not be assigned to any department. Assume
that each department must have one and may have up to three phone num-
bers. Supply (min, max) constraints on this diagram. State clearly any addi-
tional assumptions you make. Under what conditions would the relationship
HAS_PHONE be redundant in this example?
25. Consider the ER diagram in Figure 23. Assume that a course may or may not
use a textbook, but that a text by definition is a book that is used in some
course. A course may not use more than five books. Instructors teach from
two to four courses. Supply (min, max) constraints on this diagram. State
clearly any additional assumptions you make. If we add the relationship
ADOPTS, to indicate the textbook(s) that an instructor uses for a course,
should it be a binary relationship between INSTRUCTOR and TEXT, or a ter-
nary relationship between all three entity types? What (min, max) con-
straints would you put on it? Why?
26. Consider an entity type SECTION in a UNIVERSITY database, which
describes the section offerings of courses. The attributes of SECTION are
Section_number, Semester, Year, Course_number, Instructor, Room_no (where
section is taught), Building (where section is taught), Weekdays (domain is
the possible combinations of weekdays in which a section can be offered
{‘MWF’, ‘MW’, ‘TT’, and so on}), and Hours (domain is all possible time peri-
ods during which sections are offered {‘9–9:50 A.M.’, ‘10–10:50 A.M.’, …,
‘3:30–4:50 P.M.’, ‘5:30–6:20 P.M.’, and so on}). Assume that Section_number is
unique for each course within a particular semester/year combination (that
is, if a course is offered multiple times during a particular semester, its sec-
tion offerings are numbered 1, 2, 3, and so on). There are several composite
keys for section, and some attributes are components of more than one key.
Identify three composite keys, and show how they can be represented in an
ER schema diagram.
27. Cardinality ratios often dictate the detailed design of a database. The cardi-
nality ratio depends on the real-world meaning of the entity types involved
and is defined by the specific application. For the following binary relation-
ships, suggest cardinality ratios based on the common-sense meaning of the
entity types. Clearly state any assumptions you make.
1. STUDENT ______________ SOCIAL_SECURITY_CARD
2. STUDENT ______________ TEACHER
3. CLASSROOM ______________ WALL
4. COUNTRY ______________ CURRENT_PRESIDENT
5. COURSE ______________ TEXTBOOK
6. ITEM (that can
be found in an
order) ______________ ORDER
7. STUDENT ______________ CLASS
8. CLASS ______________ INSTRUCTOR
9. INSTRUCTOR ______________ OFFICE
10. EBAY_AUCTION
_ITEM ______________ EBAY_BID
28. Consider the ER schema for the MOVIES database in Figure 24.
Assume that MOVIES is a populated database. ACTOR is used as a generic
term and includes actresses. Given the constraints shown in the ER schema,
respond to the following statements with True, False, or Maybe. Assign a
response of Maybe to statements that, while not explicitly shown to be True,
cannot be proven False based on the schema as shown. Justify each answer.
a. There are no actors in this database that have been in no movies.
b. There are some actors who have acted in more than ten movies.
c. Some actors have done a lead role in multiple movies.
d. A movie can have only a maximum of two lead actors.
e. Every director has been an actor in some movie.
f. No producer has ever been an actor.
g. A producer cannot be an actor in some other movie.
h. There are movies with more than a dozen actors.
i. Some producers have been a director as well.
j. Most movies have one director and one producer.
k. Some movies have one director but several producers.
l. There are some actors who have done a lead role, directed a movie, and
produced some movie.
m. No movie has a director who also acted in that movie.
29. Given the ER schema for the MOVIES database in Figure 24, draw an instance
diagram using three movies that have been released recently. Draw instances
of each entity type: MOVIES, ACTORS, PRODUCERS, DIRECTORS involved;
make up instances of the relationships as they exist in reality for those
movies.
30. Illustrate the UML Diagram for Exercise 16. Your UML design should
observe the following requirements:
a. A student should have the ability to compute his/her GPA and add or
drop majors and minors.
b. Each department should be able to add or delete courses and hire or ter-
minate faculty.
c. Each instructor should be able to assign or change a student’s grade for a
course.
Note: Some of these functions may be spread over multiple classes.
32. Consider a MAIL_ORDER database in which employees take orders for parts
from customers. The data requirements are summarized as follows:
33. Consider a MOVIE database in which data is recorded about the movie indus-
try. The data requirements are summarized as follows:
■ Each movie is identified by title and year of release. Each movie has a
length in minutes. Each has a production company, and each is classified
under one or more genres (such as horror, action, drama, and so forth).
Each movie has one or more directors and one or more actors appear in
it. Each movie also has a plot outline. Finally, each movie has zero or more
quotable quotes, each of which is spoken by a particular actor appearing
in the movie.
■ Production companies are identified by name and each has an address. A
production company produces one or more movies.
34. Consider a CONFERENCE_REVIEW database in which researchers submit
their research papers for consideration. Reviews by reviewers are recorded
for use in the paper selection process. The database system caters primarily
to reviewers who record answers to evaluation questions for each paper they
review and make recommendations regarding whether to accept or reject the
paper. The data requirements are summarized as follows:
■ Authors of papers are uniquely identified by e-mail id. First and last
names are also recorded.
■ Reviewers of papers are uniquely identified by e-mail address. Each
reviewer’s first name, last name, phone number, affiliation, and topics of
interest are also recorded.
■ Each paper is assigned between two and four reviewers. A reviewer rates
each paper assigned to him or her on a scale of 1 to 10 in four categories:
technical merit, readability, originality, and relevance to the conference.
Finally, each reviewer provides an overall recommendation regarding
each paper.
35. Consider the ER diagram for the AIRLINE database shown in Figure 20. Build
this design using a data modeling tool such as ERwin or Rational Rose.
Entity-Relationship (ER) modeling concepts are sufficient for representing many database schemas
for traditional database applications, which include many data-processing applica-
tions in business and industry. Since the late 1970s, however, designers of database
applications have tried to design more accurate database schemas that reflect the
data properties and constraints more precisely. This was particularly important for
newer applications of database technology, such as databases for engineering design
and manufacturing (CAD/CAM),1 telecommunications, complex software systems,
and Geographic Information Systems (GIS), among many other applications. These
types of databases have more complex requirements than do the more traditional
applications. This led to the development of additional semantic data modeling con-
cepts that were incorporated into conceptual data models such as the ER model.
Various semantic data models have been proposed in the literature. Many of these
concepts were also developed independently in related areas of computer science,
such as the knowledge representation area of artificial intelligence and the object
modeling area in software engineering.
In this chapter, we describe features that have been proposed for semantic data mod-
els, and show how the ER model can be enhanced to include these concepts, leading
to the Enhanced ER (EER) model.2 We start in Section 1 by incorporating the con-
cepts of class/subclass relationships and type inheritance into the ER model. Then, in
Section 2, we add the concepts of specialization and generalization. Section 3
1CAD/CAM stands for computer-aided design/computer-aided manufacturing.
2EER has also been used to stand for Extended ER model.
We present the UML class diagram notation for representing specialization and gen-
eralization in Section 6, and briefly compare these with EER notation and concepts.
This serves as an example of alternative notation, and is a continuation of basic UML
class diagram notation that corresponds to the basic ER model. In Section 7, we dis-
cuss the fundamental abstractions that are used as the basis of many semantic data
models. Section 8 summarizes the chapter.
an employee. We call each of these subgroupings a subclass or subtype of the
EMPLOYEE entity type, and the EMPLOYEE entity type is called the superclass or
supertype for each of these subclasses. Figure 1 shows how to represent these con-
cepts diagrammatically in EER diagrams. (The circle notation in Figure 1 will be
explained in Section 2.)
3A class/subclass relationship is often called an IS-A (or IS-AN) relationship because of the way we
refer to the concept. We say a SECRETARY is an EMPLOYEE, a TECHNICIAN is an EMPLOYEE, and
so on.
An entity cannot exist in the database merely by being a member of a subclass; it
must also be a member of the superclass. Such an entity can be included optionally
as a member of any number of subclasses. For example, a salaried employee who is
also an engineer belongs to the two subclasses ENGINEER and
SALARIED_EMPLOYEE of the EMPLOYEE entity type. However, it is not necessary
that every entity in a superclass is a member of some subclass.
An important concept associated with subclasses (subtypes) is that of type inheri-
tance. Recall that the type of an entity is defined by the attributes it possesses and
the relationship types in which it participates. Because an entity in the subclass rep-
resents the same real-world entity from the superclass, it should possess values for
its specific attributes as well as values of its attributes as a member of the superclass.
We say that an entity that is a member of a subclass inherits all the attributes of the
entity as a member of the superclass. The entity also inherits all the relationships in
which the superclass participates. Notice that a subclass, with its own specific (or
local) attributes and relationships together with all the attributes and relationships
it inherits from the superclass, can be considered an entity type in its own right.4
4In some object-oriented programming languages, a common restriction is that an entity (or object) has
only one type. This is generally too restrictive for conceptual database modeling.
5There are many alternative notations for specialization.
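Type inheritance has a direct counterpart when such a schema is eventually stored relationally: a subclass can be kept in its own table that shares the superclass key, and a view can reassemble the inherited and specific attributes. The sketch below is only one possible mapping, and the choice of Ssn as the EMPLOYEE key is an assumption.

CREATE TABLE EMPLOYEE (
  Ssn        CHAR(9) PRIMARY KEY,
  Name       VARCHAR(40),
  Birth_date DATE );

-- SECRETARY stores only its specific attribute; everything else is
-- "inherited" by joining back to EMPLOYEE on the shared key.
CREATE TABLE SECRETARY (
  Ssn          CHAR(9) PRIMARY KEY REFERENCES EMPLOYEE(Ssn),
  Typing_speed INT );

CREATE VIEW SECRETARY_FULL AS
  SELECT E.Ssn, E.Name, E.Birth_date, S.Typing_speed
  FROM   EMPLOYEE E JOIN SECRETARY S ON E.Ssn = S.Ssn;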
attributes) of the subclass. Similarly, a subclass can participate in specific relation-
ship types, such as the HOURLY_EMPLOYEE subclass participating in the
BELONGS_TO relationship in Figure 1. We will explain the d symbol in the circles in
Figure 1 and additional EER diagram notation shortly.
Figure 2 shows a few entity instances that belong to subclasses of the {SECRETARY,
ENGINEER, TECHNICIAN} specialization. Again, notice that an entity that belongs
to a subclass represents the same real-world entity as the entity connected to it in the
EMPLOYEE superclass, even though the same entity is shown twice; for example, e1
is shown in both EMPLOYEE and SECRETARY in Figure 2. As the figure suggests, a
superclass/subclass relationship such as EMPLOYEE/SECRETARY somewhat resem-
bles a 1:1 relationship at the instance level. The main difference is that in a 1:1 rela-
tionship two distinct entities are related, whereas in a superclass/subclass
relationship the entity in the subclass is the same real-world entity as the entity in
the superclass but is playing a specialized role—for example, an EMPLOYEE special-
ized in the role of SECRETARY, or an EMPLOYEE specialized in the role of
TECHNICIAN.
There are two main reasons for including class/subclass relationships and specializa-
tions in a data model. The first is that certain attributes may apply to some but not all
entities of the superclass. A subclass is defined in order to group the entities to which
these attributes apply. The members of the subclass may still share the majority of
their attributes with the other members of the superclass. For example, in Figure 1
the SECRETARY subclass has the specific attribute Typing_speed, whereas the
ENGINEER subclass has the specific attribute Eng_type, but SECRETARY and
ENGINEER share their other inherited attributes from the EMPLOYEE entity type.
The second reason for using subclasses is that some relationship types may be par-
ticipated in only by entities that are members of the subclass. For example, if only
HOURLY_EMPLOYEES can belong to a trade union, we can represent that fact by
creating the subclass HOURLY_EMPLOYEE of EMPLOYEE and relating the subclass
to an entity type TRADE_UNION via the BELONGS_TO relationship type, as illus-
trated in Figure 1.
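Continuing the sketch above, a relationship type restricted to a subclass attaches naturally to the subclass table rather than to EMPLOYEE. The TRADE_UNION key and the assumption that an hourly employee belongs to at most one union are illustrative only.

CREATE TABLE TRADE_UNION (
  Union_name VARCHAR(40) PRIMARY KEY );

CREATE TABLE HOURLY_EMPLOYEE (
  Ssn        CHAR(9) PRIMARY KEY REFERENCES EMPLOYEE(Ssn),
  -- BELONGS_TO can be represented here precisely because only members of
  -- the HOURLY_EMPLOYEE subclass participate in it.
  Union_name VARCHAR(40) REFERENCES TRADE_UNION(Union_name) );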
Notice that the generalization process can be viewed as being functionally the inverse
of the specialization process. Hence, in Figure 3 we can view {CAR, TRUCK} as a spe-
cialization of VEHICLE, rather than viewing VEHICLE as a generalization of CAR and
TRUCK. Similarly, in Figure 1 we can view EMPLOYEE as a generalization of
SECRETARY, TECHNICIAN, and ENGINEER. A diagrammatic notation to distinguish
between generalization and specialization is used in some design methodologies. An
arrow pointing to the generalized superclass represents a generalization, whereas
arrows pointing to the specialized subclasses represent a specialization. We will not
use this notation because the decision as to which process is followed in a particular
situation is often subjective.
So far we have introduced the concepts of subclasses and superclass/subclass rela-
tionships, as well as the specialization and generalization processes. In general, a
superclass or subclass represents a collection of entities of the same type and hence
also describes an entity type; that is why superclasses and subclasses are all shown in
rectangles in EER diagrams, like entity types. Next, we discuss the properties of spe-
cializations and generalizations in more detail.
First, we discuss constraints that apply to a single specialization or a single general-
ization. For brevity, our discussion refers only to specialization even though it
applies to both specialization and generalization. Then, we discuss differences
between specialization/generalization lattices (multiple inheritance) and hierarchies
(single inheritance), and elaborate on the differences between the specialization and
generalization processes during conceptual database schema design.
each of the specializations. However, a specialization may also consist of a single
subclass only, such as the {MANAGER} specialization in Figure 1; in such a case, we
do not use the circle notation.
If all subclasses in a specialization have their membership condition on the same
attribute of the superclass, the specialization itself is called an attribute-defined spe-
cialization, and the attribute is called the defining attribute of the specialization.6 In
this case, all the entities with the same value for the attribute belong to the same sub-
class. We display an attribute-defined specialization by placing the defining attribute
name next to the arc from the circle to the superclass, as shown in Figure 4.
6Such an attribute is called a discriminator in UML terminology.
Two other constraints may apply to a specialization. The first is the disjointness (or
disjointedness) constraint, which specifies that the subclasses of the specialization
must be disjoint. This means that an entity can be a member of at most one of the
subclasses of the specialization. A specialization that is attribute-defined implies the
disjointness constraint (if the attribute used to define the membership predicate is
single-valued). Figure 4 illustrates this case, where the d in the circle stands for
disjoint. The d notation also applies to user-defined subclasses of a specialization
that must be disjoint, as illustrated by the specialization {HOURLY_EMPLOYEE,
SALARIED_EMPLOYEE} in Figure 1. If the subclasses are not constrained to be dis-
joint, their sets of entities may be overlapping; that is, the same (real-world) entity
may be a member of more than one subclass of the specialization. This case, which
is the default, is displayed by placing an o in the circle, as shown in Figure 5.
The second constraint on specialization is called the completeness (or totalness)
constraint, which may be total or partial. A total specialization constraint specifies
that every entity in the superclass must be a member of at least one subclass in the
specialization. For example, if every EMPLOYEE must be either an
HOURLY_EMPLOYEE or a SALARIED_EMPLOYEE, then the specialization
{HOURLY_EMPLOYEE, SALARIED_EMPLOYEE} in Figure 1 is a total specialization of
EMPLOYEE. This is shown in EER diagrams by using a double line to connect the
superclass to the circle. A single line is used to display a partial specialization,
which allows an entity not to belong to any of the subclasses. For example, if some
EMPLOYEE entities do not belong to any of the subclasses {SECRETARY,
ENGINEER, TECHNICIAN} in Figures 1 and 4, then that specialization is partial.7
Notice that the disjointness and completeness constraints are independent. Hence,
we have the following four possible constraints on specialization: disjoint, total;
disjoint, partial; overlapping, total; and overlapping, partial.
7The notation of using single or double lines is similar to that for partial or total participation of an entity
type in a relationship type.
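When a specialization is eventually stored in a relational database, these combinations surface as ordinary declarative constraints. The following is a minimal, self-contained sketch assuming an attribute-defined specialization on a single-valued Job_type attribute; the column names are illustrative.

CREATE TABLE EMPLOYEE (
  Ssn      CHAR(9) PRIMARY KEY,
  Name     VARCHAR(40),
  -- Disjoint: a single-valued defining attribute allows membership in at
  -- most one subclass. Total: NOT NULL forces membership in some subclass;
  -- dropping NOT NULL would model a partial specialization. An overlapping
  -- specialization would instead need separate subclass tables or one flag
  -- per subclass rather than a single Job_type value.
  Job_type VARCHAR(12) NOT NULL
           CHECK (Job_type IN ('Secretary', 'Engineer', 'Technician')) );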
Certain insertion and deletion rules apply to specialization (and generalization) as a
consequence of the constraints specified earlier. Some of these rules are as follows:
deleting an entity from the superclass implies that it is automatically deleted from
all the subclasses to which it belongs; inserting an entity in the superclass implies
that the entity is mandatorily inserted in all attribute-defined (or predicate-defined)
subclasses for which the entity satisfies the defining predicate; and inserting an
entity in the superclass of a total specialization implies that the entity is mandatorily
inserted in at least one of the subclasses of the specialization.
The reader is encouraged to make a complete list of rules for insertions and dele-
tions for the various types of specializations.
A subclass itself may have further subclasses specified on it, forming a hierarchy or a
lattice of specializations. For example, in Figure 6 ENGINEER is a subclass of
EMPLOYEE and is also a superclass of ENGINEERING_MANAGER; this represents
the real-world constraint that every engineering manager is required to be an engi-
neer. A specialization hierarchy has the constraint that every subclass participates
as a subclass in only one class/subclass relationship; that is, each subclass has only
one parent, which results in a tree structure or strict hierarchy. In contrast, for a
specialization lattice, a subclass can be a subclass in more than one class/subclass
relationship. Hence, Figure 6 is a lattice.
Figure 7 shows another specialization lattice of more than one level. This may be
part of a conceptual schema for a UNIVERSITY database. Notice that this arrange-
ment would have been a hierarchy except for the STUDENT_ASSISTANT subclass,
which is a subclass in two distinct class/subclass relationships.
The requirements for the part of the UNIVERSITY database shown in Figure 7 are the
following:
1. The database keeps track of three types of persons: employees, alumni, and
students. A person can belong to one, two, or all three of these types. Each
person has a name, SSN, sex, address, and birth date.
2. Every employee has a salary, and there are three types of employees: faculty,
staff, and student assistants. Each employee belongs to exactly one of these
types. For each alumnus, a record of the degree or degrees that he or she
earned at the university is kept, including the name of the degree, the year
granted, and the major department.
3. Each faculty has a rank, whereas each staff member has a staff position.
Student assistants are classified further as either research assistants or teach-
ing assistants, and the percent of time that they work is recorded in the data-
base. Research assistants have their research project stored, whereas teaching
assistants have the current course they work on.
4. Students are further classified as either graduate or undergraduate, with the
specific attributes degree program (M.S., Ph.D., M.B.A., and so on) for
graduate students and class (freshman, sophomore, and so on) for under-
graduates.
In such a specialization lattice or hierarchy, a subclass inherits the attributes not
only of its direct superclass, but also of all its predecessor superclasses all the way to
the root of the hierarchy or lattice if necessary. For example, an entity in
GRADUATE_STUDENT inherits all the attributes of that entity as a STUDENT and as
a PERSON. Notice that an entity may exist in several leaf nodes of the hierarchy,
where a leaf node is a class that has no subclasses of its own. For example, a member
of GRADUATE_STUDENT may also be a member of RESEARCH_ASSISTANT.
8In some models, the class is further restricted to be a leaf node in the hierarchy or lattice.
It is important to note here that some models and languages are limited to single
inheritance and do not allow multiple inheritance (shared subclasses). It is also
important to note that some models do not allow an entity to have multiple types,
and hence an entity can be a member of only one leaf class.8 In such a model, it is
necessary to create additional subclasses as leaf nodes to cover all possible combina-
tions of classes that may have some entity that belongs to all these classes simultane-
ously. For example, in the overlapping specialization of PERSON into {EMPLOYEE,
ALUMNUS, STUDENT} (or {E, A, S} for short), it would be necessary to create seven
subclasses of PERSON in order to cover all possible types of entities: E, A, S, E_A,
E_S, A_S, and E_A_S. Obviously, this can lead to extra complexity.
Although we have used specialization to illustrate our discussion, similar concepts
apply equally to generalization, as we mentioned at the beginning of this section.
Hence, we can also speak of generalization hierarchies and generalization lattices.
Now we elaborate on the differences between the specialization and generalization
processes, and how they are used to refine conceptual schemas during conceptual
database design. In the specialization process, we typically start with an entity type
and then define subclasses of the entity type by successive specialization; that is, we
repeatedly define more specific groupings of the entity type. For example, when
designing the specialization lattice in Figure 7, we may first specify an entity type
PERSON for a university database. Then we discover that three types of persons will
be represented in the database: university employees, alumni, and students. We cre-
ate the specialization {EMPLOYEE, ALUMNUS, STUDENT} for this purpose and
choose the overlapping constraint, because a person may belong to more than one
of the subclasses. We specialize EMPLOYEE further into {STAFF, FACULTY,
STUDENT_ASSISTANT}, and specialize STUDENT into {GRADUATE_STUDENT,
UNDERGRADUATE_STUDENT}. Finally, we specialize STUDENT_ASSISTANT into
{RESEARCH_ASSISTANT, TEACHING_ASSISTANT}. This successive specialization
corresponds to a top-down conceptual refinement process during conceptual
schema design. So far, we have a hierarchy; then we realize that
STUDENT_ASSISTANT is a shared subclass, since it is also a subclass of STUDENT,
leading to the lattice.
It is possible to arrive at the same hierarchy or lattice from the other direction. In
such a case, the process involves generalization rather than specialization and corre-
sponds to a bottom-up conceptual synthesis. For example, the database designers
may first discover entity types such as STAFF, FACULTY, ALUMNUS,
GRADUATE_STUDENT, UNDERGRADUATE_STUDENT, RESEARCH_ASSISTANT,
TEACHING_ASSISTANT, and so on; then they generalize {GRADUATE_STUDENT,
UNDERGRADUATE_STUDENT} into STUDENT, and so on, synthesizing superclasses
bottom-up until the PERSON class at the top of the lattice is reached.
In structural terms, hierarchies or lattices resulting from either process may be iden-
tical; the only difference relates to the manner or order in which the schema super-
classes and subclasses were created during the design process. In practice, it is likely
that neither the generalization process nor the specialization process is followed
strictly, but that a combination of the two processes is employed. New classes are
continually incorporated into a hierarchy or lattice as they become apparent to users
and designers. Notice that the notion of representing data and knowledge by using
superclass/subclass hierarchies and lattices is quite common in knowledge-based sys-
tems and expert systems, which combine database technology with artificial intelli-
gence techniques. For example, frame-based knowledge representation schemes
closely resemble class hierarchies. Specialization is also common in software engi-
neering design methodologies that are based on the object-oriented paradigm.
9Our use of the term category is based on the ECR (Entity-Category-Relationship) model (Elmasri et al.
1985).
A category has two or more superclasses that may represent distinct entity types,
whereas other superclass/subclass relationships always have a single superclass. To
better understand the difference, we can compare a category, such as OWNER in
Figure 8, with the ENGINEERING_MANAGER shared subclass in Figure 6. The latter
is a subclass of each of the three superclasses ENGINEER, MANAGER, and
SALARIED_EMPLOYEE, so an entity that is a member of ENGINEERING_MANAGER
must exist in all three. This represents the constraint that an engineering manager
must be an ENGINEER, a MANAGER, and a SALARIED_EMPLOYEE; that is,
ENGINEERING_MANAGER is a subset of the intersection of the three classes (sets of
entities). On the other hand, a category is a subset of the union of its superclasses.
Hence, an entity that is a member of OWNER must exist in only one of the super-
classes. This represents the constraint that an OWNER may be a COMPANY, a BANK,
or a PERSON in Figure 8.
Attribute inheritance works more selectively in the case of categories. For example,
in Figure 8 each OWNER entity inherits the attributes of a COMPANY, a PERSON,
or a BANK, depending on the superclass to which the entity belongs. On the other
hand, a shared subclass such as ENGINEERING_MANAGER (Figure 6) inherits all
the attributes of its superclasses SALARIED_EMPLOYEE, ENGINEER, and
MANAGER.
A category can be total or partial. A total category holds the union of all entities in
its superclasses, whereas a partial category can hold a subset of the union. A total cat-
egory is represented diagrammatically by a double line connecting the category and
the circle, whereas a partial category is indicated by a single line.
The superclasses of a category may have different key attributes, as demonstrated by
the OWNER category in Figure 8, or they may have the same key attribute, as
demonstrated by the REGISTERED_VEHICLE category. Notice that if a category is
total (not partial), it may be represented alternatively as a total specialization (or a
total generalization). In this case, the choice of which representation to use is sub-
jective. If the two classes represent the same type of entities and share numerous
attributes, including the same key attributes, specialization/generalization is pre-
ferred; otherwise, categorization (union type) is more appropriate.
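One common way to realize a category such as OWNER relationally is to introduce a surrogate key for the union type and reference it from each superclass, since the superclasses have different keys. The surrogate attribute Owner_id and the superclass keys below are assumptions made for this sketch.

CREATE TABLE OWNER (
  Owner_id INT PRIMARY KEY );

-- An entity is an owner if its row carries a non-NULL Owner_id; for a
-- total category, every PERSON, BANK, and COMPANY row would have to
-- supply one.
CREATE TABLE PERSON (
  Ssn      CHAR(9)     PRIMARY KEY,
  Owner_id INT REFERENCES OWNER(Owner_id) );

CREATE TABLE BANK (
  Bname    VARCHAR(40) PRIMARY KEY,
  Owner_id INT REFERENCES OWNER(Owner_id) );

CREATE TABLE COMPANY (
  Cname    VARCHAR(40) PRIMARY KEY,
  Owner_id INT REFERENCES OWNER(Owner_id) );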
It is important to note that some modeling methodologies do not have union types.
In these models, a union type must be represented in a roundabout way.
In this section, we first give an example of a database schema in the EER model to
illustrate the use of the various concepts discussed here. Then, we discuss design
choices for conceptual schemas, and finally we summarize the EER model concepts
and define them formally.
GRAD_STUDENT is a subclass of STUDENT, with the defining predicate Class = 5.
For each graduate student, we keep a list of previous degrees in a composite, multi-
valued attribute [Degrees]. We also relate the graduate student to a faculty advisor
[ADVISOR] and to a thesis committee [COMMITTEE], if one exists.
10We assume that the quarter system rather than the semester system is used in this university.
date [St_date]. A grant is related to one principal investigator [PI] and to all
researchers it supports [SUPPORT]. Each instance of support has as attributes the
starting date of support [Start], the ending date of the support (if known) [End],
and the percentage of time being spent on the project [Time] by the researcher being
supported.
Conceptual database design should be considered as an iterative refinement process
until the most suitable design is reached. The following guidelines can help to guide
the design process for EER concepts:
■ If a subclass has few specific (local) attributes and no specific relationships, it
can be merged into the superclass. The specific attributes would hold NULL
values for entities that are not members of the subclass. A type attribute
could specify whether an entity is a member of the subclass.
■ Similarly, if all the subclasses of a specialization/generalization have few spe-
cific attributes and no specific relationships, they can be merged into the
superclass and replaced with one or more type attributes that specify the sub-
class or subclasses that each entity belongs to (see the sketch following this list).
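A minimal sketch of the second guideline, assuming the {SECRETARY, ENGINEER, TECHNICIAN} specialization is folded back into EMPLOYEE; the specific attribute names are illustrative.

CREATE TABLE EMPLOYEE (
  Ssn          CHAR(9) PRIMARY KEY,
  Name         VARCHAR(40),
  Job_type     VARCHAR(12),  -- which subclass, if any (NULL for none)
  Typing_speed INT,          -- NULL unless Job_type = 'Secretary'
  Eng_type     VARCHAR(20),  -- NULL unless Job_type = 'Engineer'
  Tgrade       VARCHAR(10)   -- NULL unless Job_type = 'Technician'
);

The trade-off, as the first guideline notes, is the NULL values that appear in the specific attributes for entities that are not members of the corresponding subclass.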
A class11 is a set or collection of entities; this includes any of the EER schema
constructs that group entities, such as entity types, subclasses, superclasses, and categories. A
subclass S is a class whose entities must always be a subset of the entities in another
class, called the superclass C of the superclass/subclass (or IS-A) relationship. We
denote such a relationship by C/S. For such a superclass/subclass relationship, we
must always have S ⊆ C.
A specialization Z = {S1, S2, …, Sn} is a set of subclasses that have the same super-
class G; that is, G/Si is a superclass/subclass relationship for i = 1, 2, …, n. G is called
a generalized entity type (or the superclass of the specialization, or a
generalization of the subclasses {S1, S2, …, Sn}). Z is said to be total if we always (at
any point in time) have
S1 ∪ S2 ∪ … ∪ Sn = G
Otherwise, Z is said to be partial. Z is said to be disjoint if we always have
Si ∩ Sj = ∅ (empty set) for i ≠ j
Otherwise, Z is said to be overlapping.
A subclass S of C is said to be predicate-defined if a predicate p on the attributes of
C is used to specify which entities in C are members of S; that is, S = C[p], where
C[p] is the set of entities in C that satisfy p. A subclass that is not defined by a pred-
icate is called user-defined.
A specialization Z (or generalization G) is said to be attribute-defined if a predicate
(A = ci), where A is an attribute of G and ci is a constant value from the domain of A,
11The use of the word class here differs from its more common use in object-oriented programming lan-
guages such as C++. In C++, a class is a structured type definition along with its applicable functions
(operations).
is used to specify membership in each subclass Si in Z. Notice that if ci ≠ cj for i ≠ j,
and A is a single-valued attribute, then the specialization will be disjoint.
A category T is a class that is a subset of the union of n defining superclasses D1, D2,
…, Dn, n > 1, and is formally specified as follows: T ⊆ (D1 ∪ D2 ∪ … ∪ Dn).
In database design, we are mainly concerned with specifying concrete classes whose
collections of objects are permanently (or persistently) stored in the database. The
bibliographic notes at the end of this chapter give some references to books that
describe complete details of UML.
. . .
. . .
In this section we discuss in general terms some of the modeling concepts of the ER
and EER models. This terminology is not only used in conceptual data modeling
but also in artificial intelligence literature when discussing knowledge representa-
tion (KR). This section discusses the similarities and differences between concep-
tual modeling and knowledge representation, and introduces some of the
alternative terminology and a few additional concepts.
■ KR is generally broader in scope than semantic data models. Different forms
of knowledge, such as rules (used in inference, deduction, and search),
incomplete and default knowledge, and temporal and spatial knowledge, are
represented in KR schemes. Database models are being expanded to include
some of these concepts.
12An ontology is somewhat similar to a conceptual schema, but with more knowledge, rules, and excep-
tions.
In general, the objects of a class should have a similar type structure. However, some
objects may display properties that differ in some respects from the other objects of
the class; these exception objects also need to be modeled, and KR schemes allow
more varied exceptions than do database models. In addition, certain properties
apply to the class as a whole and not to the individual objects; KR schemes allow
such class properties. UML diagrams also allow specification of class properties.
In the EER model, entities are classified into entity types according to their basic
attributes and relationships. Entities are further classified into subclasses and cate-
gories based on additional similarities and differences (exceptions) among them.
Relationship instances are classified into relationship types. Hence, entity types,
subclasses, categories, and relationship types are the different concepts that are used
for classification in the EER model. The EER model does not provide explicitly for
class properties, but it may be extended to do so. In UML, objects are classified into
classes, and it is possible to display both class properties and individual objects.
At the object level, the values of key attributes are used to distinguish among entities
of a particular entity type. For weak entity types, entities are identified by a combi-
nation of their own partial key values and the entities they are related to in the
owner entity type(s). Relationship instances are identified by some combination of
the entities that they relate to, depending on the cardinality ratio specified.
an object to form the whole object. The second case is when we represent an aggre-
gation relationship as an ordinary relationship. The third case, which the EER
model does not provide for explicitly, involves the possibility of combining objects
that are related by a particular relationship instance into a higher-level aggregate
object. This is sometimes useful when the higher-level aggregate object is itself to be
related to another object. We call the relationship between the primitive objects and
their aggregate object IS-A-PART-OF; the inverse is called IS-A-COMPONENT-
OF. UML provides for all three types of aggregation.
The abstraction of association is used to associate objects from several independent
classes. Hence, it is somewhat similar to the second use of aggregation. It is repre-
sented in the EER model by relationship types, and in UML by associations. This
abstract relationship is called IS-ASSOCIATED-WITH.
One way to represent this situation is to create a higher-level aggregate class com-
posed of COMPANY, JOB_APPLICANT, and INTERVIEW and to relate this class to
JOB_OFFER, as shown in Figure 11(d). Although the EER model as described in this
book does not have this facility, some semantic data models do allow it and call the
resulting object a composite or molecular object. Other models treat entity types
and relationship types uniformly and hence permit relationships among relation-
ships, as illustrated in Figure 11(c).
To represent this situation correctly in the ER model as described here, we need to
create a new weak entity type INTERVIEW, as shown in Figure 11(e), and relate it to
JOB_OFFER. Hence, we can always represent these situations correctly in the ER
model by creating additional entity types, although it may be conceptually more
desirable to allow direct representation of aggregation, as in Figure 11(d), or to
allow relationships among relationships, as in Figure 11(c).
The study of ontologies attempts to describe the structures and relationships that
are possible in reality through some common vocabulary; therefore, it can be con-
sidered as a way to describe the knowledge of a certain community about reality.
Ontology originated in the fields of philosophy and metaphysics. One commonly
used definition of ontology is a specification of a conceptualization.13
In this definition, a conceptualization is the set of concepts that are used to repre-
sent the part of reality or knowledge that is of interest to a community of users.
Specification refers to the language and vocabulary terms that are used to specify
the conceptualization. The ontology includes both specification and
conceptualization. For example, the same conceptualization may be specified in two
different languages, giving two separate ontologies. Based on this quite general def-
inition, there is no consensus on what an ontology is exactly. Some possible ways to
describe ontologies are as follows:
Usually the concepts used to describe ontologies are quite similar to the concepts we
discussed in conceptual modeling, such as entities, attributes, relationships, special-
izations, and so on. The main difference between an ontology and, say, a database
schema, is that the schema is usually limited to describing a small subset of a mini-
13This definition is given in Gruber (1995).
world from reality in order to store and manage data. An ontology is usually consid-
ered to be more general in that it attempts to describe a part of reality or a domain
of interest (for example, medical terms, electronic-commerce applications, sports,
and so on) as completely as possible.
Next, we showed how to display these new constructs in an EER diagram. We also
discussed the various types of constraints that may apply to specialization or gener-
alization. The two main constraints are total/partial and disjoint/overlapping. In
addition, a defining predicate for a subclass or a defining attribute for a specializa-
tion may be specified. We discussed the differences between user-defined and
predicate-defined subclasses and between user-defined and attribute-defined spe-
cializations. Finally, we discussed the concept of a category or union type, which is a
subset of the union of two or more classes, and we gave formal definitions of all the
concepts presented.
2. Define the following terms: superclass of a subclass, superclass/subclass rela-
tionship, IS-A relationship, specialization, generalization, category, specific
(local) attributes, and specific relationships.
3. Discuss the mechanism of attribute/relationship inheritance. Why is it use-
ful?
4. Discuss user-defined and predicate-defined subclasses, and identify the dif-
ferences between the two.
5. Discuss user-defined and attribute-defined specializations, and identify the
differences between the two.
6. Discuss the two main types of constraints on specializations and generaliza-
tions.
7. What is the difference between a specialization hierarchy and a specialization
lattice?
8. What is the difference between specialization and generalization? Why do we
not display this difference in schema diagrams?
9. How does a category differ from a regular shared subclass? What is a cate-
gory used for? Illustrate your answer with examples.
10. For each of the following UML terms (see Section 6), discuss the correspon-
ding term in the EER model, if any: object, class, association, aggregation,
generalization, multiplicity, attributes, discriminator, link, link attribute,
reflexive association, and qualified association.
11. Discuss the main differences between the notation for EER schema diagrams
and UML class diagrams by comparing how common concepts are repre-
sented in each.
12. List the various data abstraction concepts and the corresponding modeling
concepts in the EER model.
13. What aggregation feature is missing from the EER model? How can the EER
model be further enhanced to support it?
14. What are the main similarities and differences between conceptual database
modeling techniques and knowledge representation techniques?
15. Discuss the similarities and differences between an ontology and a database
schema.
Specify all constraints that should hold on the database. Make sure that the
schema has at least five entity types, four relationship types, a weak entity
type, a superclass/subclass relationship, a category, and an n-ary (n > 2) rela-
tionship type.
17. Consider the BANK ER schema in Figure A.1 (at end of the chapter), and
suppose that it is necessary to keep track of different types of ACCOUNTS
(SAVINGS_ACCTS, CHECKING_ACCTS, …) and LOANS (CAR_LOANS,
HOME_LOANS, …). Suppose that it is also desirable to keep track of each
ACCOUNT’s TRANSACTIONS (deposits, withdrawals, checks, …) and each
LOAN’s PAYMENTS; both of these include the amount, date, and time. Modify
the BANK schema, using ER and EER concepts of specialization and general-
ization. State any assumptions you make about the additional requirements.
18. The following narrative describes a simplified version of the organization of
Olympic facilities planned for the summer Olympics. Draw an EER diagram
that shows the entity types, attributes, relationships, and specializations for
this application. State any assumptions you make. The Olympic facilities are
divided into sports complexes. Sports complexes are divided into one-sport
and multisport types. Multisport complexes have areas of the complex desig-
nated for each sport with a location indicator (e.g., center, NE corner, and so
on). A complex has a location, chief organizing individual, total occupied
area, and so on. Each complex holds a series of events (e.g., the track stadium
may hold many different races). For each event there is a planned date, dura-
tion, number of participants, number of officials, and so on. A roster of all
officials will be maintained together with the list of events each official will
be involved in. Different equipment is needed for the events (e.g., goal posts,
poles, parallel bars) as well as for maintenance. The two types of facilities
(one-sport and multisport) will have different types of information. For
each type, the number of facilities needed is kept, together with an approxi-
mate budget.
19. Identify all the important concepts represented in the library database case
study described below. In particular, identify the abstractions of classifica-
tion (entity types and relationship types), aggregation, identification, and
specialization/generalization. Specify (min, max) cardinality constraints
whenever possible. List details that will affect the eventual design but that
have no bearing on the conceptual design. List the semantic constraints sep-
arately. Draw an EER diagram of the library database.
Books can be checked out for 21 days. Members are allowed to have only five
books out at a time. Members usually return books within three to four
weeks. Most members know that they have one week of grace before a notice
is sent to them, so they try to return books before the grace period ends.
About 5 percent of the members have to be sent reminders to return books.
Most overdue books are returned within a month of the due date.
Approximately 5 percent of the overdue books are either kept or never
returned. The most active members of the library are defined as those who
borrow books at least ten times during the year. The top 1 percent of mem-
bership does 15 percent of the borrowing, and the top 10 percent of the
membership does 40 percent of the borrowing. About 20 percent of the
members are totally inactive in that they are members who never borrow.
20. Design a database to keep track of information for an art museum. Assume
that the following requirements were collected:
■ The museum has a collection of ART_OBJECTS. Each ART_OBJECT has a
unique Id_no, an Artist (if known), a Year (when it was created, if known),
a Title, and a Description. The art objects are categorized in several ways, as
discussed below.
■ ART_OBJECTS are categorized based on their type. There are three main
types: PAINTING, SCULPTURE, and STATUE, plus another type called
OTHER to accommodate objects that do not fall into one of the three
main types.
Draw an EER schema diagram for this application. Discuss any assumptions
you make, and justify your EER design choices.
21. Figure 12 shows an example of an EER diagram for a small private airport
database that is used to keep track of airplanes, their owners, airport
employees, and pilots. From the requirements for this database, the follow-
ing information was collected: Each AIRPLANE has a registration number
[Reg#], is of a particular plane type [OF_TYPE], and is stored in a particular
hangar [STORED_IN]. Each PLANE_TYPE has a model number [Model], a
capacity [Capacity], and a weight [Weight]. Each HANGAR has a number
[Number], a capacity [Capacity], and a location [Location]. The database also
keeps track of the OWNERs of each plane [OWNS] and the EMPLOYEEs who
have maintained the plane [MAINTAIN]. Each relationship instance in OWNS
relates an AIRPLANE to an OWNER and includes the purchase date [Pdate].
Each relationship instance in MAINTAIN relates an EMPLOYEE to a service
record [SERVICE]. Each plane undergoes service many times; hence, it is
related by [PLANE_SERVICE] to a number of SERVICE records. A SERVICE
record includes as attributes the date of maintenance [Date], the number of
hours spent on the work [Hours], and the type of work done [Work_code].
We use a weak entity type [SERVICE] to represent airplane service, because
the airplane registration number is used to identify a service record. An
OWNER is either a person or a corporation. Hence, we use a union type (cat-
egory) [OWNER] that is a subset of the union of corporation
[CORPORATION] and person [PERSON] entity types. Both pilots [PILOT]
and employees [EMPLOYEE] are subclasses of PERSON. Each PILOT has
22. Show how the UNIVERSITY EER schema in Figure 9 may be represented in
UML notation.
23. Consider the entity sets and attributes shown in the table below. Place a
checkmark in one column in each row to indicate the relationship between
the far left and right columns.
a. The left side has a relationship with the right side.
b. The right side is an attribute of the left side.
c. The left side is a specialization of the right side.
d. The left side is a generalization of the right side.
10. EMPLOYEE SSN
11. FURNITURE CHAIR
12. CHAIR Weight
13. HUMAN WOMAN
14. SOLDIER PERSON
15. ENEMY_COMBATANT PERSON
24. Draw a UML diagram for storing a played game of chess in a database. You
may look at http://www.chessgames.com for an application similar to what
you are designing. State clearly any assumptions you make in your UML dia-
gram. A sample of assumptions you can make about the scope is as follows:
1. The game of chess is played between two players.
2. The game is played on an 8 × 8 board like the one shown below:
3. The players are assigned a color of black or white at the start of the game.
4. Each player starts with the following pieces (traditionally called chess-
men):
a. king
b. queen
c. 2 rooks
d. 2 bishops
e. 2 knights
f. 8 pawns
5. Every piece has its own initial position.
6. Every piece has its own set of legal moves based on the state of the game.
You do not need to worry about which moves are or are not legal except
for the following issues:
a. A piece may move to an empty square or capture an opposing piece.
b. If a piece is captured, it is removed from the board.
c. If a pawn moves to the last row, it is “promoted” by converting it to
another piece (queen, rook, bishop, or knight).
Note: Some of these functions may be spread over multiple classes.
25. Draw an EER diagram for a game of chess as described in Exercise 24. Focus
on persistent storage aspects of the system. For example, the system would
need to retrieve all the moves of every game played in sequential order.
26. Which of the following EER diagrams is/are incorrect and why? State clearly
any assumptions you make.
a.
b.
c.
27. Consider the following EER diagram that describes the computer systems at
a company. Provide your own attributes and key for each entity type. Supply
max cardinality constraints justifying your choice. Write a complete narra-
tive description of what this EER diagram represents.
28. Consider a GRADE_BOOK database in which instructors record the points earned by
individual students in their classes. The data requirements are summarized as follows:
■ Each student is identified by a unique identifier, first and last name, and an e-mail
address.
■ Each instructor teaches certain courses each term. Each course is identified by a course
number, a section number, and the term in which it is taught.
■ Students are enrolled in each course taught by the instructor.
29. Consider an ONLINE_AUCTION database system in which members (buyers
and sellers) participate in the sale of items. The data requirements for this
system are summarized as follows:
■ A member may be a buyer or a seller. A buyer has a shipping address
recorded in the database. A seller has a bank account number and routing
number recorded in the database.
■ Buyers make bids for items they are interested in. Bid price and time of
bid are recorded. The bidder at the end of the auction with the highest bid
price is declared the winner and a transaction between buyer and seller
may then proceed.
30. Consider a database system for a baseball organization such as the major
leagues. The data requirements are summarized as follows:
■ Within the players group is a subset of players called pitchers. Pitchers
have a lifetime ERA (earned run average) associated with them.
■ Teams are uniquely identified by their names. Teams are also described by
the city in which they are located and the division and league in which
they play (such as Central division of the American League).
■ Teams have one manager, a number of coaches, and a number of players.
31. Consider the EER diagram for the UNIVERSITY database shown in Figure 9.
Enter this design using a data modeling tool such as ERwin or Rational Rose.
Make a list of the differences in notation between the diagram in the text and
the corresponding equivalent diagrammatic notation you end up using with
the tool.
32. Consider the EER diagram for the small AIRPORT database shown in Figure
12. Build this design using a data modeling tool such as ERwin or Rational
Rose. Be careful as to how you model the category OWNER in this diagram.
(Hint: Consider using CORPORATION_IS_OWNER and PERSON_IS_
OWNER as two distinct relationship types.)
A survey of semantic data modeling appears in Hull and King (1987). Eick (1991)
discusses design and transformations of conceptual schemas. Analysis of con-
straints for n-ary relationships is given in Soutou (1998). UML is described in detail
in Booch, Rumbaugh, and Jacobson (1999). Fowler and Scott (2000) and Stevens
and Pooley (2000) give concise introductions to UML concepts.
Fensel (2000, 2003) discuss the Semantic Web and application of ontologies.
Uschold and Gruninger (1996) and Gruber (1995) discuss ontologies. The June
2002 issue of Communications of the ACM is devoted to ontology concepts and
applications. Fensel (2003) is a book that discusses ontologies and e-commerce.
This chapter discusses how to design a relational database schema based on a conceptual
schema design. We focus on the logical database design or data model mapping step of
database design. We present the procedures to create a relational
schema from an Entity-Relationship (ER) or an Enhanced ER (EER) schema. Our
discussion relates the constructs of the ER and EER models to the constructs of the
relational model. Many computer-aided software engineering (CASE) tools are
based on the ER or EER models, or other similar models. Many tools use ER or EER
diagrams or variations to develop the schema graphically, and then convert it auto-
matically into a relational database schema in the DDL of a specific relational
DBMS by employing algorithms similar to the ones presented in this chapter.
We outline a seven-step algorithm in Section 1 to convert the basic ER model con-
structs—entity types (strong and weak), binary relationships (with various struc-
tural constraints), n-ary relationships, and attributes (simple, composite, and
multivalued)—into relations. Then, in Section 2, we continue the mapping algo-
rithm by describing how to map EER model constructs—specialization/generaliza-
tion and union types (categories)—into relations. Section 3 summarizes the
chapter.
From Chapter 9 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
We assume that the mapping will create tables with simple single-valued attributes.
Relational model constraints, which include primary keys, unique keys (if any), and
referential integrity constraints on the relations, will also be specified in the map-
ping results.
Step 1: Mapping of Regular Entity Types. For each regular (strong) entity type
E in the ER schema, create a relation R that includes all the simple attributes of E.
Include only the simple component attributes of a composite attribute. Choose one
of the key attributes of E as the primary key for R. If the chosen key of E is a com-
posite, then the set of simple attributes that form it will together form the primary
key of R.
If multiple keys were identified for E during the conceptual design, the information
describing the attributes that form each additional key is kept in order to specify
secondary (unique) keys of relation R. Knowledge about keys is also kept for index-
ing purposes and other types of analyses.
The relations that are created from the mapping of entity types are sometimes called
entity relations because each tuple represents an entity instance. The result after
this mapping step is shown in Figure 3(a).
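For concreteness, the following is a minimal SQL DDL sketch of step 1 for the regular entity
type EMPLOYEE of the COMPANY schema; the data types are illustrative assumptions rather than
part of the mapping algorithm, and the relationship attributes Super_ssn and Dno are deliberately
omitted here because they are added only in later steps.

    CREATE TABLE EMPLOYEE
    ( Fname    VARCHAR(15)   NOT NULL,   -- simple components of the composite attribute Name
      Minit    CHAR(1),
      Lname    VARCHAR(15)   NOT NULL,
      Ssn      CHAR(9)       NOT NULL,   -- chosen key attribute of the entity type
      Bdate    DATE,
      Address  VARCHAR(30),
      Sex      CHAR(1),
      Salary   DECIMAL(10,2),
      PRIMARY KEY (Ssn) );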
Step 2: Mapping of Weak Entity Types. For each weak entity type W in the ER
schema with owner entity type E, create a relation R and include all simple attrib-
utes (or simple components of composite attributes) of W as attributes of R. In
addition, include as foreign key attributes of R, the primary key attribute(s) of the
relation(s) that correspond to the owner entity type(s); this takes care of mapping
the identifying relationship type of W. The primary key of R is the combination of
the primary key(s) of the owner(s) and the partial key of the weak entity type W, if
any.
If there is a weak entity type E2 whose owner is also a weak entity type E1, then E1
should be mapped before E2 to determine its primary key first.
In our example, we create the relation DEPENDENT to correspond to the weak entity type
DEPENDENT, and we include the primary key Ssn of the EMPLOYEE relation as a foreign key
attribute of DEPENDENT, renamed to Essn, although this renaming is not
necessary. The primary key of the DEPENDENT relation is the combination {Essn,
Dependent_name}, because Dependent_name (also renamed from Name in Figure 1)
is the partial key of DEPENDENT.
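A possible DDL rendering of this weak entity mapping, assuming the EMPLOYEE table sketched
under step 1 and again using illustrative data types:

    CREATE TABLE DEPENDENT
    ( Essn            CHAR(9)      NOT NULL,   -- primary key of the owner, included as a foreign key
      Dependent_name  VARCHAR(15)  NOT NULL,   -- partial key of the weak entity type
      Sex             CHAR(1),
      Bdate           DATE,
      Relationship    VARCHAR(8),
      PRIMARY KEY (Essn, Dependent_name),
      FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn) );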
Step 3: Mapping of Binary 1:1 Relationship Types. For each binary 1:1 rela-
tionship type R in the ER schema, identify the relations S and T that correspond to
the entity types participating in R. There are three possible approaches: (1) the for-
eign key approach, (2) the merged relationship approach, and (3) the cross-
reference or relationship relation approach. The first approach is the most useful
and should be followed unless special conditions exist, as we discuss below.
1. Foreign key approach: Choose one of the relations—S, say—and include as
a foreign key in S the primary key of T. It is better to choose an entity type
with total participation in R in the role of S. Include all the simple attributes
(or simple components of composite attributes) of the 1:1 relationship type
R as attributes of S.
2. Merged relation approach: An alternative mapping of a 1:1 relationship
type is to merge the two entity types and the relationship into a single rela-
tion. This is possible when both participations are total, as this would indicate
that the two tables will have the exact same number of tuples at all times.
3. Cross-reference or relationship relation approach: The third option is to
set up a third relation R for the purpose of cross-referencing the primary
keys of the two relations S and T representing the entity types. As we will see,
this approach is required for binary M:N relationships. The relation R is
called a relationship relation (or sometimes a lookup table), because each tuple in R represents a relationship instance that relates one tuple from S with one tuple from T.
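As an illustration of the foreign key approach (approach 1), the 1:1 MANAGES relationship
between DEPARTMENT and EMPLOYEE in the COMPANY schema could be mapped as sketched below,
with DEPARTMENT in the role of S because its participation in MANAGES is total; the data
types and the UNIQUE constraints are assumptions made for this sketch.

    CREATE TABLE DEPARTMENT
    ( Dname           VARCHAR(15)  NOT NULL,
      Dnumber         INT          NOT NULL,
      Mgr_ssn         CHAR(9)      NOT NULL,   -- foreign key implementing the 1:1 MANAGES relationship
      Mgr_start_date  DATE,                    -- simple attribute of the relationship type
      PRIMARY KEY (Dnumber),
      UNIQUE (Dname),
      UNIQUE (Mgr_ssn),                        -- prevents one employee from managing two departments
      FOREIGN KEY (Mgr_ssn) REFERENCES EMPLOYEE(Ssn) );

The NOT NULL on Mgr_ssn reflects the total participation of DEPARTMENT in MANAGES.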
Step 4: Mapping of Binary 1:N Relationship Types. For each regular binary
1:N relationship type R, identify the relation S that represents the participating entity
type at the N-side of the relationship type. Include as foreign key in S the primary key
of the relation T that represents the other entity type participating in R; we do this
because each entity instance on the N-side is related to at most one entity instance on
the 1-side of the relationship type. Include any simple attributes (or simple compo-
nents of composite attributes) of the 1:N relationship type as attributes of S.
An alternative approach is to use the relationship relation (cross-reference) option
as in the third option for binary 1:1 relationships. We create a separate relation R
whose attributes are the primary keys of S and T, which will also be foreign keys to
S and T. The primary key of R is the same as the primary key of S. This option can
be used if few tuples in S participate in the relationship to avoid excessive NULL val-
ues in the foreign key.
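Continuing the sketch, the 1:N relationship types WORKS_FOR (between DEPARTMENT and
EMPLOYEE), SUPERVISION (recursive on EMPLOYEE), and CONTROLS (between DEPARTMENT and
PROJECT) could be mapped as follows; the data types are again assumptions, and ALTER TABLE
is used only to avoid the circular reference between the EMPLOYEE and DEPARTMENT tables
created in the earlier sketches.

    ALTER TABLE EMPLOYEE ADD Dno INT;             -- N-side of WORKS_FOR receives the foreign key
    ALTER TABLE EMPLOYEE ADD Super_ssn CHAR(9);   -- N-side of the recursive SUPERVISION relationship
    ALTER TABLE EMPLOYEE ADD FOREIGN KEY (Dno) REFERENCES DEPARTMENT(Dnumber);
    ALTER TABLE EMPLOYEE ADD FOREIGN KEY (Super_ssn) REFERENCES EMPLOYEE(Ssn);

    CREATE TABLE PROJECT
    ( Pname      VARCHAR(15)  NOT NULL,
      Pnumber    INT          NOT NULL,
      Plocation  VARCHAR(15),
      Dnum       INT          NOT NULL,          -- foreign key for the 1:N CONTROLS relationship
      PRIMARY KEY (Pnumber),
      UNIQUE (Pname),
      FOREIGN KEY (Dnum) REFERENCES DEPARTMENT(Dnumber) );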
Step 5: Mapping of Binary M:N Relationship Types. For each binary M:N
relationship type R, create a new relation S to represent R. Include as foreign key
attributes in S the primary keys of the relations that represent the participating
entity types; their combination will form the primary key of S. Also include any sim-
ple attributes of the M:N relationship type (or simple components of composite
attributes) as attributes of S. Notice that we cannot represent an M:N relationship
type by a single foreign key attribute in one of the participating relations (as we did
for 1:1 or 1:N relationship types) because of the M:N cardinality ratio; we must cre-
ate a separate relationship relation S.
Notice that we can always map 1:1 or 1:N relationships in a manner similar to M:N
relationships by using the cross-reference (relationship relation) approach, as we
discussed earlier. This alternative is particularly useful when few relationship
instances exist, in order to avoid NULL values in foreign keys. In this case, the pri-
mary key of the relationship relation will be only one of the foreign keys that refer-
ence the participating entity relations. For a 1:N relationship, the primary key of the
relationship relation will be the foreign key that references the entity relation on the
N-side. For a 1:1 relationship, either foreign key can be used as the primary key of
the relationship relation.
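For the M:N relationship WORKS_ON between EMPLOYEE and PROJECT, the relationship relation
of step 5 could be declared as follows, using the tables from the earlier sketches and
assumed data types:

    CREATE TABLE WORKS_ON
    ( Essn   CHAR(9)       NOT NULL,
      Pno    INT           NOT NULL,
      Hours  DECIMAL(4,1),                       -- simple attribute of the relationship type
      PRIMARY KEY (Essn, Pno),                   -- combination of the two foreign keys
      FOREIGN KEY (Essn) REFERENCES EMPLOYEE(Ssn),
      FOREIGN KEY (Pno)  REFERENCES PROJECT(Pnumber) );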
Step 6: Mapping of Multivalued Attributes. For each multivalued attribute A,
create a new relation R. This relation R will include an attribute corresponding to A,
plus the primary key attribute K—as a foreign key in R—of the relation that repre-
sents the entity type or relationship type that has A as a multivalued attribute. The
primary key of R is the combination of A and K. If the multivalued attribute is com-
posite, we include its simple components.
In our example, we create a relation DEPT_LOCATIONS (see Figure 3(d)). The
attribute Dlocation represents the multivalued attribute LOCATIONS of
DEPARTMENT, while Dnumber—as foreign key—represents the primary key of the
DEPARTMENT relation. The primary key of DEPT_LOCATIONS is the combination
of {Dnumber, Dlocation}. A separate tuple will exist in DEPT_LOCATIONS for each
location that a department has.
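A DDL sketch of this step, assuming the DEPARTMENT table from the earlier sketches and an
illustrative data type for Dlocation:

    CREATE TABLE DEPT_LOCATIONS
    ( Dnumber    INT          NOT NULL,
      Dlocation  VARCHAR(15)  NOT NULL,
      PRIMARY KEY (Dnumber, Dlocation),          -- one tuple per (department, location) pair
      FOREIGN KEY (Dnumber) REFERENCES DEPARTMENT(Dnumber) );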
Step 7: Mapping of N-ary Relationship Types. For each n-ary relationship
type R, where n > 2, create a new relation S to represent R. Include as foreign key
attributes in S the primary keys of the relations that represent the participating
entity types. Also include any simple attributes of the n-ary relationship type (or
simple components of composite attributes) as attributes of S. The primary key of S
is usually a combination of all the foreign keys that reference the relations repre-
senting the participating entity types. However, if the cardinality constraint on any
entity type E participating in R is 1, then the primary key of S should not
include the foreign key attribute that references the relation E′ corresponding to E.
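For example, the ternary SUPPLY relationship of Figure A.1(a) could be mapped as sketched
below. The SUPPLIER, PART, and PROJECT relations of that figure (keyed on Sname, Part_no,
and Proj_name) are assumed to exist and are separate from the COMPANY relations used
earlier; the data types are assumptions.

    CREATE TABLE SUPPLY
    ( Sname      VARCHAR(15)  NOT NULL,
      Part_no    INT          NOT NULL,
      Proj_name  VARCHAR(15)  NOT NULL,
      Quantity   INT,                            -- simple attribute of the relationship type
      PRIMARY KEY (Sname, Part_no, Proj_name),
      FOREIGN KEY (Sname)     REFERENCES SUPPLIER(Sname),
      FOREIGN KEY (Part_no)   REFERENCES PART(Part_no),
      FOREIGN KEY (Proj_name) REFERENCES PROJECT(Proj_name) );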
One of the main points to note in a relational schema, in contrast to an ER schema,
is that relationship types are not represented explicitly; instead, they are represented
by having two attributes A and B, one a primary key and the other a foreign key
(over the same domain) included in two relations S and T. Two tuples in S and T are
related when they have the same value for A and B. By using the EQUIJOIN opera-
tion (or NATURAL JOIN if the two join attributes have the same name) over S.A and
T.B, we can combine all pairs of related tuples from S and T and materialize the
relationship. When a binary 1:1 or 1:N relationship type is involved, a single join
operation is usually needed. For a binary M:N relationship type, two join operations
are needed, whereas for n-ary relationship types, n joins are needed to fully materi-
alize the relationship instances.
For example, to form a relation that includes the employee name, project name, and
hours that the employee works on each project, we need to connect each EMPLOYEE
tuple to the related PROJECT tuples via the WORKS_ON relation in Figure 2. Hence,
we must apply the EQUIJOIN operation to the EMPLOYEE and WORKS_ON relations
with the join condition Ssn = Essn, and then apply another EQUIJOIN operation to
the resulting relation and the PROJECT relation with join condition Pno = Pnumber.
In general, when multiple relationships need to be traversed, numerous join opera-
tions must be specified. A relational database user must always be aware of the for-
eign key attributes in order to use them correctly in combining related tuples from
two or more relations. This is sometimes considered to be a drawback of the rela-
tional data model, because the foreign key/primary key correspondences are not
always obvious upon inspection of relational schemas. If an EQUIJOIN is performed
among attributes of two relations that do not represent a foreign key/primary key
relationship, the result can often be meaningless and may lead to spurious data. For
example, the reader can try joining the PROJECT and DEPT_LOCATIONS relations
on the condition Dlocation = Plocation and examine the result.
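In SQL, the two EQUIJOIN operations just described correspond to a query along the following
lines against the COMPANY relations; the attribute choices in the SELECT clause are
illustrative.

    SELECT E.Lname, P.Pname, W.Hours
    FROM   EMPLOYEE E, WORKS_ON W, PROJECT P
    WHERE  E.Ssn = W.Essn        -- join EMPLOYEE to WORKS_ON
      AND  W.Pno = P.Pnumber;    -- join the result to PROJECT

Replacing these join conditions with an arbitrary comparison such as Dlocation = Plocation
would still execute, but, as noted above, the result would not represent a meaningful
relationship.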
In the relational schema we create a separate relation for each multivalued attribute.
For a particular entity with a set of values for the multivalued attribute, the key
attribute value of the entity is repeated once for each value of the multivalued
attribute in a separate tuple because the basic relational model does not allow mul-
tiple values (a list, or a set of values) for an attribute in a single tuple. For example,
because department 5 has three locations, three tuples exist in the
DEPT_LOCATIONS relation in Figure A.2; each tuple specifies one of the locations.
In our example, we apply EQUIJOIN to DEPT_LOCATIONS and DEPARTMENT on the
Dnumber attribute to get the values of all locations along with other DEPARTMENT
attributes. In the resulting relation, the values of the other DEPARTMENT attributes
are repeated in separate tuples for every location that a department has.
Next, we discuss the mapping of EER model constructs to relations by extending the
ER-to-relational mapping algorithm that was presented in Section 1.1.
Step 8: Options for Mapping Specialization or Generalization. Convert
each specialization with m subclasses {S1, S2, …, Sm} and (generalized) superclass C,
where the attributes of C are {k, a1, …, an} and k is the (primary) key, into relation
schemas using one of the following options:
■ Option 8A: Multiple relations—superclass and subclasses. Create a rela-
tion L for C with attributes Attrs(L) = {k, a1, …, an} and PK(L) = k. Create a
relation Li for each subclass Si, 1 ≤ i ≤ m, with the attributes Attrs(Li) = {k} ∪
{attributes of Si} and PK(Li) = k. This option works for any specialization
(total or partial, disjoint or overlapping).
■ Option 8B: Multiple relations—subclass relations only. Create a relation
Li for each subclass Si, 1 ≤ i ≤ m, with the attributes Attrs(Li) = {attributes of
Si} ∪ {k, a1, …, an} and PK(Li) = k. This option only works for a specialization
whose subclasses are total (every entity in the superclass must belong to (at
least) one of the subclasses). Additionally, it is only recommended if the spe-
cialization has the disjointness constraint. If the specialization is overlapping,
the same entity may be duplicated in several relations.
■ Option 8C: Single relation with one type attribute. Create a single relation
L with attributes Attrs(L) = {k, a1, …, an} ∪ {attributes of S1} ∪ … ∪ {attributes
of Sm} ∪ {t} and PK(L) = k. The attribute t is called a type (or image or
discriminating) attribute whose value indicates the subclass to which each tuple
belongs, if any. This option works for a specialization whose subclasses are
disjoint, and it has the potential for generating many NULL values if the
subclasses have many specific attributes.
■ Option 8D: Single relation with multiple type attributes. Create a single
relation schema L with attributes Attrs(L) = {k, a1, …, an} ∪ {attributes of S1}
∪ … ∪ {attributes of Sm} ∪ {t1, t2, …, tm} and PK(L) = k. Each ti, 1 ≤ i ≤ m, is
a Boolean type attribute indicating whether a tuple belongs to subclass Si.
This option is used for a specialization whose subclasses are overlapping (but
will also work for a disjoint specialization).
For option 8A, notice that the constraint that every key value appearing in a subclass
relation Li also appears in the superclass relation L must hold for each Li. This specifies
a foreign key from each Li to L, as well as an inclusion dependency Li.k < L.k.
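To make option 8A concrete, the specialization of EMPLOYEE in Figure A.3 could be declared
roughly as follows; this is a sketch of a schema separate from the COMPANY tables above,
and the data types are assumptions. The FOREIGN KEY clauses implement the inclusion
dependency Li.k < L.k just mentioned.

    CREATE TABLE EMPLOYEE
    ( Ssn         CHAR(9)      NOT NULL,
      Fname       VARCHAR(15),
      Minit       CHAR(1),
      Lname       VARCHAR(15),
      Birth_date  DATE,
      Address     VARCHAR(30),
      Job_type    VARCHAR(12),
      PRIMARY KEY (Ssn) );

    CREATE TABLE SECRETARY
    ( Ssn           CHAR(9)  NOT NULL,
      Typing_speed  INT,
      PRIMARY KEY (Ssn),
      FOREIGN KEY (Ssn) REFERENCES EMPLOYEE(Ssn) );

    CREATE TABLE TECHNICIAN
    ( Ssn     CHAR(9)  NOT NULL,
      Tgrade  INT,
      PRIMARY KEY (Ssn),
      FOREIGN KEY (Ssn) REFERENCES EMPLOYEE(Ssn) );

    CREATE TABLE ENGINEER
    ( Ssn       CHAR(9)      NOT NULL,
      Eng_type  VARCHAR(12),
      PRIMARY KEY (Ssn),
      FOREIGN KEY (Ssn) REFERENCES EMPLOYEE(Ssn) );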
Figure 5
Options for mapping specialization
or generalization. (a) Mapping the
EER schema in Figure A.3 using
option 8A. (b) Mapping the EER
schema in Figure A.4(b) using
option 8B. (c) Mapping the EER
schema in Figure A.3 using option
8C. (d) Mapping Figure A.5 using
option 8D with Boolean type fields
Mflag and Pflag.
In option 8B, the EQUIJOIN operation between each subclass and the superclass is
built into the schema and the relation L is done away with, as illustrated in Figure
5(b) for the EER specialization in Figure A.4(b). This option works well only when
both the disjoint and total constraints hold. If the specialization is not total, an
entity that does not belong to any of the subclasses Si is lost. If the specialization is
not disjoint, an entity belonging to more than one subclass will have its inherited
attributes from the superclass C stored redundantly in more than one Li. With
option 8B, no relation holds all the entities in the superclass C; consequently, we
must apply an OUTER UNION (or FULL OUTER JOIN) operation to the Li relations to
retrieve all the entities in C. The result of the outer union will be similar to the rela-
tions under options 8C and 8D except that the type fields will be missing. Whenever
we search for an arbitrary entity in C, we must search all the m relations Li.
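Under option 8B, the outer union that reassembles all VEHICLE entities from the Figure 5(b)
mapping can be approximated in SQL by padding the non-shared attributes with NULLs; a
sketch, with assumed column types (the CASTs merely give the padded NULLs a type):

    SELECT Vehicle_id, Price, License_plate_no,
           Max_speed, No_of_passengers,
           CAST(NULL AS INT) AS No_of_axles,
           CAST(NULL AS DECIMAL(8,2)) AS Tonnage
    FROM   CAR
    UNION ALL
    SELECT Vehicle_id, Price, License_plate_no,
           CAST(NULL AS INT), CAST(NULL AS INT),
           No_of_axles, Tonnage
    FROM   TRUCK;

UNION ALL suffices here because the CAR/TRUCK specialization is disjoint, so no duplicate
elimination is needed.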
Options 8C and 8D create a single relation to represent the superclass C and all its
subclasses. An entity that does not belong to some of the subclasses will have NULL
values for the specific attributes of these subclasses. These options are not recom-
mended if many specific attributes are defined for the subclasses. If few specific sub-
class attributes exist, however, these mappings are preferable to options 8A and 8B
because they do away with the need to specify EQUIJOIN and OUTER UNION opera-
tions; therefore, they can yield a more efficient implementation.
Option 8C is used to handle disjoint subclasses by including a single type (or image
or discriminating) attribute t to indicate to which of the m subclasses each tuple
belongs; hence, the domain of t could be {1, 2, ..., m}. If the specialization is partial,
t can have NULL values in tuples that do not belong to any subclass. If the specializa-
tion is attribute-defined, that attribute serves the purpose of t and t is not needed;
this option is illustrated in Figure 5(c) for the EER specialization in Figure A.3.
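A corresponding single-table sketch for option 8C, using Job_type itself as the
discriminating attribute as described above; the table name and data types here are chosen
only for illustration.

    CREATE TABLE EMPLOYEE_8C                     -- table name chosen for illustration only
    ( Ssn           CHAR(9)      NOT NULL,
      Fname         VARCHAR(15),
      Minit         CHAR(1),
      Lname         VARCHAR(15),
      Birth_date    DATE,
      Address       VARCHAR(30),
      Job_type      VARCHAR(12),                 -- discriminating (type) attribute
      Typing_speed  INT,                         -- SECRETARY-specific attribute, NULL otherwise
      Tgrade        INT,                         -- TECHNICIAN-specific attribute, NULL otherwise
      Eng_type      VARCHAR(12),                 -- ENGINEER-specific attribute, NULL otherwise
      PRIMARY KEY (Ssn) );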
Option 8D is designed to handle overlapping subclasses by including m Boolean
type (or flag) fields, one for each subclass. It can also be used for disjoint subclasses.
Each type field ti can have a domain {yes, no}, where a value of yes indicates that the
tuple is a member of subclass Si. If we use this option for the EER specialization in
Figure A.3, we would include three type attributes—Is_a_secretary, Is_a_engineer,
and Is_a_technician—instead of the Job_type attribute in Figure 5(c). Notice that it is
also possible to create a single type attribute of m bits instead of the m type fields.
Figure 5(d) shows the mapping of the specialization from Figure A.5 using option
8D.
When we have a multilevel specialization (or generalization) hierarchy or lattice, we
do not have to follow the same mapping option for all the specializations. Instead,
we can use one mapping option for part of the hierarchy or lattice and other options
for other parts. Figure 6 shows one possible mapping into relations for the EER
lattice in Figure A.6. Here we used option 8A for PERSON/{EMPLOYEE, ALUMNUS,
STUDENT}, option 8C for EMPLOYEE/{STAFF, FACULTY, STUDENT_ASSISTANT} by
including the type attribute Employee_type, and option 8D for
STUDENT_ASSISTANT/{RESEARCH_ASSISTANT, TEACHING_ASSISTANT} by
including the type attributes Ta_flag and Ra_flag in EMPLOYEE,
STUDENT/STUDENT_ASSISTANT by including the type attribute Student_assist_flag in
STUDENT, and STUDENT/{GRADUATE_STUDENT, UNDERGRADUATE_STUDENT}
by including the type attributes Grad_flag and Undergrad_flag in STUDENT. In Figure
6, all attributes whose names end with type or flag are type fields.

Figure 6
Mapping the EER specialization lattice using multiple options:
PERSON(Ssn, Name, Birth_date, Sex, Address)
EMPLOYEE(Ssn, Salary, Employee_type, Position, Rank, Percent_time, Ra_flag, Ta_flag, Project, Course)
ALUMNUS(Ssn)
ALUMNUS_DEGREES(Ssn, Year, Degree, Major)
STUDENT(Ssn, Major_dept, Grad_flag, Undergrad_flag, Degree_program, Class, Student_assist_flag)
2.2 Mapping of Shared Subclasses (Multiple Inheritance)
A shared subclass, such as ENGINEERING_MANAGER in Figure A.6, is a subclass of
several superclasses, indicating multiple inheritance. These classes must all have the
same key attribute; otherwise, the shared subclass would be modeled as a category
(union type). We can apply any of the options discussed in step 8 to a shared sub-
class, subject to the restrictions discussed in step 8 of the mapping algorithm. In
Figure 6, options 8C and 8D are used for the shared subclass STUDENT_ASSISTANT.
Option 8C is used in the EMPLOYEE relation (Employee_type attribute) and option
8D is used in the STUDENT relation (Student_assist_flag attribute).
2.3 Mapping of Categories (Union Types)
We add another step to the mapping procedure—step 9—to handle categories. A
category (or union type) is a subclass of the union of two or more superclasses that
can have different keys because they can be of different entity types. An example is
the OWNER category shown in Figure A.7, which is a subset of the union of three
entity types PERSON, BANK, and COMPANY. The other category in that figure,
REGISTERED_VEHICLE, has two superclasses that have the same key attribute.
Figure 7
Mapping the EER categories (union types) in Figure A.7 to relations:
PERSON(Ssn, Driver_license_no, Name, Address, Owner_id)
BANK(Bname, Baddress, Owner_id)
COMPANY(Cname, Caddress, Owner_id)
OWNER(Owner_id)
REGISTERED_VEHICLE(Vehicle_id, License_plate_number)
CAR(Vehicle_id, Cstyle, Cmake, Cmodel, Cyear)
TRUCK(Vehicle_id, Tmake, Tmodel, Tonnage, Tyear)
OWNS(Owner_id, Vehicle_id, Purchase_date, Lien_or_regular)
Step 9: Mapping of Union Types (Categories). For mapping a category whose
defining superclasses have different keys, it is customary to specify a new key attrib-
ute, called a surrogate key, when creating a relation to correspond to the category.
The keys of the defining classes are different, so we cannot use any one of them
exclusively to identify all entities in the category. In our example in Figure A.7, we
create a relation OWNER to correspond to the OWNER category, as illustrated in
Figure 7, and include any attributes of the category in this relation. The primary key
of the OWNER relation is the surrogate key, which we called Owner_id. We also
include the surrogate key attribute Owner_id as foreign key in each relation corre-
sponding to a superclass of the category, to specify the correspondence in values
between the surrogate key and the key of each superclass. Notice that if a particular
PERSON (or BANK or COMPANY) entity is not a member of OWNER, it would have
a NULL value for its Owner_id attribute in its corresponding tuple in the PERSON (or
BANK or COMPANY) relation, and it would not have a tuple in the OWNER relation.
It is also recommended to add a type attribute (not shown in Figure 7) to the
OWNER relation to indicate the particular entity type to which each tuple belongs
(PERSON or BANK or COMPANY).
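In DDL terms, this surrogate-key mapping could be sketched as follows; the PERSON, BANK,
and COMPANY relations of Figure 7 are assumed to exist already, and the data type of
Owner_id is an assumption.

    CREATE TABLE OWNER
    ( Owner_id  INT  NOT NULL,                   -- surrogate key for the category
      PRIMARY KEY (Owner_id) );

    ALTER TABLE PERSON  ADD Owner_id INT;        -- remains NULL when the entity is not an OWNER
    ALTER TABLE BANK    ADD Owner_id INT;
    ALTER TABLE COMPANY ADD Owner_id INT;
    ALTER TABLE PERSON  ADD FOREIGN KEY (Owner_id) REFERENCES OWNER(Owner_id);
    ALTER TABLE BANK    ADD FOREIGN KEY (Owner_id) REFERENCES OWNER(Owner_id);
    ALTER TABLE COMPANY ADD FOREIGN KEY (Owner_id) REFERENCES OWNER(Owner_id);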
For a category whose superclasses have the same key, such as REGISTERED_VEHICLE in Figure A.7,
there is no need for a surrogate key. The mapping of the REGISTERED_VEHICLE
category, which illustrates this case, is also shown in Figure 7.
3 Summary
In Section 1, we showed how a conceptual schema design in the ER model can be
mapped to a relational database schema. An algorithm for ER-to-relational map-
ping was given and illustrated by examples from the COMPANY database. Table 1
summarized the correspondences between the ER and relational model constructs
and constraints. Next, we added additional steps to the algorithm in Section 2 for
mapping the constructs from the EER model into the relational model. Similar
algorithms are incorporated into graphical database design tools to create a rela-
tional schema from a conceptual schema design automatically.
Review Questions
1. Discuss the correspondences between the ER model constructs and the rela-
tional model constructs. Show how each ER model construct can be mapped
to the relational model and discuss any alternative mappings.
2. Discuss the options for mapping EER model constructs to relations.
Exercises
3. Try to map the relational schema in Figure 14 of the chapter “The Relational
Algebra and Relational Calculus” into an ER schema. This is part of a process
known as reverse engineering, where a conceptual schema is created for an
existing implemented database. State any assumptions you make.
4. Figure 8 shows an ER schema for a database that can be used to keep track of
transport ships and their locations for maritime authorities. Map this
schema into a relational schema and specify all primary keys and foreign
keys.
5. Map the BANK ER schema of Exercise 23 from the chapter “Data Modeling
Using the Entity-Relationship (ER) Model” (shown in Figure 21 in the same
chapter) into a relational schema. Specify all primary keys and foreign keys.
Also, from the same chapter, repeat for the AIRLINE schema (Figure 20) of
Exercise 19 and for the other schemas for Exercises 16 through 24.
6. Map the EER diagrams in Figures 9 and 12 from the chapter “The Enhanced
Entity-Relationship (EER) Model” into relational schemas. Justify your
choice of mapping options.
7. Is it possible to successfully map a binary M:N relationship type without
requiring a new relation? Why or why not?
8. Consider the EER diagram in Figure 9 for a car dealer.
Map the EER schema into a set of relations. For the VEHICLE to
CAR/TRUCK/SUV generalization, consider the four options presented in
Section 2.1 and show the relational schema design under each of those
options.
9. Using the attributes you provided for the EER diagram in Exercise 27 from
the chapter “The Enhanced Entity-Relationship (EER) Model,” map the
complete schema into a set of relations. Choose an appropriate option out of
8A through 8D from Section 2.1 in doing the mapping of generalizations and
defend your choice.
Figure 8
An ER schema for a SHIP_TRACKING database. Entity and relationship types shown include
SHIP, SHIP_TYPE, TYPE, SHIP_MOVEMENT, HISTORY, PORT, HOME_PORT, PORT_VISIT,
SHIP_AT_PORT, STATE/COUNTRY, SEA/OCEAN/LAKE, IN, and ON, with attributes such as Sname,
Owner, Date, Tonnage, Hull, Name, Start_date, End_date, Time_stamp, Longitude, Latitude,
Time, Pname, and Continent, together with the cardinality ratios and (min, max)
participation constraints on the relationships.
Laboratory Exercises
10. Consider the ER design for the UNIVERSITY database that was modeled
using a tool like ERwin or Rational Rose in Laboratory Exercise 31 from the
chapter “Data Modeling Using the Entity-Relationship (ER) Model.” Using
the SQL schema generation feature of the modeling tool, generate the SQL
schema for an Oracle database.
11. Consider the ER design for the MAIL_ORDER database that was modeled
using a tool like ERwin or Rational Rose in Laboratory Exercise 32 from the
chapter “Data Modeling Using the Entity-Relationship (ER) Model.” Using
the SQL schema generation feature of the modeling tool, generate the SQL
schema for an Oracle database.
12. Consider the ER design for the CONFERENCE_REVIEW database that was
modeled using a tool like ERwin or Rational Rose in Laboratory Exercise 34
from the chapter “Data Modeling Using the Entity-Relationship (ER)
Model.” Using the SQL schema generation feature of the modeling tool, gen-
erate the SQL schema for an Oracle database.
13. Consider the EER design for the GRADE_BOOK database that was modeled
using a tool like ERwin or Rational Rose in Laboratory Exercise 28 from the
chapter “The Enhanced Entity-Relationship (EER) Model.” Using the SQL
schema generation feature of the modeling tool, generate the SQL schema
for an Oracle database.
14. Consider the EER design for the ONLINE_AUCTION database that was mod-
eled using a tool like ERwin or Rational Rose in Laboratory Exercise 29 from
the chapter “The Enhanced Entity-Relationship (EER) Model.” Using the
SQL schema generation feature of the modeling tool, generate the SQL
schema for an Oracle database.
Figure 9
EER diagram for a car dealer. VEHICLE has a disjoint specialization into CAR, TRUCK, and
SUV; SALESPERSON and CUSTOMER are connected to VEHICLE through the SALE relationship.
Attributes shown include Vin, Model, Price, Engine_size, Tonnage, No_seats, Date, Sid,
Ssn, Name, and the composite Address (Street, City, State).
Selected Bibliography
The original ER-to-relational mapping algorithm was described in Chen’s classic
paper (Chen 1976) that presented the original ER model. Batini et al. (1992) discuss
a variety of mapping algorithms from ER and EER models to legacy models and
vice versa.
Figure A.1
Ternary relationship types. (a) The SUPPLY relationship among SUPPLIER (Sname),
PROJECT (Proj_name), and PART (Part_no), with the attribute Quantity. (b) Three binary
M:N relationships—CAN_SUPPLY (SUPPLIER–PART), USES (PROJECT–PART), and SUPPLIES
(SUPPLIER–PROJECT)—that are not equivalent to SUPPLY. (c) SUPPLY represented as a weak
entity type identified through its relationships to SUPPLIER, PART, and PROJECT.
Figure A.2
One possible database state for the COMPANY relational database schema. The state
populates the relations EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary,
Super_ssn, Dno), DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date),
DEPT_LOCATIONS(Dnumber, Dlocation), PROJECT(Pname, Pnumber, Plocation, Dnum),
WORKS_ON(Essn, Pno, Hours), and DEPENDENT(Essn, Dependent_name, Sex, Bdate,
Relationship) with sample tuples.
Figure A.3
EER diagram notation for an attribute-defined specialization on Job_type. EMPLOYEE
(Name(Fname, Minit, Lname), Ssn, Birth_date, Address, Job_type) is specialized (disjoint)
into SECRETARY (Typing_speed), TECHNICIAN (Tgrade), and ENGINEER (Eng_type), with the
defining attribute Job_type taking the values ‘Secretary’, ‘Technician’, and ‘Engineer’.
Figure A.4
Generalization. (a) Two entity types, CAR (Vehicle_id, Price, License_plate_no, Max_speed,
No_of_passengers) and TRUCK (Vehicle_id, Price, License_plate_no, No_of_axles, Tonnage).
(b) Generalizing CAR and TRUCK into the superclass VEHICLE (Vehicle_id, Price,
License_plate_no), with CAR (Max_speed, No_of_passengers) and TRUCK (No_of_axles,
Tonnage) as a disjoint specialization.
Figure A.5
EER diagram notation for an overlapping (nondisjoint) specialization. PART (Part_no,
Description) is specialized (overlapping) into MANUFACTURED_PART (Manufacture_date,
Batch_no, Drawing_no) and PURCHASED_PART (Supplier_name, List_price).
Figure A.6
A specialization lattice with shared subclass ENGINEERING_MANAGER. EMPLOYEE has a
disjoint specialization into SECRETARY, TECHNICIAN, ENGINEER, and MANAGER and a disjoint
specialization into HOURLY_EMPLOYEE and SALARIED_EMPLOYEE; ENGINEERING_MANAGER is a
shared subclass with multiple superclasses in the lattice.
Figure A.7
Two categories (union types): OWNER and REGISTERED_VEHICLE. OWNER is a subset of the
union of PERSON (Ssn, Driver_license_no, Name, Address), BANK (Bname, Baddress), and
COMPANY (Cname, Caddress); REGISTERED_VEHICLE (License_plate_no) is a subset of the
union of CAR (Vehicle_id, Cstyle, Cmake, Cmodel, Cyear) and TRUCK (Vehicle_id, Tmake,
Tmodel, Tonnage, Tyear). OWNER and REGISTERED_VEHICLE are related through the M:N
relationship OWNS, which has the attributes Purchase_date and Lien_or_regular.
Practical Database Design Methodology and Use of UML Diagrams
In this chapter we examine some of the practical aspects of database design.
The overall database design activity has to undergo a systematic process called the
design methodology, whether the target database is managed by a relational data-
base management system (RDBMS), an object database management system
(ODBMS), an object-relational database management system (ORDBMS), or some
other type of database management system. Various design methodologies are pro-
vided in the database design tools currently supplied by vendors. Popular tools
include Oracle Designer and related products in Oracle Developer Suite by Oracle,
ERwin and related products by CA, PowerBuilder and PowerDesigner by Sybase,
and ER/Studio and related products by Embarcadero Technologies, among many
others. Our goal in this chapter is to discuss not one specific methodology but
rather database design in a broader context, as it is undertaken in large organiza-
tions for the design and implementation of applications catering to hundreds or
thousands of users.
From Chapter 10 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
Generally, the design of small databases with perhaps up to 20 users need not be
very complicated. But for medium-sized or large databases that serve several diverse
application groups, each with dozens or hundreds of users, a systematic approach to
the overall database design activity becomes necessary. The sheer size of a populated
database does not reflect the complexity of the design; it is the database schema that
is the more important focus of database design. Any database with a schema that
includes more than 20 entity types and a similar number of relationship types
requires a careful design methodology.
Using the term large database for databases with several dozen gigabytes of data
and a schema with more than 30 or 40 distinct entity types, we can cover a wide
array of databases used in government, industry, and financial and commercial
institutions. Service sector industries, including banking, hotels, airlines, insurance,
utilities, and communications, use databases for their day-to-day operations 24
hours a day, 7 days a week—known in the industry as 24 by 7 operations.
Application systems for these databases are called transaction processing systems due
to the large transaction volumes and rates that are required. In this chapter we will
concentrate on the database design for such medium- and large-scale databases
where transaction processing dominates.
This chapter has a variety of objectives. Section 1 discusses the information system
life cycle within organizations with a particular emphasis on the database system.
Section 2 highlights the phases of a database design methodology within the organi-
zational context. Section 3 introduces some types of UML diagrams and gives
details on the notations that are particularly helpful in collecting requirements and
performing conceptual and logical design of databases. An illustrative partial exam-
ple of designing a university database is presented. Section 4 introduces the popular
software development tool called Rational Rose, which uses UML diagrams as its
main specification technique. Features of Rational Rose specific to database require-
ments modeling and schema design are highlighted. Section 5 briefly discusses
automated database design tools. Section 6 summarizes the chapter.
1 The Role of Information Systems
in Organizations
1.1 The Organizational Context
for Using Database Systems
Database systems have become a part of the information systems of many organiza-
tions. Historically, information systems were dominated by file systems in the 1960s,
but since the early 1970s organizations have gradually moved to database manage-
ment systems (DBMSs). To accommodate DBMSs, many organizations have created
the position of database administrator (DBA) and database administration depart-
ments to oversee and control database life-cycle activities. Similarly, information
technology (IT) and information resource management (IRM) departments have
been recognized by large organizations as being key to successful business manage-
ment for the following reasons:
■ Data is regarded as a corporate resource, and its management and control is
considered central to the effective working of the organization.
■ More functions in organizations are computerized, increasing the need to
keep large volumes of data available in an up-to-the-minute current state.
■ As the complexity of the data and applications grows, complex relationships
among the data need to be modeled and maintained.
■ There is a tendency toward consolidation of information resources in many
organizations.
■ Many organizations are reducing their personnel costs by letting end users
perform business transactions. This is evident with travel services, financial
services, higher education, government, and many other types of services.
This trend was realized early on by online retail goods outlets and customer-
to-business electronic commerce, such as Amazon.com and eBay. In these
organizations, a publicly accessible and updatable operational database must
be designed and made available for the customer transactions.
Many capabilities provided by database systems have made them integral compo-
nents in computer-based information systems. The following are some of the key
features that they offer:
■ Integrating data across multiple applications into a single database.
■ Support for developing new applications in a short time by using high-level
languages like SQL.
■ Providing support for casual access for browsing and querying by managers
while supporting major production-level transaction processing for cus-
tomers.
From the early 1970s through the mid-1980s, the move was toward creating large
centralized repositories of data managed by a single centralized DBMS. Since then,
the trend has been toward utilizing distributed systems because of the following
developments:
1. Personal computers and database system-like software products such as
Excel, Visual FoxPro, Access (Microsoft), and SQL Anywhere (Sybase), and
public domain products such as MySQL and PostgreSQL, are being heavily
utilized by users who previously belonged to the category of casual and occa-
sional database users. Many administrators, secretaries, engineers, scientists,
architects, and students belong to this category. As a result, the practice of
creating personal databases is gaining popularity. It is sometimes possible to
check out a copy of part of a large database from a mainframe computer or a
database server, work on it from a personal workstation, and then restore it
on the mainframe. Similarly, users can design and create their own databases
and then merge them into a larger one.
2. The advent of distributed and client-server DBMSs is opening up the option
of distributing the database over multiple computer systems for better local
control and faster local processing. At the same time, local users can access
remote data using the facilities provided by the DBMS as a client, or through
the Web. Application development tools such as PowerBuilder and
PowerDesigner (Sybase) and OracleDesigner and Oracle Developer Suite
(Oracle) are being used with built-in facilities to link applications to multi-
ple back-end database servers.
3. Many organizations now use data dictionary systems or information
repositories, which are mini DBMSs that manage meta-data—that is, data
that describes the database structure, constraints, applications, authoriza-
tions, users, and so on. These are often used as an integral tool for informa-
tion resource management. A useful data dictionary system should store and
manage the following types of information:
a. Descriptions of the schemas of the database system.
b. Detailed information on physical database design, such as storage struc-
tures, access paths, and file and record sizes.
c. Descriptions of the types of database users, their responsibilities, and
their access rights.
d. High-level descriptions of the database transactions and applications and
of the relationships of users to transactions.
e. The relationship between database transactions and the data items refer-
enced by them. This is useful in determining which transactions are
affected when certain data definitions are changed.
f. Usage statistics such as frequencies of queries and transactions and access
counts to different portions of the database.
g. The history of any changes made to the database and applications, and
documentation that describes the reasons for these changes. This is some-
times referred to as data provenance.
This meta-data is available to DBAs, designers, and authorized users as online sys-
tem documentation. This improves the control of DBAs over the information sys-
tem as well as the users’ understanding and use of the system. The advent of data
warehousing technology has highlighted the importance of meta-data.
When designing high-performance transaction processing systems, which require
around-the-clock nonstop operation, performance becomes critical. These data-
bases are often accessed by hundreds, or thousands, of transactions per minute from
remote computers and local terminals. Transaction performance, in terms of the
average number of transactions per minute and the average and maximum transac-
tion response time, is critical. A careful physical database design that meets the
organization’s transaction processing needs is a must in such systems.
Some organizations have committed their information resource management to
certain DBMS and data dictionary products. Their investment in the design and
implementation of large and complex systems makes it difficult for them to change
to newer DBMS products, which means that the organizations become locked in to
their current DBMS system. With regard to such large and complex databases, we
cannot overemphasize the importance of a careful design that takes into account the
need for possible system modifications—called tuning—to respond to changing
requirements. The cost can be very high if a large and complex system cannot
evolve, and it becomes necessary to migrate to other DBMS products and redesign
the whole system.
1.2 The Information System Life Cycle
In a large organization, the database system is typically part of an information sys-
tem (IS), which includes all resources that are involved in the collection, manage-
ment, use, and dissemination of the information resources of the organization. In a
computerized environment, these resources include the data itself, the DBMS soft-
ware, the computer system hardware and storage media, the personnel who use and
manage the data (DBA, end users, and so on), the application programs (software)
that access and update the data, and the application programmers who develop
these applications. Thus the database system is part of a much larger organizational
information system.
In this section we examine the typical life cycle of an information system and how
the database system fits into this life cycle. The information system life cycle has
been called the macro life cycle, whereas the database system life cycle has been
referred to as the micro life cycle. The distinction between them is becoming less
pronounced for information systems where databases are a major integral compo-
nent. The macro life cycle typically includes the following phases:
1. Feasibility analysis. This phase is concerned with analyzing potential appli-
cation areas, identifying the economics of information gathering and dis-
semination, performing preliminary cost-benefit studies, determining the
complexity of data and processes, and setting up priorities among applica-
tions.
2. Requirements collection and analysis. Detailed requirements are collected
by interacting with potential users and user groups to identify their particu-
lar problems and needs. Interapplication dependencies, communication,
and reporting procedures are identified.
3. Design. This phase has two aspects: the design of the database system and
the design of the application systems (programs) that use and process the
database through retrievals and updates.
4. Implementation. The information system is implemented, the database is
loaded, and the database transactions are implemented and tested.
5. Validation and acceptance testing. The acceptability of the system in meet-
ing users’ requirements and performance criteria is validated. The system is
tested against performance criteria and behavior specifications.
6. Deployment, operation, and maintenance. This may be preceded by con-
version of users from an older system as well as by user training. The opera-
tional phase starts when all system functions are operational and have been
validated. As new requirements or applications crop up, they pass through
the previous phases until they are validated and incorporated into the sys-
tem. Monitoring of system performance and system maintenance are impor-
tant activities during the operational phase.
1.3 The Database Application System Life Cycle
Activities related to the micro life cycle, which focuses on the database application
system, include the following:
1. System definition. The scope of the database system, its users, and its
applications are defined. The interfaces for various categories of users, the
response time constraints, and storage and processing needs are identified.
2. Database design. A complete logical and physical design of the database
system on the chosen DBMS is prepared.
3. Database implementation. This comprises the process of specifying the
conceptual, external, and internal database definitions, creating the (empty)
database files, and implementing the software applications.
4. Loading or data conversion. The database is populated either by loading
the data directly or by converting existing files into the database system for-
mat.
5. Application conversion. Any software applications from a previous system
are converted to the new system.
6. Testing and validation. The new system is tested and validated. Testing and
validation of application programs can be a very involved process, and the
techniques that are employed are usually covered in software engineering
courses. There are automated tools that assist in this process, but a discus-
sion is outside the scope of this text.
7. Operation. The database system and its applications are put into opera-
tion. Usually, the old and the new systems are operated in parallel for a
period of time.
8. Monitoring and maintenance. During the operational phase, the system is
constantly monitored and maintained. Growth and expansion can occur in
both data content and software applications. Major modifications and reor-
ganizations may be needed from time to time.
Activities 2, 3, and 4 are part of the design and implementation phases of the larger
information system macro life cycle. Our emphasis in Section 2 is on activities 2 and
3, which cover the database design and implementation phases. Most databases
in organizations undergo all of the preceding life cycle activities. The conversion
activities (4 and 5) are not applicable when both the database and the applications
are new. When an organization moves from an established system to a new one,
activities 4 and 5 tend to be very time-consuming and the effort to accomplish them
is often underestimated. In general, there is often feedback among the various steps
because new requirements frequently arise at every stage. Figure 1 shows the feed-
back loop affecting the conceptual and logical design phases as a result of system
implementation and tuning.
2 The Database Design
and Implementation Process
Now, we focus on activities 2 and 3 of the database application system life cycle,
which are database design and implementation. The problem of database design
can be stated as follows:
Design the logical and physical structure of one or more databases to accommodate the
information needs of the users in an organization for a defined set of applications.
Figure 1
Phases of database design and implementation for large databases. Two parallel activities
proceed through six phases. On the data side, Phase 1, requirements collection and
analysis, produces the data requirements (data content, structure, and constraints);
Phase 2, conceptual database design, produces a DBMS-independent conceptual schema
design; Phase 3 is the choice of DBMS; Phase 4, data model mapping (logical design),
produces the DBMS-dependent logical schema and view design; Phase 5, physical design,
produces the DBMS-dependent internal schema design; and Phase 6, system implementation
and tuning, produces the DDL and SDL statements. On the application side, the processing
requirements of the database applications lead to DBMS-independent transaction and
application design and then to transaction and application implementation, with
frequencies and performance constraints feeding into physical design and tuning.
The goals of database design are multiple:
■ Satisfy the information content requirements of the specified users and
applications.
■ Provide a natural and easy-to-understand structuring of the information.
■ Support processing requirements and any performance objectives, such as
response time, processing time, and storage space.
These goals are very hard to accomplish and measure and they involve an inherent
tradeoff: if one attempts to achieve more naturalness and understandability of the
model, it may be at the cost of performance. The problem is aggravated because the
database design process often begins with informal and incomplete requirements.
In contrast, the result of the design activity is a rigidly defined database schema that
cannot easily be modified once the database is implemented. We can identify six
main phases of the overall database design and implementation process:
1. Requirements collection and analysis
2. Conceptual database design
3. Choice of a DBMS
4. Data model mapping (also called logical database design)
5. Physical database design
6. Database system implementation and tuning
The design process consists of two parallel activities, as illustrated in Figure 1. The
first activity involves the design of the data content, structure, and constraints of
the database; the second relates to the design of database applications. To keep the
figure simple, we have avoided showing most of the interactions between these
sides, but the two activities are closely intertwined. For example, by analyzing data-
base applications, we can identify data items that will be stored in the database. In
addition, the physical database design phase, during which we choose the storage
structures and access paths of database files, depends on the applications that will
use these files for querying and updating. On the other hand, we usually specify the
design of database applications by referring to the database schema constructs,
which are specified during the first activity. Clearly, these two activities strongly
influence one another. Traditionally, database design methodologies have primarily
focused on the first of these activities whereas software design has focused on the
second; this may be called data-driven versus process-driven design. It now is rec-
ognized by database designers and software engineers that the two activities should
proceed hand-in-hand, and design tools are increasingly combining them.
The six phases mentioned previously do not typically progress strictly in sequence.
In many cases we may have to modify the design from an earlier phase during a later
phase. These feedback loops among phases—and also within phases—are com-
mon. We show only a couple of feedback loops in Figure 1, but many more exist
between various phases. We have also shown some interaction between the data and
the process sides of the figure; many more interactions exist in reality. Phase 1 in
Figure 1 involves collecting information about the intended use of the database, and
Phase 6 concerns database implementation and redesign. The heart of the database
design process comprises Phases 2, 4, and 5; we briefly summarize these phases:
■ Conceptual database design (Phase 2). The goal of this phase is to produce
a conceptual schema for the database that is independent of a specific
DBMS. We often use a high-level data model such as the ER or EER model
(Entity-Relationship or Enhanced Entity-Relationship) during this phase.
Additionally, we specify as many of the known database applications or
transactions as possible, using a notation that is independent of any specific
DBMS. Often, the DBMS choice is already made for the organization; the
intent of conceptual design is still to keep it as free as possible from imple-
mentation considerations.
■ Data model mapping (Phase 4). During this phase, which is also called
logical database design, we map (or transform) the conceptual schema
from the high-level data model used in Phase 2 into the data model of the
chosen DBMS. We can start this phase after choosing a specific type of
DBMS—for example, if we decide to use some relational DBMS but have not
yet decided on which particular one. We call the latter system-independent
(but data model-dependent) logical design. In terms of three-level DBMS
architecture, the result of this phase is a conceptual schema in the chosen data
model. In addition, the design of external schemas (views) for specific appli-
cations is often done during this phase.
■ Physical database design (Phase 5). During this phase, we design the spec-
ifications for the stored database in terms of physical file storage structures,
record placement, and indexes. This corresponds to designing the internal
schema in the terminology of the three-level DBMS architecture.
■ Database system implementation and tuning (Phase 6). During this
phase, the database and application programs are implemented, tested, and
eventually deployed for service. Various transactions and applications are
tested individually and then in conjunction with each other. This typically
reveals opportunities for physical design changes, data indexing, reorganiza-
tion, and different placement of data—an activity referred to as database
tuning. Tuning is an ongoing activity—a part of system maintenance that
continues for the life cycle of a database as long as the database and applica-
tions keep evolving and performance problems are detected.
We discuss each of the six phases of database design in more detail in the following
subsections.
2.1 Phase 1: Requirements Collection and Analysis1
Before we can effectively design a database, we must know and analyze the expecta-
tions of the users and the intended uses of the database in as much detail as possi-
ble. This process is called requirements collection and analysis. To specify the
requirements, we first identify the other parts of the information system that will
1A part of this section has been contributed by Colin Potts.
interact with the database system. These include new and existing users and applica-
tions, whose requirements are then collected and analyzed. Typically, the following
activities are part of this phase:
1. The major application areas and user groups that will use the database or
whose work will be affected by it are identified. Key individuals and commit-
tees within each group are chosen to carry out subsequent steps of require-
ments collection and specification.
2. Existing documentation concerning the applications is studied and ana-
lyzed. Other documentation—policy manuals, forms, reports, and organiza-
tion charts—is reviewed to determine whether it has any influence on the
requirements collection and specification process.
3. The current operating environment and planned use of the information is
studied. This includes analysis of the types of transactions and their frequen-
cies as well as of the flow of information within the system. Geographic
characteristics regarding users, origin of transactions, destination of reports,
and so on are studied. The input and output data for the transactions are
specified.
4. Written responses to sets of questions are sometimes collected from the
potential database users or user groups. These questions involve the users’
priorities and the importance they place on various applications. Key indi-
viduals may be interviewed to help in assessing the worth of information
and in setting up priorities.
Requirement analysis is carried out for the final users, or customers, of the database
system by a team of system analysts or requirement experts. The initial require-
ments are likely to be informal, incomplete, inconsistent, and partially incorrect.
Therefore, much work needs to be done to transform these early requirements into
a specification of the application that can be used by developers and testers as the
starting point for writing the implementation and test cases. Because the require-
ments reflect the initial understanding of a system that does not yet exist, they will
inevitably change. Therefore, it is important to use techniques that help customers
converge quickly on the implementation requirements.
There is evidence that customer participation in the development process increases
customer satisfaction with the delivered system. For this reason, many practitioners
use meetings and workshops involving all stakeholders. One such methodology of
refining initial system requirements is called Joint Application Design (JAD). More
recently, techniques have been developed, such as Contextual Design, which involve
the designers becoming immersed in the workplace in which the application is to be
used. To help customer representatives better understand the proposed system, it is
common to walk through workflow or transaction scenarios or to create a mock-up
rapid prototype of the application.
The preceding modes help structure and refine requirements but leave them still in
an informal state. To transform requirements into a better-structured representa-
tion, requirements specification techniques are used. These include object-
oriented analysis (OOA), data flow diagrams (DFDs), and the refinement of appli-
cation goals. These methods use diagramming techniques for organizing and pre-
senting information-processing requirements. Additional documentation in the
form of text, tables, charts, and decision requirements usually accompanies the dia-
grams. There are techniques that produce a formal specification that can be checked
mathematically for consistency and what-if symbolic analyses. These methods may
become standard in the future for those parts of information systems that serve
mission-critical functions and which therefore must work as planned. The model-
based formal specification methods, of which the Z-notation and methodology is a
prominent example, can be thought of as extensions of the ER model and are there-
fore the most applicable to information system design.
Some computer-aided techniques—called Upper CASE tools—have been proposed
to help check the consistency and completeness of specifications, which are usually
stored in a single repository and can be displayed and updated as the design pro-
gresses. Other tools are used to trace the links between requirements and other
design entities, such as code modules and test cases. Such traceability databases are
especially important in conjunction with enforced change-management procedures
for systems where the requirements change frequently. They are also used in con-
tractual projects where the development organization must provide documentary
evidence to the customer that all the requirements have been implemented.
The requirements collection and analysis phase can be quite time-consuming, but it
is crucial to the success of the information system. Correcting a requirements error
is more expensive than correcting an error made during implementation because
the effects of a requirements error are usually pervasive, and much more down-
stream work has to be reimplemented as a result. Not correcting a significant error
means that the system will not satisfy the customer and may not even be used at all.
Requirements gathering and analysis is the subject of entire books.
2.2 Phase 2: Conceptual Database Design
The second phase of database design involves two parallel activities.2 The first activ-
ity, conceptual schema design, examines the data requirements resulting from
Phase 1 and produces a conceptual database schema. The second activity,
transaction and application design, examines the database applications analyzed
in Phase 1 and produces high-level specifications for these applications.
Phase 2a: Conceptual Schema Design. The conceptual schema produced by
this phase is usually contained in a DBMS-independent high-level data model for
the following reasons:
1. The goal of conceptual schema design is a complete understanding of the
database structure, meaning (semantics), interrelationships, and constraints.
2This phase of design is discussed in great detail in the first seven chapters of Batini et al. (1992); we
summarize that discussion here.
This is best achieved independently of a specific DBMS because each DBMS
typically has idiosyncrasies and restrictions that should not be allowed to
influence the conceptual schema design.
2. The conceptual schema is invaluable as a stable description of the database
contents. The choice of DBMS and later design decisions may change with-
out changing the DBMS-independent conceptual schema.
3. A good understanding of the conceptual schema is crucial for database users
and application designers. Use of a high-level data model that is more
expressive and general than the data models of individual DBMSs is there-
fore quite important.
4. The diagrammatic description of the conceptual schema can serve as a vehi-
cle of communication among database users, designers, and analysts.
Because high-level data models usually rely on concepts that are easier to
understand than lower-level DBMS-specific data models, or syntactic defini-
tions of data, any communication concerning the schema design becomes
more exact and more straightforward.
In this phase of database design, it is important to use a conceptual high-level data
model with the following characteristics:
1. Expressiveness. The data model should be expressive enough to distin-
guish different types of data, relationships, and constraints.
2. Simplicity and understandability. The model should be simple enough for
typical nonspecialist users to understand and use its concepts.
3. Minimality. The model should have a small number of basic concepts that
are distinct and nonoverlapping in meaning.
4. Diagrammatic representation. The model should have a diagrammatic
notation for displaying a conceptual schema that is easy to interpret.
5. Formality. A conceptual schema expressed in the data model must repre-
sent a formal unambiguous specification of the data. Hence, the model con-
cepts must be defined accurately and unambiguously.
Some of these requirements—the first one in particular—sometimes conflict with
the other requirements. Many high-level conceptual models have been proposed for
database design. In the following discussion, we will use the terminology of the
Enhanced Entity-Relationship (EER) model and we will assume that it is being used
in this phase. Conceptual schema design, including data modeling, is becoming an
integral part of object-oriented analysis and design methodologies. The UML has
class diagrams that are largely based on extensions of the EER model.
Approaches to Conceptual Schema Design. For conceptual schema design, we must
identify the basic components (or constructs) of the schema: the entity types, rela-
tionship types, and attributes. We should also specify key attributes, cardinality and
participation constraints on relationships, weak entity types, and specialization/ gen-
eralization hierarchies/lattices. There are two approaches to designing the
conceptual schema, which is derived from the requirements collected during Phase 1.
The first approach is the centralized (or one shot) schema design approach, in
which the requirements of the different applications and user groups from Phase 1
are merged into a single set of requirements before schema design begins. A single
schema corresponding to the merged set of requirements is then designed. When
many users and applications exist, merging all the requirements can be an arduous
and time-consuming task. The assumption is that a centralized authority, the DBA,
is responsible for deciding how to merge the requirements and for designing the
conceptual schema for the whole database. Once the conceptual schema is designed
and finalized, external schemas for the various user groups and applications can be
specified by the DBA.
The second approach is the view integration approach, in which the requirements
are not merged. Rather a schema (or view) is designed for each user group or appli-
cation based only on its own requirements. Thus we develop one high-level schema
(view) for each such user group or application. During a subsequent view integra-
tion phase, these schemas are merged or integrated into a global conceptual
schema for the entire database. The individual views can be reconstructed as exter-
nal schemas after view integration.
The main difference between the two approaches lies in the manner and stage in
which multiple views or requirements of the many users and applications are recon-
ciled and merged. In the centralized approach, the reconciliation is done manually by
the DBA staff prior to designing any schemas and is applied directly to the require-
ments collected in Phase 1. This places the burden to reconcile the differences and
conflicts among user groups on the DBA staff. The problem has been typically dealt
with by using external consultants/design experts, who apply their specific methods
for resolving these conflicts. Because of the difficulties of managing this task, the
view integration approach has been proposed as an alternative technique.
In the view integration approach, each user group or application actually designs its
own conceptual (EER) schema from its requirements, with assistance from the DBA
staff. Then an integration process is applied to these schemas (views) by the DBA to
form the global integrated schema. Although view integration can be done manu-
ally, its application to a large database involving dozens of user groups requires a
methodology and the use of automated tools. The correspondences among the
attributes, entity types, and relationship types in various views must be specified
before the integration can be applied. Additionally, problems such as integrating
conflicting views and verifying the consistency of the specified interschema corre-
spondences must be dealt with.
Strategies for Schema Design. Given a set of requirements, whether for a single user
or for a large user community, we must create a conceptual schema that satisfies
these requirements. There are various strategies for designing such a schema. Most
strategies follow an incremental approach—that is, they start with some important
schema constructs derived from the requirements and then they incrementally mod-
ify, refine, and build on them. We now discuss some of these strategies:
1. Top-down strategy. We start with a schema containing high-level abstrac-
tions and then apply successive top-down refinements. For example, we may
specify only a few high-level entity types and then, as we specify their attrib-
utes, split them into lower-level entity types and specify the relationships.
The process of specialization to refine an entity type into subclasses is
another activity during a top-down design strategy.
2. Bottom-up strategy. Start with a schema containing basic abstractions and
then combine or add to these abstractions. For example, we may start with
the database attributes and group these into entity types and relationships.
We may add new relationships among entity types as the design progresses.
The process of generalizing entity types into higher-level generalized super-
classes is another activity during a bottom-up design strategy.
3. Inside-out strategy. This is a special case of a bottom-up strategy, where
attention is focused on a central set of concepts that are most evident.
Modeling then spreads outward by considering new concepts in the vicinity
of existing ones. We could specify a few clearly evident entity types in the
schema and continue by adding other entity types and relationships that are
related to each.
4. Mixed strategy. Instead of following any particular strategy throughout the
design, the requirements are partitioned according to a top-down strategy,
and part of the schema is designed for each partition according to a bottom-
up strategy. The various schema parts are then combined.
Figures 2 and 3 illustrate some simple examples of top-down and bottom-up refine-
ment, respectively. An example of a top-down refinement primitive is decomposi-
tion of an entity type into several entity types. Figure 2(a) shows a COURSE being
refined into COURSE and SEMINAR, and the TEACHES relationship is correspond-
ingly split into TEACHES and OFFERS. Figure 2(b) shows a COURSE_OFFERING
entity type being refined into two entity types (COURSE and INSTRUCTOR) and a
relationship between them. Refinement typically forces a designer to ask more ques-
tions and extract more constraints and details: for example, the (min, max) cardi-
nality ratios between COURSE and INSTRUCTOR are obtained during refinement.
Figure 3(a) shows the bottom-up refinement primitive of generating relationships
among the entity types FACULTY and STUDENT. Two relationships are identified:
ADVISES and COMMITTEE_CHAIR_OF. The bottom-up refinement using catego-
rization (union type) is illustrated in Figure 3(b), where the new concept of
VEHICLE_OWNER is discovered from the existing entity types FACULTY, STAFF, and
STUDENT.
Schema (View) Integration. For large databases with many expected users and appli-
cations, the view integration approach of designing individual schemas and then
merging them can be used. Because the individual views can be kept relatively small,
design of the schemas is simplified. However, a methodology for integrating the
views into a global database schema is needed. Schema integration can be divided
into the following subtasks:
[Figure 2: Examples of top-down refinement. (a) Generating a new entity type. (b) Decomposing an entity type into two entity types and a relationship type.]
1. Identifying correspondences and conflicts among the schemas. Because
the schemas are designed individually, it is necessary to specify constructs in
the schemas that represent the same real-world concept. These correspon-
dences must be identified before integration can proceed. During this
process, several types of conflicts among the schemas may be discovered:
a. Naming conflicts. These are of two types: synonyms and homonyms. A
synonym occurs when two schemas use different names to describe the
same concept; for example, an entity type CUSTOMER in one schema may
describe the same concept as an entity type CLIENT in another schema. A
homonym occurs when two schemas use the same name to describe dif-
ferent concepts; for example, an entity type PART may represent computer
parts in one schema and furniture parts in another schema.
b. Type conflicts. The same concept may be represented in two schemas
by different modeling constructs. For example, the concept of a
DEPARTMENT may be an entity type in one schema and an attribute in another.
[Figure 3: Examples of bottom-up refinement. (a) Discovering and adding new relationships. (b) Discovering a new category (union type) and relating it.]
c. Domain (value set) conflicts. An attribute may have different domains
in two schemas. For example, Ssn may be declared as an integer in one
schema and as a character string in the other. A conflict of the unit of
measure could occur if one schema represented Weight in pounds and the
other used kilograms.
d. Conflicts among constraints. Two schemas may impose different con-
straints; for example, the key of an entity type may be different in each
schema. Another example involves different structural constraints on
a relationship such as TEACHES; one schema may represent it as 1:N (a
course has one instructor), while the other schema represents it as M:N (a
course may have more than one instructor).
2. Modifying views to conform to one another. Some schemas are modified
so that they conform to other schemas more closely. Some of the conflicts
identified in the first subtask are resolved during this step.
3. Merging of views. The global schema is created by merging the individual
schemas. Corresponding concepts are represented only once in the global
schema, and mappings between the views and the global schema are speci-
fied. This is the most difficult step to achieve in real-life databases involving
dozens or hundreds of entities and relationships. It involves a considerable
amount of human intervention and negotiation to resolve conflicts and to
settle on the most reasonable and acceptable solutions for a global schema.
4. Restructuring. As a final optional step, the global schema may be analyzed
and restructured to remove any redundancies or unnecessary complexity.
Some of these ideas are illustrated by the rather simple example presented in Figures
4 and 5. In Figure 4, two views are merged to create a bibliographic database. During
identification of correspondences between the two views, we discover that
RESEARCHER and AUTHOR are synonyms (as far as this database is concerned), as
are CONTRIBUTED_BY and WRITTEN_BY. Further, we decide to modify VIEW 1 to
include a SUBJECT for ARTICLE, as shown in Figure 4, to conform to VIEW 2. Figure
5 shows the result of merging MODIFIED VIEW 1 with VIEW 2. We generalize the
entity types ARTICLE and BOOK into the entity type PUBLICATION, with their com-
mon attribute Title. The relationships CONTRIBUTED_BY and WRITTEN_BY are
merged, as are the entity types RESEARCHER and AUTHOR. The attribute Publisher
applies only to the entity type BOOK, whereas the attribute Size and the relationship
type PUBLISHED_IN apply only to ARTICLE.
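To make the outcome of the merge more concrete, the following is a minimal relational sketch of part of the integrated schema of Figure 5. The table and column declarations are assumptions made only for this sketch (the AUTHOR and SUBJECT portions are omitted); they are not DDL prescribed by the text.

-- Hypothetical relational rendering of part of the integrated schema in Figure 5.
CREATE TABLE JOURNAL (
    Jid     INT PRIMARY KEY,
    Jname   VARCHAR(100),
    Volume  INT,
    Number  INT
);

CREATE TABLE PUBLICATION (              -- generalizes BOOK and ARTICLE
    Pub_id  INT PRIMARY KEY,
    Title   VARCHAR(200) NOT NULL
);

CREATE TABLE BOOK (
    Pub_id    INT PRIMARY KEY REFERENCES PUBLICATION(Pub_id),
    Publisher VARCHAR(100)              -- Publisher applies only to BOOK
);

CREATE TABLE ARTICLE (
    Pub_id  INT PRIMARY KEY REFERENCES PUBLICATION(Pub_id),
    Size    INT,                        -- Size applies only to ARTICLE
    Jid     INT REFERENCES JOURNAL(Jid) -- PUBLISHED_IN applies only to ARTICLE
);

-- An individual view can then be reconstructed as an external schema over the
-- global schema, for example the ARTICLE portion of the original View 1:
CREATE VIEW ARTICLE_VIEW AS
    SELECT P.Pub_id, P.Title, A.Size, A.Jid
    FROM PUBLICATION AS P JOIN ARTICLE AS A ON P.Pub_id = A.Pub_id;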
This simple example illustrates the complexity of the merging process and how the
meaning of the various concepts must be accounted for in simplifying the resultant
schema design. For real-life designs, the process of schema integration requires a
more disciplined and systematic approach. Several strategies have been proposed
for the view integration process (see Figure 6):
1. Binary ladder integration. Two schemas that are quite similar are integrated
first. The resulting schema is then integrated with another schema, and the
process is repeated until all schemas are integrated. The ordering of schemas
for integration can be based on some measure of schema similarity. This strat-
egy is suitable for manual integration because of its step-by-step approach.
2. N-ary integration. All the views are integrated in one procedure after an
analysis and specification of their correspondences. This strategy requires
computerized tools for large design problems. Such tools have been built as
research prototypes but are not yet commercially available.
3. Binary balanced strategy. Pairs of schemas are integrated first, then the
resulting schemas are paired for further integration; this procedure is
repeated until a final global schema results.
4. Mixed strategy. Initially, the schemas are partitioned into groups based on
their similarity, and each group is integrated separately. The intermediate
schemas are grouped again and integrated, and so on.
[Figure 4: Modifying views to conform before integration (View 1, View 2, and Modified View 1 of the bibliographic database).]
[Figure 5: Integrated schema after merging views 1 and 2.]
[Figure 6: Different strategies for the view integration process: binary ladder integration, N-ary integration, binary balanced integration, and mixed integration.]
Phase 2b: Transaction Design. The purpose of Phase 2b, which proceeds in
parallel with Phase 2a, is to design the characteristics of known database transac-
tions (applications) in a DBMS-independent way. When a database system is being
designed, the designers are aware of many known applications (or transactions)
that will run on the database once it is implemented. An important part of database
design is to specify the functional characteristics of these transactions early on in
the design process. This ensures that the database schema will include all the infor-
mation required by these transactions. In addition, knowing the relative importance
of the various transactions and the expected rates of their invocation plays a crucial
part during the physical database design (Phase 5). Usually, not all of the database
transactions are known at design time; after the database system is implemented,
new transactions are continuously identified and implemented. However, the most
important transactions are often known in advance of system implementation and
should be specified at an early stage. The informal 80–20 rule typically applies in this
context: 80 percent of the workload is represented by 20 percent of the most fre-
quently used transactions, which govern the physical database design. In applica-
tions that are of the ad hoc querying or batch processing variety, queries and
applications that process a substantial amount of data must be identified.
A common technique for specifying transactions at a conceptual level is to identify
their input/output and functional behavior. By specifying the input and output
parameters (arguments) and the internal functional flow of control, designers can
specify a transaction in a conceptual and system-independent way. Transactions
usually can be grouped into three categories: (1) retrieval transactions, which are
used to retrieve data for display on a screen or for printing of a report; (2) update
transactions, which are used to enter new data or to modify existing data in the
database; and (3) mixed transactions, which are used for more complex applica-
tions that do some retrieval and some update. For example, consider an airline
reservations database. A retrieval transaction could first list all morning flights on
a given date between two cities. An update transaction could be to book a seat on a
particular flight. A mixed transaction may first display some data, such as showing a
customer reservation on some flight, and then update the database, such as cancel-
ing the reservation by deleting it, or by adding a flight segment to an existing reser-
vation. Transactions (applications) may originate in a front-end tool such as
PowerBuilder (Sybase), which collects parameters online and then sends a transaction
to the DBMS, which acts as a back end.3
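As a rough illustration of the first two categories, the following statements sketch a retrieval and an update transaction for the airline example. The FLIGHT and RESERVATION tables, their columns, and the host variables (prefixed with a colon) are assumptions made only for this sketch.

-- Retrieval transaction: list all morning flights on a given date between two cities.
SELECT Flight_no, Dep_time, Arr_time
FROM   FLIGHT
WHERE  Dep_city = :from_city AND Arr_city = :to_city
       AND Dep_date = :travel_date AND Dep_time < '12:00';

-- Update transaction: book a seat on a particular flight.
INSERT INTO RESERVATION (Flight_no, Dep_date, Seat_no, Customer_id)
VALUES (:flight_no, :travel_date, :seat_no, :customer_id);

A mixed transaction would combine statements of both kinds, for example retrieving a customer's reservation and then deleting it to cancel the booking.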
Several techniques for requirements specification include notation for specifying
processes, which in this context are more complex operations that can consist of
several transactions. Process modeling tools like BPwin, as well as workflow modeling
tools, are becoming popular for identifying information flows in organizations. The
UML language, which provides for data modeling via class and object diagrams, has
a variety of process modeling diagrams including state transition diagrams, activity
diagrams, sequence diagrams, and collaboration diagrams. All of these refer to
3This philosophy has been followed for over 20 years in popular products like CICS, which serves as a
tool to generate transactions for legacy DBMSs like IMS.
activities, events, and operations within the information system, the inputs and out-
puts of the processes, the sequencing or synchronization requirements, and other
conditions. It is possible to refine these specifications and extract individual trans-
actions from them. Other proposals for specifying transactions include TAXIS,
GALILEO, and GORDAS (see this chapter’s Selected Bibliography). Some of these
have been implemented into prototype systems and tools. Process modeling still
remains an active area of research.
Transaction design is just as important as schema design, but it is often considered
to be part of software engineering rather than database design. Many current design
methodologies emphasize one over the other. One should go through Phases 2a and
2b in parallel, using feedback loops for refinement, until a stable design of schema
and transactions is reached.4
2.3 Phase 3: Choice of a DBMS
The choice of a DBMS is governed by a number of factors—some technical, others
economic, and still others concerned with the politics of the organization. The tech-
nical factors focus on the suitability of the DBMS for the task at hand. Issues to con-
sider are the type of DBMS (relational, object-relational, object, other), the storage
structures and access paths that the DBMS supports, the user and programmer
interfaces available, the types of high-level query languages, the availability of devel-
opment tools, the ability to interface with other DBMSs via standard interfaces, the
architectural options related to client-server operation, and so on. Nontechnical
factors include the financial status and the support organization of the vendor. In
this section we concentrate on discussing the economic and organizational factors
that affect the choice of DBMS. The following costs must be considered:
1. Software acquisition cost. This is the up-front cost of buying the software,
including programming language options, different interface options
(forms, menu, and Web-based graphic user interface (GUI) tools), recov-
ery/backup options, special access methods, and documentation. The cor-
rect DBMS version for a specific operating system must be selected.
Typically, the development tools, design tools, and additional language sup-
port are not included in basic pricing.
2. Maintenance cost. This is the recurring cost of receiving standard mainte-
nance service from the vendor and for keeping the DBMS version up-to-
date.
3. Hardware acquisition cost. New hardware may be needed, such as addi-
tional memory, terminals, disk drives and controllers, or specialized DBMS
storage and archival storage.
4. Database creation and conversion cost. This is the cost of either creating
the database system from scratch or converting an existing system to the new
4High-level transaction modeling is covered in Batini et al. (1992, Chapters 8, 9, and 11). The joint func-
tional and data analysis philosophy is advocated throughout that book.
DBMS software. In the latter case it is customary to operate the existing sys-
tem in parallel with the new system until all the new applications are fully
implemented and tested. This cost is hard to project and is often underesti-
mated.
5. Personnel cost. Acquisition of DBMS software for the first time by an
organization is often accompanied by a reorganization of the data processing
department. Positions of DBA and staff exist in most companies that have
adopted DBMSs.
6. Training cost. Because DBMSs are often complex systems, personnel must
often be trained to use and program the DBMS. Training is required at all
levels, including programming and application development, physical
design, and database administration.
7. Operating cost. The cost of continued operation of the database system is
typically not worked into an evaluation of alternatives because it is incurred
regardless of the DBMS selected.
The benefits of acquiring a DBMS are not so easy to measure and quantify. A DBMS
has several intangible advantages over traditional file systems, such as ease of use,
consolidation of company-wide information, wider availability of data, and faster
access to information. With Web-based access, certain parts of the data can be made
globally accessible to employees as well as external users. More tangible benefits
include reduced application development cost, reduced redundancy of data, and
better control and security. Although databases have been firmly entrenched in
most organizations, the decision of whether to move an application from a file-
based to a database-centered approach still comes up. This move is generally driven
by the following factors:
1. Data complexity. As data relationships become more complex, the need for
a DBMS is greater.
2. Sharing among applications. The need for a DBMS is greater when appli-
cations share common data stored redundantly in multiple files.
3. Dynamically evolving or growing data. If the data changes constantly, it is
easier to cope with these changes using a DBMS than using a file system.
4. Frequency of ad hoc requests for data. File systems are not at all suitable
for ad hoc retrieval of data.
5. Data volume and need for control. The sheer volume of data and the need
to control it sometimes demands a DBMS.
It is difficult to develop a generic set of guidelines for adopting a single approach to
data management within an organization—whether relational, object-oriented, or
object-relational. If the data to be stored in the database has a high level of complex-
ity and deals with multiple data types, the typical approach may be to consider an
object or object-relational DBMS. Also, the benefits of inheritance among classes
and the corresponding advantage of reuse favor these approaches. Finally, several
economic and organizational factors affect the choice of one DBMS over another:
1. Organization-wide adoption of a certain philosophy. This is often a dom-
inant factor affecting the acceptability of a certain data model (for example,
relational versus object), a certain vendor, or a certain development method-
ology and tools (for example, use of an object-oriented analysis and design
tool and methodology may be required of all new applications).
2. Familiarity of personnel with the system. If the programming staff within
the organization is familiar with a particular DBMS, it may be favored to
reduce training cost and learning time.
3. Availability of vendor services. The availability of vendor assistance in
solving problems with the system is important, since moving from a non-
DBMS to a DBMS environment is generally a major undertaking and
requires much vendor assistance at the start.
Another factor to consider is the DBMS portability among different types of hard-
ware. Many commercial DBMSs now have versions that run on many
hardware/software configurations (or platforms). The need of applications for
backup, recovery, performance, integrity, and security must also be considered.
Many DBMSs are currently being designed as total solutions to the information-
processing and information resource management needs within organizations.
Most DBMS vendors are combining their products with the following options or
built-in features:
■ Text editors and browsers
■ Report generators and listing utilities
■ Communication software (often called teleprocessing monitors)
■ Data entry and display features such as forms, screens, and menus with auto-
matic editing features
■ Inquiry and access tools that can be used on the World Wide Web (Web-
enabling tools)
■ Graphical database design tools
A large amount of third-party software is available that provides added functional-
ity to a DBMS in each of the above areas. In rare cases it may be preferable to
develop in-house software rather than use a DBMS—for example, if the applica-
tions are very well defined and are all known beforehand. Under such circum-
stances, an in-house custom-designed system may be appropriate to implement the
known applications in the most efficient way. In most cases, however, new applica-
tions that were not foreseen at design time come up after system implementation.
This is precisely why DBMSs have become very popular: They facilitate the incorpo-
ration of new applications with only incremental modifications to the existing
design of a database. Such design evolution—or schema evolution—is a feature
present to various degrees in commercial DBMSs.
2.4 Phase 4: Data Model Mapping
(Logical Database Design)
The next phase of database design is to create a conceptual schema and external
schemas in the data model of the selected DBMS by mapping those schemas pro-
duced in Phase 2a. The mapping can proceed in two stages:
1. System-independent mapping. In this stage, the mapping does not consider
any specific characteristics or special cases that apply to the particular DBMS
implementation of the data model.
2. Tailoring the schemas to a specific DBMS. Different DBMSs implement a
data model by using specific modeling features and constraints. We may
have to adjust the schemas obtained in step 1 to conform to the specific
implementation features of a data model as used in the selected DBMS.
The result of this phase should be DDL (data definition language) statements in the
language of the chosen DBMS that specify the conceptual and external level
schemas of the database system. But if the DDL statements include some physical
design parameters, a complete DDL specification must wait until after the physical
database design phase is completed. Many automated CASE (computer-aided soft-
ware engineering) design tools (see Section 5) can generate DDL for commercial
systems from a conceptual schema design.
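As an illustration of what such generated DDL might look like, the following statements map the COURSE and INSTRUCTOR entity types of Figure 2(b) to relational tables. The column types, and the assumption that each course offering has exactly one instructor (so the relationship maps to a foreign key), are made only for this sketch.

CREATE TABLE INSTRUCTOR (
    Instructor_id INT PRIMARY KEY,
    Name          VARCHAR(100)
);

CREATE TABLE COURSE (
    Course_no     VARCHAR(10),
    Sec_no        INT,
    Semester      VARCHAR(10),
    Instructor_id INT NOT NULL REFERENCES INSTRUCTOR(Instructor_id),
    PRIMARY KEY (Course_no, Sec_no, Semester)
);

The second, tailoring stage described above would then replace the generic types and constraint syntax with the forms supported by the chosen DBMS.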
2.5 Phase 5: Physical Database Design
Physical database design is the process of choosing specific file storage structures
and access paths for the database files to achieve good performance for the various
database applications. Each DBMS offers a variety of options for file organizations
and access paths. These usually include various types of indexing, clustering of
related records on disk blocks, linking related records via pointers, and various
types of hashing techniques. Once a specific DBMS is chosen, the physical database
design process is restricted to choosing the most appropriate structures for the data-
base files from among the options offered by that DBMS. In this section we give
generic guidelines for physical design decisions; they hold for any type of DBMS.
The following criteria are often used to guide the choice of physical database design
options:
1. Response time. This is the elapsed time between submitting a database
transaction for execution and receiving a response. A major influence on
response time that is under the control of the DBMS is the database access
time for data items referenced by the transaction. Response time is also
influenced by factors not under DBMS control, such as system load, operat-
ing system scheduling, or communication delays.
2. Space utilization. This is the amount of storage space used by the database
files and their access path structures on disk, including indexes and other
access paths.
3. Transaction throughput. This is the average number of transactions that
can be processed per minute; it is a critical parameter of transaction systems
such as those used for airline reservations or banking. Transaction through-
put must be measured under peak conditions on the system.
Typically, average and worst-case limits on the preceding parameters are specified as
part of the system performance requirements. Analytical or experimental tech-
niques, which can include prototyping and simulation, are used to estimate the
average and worst-case values under different physical design decisions to deter-
mine whether they meet the specified performance requirements.
Performance depends on record size and number of records in the file. Hence, we
must estimate these parameters for each file. Additionally, we should estimate the
update and retrieval patterns for the file cumulatively from all the transactions.
Attributes used for searching for specific records should have primary access paths
and secondary indexes constructed for them. Estimates of file growth, either in the
record size because of new attributes or in the number of records, should also be
taken into account during physical database design.
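For example, if a hypothetical EMPLOYEE file is frequently searched by department number and by last name, secondary indexes on those attributes might be constructed as shown below. Note that index-creation commands are not part of standard SQL, so the exact syntax and options depend on the chosen DBMS.

CREATE INDEX Emp_dept_idx ON EMPLOYEE (Dno);
CREATE INDEX Emp_name_idx ON EMPLOYEE (Lname, Fname);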
The result of the physical database design phase is an initial determination of stor-
age structures and access paths for the database files. It is almost always necessary to
modify the design on the basis of its observed performance after the database sys-
tem is implemented. We include this activity of database tuning in the next phase.
2.6 Phase 6: Database System Implementation
and Tuning
After the logical and physical designs are completed, we can implement the database
system. This is typically the responsibility of the DBA and is carried out in conjunc-
tion with the database designers. Language statements in the DDL, including the
SDL (storage definition language) of the selected DBMS, are compiled and used to
create the database schemas and (empty) database files. The database can then be
loaded (populated) with the data. If data is to be converted from an earlier comput-
erized system, conversion routines may be needed to reformat the data for loading
into the new database.
Database programs are implemented by the application programmers, by referring
to the conceptual specifications of transactions, and then writing and testing pro-
gram code with embedded DML (data manipulation language) commands. Once
the transactions are ready and the data is loaded into the database, the design and
implementation phase is over and the operational phase of the database system
begins.
Most systems include a monitoring utility to collect performance statistics, which
are kept in the system catalog or data dictionary for later analysis. These include sta-
tistics on the number of invocations of predefined transactions or queries,
input/output activity against files, counts of file disk pages or index records, and
frequency of index usage. As the database system requirements change, it often
becomes necessary to add new tables or remove existing ones and to reorganize some files by
changing primary access methods or by dropping old indexes and constructing new
ones. Some queries or transactions may be rewritten for better performance.
Database tuning continues as long as the database is in existence, as long as per-
formance problems are discovered, and while the requirements keep changing.
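Continuing the hypothetical EMPLOYEE example from Section 2.5, typical tuning actions might look like the following. Again, the index commands are nonstandard and DBMS-specific, and the statistics that justify them would come from the monitoring utility mentioned above.

-- Drop an index that the collected statistics show is rarely used,
-- and build one that supports a frequently executed query on salary.
DROP INDEX Emp_name_idx;
CREATE INDEX Emp_salary_idx ON EMPLOYEE (Salary);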
3 Use of UML Diagrams as an Aid
to Database Design Specification5
3.1 UML as a Design Specification Standard
There is a need for a standard approach that covers the entire spectrum of requirements
analysis, modeling, design, implementation, and deployment of databases
and their applications. One approach that is receiving wide attention and that is also
proposed as a standard by the Object Management Group (OMG) is the Unified
Modeling Language (UML) approach. It provides a mechanism in the form of dia-
grammatic notation and associated language syntax to cover the entire life cycle.
Presently, UML can be used by software developers, data modelers, database design-
ers, and so on to define the detailed specification of an application. They also use it
to specify the environment consisting of users, software, communications, and
hardware to implement and deploy the application.
UML combines commonly accepted concepts from many object-oriented (O-O)
methods and methodologies (see this chapter’s Selected Bibliography for the con-
tributing methodologies that led to UML). It is generic, language-independent, and
platform-independent. Software architects can use UML to model any type of application,
written in any programming language and running on any operating system or network. That
has made the approach very widely applicable. Tools like Rational Rose are currently
popular for drawing UML diagrams—they enable software developers to develop
clear and easy-to-understand models for specifying, visualizing, constructing, and
documenting components of software systems. Since the scope of UML extends to
software and application development at large, we will not cover all aspects of UML
here. Our goal is to show some relevant UML notations that are commonly used in
the requirements collection and analysis phase of database design, as well as the con-
ceptual design phase (see Phases 1 and 2 in Figure 1). A detailed application develop-
ment methodology using UML is outside the scope of this book and may be found in
various textbooks devoted to object-oriented design, software engineering, and UML
(see the Selected Bibliography at the end of this chapter).
5The contribution of Abrar Ul-Haque to the UML and Rational Rose sections is much appreciated.
UML has many types of diagrams. Class diagrams can represent the end result of
conceptual database design. To arrive at the class diagrams, the application require-
ments may be gathered and specified using use case diagrams, sequence diagrams,
and statechart diagrams. In the rest of this section we introduce the different types
of UML diagrams briefly to give the reader an idea of the scope of UML. Then we
describe a small sample application to illustrate the use of some of these diagrams
and show how they lead to the eventual class diagram as the final conceptual data-
base design. The diagrams presented in this section pertain to the standard UML
notation and have been drawn using Rational Rose. Section 4 is devoted to a general
discussion of the use of Rational Rose in database application design.
3.2 UML for Database Application Design
UML was developed as a software engineering methodology. Most software systems
have sizable database components. The database community has started embracing
UML, and now some database designers and developers are using UML for data
modeling as well as for subsequent phases of database design. The advantage of
UML is that even though its concepts are based on object-oriented techniques, the
resulting models of structure and behavior can be used to design relational, object-
oriented, or object-relational databases.
One of the major contributions of the UML approach has been to bring the tradi-
tional database modelers, analysts, and designers together with the software applica-
tion developers. In Figure 1 we showed the phases of database design and
implementation and how they apply to these two groups. UML also allows us to do
behavioral, functional, and dynamic modeling by introducing various types of dia-
grams. This results in a more complete specification/description of the overall data-
base application. In the following sections we summarize the different types of
UML diagrams and then give an example of the use case, sequence, and statechart
diagrams in a sample application.
3.3 Different Types of Diagrams in UML
UML defines nine types of diagrams divided into these two categories:
■ Structural Diagrams. These describe the structural or static relationships
among schema objects, data objects, and software components. They include
class diagrams, object diagrams, component diagrams, and deployment dia-
grams.
■ Behavioral Diagrams. Their purpose is to describe the behavioral or
dynamic relationships among components. They include use case diagrams,
sequence diagrams, collaboration diagrams, statechart diagrams, and activ-
ity diagrams.
We introduce the nine types briefly below. The structural diagrams include:
A. Class Diagrams. Class diagrams capture the static structure of the system and
act as foundation for other models. They show classes, interfaces, collaborations,
dependencies, generalizations, associations, and other relationships. Class diagrams
are a very useful way to model the conceptual database schema.
Package Diagrams. Package diagrams are a subset of class diagrams. They organize
elements of the system into related groups called packages. A package may be a col-
lection of related classes and the relationships between them. Package diagrams help
minimize dependencies in a system.
B. Object Diagrams. Object diagrams show a set of individual objects and their
relationships, and are sometimes referred to as instance diagrams. They give a static
view of a system at a particular time and are normally used to test class diagrams for
accuracy.
C. Component Diagrams. Component diagrams illustrate the organizations and
dependencies among software components. A component diagram typically consists
of components, interfaces, and dependency relationships. A component may be a
source code component, a runtime component, or an executable component. It is a
physical building block in the system and is represented as a rectangle with two small
rectangles or tabs overlaid on its left side. An interface is a group of operations used
or created by a component and is usually represented by a small circle. Dependency
relationship is used to model the relationship between two components and is repre-
sented by a dotted arrow pointing from a component to the component it depends
on. For databases, component diagrams stand for stored data such as tablespaces or
partitions. Interfaces refer to applications that use the stored data.
D. Deployment Diagrams. Deployment diagrams represent the distribution of
components (executables, libraries, tables, files) across the hardware topology. They
depict the physical resources in a system, including nodes, components, and con-
nections, and are basically used to show the configuration of runtime processing
elements (the nodes) and the software processes that reside on them (the threads).
Next, we briefly describe the various types of behavioral diagrams and expand on
those that are of particular interest.
E. Use Case Diagrams. Use case diagrams are used to model the functional
interactions between users and the system. A scenario is a sequence of steps describ-
ing an interaction between a user and a system. A use case is a set of scenarios that
have a common goal. The use case diagram was introduced by Jacobson6 to visual-
ize use cases. A use case diagram shows actors interacting with use cases and can be
understood easily without the knowledge of any notation.
6See Jacobson et al. (1992).
[Figure 7: Use case diagram notation, showing actors, base use cases, an extended use case, and the relationships among them.]
An individual use case is shown as an oval and stands for a specific task performed by the system. An actor,
shown with a stick person symbol, represents an external user, which may be a
human user, a representative group of users, a certain role of a person in the organ-
ization, or anything external to the system (see Figure 7). The use case diagram
shows possible interactions of the system (in our case, a database system) and
describes as use cases the specific tasks the system performs. Since they do not spec-
ify any implementation detail and are supposed to be easy to understand, they are
used as a vehicle for communicating between the end users and developers to help
in easier user validation at an early stage. Test plans can also be described using use
case diagrams. Figure 7 shows the use case diagram notation. The include relation-
ship is used to factor out some common behavior from two or more of the original
use cases—it is a form of reuse. For example, in a university environment shown in
Figure 8, the use cases Register for course and Enter grades in which the actors stu-
dent and professor are involved, include a common use case called Validate user. If a
use case incorporates two or more significantly different scenarios, based on cir-
cumstances or varying conditions, the extend relationship is used to show the sub-
cases attached to the base case.
Interaction Diagrams. The next two types of UML behavioral diagrams, interaction
diagrams, are used to model the dynamic aspects of a system. They consist of a set
of messages exchanged between a set of objects. There are two types of interaction
diagrams, sequence and collaboration.
F. Sequence Diagrams. Sequence diagrams describe the interactions between
various objects over time. They basically give a dynamic view of the system by
showing the flow of messages between objects. Within the sequence diagram, an
object or an actor is shown as a box at the top of a dashed vertical line, which is
called the object’s lifeline. For a database, this object is typically something physi-
cal—a book in a warehouse that would be represented in the database, an external
document or form such as an order form, or an external visual screen—that may be
part of a user interface. The lifeline represents the existence of an object over time.
Activation, which indicates when an object is performing an action, is represented
as a rectangular box on a lifeline. Each message is represented as an arrow between
the lifelines of two objects. A message bears a name and may have arguments and
control information to explain the nature of the interaction. The order of messages
is read from top to bottom. A sequence diagram also gives the option of self-call,
which is basically just a message from an object to itself. Condition and iteration
markers can also be shown in sequence diagrams to specify when a message should be
sent and the conditions under which multiple messages are sent. A return
dashed line shows a return from the message and is optional unless it carries a
special meaning. Object deletion is shown with a large X. Figure 9 explains some of
the notation used in sequence diagrams.
G. Collaboration Diagrams. Collaboration diagrams represent interactions
among objects as a series of sequenced messages. In collaboration diagrams the
emphasis is on the structural organization of the objects that send and receive mes-
sages, whereas in sequence diagrams the emphasis is on the time-ordering of the
messages. Collaboration diagrams show objects as icons and number the messages;
numbered messages represent an ordering. The spatial layout of collaboration dia-
grams allows linkages among objects that show their structural relationships. Use of
collaboration and sequence diagrams to represent interactions is a matter of choice
as they can be used for somewhat similar purposes; we will hereafter use only
sequence diagrams.
H. Statechart Diagrams. Statechart diagrams describe how an object’s state
changes in response to external events.
To describe the behavior of an object, it is common in most object-oriented tech-
niques to draw a statechart diagram to show all the possible states an object can get
into in its lifetime. The UML statecharts are based on David Harel’s7 statecharts.
They show a state machine consisting of states, transitions, events, and actions and
are very useful in the conceptual design of the application that works against a data-
base of stored objects.
The important elements of a statechart diagram shown in Figure 10 are as follows:
■ States. Shown as boxes with rounded corners, they represent situations in
the lifetime of an object.
■ Transitions. Shown as solid arrows between the states, they represent the
paths between different states of an object. They are labeled by the event-
name [guard] /action; the event triggers the transition and the action results
from it. The guard is an additional and optional condition that specifies a
condition under which the change of state may not occur.
■ Start/Initial State. Shown by a solid circle with an outgoing arrow to a state.
■ Stop/Final State. Shown as a double-lined filled circle with an arrow point-
ing into it from a state.
Statechart diagrams are useful in specifying how an object’s reaction to a message
depends on its state. An event is something done to an object such as receiving a
message; an action is something that an object does such as sending a message.
7See Harel (1987).
I. Activity Diagrams. Activity diagrams present a dynamic view of the system by
modeling the flow of control from activity to activity. They can be considered as
flowcharts with states. An activity is a state of doing something, which could be a
real-world process or an operation on some object or class in the database.
Typically, activity diagrams are used to model workflow and internal business oper-
ations for an application.
In this section we will briefly illustrate the use of some of the UML diagrams we
presented above to design a simple database in a university setting. A large number
of details are left out to conserve space; only a stepwise use of these diagrams that
leads toward a conceptual design and the design of program components is illus-
trated. As we indicated before, the eventual DBMS on which this database gets
implemented may be relational, object-oriented, or object-relational. That will not
change the stepwise analysis and modeling of the application using the UML
diagrams.
Imagine a scenario with students enrolling in courses that are offered by professors.
The registrar’s office is in charge of maintaining a schedule of courses in a course
catalog. They have the authority to add and delete courses and to make schedule changes. They also set enrollment limits on courses. The financial aid office is in
charge of processing student aid applications for which the students have to apply.
Assume that we have to design a database that maintains the data about students,
professors, courses, financial aid, and so on. We also want to design some of the
applications that enable us to handle course registration, financial aid application processing, and maintenance of the university-wide course catalog by the registrar's office. The above requirements may be depicted by a series of UML diagrams.
As mentioned previously, one of the first steps involved in designing a database is to
gather customer requirements by using use case diagrams. Suppose one of the
requirements in the UNIVERSITY database is to allow the professors to enter grades
for the courses they are teaching and for the students to be able to register for
courses and apply for financial aid. The use case diagram corresponding to these use
cases can be drawn as shown in Figure 8.
Next, we can design a sequence diagram to visualize the execution of the use cases. For the university database, the sequence diagram corresponding to the use case in which a student requests to register and selects a particular course is shown in Figure 12.
Figure 11
A sample statechart diagram for the UNIVERSITY database (states include Course enrollment, Enrolling, and Canceled; transition and activity labels include Enroll student/set count = 0, Enroll student [count < 50], Count = 50, Cancel, Section closing, Entry/register student, Do/enroll students, and Exit/close section).
Figure 12
A sequence diagram for the UNIVERSITY database (lifelines :Student, :Registration, :Catalog, :Course, and :Schedule; messages include requestRegistration, getCourseListing, selectCourse, addCourse, getPreReq, getSeatsLeft, and [getPreReq = true && getSeatsLeft = true]/update schedule).
The catalog is first browsed to get course listings. Then, when the student selects a
course to register in, prerequisites and course capacity are checked, and the course is
then added to the student’s schedule if the prerequisites are met and there is space in
the course.
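The checks just described can be expressed directly in SQL. The following is a minimal, hedged sketch; the table and column names (COURSE, PREREQUISITE, COMPLETED, SCHEDULE, SeatsLeft, and so on) are illustrative and do not come from the chapter's figures, and :student and :course stand for host-language parameters.

-- Add the course to the student's schedule only if there is space left and
-- every prerequisite of the course appears among the student's completed courses.
INSERT INTO SCHEDULE (StudentId, CourseNo)
SELECT :student, :course
FROM   COURSE C
WHERE  C.CourseNo = :course
  AND  C.SeatsLeft > 0                     -- course capacity check
  AND  NOT EXISTS (                        -- prerequisite check
         SELECT *
         FROM   PREREQUISITE P
         WHERE  P.CourseNo = :course
           AND  P.PreReqNo NOT IN (SELECT CourseNo
                                   FROM   COMPLETED
                                   WHERE  StudentId = :student) );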
These UML diagrams are not the complete specification of the UNIVERSITY data-
base. There will be other use cases for the various applications of the actors, includ-
ing registrar, student, professor, and so on. A complete methodology for how to
arrive at the class diagrams from the various diagrams we illustrated in this section
is outside our scope here. Design methodologies remain a matter of judgment and
personal preferences. However, the designer should make sure that the class dia-
gram will account for all the specifications that have been given in the form of the
use cases, statechart, and sequence diagrams. The class diagram in Figure 13 shows
a possible class diagram for this application, with the structural relationships and
the operations within the classes. These classes will need to be implemented to
develop the UNIVERSITY database, and together with the operations they will
implement the complete class schedule/enrollment/aid application. Only some of
the attributes and methods (operations) are shown in Figure 13. It is likely that
these class diagrams will be modified as more details are specified and more func-
tions evolve in the UNIVERSITY application.
Figure 13
The design of the UNIVERSITY database as a class diagram. The classes shown include PERSON (Name, Ssn; viewSchedule()), STUDENT (requestRegistration(), applyAid()), PROFESSOR (enterGrades(), offerCourse()), REGISTRATION (findCourseAdd(), cancelCourse(), addCourse(), viewSchedule()), CATALOG (getPreReq(), getSeatsLeft(), getCourseListing()), COURSE (time, classroom, seats; dropCourse(), addCourse()), SCHEDULE (updateSchedule(), showSchedule()), and FINANCIAL_AID (aidType, aidAmount; assignAid(), denyAid()).
4 Rational Rose: A UML-Based Design Tool
4.1 Rational Rose for Database Design
Rational Rose is one of the modeling tools used in the industry to develop informa-
tion systems. It was acquired by IBM in 2003. As we pointed out in the first two sec-
tions of this chapter, a database is a central component of most information
systems. Rational Rose provides the initial specification in UML that eventually
leads to the database development. Many extensions have been made in the latest
versions of Rose for data modeling, and now it provides support for conceptual,
logical, and physical database modeling and design.
4.2 Rational Rose Data Modeler
Rational Rose Data Modeler is a visual modeling tool for designing databases.
Because it is UML-based, it provides a common tool and language to bridge the
communication gap between database designers and application developers. This
makes it possible for database designers, developers, and analysts to work together,
capture and share business requirements, and track them as they change through-
out the process. Also, by allowing the designers to model and design all specifica-
tions on the same platform using the same notation, it improves the design process
and reduces the risk of errors.
The process modeling capabilities in Rational Rose allow the modeling of the
behavior of database applications as we saw in the short example above, in the form
of use cases (Figure 8), sequence diagrams (Figure 12), and statechart diagrams
(Figure 11). There is the additional machinery of collaboration diagrams to show
interactions between objects and activity diagrams to model the flow of control,
which we did not show in our example. The eventual goal is to generate the database
specification and application code as much as possible. The Rose Data Modeler can
also capture triggers, stored procedures, and other modeling concepts explicitly in
the diagram rather than representing them with hidden tagged values behind the
scenes. The Rose Data Modeler also provides the capability to forward engineer a
database in terms of constantly changing requirements and reverse engineer an
existing implemented database into its conceptual design.
4.3 Data Modeling Using Rational Rose Data Modeler
There are many tools and options available in Rose Data Modeler for data modeling.
Reverse Engineering. Reverse engineering of a database allows the user to create
a conceptual data model based on an existing database schema specified in a DDL
file. We can use the reverse engineering wizard in Rational Rose Data Modeler for
this purpose. The reverse engineering wizard basically reads the schema in the data-
base or DDL file and recreates it as a data model. While doing so, it also includes the
names of all quoted identifier entities.
Forward Engineering and DDL Generation. We can also create a data model
directly from scratch in Rose. Having created the data model,8 we can also use it to
generate the DDL for a specific DBMS. There is a forward engineering wizard in the
Rose Data Modeler that reads the schema in the data model or reads both the
schema in the data model and the tablespaces in the data storage model and gener-
ates the appropriate DDL code in a DDL file. The wizard also provides the option of
generating a database by executing the generated DDL file.
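The wizard's output is ordinary SQL DDL. As a hedged illustration (this is not actual Rose output; the table, column, and constraint definitions are invented), a generated DDL file for a simple two-table model might contain statements such as the following.

CREATE TABLE DEPARTMENT (
  Dnumber INTEGER     NOT NULL,
  Dname   VARCHAR(25) NOT NULL,
  PRIMARY KEY (Dnumber) );

CREATE TABLE EMPLOYEE (
  Ssn     CHAR(9)     NOT NULL,
  Lname   VARCHAR(25),
  Dno     INTEGER,
  PRIMARY KEY (Ssn),
  FOREIGN KEY (Dno) REFERENCES DEPARTMENT (Dnumber) );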
8The term data model used by Rational Rose Data Modeler corresponds to our notion of an application
model or conceptual schema.
Conceptual Design in UML Notation. Rational Rose allows modeling of data-
bases using UML notation. ER diagrams most often used in the conceptual design
of databases can be easily built using the UML notation as class diagrams in
Rational Rose. For example, the ER schema of our COMPANY database from Figure A.1 (in the Appendix at the end of this chapter) can be redrawn in Rose using UML notation, as shown in Figure 14. The textual specification in Figure 14 can be converted to the graphical representation shown in Figure 15 by using the data model diagram option in Rose.
Figure 14
A logical data model diagram definition in Rational Rose.
Figure 15 is similar to Figure A.1, except that it is using the notation provided by Rational Rose. Hence, it can be considered as an ER diagram using UML notation, with the inclusion of methods and other details. Identifying relationships specify that an object in a child class (DEPENDENT in Figure 15) cannot exist without a corresponding parent object in the parent class (EMPLOYEE in Figure 15), whereas
non-identifying relationships specify a regular association (relationship) between
two independent classes. It is possible to update the schemas directly in their text or
graphical form. For example, if the relationship between the EMPLOYEE and
PROJECT called WORKS_ON was deleted, Rose would automatically update or
delete all the foreign keys in the relevant tables.
Figure 15
The graphical data model diagram in Rational Rose (converted from Figure 14), showing the EMPLOYEE class with attributes such as Fname: Char(15), Minit: Char(1), Lname: Char(15), Sex: Char(1), Salary: Integer, Address: Char(20), Ssn: Integer, and Bdate: Date, together with foreign key attributes (for example, Project_number and Employee_ssn) and generated primary key constraint operations.
An important difference in Figure 15 from ER notation is that foreign key attributes
actually appear in the class diagrams in Rational Rose. This is common in several dia-
grammatic notations to make the conceptual design closer to the way it is realized in
the relational model implementation.
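At the SQL level, these foreign key attributes correspond to foreign key constraints. For instance, if the WORKS_ON relationship between EMPLOYEE and PROJECT were removed from the model, the change that the tool propagates to the schema would resemble the following hedged sketch; the constraint names are assumed, not taken from the figures.

-- Drop the foreign keys that realized the WORKS_ON relationship; the
-- relationship table itself may then be dropped as well.
ALTER TABLE WORKS_ON DROP CONSTRAINT FK_WORKS_ON_EMPLOYEE;
ALTER TABLE WORKS_ON DROP CONSTRAINT FK_WORKS_ON_PROJECT;
DROP TABLE WORKS_ON;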
Converting Logical Data Model to Object Model and Vice Versa. Rational
Rose Data Modeler also provides the option of converting a logical database design
(relational schema) to an object model design (object schema) and vice versa. For
example, the logical data model shown in Figure 14 can be converted to an object
model. This sort of mapping allows a deep understanding of the relationships
between the conceptual model and implementation model, and helps in keeping
them both up-to-date when changes are made to either model during the develop-
ment process. Figure 16 shows the Employee table after converting it to a class in an
object model. The various tabs in the window can then be used to enter/display dif-
ferent types of information. They include operations, attributes, and relationships
for that class.
Extensive Domain Support. The Rose Data Modeler allows database designers
to create a standard set of user-defined data types (these are similar to domains in
SQL) and assign them to any column in the data model. Properties of the domain
are then cascaded to assigned columns. These domains can then be maintained by a
standards group and deployed to all modelers when they begin creating new models
by using the Rational Rose framework.
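Because these user-defined data types play a role similar to SQL domains, the idea can be sketched in SQL as follows; the domain name, the check, and the table using it are all illustrative.

CREATE DOMAIN SSN_TYPE AS CHAR(9)
  CHECK (VALUE SIMILAR TO '[0-9]{9}');

-- Every column declared over the domain inherits its representation and check.
CREATE TABLE DEPENDENT (
  Employee_ssn   SSN_TYPE    NOT NULL,
  Dependent_name VARCHAR(15) NOT NULL );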
Easy Communication among Design Teams. As mentioned earlier, using a
common tool allows easy communication between teams. In the Rose Data
Modeler, an application developer can access both the object and data models and
see how they are related, and thus make informed and better choices about how to
build data access methods. There is also the option of using Rational Rose Web
Publisher to allow the models and the meta-data beneath these models to be avail-
able to everyone on the team.
Two factors make manual database design increasingly difficult:
1. As an application involves more and more complexity of data in terms of
relationships and constraints, the number of options or different designs to
model the same information keeps increasing rapidly. It becomes difficult
to deal with this complexity and the corresponding design alternatives
manually.
2. The sheer size of some databases runs into hundreds of entity types and rela-
tionship types, making the task of manually managing these designs almost
impossible. The meta information related to the design process we described
in Section 2 yields another database that must be created, maintained, and
queried as a database in its own right.
The above factors have given rise to many tools that come under the general cate-
gory of CASE (computer-aided software engineering) tools for database design.
Rational Rose is a good example of a modern CASE tool. Typically these tools con-
sist of a combination of the following facilities:
1. Diagramming. This allows the designer to draw a conceptual schema dia-
gram in some tool-specific notation. Most notations include entity types
(classes), relationship types (associations) that are shown either as separate
boxes or simply as directed or undirected lines, cardinality constraints, and so on.
2. Model mapping. The mapping is system-specific—most tools generate
schemas in SQL DDL for Oracle, DB2, Informix, Sybase, and other RDBMSs.
This part of the tool is most amenable to automation. The designer can fur-
ther edit the produced DDL files if needed.
3. Design normalization. This utilizes a set of functional dependencies that
are supplied at the conceptual design or after the relational schemas are pro-
duced during logical design. Then, design decomposition algorithms are
applied to decompose existing relations into higher normal-form relations.
Generally, many of these tools lack the ability to generate alternative 3NF or BCNF designs and to let the designer select among them based on criteria such as the minimum number of relations or the least amount of storage.
We will not survey database design tools here, but only mention the following char-
acteristics that a good design tool should possess:
1. An easy-to-use interface. This is critical because it enables designers to
focus on the task at hand, not on understanding the tool. Graphical and
point-and-click interfaces are commonly used. A few tools like the SECSI
design tool use natural language input. Different interfaces may be tailored
to beginners or to expert designers.
2. Analytical components. Tools should provide analytical components for
tasks that are difficult to perform manually, such as evaluating physical
design alternatives or detecting conflicting constraints among views. This
area is weak in most current tools.
3. Heuristic components. Aspects of the design that cannot be precisely
quantified can be automated by entering heuristic rules in the design tool to
evaluate design alternatives.
4. Trade-off analysis. A tool should present the designer with adequate com-
parative analysis whenever it presents multiple alternatives to choose from.
Tools should ideally incorporate an analysis of a design change at the con-
ceptual design level down to physical design. Because of the many alterna-
tives possible for physical design in a given system, such tradeoff analysis is
difficult to carry out and most current tools avoid it.
5. Display of design results. Design results, such as schemas, are often dis-
played in diagrammatic form. Aesthetically pleasing and well laid out dia-
grams are not easy to generate automatically. Multipage design layouts that
are easy to read are another challenge. Other types of results of design may
be shown as tables, lists, or reports that should be easy to interpret.
6. Design verification. This is a highly desirable feature. Its purpose is to ver-
ify that the resulting design satisfies the initial requirements. Unless the
requirements are captured and internally represented in some analyzable
form, the verification cannot be attempted.
Currently there is increasing awareness of the value of design tools, and they are
becoming a must for dealing with large database design problems. There is also an
increasing awareness that schema design and application design should go hand in
hand, and the current trend among CASE tools is to address both areas. The popu-
larity of tools such as Rational Rose is due to the fact that it approaches the two
arms of the design process shown in Figure 1 concurrently, approaching database
design and application design as a unified activity. After the acquisition of Rational by IBM in 2003, the Rational suite of tools has been enhanced as XDE (extended development environment) tools. Some vendors, like Platinum (CA), provide a tool
for data modeling and schema design (ERwin), and another for process modeling
and functional design (BPwin). Other tools (for example, SECSI) use expert system
technology to guide the design process by including design expertise in the form of
rules. Expert system technology is also useful in the requirements collection and
analysis phase, which is typically a laborious and frustrating process. The trend is to
use both meta-data repositories and design tools to achieve better designs for com-
plex databases. Without a claim of being exhaustive, Table 1 lists some popular data-
base design and application modeling tools. Companies in the table are listed
alphabetically.
package must be chosen. We discussed some of the organizational criteria that come
into play in selecting a DBMS. As performance problems are detected, and as new
applications are added, designs have to be modified. The importance of designing
both the schema and the applications (or transactions) was highlighted. We dis-
cussed different approaches to conceptual schema design and the difference
between centralized schema design and the view integration approach.
Table 1 (excerpt). Popular database design and application modeling tools:
Persistence Inc.: PowerTier (mapping from O-O to relational model)
Popkin Software: Telelogic System Architect (data modeling, object modeling, process modeling, structured analysis/design)
Resolution Ltd.: XCase (conceptual modeling up to code maintenance)
2. Which of the six phases are considered the main activities of the database
design process itself? Why?
3. Why is it important to design the schemas and applications in parallel?
4. Why is it important to use an implementation-independent data model dur-
ing conceptual schema design? What models are used in current design
tools? Why?
5. Discuss the importance of requirements collection and analysis.
6. Consider an actual application of a database system of interest. Define the
requirements of the different levels of users in terms of data needed, types of
queries, and transactions to be processed.
7. Discuss the characteristics that a data model for conceptual schema design
should possess.
8. Compare and contrast the two main approaches to conceptual schema
design.
9. Discuss the strategies for designing a single conceptual schema from its
requirements.
10. What are the steps of the view integration approach to conceptual schema
design? What are the difficulties during each step?
11. How would a view integration tool work? Design a sample modular architec-
ture for such a tool.
12. What are the different strategies for view integration?
13. Discuss the factors that influence the choice of a DBMS package for the
information system of an organization.
14. What is system-independent data model mapping? How is it different from
system-dependent data model mapping?
15. What are the important factors that influence physical database design?
16. Discuss the decisions made during physical database design.
17. Discuss the macro and micro life cycles of an information system.
18. Discuss the guidelines for physical database design in RDBMSs.
19. Discuss the types of modifications that may be applied to the logical data-
base design of a relational database.
20. What functions do the typical database design tools provide?
21. What type of functionality would be desirable in automated tools to support
optimal design of large databases?
22. What are the current relational DBMSs that dominate the market? Choose
one that you are familiar with and show how it measures up based on the criteria laid out in Section 2.3.
23. A possible DDL corresponding to the figure above follows. Discuss the following design decisions:
a. The choice of requiring Name to be NOT NULL
b. Selection of Ssn as the PRIMARY KEY
c. Choice of field sizes and precision
d. Any modification of the fields defined in this database
e. Any constraints on individual fields
24. What naming conventions can you develop to help identify foreign keys
more efficiently?
detailed discussion of physical design and transaction issues in reference to com-
mercial RDBMSs. A large body of work on conceptual modeling and design was
done in the 1980s. Brodie et al. (1984) gives a collection of chapters on conceptual
modeling, constraint specification and analysis, and transaction design. Yao (1985)
is a collection of works ranging from requirements specification techniques to
schema restructuring. Teorey (1998) emphasizes EER modeling and discusses vari-
ous aspects of conceptual and logical database design. Hoffer et al. (2009) is a good
introduction to the business applications issues of database management.
Navathe and Kerschberg (1986) discuss all phases of database design and point out
the role of data dictionaries. Goldfine and Konig (1988) and ANSI (1989) discuss
the role of data dictionaries in database design. Rozen and Shasha (1991) and Carlis
and March (1984) present different models for the problem of physical database
design. Object-oriented analysis and design is discussed in Schlaer and Mellor
(1988), Rumbaugh et al. (1991), Martin and Odell (1991), and Jacobson et al.
(1992). Recent books by Blaha and Rumbaugh (2005) and Martin and Odell (2008)
consolidate the existing techniques in object-oriented analysis and design using
UML. Fowler and Scott (2000) is a quick introduction to UML. For a comprehen-
sive treatment of UML and its use in the software development process, consult
Jacobson et al. (1999) and Rumbaugh et al. (1999).
Requirements collection and analysis is a heavily researched topic. Chatzoglu et al.
(1997) and Lubars et al. (1993) present surveys of current practices in requirements
capture, modeling, and analysis. Carroll (1995) provides a set of readings on the use
of scenarios for requirements gathering in early stages of system development.
Wood and Silver (1989) gives a good overview of the official Joint Application
Design (JAD) process. Potter et al. (1991) describes the Z-notation and methodol-
ogy for formal specification of software. Zave (1997) has classified the research
efforts in requirements engineering.
refinement of EER schemas for integration. Castano et al. (1998) present a compre-
hensive survey of conceptual schema analysis techniques.
Transaction design is a relatively less thoroughly researched topic. Mylopoulos et al.
(1980) proposed the TAXIS language, and Albano et al. (1985) developed the
GALILEO system, both of which are comprehensive systems for specifying transac-
tions. The GORDAS language for the ECR model (Elmasri et al. 1985) contains a
transaction specification capability. Navathe and Balaraman (1991) and Ngu (1989)
discuss transaction modeling in general for semantic data models. Elmagarmid
(1992) discusses transaction models for advanced applications. Batini et al. (1992,
Chapters 8, 9, and 11) discuss high-level transaction design and joint analysis of
data and functions. Shasha (1992) is an excellent source on database tuning.
. . .
In this chapter, we discuss the features of object-oriented data models and show how some of these
features have been incorporated in relational database systems. Object-oriented
databases are now referred to as object databases (ODB) (previously called
OODB), and the database systems are referred to as object data management sys-
tems (ODMS) (formerly referred to as ODBMS or OODBMS). Traditional data
models and systems, such as relational, network, and hierarchical, have been quite
successful in developing the database technologies required for many traditional
business database applications. However, they have certain shortcomings when
more complex database applications must be designed and implemented—for
example, databases for engineering design and manufacturing (CAD/CAM and
CIM1), scientific experiments, telecommunications, geographic information sys-
tems, and multimedia.2 These newer applications have requirements and character-
istics that differ from those of traditional business applications, such as more
complex structures for stored objects; the need for new data types for storing
images, videos, or large textual items; longer-duration transactions; and the need to
define nonstandard application-specific operations. Object databases were pro-
posed to meet some of the needs of these more complex applications. A key feature
of object databases is the power they give the designer to specify both the structure
of complex objects and the operations that can be applied to these objects.
2Multimedia databases must store various types of multimedia objects, such as video, audio, images,
graphics, and documents.
1Computer-aided design/computer-aided manufacturing and computer-integrated manufacturing.
From Chapter 11 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
Another reason for the creation of object-oriented databases is the vast increase in
the use of object-oriented programming languages for developing software applica-
tions. Databases are fundamental components in many software systems, and tradi-
tional databases are sometimes difficult to use with software applications that are
developed in an object-oriented programming language such as C++ or Java.
Object databases are designed so they can be directly—or seamlessly—integrated
with software that is developed using object-oriented programming languages.
and the ENCORE/ObServer project at Brown University. Commercially available
systems included GemStone Object Server of GemStone Systems, ONTOS DB of
Ontos, Objectivity/DB of Objectivity Inc., Versant Object Database and FastObjects
by Versant Corporation (and Poet), ObjectStore of Object Design, and Ardent
Database of Ardent.5 These represent only a partial list of the experimental proto-
types and commercial object-oriented database systems that were created.
3Microelectronics and Computer Technology Corporation, Austin, Texas.
4Now called Lucent Technologies.
5Formerly O2 of O2 Technology.
6Object Data Management Group.
7Similar concepts were also developed in the fields of semantic data modeling and knowledge represen-
tation.
a database by making them persistent, and type and class hierarchies and inheritance.
Then, in Section 2 we see how these concepts have been incorporated into the latest
SQL standards, leading to object-relational databases. Object features were origi-
nally introduced in SQL:1999, and then updated in the latest version (SQL:2008) of
the standard. In Section 3, we turn our attention to “pure” object database standards
by presenting features of the object database standard ODMG 3.0 and the object
definition language ODL. Section 4 presents an overview of the database design
process for object databases. Section 5 discusses the object query language (OQL),
which is part of the ODMG 3.0 standard. In Section 6, we discuss programming lan-
guage bindings, which specify how to extend object-oriented programming lan-
guages to include the features of the object database standard. Section 7 summarizes
the chapter. Sections 5 and 6 may be left out if a less thorough introduction to object
databases is desired.
An object typically has two components: state (value) and behavior (operations). It
can have a complex data structure as well as specific operations defined by the pro-
grammer.9 Objects in an OOPL exist only during program execution; therefore,
they are called transient objects. An OO database can extend the existence of objects
so that they are stored permanently in a database, and hence the objects become
persistent objects that exist beyond program termination and can be retrieved later
and shared by other programs. In other words, OO databases store persistent
objects permanently in secondary storage, and allow the sharing of these objects
among multiple programs and applications. This requires the incorporation of
other well-known features of database management systems, such as indexing
mechanisms to efficiently locate the objects, concurrency control to allow object
8Palo Alto Research Center, Palo Alto, California.
9Objects have many other characteristics, as we discuss in the rest of this chapter.
sharing among concurrent programs, and recovery from failures. An OO database
system will typically interface with one or more OO programming languages to
provide persistent and shared object capabilities.
The internal structure of an object in OOPLs includes the specification of instance
variables, which hold the values that define the internal state of the object. An
instance variable is similar to the concept of an attribute in the relational model,
except that instance variables may be encapsulated within the object and thus are
not necessarily visible to external users. Instance variables may also be of arbitrarily
complex data types. Object-oriented systems allow definition of the operations or
functions (behavior) that can be applied to objects of a particular type. In fact, some
OO models insist that all operations a user can apply to an object must be prede-
fined. This forces a complete encapsulation of objects. This rigid approach has been
relaxed in most OO data models for two reasons. First, database users often need to
know the attribute names so they can specify selection conditions on the attributes
to retrieve specific objects. Second, complete encapsulation implies that any simple
retrieval requires a predefined operation, thus making ad hoc queries difficult to
specify on the fly.
To encourage encapsulation, an operation is defined in two parts. The first part,
called the signature or interface of the operation, specifies the operation name and
arguments (or parameters). The second part, called the method or body, specifies the
implementation of the operation, usually written in some general-purpose pro-
gramming language. Operations can be invoked by passing a message to an object,
which includes the operation name and the parameters. The object then executes
the method for that operation. This encapsulation permits modification of the
internal structure of an object, as well as the implementation of its operations, with-
out the need to disturb the external programs that invoke these operations. Hence,
encapsulation provides a form of data and operation independence.
Another OO concept is operator overloading, which refers to an operation’s ability to
be applied to different types of objects; in such a situation, an operation name may
refer to several distinct implementations, depending on the type of object it is
applied to. This feature is also called operator polymorphism. For example, an opera-
tion to calculate the area of a geometric object may differ in its method (implemen-
tation), depending on whether the object is of type triangle, circle, or rectangle. This
may require the use of late binding of the operation name to the appropriate
method at runtime, when the type of object to which the operation is applied
becomes known.
In the next several sections, we discuss in some detail the main characteristics of
object databases. Section 1.2 discusses object identity; Section 1.3 shows how the
types for complex-structured objects are specified via type constructors; Section 1.4
discusses encapsulation and persistence; and Section 1.5 presents inheritance con-
cepts. Section 1.6 discusses some additional OO concepts, and Section 1.7 gives a
summary of all the OO concepts that we introduced. In Section 2, we show how
some of these concepts have been incorporated into the SQL:2008 standard for rela-
tional databases. Then in Section 3, we show how these concepts are realized in the
ODMG 3.0 object database standard.
It is inappropriate to base the OID on the physical address of the object in storage,
since the physical address can change after a physical reorganization of the database.
However, some early ODMSs have used the physical address as the OID to increase
the efficiency of object retrieval. If the physical address of the object changes, an
indirect pointer can be placed at the former address, which gives the new physical
location of the object. It is more common to use long integers as OIDs and then to
use some form of hash table to map the OID value to the current physical address of
the object in storage.
1. One type constructor has been called the atom constructor, although this
term is not used in the latest object standard. This includes the basic built-in
data types of the object model, which are similar to the basic types in many
programming languages: integers, strings, floating point numbers, enumer-
ated types, Booleans, and so on. They are called single-valued or atomic
types, since each value of the type is considered an atomic (indivisible) sin-
gle value.
2. A second type constructor is referred to as the struct (or tuple) constructor.
This can create standard structured types, such as the tuples (record types)
in the basic relational model. A structured type is made up of several compo-
nents, and is also sometimes referred to as a compound or composite type.
More accurately, the struct constructor is not considered to be a type, but
rather a type generator, because many different structured types can be cre-
ated. For example, two different structured types that can be created are struct Name <FirstName: string, MiddleInitial: char, LastName: string> and struct CollegeDegree <Year: integer, Degree: string, Major: string>.
3. Collection (or multivalued) type constructors include the set(T), list(T),
bag(T), array(T), and dictionary(K,T) type constructors. These allow part
of an object or literal value to include a collection of other objects or values
when needed. These constructors are also considered to be type generators
because many different types can be created. For example, set(string),
set(integer), and set(Employee) are three different types that can be created
from the set type constructor. All the elements in a particular collection value
must be of the same type. For example, all values in a collection of type
set(string) must be string values.
The atom constructor is used to represent all basic atomic values, such as integers,
real numbers, character strings, Booleans, and any other basic data types that the
system supports directly. The tuple constructor can create structured values and objects of the form <a1:v1, a2:v2, . . . , an:vn>, where each aj is an attribute name and each vj is a value or an OID.
The main characteristic of a collection type is that its objects or values will be a
collection of objects or values of the same type that may be unordered (such as a set or
a bag) or ordered (such as a list or an array). The tuple type constructor is often
called a structured type, since it corresponds to the struct construct in the C and
C++ programming languages.
10Also called an instance variable name in OO terminology.
11This corresponds to the DDL (data definition language) of the database system.
the concepts gradually in this section using a simpler notation. The type construc-
tors can be used to define the data structures for an OO database schema. Figure 1
shows how we may declare EMPLOYEE and DEPARTMENT types.
In Figure 1, the attributes that refer to other objects—such as Dept of EMPLOYEE or
Projects of DEPARTMENT—are basically OIDs that serve as references to other
objects to represent relationships among the objects. For example, the attribute Dept
of EMPLOYEE is of type DEPARTMENT, and hence is used to refer to a specific
DEPARTMENT object (the DEPARTMENT object where the employee works). The
value of such an attribute would be an OID for a specific DEPARTMENT object. A
binary relationship can be represented in one direction, or it can have an inverse ref-
erence. The latter representation makes it easy to traverse the relationship in both
directions. For example, in Figure 1 the attribute Employees of DEPARTMENT has as
its value a set of references (that is, a set of OIDs) to objects of type EMPLOYEE; these
are the employees who work for the DEPARTMENT. The inverse is the reference
attribute Dept of EMPLOYEE. We will see in Section 3 how the ODMG standard
allows inverses to be explicitly declared as relationship attributes to ensure that
inverse references are consistent.
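A loose SQL approximation of the reference-attribute idea is sketched below; Figure 1 itself is written in an object-model notation rather than SQL, and the attribute lists here are abbreviated and partly assumed.

CREATE TYPE DEPARTMENT_T AS (
  Dname   VARCHAR(25),
  Dnumber INTEGER )
  NOT FINAL;

CREATE TYPE EMPLOYEE_T AS (
  Name VARCHAR(35),
  Ssn  CHAR(9),
  Dept REF(DEPARTMENT_T) )   -- holds the identity (OID) of a DEPARTMENT object
  NOT FINAL;

-- The inverse direction (the Employees attribute of DEPARTMENT) has no
-- automatically maintained counterpart in SQL; in the ODMG model it is
-- declared as a relationship with an explicit inverse (see Section 3).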
Encapsulation of Operations. The concept of encapsulation is one of the main
characteristics of OO languages and systems. It is also related to the concepts of
abstract data types and information hiding in programming languages. In traditional
database models and systems this concept was not applied, since it is customary to
make the structure of database objects visible to users and external programs. In
these traditional models, a number of generic database operations are applicable to
objects of all types. For example, in the relational model, the operations for selecting,
inserting, deleting, and modifying tuples are generic and may be applied to any rela-
tion in the database. The relation and its attributes are visible to users and to exter-
nal programs that access the relation by using these operations. The concept of encapsulation is applied to database objects in ODBs by defining the behavior of a
type of object based on the operations that can be externally applied to objects of
that type. Some operations may be used to create (insert) or destroy (delete)
objects; other operations may update the object state; and others may be used to
retrieve parts of the object state or to apply some calculations. Still other operations
may perform a combination of retrieval, calculation, and update. In general, the
implementation of an operation can be specified in a general-purpose programming
language that provides flexibility and power in defining the operations.
The external users of the object are only made aware of the interface of the opera-
tions, which defines the name and arguments (parameters) of each operation. The
implementation is hidden from the external users; it includes the definition of any
hidden internal data structures of the object and the implementation of the opera-
tions that access these structures. The interface part of an operation is sometimes
called the signature, and the operation implementation is sometimes called the
method.
For database applications, the requirement that all objects be completely encapsu-
lated is too stringent. One way to relax this requirement is to divide the structure of
an object into visible and hidden attributes (instance variables). Visible attributes
can be seen by and are directly accessible to the database users and programmers via
the query language. The hidden attributes of an object are completely encapsulated
and can be accessed only through predefined operations. Most ODMSs employ
high-level query languages for accessing visible attributes. In Section 5 we will
describe the OQL query language that is proposed as a standard query language for
ODBs.
The term class is often used to refer to a type definition, along with the definitions of
the operations for that type.12 Figure 2 shows how the type definitions in Figure 1
can be extended with operations to define classes. A number of operations are
12This definition of class is similar to how it is used in the popular C++ programming language. The
ODMG standard uses the word interface in addition to class (see Section 3). In the EER model, the
term class is used to refer to an object type, along with the set of all objects of that type.
declared for each class, and the signature (interface) of each operation is included in
the class definition. A method (implementation) for each operation must be defined
elsewhere using a programming language. Typical operations include the object con-
structor operation (often called new), which is used to create a new object, and the
destructor operation, which is used to destroy (delete) an object. A number of object
modifier operations can also be declared to modify the states (values) of various
attributes of an object. Additional operations can retrieve information about the
object.
An operation is typically applied to an object by using the dot notation. For exam-
ple, if d is a reference to a DEPARTMENT object, we can invoke an operation such as
no_of_emps by writing d.no_of_emps. Similarly, by writing d.destroy_dept, the object
referenced by d is destroyed (deleted). The only exception is the constructor opera-
tion, which returns a reference to a new DEPARTMENT object. Hence, it is customary
in some OO models to have a default name for the constructor operation that is the
name of the class itself, although this was not used in Figure 2.13 The dot notation is
also used to refer to attributes of an object—for example, by writing d.Dnumber or
d.Mgr_Start_date.
Specifying Object Persistence via Naming and Reachability. An ODMS is often closely coupled with an object-oriented programming language (OOPL). The
OOPL is used to specify the method (operation) implementations as well as other
application code. Not all objects are meant to be stored permanently in the data-
base. Transient objects exist in the executing program and disappear once the pro-
gram terminates. Persistent objects are stored in the database and persist after
program termination. The typical mechanisms for making an object persistent are
naming and reachability.
If we first create a named persistent object N, whose state is a set (or possibly a bag)
of objects of some class C, we can make objects of C persistent by adding them to the
set, thus making them reachable from N. Hence, N is a named object that defines a
persistent collection of objects of class C. In the object model standard, N is called
the extent of C (see Section 3).
Notice the difference between traditional database models and ODBs in this respect.
13Default names for the constructor and destructor operations exist in the C++ programming language.
For example, for class EMPLOYEE, the default constructor name is EMPLOYEE and the default destruc-
tor name is ~EMPLOYEE. It is also common to use the new operation to create new objects.
14As we will see in Section 3, the ODMG ODL syntax uses set<DEPARTMENT> instead of set(DEPARTMENT).
In traditional database models, such as the relational model, all objects are assumed
to be persistent. Hence, when a table such as EMPLOYEE is created in a relational
database, it represents both the type declaration for EMPLOYEE and a persistent set of
all EMPLOYEE records (tuples). In the OO approach, a class declaration of
EMPLOYEE specifies only the type and operations for a class of objects. The user
must separately define a persistent object of type set(EMPLOYEE) or
bag(EMPLOYEE) whose value is the collection of references (OIDs) to all persistent
EMPLOYEE objects, if this is desired, as shown in Figure 3.15 This allows transient
and persistent objects to follow the same type and class declarations of the ODL and
the OOPL. In general, it is possible to define several persistent collections for the
same class definition, if desired.
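In SQL terms, a loose and hedged analogy is that the class declaration corresponds to a CREATE TYPE statement, while each typed table created over that type plays the role of one named persistent collection (the table name below is made up, and EMPLOYEE_T is the type sketched earlier).

CREATE TABLE ALL_EMPLOYEES OF EMPLOYEE_T;
-- ALL_EMPLOYEES is a named, persistent collection of EMPLOYEE_T rows; several
-- such tables could be created over the same type, mirroring the idea that
-- several persistent collections can exist for the same class definition.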
Simplified Model for Inheritance. Another main characteristic of ODBs is that
they allow type hierarchies and inheritance. We use a simple OO model in this sec-
tion—a model in which attributes and operations are treated uniformly—since
both attributes and operations can be inherited. In Section 3, we will discuss the
inheritance model of the ODMG standard, which differs from the model discussed
here because it distinguishes between two types of inheritance. Inheritance allows the
definition of new types based on other predefined types, leading to a type (or class)
hierarchy.
15Some systems, such as POET, automatically create the extent for a class.
A type is defined by assigning it a type name, and then defining a number of attrib-
utes (instance variables) and operations (methods) for the type.16 In the simplified
model we use in this section, the attributes and operations are together called
functions, since attributes resemble functions with zero arguments. A function
name can be used to refer to the value of an attribute or to refer to the resulting
value of an operation (method). We use the term function to refer to both attrib-
utes and operations, since they are treated similarly in a basic introduction to inher-
itance.17
A type in its simplest form has a type name and a list of visible (public) functions.
When specifying a type in this section, we use the following format, which does not specify arguments of functions, to simplify the discussion:
TYPE_NAME: function, function, . . . , function
For example, the PERSON type can be specified as:
PERSON: Name, Address, Birth_date, Age, Ssn
In the PERSON type, the Name, Address, Ssn, and Birth_date functions can be imple-
mented as stored attributes, whereas the Age function can be implemented as an
operation that calculates the Age from the value of the Birth_date attribute and the
current date.
The concept of subtype is useful when the designer or user must create a new type
that is similar but not identical to an already defined type. The subtype then inher-
its all the functions of the predefined type, which is referred to as the supertype. For
example, suppose that we want to define two new types EMPLOYEE and STUDENT
as subtypes of PERSON, each with some additional functions of its own.
16In this section we will use the terms type and class as meaning the same thing—namely, the attributes
and operations of some type of object.
17We will see in Section 3 that types with functions are similar to the concept of interfaces as used in
ODMG ODL.
In general, a subtype includes all of the functions that are defined for its supertype
plus some additional functions that are specific only to the subtype. Hence, it is pos-
sible to generate a type hierarchy to show the supertype/subtype relationships
among all the types declared in the system.
As another example, consider a type GEOMETRY_OBJECT that describes objects in plane geometry, with subtypes such as RECTANGLE, TRIANGLE, and CIRCLE that each add functions of their own.
Notice that type definitions describe objects but do not generate objects on their
own. When an object is created, typically it belongs to one or more of these types
that have been declared. For example, a circle object is of type CIRCLE and
GEOMETRY_OBJECT (by inheritance). Each object also becomes a member of one
or more persistent collections of objects (or extents), which are used to group
together collections of objects that are persistently stored in the database.
Constraints on Extents Corresponding to a Type Hierarchy. In most ODBs,
an extent is defined to store the collection of persistent objects for each type or sub-
type. In this case, the constraint is that every object in an extent that corresponds to
a subtype must also be a member of the extent that corresponds to its supertype.
Some OO database systems have a predefined system type (called the ROOT class or
the OBJECT class) whose extent contains all the objects in the system.18
Classification then proceeds by assigning objects into additional subtypes that are
meaningful to the application, creating a type hierarchy (or class hierarchy) for the
system. All extents for system- and user-defined classes are subsets of the extent corresponding to the class OBJECT, directly or indirectly. In the ODMG model (see Section 3), the user may or may not specify an extent for each class (type), depending on the application.
18This is called OBJECT in the ODMG model (see Section 3).
An extent is a named persistent object whose value is a persistent collection that
holds a collection of objects of the same type that are stored permanently in the
database. The objects can be accessed and shared by multiple programs. It is also
possible to create a transient collection, which exists temporarily during the execu-
tion of a program but is not kept when the program terminates. For example, a
transient collection may be created in a program to hold the result of a query that
selects some objects from a persistent collection and copies those objects into the
transient collection. The program can then manipulate the objects in the transient
collection, and once the program terminates, the transient collection ceases to exist.
In general, numerous collections—transient or persistent—may contain objects of
the same type.
The inheritance model discussed in this section is very simple. As we will see in
Section 3, the ODMG model distinguishes between type inheritance—called
interface inheritance and denoted by a colon (:)—and the extent inheritance con-
straint—denoted by the keyword EXTEND.
Polymorphism of Operations (Operator Overloading). Another characteris-
tic of OO systems in general is that they provide for polymorphism of operations,
which is also known as operator overloading. This concept allows the same
operator name or symbol to be bound to two or more different implementations of
the operator, depending on the type of objects to which the operator is applied. A
simple example from programming languages can illustrate this concept. In some
languages, the operator symbol “+” can mean different things when applied to
operands (objects) of different types. If the operands of “+” are of type integer, the
operation invoked is integer addition. If the operands of “+” are of type floating
point, the operation invoked is floating point addition. If the operands of “+” are of
type set, the operation invoked is set union. The compiler can determine which
operation to execute based on the types of operands supplied.
In OO databases, a similar situation may occur. We can use the
GEOMETRY_OBJECT example presented in Section 1.5 to illustrate operation poly-
morphism19 in ODB.
19In programming languages, there are several kinds of polymorphism. The interested reader is referred
to the Selected Bibliography at the end of this chapter for works that include a more thorough discus-
sion.
One possibility is to implement a general Area operation at the level of GEOMETRY_OBJECT (for example, by writing a general algorithm to calculate the area of a polygon) and then to rewrite more efficient algorithms to calculate the areas of specific types of geometric objects, such as a circle, a rectangle, a triangle, and so on. In this case, the Area function is overloaded by different implementations.
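The same idea can be sketched in SQL-flavored syntax, anticipating the object features of SQL discussed in Section 2. The type and attribute names below are illustrative, the exact syntax varies across systems, and a general implementation of Area for GEOMETRY_OBJ_T would be supplied separately.

CREATE TYPE GEOMETRY_OBJ_T AS (
  RefPoint_x FLOAT,
  RefPoint_y FLOAT )
  NOT INSTANTIABLE NOT FINAL
  METHOD Area() RETURNS FLOAT;

CREATE TYPE RECTANGLE_T UNDER GEOMETRY_OBJ_T AS (
  Width  FLOAT,
  Height FLOAT )
  INSTANTIABLE NOT FINAL
  OVERRIDING METHOD Area() RETURNS FLOAT;

-- The overriding implementation for rectangles:
CREATE INSTANCE METHOD Area() RETURNS FLOAT
  FOR RECTANGLE_T
  RETURN SELF.Width * SELF.Height;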
Multiple Inheritance and Selective Inheritance. Multiple inheritance occurs
when a certain subtype T is a subtype of two (or more) types and hence inherits the
functions (attributes and methods) of both supertypes. For example, we may create a
subtype ENGINEERING_MANAGER that is a subtype of both MANAGER and
ENGINEER. This leads to the creation of a type lattice rather than a type hierarchy.
One problem that can occur with multiple inheritance is that the supertypes from
which the subtype inherits may have distinct functions of the same name, creating an
ambiguity. For example, both MANAGER and ENGINEER may have a function called
Salary. If the Salary function is implemented by different methods in the MANAGER
and ENGINEER supertypes, an ambiguity exists as to which of the two is inherited by
the subtype ENGINEERING_MANAGER. It is possible, however, that both ENGINEER
and MANAGER inherit Salary from the same supertype (such as EMPLOYEE) higher
up in the lattice. The general rule is that if a function is inherited from some common
supertype, then it is inherited only once. In such a case, there is no ambiguity; the
problem only arises if the functions are distinct in the two supertypes.
There are several techniques for dealing with ambiguity in multiple inheritance.
One solution is to have the system check for ambiguity when the subtype is created,
and to let the user explicitly choose which function is to be inherited at this time. A
second solution is to use some system default. A third solution is to disallow multi-
ple inheritance altogether if name ambiguity occurs, instead forcing the user to
change the name of one of the functions in one of the supertypes. Indeed, some OO
systems do not permit multiple inheritance at all. In the object database standard
(see Section 3), multiple inheritance is allowed for operation inheritance of inter-
faces, but is not allowed for EXTENDS inheritance of classes.
Selective inheritance occurs when a subtype inherits only some of the functions of
a supertype. Other functions are not inherited. In this case, an EXCEPT clause may
be used to list the functions in a supertype that are not to be inherited by the sub-
type. The mechanism of selective inheritance is not typically provided in ODBs, but
it is used more frequently in artificial intelligence applications.20
20In the ODMG model, type inheritance refers to inheritance of operations only, not attributes (see
Section 3).
■ Object identity. Objects have unique identities that are independent of their
attribute values and are generated by the ODMS.
■ Type constructors. Complex object structures can be constructed by apply-
ing in a nested manner a set of basic constructors, such as tuple, set, list,
array, and bag.
■ Encapsulation of operations. Both the object structure and the operations
that can be applied to individual objects are included in the type definitions.
■ Programming language compatibility. Both persistent and transient
objects are handled seamlessly. Objects are made persistent by being reach-
able from a persistent collection (extent) or by explicit naming.
■ Type hierarchies and inheritance. Object types can be specified by using a
type hierarchy, which allows the inheritance of both attributes and methods
(operations) of previously defined types. Multiple inheritance is allowed in
some models.
■ Extents. All persistent objects of a particular type can be stored in an extent.
Extents corresponding to a type hierarchy have set/subset constraints
enforced on their collections of persistent objects.
■ Polymorphism and operator overloading. Operations and method names
can be overloaded to apply to different object types with different imple-
mentations.
In the following sections we show how these concepts are realized in the SQL stan-
dard (Section 2) and the ODMG standard (Section 3).
SQL is the standard language for RDBMSs. SQL was first specified by Chamberlin
and Boyce (1974) and underwent enhancements and standardization in 1989 and
1992. The language continued its evolution with a new standard, initially called
SQL3 while being developed, and later known as SQL:99 for the parts of SQL3 that
were approved into the standard. Starting with the version of SQL known as SQL3,
features from object databases were incorporated into the SQL standard. At first,
these extensions were known as SQL/Object, but later they were incorporated in the
main part of SQL, known as SQL/Foundation. We will use that latest standard,
SQL:2008, in our presentation of the object features of SQL, even though this may
not yet have been realized in commercial DBMSs that follow SQL. We will also dis-
cuss how the object features of SQL evolved to their latest manifestation in
SQL:2008.
The following are some of the object database features that have been included in
SQL:
■ Some type constructors have been added to specify complex objects. These
include the row type, which corresponds to the tuple (or struct) constructor.
An array type for specifying collections is also provided. Other collection
type constructors, such as set, list, and bag constructors, were not part of the
original SQL/Object specifications but were later included in the standard.
■ Inheritance mechanisms are provided using the keyword UNDER.
We now discuss each of these concepts in more detail. In our discussion, we will
refer to the example in Figure 4.
Figure 4 illustrates some of the object concepts in SQL. We will explain the examples
in this figure gradually as we explain the concepts. First, a UDT can be used as either
the type for an attribute or as the type for a table. By using a UDT as the type for an
attribute within another UDT, a complex structure for objects (tuples) in a table can
be created, much like that achieved by nesting type constructors. This is similar to
using the struct type constructor of Section 1.3. For example, in Figure 4(a), the
UDT STREET_ADDR_TYPE is used as the type for the STREET_ADDR attribute in the
UDT USA_ADDR_TYPE. Similarly, the UDT USA_ADDR_TYPE is in turn used as the
type for the ADDR attribute in the UDT PERSON_TYPE in Figure 4(b). If a UDT
does not have any operations, as in the examples in Figure 4(a), it is possible to use
the concept of ROW TYPE to directly create a structured attribute by using the key-
word ROW. For example, instead of declaring STREET_ADDR_TYPE as a separate type as in Figure 4(a), the street address could be declared inline as a ROW attribute, as sketched below.
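The following is a minimal sketch of this alternative; the component names inside the ROW are illustrative and are not taken from Figure 4(a).

CREATE TYPE USA_ADDR_TYPE AS (
  STREET_ADDR ROW ( Number      VARCHAR(5),
                    Street_name VARCHAR(25),
                    Apt_no      VARCHAR(5) ),
  CITY VARCHAR(25),
  ZIP  VARCHAR(10) )
  NOT FINAL;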
To allow for collection types in order to create complex-structured objects, four
constructors are now included in SQL: ARRAY, MULTISET, LIST, and SET. These are
similar to the type constructors discussed in Section 1.3. In the initial specification
of SQL/Object, only the ARRAY type was specified, since it can be used to simulate
the other types, but the three additional collection types were included in the latest
version of the SQL standard. In Figure 4(b), the PHONES attribute of
PERSON_TYPE has as its type an array whose elements are of the previously defined
UDT USA_PHONE_TYPE. This array has a maximum of four elements, meaning
that we can store up to four phone numbers per person. An array can also have no
maximum number of elements if desired.
An array type can have its elements referenced using the common notation of square
brackets. For example, PHONES[1] refers to the first location value in a PHONES
attribute (see Figure 4(b)). A built-in function CARDINALITY can return the current
number of elements in an array (or any other collection type). For example,
PHONES[CARDINALITY (PHONES)] refers to the last element in the array.
In general, the user can specify that system-generated object identifiers should be
created for the individual rows in a table by using the following syntax:
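A sketch of the general form, followed by an illustrative typed-table declaration (the table, type, and attribute names are examples only):

    REF IS <OID_ATTRIBUTE> <VALUE_GENERATION_METHOD> ;

    CREATE TABLE PERSON OF PERSON_TYPE
        REF IS PERSON_ID SYSTEM GENERATED;
    -- the object identifiers can be SYSTEM GENERATED or, alternatively, DERIVED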
In addition to attributes, a UDT can have methods (operations) as part of its
specification. For example, in Figure 4(b), we declared a method Age() that
calculates the age of an individual object of type PERSON_TYPE.
The code for implementing the method still has to be written. We can refer to the
method implementation by specifying the file that contains the code for the
method, or we can write the actual code within the type declaration itself (see
Figure 4(b)).
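A minimal sketch of how such a method might be declared and implemented in SQL/PSM (the attribute list is abbreviated and the body is a simplified year-only calculation, so the details are assumptions rather than the exact code of Figure 4(b)):

    CREATE TYPE PERSON_TYPE AS (
        NAME        VARCHAR (35),
        BIRTH_DATE  DATE
        -- further attributes such as ADDR and PHONES appear in Figure 4(b)
    )
    INSTANTIABLE
    NOT FINAL
    METHOD AGE() RETURNS INTEGER;

    CREATE INSTANCE METHOD AGE() RETURNS INTEGER
        FOR PERSON_TYPE
    BEGIN
        -- simplified: ignores whether the birthday has occurred yet this year
        RETURN EXTRACT (YEAR FROM CURRENT_DATE) - EXTRACT (YEAR FROM SELF.BIRTH_DATE);
    END;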
SQL provides certain built-in functions for user-defined types. For a UDT called
TYPE_T, the constructor function TYPE_T( ) returns a new object of that type. In
the new UDT object, every attribute is initialized to its default value. An observer
function A is implicitly created for each attribute A to read its value. Hence, A(X) or
X.A returns the value of attribute A of TYPE_T if X is of type TYPE_T. A mutator
function for updating an attribute sets the value of the attribute to a new value. SQL
allows these functions to be blocked from public use; an EXECUTE privilege is
needed to have access to these functions.
In general, a UDT can have a number of user-defined functions associated with it.
Two types of functions can be defined: internal SQL and external. Internal functions
are written in the extended PSM language of SQL. External functions are written in
a host language, with only their signature (interface) appearing in the UDT defini-
tion. The general forms of these declarations are sketched below.
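A rough sketch of the two forms (the angle-bracketed items are placeholders):

    METHOD <name> ( <argument_list> ) RETURNS <type> ;

    DECLARE EXTERNAL <function_name> <signature>
        LANGUAGE <language_name> ;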
It is also possible to define virtual attributes as part of UDTs, which are computed
and updated using functions.
Type inheritance in SQL is specified using the keyword UNDER. The following rules
apply to inheritance:
■ All attributes are inherited.
■ The order of supertypes in the UNDER clause determines the inheritance
hierarchy.
■ An instance of a subtype can be used in every context in which a supertype
instance is used.
■ A subtype can redefine any function that is defined in its supertype, with the
restriction that the signature be the same.
■ When a function is called, the best match is selected based on the types of all
arguments.
■ For dynamic linking, the runtime types of the parameters are considered.
Consider the following examples of type inheritance, which are illustrated
in Figure 4(c). Suppose that we want to create two subtypes of PERSON_TYPE:
EMPLOYEE_TYPE and STUDENT_TYPE. In addition, we also create a subtype
MANAGER_TYPE that inherits all the attributes (and methods) of EMPLOYEE_TYPE
but has an additional attribute DEPT_MANAGED. These subtypes are shown in
Figure 4(c).
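A sketch of these declarations (DEPT_MANAGED is named in the text; the other subtype-specific attributes are hypothetical, since Figure 4(c) is not reproduced here):

    CREATE TYPE EMPLOYEE_TYPE UNDER PERSON_TYPE AS (
        JOB_CODE  INTEGER,        -- hypothetical attribute
        SALARY    FLOAT           -- hypothetical attribute
    )
    INSTANTIABLE
    NOT FINAL;

    CREATE TYPE STUDENT_TYPE UNDER PERSON_TYPE AS (
        MAJOR_CODE  INTEGER       -- hypothetical attribute
    )
    INSTANTIABLE
    NOT FINAL;

    CREATE TYPE MANAGER_TYPE UNDER EMPLOYEE_TYPE AS (
        DEPT_MANAGED  VARCHAR (20)
    )
    INSTANTIABLE;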
In general, we specify the local attributes and any additional specific methods for
the subtype, which inherits the attributes and operations of its supertype.
Another facility in SQL is table inheritance via the supertable/subtable facility. This
is also specified using the keyword UNDER (see Figure 4(d)). Here, a new record that
is inserted into a subtable, say the MANAGER table, is also inserted into its superta-
bles EMPLOYEE and PERSON. Notice that when a record is inserted in MANAGER,
we must provide values for all its inherited attributes. INSERT, DELETE, and UPDATE
operations are appropriately propagated.
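A sketch of how such a supertable/subtable hierarchy might be declared with typed tables (the table names follow the discussion above; Figure 4(d) contains the actual declarations):

    CREATE TABLE PERSON OF PERSON_TYPE
        REF IS PERSON_ID SYSTEM GENERATED;

    CREATE TABLE EMPLOYEE OF EMPLOYEE_TYPE
        UNDER PERSON;

    CREATE TABLE MANAGER OF MANAGER_TYPE
        UNDER EMPLOYEE;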
In SQL, -> is used for dereferencing and has the same meaning assigned to it in the
C programming language. Thus, if r is a reference to a tuple and a is a component
attribute in that tuple, then r->a is the value of attribute a in that tuple.
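For example, assuming that EMPLOYEE rows carry a reference attribute DEPT of type REF(DEPT_TYPE) (a hypothetical attribute used only for illustration), a query could follow the reference as follows:

    SELECT  E.NAME, E.DEPT -> DNAME
    FROM    EMPLOYEE E;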
One of the reasons for the success of commercial relational DBMSs is the SQL stan-
dard. The lack of a standard for ODMSs for several years may have caused some
potential users to shy away from converting to this new technology. Subsequently, a
consortium of ODMS vendors and users, called ODMG (Object Data Management
Group), proposed a standard that is known as the ODMG-93 or ODMG 1.0 stan-
dard. This was revised into ODMG 2.0, and later to ODMG 3.0. The standard is
made up of several parts, including the object model, the object definition language
(ODL), the object query language (OQL), and the bindings to object-oriented pro-
gramming languages.
In this section, we describe the ODMG object model and the ODL. In Section 4, we
discuss how to design an ODB from an EER conceptual schema. We will give an
overview of OQL in Section 5, and the C++ language binding in Section 6.
Examples of how to use ODL, OQL, and the C++ language binding will use the
UNIVERSITY database example introduced in the chapter “The Enhanced Entity-
Relationship (EER) Model.” In our description, we will follow the ODMG 3.0 object
model as described in Cattell et al. (2000).21 It is important to note that many of the
ideas embodied in the ODMG object model are based on two decades of research
into conceptual modeling and object databases by many researchers.
Objects and Literals. Objects and literals are the basic building blocks of the
object model. The main difference between the two is that an object has both an
object identifier and a state (or current value), whereas a literal has a value (state)
but no object identifier.22 In either case, the value can have a complex structure. The
object state can change over time by modifying the object value. A literal is basically
a constant value, possibly having a complex structure, but it does not change.
An object has five aspects: identifier, name, lifetime, structure, and creation.
1. The object identifier is a unique system-wide identifier (or Object_id).23
Every object must have an object identifier.
2. Some objects may optionally be given a unique name within a particular
ODMS—this name can be used to locate the object, and the system should
return the object given that name.24 Obviously, not all individual objects
will have unique names. Typically, a few objects, mainly those that hold col-
lections of objects of a particular object type—such as extents—will have a
name. These names are used as entry points to the database; that is, by
locating these objects by their unique name, the user can then locate other
objects that are referenced from these objects. Other important objects in the
application may also be given unique names.
21The earlier versions of the object model were published in 1993 and 1997.
22We will use the terms value and state interchangeably here.
23This corresponds to the OID of Section 1.2.
24This corresponds to the naming mechanism for persistence, described in Section 1.4.
3. The lifetime of an object specifies whether it is a persistent object (that is, a
database object) or transient object (that is, an object in an executing pro-
gram that disappears after the program terminates). Lifetimes are indepen-
dent of types—that is, some objects of a particular type may be transient
whereas others may be persistent.
4. The structure of an object specifies how the object is constructed by using
the type constructors. The structure specifies whether an object is atomic or
not. An atomic object refers to a single object that follows a user-defined
type, such as Employee or Department. If an object is not atomic, then it will be
composed of other objects. For example, a collection object is not an atomic
object, since its state will be a collection of other objects.25 The term atomic
object is different from how we defined the atom constructor in Section 1.3,
which referred to all values of built-in data types. In the ODMG model, an
atomic object is any individual user-defined object. All values of the basic
built-in data types are considered to be literals.
5. Object creation refers to the manner in which an object can be created. This
is typically accomplished via an operation new for a special Object_Factory
interface. We shall describe this in more detail later in this section.
In the object model, a literal is a value that does not have an object identifier.
However, the value may have a simple or complex structure. There are three types of
literals: atomic, structured, and collection.
1. Atomic literals26 correspond to the values of basic data types and are prede-
fined. The basic data types of the object model include long, short, and
unsigned integer numbers (these are specified by the keywords long, short,
unsigned long, and unsigned short in ODL), regular and double precision
floating point numbers (float, double), Boolean values (boolean), single
characters (char), character strings (string), and enumeration types (enum),
among others.
2. Structured literals correspond roughly to values that are constructed using
the tuple constructor described in Section 1.3. The built-in structured liter-
als include Date, Interval, Time, and Timestamp (see Figure 5(b)). Additional
user-defined structured literals can be defined as needed by each
application.27 User-defined structures are created using the STRUCT key-
word in ODL, as in the C and C++ programming languages.
25In the ODMG model, atomic objects do not correspond to objects whose values are basic data types.
All basic values (integers, reals, and so on) are considered literals.
26The use of the word atomic in atomic literal corresponds to the way we used atom constructor in
Section 1.3.
27The structures for Date, Interval, Time, and Timestamp can be used to create either literal values or
objects with identifiers.
3. Collection literals specify a literal value that is a collection of objects or val-
ues but the collection itself does not have an Object_id. The collections in the
object model can be defined by the type generators set<T>, bag<T>, list<T>,
and array<T>, where T is the type of objects or values in the collection.28
Figure 5 gives a simplified view of the basic types and type generators of the object
model. The notation of ODMG uses three concepts: interface, literal, and class.
Following the ODMG terminology, we use the word behavior to refer to operations
and state to refer to properties (attributes and relationships). An interface specifies
only behavior of an object type and is typically noninstantiable (that is, no objects
are created corresponding to an interface). Although an interface may have state
properties (attributes and relationships) as part of its specifications, these cannot be
inherited from the interface. Hence, an interface serves to define operations that can
be inherited by other interfaces, as well as by classes that define the user-defined
objects for a particular application. A class specifies both state (attributes) and
behavior (operations) of an object type, and is instantiable. Hence, database and
application objects are typically created based on the user-specified class declara-
tions that form a database schema. Finally, a literal declaration specifies state but no
behavior. Thus, a literal instance holds a simple or complex structured value but has
neither an object identifier nor encapsulated operations.
Figure 5 is a simplified version of the object model. For the full specifications, see
Cattell et al. (2000). We will describe some of the constructs shown in Figure 5 as we
describe the object model. In the object model, all objects inherit the basic interface
operations of Object, shown in Figure 5(a); these include operations such as copy
(creates a new copy of the object), delete (deletes the object), and same_as (com-
pares the object’s identity to another object).29 In general, operations are applied to
objects using the dot notation. For example, given an object O, to compare it with
another object P, we write
The result returned by this operation is Boolean and would be true if the identity of
P is the same as that of O, and false otherwise. Similarly, to create a copy P of object
O, we write
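Written out, the two invocations referred to above would look as follows (a sketch using the operation names listed for the Object interface):

    O.same_as(P)
    P = O.copy()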
28These are similar to the corresponding type constructors described in Section 1.3.
29Additional operations are defined on objects for locking purposes, which are not shown in Figure 5.
The ODMG object model has two inheritance relationships. The first, interface
(behavior-only) inheritance, is specified by the colon (:) notation.30 The other
inheritance relationship, called EXTENDS inheritance, is specified by the
keyword extends. It is used to inherit both state and behavior strictly among classes,
so both the supertype and the subtype must be classes. Multiple inheritance via
extends is not permitted. However, multiple inheritance is allowed for behavior
inheritance via the colon (:) notation. Hence, an interface may inherit behavior
from several other interfaces. A class may also inherit behavior from several inter-
faces via colon (:) notation, in addition to inheriting behavior and state from at most
one other class via extends. In Section 3.4 we will give examples of how these two
inheritance relationships—“:” and extends—may be used.
30The ODMG report also refers to interface inheritance as the type/subtype, is-a, and generalization/
specialization relationship, although in the literature these terms have been used to describe inheritance
of both state and operations (see Section 1).
For example, the ElementNotFound exception in the Collection interface would
be raised by the O.remove_element(E) operation if E is not an element in the collec-
tion O. The NoMoreElements exception in the iterator interface would be raised by
the I.next_position() operation if the iterator is currently positioned at the last ele-
ment in the collection, and hence no more elements exist for the iterator to point to.
Collection objects are further specialized into set, list, bag, array, and dictionary, which
inherit the operations of the Collection interface. A set<T> type generator can be
used to create objects such that the value of object O is a set whose elements are of
type T. The Set interface includes the additional operation P = O.create_union(S) (see
Figure 5(c)), which returns a new object P of type set<T> that is the union of the
two sets O and S. Other operations similar to create_union (not shown in Figure
5(c)) are create_intersection(S) and create_difference(S). Operations for set compari-
son include the O.is_subset_of(S) operation, which returns true if the set object O is
a subset of some other set object S, and returns false otherwise. Similar operations
(not shown in Figure 5(c)) are is_proper_subset_of(S), is_superset_of(S), and
is_proper_superset_of(S). The bag<T> type generator allows duplicate elements in
the collection and also inherits the Collection interface. It has three operations—
create_union(b), create_intersection(b), and create_difference(b)—that all return a new
object of type bag<T>.
Another array operation is E = O.retrieve_element_at(I), which simply retrieves the
ith element of the array. Any
of these operations can raise the exception InvalidIndex if I is greater than the array’s
size. The operation O.resize(N) changes the number of array elements to N.
Figure 6 is a diagram that illustrates the inheritance hierarchy of the built-in con-
structs of the object model. Operations are inherited from the supertype to the sub-
type. The collection interfaces described above are not directly instantiable; that is,
one cannot directly create objects based on these interfaces. Rather, the interfaces
can be used to generate user-defined collection types—of type set, bag, list, array, or
dictionary—for a particular database application. If an attribute or class has a collec-
tion type, say a set, then it will inherit the operations of the set interface. For exam-
ple, in a UNIVERSITY database application, the user can specify a type for
set<STUDENT>, whose state would be sets of STUDENT objects. The programmer
can then use the operations for set<STUDENT> to manipulate an instance of type
set<STUDENT>. Creating application classes is typically done by utilizing the object
definition language ODL (see Section 3.6).
It is important to note that all objects in a particular collection must be of the same
type. Hence, although the keyword any appears in the specifications of collection
interfaces in Figure 5(c), this does not mean that objects of any type can be inter-
mixed within the same collection. Rather, it means that any type can be used when
specifying the type of elements for a particular collection (including other collec-
tion types!).
For example, in a UNIVERSITY database application, the user can specify an object
type (class) for STUDENT objects. Most such objects will be structured objects; for
example, a STUDENT object will have a complex structure, with many attributes,
relationships, and operations, but it is still considered atomic because it is not a col-
lection. Such a user-defined atomic object type is defined as a class by specifying its
properties and operations. The properties define the state of the object and are fur-
ther distinguished into attributes and relationships. In this subsection, we elabo-
rate on the three types of components—attributes, relationships, and
operations—that a user-defined object type for atomic (structured) objects can
include. We illustrate our discussion with the two classes EMPLOYEE and
DEPARTMENT shown in Figure 7.
An attribute is a property that describes some aspect of an object. Attributes have
values (which are typically literals having a simple or complex structure) that are
stored within the object. However, attribute values can also be Object_ids of other
objects. Attribute values can even be specified via methods that are used to calculate
the attribute value. In Figure 732 the attributes for EMPLOYEE are Name, Ssn,
Birth_date, Sex, and Age, and those for DEPARTMENT are Dname, Dnumber, Mgr,
Locations, and Projs. The Mgr and Projs attributes of DEPARTMENT have complex
structure and are defined via struct, which corresponds to the tuple constructor of
Section 1.3. Hence, the value of Mgr in each DEPARTMENT object will have two com-
ponents: Manager, whose value is an Object_id that references the EMPLOYEE object
that manages the DEPARTMENT, and Start_date, whose value is a date. The locations
attribute of DEPARTMENT is defined via the set constructor, since each
DEPARTMENT object can have a set of locations.
A relationship is a property that specifies that two objects in the database are related.
In the object model of ODMG, only binary relationships are explicitly represented,
and each binary relationship is represented by a pair of inverse references specified
via the keyword relationship. In Figure 7, one relationship exists that relates each
EMPLOYEE to the DEPARTMENT in which he or she works—the Works_for relation-
ship of EMPLOYEE. In the inverse direction, each DEPARTMENT is related to the set
of EMPLOYEES that work in the DEPARTMENT—the Has_emps relationship of
DEPARTMENT. The keyword inverse specifies that these two properties define a sin-
gle conceptual relationship in inverse directions.33
31As mentioned earlier, this definition of atomic object in the ODMG object model is different from the
definition of atom constructor given in Section 1.3, which is the definition used in much of the object-
oriented database literature.
32We are using the Object Definition Language (ODL) notation in Figure 7, which will be discussed in
more detail in Section 3.6.
33We do not detail here how a relationship can be represented by two attributes in inverse directions.
By specifying inverses, the database system can maintain the referential integrity of
the relationship automatically. That is, if the value of Works_for for a particular
EMPLOYEE E refers to DEPARTMENT D, then the value of Has_emps for
DEPARTMENT D must include a reference to E in its set of EMPLOYEE references. If
the database designer desires to have a relationship to be represented in only one
direction, then it has to be modeled as an attribute (or operation). An example is the
Manager component of the Mgr attribute in DEPARTMENT.
In addition to attributes and relationships, the designer can include operations in
object type (class) specifications. Each object type can have a number of operation
signatures, which specify the operation name, its argument types, and its returned
value, if applicable. Operation names are unique within each object type, but they
can be overloaded by having the same operation name appear in distinct object
types. The operation signature can also specify the names of exceptions that can
occur during operation execution. The implementation of the operation will
include the code to raise these exceptions. In Figure 7 the EMPLOYEE class has one
operation: reassign_emp, and the DEPARTMENT class has two operations: add_emp
and change_manager.
A class with an extent can have one or more keys. A key consists of one or more
properties (attributes or relationships) whose values are constrained to be unique
for each object in the extent. For example, in Figure 7 the EMPLOYEE class has the
Ssn attribute as key (each EMPLOYEE object in the extent must have a unique Ssn
value), and the DEPARTMENT class has two distinct keys: Dname and Dnumber (each
DEPARTMENT must have a unique Dname and a unique Dnumber). For a composite
key34 that is made of several properties, the properties that form the key are con-
tained in parentheses. For example, if a class VEHICLE with an extent ALL_VEHICLES
has a key made up of a combination of two attributes State and License_number, they
would be placed in parentheses as (State, License_number) in the key declaration.
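A simplified sketch in ODL of the two classes of Figure 7, using the attribute, relationship, operation, and key names given in the text (the extent names, data types, parameter lists, and the omission of the Projs attribute are assumptions made for brevity):

    class EMPLOYEE
    (   extent  ALL_EMPLOYEES
        key     Ssn )
    {
        attribute    string   Name;
        attribute    string   Ssn;
        attribute    date     Birth_date;
        attribute    char     Sex;
        attribute    short    Age;
        relationship DEPARTMENT Works_for
            inverse  DEPARTMENT::Has_emps;
        void reassign_emp(in string New_dname);
    };

    class DEPARTMENT
    (   extent  ALL_DEPARTMENTS
        keys    Dname, Dnumber )
    {
        attribute    string   Dname;
        attribute    short    Dnumber;
        attribute    struct Dept_mgr { EMPLOYEE Manager; date Start_date; } Mgr;
        attribute    set<string> Locations;
        // the Projs attribute (a collection of structs) is omitted here
        relationship set<EMPLOYEE> Has_emps
            inverse  EMPLOYEE::Works_for;
        void add_emp(in string New_ssn);
        void change_manager(in string New_mgr_ssn);
    };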
Next, we present the concept of factory object—an object that can be used to gen-
erate or create individual objects via its operations. Some of the interfaces of factory
objects that are part of the ODMG object model are shown in Figure 8. The inter-
face ObjectFactory has a single operation, new(), which returns a new object with an
Object_id. By inheriting this interface, users can create their own factory interfaces
for each user-defined (atomic) object type, and the programmer can implement the
operation new differently for each type of object. Figure 8 also shows a DateFactory
interface, which has additional operations for creating a new calendar_date, and for
creating an object whose value is the current_date, among other operations (not
shown in Figure 8). As we can see, a factory object basically provides the
constructor operations for new objects.
Finally, we discuss the concept of a database. Because an ODBMS can create many
different databases, each with its own schema, the object model also includes
built-in Database objects; their operations (such as the bind operation used to
assign persistent names to objects; see Figure 8) provide access to a particular
database.
34A composite key is called a compound key in the ODMG report.
Figure 10 shows one possible set of ODL class definitions for the UNIVERSITY data-
base. In general, there may be several possible mappings from an object schema dia-
gram (or EER schema diagram) into ODL classes. We will discuss these options
further in Section 4.
35The ODL syntax and data types are meant to be compatible with the Interface Definition Language
(IDL) of CORBA (Common Object Request Broker Architecture), with extensions for relationships and
other database concepts.
of STUDENTs. At the same time, individual STUDENT and FACULTY objects will
inherit the properties (attributes and relationships) and operations of PERSON,
and individual GRAD_STUDENT objects will inherit those of STUDENT.
The class GRADE corresponds to the M:N relationship between STUDENT and
SECTION in Figure 9(b). The reason
it was made into a separate class (rather than as a pair of inverse relationships) is
because it includes the relationship attribute Grade.36
Hence, the M:N relationship is mapped to the class GRADE, and a pair of 1:N rela-
tionships, one between STUDENT and GRADE and the other between SECTION
and GRADE.37 These relationships are represented by relationship properties in the
corresponding ODL class definitions of Figure 10.
36We will discuss alternative mappings for attributes of relationships in Section 4.
Because the previous example does not include any interfaces, only classes, we now
utilize a different example to illustrate interfaces and interface (behavior) inheri-
tance. Figure 11(a) is part of a database schema for storing geometric objects. An
interface GeometryObject is specified, with operations to calculate the perimeter and
area of a geometric object, plus operations to translate (move) and rotate an object.
Several classes (RECTANGLE, TRIANGLE, CIRCLE, …) inherit the GeometryObject
interface. Since GeometryObject is an interface, it is noninstantiable—that is, no
objects can be created based on this interface directly. However, objects of type
RECTANGLE, TRIANGLE, CIRCLE, … can be created, and these objects inherit all the
operations of the GeometryObject interface. Note that with interface inheritance,
only operations are inherited, not properties (attributes, relationships). Hence, if a
property is needed in the inheriting class, it must be repeated in the class definition,
as with the Reference_point attribute in Figure 11(b). Notice that the inherited oper-
ations can have different implementations in each class. For example, the imple-
mentations of the area and perimeter operations may be different for RECTANGLE,
TRIANGLE, and CIRCLE.
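A sketch in ODL of the interface and one of the inheriting classes (the operation names follow the text; the Point structure, the Length and Height attributes, and the extent name are assumptions):

    interface GeometryObject {
        attribute struct Point { short X; short Y; } Reference_point;
        float  perimeter();
        float  area();
        void   translate(in short X_translation, in short Y_translation);
        void   rotate(in float Angle_of_rotation);
    };

    class RECTANGLE : GeometryObject
    (   extent  RECTANGLES )
    {
        // Reference_point must be repeated, since state is not inherited from an interface
        attribute struct Point { short X; short Y; } Reference_point;
        attribute short  Length;
        attribute short  Height;
    };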
Multiple inheritance of interfaces by a class is allowed, as is multiple inheritance of
interfaces by another interface. However, with the extends (class) inheritance, mul-
tiple inheritance is not permitted. Hence, a class can inherit via extends from at most
one class (in addition to inheriting from zero or more interfaces).
37This is similar to how an M:N relationship is mapped in the relational model and in the legacy network
model.
another, thus enforcing the ODB equivalent of the relational referential integrity
constraint.
The third major difference is that in ODB design, it is necessary to specify the oper-
ations early on in the design since they are part of the class specifications. Although
it is important to specify operations during the design phase for all types of data-
bases, the specification of operations may be delayed in RDB design because it is not
strictly required until the implementation phase.
Step 1. Create an ODL class for each EER entity type or subclass. The type of the
ODL class should include all the attributes of the EER class.38 Multivalued attributes
are typically declared by using the set, bag, or list constructors.39 If the values of the
multivalued attribute for an object should be ordered, the list constructor is chosen;
if duplicates are allowed, the bag constructor should be chosen; otherwise, the set
constructor is chosen. Composite attributes are mapped into a tuple constructor (by
using a struct declaration in ODL).
Declare an extent for each class, and specify any key attributes as keys of the extent.
(This is possible only if an extent facility and key constraint declarations are avail-
able in the ODBMS.)
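For instance, a hypothetical entity type PROJECT with key attribute Pnumber, a multivalued attribute Locations, and a composite attribute Mgr could be mapped along the following lines (all names and types here are illustrative, not taken from a specific figure):

    class PROJECT
    (   extent  ALL_PROJECTS
        key     Pnumber )
    {
        attribute  string       Pname;
        attribute  short        Pnumber;
        attribute  set<string>  Locations;     // multivalued attribute
        attribute  struct Mgr_info { string Ssn; date Start_date; } Mgr;   // composite attribute
    };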
Step 2. Add relationship properties or reference attributes for each binary relation-
ship into the ODL classes that participate in the relationship. These may be created
in one or both directions. If a binary relationship is represented by references in
both directions, declare the references to be relationship properties that are inverses
of one another, if such a facility exists.40 If a binary relationship is represented by a
reference in only one direction, declare the reference to be an attribute in the refer-
encing class whose type is the referenced class name.
Depending on the cardinality ratio of the binary relationship, the relationship prop-
erties or reference attributes may be single-valued or collection types. They will be
single-valued for binary relationships in the 1:1 or N:1 directions; they are collec-
tion types (set-valued or list-valued41) for relationships in the 1:N or M:N direc-
tion. An alternative way to map binary M:N relationships is discussed in step 7.
If relationship attributes exist, a tuple constructor (struct) can be used to create a
structure of the form <reference, relationship attributes>, which may be included
instead of the reference attribute. However, this does not allow the use of the inverse
constraint. Additionally, if this choice is represented in both directions, the attribute
values will be represented twice, creating redundancy.
38This implicitly uses a tuple constructor at the top level of the type declaration, but in general, the tuple
constructor is not explicitly shown in the ODL class declarations.
39Further analysis of the application domain is needed to decide which constructor to use because this
information is not available from the EER schema.
40The ODL standard provides for the explicit definition of inverse relationships. Some ODBMS products
may not provide this support; in such cases, programmers must maintain every relationship explicitly by
coding the methods that update the objects appropriately.
41The decision whether to use set or list is not available from the EER schema and must be determined
from the requirements.
Step 3. Include appropriate operations for each class. These are not available from
the EER schema and must be added to the database design by referring to the origi-
nal requirements. A constructor method should include program code that checks
any constraints that must hold when a new object is created. A destructor method
should check any constraints that may be violated when an object is deleted. Other
methods should include any further constraint checks that are relevant.
Step 4. An ODL class that corresponds to a subclass in the EER schema inherits (via
extends) the type and methods of its superclass in the ODL schema. Its specific
(noninherited) attributes, relationship references, and operations are specified, as
discussed in steps 1, 2, and 3.
Step 5. Weak entity types can be mapped in the same way as regular entity types. An
alternative mapping is possible for weak entity types that do not participate in any
relationships except their identifying relationship; these can be mapped as though
they were composite multivalued attributes of the owner entity type, by using the
set<struct<... >> or list<struct<... >> constructors. The attributes of the weak entity
are included in the struct<... > construct, which corresponds to a tuple constructor.
Attributes are mapped as discussed in steps 1 and 2.
Step 6. Categories (union types) in an EER schema are difficult to map to ODL. It is
possible to create a mapping similar to the EER-to-relational mapping (see Section
9.2) by declaring a class to represent the category and defining 1:1 relationships
between the category and each of its superclasses. Another option is to use a union
type, if it is available.
Step 7. An n-ary relationship with degree n > 2 can be mapped into a separate class,
with appropriate references to each participating class. These references are based
on mapping a 1:N relationship from each class that represents a participating entity
type to the class that represents the n-ary relationship. An M:N binary relationship,
especially if it contains relationship attributes, may also use this mapping option, if
desired.
In Section 5.1 we will discuss the syntax of simple OQL queries and the concept of
using named objects or extents as database entry points. Then, in Section 5.2 we will
discuss the structure of query results and the use of path expressions to traverse
relationships among objects. Other OQL features for handling object identity,
inheritance, polymorphism, and other object-oriented concepts are discussed in
Section 5.3. The examples to illustrate OQL queries are based on the UNIVERSITY
database schema given in Figure 10.
The basic OQL syntax is a select … from … where … structure, as it is for SQL. For
example, the query to retrieve the names of all departments in the college of
‘Engineering’ can be written as follows:
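A sketch of this query (labeled Q0 in what follows), assuming that the DEPARTMENT class in Figure 10 has the attributes Dname and College:

    Q0: select  D.Dname
        from    D in DEPARTMENTS
        where   D.College = 'Engineering';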
In general, an entry point to the database is needed for each query, which can be any
named persistent object. For many queries, the entry point is the name of the extent
of a class. Recall that the extent name is considered to be the name of a persistent
object whose type is a collection (in most cases, a set) of objects from the class.
Looking at the extent names in Figure 10, the named object DEPARTMENTS is of
type set<DEPARTMENT>; PERSONS is of type set<PERSON>; FACULTY is of type
set<FACULTY>; and so on.
Using the example in Q0, there are three syntactic options for specifying iterator
variables:
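The three forms are sketched below, using D as the iterator variable:

    D in DEPARTMENTS
    DEPARTMENTS D
    DEPARTMENTS AS D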
42This is similar to the tuple variables that range over tuples in SQL queries.
The named objects used as database entry points for OQL queries are not limited to
the names of extents. Any named persistent object, whether it refers to an atomic
(single) object or to a collection object, can be used as a database entry point.
For example, the named object DEPARTMENTS returns a reference to the collection
of all persistent DEPARTMENT objects, whose type is set<DEPARTMENT>. Similarly,
suppose we had given (via the database bind operation, see Figure 8) a persistent
name CS_DEPARTMENT to a single DEPARTMENT object (the Computer Science
department); then, using the name CS_DEPARTMENT as a query returns a reference
to that individual object of type DEPARTMENT. Once an entry
point is specified, the concept of a path expression can be used to specify a path to
related attributes and objects. A path expression typically starts at a persistent object
name, or at the iterator variable that ranges over individual objects in a collection.
This name will be followed by zero or more relationship names or attribute names
connected using the dot notation. For example, referring to the UNIVERSITY data-
base in Figure 10, the following are examples of path expressions, which are also
valid queries in OQL:
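These are reconstructed here as sketches, with labels matching the discussion that follows:

    Q2:  CS_DEPARTMENT.Chair;
    Q2A: CS_DEPARTMENT.Chair.Rank;
    Q2B: CS_DEPARTMENT.Has_faculty;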
The first expression Q2 returns an object of type FACULTY, because that is the type
of the attribute Chair of the DEPARTMENT class. This will be a reference to the
FACULTY object that is related to the DEPARTMENT object whose persistent name is
CS_DEPARTMENT via the attribute Chair; that is, a reference to the FACULTY object
who is chairperson of the Computer Science department. The second expression
Q2A is similar, except that it returns the Rank of this FACULTY object (the Computer
Science chair) rather than the object reference; hence, the type returned by Q2A is
string, which is the data type for the Rank attribute of the FACULTY class.
Path expressions Q2 and Q2A return single values, because the attributes Chair (of
DEPARTMENT) and Rank (of FACULTY) are both single-valued and they are applied
to a single object. The third expression, Q2B, is different; it returns an object of type
set<FACULTY> even when applied to a single object, because that is the type of the
43Note that the latter two options are similar to the syntax for specifying tuple variables in SQL queries.
relationship Has_faculty of the DEPARTMENT class. The collection returned will
include references to all FACULTY objects that are related to the DEPARTMENT object
whose persistent name is CS_DEPARTMENT via the relationship Has_faculty; that is,
references to all FACULTY objects who are working in the Computer Science depart-
ment. Now, to return the ranks of Computer Science faculty, we cannot write
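the expression

    CS_DEPARTMENT.Has_faculty.Rank

because Has_faculty is a collection, and a single attribute cannot be applied directly to a collection. Instead, an iterator variable must range over the collection, roughly as in the following sketch (the label Q3 is assumed):

    Q3: select  F.Rank
        from    F in CS_DEPARTMENT.Has_faculty;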
In general, an OQL query can return a result with a complex structure specified in
the query itself by utilizing the struct keyword. Consider the following examples:
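One illustrative sketch (the attribute and relationship names, such as Name.Lname, gpa, and Has_majors, are assumed to follow the UNIVERSITY schema of Figure 10):

    select  struct ( name: struct ( last: S.Name.Lname, first: S.Name.Fname ),
                     gpa:  S.gpa )
    from    S in CS_DEPARTMENT.Has_majors;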
44As mentioned earlier, struct corresponds to the tuple constructor discussed in Section 1.3.
Note that OQL is orthogonal with respect to specifying path expressions. That is,
attributes, relationships, and operation names (methods) can be used interchange-
ably within the path expressions, as long as the type system of OQL is not compro-
mised. For example, one can write the following queries to retrieve the grade point
average of all senior students majoring in Computer Science, with the result ordered
by GPA, and within that by last and first name:
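A sketch of such a query (the attribute, operation, and relationship names are assumptions based on Figure 10):

    select  struct ( last_name: S.Name.Lname, first_name: S.Name.Fname, gpa: S.gpa )
    from    S in CS_DEPARTMENT.Has_majors
    where   S.Class = 'senior'
    order by gpa desc, last_name asc, first_name asc;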
The order by clause is similar to the corresponding SQL construct, and specifies in
which order the query result is to be displayed. Hence, the collection returned by a
query with an order by clause is of type list.
Specifying Views as Named Queries. The view mechanism in OQL uses the
concept of a named query. The define keyword is used to specify an identifier of the
named query, which must be a unique name among all named objects, class names,
method names, and function names in the schema. If the identifier has the same
name as an existing named query, then the new definition replaces the previous def-
inition. Once defined, a query definition is persistent until it is redefined or deleted.
A view can also have parameters (arguments) in its definition.
For example, the following view V1 defines a named query Has_minors to retrieve the
set of objects for students minoring in a given department:
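A sketch of the definition (the extent name STUDENTS and the relationship Minors_in are assumptions based on Figure 10):

    V1: define Has_minors(Dept_name) as
            select  S
            from    S in STUDENTS
            where   S.Minors_in.Dname = Dept_name;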
Extracting Single Elements from Singleton Collections. An OQL query will,
in general, return a collection as its result, such as a bag, set (if distinct is specified), or
list (if the order by clause is used). If the user requires that a query only return a sin-
gle element, there is an element operator in OQL that is guaranteed to return a sin-
gle element E from a singleton collection C that contains only one element. If C
contains more than one element or if C is empty, then the element operator raises
an exception. For example, Q6 returns the single object reference to the Computer
Science department:
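A sketch of Q6 (the Dname attribute is assumed, as in the earlier examples):

    Q6: element ( select  D
                  from    D in DEPARTMENTS
                  where   D.Dname = 'Computer Science' );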
Since a department name is unique across all departments, the result should be one
department. The type of the result is DEPARTMENT.
Collection Operators (Aggregate Functions, Quantifiers). Because many
query expressions specify collections as their result, a number of operators have been
defined that are applied to such collections. These include aggregate operators as well
as membership and quantification (universal and existential) over a collection.
The aggregate operators (min, max, count, sum, avg) operate over a collection.45 The
operator count returns an integer type. The remaining aggregate operators (min, max,
sum, avg) return the same type as the type of the operand collection. Two examples
follow. The query Q7 returns the number of students minoring in Computer
Science and Q8 returns the average GPA of all seniors majoring in Computer
Science.
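Sketches of the two queries, reusing the named query Has_minors from V1 and assuming the extent and attribute names used earlier:

    Q7: count ( Has_minors('Computer Science') );

    Q8: avg ( select  S.gpa
              from    S in STUDENTS
              where   S.Majors_in.Dname = 'Computer Science' and S.Class = 'senior' );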
45These correspond to aggregate functions in SQL.
Notice that aggregate operations can be applied to any collection of the appropriate
type and can be used in any part of a query. For example, the query to retrieve all
department names that have more than 100 majors can be written as in Q9:
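A sketch of Q9 (the Has_majors relationship of DEPARTMENT is an assumption):

    Q9: select  D.Dname
        from    D in DEPARTMENTS
        where   count (D.Has_majors) > 100;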
The membership and quantification expressions return a Boolean type—that is, true
or false. Let V be a variable, C a collection expression, B an expression of type
Boolean (that is, a Boolean condition), and E an element of the type of elements in
collection C. Then:
(E in C) returns true if element E is a member of collection C.
(for all V in C : B) returns true if all the elements of collection C satisfy B.
(exists V in C : B) returns true if there is at least one element in C satisfying B.
Q10 also illustrates a simpler way to specify the select clause of queries that return a
collection of structs; the type returned by Q10 is bag.
One can also write queries that return true/false results. As an example, let us
assume that there is a named object called JEREMY of type STUDENT. Then, query
Q11 answers the following question: Is Jeremy a Computer Science minor? Similarly,
Q12 answers the question Are all Computer Science graduate students advised by
Computer Science faculty? Both Q11 and Q12 return true or false, which are inter-
preted as yes or no answers to the above questions:
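Sketches of the two queries (the extent GRAD_STUDENTS and the Advisor relationship are assumptions based on Figure 10):

    Q11: JEREMY in Has_minors('Computer Science');

    Q12: for all G in ( select  S
                        from    S in GRAD_STUDENTS
                        where   S.Majors_in.Dname = 'Computer Science' )
             : G.Advisor in CS_DEPARTMENT.Has_faculty;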
Ordered (Indexed) Collection Expressions. As we discussed in Section 3.3,
collections that are lists and arrays have additional operations, such as retrieving the
ith, first, and last elements. Additionally, operations exist for extracting a subcollec-
tion and concatenating two lists. Hence, query expressions that involve lists or
arrays can invoke these operations. We will illustrate a few of these operations using
sample queries. Q14 retrieves the last name of the faculty member who earns the
highest salary:
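A sketch of Q14 (the Salary attribute of FACULTY is an assumption):

    Q14: first ( select  struct ( salary: F.Salary, last_name: F.Name.Lname )
                 from    F in FACULTY
                 order by F.Salary desc );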
Q14 illustrates the use of the first operator on a list collection that contains the
salaries of faculty members sorted in descending order by salary. Thus, the first ele-
ment in this sorted list contains the faculty member with the highest salary. This
query assumes that only one faculty member earns the maximum salary. The next
query, Q15, retrieves the top three Computer Science majors based on GPA.
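A sketch of Q15, using the same assumed attribute names as before:

    Q15: ( select  struct ( last_name: S.Name.Lname, gpa: S.gpa )
           from    S in CS_DEPARTMENT.Has_majors
           order by gpa desc ) [0:2];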
The select-from-order-by query returns a list of Computer Science students ordered
by GPA in descending order. The first element of an ordered collection has an index
position of 0, so the expression [0:2] returns a list containing the first, second, and
third elements of the select … from … order by … result.
The Grouping Operator. The group by clause in OQL, although similar to the
corresponding clause in SQL, provides explicit reference to the collection of objects
within each group or partition. First we give an example, and then we describe the
general form of these queries.
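A sketch of Q16, which is explained in the next paragraph (the extent STUDENTS and the path S.Majors_in.Dname are assumptions based on Figure 10):

    Q16: select  struct ( dept_name: dept_name,
                          number_of_majors: count (partition) )
         from    S in STUDENTS
         group by dept_name: S.Majors_in.Dname;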
Q16 retrieves the number of majors in each department. In this query, the students
are grouped into the same partition (group) if they have the same major; that is, if
the grouping expression specified in the group by clause (the department the student
majors in) evaluates to the same value. The general form of the group by clause is

    group by F1: E1, F2: E2, …, Fk: Ek
where F1: E1, F2: E2, …, Fk: Ek is a list of partitioning (grouping) attributes and each
partitioning attribute specification Fi: Ei defines an attribute (field) name Fi and an
expression Ei. The result of applying the grouping (specified in the group by clause)
is a set of structures of the type

    set < struct ( F1: T1, F2: T2, …, Fk: Tk, partition: bag< B > ) >
where Ti is the type returned by the expression Ei, partition is a distinguished field
name (a keyword), and B is a structure whose fields are the iterator variables (S in
Q16) declared in the from clause having the appropriate type.
Just as in SQL, a having clause can be used to filter the partitioned sets (that is, select
only some of the groups based on group conditions). In Q17, the previous query is
modified to illustrate the having clause (and also shows the simplified syntax for the
select clause). Q17 retrieves for each department having more than 100 majors, the
average GPA of its majors. The having clause in Q17 selects only those partitions
(groups) that have more than 100 elements (that is, departments with more than
100 students).
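A sketch of Q17, using the simplified select syntax and the same assumed names as in Q16:

    Q17: select  dept_name, avg_gpa: avg ( select P.gpa from P in partition )
         from    S in STUDENTS
         group by dept_name: S.Majors_in.Dname
         having  count (partition) > 100;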
Note that the select clause of Q17 returns the average GPA of the students in the
partition. The expression

    ( select P.gpa from P in partition )

returns a bag of student GPAs for that partition. The from clause declares an iterator
variable P over the partition collection, which is of type bag.
Then the path expression P.gpa is used to access the GPA of each student in the
partition.
The class library added to C++ for the ODMG standard uses the prefix d_ for class
declarations that deal with database concepts.46 The goal is that the programmer
should think that only one language is being used, not two separate languages. For
the programmer to refer to database objects in a program, a class d_Ref<T> is
defined for each database class T in the schema. Hence, program variables of type
d_Ref<T> can refer to both persistent and transient objects of class T.
In order to utilize the various built-in types in the ODMG object model such as col-
lection types, various template classes are specified in the library. For example, an
abstract class d_Object specifies the operations to be inherited by all objects.
Similarly, an abstract class d_Collection specifies the operations of collections.
These classes are not instantiable, but only specify the operations that can be inher-
ited by all objects and by collection objects, respectively. A template class is specified
for each type of collection; these include d_Set<T>, d_List<T>, d_Bag<T>,
d_Varray<T>, and d_Dictionary<T>, and correspond to the collection types in the
object model (see Section 3.1). Hence, the programmer can create classes of types
such as d_Set< d_Ref<STUDENT> > whose instances would be sets of references to
STUDENT objects, or d_Set<string> whose instances would be sets of strings.
Additionally, a class d_Iterator corresponds to the Iterator class of the object model.
46Presumably, d_ stands for database classes.
47That is, member variables in object-oriented programming terminology.
To specify relationships, the keyword rel_ is used within the prefix of type names; for
example, by writing
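a declaration along the lines of the following sketch (d_Rel_Set and d_Rel_Ref are the relationship template classes of the ODMG C++ binding; _Majors_in and _Has_majors are character-string names that identify the inverse traversal paths)

    d_Rel_Set< STUDENT, _Majors_in >  Has_majors;
    // and, in the STUDENT class: d_Rel_Ref< DEPARTMENT, _Has_majors > Majors_in;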
in the DEPARTMENT class, we are declaring that Majors_in and Has_majors are rela-
tionship properties that are inverses of one another and hence represent a 1:N
binary relationship between DEPARTMENT and STUDENT.
For the OML, the binding overloads the operation new so that it can be used to cre-
ate either persistent or transient objects. To create persistent objects, one must pro-
vide the database name and the persistent name of the object. For example, by
writing
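a statement roughly like the following sketch (DB1 and the string argument are illustrative; the exact signature of the overloaded new is specified in the C++ binding)

    d_Ref<STUDENT> S = new(DB1, "STUDENT") STUDENT;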
48We have only provided a brief overview of the C++ binding. For full details, see Cattell and Barry eds.
(2000), Ch. 5.
We then discussed the ODMG 3.0 standard for object databases. We started by
describing the various constructs of the object model in Section 3. The various built-
in types, such as Object, Collection, Iterator, set, list, and so on were described by their
interfaces, which specify the built-in operations of each type. These built-in types
are the foundation upon which the object definition language (ODL) and object
query language (OQL) are based. We also described the difference between objects,
which have an ObjectId, and literals, which are values with no OID. Users can
declare classes for their application that inherit operations from the appropriate
built-in interfaces. Two types of properties can be specified in a user-defined class—
attributes and relationships—in addition to the operations that can be applied to
objects of the class. The ODL allows users to specify both interfaces and classes, and
permits two different types of inheritance—interface inheritance via “:” and class
inheritance via extends. A class can have an extent and keys. A description of ODL
followed, and an example database schema for the UNIVERSITY database was used
to illustrate the ODL constructs.
In Section 5, we presented an overview of the object query language (OQL). The
OQL follows the concept of orthogonality in constructing queries, meaning that an
operation can be applied to the result of another operation as long as the type of the
result is of the correct input type for the operation. The OQL syntax follows many
of the constructs of SQL but includes additional concepts such as path expressions,
inheritance, methods, relationships, and collections. Examples of how to use OQL
over the UNIVERSITY database were given.
In 1997 Sun endorsed the ODMG API (Application Program Interface). O2 tech-
nologies was the first corporation to deliver an ODMG-compliant DBMS. Many
ODBMS vendors, including Object Design (now eXcelon), Gemstone Systems,
POET Software, and Versant Object Technology, have endorsed the ODMG
standard.
2. What primary characteristics should an OID possess?
3. Discuss the various type constructors. How are they used to create complex
object structures?
4. Discuss the concept of encapsulation, and tell how it is used to create
abstract data types.
5. Explain what the following terms mean in object-oriented database termi-
nology: method, signature, message, collection, extent.
6. What is the relationship between a type and its subtype in a type hierarchy?
What is the constraint that is enforced on extents corresponding to types in
the type hierarchy?
7. What is the difference between persistent and transient objects? How is per-
sistence handled in typical OO database systems?
8. How do regular inheritance, multiple inheritance, and selective inheritance
differ?
9. Discuss the concept of polymorphism/operator overloading.
10. Discuss how each of the following features is realized in SQL:2008: object
identifiers, type inheritance, encapsulation of operations, and complex object
structures.
11. In the traditional relational model, creating a table defined both the table
type (schema or attributes) and the table itself (extension or set of current
tuples). How can these two concepts be separated in SQL:2008?
12. Describe the rules of inheritance in SQL:2008.
13. What are the differences and similarities between objects and literals in the
ODMG object model?
14. List the basic operations of the following built-in interfaces of the ODMG
object model: Object, Collection, Iterator, Set, List, Bag, Array, and Dictionary.
15. Describe the built-in structured literals of the ODMG object model and the
operations of each.
16. What are the differences and similarities of attribute and relationship prop-
erties of a user-defined (atomic) class?
17. What are the differences and similarities of class inheritance via extends and
interface inheritance via “:” in the ODMG object model?
18. Discuss how persistence is specified in the ODMG object model in the C++
binding.
19. Why are the concepts of extents and keys important in database applica-
tions?
20. Describe the following OQL concepts: database entry points, path expres-
sions, iterator variables, named queries (views), aggregate functions, grouping,
and quantifiers.
21. What is meant by the type orthogonality of OQL?
22. Discuss the general principles behind the C++ binding of the ODMG stan-
dard.
23. What are the main differences between designing a relational database and
an object database?
24. Describe the steps of the algorithm for object database design by EER-to-OO
mapping.
26. Compare inheritance in the EER model to inheritance in the OO model
described in Section 1.5.
27. Consider the UNIVERSITY EER schema in Figure A.1. Think of what opera-
tions are needed for the entity types/classes in the schema. Do not consider
constructor and destructor operations.
28. Consider the COMPANY ER schema in Figure A.2. Think of what operations
are needed for the entity types/classes in the schema. Do not consider con-
structor and destructor operations.
29. Design an OO schema for a database application that you are interested in.
Construct an EER schema for the application, and then create the corre-
sponding classes in ODL. Specify a number of methods for each class, and
then specify queries in OQL for your database application.
30. Consider the AIRPORT database described in Exercise 21 from the chapter
“The Enhanced Entity-Relationship (EER) Model.” Specify a number of
operations/methods that you think should be applicable to that application.
Specify the ODL classes and methods for the database.
31. Map the COMPANY ER schema in Figure A.1 into ODL classes. Include
appropriate methods for each class.
32. Using search engines and other sources, determine to what extent the vari-
ous commercial ODBMS products are compliant with the ODMG 3.0
standard.
There is a vast bibliography on OO databases, so we can only provide a representa-
tive sample here. The October 1991 issue of CACM and the December 1990 issue of
IEEE Computer describe OO database concepts and systems. Dittrich (1986) and
Zaniolo et al. (1986) survey the basic concepts of OO data models. An early paper
on OO database system implementation is Baroody and DeWitt (1981). Su et al.
(1988) presents an OO data model that was used in CAD/CAM applications. Gupta
and Horowitz (1992) discusses OO applications to CAD, Network Management,
and other areas. Mitschang (1989) extends the relational algebra to cover complex
objects. Query languages and graphical user interfaces for OO are described in
Gyssens et al. (1990), Kim (1989), Alashqur et al. (1989), Bertino et al. (1992),
Agrawal et al. (1990), and Cruz (1992).
The Object-Oriented Manifesto by Atkinson et al. (1990) is an interesting article
that reports on the position by a panel of experts regarding the mandatory and
optional features of OO database management. Polymorphism in databases and
OO programming languages is discussed in Osborn (1989), Atkinson and Buneman
(1987), and Danforth and Tomlinson (1988). Object identity is discussed in
Abiteboul and Kanellakis (1989). OO programming languages for databases are dis-
cussed in Kent (1991). Object constraints are discussed in Delcambre et al. (1991)
and Elmasri, James and Kouramajian (1993). Authorization and security in OO
databases are examined in Rabitti et al. (1991) and Bertino (1992).
The O2 system is described in Deux et al. (1991), and Bancilhon et al. (1992)
includes a list of references to other publications describing various aspects of O2.
The O2 model was formalized in Velez et al. (1989). The ObjectStore system is
described in Lamb et al. (1991). Fishman et al. (1987) and Wilkinson et al. (1990)
discuss IRIS, an object-oriented DBMS developed at Hewlett-Packard laboratories.
Maier et al. (1986) and Butterworth et al. (1991) describe the design of GEM-
STONE. The ODE system developed at AT&T Bell Labs is described in Agrawal and
Gehani (1989). The ORION system developed at MCC is described in Kim et al.
(1990). Morsi et al. (1992) describes an OO testbed.
Cattell (1991) surveys concepts from both relational and object databases and dis-
cusses several prototypes of object-based and extended relational database systems.
Alagic (1997) points out discrepancies between the ODMG data model and its lan-
guage bindings and proposes some solutions. Bertino and Guerrini (1998) propose
an extension of the ODMG model for supporting composite objects. Alagic (1999)
presents several data models belonging to the ODMG family.
. . .
. . .
Many electronic commerce (e-commerce) and other Internet applications provide Web inter-
faces to access information stored in one or more databases. These databases are
often referred to as data sources. It is common to use two-tier and three-tier
client/server architectures for Internet applications. In some cases, other variations
of the client/server model are used. E-commerce and other Internet database appli-
cations are designed to interact with the user through Web interfaces that display
Web pages. The common method of specifying the contents and formatting of Web
pages is through the use of hypertext documents. There are various languages for
writing these documents, the most common being HTML (HyperText Markup
Language). Although HTML is widely used for formatting and structuring Web
documents, it is not suitable for specifying structured data that is extracted from
databases. A new language—namely, XML (Extensible Markup Language)—has
emerged as the standard for structuring and exchanging data over the Web. XML
can be used to provide information about the structure and meaning of the data in
the Web pages rather than just specifying how the Web pages are formatted for dis-
play on the screen. The formatting aspects are specified separately—for example, by
using a formatting language such as XSL (Extensible Stylesheet Language) or a
transformation language such as XSLT (Extensible Stylesheet Language for
Transformations or simply XSL Transformations). Recently, XML has also been
proposed as a possible model for data storage and retrieval, although only a few
experimental database systems based on XML have been developed so far.
From Chapter 12 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
of the Web page. The Web program must first submit a query to the airline database
to retrieve this information, and then display it. Such Web pages, where part of the
information is extracted from databases or other data sources are called dynamic
Web pages, because the data extracted and displayed each time will be for different
flights and dates.
In this chapter, we will focus on describing the XML data model and its associated
languages, and how data extracted from relational databases can be formatted as
XML documents to be exchanged over the Web. Section 1 discusses the difference
between structured, semistructured, and unstructured data. Section 2 presents the
XML data model, which is based on tree (hierarchical) structures as compared to
the flat relational data model structures. In Section 3, we focus on the structure of
XML documents, and the languages for specifying the structure of these documents
such as DTD (Document Type Definition) and XML Schema. Section 4 shows the
relationship between XML and relational databases. Section 5 describes some of the
languages associated with XML, such as XPath and XQuery. Section 6 discusses how
data extracted from relational databases can be formatted as XML documents.
Finally, Section 7 is the chapter summary.
The information stored in databases is known as structured data because it is repre-
sented in a strict format. For example, each record in a relational database table—such
as each of the tables in the COMPANY database in Figure A.1 (in Appendix: Figures at
the end of this chapter)—follows the same format as the other records in that table.
For structured data, it is common to carefully design the database schema using tech-
niques based on Entity-Relationship (ER) and Enhanced Entity-Relationship (EER)
models in order to define the database structure. The DBMS then checks to ensure
that all data follows the structures and constraints specified in the schema.
However, not all data is collected and inserted into carefully designed structured
databases. In some applications, data is collected in an ad hoc manner before it is
known how it will be stored and managed. This data may have a certain structure,
but not all the information collected will have the identical structure. Some attrib-
utes may be shared among the various entities, but other attributes may exist only in
a few entities. Moreover, additional attributes can be introduced in some of the
newer data items at any time, and there is no predefined schema. This type of data is
known as semistructured data. A number of data models have been introduced for
representing semistructured data, often based on using tree or graph data structures
rather than the flat relational model structures.
Consider the following example. We want to collect a list of bibliographic references
related to a certain research project. Some of these may be books or technical reports,
others may be research articles in journals or conference proceedings, and still others
may refer to complete journal issues or conference proceedings. Clearly, each of these
may have different attributes and different types of information. Even for the same
type of reference—say, conference articles—we may have different information. For
example, one article citation may be quite complete, with full information about
author names, title, proceedings, page numbers, and so on, whereas another citation
may not have all the information available. New types of bibliographic sources may
appear in the future—for instance, references to Web pages or to conference tutori-
als—and these may have new attributes that describe them.
Semistructured data may be displayed as a directed graph, as shown in Figure 1. The
information shown in Figure 1 corresponds to some of the structured data shown in
Figure A.1. As we can see, this model somewhat resembles the object model in its
ability to represent complex objects and nested structures. In Figure 1, the labels or
tags on the directed edges represent the schema names: the names of attributes,
object types (or entity types or classes), and relationships. The internal nodes repre-
sent individual objects or composite attributes. The leaf nodes represent actual data
values of simple (atomic) attributes.
Two main characteristics distinguish the semistructured data model:
1. The schema information, that is, the names of attributes, relationships, and classes
(object types), in the semistructured model is intermixed with the objects
and their data values in the same data structure.
2. In the semistructured model, there is no requirement for a predefined
schema to which the data objects must conform, although it is possible to
define a schema if necessary.
In addition to structured and semistructured data, a third category exists, known as
unstructured data because there is very limited indication of the type of data. A
typical example is a text document that contains information embedded within it.
Web pages in HTML that contain some data are considered to be unstructured data.
Consider part of an HTML file, shown in Figure 2. Text that appears between angled
brackets, <...>, is an HTML tag. A tag with a slash, </...>, indicates an end tag,
which represents the ending of the effect of a matching start tag. The tags mark up
the document in order to instruct an HTML processor how to display the text that
appears between a start tag and its matching end tag.
HTML uses a large number of predefined tags, which are used to specify a variety of
commands for formatting Web documents for display. The start and end tags spec-
ify the range of text to be formatted by each command. A few examples of the tags
shown in Figure 2 follow:
■ The <HTML> … </HTML> tags specify the boundaries of the document.
■ The document header information appears within the <HEAD> … </HEAD> tags.
■ A table is specified within the <TABLE> … </TABLE> tags. The <TABLE>
start tag has four attributes describing various characteristics of the table.
The following <TR> and <TD> start tags have one and two attributes,
respectively.
1That is why it is known as HyperText Markup Language.
2<TR> stands for table row and <TD> stands for table data.
3This is how the term attribute is used in document markup languages, which differs from how it is used
in database models.
HTML has a very large number of predefined tags, and whole books are devoted to
describing how to use these tags. If designed properly, HTML documents can be
formatted so that humans are able to easily understand the document contents, and
are able to navigate through the resulting Web documents. However, the source
HTML text documents are very difficult to interpret automatically by computer pro-
grams because they do not include schema information about the type of data in the
documents. As e-commerce and other Internet applications become increasingly
automated, it is becoming crucial to be able to exchange Web documents among
various computer sites and to interpret their contents automatically. This need was
one of the reasons that led to the development of XML. In addition, an extendible
version of HTML called XHTML was developed that allows users to extend the tags
of HTML for different applications, and allows an XHTML file to be interpreted by
standard XML processing programs. Our discussion will focus on XML only.
The example in Figure 2 illustrates a static HTML page, since all the information to
be displayed is explicitly spelled out as fixed text in the HTML file. In many cases,
some of the information to be displayed may be extracted from a database. For
example, the project names and the employees working on each project may be
extracted from the database in Figure A.1 through the appropriate SQL query. We
may want to use the same HTML formatting tags for displaying each project and the
employees who work on it, but we may want to change the particular projects (and
employees) being displayed. For example, we may want to see a Web page displaying
the information for ProjectX, and then later a page displaying the information for
ProjectY. Although both pages are displayed using the same HTML formatting tags,
the actual data items displayed will be different. Such Web pages are called dynamic,
since the data parts of the page may be different each time it is displayed, even
though the display appearance is the same.
2 XML Hierarchical (Tree) Data Model
We now introduce the data model used in XML. The basic object in XML is the
XML document. Two main structuring concepts are used to construct an XML doc-
ument: elements and attributes. It is important to note that the term attribute in
XML is not used in the same manner as is customary in database terminology, but
rather as it is used in document description languages such as HTML and SGML.4
Attributes in XML provide additional information that describes elements, as we
will see. There are additional concepts in XML, such as entities, identifiers, and ref-
erences, but first we concentrate on describing elements and attributes to show the
essence of the XML model.
Figure 3 shows an example of an XML element called <Projects>. As in HTML, ele-
ments are identified in a document by their start tag and end tag. The tag names are
enclosed between angled brackets < ... >, and end tags are further identified by a
slash, </ ... >.5
4SGML (Standard Generalized Markup Language) is a more general language for describing documents
and provides capabilities for specifying new tags. However, it is more complex than HTML and XML.
5The left and right angled bracket characters (< and >) are reserved characters, as are the ampersand
(&), the apostrophe ('), and the quotation mark ("). To include them within the text of a document, they
must be encoded with the escapes &lt;, &gt;, &amp;, &apos;, and &quot;, respectively.
Figure 3  A complex XML element called <Projects> (the document content, which lists the ProductX and ProductY projects and their workers, is not reproduced in full here).
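Because Figure 3 is not reproduced in full, the following abbreviated fragment is a sketch of what such a <Projects> element looks like; the tag names are illustrative, chosen to match the element names discussed below, and the data values are those of the first project in the figure.

<?xml version="1.0" standalone="yes"?>
<Projects>
   <Project>
      <Name>ProductX</Name>
      <Number>1</Number>
      <Location>Bellaire</Location>
      <Dept_no>5</Dept_no>
      <Worker>
         <Ssn>123456789</Ssn>
         <Last_name>Smith</Last_name>
         <Hours>32.5</Hours>
      </Worker>
      <Worker>
         <Ssn>453453453</Ssn>
         <First_name>Joyce</First_name>
         <Hours>20.0</Hours>
      </Worker>
   </Project>
   <!-- a <Project> element for ProductY and any further projects would follow -->
</Projects>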
Complex elements are constructed from other elements hierarchically, whereas
simple elements contain data values. A major difference between XML and HTML
is that XML tag names are defined to describe the meaning of the data elements in
the document, rather than to describe how the text is to be displayed. This makes it
possible to process the data elements in the XML document automatically by com-
puter programs. Also, the XML tag (element) names can be defined in another doc-
ument, known as the schema document, to give a semantic meaning to the tag names
that can be exchanged among multiple users. In HTML, all tag names are prede-
fined and fixed; that is why they are not extendible.
It is straightforward to see the correspondence between the XML textual representa-
tion shown in Figure 3 and the tree structure shown in Figure 1. In the tree repre-
sentation, internal nodes represent complex elements, whereas leaf nodes represent
simple elements. That is why the XML model is called a tree model or a
hierarchical model. In Figure 3, the simple elements are the ones with the tag
names <Name>, <Number>, <Location>, <Dept_no>, <Ssn>, <Last_name>,
<First_name>, and <Hours>. The complex elements are the ones with the tag names
<Projects>, <Project>, and <Worker>. In general, there is no limit on the levels of
nesting of elements.
It is possible to characterize three main types of XML documents:
■ Data-centric XML documents. These documents have many small data
items that follow a specific structure and hence may be extracted from a
structured database. They are formatted as XML documents in order to
exchange them over or display them on the Web. These usually follow a
predefined schema that defines the tag names.
■ Document-centric XML documents. These are documents with large
amounts of text, such as news articles or books. There are few or no struc-
tured data elements in these documents.
■ Hybrid XML documents. These documents may have parts that contain
structured data and other parts that are predominantly textual or unstruc-
tured. They may or may not have a predefined schema.
XML documents that do not follow a predefined schema of element names and cor-
responding tree structure are known as schemaless XML documents. It is impor-
tant to note that data-centric XML documents can be considered either as
semistructured data or as structured data as defined in Section 1. If an XML docu-
ment conforms to a predefined XML schema or DTD (see Section 3), then the doc-
ument can be considered as structured data. On the other hand, XML allows
documents that do not conform to any schema; these would be considered as
semistructured data and are schemaless XML documents. When the value of the
standalone attribute in an XML document is yes, as in the first line in Figure 3, the
document is standalone and schemaless.
XML attributes are generally used in a manner similar to how they are used in
HTML (see Figure 2), namely, to describe properties and characteristics of the ele-
ments (tags) within which they appear. It is also possible to use XML attributes to
hold the values of simple data elements; however, this is generally not recom-
mended. An exception to this rule is in cases that need to reference another element
in another part of the XML document. To do this, it is common to use attribute val-
ues in one element as the references. This resembles the concept of foreign keys in
relational databases, and is a way to get around the strict hierarchical model that the
XML tree model implies. We discuss XML attributes further in Section 3 when we
discuss XML schema and DTD.
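As a brief illustration (the element and attribute names here are hypothetical and are not taken from Figure 5), an attribute value in one element can reference an identifying attribute of another element, much as a foreign key references a primary key:

<Department DeptId="D5">
   <DepartmentName>Research</DepartmentName>
</Department>
<Employee Ssn="123456789" Dept="D5">
   <!-- the Dept value refers to the DeptId value of a Department element -->
   <LastName>Smith</LastName>
</Employee>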
3 XML Documents, DTD, and XML Schema
3.1 Well-Formed and Valid XML Documents
and XML DTD
In Figure 3, we saw what a simple XML document may look like. An XML docu-
ment is well formed if it follows a few conditions. In particular, it must start with an
XML declaration to indicate the version of XML being used as well as any other rel-
evant attributes, as shown in the first line in Figure 3. It must also follow the syntac-
tic guidelines of the tree data model. This means that there should be a single root
element, and every element must include a matching pair of start and end tags
within the start and end tags of the parent element. This ensures that the nested ele-
ments specify a well-formed tree structure.
A well-formed XML document is syntactically correct. This allows it to be processed
by generic processors that traverse the document and create an internal tree repre-
sentation. A standard model with an associated set of API (application program-
ming interface) functions called DOM (Document Object Model) allows programs
to manipulate the resulting tree representation corresponding to a well-formed
XML document. However, the whole document must be parsed beforehand when
using DOM in order to convert the document to that standard DOM internal data
structure representation. Another API called SAX (Simple API for XML) allows
processing of XML documents on the fly by notifying the processing program
through callbacks whenever a start or end tag is encountered. This makes it easier to
process large documents and allows for processing of so-called streaming XML
documents, where the processing program can process the tags as they are encoun-
tered. This is also known as event-based processing.
A well-formed XML document can be schemaless; that is, it can have any tag names
for the elements within the document. In this case, there is no predefined set of ele-
ments (tag names) that a program processing the document knows to expect. This
gives the document creator the freedom to specify new elements, but limits the pos-
sibilities for automatically interpreting the meaning or semantics of the elements
within the document.
A stronger criterion is for an XML document to be valid. In this case, the document
must be well formed, and it must follow a particular schema. That is, the element
names used in the start and end tag pairs must follow the structure specified in a
separate XML DTD (Document Type Definition) file or XML schema file. We first
discuss XML DTD here, and then we give an overview of XML schema in Section
3.2. Figure 4 shows a simple XML DTD file, which specifies the elements (tag
names) and their nested structures. Any valid documents conforming to this DTD
should follow the specified structure. A special syntax exists for specifying DTD
files, as illustrated in Figure 4. First, a name is given to the root tag of the document,
which is called Projects in the first line in Figure 4. Then the elements and their
nested structure are specified.
Figure 4
An XML DTD file
called Projects.
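The DTD of Figure 4 cannot be reproduced here; the following sketch, which uses the element names of Figure 3 and the notation explained in the list below, indicates what such a DTD file looks like (the exact content of Figure 4 may differ).

<!DOCTYPE Projects [
   <!-- Project is a required repeating element; Worker is optional and repeating -->
   <!ELEMENT Projects (Project+)>
   <!ELEMENT Project (Name, Number, Location, Dept_no?, Worker*)>
   <!ATTLIST Project ProjId ID #REQUIRED>
   <!ELEMENT Name (#PCDATA)>
   <!ELEMENT Number (#PCDATA)>
   <!ELEMENT Location (#PCDATA)>
   <!ELEMENT Dept_no (#PCDATA)>
   <!ELEMENT Worker (Ssn, (Last_name | First_name)?, Hours)>
   <!ELEMENT Ssn (#PCDATA)>
   <!ELEMENT Last_name (#PCDATA)>
   <!ELEMENT First_name (#PCDATA)>
   <!ELEMENT Hours (#PCDATA)>
] >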
When specifying elements, the following notation is used:
■ A * following the element name means that the element can be repeated zero
or more times in the document. This kind of element is known as an optional
multivalued (repeating) element.
■ A + following the element name means that the element can be repeated one
or more times in the document. This kind of element is a required multival-
ued (repeating) element.
■ A ? following the element name means that the element can be repeated zero
or one time. This kind is an optional single-valued (nonrepeating) element.
■ An element appearing without any of the preceding three symbols must
appear exactly once in the document. This kind is a required single-valued
(nonrepeating) element.
■ The type of the element is specified via parentheses following the element. If
the parentheses include names of other elements, these latter elements are
the children of the element in the tree structure. If the parentheses include
the keyword #PCDATA or one of the other data types available in XML DTD,
the element is a leaf node. PCDATA stands for parsed character data, which is
roughly similar to a string data type.
■ The list of attributes that can appear within an element can also be specified
via the keyword !ATTLIST. In Figure 3, the Project element has an attribute
ProjId. If the type of an attribute is ID, then it can be referenced from another
attribute whose type is IDREF within another element. Notice that attributes
can also be used to hold the values of simple data elements of type #PCDATA.
■ Parentheses can be nested when specifying elements.
■ A bar symbol ( e1 | e2 ) specifies that either e1 or e2 can appear in the docu-
ment.
We can see that the tree structure in Figure 1 and the XML document in Figure 3
conform to the XML DTD in Figure 4. To require that an XML document be
checked for conformance to a DTD, we must specify this in the declaration of the
document. For example, we could change the first line in Figure 3 to the
following:
<?xml version = "1.0" standalone = "no"?>
<!DOCTYPE Projects SYSTEM "proj.dtd">
When the value of the standalone attribute in an XML document is “no”, the docu-
ment needs to be checked against a separate DTD document or XML schema docu-
ment (see below). The DTD file shown in Figure 4 should be stored in the same file
system as the XML document, and should be given the file name proj.dtd.
Alternatively, we could include the DTD document text at the beginning of the
XML document itself to allow the checking.
Although XML DTD is quite adequate for specifying tree structures with required,
optional, and repeating elements, and with various types of attributes, it has several
limitations. First, the data types in DTD are not very general. Second, DTD has its
own special syntax and thus requires specialized processors. It would be advanta-
geous to specify XML schema documents using the syntax rules of XML itself so
that the same processors used for XML documents could process XML schema
descriptions. Third, all DTD elements are always forced to follow the specified
ordering of the document, so unordered elements are not permitted. These draw-
backs led to the development of XML schema, a more general but also more com-
plex language for specifying the structure and elements of XML documents.
3.2 XML Schema
The XML schema language is a standard for specifying the structure of XML docu-
ments. It uses the same syntax rules as regular XML documents, so that the same
processors can be used on both. To distinguish the two types of documents, we will
use the term XML instance document or XML document for a regular XML docu-
ment, and XML schema document for a document that specifies an XML schema.
Figure 5 shows an XML schema document corresponding to the COMPANY data-
base shown in Figures A.2 and A.3. Although it is unlikely that we would want to
display the whole database as a single document, there have been proposals to store
data in native XML format as an alternative to storing the data in relational data-
bases. The schema in Figure 5 would serve the purpose of specifying the structure of
the COMPANY database if it were stored in a native XML system. We discuss this
topic further in Section 4.
As with XML DTD, XML schema is based on the tree data model, with elements and
attributes as the main structuring concepts. However, it borrows additional concepts
from database and object models, such as keys, references, and identifiers. Here we
describe the features of XML schema in a step-by-step manner, referring to the sam-
ple XML schema document in Figure 5 for illustration. We introduce and describe
some of the schema concepts in the order in which they are used in Figure 5.
Figure 5  An XML schema file called company (the full schema is not reproduced here; its xsd:documentation string reads "Company Schema (Element Approach) – Prepared by Babak Hojabri").
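Because the schema document of Figure 5 is omitted, the following abbreviated sketch shows the general shape of such a schema; the element and type names are illustrative assumptions that only loosely follow the discussion below.

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
   <xsd:annotation>
      <xsd:documentation xml:lang="en">Company Schema (Element Approach)</xsd:documentation>
   </xsd:annotation>
   <xsd:element name="company">
      <xsd:complexType>
         <xsd:sequence>
            <xsd:element name="department" type="Department" minOccurs="0" maxOccurs="unbounded" />
            <xsd:element name="employee" type="Employee" minOccurs="0" maxOccurs="unbounded" />
            <xsd:element name="project" type="Project" minOccurs="0" maxOccurs="unbounded" />
         </xsd:sequence>
      </xsd:complexType>
      <xsd:key name="projectNumberKey">
         <xsd:selector xpath="project" />
         <xsd:field xpath="projectNumber" />
      </xsd:key>
      <!-- the remaining xsd:unique, xsd:key, and xsd:keyref constraints are declared here -->
   </xsd:element>
   <xsd:complexType name="Employee">
      <xsd:sequence>
         <xsd:element name="employeeName" type="Name" />
         <xsd:element name="employeeSSN" type="xsd:string" />
         <xsd:element name="employeeSalary" type="xsd:unsignedInt" minOccurs="0" />
      </xsd:sequence>
   </xsd:complexType>
   <!-- the complex types Department, Project, Name, Address, Worker, and WorksOn follow -->
</xsd:schema>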
1. Schema descriptions and XML namespaces. It is necessary to identify the
specific set of XML schema language elements (tags) being used by specify-
ing a file stored at a Web site location. The second line in Figure 5 specifies
the file used in this example, which is http://www.w3.org/2001/XMLSchema.
This is a commonly used standard for XML schema commands. Each such
definition is called an XML namespace, because it defines the set of com-
mands (names) that can be used. The file name is assigned to the variable xsd
(XML schema description) using the attribute xmlns (XML namespace), and
this variable is used as a prefix to all XML schema commands (tag names).
For example, in Figure 5, when we write xsd:element or xsd:sequence, we are
referring to the definitions of the element and sequence tags as defined in the
file http://www.w3.org/2001/XMLSchema.
2. Annotations, documentation, and language used. The next couple of lines
in Figure 5 illustrate the XML schema elements (tags) xsd:annotation and
xsd:documentation, which are used for providing comments and other
descriptions in the XML document. The attribute xml:lang of the
xsd:documentation element specifies the language being used, where en stands
for the English language.
3. Elements and types. Next, we specify the root element of our XML schema.
In XML schema, the name attribute of the xsd:element tag specifies the ele-
ment name, which is called company for the root element in our example (see
Figure 5). The structure of the company root element can then be specified,
which in our example is xsd:complexType. This is further specified to be a
sequence of departments, employees, and projects using the xsd:sequence
structure of XML schema. It is important to note here that this is not the
only way to specify an XML schema for the COMPANY database. We will dis-
cuss other options in Section 6.
4. First-level elements in the COMPANY database. Next, we specify the three
first-level elements under the company root element in Figure 5. These ele-
ments are named employee, department, and project, and each is specified in
an xsd:element tag. Notice that if a tag has only attributes and no further
subelements or data within it, it can be ended with the slash symbol (/>)
directly instead of having a separate matching end tag. These are called
empty elements; examples are the xsd:element elements named department
and project in Figure 5.
5. Specifying element type and minimum and maximum occurrences. In
XML schema, the attributes type, minOccurs, and maxOccurs in the
xsd:element tag specify the type and multiplicity of each element in any doc-
ument that conforms to the schema specifications. If we specify a type attrib-
ute in an xsd:element, the structure of the element must be described
separately, typically using the xsd:complexType element of XML schema. This
is illustrated by the employee, department, and project elements in Figure 5. On
the other hand, if no type attribute is specified, the element structure can be
defined directly following the tag, as illustrated by the company root element
in Figure 5. The minOccurs and maxOccurs tags are used for specifying lower
and upper bounds on the number of occurrences of an element in any XML
document that conforms to the schema specifications. If they are not speci-
fied, the default is exactly one occurrence. These serve a similar role to the *,
+, and ? symbols of XML DTD.
6. Specifying keys. In XML schema, it is possible to specify constraints that
correspond to unique and primary key constraints in a relational database,
as well as foreign keys (or referential integrity) constraints. The xsd:unique
tag specifies elements that correspond to unique attributes in a relational
database. We can give each such uniqueness constraint a name, and we must
specify xsd:selector and xsd:field tags for it to identify the element type that
contains the unique element and the element name within it that is unique
via the xpath attribute. This is illustrated by the departmentNameUnique and
projectNameUnique elements in Figure 5. For specifying primary keys, the tag
xsd:key is used instead of xsd:unique, as illustrated by the projectNumberKey,
departmentNumberKey, and employeeSSNKey elements in Figure 5. For speci-
fying foreign keys, the tag xsd:keyref is used, as illustrated by the six xsd:keyref
elements in Figure 5. When specifying a foreign key, the attribute refer of the
xsd:keyref tag specifies the referenced primary key, whereas the tags
xsd:selector and xsd:field specify the referencing element type and foreign key
(see Figure 5).
7. Specifying the structures of complex elements via complex types. The next
part of our example specifies the structures of the complex elements
Department, Employee, Project, and Dependent, using the tag xsd:complexType
(see Figure 5). We specify each of these as a sequence of subelements corre-
sponding to the database attributes of each entity type (see Figure A.4) by
using the xsd:sequence and xsd:element tags of XML schema. Each element is
given a name and type via the attributes name and type of xsd:element. We can
also specify minOccurs and maxOccurs attributes if we need to change the
default of exactly one occurrence. For (optional) database attributes where
null is allowed, we need to specify minOccurs = 0, whereas for multivalued
database attributes we need to specify maxOccurs = “unbounded” on the corre-
sponding element. Notice that if we were not going to specify any key con-
straints, we could have embedded the subelements within the parent
element definitions directly without having to specify complex types.
However, when unique, primary key, and foreign key constraints need to be
specified, we must define complex types to specify the element structures.
8. Composite (compound) attributes. Composite attributes from Figure A.3
are also specified as complex types in Figure 5, as illustrated by the Address,
Name, Worker, and WorksOn complex types. These could have been directly
embedded within their parent elements.
This example illustrates some of the main features of XML schema. There are other
features, but they are beyond the scope of our presentation. In the next section, we
discuss the different approaches to creating XML documents from relational data-
bases and storing XML documents.
4 Storing and Extracting XML Documents from
Databases
Several approaches to organizing the contents of XML documents to facilitate their
subsequent querying and retrieval have been proposed. The following are the most
common approaches:
1. Using a DBMS to store the documents as text. A relational or object DBMS
can be used to store whole XML documents as text fields within the DBMS
records or objects. This approach can be used if the DBMS has a special
module for document processing, and would work for storing schemaless
and document-centric XML documents.
2. Using a DBMS to store the document contents as data elements. This
approach would work for storing a collection of documents that follow a
specific XML DTD or XML schema. Because all the documents have the
same structure, one can design a relational (or object) database to store the
leaf-level data elements within the XML documents. This approach would
require mapping algorithms to design a database schema that is compatible
with the XML document structure as specified in the XML schema or DTD
and to recreate the XML documents from the stored data. These algorithms
can be implemented either as an internal DBMS module or as separate mid-
dleware that is not part of the DBMS.
3. Designing a specialized system for storing native XML data. A new type of
database system based on the hierarchical (tree) model could be designed
and implemented. Such systems are being called Native XML DBMSs. The
system would include specialized indexing and querying techniques, and
would work for all types of XML documents. It could also include data com-
pression techniques to reduce the size of the documents for storage. Tamino
by Software AG and the Dynamic Application Platform of eXcelon are two
popular products that offer native XML DBMS capability. Oracle also offers
a native XML storage option.
4. Creating or publishing customized XML documents from preexisting
relational databases. Because there are enormous amounts of data already
stored in relational databases, parts of this data may need to be formatted as
documents for exchanging or displaying over the Web. This approach would
use a separate middleware software layer to handle the conversions needed
between the XML documents and the relational database. Section 6 discusses
this approach, in which data-centric XML documents are extracted from
existing databases, in more detail. In particular, we show how tree structured
documents can be created from graph-structured databases. Section 6.2 dis-
cusses the problem of cycles and how to deal with it.
All of these approaches have received considerable attention. We focus on the fourth
approach in Section 6, because it gives a good conceptual understanding of the dif-
ferences between the XML tree data model and the traditional database models
based on flat files (relational model) and graph representations (ER model). But
first we give an overview of XML query languages in Section 5.
5 XML Languages
There have been several proposals for XML query languages, and two query language
standards have emerged. The first is XPath, which provides language constructs for
specifying path expressions to identify certain nodes (elements) or attributes within
an XML document that match specific patterns. The second is XQuery, which is a
more general query language. XQuery uses XPath expressions but has additional
constructs. We give an overview of each of these languages in this section. Then we
discuss some additional languages related to HTML in Section 5.3.
5.1 XPath: Specifying Path Expressions in XML
An XPath expression generally returns a sequence of items that satisfy a certain pat-
tern as specified by the expression. These items are either values (from leaf nodes)
or elements or attributes. The most common type of XPath expression returns a col-
lection of element or attribute nodes that satisfy certain patterns specified in the
expression. The names in the XPath expression are node names in the XML docu-
ment tree that are either tag (element) names or attribute names, possibly with
additional qualifier conditions to further restrict the nodes that satisfy the pattern.
Two main separators are used when specifying a path: single slash (/) and double
slash (//). A single slash before a tag specifies that the tag must appear as a direct
child of the previous (parent) tag, whereas a double slash specifies that the tag can
appear as a descendant of the previous tag at any level. Let us look at some examples
of XPath as shown in Figure 6.
The first XPath expression in Figure 6 returns the company root node and all its
descendant nodes, which means that it returns the whole XML document. We
should note that it is customary to include the file name in the XPath query. This
allows us to specify any local file name or even any path name that specifies a file on
the Web. For example, if the COMPANY XML document is stored at the location
www.company.com/info.XML
then the first XPath expression in Figure 6 can be written as
doc("www.company.com/info.XML")/company
This prefix would also be included in the other examples of XPath expressions.
Figure 6
Some examples of
XPath expressions on
XML documents that
follow the XML
schema file company
in Figure 5.
1. /company
2. /company/department
3. //employee [employeeSalary gt 70000]/employeeName
4. /company/employee [employeeSalary gt 70000]/employeeName
5. /company/project/projectWorker [hours ge 20.0]
The second example in Figure 6 returns all department nodes (elements) and their
descendant subtrees. Note that the nodes (elements) in an XML document are
ordered, so the XPath result that returns multiple nodes will do so in the same order
in which the nodes are ordered in the document tree.
The third XPath expression in Figure 6 illustrates the use of //, which is convenient
to use if we do not know the full path name we are searching for, but do know the
name of some tags of interest within the XML document. This is particularly useful
for schemaless XML documents or for documents with many nested levels of
nodes.6
The expression returns all employeeName nodes that are direct children of an
employee node, such that the employee node has another child element employeeSalary
whose value is greater than 70000. This illustrates the use of qualifier conditions,
which restrict the nodes selected by the XPath expression to those that satisfy the con-
dition. XPath has a number of comparison operations for use in qualifier conditions,
including standard arithmetic, string, and set comparison operations.
The fourth XPath expression in Figure 6 should return the same result as the previ-
ous one, except that we specified the full path name in this example. The fifth
expression in Figure 6 returns all projectWorker nodes and their descendant nodes
that are children under a path /company/project and have a child node hours with a
value greater than or equal to 20.0.
When we need to include attributes in an XPath expression, the attribute name is
prefixed by the @ symbol to distinguish it from element (tag) names. It is also pos-
sible to use the wildcard symbol *, which stands for any element, as in the following
example, which retrieves all elements that are child elements of the root, regardless
of their element type. When wildcards are used, the result can be a sequence of dif-
ferent types of items.
/company/*
The examples above illustrate simple XPath expressions, where we can only move
down in the tree structure from a given node. A more general model for path
expressions has been proposed. In this model, it is possible to move in multiple
directions from the current node in the path expression. These are known as the
axes of an XPath expression. Our examples above used only three of these axes: child
of the current node (/), descendant or self at any level of the current node (//), and
attribute of the current node (@). Other axes include parent, ancestor (at any level),
previous sibling (any node at same level to the left in the tree), and next sibling (any
node at the same level to the right in the tree). These axes allow for more complex
path expressions.
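For example (a sketch using element names from the company schema in Figure 5; the axis keywords are standard XPath syntax and do not appear in Figure 6), the parent axis can be written explicitly as follows:

//projectWorker[hours gt 20.0]/parent::project/projectName

This expression starts at the projectWorker nodes that satisfy the qualifier on hours, moves up to each enclosing project element through the parent axis, and returns its projectName child.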
The main restriction of XPath path expressions is that the path that specifies the pat-
tern also specifies the items to be retrieved. Hence, it is difficult to specify certain
conditions on the pattern while separately specifying which result items should be
retrieved. The XQuery language separates these two concerns, and provides more
powerful constructs for specifying queries.
6We use the terms node, tag, and element interchangeably here.
5.2 XQuery: Specifying Queries in XML
XPath allows us to write expressions that select items from a tree-structured XML
document. XQuery permits the specification of more general queries on one or more
XML documents. The typical form of a query in XQuery is known as a FLWR
expression, which stands for the four main clauses of XQuery and has the following
form:
FOR <variable bindings to individual elements specified by path expressions>
LET <variable bindings to collections of elements specified by path expressions>
WHERE <qualifier conditions>
RETURN <query result specification>
There can be zero or more instances of the FOR clause, as well as of the LET clause in
a single XQuery. The WHERE clause is optional, but can appear at most once, and the
RETURN clause must appear exactly once. Let us illustrate these clauses with the fol-
lowing simple example of an XQuery.
LET $d := doc("www.company.com/info.xml")
FOR $x IN $d/company/project[projectNumber = 5]/projectWorker,
$y IN $d/company/employee
WHERE $x/hours gt 20.0 AND $y/ssn = $x/ssn
RETURN $y/employeeName/firstName, $y/employeeName/lastName,
$x/hours
1. Variables are prefixed with the $ sign. In the above example, $d, $x, and $y
are variables.
2. The LET clause assigns a variable to a particular expression for the rest of the
query. In this example, $d is assigned to the document file name. It is possi-
ble to have a query that refers to multiple documents by assigning multiple
variables in this way.
3. The FOR clause assigns a variable to range over each of the individual items
in a sequence. In our example, the sequences are specified by path expres-
sions. The $x variable ranges over elements that satisfy the path expression
$d/company/project[projectNumber = 5]/projectWorker. The $y variable ranges
over elements that satisfy the path expression $d/company/employee. Hence,
$x ranges over projectWorker elements, whereas $y ranges over employee ele-
ments.
4. The WHERE clause specifies additional conditions on the selection of items.
In this example, the first condition selects only those projectWorker elements
that satisfy the condition (hours gt 20.0). The second condition specifies a
join condition that combines an employee with a projectWorker only if they
have the same ssn value.
5. Finally, the RETURN clause specifies which elements or attributes should be
retrieved from the items that satisfy the query conditions. In this example, it
will return a sequence of results, each containing the firstName, lastName, and
hours elements, for employees who work more than 20 hours per week on project
number 5.
Figure 7 includes some additional examples of queries in XQuery that can be speci-
fied on XML instance documents that follow the XML schema document in
Figure 5. The first query retrieves the first and last names of employees who earn
more than $70,000. The variable $x is bound to each employeeName element that is a
child of an employee element, but only for employee elements that satisfy the quali-
fier that their employeeSalary value is greater than $70,000. The result retrieves
the firstName and lastName child elements of the selected employeeName elements.
The second query is an alternative way of retrieving the same elements retrieved by
the first query.
The third query illustrates how a join operation can be performed by using more
than one variable. Here, the $x variable is bound to each projectWorker element that
is a child of project number 5, whereas the $y variable is bound to each employee ele-
ment. The join condition matches ssn values in order to retrieve the employee
names. Notice that this is an alternative way of specifying the same query in our ear-
lier example, but without the LET clause.
XQuery has very powerful constructs to specify complex queries. In particular, it can
specify universal and existential quantifiers in the conditions of a query, aggregate
functions, ordering of query results, selection based on position in a sequence, and
even conditional branching. Hence, in some ways, it qualifies as a full-fledged pro-
gramming language.
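For instance, the following two sketches (assuming the element names of the company schema in Figure 5, and written with the lowercase keywords of standard XQuery) use an aggregate function and an ordering clause, respectively:

let $w := doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker
return count($w)

for $x in doc("www.company.com/info.xml")/company/employee
order by $x/employeeSalary descending
return $x/employeeName/lastName

The first query returns the number of workers assigned to project 5; the second lists employee last names from highest to lowest salary.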
This concludes our brief introduction to XQuery. The interested reader is referred to
www.w3.org, which contains documents describing the latest standards related to
XML and XQuery. The next section briefly discusses some additional languages and
protocols related to XML.
Figure 7
Some examples of XQuery
queries on XML documents
that follow the XML schema
file company in Figure 5.
1. FOR $x IN
doc("www.company.com/info.xml")
//employee [employeeSalary gt 70000]/employeeName
RETURN $x/firstName, $x/lastName
2. FOR $x IN
doc("www.company.com/info.xml")/company/employee
WHERE $x/employeeSalary gt 70000
RETURN $x/employeeName/firstName, $x/employeeName/lastName
3. FOR $x IN
doc("www.company.com/info.xml")/company/project[projectNumber = 5]/projectWorker,
$y IN doc("www.company.com/info.xml")/company/employee
WHERE $x/hours gt 20.0 AND $y/ssn = $x/ssn
RETURN $y/employeeName/firstName, $y/employeeName/lastName, $x/hours
5.3 Other Languages and Protocols Related to XML
There are several other languages and protocols related to XML technology. The
long-term goal of these and other languages and protocols is to provide the technol-
ogy for realization of the Semantic Web, where all information in the Web can be
intelligently located and processed.
■ The Extensible Stylesheet Language (XSL) can be used to define how a doc-
ument should be rendered for display by a Web browser.
■ The Extensible Stylesheet Language for Transformations (XSLT) can be used
to transform one structure into a different structure. Hence, it can convert
documents from one form to another.
■ The Web Services Description Language (WSDL) allows for the description
of Web Services in XML. This makes the Web Service available to users and
programs over the Web.
■ The Simple Object Access Protocol (SOAP) is a platform-independent and
programming language-independent protocol for messaging and remote
procedure calls.
■ The Resource Description Framework (RDF) provides languages and tools
for exchanging and processing of meta-data (schema) descriptions and spec-
ifications over the Web.
6 Extracting XML Documents from Relational
Databases
6.1 Creating Hierarchical XML Views over Flat
or Graph-Based Data
This section discusses the representational issues that arise when converting data
from a database system into XML documents. As we have discussed, XML uses a
hierarchical (tree) model to represent documents. The database systems with the
most widespread use follow the flat relational data model. When we add referential
integrity constraints, a relational schema can be considered to be a graph structure
(for example, see Figure A.4). Similarly, the ER model represents data using graph-
like structures (for example, see Figure A.3). There are straightforward mappings
between the ER and relational models, so we can conceptually represent a relational
database schema using the corresponding ER schema. Although we will use the ER
model in our discussion and examples to clarify the conceptual differences between
tree and graph models, the same issues apply to converting relational data to XML.
We will use the simplified UNIVERSITY ER schema shown in Figure 8 to illustrate our
discussion. Suppose that an application needs to extract XML documents for stu-
dent, course, and grade information from the UNIVERSITY database. The data
needed for these documents is contained in the database attributes of the entity
types COURSE, SECTION, and STUDENT from Figure 8, and the relationships
S-S and C-S between them. In general, most documents extracted from a database
will only use a subset of the attributes, entity types, and relationships in the database.
In this example, the subset of the database that is needed is shown in Figure 9.
Figure 8  An ER schema diagram for a simplified UNIVERSITY database (entity types STUDENT, SECTION, COURSE, INSTRUCTOR, and DEPARTMENT, with relationships including S-S, C-S, S-D, S-I, D-I, and D-C).
At least three possible document hierarchies can be extracted from the database
subset in Figure 9. First, we can choose COURSE as the root, as illustrated in Figure
10. Here, each course entity has the set of its sections as subelements, and each sec-
tion has its students as subelements. We can see one consequence of modeling the
information in a hierarchical tree structure. If a student has taken multiple sections,
that student’s information will appear multiple times in the document—once
under each section. A possible simplified XML schema for this view is shown in
Figure 11. The Grade database attribute in the S-S relationship is migrated to the
STUDENT element. This is because STUDENT becomes a child of SECTION in this
hierarchy, so each STUDENT element under a specific SECTION element can have a
specific grade in that section. In this document hierarchy, a student taking more
than one section will have several replicas, one under each section, and each replica
will have the specific grade given in that particular section.
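A fragment of a document that follows this hierarchy might look as follows; the tag names and data values are illustrative assumptions and are not taken from Figure 11.

<Course>
   <Name>Database Systems</Name>
   <Number>101</Number>
   <Section>
      <Number>1</Number>
      <Qtr>Fall</Qtr>
      <Year>2010</Year>
      <Student>
         <Ssn>123456789</Ssn>
         <Name>Smith</Name>
         <Class>2</Class>
         <Grade>A</Grade>
      </Student>
      <!-- one <Student> element, with its Grade, for every student who attended this section -->
   </Section>
   <!-- one <Section> element for every section of this course -->
</Course>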
Figure 9  Subset of the UNIVERSITY database schema needed for XML document extraction (the STUDENT, SECTION, and COURSE entity types and the relationships between them, with the Grade attribute on the relationship between STUDENT and SECTION).
Figure 10  Hierarchical (tree) view with COURSE as the root.
Figure 11  XML schema document with course as the root (not reproduced here).
Figure 12  Hierarchical (tree) view with STUDENT as the root.
In the second hierarchical document view, we can choose STUDENT as root (Figure 12). In this hierarchi-
cal view, each student has a set of sections as its child elements, and each section is related to one course
as its child, because the relationship between SECTION and COURSE is N:1. Thus, we can merge the
COURSE and SECTION elements in this view, as shown in Figure 12. In addition, the GRADE database
attribute can be migrated to the SECTION element. In this hierarchy, the combined COURSE/SECTION
information is replicated under each student who completed the section. A possible simplified XML
schema for this view is shown in Figure 13.
Figure 13  XML schema document with student as the root (not reproduced here).
Figure 14  Hierarchical (tree) view with SECTION as the root.
The third possible way is to choose SECTION as the root, as shown in Figure 14.
Similar to the second hierarchical view, the COURSE information can be merged
into the SECTION element. The GRADE database attribute can be migrated to the
STUDENT element. As we can see, even in this simple example, there can be numer-
ous hierarchical document views, each corresponding to a different root and a dif-
ferent XML document structure.
Figure 15  Converting a graph with cycles into a hierarchical (tree) structure: (a) the graph with STUDENT as the root, (b) the hierarchy after replicating INSTRUCTOR as INSTRUCTOR1, and (c) the hierarchy after also replicating COURSE as COURSE1.
6.2 Breaking Cycles to Convert Graphs into Trees
In the previous examples, the subset of the database of interest had no cycles. It is
possible to have a more complex subset with one or more cycles, indicating multiple
relationships among the entities. In this case, it is more difficult to decide how to
create the document hierarchies. Additional duplication of entities may be needed
to represent the multiple relationships. We will illustrate this with an example using
the ER schema in Figure 8.
Suppose that we need the information in all the entity types and relationships in
Figure 8 for a particular XML document, with STUDENT as the root element. Figure
15 illustrates how a possible hierarchical tree structure can be created for this docu-
ment. First, we get a lattice with STUDENT as the root, as shown in Figure 15(a). This
is not a tree structure because of the cycles. One way to break the cycles is to repli-
cate the entity types involved in the cycles. First, we replicate INSTRUCTOR as shown
in Figure 15(b), calling the replica to the right INSTRUCTOR1. The INSTRUCTOR
replica on the left represents the relationship between instructors and the sections
they teach, whereas the INSTRUCTOR1 replica on the right represents the relation-
ship between instructors and the department each works in. After this, we still have
the cycle involving COURSE, so we can replicate COURSE in a similar manner,
leading to the hierarchy shown in Figure 15(c). The COURSE1 replica to the left
represents the relationship between courses and their sections, whereas the
COURSE replica to the right represents the relationship between courses and the
department that offers each course.
In Figure 15(c), we have converted the initial graph to a hierarchy. We can do further
merging if desired (as in our previous example) before creating the final hierarchy
and the corresponding XML schema structure.
6.3 Other Steps for Extracting XML Documents
from Databases
In addition to creating the appropriate XML hierarchy and corresponding XML
schema document, several other steps are needed to extract a particular XML docu-
ment from a database:
1. It is necessary to create the correct query in SQL to extract the desired infor-
mation for the XML document.
2. Once the query is executed, its result must be restructured from the flat rela-
tional form to the XML tree structure.
3. The query can be customized to select either a single object or multiple
objects into the document. For example, in the view in Figure 13, the query
can select a single student entity and create a document corresponding to
that single student, or it may select several—or even all—of the students and
create a document with multiple students.
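As an illustration of steps 1 and 2 (a sketch only; the relation and attribute names assume a straightforward relational mapping of the subset in Figure 9), the data for the COURSE-rooted view of Figure 10 could be retrieved with a query such as the following and then nested by the restructuring program:

SELECT C.Cnumber, C.Cname, S.Snumber, S.Qtr, S.Year, T.Ssn, T.Sname, T.Class, G.Grade
FROM COURSE C, SECTION S, COMPLETED_SECTION G, STUDENT T
WHERE S.Course_number = C.Cnumber
  AND G.Section_number = S.Snumber
  AND G.Student_ssn = T.Ssn
ORDER BY C.Cnumber, S.Snumber, T.Ssn;

Because the result is ordered by course and then by section, the restructuring program can emit one <Course> element per course and nest the <Section> and <Student> elements (with the Grade value) under it as it scans the flat rows.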
7 Summary
This chapter provided an overview of the XML standard for representing and
exchanging data over the Internet. First we discussed some of the differences
between various types of data, classifying three main types: structured, semi-struc-
tured, and unstructured. Structured data is stored in traditional databases.
Semistructured data mixes data type names and data values, but the data does not
all have to follow a fixed predefined structure. Unstructured data refers to informa-
tion displayed on the Web, specified via HTML, where information on the types of
data items is missing. We described the XML standard and its tree-structured (hier-
archical) data model, and discussed XML documents and the languages for specify-
ing the structure of these documents, namely, XML DTD (Document Type
Definition) and XML schema. We gave an overview of the various approaches for
storing XML documents, whether in their native (text) format, in a compressed
form, or in relational and other types of databases. Finally, we gave an overview of
the XPath and XQuery languages proposed for querying XML data, and discussed
the mapping issues that arise when it is necessary to convert data stored in tradi-
tional relational databases into XML documents.
Review Questions
1. What are the differences between structured, semistructured, and unstruc-
tured data?
2. Under which of the categories in 1 do XML documents fall? What about self-
describing data?
3. What are the differences between the use of tags in XML versus HTML?
4. What is the difference between data-centric and document-centric XML
documents?
5. What is the difference between attributes and elements in XML? List some of
the important attributes used to specify elements in XML schema.
6. What is the difference between XML schema and XML DTD?
Exercises
7. Create part of an XML instance document to correspond to the data stored
in the relational database shown in Figure A.1 such that the XML document
conforms to the XML schema document in Figure 5.
8. Create XML schema documents and XML DTDs to correspond to the hier-
archies shown in Figures 14 and 15(c).
9. Consider the LIBRARY relational database schema in Figure A.5. Create an
XML schema document that corresponds to this database schema.
10. Specify the following views as queries in XQuery on the company XML
schema shown in Figure 5.
a. A view that has the department name, manager name, and manager
salary for every department.
b. A view that has the employee name, supervisor name, and employee
salary for each employee who works in the Research department.
c. A view that has the project name, controlling department name, number
of employees, and total hours worked per week on the project for each
project.
d. A view that has the project name, controlling department name, number
of employees, and total hours worked per week on the project for each
project with more than one employee working on it.
Selected Bibliography
There are so many articles and books on various aspects of XML that it would be
impossible to make even a modest list. We will mention one book: Chaudhri,
Rashid, and Zicari, eds. (2003). This book discusses various aspects of XML and
contains a list of some references to XML research and practice.
Figure A.1  One possible database state for the COMPANY relational database schema, showing sample data for the EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, and DEPENDENT relations.
Figure A.2  Schema diagram for the COMPANY relational database schema:
EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS(Dnumber, Dlocation)
PROJECT(Pname, Pnumber, Plocation, Dnum)
WORKS_ON(Essn, Pno, Hours)
DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)
Figure A.3  An ER schema diagram for the COMPANY database (entity types EMPLOYEE, DEPARTMENT, PROJECT, and DEPENDENT; relationship types WORKS_FOR, MANAGES, CONTROLS, WORKS_ON, SUPERVISION, and DEPENDENTS_OF).
Figure A.4  Referential integrity constraints displayed on the COMPANY relational database schema (the same relations as in Figure A.2).
Figure A.5  A relational database schema for a LIBRARY database:
BOOK(Book_id, Title, Publisher_name)
BOOK_AUTHORS(Book_id, Author_name)
BOOK_COPIES(Book_id, Branch_id, No_of_copies)
BOOK_LOANS(Book_id, Branch_id, Card_no, Date_out, Due_date)
LIBRARY_BRANCH(Branch_id, Branch_name, Address)
PUBLISHER(Name, Address, Phone)
BORROWER(Card_no, Name, Address, Phone)
Introduction to SQL
Programming Techniques
In this chapter, we discuss some of the methods that have been developed for accessing databases from
programs. Most database access in practical applications is accomplished through
software programs that implement database applications. This software is usually
developed in a general-purpose programming language such as Java, C/C++/C#,
COBOL, or some other programming language. In addition, many scripting lan-
guages, such as PHP and JavaScript, are also being used for programming of data-
base access within Web applications. In this chapter, we focus on how databases can
be accessed from the traditional programming languages C/C++ and Java, whereas
in the next chapter we introduce how databases are accessed from scripting lan-
guages such as PHP and JavaScript. Recall that when database statements are
included in a program, the general-purpose programming language is called the
host language, whereas the database language—SQL, in our case—is called the data
sublanguage. In some cases, special database programming languages are developed
specifically for writing database applications. Although many of these were devel-
oped as research prototypes, some notable database programming languages have
widespread use, such as Oracle’s PL/SQL (Programming Language/SQL).
It is important to note that database programming is a very broad topic. There are
whole textbooks devoted to each database programming technique and how that
technique is realized in a specific system. New techniques are developed all the time,
From Chapter 13 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
and changes to existing techniques are incorporated into newer system versions and
languages. An additional difficulty in presenting this topic is that although there are
SQL standards, these standards themselves are continually evolving, and each
DBMS vendor may have some variations from the standard. Because of this, we
have chosen to give an introduction to some of the main types of database pro-
gramming techniques and to compare these techniques, rather than study one par-
ticular method or system in detail. The examples we give serve to illustrate the main
differences that a programmer would face when using each of these database pro-
gramming techniques. We will try to use the SQL standards in our examples rather
than describe a specific system. When using a specific system, the materials in this
chapter can serve as an introduction, but should be augmented with the system
manuals or with books describing the specific system.
We start our presentation of database programming in Section 1 with an overview
of the different techniques developed for accessing a database from programs. Then,
in Section 2, we discuss the rules for embedding SQL statements into a general-pur-
pose programming language, generally known as embedded SQL. This section also
briefly discusses dynamic SQL, in which queries can be dynamically constructed at
runtime, and presents the basics of the SQLJ variation of embedded SQL that was
developed specifically for the programming language Java. In Section 3, we discuss
the technique known as SQL/CLI (Call Level Interface), in which a library of proce-
dures and functions is provided for accessing the database. Various sets of library
functions have been proposed. The SQL/CLI set of functions is the one given in the
SQL standard. Another library of functions is ODBC (Open Data Base
Connectivity). We do not describe ODBC because it is considered to be the prede-
cessor to SQL/CLI. A third library of functions—which we do describe—is JDBC;
this was developed specifically for accessing databases from Java. In Section 4 we
discuss SQL/PSM (Persistent Stored Modules), which is a part of the SQL standard
that allows program modules—procedures and functions—to be stored by the
DBMS and accessed through SQL. We briefly compare the three approaches to data-
base programming in Section 5, and provide a chapter summary in Section 6.
1 Database Programming: Techniques
and Issues
We now turn our attention to the techniques that have been developed for accessing
databases from programs and, in particular, to the issue of how to access SQL data-
bases from application programs. SQL has language constructs for various database
operations—from schema definition and constraint specification to querying,
updating, and specifying views. Most database systems have an interactive interface
where these SQL commands can be typed directly into a monitor for execution by
the database system. For example, in a computer system where the Oracle RDBMS
is installed, the command SQLPLUS starts the interactive interface. The user can
type SQL commands or queries directly over several lines, ended by a semicolon and
the Enter key (that is, “; <Enter>”). Alternatively, a file of commands can be created
and executed through the interactive interface by typing @<file name>. The system
will execute the commands written in the file and display the results, if any.
The interactive interface is quite convenient for schema and constraint creation or
for occasional ad hoc queries. However, in practice, the majority of database inter-
actions are executed through programs that have been carefully designed and tested.
These programs are generally known as application programs or database applica-
tions, and are used as canned transactions by the end users. Another common use of
database programming is to access a database through an application program that
implements a Web interface, for example, when making airline reservations or
online purchases. In fact, the vast majority of Web electronic commerce applica-
tions include some database access commands.
In this section, first we give an overview of the main approaches to database pro-
gramming. Then we discuss some of the problems that occur when trying to access
a database from a general-purpose programming language, and the typical
sequence of commands for interacting with a database from a software program.
1.1 Approaches to Database Programming
Several techniques exist for including database interactions in application pro-
grams. The main approaches for database programming are the following:
1. Embedding database commands in a general-purpose programming lan-
guage. In this approach, database statements are embedded into the host
programming language, but they are identified by a special prefix. For exam-
ple, the prefix for embedded SQL is the string EXEC SQL, which precedes all
SQL commands in a host language program.1 A precompiler or
preprocessor scans the source program code to identify database state-
ments and extract them for processing by the DBMS. They are replaced in
the program by function calls to the DBMS-generated code. This technique
is generally referred to as embedded SQL.
2. Using a library of database functions. A library of functions is made avail-
able to the host programming language for database calls. For example, there
could be functions to connect to a database, execute a query, execute an
update, and so on. The actual database query and update commands and any
other necessary information are included as parameters in the function calls.
This approach provides what is known as an application programming
interface (API) for accessing a database from application programs.
3. Designing a brand-new language. A database programming language is
designed from scratch to be compatible with the database model and query
language. Additional programming structures such as loops and conditional
1Other prefixes are sometimes used, but this is the most common.
statements are added to the database language to convert it into a full-
fledged programming language. An example of this approach is Oracle’s
PL/SQL.
In practice, the first two approaches are more common, since many applications are
already written in general-purpose programming languages but require some data-
base access. The third approach is more appropriate for applications that have
intensive database interaction. One of the main problems with the first two
approaches is impedance mismatch, which does not occur in the third approach.
1.2 Impedance Mismatch
Impedance mismatch is the term used to refer to the problems that occur because
of differences between the database model and the programming language model.
For example, the practical relational model has three main constructs: columns
(attributes) and their data types, rows (also referred to as tuples or records), and
tables (sets or multisets of records). The first problem that may occur is that the
data types of the programming language differ from the attribute data types that are
available in the data model. Hence, it is necessary to have a binding for each host
programming language that specifies for each attribute type the compatible pro-
gramming language types. A different binding is needed for each programming lan-
guage because different languages have different data types. For example, the data
types available in C/C++ and Java are different, and both differ from the SQL data
types, which are the standard data types for relational databases.
Another problem occurs because the results of most queries are sets or multisets of
tuples (rows), and each tuple is formed of a sequence of attribute values. In the pro-
gram, it is often necessary to access the individual data values within individual
tuples for printing or processing. Hence, a binding is needed to map the query result
data structure, which is a table, to an appropriate data structure in the programming
language. A mechanism is needed to loop over the tuples in a query result in order
to access a single tuple at a time and to extract individual values from the tuple. The
extracted attribute values are typically copied to appropriate program variables for
further processing by the program. A cursor or iterator variable is typically used to
loop over the tuples in a query result. Individual values within each tuple are then
extracted into distinct program variables of the appropriate type.
Impedance mismatch is less of a problem when a special database programming
language is designed that uses the same data model and data types as the database
model. One example of such a language is Oracle’s PL/SQL. The SQL standard also
has a proposal for such a database programming language, known as SQL/PSM. For
object databases, the object data model is quite similar to the data model of the Java
programming language, so the impedance mismatch is greatly reduced when Java is
used as the host language for accessing a Java-compatible object database. Several
database programming languages have been implemented as research prototypes
(see the Selected Bibliography).
1.3 Typical Sequence of Interaction
in Database Programming
When a programmer or software engineer writes a program that requires access to a
database, it is quite common for the program to be running on one computer sys-
tem while the database is installed on another. Recall that a common architecture
for database access is the client/server model, where a client program handles the
logic of a software application, but includes some calls to one or more database
servers to access or update the data.2 When writing such a program, a common
sequence of interaction is the following:
1. When the client program requires access to a particular database, the pro-
gram must first establish or open a connection to the database server.
Typically, this involves specifying the Internet address (URL) of the machine
where the database server is located, plus providing a login account name
and password for database access.
2. Once the connection is established, the program can interact with the data-
base by submitting queries, updates, and other database commands. In gen-
eral, most types of SQL statements can be included in an application
program.
3. When the program no longer needs access to a particular database, it should
terminate or close the connection to the database.
A program can access multiple databases if needed. In some database programming
approaches, only one connection can be active at a time, whereas in other
approaches multiple connections can be established simultaneously.
In the next three sections, we discuss examples of each of the three main approaches
to database programming. Section 2 describes how SQL is embedded into a pro-
gramming language. Section 3 discusses how function calls are used to access the
database, and Section 4 discusses an extension to SQL called SQL/PSM that allows
general-purpose programming constructs for defining modules (procedures and
functions) that are stored within the database system.3 Section 5 compares these
approaches.
2 Embedded SQL, Dynamic SQL, and SQLJ
In this section, we give an overview of the technique for how SQL statements can be
embedded in a general-purpose programming language. We focus on two lan-
guages: C and Java. The examples used with the C language, known as embedded
2There are two-tier and three-tier architectures; to keep our discussion simple, we will assume a two-tier
client/server architecture here.
3SQL/PSM illustrates how typical general-purpose programming language constructs—such as loops
and conditional structures—can be incorporated into SQL.
SQL, are presented in Sections 2.1 through 2.3, and can be adapted to other pro-
gramming languages. The examples using Java, known as SQLJ, are presented in
Sections 2.4 and 2.5. In this embedded approach, the programming language is
called the host language. Most SQL statements—including data or constraint defi-
nitions, queries, updates, or view definitions—can be embedded in a host language
program.
2.1 Retrieving Single Tuples with Embedded SQL
To illustrate the concepts of embedded SQL, we will use C as the host programming
language.4 When using C as the host language, an embedded SQL statement is dis-
tinguished from programming language statements by prefixing it with the key-
words EXEC SQL so that a preprocessor (or precompiler) can separate embedded
SQL statements from the host language code. The SQL statements within a program
are terminated by a matching END-EXEC or by a semicolon (;). Similar rules apply
to embedding SQL in other programming languages.
Within an embedded SQL command, we may refer to specially declared C program
variables. These are called shared variables because they are used in both the C pro-
gram and the embedded SQL statements. Shared variables are prefixed by a colon (:)
when they appear in an SQL statement. This distinguishes program variable names
from the names of database schema constructs such as attributes (column names)
and relations (table names). It also allows program variables to have the same
names as attribute names, since they are distinguishable by the colon (:) prefix in the
SQL statement. Names of database schema constructs—such as attributes and rela-
tions—can only be used within the SQL commands, but shared program variables
can be used elsewhere in the C program without the colon (:) prefix.
Suppose that we want to write C programs to process the COMPANY database in
Figure A.1 in Appendix: Figures at the end of this chapter. We need to declare pro-
gram variables to match the types of the database attributes that the program will
process. The programmer can choose the names of the program variables; they may
or may not have names that are identical to their corresponding database attributes.
We will use the C program variables declared in Figure 1 for all our examples and
show C program segments without variable declarations. Shared variables are
declared within a declare section in the program, as shown in Figure 1 (lines 1
through 7).5 A few of the common bindings of C types to SQL types are as follows.
The SQL types INTEGER, SMALLINT, REAL, and DOUBLE are mapped to the C types
long, short, float, and double, respectively. Fixed-length and varying-length
strings (CHAR[i], VARCHAR[i]) in SQL can be mapped to arrays of characters (char
[i+1], varchar [i+1]) in C that are one character longer than the SQL type
4Our discussion here also applies to the C++ programming language, since we do not use any of the
object-oriented features, but focus on the database programming mechanism.
5We use line numbers in our code segments for easy reference; these numbers are not part of the actual
code.
Figure 1
C program variables used in the
embedded SQL examples E1 and E2.
0) int loop ;
1) EXEC SQL BEGIN DECLARE SECTION ;
2) varchar dname [16], fname [16], lname [16], address [31] ;
3) char ssn [10], bdate [11], sex [2], minit [2] ;
4) float salary, raise ;
5) int dno, dnumber ;
6) int SQLCODE ; char SQLSTATE [6] ;
7) EXEC SQL END DECLARE SECTION ;
because strings in C are terminated by a NULL character (\0), which is not part of
the character string itself.6 Although varchar is not a standard C data type, it is
permitted when C is used for SQL database programming.
Notice that the only embedded SQL commands in Figure 1 are lines 1 and 7, which
tell the precompiler to take note of the C variable names between BEGIN DECLARE
and END DECLARE because they can be included in embedded SQL statements—as
long as they are preceded by a colon (:). Lines 2 through 5 are regular C program
declarations. The C program variables declared in lines 2 through 5 correspond to
the attributes of the EMPLOYEE and DEPARTMENT tables from the COMPANY data-
base in Figure A.1. The variables declared in line 6—SQLCODE and SQLSTATE—are
used to communicate errors and exception conditions between the database system
and the executing program. Line 0 shows a program variable loop that will not be
used in any embedded SQL statement, so it is declared outside the SQL declare
section.
Connecting to the Database. The SQL command for establishing a connection
to a database has the following form:
CONNECT TO <server name> AS <connection name>
AUTHORIZATION <user account name and password> ;
In general, since a user or program can access several database servers, several con-
nections can be established, but only one connection can be active at any point in
time. The programmer or user can use the <connection name> to change from the
currently active connection to a different one by using the following command:
SET CONNECTION <connection name> ;
Once a connection is no longer needed, it can be terminated by the following com-
mand:
DISCONNECT <connection name> ;
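For example, a program could establish a connection, switch back to it after using another connection, and finally drop it, as sketched below. This is only an illustration: the server name dbs, connection name conn1, and account js with password xyz are hypothetical values (they match those used in the SQL/CLI example in Section 3.1), and the exact form of the account-and-password specification varies among DBMSs.
EXEC SQL CONNECT TO dbs AS conn1 AUTHORIZATION js/xyz ;
…
EXEC SQL SET CONNECTION conn1 ;
…
EXEC SQL DISCONNECT conn1 ;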
In the examples in this chapter, we assume that the appropriate connection has
already been established to the COMPANY database, and that it is the currently
active connection.
6SQL strings can also be mapped to char* types in C.
Communicating between the Program and the DBMS Using SQLCODE
and SQLSTATE. The two special communication variables that are used by the
DBMS to communicate exception or error conditions to the program are SQLCODE
and SQLSTATE. The SQLCODE variable shown in Figure 1 is an integer variable.
After each database command is executed, the DBMS returns a value in SQLCODE.
A value of 0 indicates that the statement was executed successfully by the DBMS. If
SQLCODE > 0 (or, more specifically, if SQLCODE = 100), this indicates that no
more data (records) are available in a query result. If SQLCODE < 0, this indicates
some error has occurred. In some systems—for example, in the Oracle RDBMS—
SQLCODE is a field in a record structure called SQLCA (SQL communication area),
so it is referenced as SQLCA.SQLCODE. In this case, the definition of SQLCA must
be included in the C program by including the following line:
EXEC SQL include SQLCA ;
In later versions of the SQL standard, a communication variable called SQLSTATE
was added, which is a string of five characters. A value of ‘00000’ in SQLSTATE indi-
cates no error or exception; other values indicate various errors or exceptions. For
example, ‘02000’ indicates ‘no more data’ when using SQLSTATE. Currently, both
SQLSTATE and SQLCODE are available in the SQL standard. Many of the error and
exception codes returned in SQLSTATE are supposed to be standardized for all SQL
vendors and platforms,7 whereas the codes returned in SQLCODE are not standard-
ized but are defined by the DBMS vendor. Hence, it is generally better to use
SQLSTATE because this makes error handling in the application programs inde-
pendent of a particular DBMS. As an exercise, the reader should rewrite the exam-
ples given later in this chapter using SQLSTATE instead of SQLCODE.
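For example, the error checks in the program segments that follow could test SQLSTATE as sketched below; this is only an outline, and it assumes the SQLSTATE variable declared in Figure 1 together with the standard C function strcmp from <string.h>:
if (strcmp(SQLSTATE, "00000") == 0) /* statement executed with no error or exception */
…
else if (strcmp(SQLSTATE, "02000") == 0) /* no (more) data found */
…
else /* some other error or exception occurred; handle or report it */
…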
Example of Embedded SQL Programming. Our first example to illustrate
embedded SQL programming is a repeating program segment (loop) that takes as
input a Social Security number of an employee and prints some information from
the corresponding EMPLOYEE record in the database. The C program code is shown
as program segment E1 in Figure 2. The program reads (inputs) an Ssn value and
then retrieves the EMPLOYEE tuple with that Ssn from the database via the embed-
ded SQL command. The INTO clause (line 5) specifies the program variables into
which attribute values from the database record are retrieved. C program variables
in the INTO clause are prefixed with a colon (:), as we discussed earlier. The INTO
clause can be used in this way only when the query result is a single record; if multi-
ple records are retrieved, an error will be generated. We will see how multiple
records are handled in Section 2.2.
Line 7 in E1 illustrates the communication between the database and the program
through the special variable SQLCODE. If the value returned by the DBMS in
SQLCODE is 0, the previous statement was executed without errors or exception
conditions. Line 7 checks this and assumes that if an error occurred, it was because
7In particular, SQLSTATE codes starting with the characters 0 through 4 or A through H are supposed to
be standardized, whereas other values can be implementation-defined.
Figure 2
Program segment E1,
a C program segment
with embedded SQL.
//Program Segment E1:
0) loop = 1 ;
1) while (loop) {
2) prompt("Enter a Social Security Number: ", ssn) ;
3) EXEC SQL
4) select Fname, Minit, Lname, Address, Salary
5) into :fname, :minit, :lname, :address, :salary
6) from EMPLOYEE where Ssn = :ssn ;
7) if (SQLCODE == 0) printf(fname, minit, lname, address, salary)
8) else printf("Social Security Number does not exist: ", ssn) ;
9) prompt("More Social Security Numbers (enter 1 for Yes, 0 for No): ", loop) ;
10) }
no EMPLOYEE tuple existed with the given Ssn; therefore it outputs a message to
that effect (line 8).
In E1 a single record is selected by the embedded SQL query (because Ssn is a key
attribute of EMPLOYEE). When a single record is retrieved, the programmer can
assign its attribute values directly to C program variables in the INTO clause, as in
line 5. In general, an SQL query can retrieve many tuples. In that case, the C pro-
gram will typically go through the retrieved tuples and process them one at a time.
The concept of a cursor is used to allow tuple-at-a-time processing of a query result
by the host language program. We describe cursors next.
2.2 Retrieving Multiple Tuples with Embedded SQL
Using Cursors
We can think of a cursor as a pointer that points to a single tuple (row) from the
result of a query that retrieves multiple tuples. The cursor is declared when the SQL
query command is declared in the program. Later in the program, an OPEN CUR-
SOR command fetches the query result from the database and sets the cursor to a
position before the first row in the result of the query. This becomes the current row
for the cursor. Subsequently, FETCH commands are issued in the program; each
FETCH moves the cursor to the next row in the result of the query, making it the cur-
rent row and copying its attribute values into the C (host language) program vari-
ables specified in the FETCH command by an INTO clause. The cursor variable is
basically an iterator that iterates (loops) over the tuples in the query result—one
tuple at a time.
To determine when all the tuples in the result of the query have been processed, the
communication variable SQLCODE (or, alternatively, SQLSTATE) is checked. If a
FETCH command is issued that results in moving the cursor past the last tuple in the
result of the query, a positive value (SQLCODE > 0) is returned in SQLCODE, indi-
cating that no data (tuple) was found (or the string ‘02000’ is returned in
SQLSTATE). The programmer uses this to terminate a loop over the tuples in the
query result. In general, numerous cursors can be opened at the same time. A
CLOSE CURSOR command is issued to indicate that we are done with processing
the result of the query associated with that cursor.
An example of using cursors to process a query result with multiple records is
shown in Figure 3, where a cursor called EMP is declared in line 4. The EMP cursor
is associated with the SQL query declared in lines 5 through 6, but the query is not
executed until the OPEN EMP command (line 8) is processed. The OPEN
command executes the query and fetches its result as a table into
the program workspace, where the program can loop through the individual rows
(tuples) by subsequent FETCH commands (line 9). We assume that
appropriate C program variables have been declared as in Figure 1. The program
segment in E2 reads (inputs) a department name (line 0), retrieves the matching
department number from the database (lines 1 to 3), and then retrieves the employ-
ees who work in that department via the declared EMP cursor. A loop (lines 10 to
18) iterates over each record in the query result, one at a time, and prints the
employee name. The program then reads (inputs) a raise amount for that employee
(line 12) and updates the employee’s salary in the database by the raise amount that
was provided (lines 14 to 16).
This example also illustrates how the programmer can update database records.
When a cursor is defined for rows that are to be modified (updated), we must add
Figure 3
Program segment E2, a C program segment that uses
cursors with embedded SQL for update purposes.
//Program Segment E2:
0) prompt("Enter the Department Name: ", dname) ;
1) EXEC SQL
2) select Dnumber into :dnumber
3) from DEPARTMENT where Dname = :dname ;
4) EXEC SQL DECLARE EMP CURSOR FOR
5) select Ssn, Fname, Minit, Lname, Salary
6) from EMPLOYEE where Dno = :dnumber
7) FOR UPDATE OF Salary ;
8) EXEC SQL OPEN EMP ;
9) EXEC SQL FETCH from EMP into :ssn, :fname, :minit, :lname, :salary ;
10) while (SQLCODE == 0) {
11) printf("Employee name is:", fname, minit, lname) ;
12) prompt("Enter the raise amount: ", raise) ;
13) EXEC SQL
14) update EMPLOYEE
15) set Salary = Salary + :raise
16) where CURRENT OF EMP ;
17) EXEC SQL FETCH from EMP into :ssn, :fname, :minit, :lname, :salary ;
18) }
19) EXEC SQL CLOSE EMP ;
the clause FOR UPDATE OF in the cursor declaration and list the names of any
attributes that will be updated by the program. This is illustrated in line 7 of code
segment E2. If rows are to be deleted, the keywords FOR UPDATE must be added
without specifying any attributes. In the embedded UPDATE (or DELETE) com-
mand, the condition WHERE CURRENT OF <cursor name> specifies that the cur-
rent tuple referenced by the cursor is the one to be updated (or deleted), as in line
16 of E2.
Notice that declaring a cursor and associating it with a query (lines 4 through 7 in
E2) does not execute the query; the query is executed only when the OPEN command (line 8) is executed. Also notice that there is no need to include
the FOR UPDATE OF clause in line 7 of E2 if the results of the query are to be used
for retrieval purposes only (no update or delete).
General Options for a Cursor Declaration. Several options can be specified
when declaring a cursor. The general form of a cursor declaration is as follows:
DECLARE <cursor name> [ INSENSITIVE ] [ SCROLL ] CURSOR
[ WITH HOLD ] FOR <query specification>
[ ORDER BY <ordering specification> ]
[ FOR READ ONLY | FOR UPDATE [ OF <attribute list> ] ] ;
We already briefly discussed the options listed in the last line. The default is that the
query is for retrieval purposes (FOR READ ONLY). If some of the tuples in the query
result are to be updated, we need to specify FOR UPDATE OF and list
the attributes that may be updated. If some tuples are to be deleted, we need to spec-
ify FOR UPDATE without any attributes listed.
When the optional keyword SCROLL is specified in a cursor declaration, it is possi-
ble to position the cursor in other ways than for purely sequential access. A fetch
orientation can be added to the FETCH command, whose value can be one of NEXT,
PRIOR, FIRST, LAST, ABSOLUTE i, and RELATIVE i. In the latter two commands, i
must evaluate to an integer value that specifies an absolute tuple position within the
query result (for ABSOLUTE i), or a tuple position relative to the current cursor
position (for RELATIVE i). The default fetch orientation, which we used in our exam-
ples, is NEXT. The fetch orientation allows the programmer to move the cursor
around the tuples in the query result with greater flexibility, providing random
access by position or access in reverse order. When SCROLL is specified on the cur-
sor, the general form of a FETCH command is as follows, with the parts in square
brackets being optional:
FETCH [ [ <fetch orientation> ] FROM ] <cursor name> INTO <fetch target list> ;
The ORDER BY clause orders the tuples so that the FETCH command will fetch them
in the specified order. It is specified in a similar manner to the corresponding clause
for SQL queries. The last two options when declaring a cursor (INSENSITIVE and
WITH HOLD) refer to transaction characteristics of database programs.
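As an illustration (a sketch only, not one of the numbered program segments; the cursor name EMP2 is hypothetical and the program variables are those declared in Figure 1), a scrollable, read-only cursor over the employees of a department could be declared and then read starting from the last tuple in salary order as follows:
EXEC SQL DECLARE EMP2 SCROLL CURSOR FOR
select Ssn, Fname, Minit, Lname, Salary
from EMPLOYEE where Dno = :dnumber
ORDER BY Salary
FOR READ ONLY ;
EXEC SQL OPEN EMP2 ;
EXEC SQL FETCH LAST FROM EMP2 INTO :ssn, :fname, :minit, :lname, :salary ;
EXEC SQL FETCH PRIOR FROM EMP2 INTO :ssn, :fname, :minit, :lname, :salary ;
EXEC SQL CLOSE EMP2 ;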
Figure 4
Program segment E3, a C program segment
that uses dynamic SQL for updating a table.
//Program Segment E3:
0) EXEC SQL BEGIN DECLARE SECTION ;
1) varchar sqlupdatestring [256] ;
2) EXEC SQL END DECLARE SECTION ;
…
3) prompt("Enter the Update Command: ", sqlupdatestring) ;
4) EXEC SQL PREPARE sqlcommand FROM :sqlupdatestring ;
5) EXEC SQL EXECUTE sqlcommand ;
…
2.3 Specifying Queries at Runtime Using Dynamic SQL
In the previous examples, the embedded SQL queries were written as part of the
host program source code. Hence, any time we want to write a different query, we
must modify the program code, and go through all the steps involved (compiling,
debugging, testing, and so on). In some cases, it is convenient to write a program
that can execute different SQL queries or updates (or other operations) dynamically
at runtime. For example, we may want to write a program that accepts an SQL query
typed from the monitor, executes it, and displays its result, such as the interactive
interfaces available for most relational DBMSs. Another example is when a user-
friendly interface generates SQL queries dynamically for the user based on point-
and-click operations on a graphical schema (for example, a QBE-like interface). In
this section, we give a brief overview of dynamic SQL, which is one technique for
writing this type of database program, by giving a simple example to illustrate how
dynamic SQL can work. In Section 3, we will describe another approach for dealing
with dynamic queries.
Program segment E3 in Figure 4 reads a string that is input by the user (that string
should be an SQL update command) into the string program variable
sqlupdatestring in line 3. It then prepares this as an SQL command in line 4 by
associating it with the SQL variable sqlcommand. Line 5 then executes the command.
Notice that in this case no syntax check or other types of checks on the command are
possible at compile time, since the SQL command is not available until runtime. This
contrasts with our previous examples of embedded SQL, where the query could be
checked at compile time because its text was in the program source code.
Although including a dynamic update command is relatively straightforward in
dynamic SQL, a dynamic query is much more complicated. This is because usually
we do not know the types or the number of attributes to be retrieved by the SQL
query when we are writing the program. A complex data structure is sometimes
needed to allow for different numbers and types of attributes in the query result if
no prior information is known about the dynamic query. Techniques similar to
those that we discuss in Section 3 can be used to assign query results (and query
parameters) to host program variables.
In E3, the reason for separating PREPARE and EXECUTE is that if the command is to
be executed multiple times in a program, it can be prepared only once. Preparing
the command generally involves syntax and other types of checks by the system, as
well as generating the code for executing it. It is possible to combine the PREPARE
and EXECUTE commands (lines 4 and 5 in E3) into a single statement by writing
EXEC SQL EXECUTE IMMEDIATE :sqlupdatestring ;
This is useful if the command is to be executed only once. Alternatively, the pro-
grammer can separate the two statements to catch any errors after the PREPARE
statement, if any.
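When the same dynamic statement is to be executed repeatedly with different values, a parameter marker (?) can be placed in the statement string and a host variable supplied at each execution through the USING clause of the standard dynamic SQL EXECUTE statement. The following sketch is not part of segment E3; it assumes the variable raise from Figure 1 and an input string such as "update EMPLOYEE set Salary = Salary + ? where Dno = 5":
EXEC SQL PREPARE sqlcommand FROM :sqlupdatestring ;
…
EXEC SQL EXECUTE sqlcommand USING :raise ;
The EXECUTE statement can then be repeated as many times as needed, each time with a new value in the program variable raise, without preparing the command again.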
2.4 SQLJ: Embedding SQL Commands in Java
In the previous subsections, we gave an overview of how SQL commands can be
embedded in a traditional programming language, using the C language in our
examples. We now turn our attention to how SQL can be embedded in an object-
oriented programming language,8 in particular, the Java language. SQLJ is a stan-
dard that has been adopted by several vendors for embedding SQL in Java.
Historically, SQLJ was developed after JDBC, which is used for accessing SQL data-
bases from Java using function calls. We discuss JDBC in Section 3.2. In this section,
we focus on SQLJ as it is used in the Oracle RDBMS. An SQLJ translator will gener-
ally convert SQL statements into Java, which can then be executed through the
JDBC interface. Hence, it is necessary to install a JDBC driver when using SQLJ.9 In
this section, we focus on how to use SQLJ concepts to write embedded SQL in a Java
program.
Before being able to process SQLJ with Java in Oracle, it is necessary to import sev-
eral class libraries, shown in Figure 5. These include the JDBC and IO classes (lines
1 and 2), plus the additional classes listed in lines 3, 4, and 5. In addition, the pro-
gram must first connect to the desired database using the function call
getConnection, which is one of the methods of the oracle class in line 5 of Figure
Figure 5
Importing classes needed for including
SQLJ in Java programs in Oracle, and
establishing a connection and default
context.
1) import java.sql.* ;
2) import java.io.* ;
3) import sqlj.runtime.* ;
4) import sqlj.runtime.ref.* ;
5) import oracle.sqlj.runtime.* ;
…
6) DefaultContext cntxt =
7) oracle.getConnection("<url name>", "<user name>", "<password>", true) ;
8) DefaultContext.setDefaultContext(cntxt) ;
…
8This section assumes familiarity with object-oriented concepts and basic JAVA concepts.
9We discuss JDBC drivers in Section 3.2.
Figure 6
Java program vari-
ables used in SQLJ
examples J1 and J2.
1) String dname, ssn, fname, fn, lname, ln,
bdate, address ;
2) char sex, minit, mi ;
3) double salary, sal ;
4) int dno, dnumber ;
5. The format of this function call, which returns an object of type default context,10
is as follows:
public static DefaultContext
getConnection(String url, String user, String password,
Boolean autoCommit)
throws SQLException ;
For example, we can write the statements in lines 6 through 8 in Figure 5 to connect
to an Oracle database located at the url <url name> using the login of <user name> and <password> with automatic commitment of each command,11
then set this connection as the default context for subsequent commands.
In the following examples, we will not show complete Java classes or programs since
it is not our intention to teach Java. Rather, we will show program segments that
illustrate the use of SQLJ. Figure 6 shows the Java program variables used in our
examples. Program segment J1 in Figure 7 reads an employee’s Ssn and prints some
of the employee’s information from the database.
Notice that because Java already uses the concept of exceptions for error handling,
a special exception called SQLException is used to return errors or exception con-
ditions after executing an SQL database command. This plays a similar role to
SQLCODE and SQLSTATE in embedded SQL. Java has many types of predefined
exceptions. Each Java operation (function) must specify the exceptions that can be
thrown—that is, the exception conditions that may occur while executing the Java
code of that operation. If a defined exception occurs, the system transfers control to
the Java code specified for exception handling. In J1, exception handling for an
SQLException is specified in lines 7 and 8. In Java, the following structure
try {<operation>} catch (<exception>) {<exception handling code>} <continuation code>
is used to deal with exceptions that occur during the execution of <operation>. If
no exception occurs, the <continuation code> is processed directly. Exceptions
10A default context, when set, applies to subsequent commands in the program until it is changed.
11Automatic commitment roughly means that each command is applied to the database after it is exe-
cuted. The alternative is that the programmer wants to execute several related database commands and
then commit them together.
Figure 7
Program segment J1,
a Java program seg-
ment with SQLJ.
//Program Segment J1:
1) ssn = readEntry("Enter a Social Security Number: ") ;
2) try {
3) #sql { select Fname, Minit, Lname, Address, Salary
4) into :fname, :minit, :lname, :address, :salary
5) from EMPLOYEE where Ssn = :ssn} ;
6) } catch (SQLException se) {
7) System.out.println("Social Security Number does not exist: " + ssn) ;
8) return ;
9) }
10) System.out.println(fname + " " + minit + " " + lname + " " + address
+ " " + salary) ;
that can be thrown by the code in a particular operation should be specified as part
of the operation declaration or interface—for example, in the following format:
<operation return type> <operation name>(<parameters>)
throws SQLException, IOException ;
In SQLJ, the embedded SQL commands within a Java program are preceded by
#sql, as illustrated in J1 line 3, so that they can be identified by the preprocessor.
The #sql is used instead of the keywords EXEC SQL that are used in embedded SQL
with the C programming language (see Section 2.1). SQLJ uses an INTO clause—
similar to that used in embedded SQL—to return the attribute values retrieved
from the database by an SQL query into Java program variables. The program vari-
ables are preceded by colons (:) in the SQL statement, as in embedded SQL.
In J1 a single tuple is retrieved by the embedded SQLJ query; that is why we are able
to assign its attribute values directly to Java program variables in the INTO clause in
line 4 in Figure 7. For queries that retrieve many tuples, SQLJ uses the concept of an
iterator, which is similar to a cursor in embedded SQL.
2.5 Retrieving Multiple Tuples in SQLJ Using Iterators
In SQLJ, an iterator is a type of object associated with a collection (set or multiset)
of records in a query result.12 The iterator is associated with the tuples and attrib-
utes that appear in a query result. There are two types of iterators:
1. A named iterator is associated with a query result by listing the attribute
names and types that appear in the query result. The attribute names must
correspond to appropriately declared Java program variables, as shown in
Figure 6.
2. A positional iterator lists only the attribute types that appear in the query
result.
12We will not discuss iterators in more detail here.
Figure 8
Program segment J2A, a Java program segment that uses a named iterator to
print employee information in a particular department.
//Program Segment J2A:
0) dname = readEntry("Enter the Department Name: ") ;
1) try {
2) #sql { select Dnumber into :dnumber
3) from DEPARTMENT where Dname = :dname} ;
4) } catch (SQLException se) {
5) System.out.println("Department does not exist: " + dname) ;
6) return ;
7) }
8) System.out.println("Employee information for Department: " + dname) ;
9) #sql iterator Emp(String ssn, String fname, String minit, String lname,
double salary) ;
10) Emp e = null ;
11) #sql e = { select ssn, fname, minit, lname, salary
12) from EMPLOYEE where Dno = :dnumber} ;
13) while (e.next()) {
14) System.out.println(e.ssn + " " + e.fname + " " + e.minit + " " +
e.lname + " " + e.salary) ;
15) } ;
16) e.close() ;
In both cases, the list should be in the same order as the attributes that are listed in
the SELECT clause of the query. However, looping over a query result is different for
the two types of iterators, as we shall see. First, we show an example of using a
named iterator in Figure 8, program segment J2A. Line 9 in Figure 8 shows how a
named iterator type Emp is declared. Notice that the names of the attributes in a
named iterator type must match the names of the attributes in the SQL query result.
Line 10 shows how an iterator object e of type Emp is created in the program and
then associated with a query (lines 11 and 12).
When the iterator object is associated with a query (lines 11 and 12 in Figure 8), the
program fetches the query result from the database and sets the iterator to a posi-
tion before the first row in the result of the query. This becomes the current row for
the iterator. Subsequently, next operations are issued on the iterator object; each
next moves the iterator to the next row in the result of the query, making it the cur-
rent row. If the row exists, the operation retrieves the attribute values for that row
into the corresponding program variables. If no more rows exist, the next opera-
tion returns FALSE, and can thus be used to control the looping. Notice that the
named iterator does not need an INTO clause, because the program variables corre-
sponding to the retrieved attributes are already specified when the iterator type is
declared (line 9 in Figure 8).
In Figure 8, the command (e.next()) in line 13 performs two functions: It gets
the next tuple in the query result and controls the while loop. Once the program is
done with processing the query result, the command e.close() (line 16) closes the
iterator.
Next, consider the same example using positional iterators as shown in Figure 9
(program segment J2B). Line 9 in Figure 9 shows how a positional iterator type
Emppos is declared. The main difference between this and the named iterator is that
there are no attribute names (corresponding to program variable names) in the
positional iterator—only attribute types. This can provide more flexibility, but
makes the processing of the query result slightly more complex. The attribute types
must still be compatible with the attribute types in the SQL query result and in
the same order. Line 10 shows how a positional iterator object e of type Emppos is
created in the program and then associated with a query (lines 11 and 12).
The positional iterator behaves in a manner that is more similar to embedded SQL
(see Section 2.2). A FETCH <iterator variable> INTO <program variable list> com-
mand is needed to get the next tuple in a query result. The first time fetch is exe-
cuted, it gets the first tuple (line 13 in Figure 9). Line 16 gets the next tuple until no
more tuples exist in the query result. To control the loop, a positional iterator func-
tion e.endFetch() is used. This function is set to a value of TRUE when the itera-
tor is initially associated with an SQL query (line 11), and is set to FALSE each time
Figure 9
Program segment J2B, a Java program segment that uses a positional
iterator to print employee information in a particular department.
//Program Segment J2B:
0) dname = readEntry("Enter the Department Name: ") ;
1) try {
2) #sql { select Dnumber into :dnumber
3) from DEPARTMENT where Dname = :dname} ;
4) } catch (SQLException se) {
5) System.out.println("Department does not exist: " + dname) ;
6) return ;
7) }
8) System.out.println("Employee information for Department: " + dname) ;
9) #sql iterator Emppos(String, String, String, String, double) ;
10) Emppos e = null ;
11) #sql e = { select ssn, fname, minit, lname, salary
12) from EMPLOYEE where Dno = :dnumber} ;
13) #sql { fetch :e into :ssn, :fn, :mi, :ln, :sal} ;
14) while (!e.endFetch()) {
15) System.out.println(ssn + " " + fn + " " + mi + " " + ln + " " + sal) ;
16) #sql { fetch :e into :ssn, :fn, :mi, :ln, :sal} ;
17) } ;
18) e.close() ;
a fetch command returns a valid tuple from the query result. It is set to TRUE again
when a fetch command does not find any more tuples. Line 14 shows how the loop-
ing is controlled by negation.
3 Database Programming with Function Calls:
SQL/CLI and JDBC
Embedded SQL (see Section 2) is sometimes referred to as a static database pro-
gramming approach because the query text is written within the program source
code and cannot be changed without recompiling or reprocessing the source code.
The use of function calls is a more dynamic approach for database programming
than embedded SQL. We already saw one dynamic database programming tech-
nique—dynamic SQL—in Section 2.3. The techniques discussed here provide
another approach to dynamic database programming. A library of functions, also
known as an application programming interface (API), is used to access the data-
base. Although this provides more flexibility because no preprocessor is needed, one
drawback is that syntax and other checks on SQL commands have to be done at
runtime. Another drawback is that it sometimes requires more complex program-
ming to access query results because the types and numbers of attributes in a query
result may not be known in advance.
In this section, we give an overview of two function call interfaces. We first discuss
the SQL Call Level Interface (SQL/CLI), which is part of the SQL standard. This
was developed as a follow-up to the earlier technique known as ODBC (Open
Database Connectivity). We use C as the host language in our SQL/CLI examples.
Then we give an overview of JDBC, which is the call function interface for accessing
databases from Java. Although it is commonly assumed that JDBC stands for Java
Database Connectivity, JDBC is just a registered trademark of Sun Microsystems,
not an acronym.
The main advantage of using a function call interface is that it makes it easier to
access multiple databases within the same application program, even if they are
stored under different DBMS packages. We discuss this further in Section 3.2 when
we discuss Java database programming with JDBC, although this advantage also
applies to database programming with SQL/CLI and ODBC (see Section 3.1).
3.1 Database Programming with SQL/CLI Using C
as the Host Language
Before using the function calls in SQL/CLI, it is necessary to install the appropriate
library packages on the database server. These packages are obtained from the ven-
dor of the DBMS being used. We now give an overview of how SQL/CLI can be used
in a C program.13 We will illustrate our presentation with the sample program seg-
ment CLI1 shown in Figure 10.
13Our discussion here also applies to the C++ programming language, since we do not use any of the
object-oriented features but focus on the database programming mechanism.
When using SQL/CLI, the SQL statements are dynamically created and passed as
string parameters in the function calls. Hence, it is necessary to keep track of the
information about host program interactions with the database in runtime data
structures because the database commands are processed at runtime. The informa-
tion is kept in four types of records, represented as structs in C data types. An
environment record is used as a container to keep track of one or more database
connections and to set environment information. A connection record keeps track
of the information needed for a particular database connection. A statement record
keeps track of the information needed for one SQL statement. A description record
keeps track of the information about tuples or parameters—for example, the num-
ber of attributes and their types in a tuple, or the number and types of parameters in
a function call. This is needed when the programmer does not know this informa-
tion about the query when writing the program. In our examples, we assume that the
programmer knows the exact query, so we do not show any description records.
Each record is accessible to the program through a C pointer variable—called a
handle to the record. The handle is returned when a record is first created. To create
a record and return its handle, the following SQL/CLI function is used:
SQLAllocHandle(<handle_type>, <handle_1>, <handle_2>)
Figure 10
Program segment CLI1, a C program
segment with SQL/CLI.
//Program CLI1:
0) #include <sqlcli.h>
1) void printSal() {
2) SQLHSTMT stmt1 ;
3) SQLHDBC con1 ;
4) SQLHENV env1 ;
5) SQLRETURN ret1, ret2, ret3, ret4 ;
6) ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1) ;
7) if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1) else exit ;
8) if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz",
SQL_NTS) else exit ;
9) if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1) else exit ;
10) SQLPrepare(stmt1, "select Lname, Salary from EMPLOYEE where Ssn = ?",
SQL_NTS) ;
11) prompt("Enter a Social Security Number: ", ssn) ;
12) SQLBindParameter(stmt1, 1, SQL_CHAR, &ssn, 9, &fetchlen1) ;
13) ret1 = SQLExecute(stmt1) ;
14) if (!ret1) {
15) SQLBindCol(stmt1, 1, SQL_CHAR, &lname, 15, &fetchlen1) ;
16) SQLBindCol(stmt1, 2, SQL_FLOAT, &salary, 4, &fetchlen2) ;
17) ret2 = SQLFetch(stmt1) ;
18) if (!ret2) printf(ssn, lname, salary)
19) else printf("Social Security Number does not exist: ", ssn) ;
20) }
21) }
In this function, the parameters are as follows:
■ <handle_type> indicates the type of record being created. The possible val-
ues for this parameter are the keywords SQL_HANDLE_ENV,
SQL_HANDLE_DBC, SQL_HANDLE_STMT, or SQL_HANDLE_DESC, for an envi-
ronment, connection, statement, or description record, respectively.
■ <handle_1> indicates the container within which the new handle is being
created. For example, for a connection record this would be the environment
within which the connection is being created, and for a statement record this
would be the connection for that statement.
■ <handle_2> is the pointer (handle) to the newly created record of type
<handle_type>.
When writing a C program that will include database calls through SQL/CLI, the
following are the typical steps that are taken. We illustrate the steps by referring to
the example CLI1 in Figure 10, which reads a Social Security number of an
employee and prints the employee’s last name and salary.
1. The library of functions comprising SQL/CLI must be included in the C pro-
gram. This is called sqlcli.h, and is included using line 0 in Figure 10.
2. Declare handle variables of types SQLHSTMT, SQLHDBC, SQLHENV, and
SQLHDESC for the statements, connections, environments, and descriptions
needed in the program, respectively (lines 2 to 4).14 Also declare variables of
type SQLRETURN (line 5) to hold the return codes from the SQL/CLI func-
tion calls. A return code of 0 (zero) indicates successful execution of the func-
tion call.
3. An environment record must be set up in the program using
SQLAllocHandle. The function to do this is shown in line 6. Because an
environment record is not contained in any other record, the parameter <handle_1>
is the NULL handle SQL_NULL_HANDLE (NULL pointer) when
creating an environment. The handle (pointer) to the newly created environ-
ment record is returned in variable env1 in line 6.
4. A connection record is set up in the program using SQLAllocHandle. In line
7, the connection record created has the handle con1 and is contained in the
environment env1. A connection is then established in con1 to a particular
server database using the SQLConnect function of SQL/CLI (line 8). In our
example, the database server name we are connecting to is dbs and the
account name and password for login are js and xyz, respectively.
5. A statement record is set up in the program using SQLAllocHandle. In line
9, the statement record created has the handle stmt1 and uses the connec-
tion con1.
6. The statement is prepared using the SQL/CLI function SQLPrepare. In line
10, this assigns the SQL statement string (the query in our example) to the
14To keep our presentation simple, we will not show description records here.
statement handle stmt1. The question mark (?) symbol in line 10 represents
a statement parameter, which is a value to be determined at runtime—typ-
ically by binding it to a C program variable. In general, there could be several
parameters in a statement string. They are distinguished by the order of
appearance of the question marks in the statement string (the first ? repre-
sents parameter 1, the second ? represents parameter 2, and so on). The last
parameter in SQLPrepare should give the length of the SQL statement
string in bytes, but if we enter the keyword SQL_NTS, this indicates that the
string holding the query is a NULL-terminated string so that SQL can calcu-
late the string length automatically. This use of SQL_NTS also applies to other
string parameters in the function calls in our examples.
7. Before executing the query, any parameters in the query string should be
bound to program variables using the SQL/CLI function
SQLBindParameter. In Figure 10, the parameter (indicated by ?) to the pre-
pared query referenced by stmt1 is bound to the C program variable ssn in
line 12. If there are n parameters in the SQL statement, we should have n
SQLBindParameter function calls, each with a different parameter position
(1, 2, …, n).
8. Following these preparations, we can now execute the SQL statement refer-
enced by the handle stmt1 using the function SQLExecute (line 13). Notice
that although the query will be executed in line 13, the query results have not
yet been assigned to any C program variables.
9. In order to determine where the result of the query is returned, one common
technique is the bound columns approach. Here, each column in a query
result is bound to a C program variable using the SQLBindCol function. The
columns are distinguished by their order of appearance in the SQL query. In
Figure 10 lines 15 and 16, the two columns in the query (Lname
and Salary) are bound to the C program variables lname and salary,
respectively.15
10. Finally, in order to retrieve the column values into the C program variables,
the function SQLFetch is used (line 17). This function is similar to the
FETCH command of embedded SQL. If a query result has a collection of
tuples, each SQLFetch call gets the next tuple and returns its column values
into the bound program variables. SQLFetch returns an exception
(nonzero) code if there are no more tuples in the query result.16
15An alternative technique known as unbound columns uses different SQL/CLI functions, namely
SQLGetCol or SQLGetData, to retrieve columns from the query result without previously binding them;
these are applied after the SQLFetch command in line 17.
16If unbound program variables are used, SQLFetch returns the tuple into a temporary program area.
Each subsequent SQLGetCol (or SQLGetData) returns one attribute value in order. Basically, for each
row in the query result, the program should iterate over the attribute values (columns) in that row. This is
useful if the number of columns in the query result is variable.
Figure 11
Program segment CLI2, a C program segment that uses SQL/CLI
for a query with a collection of tuples in its result.
//Program Segment CLI2:
0) #include <sqlcli.h>
1) void printDepartmentEmps() {
2) SQLHSTMT stmt1 ;
3) SQLHDBC con1 ;
4) SQLHENV env1 ;
5) SQLRETURN ret1, ret2, ret3, ret4 ;
6) ret1 = SQLAllocHandle(SQL_HANDLE_ENV, SQL_NULL_HANDLE, &env1) ;
7) if (!ret1) ret2 = SQLAllocHandle(SQL_HANDLE_DBC, env1, &con1) else exit ;
8) if (!ret2) ret3 = SQLConnect(con1, "dbs", SQL_NTS, "js", SQL_NTS, "xyz",
SQL_NTS) else exit ;
9) if (!ret3) ret4 = SQLAllocHandle(SQL_HANDLE_STMT, con1, &stmt1) else exit ;
10) SQLPrepare(stmt1, "select Lname, Salary from EMPLOYEE where Dno = ?",
SQL_NTS) ;
11) prompt("Enter the Department Number: ", dno) ;
12) SQLBindParameter(stmt1, 1, SQL_INTEGER, &dno, 4, &fetchlen1) ;
13) ret1 = SQLExecute(stmt1) ;
14) if (!ret1) {
15) SQLBindCol(stmt1, 1, SQL_CHAR, &lname, 15, &fetchlen1) ;
16) SQLBindCol(stmt1, 2, SQL_FLOAT, &salary, 4, &fetchlen2) ;
17) ret2 = SQLFetch(stmt1) ;
18) while (!ret2) {
19) printf(lname, salary) ;
20) ret2 = SQLFetch(stmt1) ;
21) }
22) }
23) }
As we can see, using dynamic function calls requires a lot of preparation to set up
the SQL statements and to bind statement parameters and query results to the
appropriate program variables.
In CLI1 a single tuple is selected by the SQL query. Figure 11 shows an example of
retrieving multiple tuples. We assume that appropriate C program variables have
been declared as in Figure 1. The program segment in CLI2 reads (inputs) a depart-
ment number and then retrieves the employees who work in that department. A
loop then iterates over each employee record, one at a time, and prints the
employee’s last name and salary.
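The unbound columns alternative mentioned in footnotes 15 and 16 can be sketched as follows; this fragment is ours, not part of the original figures, follows the simplified call style and variable names of Figure 11, and omits error checking. Each column value is pulled with SQLGetData after every SQLFetch instead of being bound beforehand with SQLBindCol:

/* fetch each row, then retrieve its two column values one at a time */
ret2 = SQLFetch(stmt1) ;
while (!ret2) {
    SQLGetData(stmt1, 1, SQL_CHAR, &lname, 15, &fetchlen1) ;   /* column 1: Lname  */
    SQLGetData(stmt1, 2, SQL_FLOAT, &salary, 4, &fetchlen2) ;  /* column 2: Salary */
    printf(lname, salary) ;
    ret2 = SQLFetch(stmt1) ;
}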
3.2 JDBC: SQL Function Calls for Java Programming
We now turn our attention to how SQL can be called from the Java object-oriented
programming language.17 The function libraries for this access are known as
JDBC.18 The Java programming language was designed to be platform indepen-
dent—that is, a program should be able to run on any type of computer system that
has a Java interpreter installed. Because of this portability, many RDBMS vendors
provide JDBC drivers so that it is possible to access their systems via Java programs.
A JDBC driver is basically an implementation of the function calls specified in the
JDBC application programming interface (API) for a particular vendor’s RDBMS.
Hence, a Java program with JDBC function calls can access any RDBMS that has a
JDBC driver available.
Because Java is object-oriented, its function libraries are implemented as classes.
Before being able to process JDBC function calls with Java, it is necessary to import
the JDBC class libraries, which are called java.sql.*. These can be downloaded
and installed via the Web.19
JDBC is designed to allow a single Java program to connect to several different data-
bases. These are sometimes called the data sources accessed by the Java program.
These data sources could be stored using RDBMSs from different vendors and could
reside on different machines. Hence, different data source accesses within the same
Java program may require JDBC drivers from different vendors. To achieve this flex-
ibility, a special JDBC class called the driver manager class is employed, which keeps
track of the installed drivers. A driver should be registered with the driver manager
before it is used. The operations (methods) of the driver manager class include
getDriver, registerDriver, and deregisterDriver. These can be used to add
and remove drivers dynamically. Other functions set up and close connections to
data sources, as we will see.
To load a JDBC driver explicitly, the generic Java function for loading a class can be
used. For example, to load the JDBC driver for the Oracle RDBMS, the following
command can be used:
Class.forName("oracle.jdbc.driver.OracleDriver")
This will register the driver with the driver manager and make it available to the
program. It is also possible to load and register the driver(s) needed in the com-
mand line that runs the program, for example, by including the following in the
command line:
-Djdbc.drivers=oracle.jdbc.driver
17This section assumes familiarity with object-oriented concepts and basic Java concepts.
18As we mentioned earlier, JDBC is a registered trademark of Sun Microsystems, although it is com-
monly thought to be an acronym for Java Database Connectivity.
19These are available from several Web sites—for example, at http://industry.java.sun.com/products/
jdbc/drivers.
Figure 12
Program segment JDBC1, a Java program segment with JDBC.
//Program JDBC1:
0) import java.io.* ;
1) import java.sql.* ;
…
2) class getEmpInfo {
3) public static void main (String args []) throws SQLException, IOException {
4) try { Class.forName("oracle.jdbc.driver.OracleDriver") ;
5) } catch (ClassNotFoundException x) {
6) System.out.println ("Driver could not be loaded") ;
7) }
8) String dbacct, passwrd, ssn, lname ;
9) Double salary ;
10) dbacct = readentry("Enter database account:") ;
11) passwrd = readentry("Enter password:") ;
12) Connection conn = DriverManager.getConnection
13) ("jdbc:oracle:oci8:" + dbacct + "/" + passwrd) ;
14) String stmt1 = "select Lname, Salary from EMPLOYEE where Ssn = ?" ;
15) PreparedStatement p = conn.prepareStatement(stmt1) ;
16) ssn = readentry("Enter a Social Security Number: ") ;
17) p.clearParameters() ;
18) p.setString(1, ssn) ;
19) ResultSet r = p.executeQuery() ;
20) while (r.next()) {
21) lname = r.getString(1) ;
22) salary = r.getDouble(2) ;
23) System.out.println(lname + salary) ;
24) } }
25) }
The following are typical steps that are taken when writing a Java application pro-
gram with database access through JDBC function calls. We illustrate the steps by
referring to the example JDBC1 in Figure 12, which reads a Social Security number
of an employee and prints the employee’s last name and salary.
1. The JDBC library of classes must be imported into the Java program. These
classes are called java.sql.*, and can be imported using line 1 in Figure
12. Any additional Java class libraries needed by the program must also be
imported.
2. Load the JDBC driver as discussed previously (lines 4 to 7). The Java excep-
tion in line 5 occurs if the driver is not loaded successfully.
3. Create appropriate variables as needed in the Java program (lines 8 and 9).
4. The Connection object. A connection object is created using the
getConnection function of the DriverManager class of JDBC. In lines 12
and 13, the Connection object is created by using the function call
getConnection(urlstring), where urlstring has the form
jdbc:oracle:<drivertype>:<dbaccount>/<password>
An alternative form is
getConnection(url, dbaccount, password)
Various properties can be set for a connection object, but they are mainly
related to transactional properties.
5. The Statement object. A statement object is created in the program. In
JDBC, there is a basic statement class, Statement, with two specialized sub-
classes: PreparedStatement and CallableStatement. The example in
Figure 12 illustrates how PreparedStatement objects are created and used.
The next example (Figure 13) illustrates the other type of Statement
Figure 13
Program segment JDBC2, a Java program
segment that uses JDBC for a query with a
collection of tuples in its result.
//Program Segment JDBC2:
0) import java.io.* ;
1) import java.sql.* ;
…
2) class printDepartmentEmps {
3) public static void main (String args [])
throws SQLException, IOException {
4) try { Class.forName("oracle.jdbc.driver.OracleDriver") ;
5) } catch (ClassNotFoundException x) {
6) System.out.println ("Driver could not be loaded") ;
7) }
8) String dbacct, passwrd, lname ;
9) Double salary ;
10) Integer dno ;
11) dbacct = readentry("Enter database account:") ;
12) passwrd = readentry("Enter password:") ;
13) Connection conn = DriverManager.getConnection
14) ("jdbc:oracle:oci8:" + dbacct + "/" + passwrd) ;
15) dno = readentry("Enter a Department Number: ") ;
16) String q = "select Lname, Salary from EMPLOYEE where Dno = " + dno.toString() ;
17) Statement s = conn.createStatement() ;
18) ResultSet r = s.executeQuery(q) ;
19) while (r.next()) {
20) lname = r.getString(1) ;
21) salary = r.getDouble(2) ;
22) System.out.println(lname + salary) ;
23) } }
24) }
objects. In line 14 in Figure 12, a query string with a single parameter—indi-
cated by the ? symbol—is created in the string variable stmt1. In line 15, an
object p of type PreparedStatement is created based on the query string in
stmt1 and using the connection object conn. In general, the programmer
should use PreparedStatement objects if a query is to be executed multiple
times, since it would be prepared, checked, and compiled only once, thus sav-
ing this cost for the additional executions of the query.
6. Setting the statement parameters. The question mark (?) symbol in line 14
represents a statement parameter, which is a value to be determined at run-
time, typically by binding it to a Java program variable. In general, there
could be several parameters, distinguished by the order of appearance of the
question marks within the statement string (first ? represents parameter 1,
second ? represents parameter 2, and so on), as we discussed previously.
7. Before executing a PreparedStatement query, any parameters should be
bound to program variables. Depending on the type of the parameter, differ-
ent functions such as setString, setInt, setDouble, and so on are
applied to the PreparedStatement object to set its parameters. The appro-
priate function should be used to correspond to the data type of the param-
eter being set. In Figure 12, the parameter (indicated by ?) in object p is
bound to the Java program variable ssn in line 18. The function setString
is used because ssn is a string variable. If there are n parameters in the SQL
statement, we should have n set… functions, each with a different param-
eter position (1, 2, …, n). Generally, it is advisable to clear all parameters
before setting any new values (line 17).
8. Following these preparations, we can now execute the SQL statement refer-
enced by the object p using the function executeQuery (line 19). There is a
generic function execute in JDBC, plus two specialized functions:
executeUpdate and executeQuery. executeUpdate is used for SQL
insert, delete, or update statements, and returns an integer value indicating
the number of tuples that were affected. executeQuery is used for SQL
retrieval statements, and returns an object of type ResultSet, which we dis-
cuss next.
9. The ResultSet object. In line 19, the result of the query is returned in an
object r of type ResultSet. This resembles a two-dimensional array or a
table, where the tuples are the rows and the attributes returned are the
columns. A ResultSet object is similar to a cursor in embedded SQL and
an iterator in SQLJ. In our example, when the query is executed, r refers to a
tuple before the first tuple in the query result. The r.next() function (line
20) moves to the next tuple (row) in the ResultSet object and returns false
if there are no more tuples. This is used to control the looping. The pro-
grammer can refer to the attributes in the current tuple using various
get… functions that depend on the type of each attribute (for example,
getString, getInt, getDouble, and so on). The programmer can
either use the attribute positions (1, 2) or the actual attribute names
(“Lname”, “Salary”) with the get… functions. In our examples, we used
the positional notation in lines 21 and 22.
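To make steps 5, 8, and 9 more concrete, the following short sketch (ours, not taken from the figures) reuses one PreparedStatement for several updates through executeUpdate and then accesses the columns of a query result by attribute name rather than by position; it assumes an open Connection object conn as in Figure 12 and the EMPLOYEE table of Figure A.1:

PreparedStatement u = conn.prepareStatement(
    "update EMPLOYEE set Salary = Salary + ? where Dno = ?") ;
int[] depts = {1, 4, 5} ;
for (int d : depts) {                  // prepared once, executed three times
    u.clearParameters() ;
    u.setDouble(1, 100.00) ;           // parameter 1: raise amount
    u.setInt(2, d) ;                   // parameter 2: department number
    int count = u.executeUpdate() ;    // number of tuples affected
    System.out.println(count + " employees updated in department " + d) ;
}
PreparedStatement p = conn.prepareStatement(
    "select Lname, Salary from EMPLOYEE where Dno = ?") ;
p.setInt(1, 5) ;
ResultSet r = p.executeQuery() ;
while (r.next()) {
    System.out.println(r.getString("Lname") + " " + r.getDouble("Salary")) ;
}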
In general, the programmer can check for SQL exceptions after each JDBC function
call. We did not do this to simplify the examples.
Notice that JDBC does not distinguish between queries that return single tuples and
those that return multiple tuples, unlike some of the other techniques. This is justi-
fiable because a single tuple result set is just a special case.
In example JDBC1, a single tuple is selected by the SQL query, so the loop in lines 20
to 24 is executed at most once. The example shown in Figure 13 illustrates the
retrieval of multiple tuples. The program segment in JDBC2 reads (inputs) a
department number and then retrieves the employees who work in that depart-
ment. A loop then iterates over each employee record, one at a time, and prints the
employee’s last name and salary. This example also illustrates how we can execute a
query directly, without having to prepare it as in the previous example. This tech-
nique is preferred for queries that will be executed only once, since it is simpler to
program. In line 17 of Figure 13, the programmer creates a Statement object
(instead of PreparedStatement, as in the previous example) without associating it
with a particular query string. The query string q is passed to the statement object s
when it is executed in line 18.
This concludes our brief introduction to JDBC. The interested reader is referred to
the Web site http://java.sun.com/docs/books/tutorial/jdbc/, which contains many
further details about JDBC.
4 Database Stored Procedures
and SQL/PSM
This section introduces two additional topics related to database programming. In
Section 4.1, we discuss the concept of stored procedures, which are program mod-
ules that are stored by the DBMS at the database server. Then in Section 4.2 we dis-
cuss the extensions to SQL that are specified in the standard to include
general-purpose programming constructs in SQL. These extensions are known as
SQL/PSM (SQL/Persistent Stored Modules) and can be used to write stored proce-
dures. SQL/PSM also serves as an example of a database programming language
that extends a database model and language—namely, SQL—with some program-
ming constructs, such as conditional statements and loops.
4.1 Database Stored Procedures and Functions
In our presentation of database programming techniques so far, there was an
implicit assumption that the database application program was running on a client
machine, or more likely at the application server computer in the middle-tier of a
three-tier client-server architecture. In either case, the machine where the program
is executing is different from the machine on which the database server—and the
main part of the DBMS software package—is located. Although this is suitable for
many applications, it is sometimes useful to create database program modules—
procedures or functions—that are stored and executed by the DBMS at the database
server. These are historically known as database stored procedures, although they
can be functions or procedures. The term used in the SQL standard for stored pro-
cedures is persistent stored modules because these programs are stored persistently
by the DBMS, similarly to the persistent data stored by the DBMS.
Stored procedures are useful in the following circumstances:
■ If a database program is needed by several applications, it can be stored at
the server and invoked by any of the application programs. This reduces
duplication of effort and improves software modularity.
■ Executing a program at the server can reduce data transfer and communica-
tion cost between the client and server in certain situations.
■ These procedures can enhance the modeling power provided by views by
allowing more complex types of derived data to be made available to the
database users. Additionally, they can be used to check for complex con-
straints that are beyond the specification power of assertions and triggers.
In general, many commercial DBMSs allow stored procedures and functions to be
written in a general-purpose programming language. Alternatively, a stored proce-
dure can be made of simple SQL commands such as retrievals and updates. The
general form of declaring stored procedures is as follows:
CREATE PROCEDURE <procedure name> (<parameters>)
<local declarations>
<procedure body> ;
The parameters and local declarations are optional, and are specified only if needed.
For declaring a function, a return type is necessary, so the declaration form is
CREATE FUNCTION <function name> (<parameters>)
RETURNS <return type>
<local declarations>
<function body> ;
If the procedure (or function) is written in a general-purpose programming lan-
guage, it is typical to specify the language as well as a file name where the program
code is stored. For example, the following format can be used:
CREATE PROCEDURE <procedure name> (<parameters>)
LANGUAGE <programming language name>
EXTERNAL NAME <file path name> ;
In general, each parameter should have a parameter type that is one of the SQL data
types. Each parameter should also have a parameter mode, which is one of IN, OUT,
or INOUT. These correspond to parameters whose values are input only, output
(returned) only, or both input and output, respectively.
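For example, a procedure on the COMPANY database of Figure A.1 that gives an employee a raise and returns the new salary could be declared roughly as follows; the procedure name, parameter names, and body are our own illustration, and minor syntactic details vary among DBMSs:

CREATE PROCEDURE Raise_salary (IN emp_ssn CHAR(9), IN amount DECIMAL(10,2),
                               OUT new_salary DECIMAL(10,2))
BEGIN
  UPDATE EMPLOYEE SET Salary = Salary + amount WHERE Ssn = emp_ssn ;
  SELECT Salary INTO new_salary FROM EMPLOYEE WHERE Ssn = emp_ssn ;
END ;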
Because the procedures and functions are stored persistently by the DBMS, it
should be possible to call them from the various SQL interfaces and programming
techniques. The CALL statement in the SQL standard can be used to invoke a stored
procedure—either from an interactive interface or from embedded SQL or SQLJ.
The format of the statement is as follows:
CALL <procedure or function name> (<argument list>) ;
If this statement is called from JDBC, it should be assigned to a statement object of
type CallableStatement (see Section 3.2).
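For instance, the hypothetical Raise_salary procedure sketched above could be invoked from JDBC roughly as follows, assuming an open Connection object conn as in Figure 12 (again our own illustration, not part of the text):

CallableStatement cs = conn.prepareCall("{call Raise_salary(?, ?, ?)}") ;
cs.setString(1, "123456789") ;                        // IN: employee Ssn
cs.setDouble(2, 500.00) ;                             // IN: raise amount
cs.registerOutParameter(3, java.sql.Types.DOUBLE) ;   // OUT: new salary
cs.execute() ;
double newSalary = cs.getDouble(3) ;                  // retrieve the OUT value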
4.2 SQL/PSM: Extending SQL for Specifying Persistent Stored
Modules
SQL/PSM is the part of the SQL standard that specifies how to write persistent
stored modules. It includes the statements to create functions and procedures that
we described in the previous section. It also includes additional programming con-
structs to enhance the power of SQL for the purpose of writing the code (or body)
of stored procedures and functions.
In this section, we discuss the SQL/PSM constructs for conditional (branching)
statements and for looping statements. These will give a flavor of the type of con-
structs that SQL/PSM has incorporated;20 then we give an example to illustrate how
these constructs can be used.
The conditional branching statement in SQL/PSM has the following form:
IF <condition> THEN <statement list>
ELSEIF <condition> THEN <statement list>
…
ELSEIF <condition> THEN <statement list>
ELSE <statement list>
END IF ;
Consider the example in Figure 14, which illustrates how the conditional branch
structure can be used in an SQL/PSM function. The function returns a string value
(line 1) describing the size of a department within a company based on the number
of employees. There is one IN integer parameter, deptno, which gives a department
number. A local variable No_of_emps is declared in line 2. The query in lines 3 and 4
returns the number of employees in the department, and the conditional branch in
lines 5 to 8 then returns one of the values {‘HUGE’, ‘LARGE’, ‘MEDIUM’, ‘SMALL’}
based on the number of employees.
SQL/PSM has several constructs for looping. There are standard while and repeat
looping structures, which have the following forms:
20We only give a brief introduction to SQL/PSM here. There are many other features in the SQL/PSM
standard.
Figure 14
Declaring a function in
SQL/PSM.
//Function PSM1:
0) CREATE FUNCTION Dept_size (IN deptno INTEGER)
1) RETURNS VARCHAR(7)
2) DECLARE No_of_emps INTEGER ;
3) SELECT COUNT(*) INTO No_of_emps
4) FROM EMPLOYEE WHERE Dno = deptno ;
5) IF No_of_emps > 100 THEN RETURN 'HUGE'
6) ELSEIF No_of_emps > 25 THEN RETURN 'LARGE'
7) ELSEIF No_of_emps > 10 THEN RETURN 'MEDIUM'
8) ELSE RETURN 'SMALL'
9) END IF ;
WHILE <condition> DO
<statement list>
END WHILE ;
REPEAT
<statement list>
UNTIL <condition>
END REPEAT ;
There is also a cursor-based looping structure. The statement list in such a loop is
executed once for each tuple in the query result. This has the following form:
FOR <loop name> AS <cursor name> CURSOR FOR <query> DO
<statement list>
END FOR ;
Loops can have names, and there is a LEAVE <loop name> statement to break out of a loop
when a condition is satisfied. SQL/PSM has many other features, but they are out-
side the scope of our presentation.
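As a small illustration of the cursor-based loop, the following function (our own sketch, written in the simplified style of Figure 14 and not part of the text) sums the salaries of the employees of a given department in the COMPANY database of Figure A.1; details such as surrounding BEGIN and END blocks vary among implementations:

CREATE FUNCTION Dept_payroll (IN deptno INTEGER)
RETURNS DECIMAL(10,2)
DECLARE Total DECIMAL(10,2) DEFAULT 0 ;
FOR emp_loop AS emp_cur CURSOR FOR
    SELECT Salary FROM EMPLOYEE WHERE Dno = deptno
DO
    SET Total = Total + Salary ;
END FOR ;
RETURN Total ;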
5 Comparing the Three Approaches
In this section, we briefly compare the three approaches for database programming
and discuss the advantages and disadvantages of each approach.
1. Embedded SQL Approach. The main advantage of this approach is that the
query text is part of the program source code itself, and hence can be
checked for syntax errors and validated against the database schema at com-
pile time. This also makes the program quite readable, as the queries are
readily visible in the source code. The main disadvantages are the loss of flex-
ibility in changing the query at runtime, and the fact that all changes to
queries must go through the whole recompilation process. In addition,
because the queries are known beforehand, the choice of program variables
to hold the query results is a simple task, and so the programming of the
application is generally easier. However, for complex applications where
queries have to be generated at runtime, the function call approach will be
more suitable.
2. Library of Function Calls Approach. This approach provides more flexibil-
ity in that queries can be generated at runtime if needed. However, this leads
to more complex programming, as program variables that match the
columns in the query result may not be known in advance. Because queries
are passed as statement strings within the function calls, no checking can be
done at compile time. All syntax checking and query validation has to be
done at runtime, and the programmer must check and account for possible
additional runtime errors within the program code.
3. Database Programming Language Approach. This approach does not suf-
fer from the impedance mismatch problem, as the programming language
data types are the same as the database data types. However, programmers
must learn a new programming language rather than use a language they are
already familiar with. In addition, some database programming languages
are vendor-specific, whereas general-purpose programming languages can
easily work with systems from multiple vendors.
6 Summary
In this chapter we presented additional features of the SQL database language. In
particular, we presented an overview of the most important techniques for database
programming in Section 1. Then we discussed the various approaches to database
application programming in Sections 2 to 4.
In Section 2, we discussed the general technique known as embedded SQL, where
the queries are part of the program source code. A precompiler is typically used to
extract the SQL commands from the program for processing by the DBMS and to replace
them with function calls to the DBMS compiled code. We presented an overview
of embedded SQL, using the C programming language as host language in our
examples. We also discussed the SQLJ technique for embedding SQL in Java pro-
grams. The concepts of cursor (for embedded SQL) and iterator (for SQLJ) were
presented and illustrated by examples to show how they are used for looping over
the tuples in a query result, and extracting the attribute value into program vari-
ables for further processing.
In Section 3, we discussed how function call libraries can be used to access SQL
databases. This technique is more dynamic than embedding SQL, but requires more
complex programming because the attribute types and number in a query result
may be determined at runtime. An overview of the SQL/CLI standard was pre-
sented, with examples using C as the host language. We discussed some of the func-
tions in the SQL/CLI library, how queries are passed as strings, how query
parameters are assigned at runtime, and how results are returned to program vari-
ables. We then gave an overview of the JDBC class library, which is used with Java,
and discussed some of its classes and operations. In particular, the ResultSet class
is used to create objects that hold the query results, which can then be iterated over
by the next() operation. The get and set functions for retrieving attribute values
and setting parameter values were also discussed.
In Section 4 we gave a brief overview of stored procedures, and discussed SQL/PSM
as an example of a database programming language. Finally, we briefly compared
the three approaches in Section 5. It is important to note that we chose to give a
comparative overview of the three main approaches to database programming,
since studying a particular approach in depth is a topic that is worthy of its own
textbook.
Review Questions
1. What is ODBC? How is it related to SQL/CLI?
2. What is JDBC? Is it an example of embedded SQL or of using function calls?
3. List the three main approaches to database programming. What are the
advantages and disadvantages of each approach?
4. What is the impedance mismatch problem? Which of the three program-
ming approaches minimizes this problem?
5. Describe the concept of a cursor and how it is used in embedded SQL.
6. What is SQLJ used for? Describe the two types of iterators available in SQLJ.
Exercises
7. Consider the database shown in Figure A.2, whose schema is shown in
Figure A.3. Write a program segment to read a student’s name and print his
or her grade point average, assuming that A=4, B=3, C=2, and D=1 points.
Use embedded SQL with C as the host language.
8. Repeat Exercise 7, but use SQLJ with Java as the host language.
9. Consider the library relational database schema in Figure A.4. Write a pro-
gram segment that retrieves the list of books that became overdue yesterday
and that prints the book title and borrower name for each. Use embedded
SQL with C as the host language.
10. Repeat Exercise 9, but use SQLJ with Java as the host language.
11. Repeat Exercises 7 and 9, but use SQL/CLI with C as the host language.
12. Repeat Exercises 7 and 9, but use JDBC with Java as the host language.
13. Repeat Exercise 7, but write a function in SQL/PSM.
14. Create a function in PSM that computes the median salary for the
EMPLOYEE table shown in Figure A.1.
Selected Bibliography
There are many books that describe various aspects of SQL database programming.
For example, Sunderraman (2007) describes programming on the Oracle 10g
DBMS and Reese (1997) focuses on JDBC and Java programming. Many Web
resources are also available.
Figure A.1
Schema diagram for the COMPANY relational database schema.

EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)
Figure A.2
A database that stores student and course information.

STUDENT
Name   Student_number  Class  Major
Smith  17              1      CS
Brown  8               2      CS

COURSE
Course_name                Course_number  Credit_hours  Department
Intro to Computer Science  CS1310         4             CS
Data Structures            CS3320         4             CS
Discrete Mathematics       MATH2410       3             MATH
Database                   CS3380         3             CS

SECTION
Section_identifier  Course_number  Semester  Year  Instructor
85                  MATH2410       Fall      07    King
92                  CS1310         Fall      07    Anderson
102                 CS3320         Spring    08    Knuth
112                 MATH2410       Fall      08    Chang
119                 CS1310         Fall      08    Anderson
135                 CS3380         Fall      08    Stone

GRADE_REPORT
Student_number  Section_identifier  Grade
17              112                 B
17              119                 C
8               85                  A
8               92                  A
8               102                 B
8               135                 A

PREREQUISITE
Course_number  Prerequisite_number
CS3380         CS3320
CS3380         MATH2410
CS3320         CS1310
Figure A.3
Schema diagram for the database in Figure A.2.

STUDENT (Name, Student_number, Class, Major)
COURSE (Course_name, Course_number, Credit_hours, Department)
SECTION (Section_identifier, Course_number, Semester, Year, Instructor)
GRADE_REPORT (Student_number, Section_identifier, Grade)
PREREQUISITE (Course_number, Prerequisite_number)
Figure A.4
A relational database schema for a LIBRARY database.

BOOK (Book_id, Title, Publisher_name)
BOOK_AUTHORS (Book_id, Author_name)
PUBLISHER (Name, Address, Phone)
BOOK_COPIES (Book_id, Branch_id, No_of_copies)
BOOK_LOANS (Book_id, Branch_id, Card_no, Date_out, Due_date)
LIBRARY_BRANCH (Branch_id, Branch_name, Address)
BORROWER (Card_no, Name, Address, Phone)
Web Database Programming Using PHP
In this chapter, we direct our attention to how databases are accessed from scripting languages. Many
electronic commerce (e-commerce) and other Internet applications that provide
Web interfaces to access information stored in one or more databases use scripting
languages. These languages are often used to generate HTML documents, which are
then displayed by the Web browser for interaction with the user.
Basic HTML is useful for generating static Web pages with fixed text and other
objects, but most e-commerce applications require Web pages that provide interac-
tive features with the user. For example, consider the case of an airline customer
who wants to check the arrival time and gate information of a particular flight. The
user may enter information such as a date and flight number in certain form fields
of the Web page. The Web program must first submit a query to the airline database
to retrieve this information, and then display it. Such Web pages, where part of the
information is extracted from databases or other data sources, are called dynamic
Web pages. The data extracted and displayed each time will be for different flights
and dates.
There are various techniques for programming dynamic features into Web pages.
We will focus on one technique here, which is based on using the PHP open source
scripting language. PHP has recently experienced widespread use. The interpreters
for PHP are provided free of charge, and are written in the C language so they are
From Chapter 14 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
available on most computer platforms. A PHP interpreter provides a Hypertext
Preprocessor, which will execute PHP commands in a text file and create the desired
HTML file. To access databases, a library of PHP functions needs to be included in
the PHP interpreter as we will discuss in Section 3. PHP programs are executed on
the Web server computer. This is in contrast to some scripting languages, such as
JavaScript, that are executed on the client computer.
This chapter is organized as follows. Section 1 gives a simple example to illustrate
how PHP can be used. Section 2 gives a general overview of the PHP language, and
how it is used to program some basic functions for interactive Web pages. Section 3
focuses on using PHP to interact with SQL databases through a library of functions
known as PEAR DB. Finally, Section 4 contains a chapter summary.
1 A Simple PHP Example
PHP is an open source general-purpose scripting language. The interpreter engine
for PHP is written in the C programming language so it can be used on nearly all
types of computers and operating systems. PHP usually comes installed with the
UNIX operating system. For computer platforms with other operating systems such
as Windows, Linux, or Mac OS, the PHP interpreter can be downloaded from:
http://www.php.net. As with other scripting languages, PHP is particularly suited
for manipulation of text pages, and in particular for manipulating dynamic HTML
pages at the Web server computer. This is in contrast to JavaScript, which is down-
loaded with the Web pages to execute on the client computer.
PHP has libraries of functions for accessing databases stored under various types of
relational database systems such as Oracle, MySQL, SQLServer, and any system that
supports the ODBC standard. Under the three-tier architecture, the DBMS would
reside at the bottom-tier database server. PHP would run at the middle-tier Web
server, where the PHP program commands would manipulate the HTML files to
create the customized dynamic Web pages. The HTML is then sent to the client tier
for display and interaction with the user.
Consider the example shown in Figure 1(a), which prompts a user to enter the first
and last name and then prints a welcome message to that user. The line numbers are
not part of the program code; they are used below for explanation purposes only:
1. Suppose that the file containing PHP script in program segment P1 is stored
in the following Internet location: http://www.myserver.com/example/
greeting.php. Then if a user types this address in the browser, the PHP inter-
preter would start interpreting the code and produce the form shown in
Figure 1(b). We will explain how that happens as we go over the lines in code
segment P1.
2. Line 0 shows the PHP start tag <?php, which tells the PHP interpreter to process all the PHP commands that follow until it reaches the PHP end tag ?>, shown on line 16. Text outside of these tags is
Figure 1
(a) PHP program segment for entering a greeting,
(b) Initial form displayed by PHP program segment,
(c) User enters name John Smith, (d) Form prints
welcome message for John Smith.
(a)
//Program Segment P1:
0) <?php
1) // print a welcome message if the user has submitted a name
//    through the HTML form; otherwise, print the form
2) if ($_POST['user_name']) {
3) print("Welcome, ") ;
4) print($_POST['user_name']);
5) }
6) else {
7) // no name has been submitted yet, so display the entry form
8) print <<<_HTML_
9) <FORM method="post" action="$_SERVER[PHP_SELF]">
10) Enter your name: <INPUT type="text" name="user_name">
11) <BR/>
12) <INPUT type="submit" value="SUBMIT NAME">
13) </FORM>
14) _HTML_;
15) }
16) ?>
printed as is. This allows PHP code segments to be included within a larger
HTML file. Only the sections in the file between <?php and ?> are processed
by the PHP preprocessor.
3. Line 1 shows one way of posting comments in a PHP program on a single
line started by //. Single-line comments can also be started with #, and end
at the end of the line in which they are entered. Multiple line comments start
with /* and end with */.
4. The auto-global predefined PHP variable $_POST (line 2) is an array that
holds all the values entered through form parameters. Arrays in PHP are
dynamic arrays, with no fixed number of elements. They can be numerically
indexed arrays whose indexes (positions) are numbered (0, 1, 2, …), or they
can be associative arrays whose indexes can be any string values. For example,
an associative array indexed based on color can have the indexes {“red”,
“blue”, “green”}. In this example, $_POST is associatively indexed by the name
of the posted value user_name that is specified in the name attribute of the
input tag on line 10. Thus $_POST[‘user_name’] will contain the value
typed in by the user. We will discuss PHP arrays further in Section 2.2.
5. When the Web page at http://www.myserver.com/example/greeting.php is
first opened, the if condition in line 2 will evaluate to false because there is
no value yet in $_POST[‘user_name’]. Hence, the PHP interpreter will
process lines 6 through 15, which create the text for an HTML file that dis-
plays the form shown in Figure 1(b). This is then displayed at the client side
by the Web browser.
6. Line 8 shows one way of creating long text strings in an HTML file. We will
discuss other ways to specify strings later in this section. All text between an
opening <<<_HTML_ and a closing _HTML_; is printed into the HTML file as
is. The closing _HTML_; must be alone on a separate line. Thus, the text
added to the HTML file sent to the client will be the text between lines 9 and
13. This includes HTML tags to create the form shown in Figure 1(b).
7. PHP variable names start with a $ sign and can include characters, num-
bers, and the underscore character _. The PHP auto-global (predefined)
variable $_SERVER (line 9) is an array that includes information about the
local server. The element $_SERVER['PHP_SELF'] in the array is the path
name of the PHP file currently being executed on the server. Thus, the action
attribute of the form tag (line 9) instructs the PHP interpreter to reprocess
the same file, once the form parameters are entered by the user.
8. Once the user types the name John Smith in the text box and clicks on the
SUBMIT NAME button (Figure 1(c)), program segment P1 is reprocessed.
This time, $_POST['user_name'] will include the string "John Smith",
so lines 3 and 4 will now be placed in the HTML file sent to the client, which
displays the message in Figure 1(d).
As we can see from this example, the PHP program can create two different HTML
commands depending on whether the user just started or whether they had already
submitted their name through the form. In general, a PHP program can create
numerous variations of HTML text in an HTML file at the server depending on the
particular conditional paths taken in the program. Hence, the HTML sent to the
client will be different depending on the interaction with the user. This is one way in
which PHP is used to create dynamic Web pages.
2 Overview of Basic Features of PHP
In this section we give an overview of a few of the features of PHP that are useful
in creating interactive HTML pages. Section 3 will focus on how PHP programs
can access databases for querying and updating. We cannot give a comprehensive
discussion on PHP as there are whole books devoted to this subject. Rather, we
focus on illustrating certain features of PHP that are particularly suited for creating
dynamic Web pages that contain database access commands. This section covers
some PHP concepts and features that will be needed when we discuss database
access in Section 3.
2.1 PHP Variables, Data Types, and Programming Constructs
PHP variable names start with the $ symbol and can include characters, numbers, and
the underscore character (_). No other special characters are permitted. Variable
names are case sensitive, and the first character cannot be a number. Variables are
not typed. The values assigned to the variables determine their type. In fact, the
same variable can change its type once a new value is assigned to it. Assignment is
via the = operator.
Since PHP is directed toward text processing, there are several different types of
string values. There are also many functions available for processing strings. We
only discuss some basic properties of string values and variables here. Figure 2 illus-
trates some string values. There are three main ways to express strings and text:
1. Single-quoted strings. Enclose the string between single quotes, as in lines
0, 1, and 2. If a single quote is needed within the string, use the escape char-
acter (\) (see line 2).
2. Double-quoted strings. Enclose strings between double quotes as in line 7.
In this case, variable names appearing within the string are replaced by the
values that are currently stored in these variables. The interpreter identifies
variable names within double-quoted strings by their initial character $ and
replaces them with the value in the variable. This is known as interpolating
variables within strings. Interpolation does not occur in single-quoted
strings.
Figure 2
Illustrating basic PHP
string and text values.
0) print 'Welcome to my Web site.';
1) print 'I said to him, "Welcome Home"';
2) print 'We\'ll now visit the next Web site';
3) printf('The cost is $%.2f and the tax is $%.2f',
$cost, $tax) ;
4) print strtolower('AbCdE');
5) print ucwords(strtolower('JOHN smith'));
6) print 'abc' . 'efg'
7) print "send your email reply to: $email_address"
8) print <<<FORM_HTML
9) <FORM method="post" action="$_SERVER[PHP_SELF]">
10) Enter your name: <INPUT type="text" name="user_name">
11) FORM_HTML
3. Here documents. Enclose a part of the text between an opening <<<DOCNAME and a closing line that contains only DOCNAME, where DOCNAME is an arbitrary identifier that must be the same at the start and at the end of the here document. Lines 8 through 11 in Figure 2 illustrate this with the identifier FORM_HTML; as in double-quoted strings, variables appearing inside a here document are interpolated.
PHP also provides the standard comparison operators, including > (greater than), >= (greater than or equal), < (less than), and <= (less than or equal).
2.2 PHP Arrays
Arrays are very important in PHP, since they allow lists of elements. They are used
frequently in forms that employ pull-down menus. A single-dimensional array is
used to hold the list of choices in the pull-down menu. For database query results,
two-dimensional arrays are used with the first dimension representing rows of a
table and the second dimension representing columns (attributes) within a row.
Figure 3
Illustrating basic PHP array processing.
0) $teaching = array('Database' => 'Smith', 'OS' => 'Carrick', 'Graphics' => 'Kam');
1) $teaching['Graphics'] = 'Benson'; $teaching['Data Mining'] = 'Kam';
2) sort($teaching);
3) foreach ($teaching as $key => $value) {
4) print " $key : $value\n";}
5) $courses = array('Database', 'OS', 'Graphics', 'Data Mining');
6) $alt_row_color = array('blue', 'yellow');
7) for ($i = 0, $num = count($courses); $i < $num; $i++) {
8) print '<tr bgcolor="' . $alt_row_color[$i % 2] . '">' ;
9) print "Course $i is $courses[$i] \n";
10) }
There are two main types of arrays: numeric and associative. We discuss each of
these in the context of single-dimensional arrays next.
A numeric array associates a numeric index (or position or sequence number) with
each element in the array. Indexes are integer numbers that start at zero and grow
incrementally. An element in the array is referenced through its index. An
associative array provides pairs of (key => value) elements. The value of an element
is referenced through its key, and all key values in a particular array must be unique.
The element values can be strings or integers, or they can be arrays themselves, thus
leading to higher dimensional arrays.
Figure 3 gives two examples of array variables: $teaching and $courses. The first
array $teaching is associative (see line 0 in Figure 3), and each element associates a
course name (as key) with the name of the course instructor (as value). There are
three elements in this array. Line 1 shows how the array may be updated. The first
command in line 1 assigns a new instructor to the course ‘Graphics’ by updating its
value. Since the key value ‘Graphics’ already exists in the array, no new element is
created but the existing value is updated. The second command creates a new ele-
ment since the key value ‘Data Mining’ did not exist in the array before. New ele-
ments are added at the end of the array.
If we only provide values (no keys) as array elements, the keys are automatically
numeric and numbered 0, 1, 2, …. This is illustrated in line 5 of Figure 3, by the
$courses array. Both associative and numeric arrays have no size limits. If some
value of another data type, say an integer, is assigned to a PHP variable that was
holding an array, the variable now holds the integer value and the array contents are
lost. Basically, most variables can be assigned to values of any data type at any time.
There are several different techniques for looping through arrays in PHP. We illus-
trate two of these techniques in Figure 3. Lines 3 and 4 show one method of looping
through all the elements in an array using the foreach construct, and printing the
key and value of each element on a separate line. Lines 7 through 10 show how a tra-
ditional for-loop construct can be used. A built-in function count (line 7) returns
the current number of elements in the array, which is assigned to the variable $num
and used to control ending the loop.
The example in lines 7 through 10 also illustrates how an HTML table can be displayed with alternating row colors, by setting the two colors in an array $alt_row_color (line 6). Each time through the loop, the remainder expression $i % 2 switches from one color (index 0) to the next (index 1) (see line 8). The color is assigned to the HTML bgcolor attribute of the <tr> (table row) tag.
The count function (line 7) returns the current number of elements in the array. The sort function (line 2) sorts the array based on the element values in it (not the keys); note that sort re-indexes the array numerically, so for an associative array the related asort function should be used if each key is to remain associated with its value after sorting. There are many other functions that can be applied to PHP arrays, but a full discussion is outside the scope of our presentation.
2.3 PHP Functions
As with other programming languages, functions can be defined in PHP to better
structure a complex program and to share common sections of code that can be
reused by multiple applications. The newer version of PHP, PHP5, also has object-
oriented features, but we will not discuss these here as we are focusing on the basics
of PHP. Basic PHP functions can have arguments that are passed by value. Global
variables can be accessed within functions. Standard scope rules apply to variables
that appear within a function and within the code that calls the function.
We now give two simple examples to illustrate basic PHP functions. In Figure 4, we
show how we could rewrite the code segment P1 from Figure 1(a) using functions.
The code segment P1′ in Figure 4 has two functions: display_welcome() (lines 0 to 3) and display_empty_form() (lines 5 to 13). Neither of these functions has arguments or return values. Lines 14 through 19 show how we can
call these functions to produce the same effect as the segment of code P1 in Figure
1(a). As we can see in this example, functions can be used just to make the PHP code
better structured and easier to follow.
A second example is shown in Figure 5. Here we are using the $teaching array
introduced in Figure 3. The function course_instructor() in lines 0 to 8 in
Figure 5 has two arguments: $course (a string holding a course name) and
$teaching_assignments (an associative array holding course assignments, simi-
lar to the $teaching array shown in Figure 3). The function finds the name of the
instructor who teaches a particular course. Lines 9 to 14 in Figure 5 show how this
function may be used.
The function call in line 11 would return the string: Smith is teaching Database,
because the array entry with the key ‘Database’ has the value ‘Smith’ for instructor.
On the other hand, the function call on line 13 would return the string: there is no
Computer Architecture course because there is no entry in the array with the key
Figure 4
Rewriting program segment P1 as P1′ using functions.
//Program Segment P1′:
0) function display_welcome() {
1) print("Welcome, ") ;
2) print($_POST['user_name']);
3) }
4)
5) function display_empty_form() {
6) print <<<_HTML_
7) <FORM method="post" action="$_SERVER[PHP_SELF]">
8) Enter your name: <INPUT type="text" name="user_name">
9) <BR/>
10) <INPUT type="submit" value="SUBMIT NAME">
11) </FORM>
12) _HTML_;
13) }
14) if ($_POST['user_name']) {
15) display_welcome();
16) }
17) else {
18) display_empty_form();
19) }
Figure 5
Illustrating a function with arguments and return value.
0) function course_instructor ($course, $teaching_assignments) {
1) if (array_key_exists($course, $teaching_assignments)) {
2) $instructor = $teaching_assignments[$course];
3) RETURN "$instructor is teaching $course";
4) }
5) else {
6) RETURN "there is no $course course";
7) }
8) }
9) $teaching = array('Database' => 'Smith', 'OS' => 'Carrick', 'Graphics' => 'Kam');
10) $teaching['Graphics'] = 'Benson'; $teaching['Data Mining'] = 'Kam';
11) $x = course_instructor('Database', $teaching);
12) print($x);
13) $x = course_instructor('Computer Architecture', $teaching);
14) print($x);
‘Computer Architecture’. A few comments about this example and about PHP func-
tions in general:
■ The built-in PHP array function array_key_exists($k, $a) returns true
if the value in variable $k exists as a key in the associative array in the variable
$a. In our example, it checks whether the $course value provided exists as a
key in the array $teaching_assignments (line 1 in Figure 5).
■ Function arguments are passed by value. Hence, in this example, the calls in
lines 11 and 13 could not change the array $teaching provided as argument
for the call. The values provided in the arguments are passed (copied) to the
function arguments when the function is called.
■ Return values of a function are placed after the RETURN keyword. A function
can return any type. In this example, it returns a string type. Two different
strings can be returned in our example, depending on whether the $course
key value provided exists in the array or not.
■ Scope rules for variable names apply as in other programming languages.
Global variables outside of the function cannot be used unless they are
referred to using the built-in PHP array $GLOBALS. Basically,
$GLOBALS[‘abc’] will access the value in a global variable $abc defined
outside the function. Otherwise, variables appearing inside a function are
local even if there is a global variable with the same name.
The previous discussion gives a brief overview of PHP functions. Many details are
not discussed since it is not our goal to present PHP in detail.
2.4 PHP Server Variables and Forms
There are a number of built-in entries in a PHP auto-global built-in array variable
called $_SERVER that can provide the programmer with useful information about
the server where the PHP interpreter is running, as well as other information. These
may be needed when constructing the text in an HTML document (for example, see
line 7 in Figure 4). Here are some of these entries:
1. $_SERVER[‘SERVER_NAME’]. This provides the Web site name of the server
computer where the PHP interpreter is running. For example, if the PHP
interpreter is running on the Web site http://www.uta.edu, then this string
would be the value in $_SERVER[‘SERVER_NAME’].
2. $_SERVER['REMOTE_ADDR']. This is the IP (Internet Protocol) address
of the client user computer that is accessing the server, for example
129.107.61.8.
3. $_SERVER[‘REMOTE_HOST’]. This is the Web site name of the client user
computer, for example abc.uta.edu. In this case, the server will need to trans-
late the name into an IP address to access the client.
4. $_SERVER['PATH_INFO']. This is the part of the URL address that comes after a slash (/) at the end of the URL.
5. $_SERVER[‘QUERY_STRING’]. This provides the string that holds parame-
ters in a URL after a question mark (?) at the end of the URL. This can hold
search parameters, for example.
6. $_SERVER[‘DOCUMENT_ROOT’]. This is the root directory that holds the
files on the Web server that are accessible to client users.
These and other entries in the $_SERVER array are usually needed when creating the
HTML file to be sent for display.
Another important PHP auto-global built-in array variable is called $_POST. This
provides the programmer with input values submitted by the user through HTML
forms specified in the HTML <INPUT> tag and other similar tags. For example, in Figure 4 line 14, the variable $_POST['user_name'] provides the programmer with the value typed in by the user in the HTML form specified via the <INPUT> tag on line 8. The keys to this array are the names of the various input parameters provided via the form, for example by using the name attribute of the HTML <INPUT> tag as on line 8. When users enter data through forms, the data values can be stored
in this array.
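As a small combined illustration (ours, not one of the chapter figures), the following script echoes the user_name value submitted through a form such as the one in Figure 1 together with two of the $_SERVER entries listed above:

<?php
// greet the user and report which server they reached and from which address
if ($_POST['user_name']) {
    print "Hello, " . $_POST['user_name'] . ". " ;
    print "You are visiting " . $_SERVER['SERVER_NAME'] ;
    print " from the address " . $_SERVER['REMOTE_ADDR'] . "." ;
}
?>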
3 Overview of PHP Database Programming
There are various techniques for accessing a database from a programming language; for an SQL database accessed from the C and Java programming languages, these include embedded SQL, SQLJ, SQL/CLI (similar to ODBC), and JDBC.
In this section we give an overview of how to access the database using the script
language PHP, which is quite suitable for creating Web interfaces for searching and
updating databases, as well as dynamic Web pages.
There is a PHP database access function library that is part of PHP Extension and
Application Repository (PEAR), which is a collection of several libraries of func-
tions for enhancing PHP. The PEAR DB library provides functions for database
access. Many database systems can be accessed from this library, including Oracle,
MySQL, SQLite, and Microsoft SQLServer, among others.
We will discuss several functions that are part of PEAR DB in the context of some
examples. Section 3.1 shows how to connect to a database using PHP. Section 3.2
discusses how data collected from HTML forms can be used to insert a new record
in a database table (relation). Section 3.3 shows how retrieval queries can be exe-
cuted and have their results displayed within a dynamic Web page.
3.1 Connecting to a Database
To use the database functions in a PHP program, the PEAR DB library module
called DB.php must be loaded. In Figure 6, this is done in line 0 of the example. The
DB library functions can now be accessed using DB::<function_name>. The function for connecting to a database is called DB::connect('string'), where the
Figure 6
Connecting to a database, creating a table, and inserting a record.
0) require 'DB.php';
1) $d = DB::connect('oci8://acct1:pass12@www.host.com/db1');
2) if (DB::isError($d)) { die("cannot connect - " . $d->getMessage()); }
…
3) $q = $d->query("CREATE TABLE EMPLOYEE
4) (Emp_id INT,
5) Name VARCHAR(15),
6) Job VARCHAR(10),
7) Dno INT)" );
8) if (DB::isError($q)) { die("table creation not successful - " . $q->getMessage()); }
…
9) $d->setErrorHandling(PEAR_ERROR_DIE);
…
10) $eid = $d->nextID('EMPLOYEE');
11) $q = $d->query("INSERT INTO EMPLOYEE VALUES
12) ($eid, '$_POST[emp_name]', '$_POST[emp_job]', $_POST[emp_dno])" );
…
13) $eid = $d->nextID('EMPLOYEE');
14) $q = $d->query('INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)',
15) array($eid, $_POST['emp_name'], $_POST['emp_job'], $_POST['emp_dno']) );
string argument specifies the database information. The format for ‘string’ is:
<DBMS software>://<user account>:<password>@<database server>
In Figure 6, line 1 connects to the database that is stored using Oracle (specified via
the string oci8). The <DBMS software> portion of the 'string' specifies the par-
ticular DBMS software package being connected to. Some of the DBMS software
packages that are accessible through PEAR DB are:
■ MySQL. Specified as mysql for earlier versions and mysqli for later versions
starting with version 4.1.2.
■ Oracle. Specified as oci8 for versions 7, 8, and 9. This is used in line 1 of
Figure 6.
■ SQLite. Specified as sqlite.
■ Microsoft SQL Server. Specified as mssql.
■ Mini SQL. Specified as msql.
■ Informix. Specified as ifx.
■ Sybase. Specified as sybase.
■ Any ODBC-compliant system. Specified as odbc.
The above is not a comprehensive list.
Following the <DBMS software> in the string argument passed to DB::connect is the separator ://, followed by the user account name <user account>, followed by the separator : and the account password <password>. These are followed by the separator @ and the server name and directory <database server> where the database is stored.
In line 1 of Figure 6, the user is connecting to the server at www.host.com/db1 using
the account name acct1 and password pass12 stored under the Oracle DBMS
oci8. The whole string is passed using DB::connect. The connection information
is kept in the database connection variable $d, which is used whenever an operation
to this particular database is applied.
Line 2 in Figure 6 shows how to check whether the connection to the database was
established successfully or not. PEAR DB has a function DB::isError, which can
determine whether any database access operation was successful or not. The argu-
ment to this function is the database connection variable ($d in this example). In
general, the PHP programmer can check after every database call to determine
whether the last database operation was successful or not, and terminate the pro-
gram (using the die function) if it was not successful. An error message is also
returned from the database via the operation $d->getMessage(). This can also
be displayed as shown in line 2 of Figure 6.
In general, most SQL commands can be sent to the database once a connection is
established via the query function. The function $d->query takes an SQL com-
mand as its string argument and sends it to the database server for execution. In
Figure 6, lines 3 to 7 send a CREATE TABLE command to create a table called
EMPLOYEE with four attributes. Whenever a query is executed, the result of the
query is assigned to a query variable, which is called $q in our example. Line 8
checks whether the query was executed successfully or not.
The PHP PEAR DB library offers an alternative to having to check for errors after
every database command. The function
$d->setErrorHandling(PEAR_ERROR_DIE)
will terminate the program and print the default error messages if any subsequent
errors occur when accessing the database through connection $d (see line 9 in
Figure 6).
3.2 Collecting Data from Forms and Inserting Records
It is common in database applications to collect information through HTML or
other types of Web forms. For example, when purchasing an airline ticket or apply-
ing for a credit card, the user has to enter personal information such as name,
address, and phone number. This information is typically collected and stored in a
database record on a database server.
Lines 10 through 12 in Figure 6 illustrate how this may be done. In this example, we
omitted the code for creating the form and collecting the data, which can be a vari-
ation of the example in Figure 1. We assume that the user entered valid values in the
input parameters called emp_name, emp_job, and emp_dno. These would be acces-
sible via the PHP auto-global array $_POST as discussed at the end of Section 2.4.
In the SQL INSERT command shown on lines 11 and 12 in Figure 6, the array
entries $_POST['emp_name'], $_POST['emp_job'], and $_POST['emp_dno'] will
hold the values collected from the user through the input form of HTML. These are
then inserted as a new employee record in the EMPLOYEE table.
This example also illustrates another feature of PEAR DB. It is common in some
applications to create a unique record identifier for each new record inserted into
the database.1
PHP has a function $d->nextID to create a sequence of unique values for a partic-
ular table. In our example, the field Emp_id of the EMPLOYEE table (see Figure 6, line
4) is created for this purpose. Line 10 shows how to retrieve the next unique value in
the sequence for the EMPLOYEE table and insert it as part of the new record in lines
11 and 12.
The code for insert in lines 10 to 12 in Figure 6 may allow malicious strings to be
entered that can alter the INSERT command. A safer way to do inserts and other
queries is through the use of placeholders (specified by the ? symbol). An example
is illustrated in lines 13 to 15, where another record is to be inserted. In this form of
the $d->query() function, there are two arguments. The first argument is the SQL
statement, with one or more ? symbols (placeholders). The second argument is an
array, whose element values will be used to replace the placeholders in the order
they are specified.
3.3 Retrieval Queries from Database Tables
We now give three examples of retrieval queries through PHP, shown in Figure 7.
The first few lines 0 to 3 establish a database connection $d and set the error han-
dling to the default, as we discussed in the previous section. The first query (lines 4
to 7) retrieves the name and department number of all employee records. The query
variable $q is used to refer to the query result. A while-loop to go over each row in
the result is shown in lines 5 to 7. The function $q->fetchRow() in line 5 serves to
retrieve the next record in the query result and to control the loop. The looping
starts at the first record.
The second query example is shown in lines 8 to 13 and illustrates a dynamic query.
In this query, the conditions for selection of rows are based on values input by the
user. Here we want to retrieve the names of employees who have a specific job and
work for a particular department. The particular job and department number are
entered through a form in the array variables $_POST['emp_job'] and
1This would be similar to system-generated OID for object and object-relational database systems.
Figure 7
Illustrating database retrieval queries.
0) require 'DB.php';
1) $d = DB::connect('oci8://acct1: .com/dbname');
2) if (DB::isError($d)) { die("cannot connect - " . $d->getMessage()); }
3) $d->setErrorHandling(PEAR_ERROR_DIE);
…
4) $q = $d->query('SELECT Name, Dno FROM EMPLOYEE');
5) while ($r = $q->fetchRow()) {
6) print "employee $r[0] works for department $r[1] \n";
7) }
…
8) $q = $d->query('SELECT Name FROM EMPLOYEE WHERE Job = ? AND Dno = ?',
9)     array($_POST['emp_job'], $_POST['emp_dno']) );
10) print "employees in dept {$_POST['emp_dno']} whose job is {$_POST['emp_job']}: \n";
11) while ($r = $q->fetchRow()) {
12) print "employee $r[0] \n";
13) }
…
14) $allresult = $d->getAll('SELECT Name, Job, Dno FROM EMPLOYEE');
15) foreach ($allresult as $r) {
16) print "employee $r[0] has job $r[1] and works for department $r[2] \n";
17) }
…
$_POST['emp_dno']. If the user had entered ‘Engineer’ for the job and 5 for the
department number, the query would select the names of all engineers who worked
in department 5. As we can see, this is a dynamic query whose results differ depend-
ing on the choices that the user enters as input. We used two ? placeholders in this
example, as discussed at the end of Section 3.2.
The last query (lines 14 to 17) shows an alternative way of specifying a query and
looping over its rows. In this example, the function $d->getAll retrieves all the
records in the query result into a single variable, called $allresult. To loop over the
individual records, a foreach loop can be used, with the row variable $r iterating
over each row in $allresult.2
As we can see, PHP is suited for both database access and creating dynamic Web
pages.
2The $r variable plays a role similar to that of cursor and iterator variables in other database programming interfaces.
4 Summary
In this chapter, we gave an overview of how to convert some structured data from
databases into elements to be entered or displayed on a Web page. We focused on
the PHP scripting language, which is becoming very popular for Web database pro-
gramming. Section 1 presented some PHP basics for Web programming through a
simple example. Section 2 gave some of the basics of the PHP language, including its
array and string data types that are used extensively. Section 3 presented an
overview of how PHP can be used to specify various types of database commands,
including creating tables, inserting new records, and retrieving database records.
PHP runs on the server computer, in contrast to some other scripting languages
that run on the client computer.
We gave only a very basic introduction to PHP. There are many books as well as
many Web sites devoted to introductory and advanced PHP programming. Many
libraries of functions also exist for PHP, as it is an open source product.
Review Questions
1. Why are scripting languages popular for programming Web applications?
Where in the three-tier architecture does a PHP program execute? Where
does a JavaScript program execute?
2. What type of programming language is PHP?
3. Discuss the different ways of specifying strings in PHP.
4. Discuss the different types of arrays in PHP.
5. What are PHP auto-global variables? Give some examples of PHP auto-
global arrays, and discuss how each is typically used.
6. What is PEAR? What is PEAR DB?
7. Discuss the main functions for accessing a database in PEAR DB, and how
each is used.
8. Discuss the different ways for looping over a query result in PHP.
9. What are placeholders? How are they used in PHP database programming?
Exercises
10. Consider the LIBRARY database schema shown in Figure A.1. Write PHP
code to create the tables of this schema.
11. Write a PHP program that creates Web forms for entering the information
about a new BORROWER entity. Repeat for a new BOOK entity.
12. Write PHP Web interfaces for the queries specified in Exercise 18 from the
chapter “The Relational Algebra and Relational Calculus.”
Selected Bibliography
There are many sources for PHP programming, both in print and on the Web. We
give two books as examples. A very good introduction to PHP is given in Sklar
(2005). For advanced Web site development, the book by Schlossnagle (2005) pro-
vides many detailed examples.
Figure A.1
A relational database schema for a LIBRARY database.

BOOK(Book_id, Title, Publisher_name)
BOOK_AUTHORS(Book_id, Author_name)
PUBLISHER(Name, Address, Phone)
BOOK_COPIES(Book_id, Branch_id, No_of_copies)
BOOK_LOANS(Book_id, Branch_id, Card_no, Date_out, Due_date)
LIBRARY_BRANCH(Branch_id, Branch_name, Address)
BORROWER(Card_no, Name, Address, Phone)
Basics of Functional
Dependencies and Normalization
for Relational Databases
Each relation schema consists of a number of attributes, and the relational database schema consists of
a number of relation schemas. You may have assumed that attributes are grouped to
form a relation schema by using the common sense of the database designer or by
mapping a database schema design from a conceptual data model such as the
Entity-Relationship (ER) or Enhanced-ER (EER) data model. These models make
the designer identify entity types and relationship types and their respective attrib-
utes, which leads to a natural and logical grouping of the attributes into relations
when mapping procedures are followed. However, we need some formal way of ana-
lyzing why one grouping of attributes into a relation schema may be better than
another. While discussing database design, you may not have developed any meas-
ure of appropriateness or goodness to measure the quality of the design, other than
the intuition of the designer. In this chapter we discuss some of the theory that has
been developed with the goal of evaluating relational schemas for design quality—
that is, to measure formally why one set of groupings of attributes into relation
schemas is better than another.
There are two levels at which we can discuss the goodness of relation schemas. The
first is the logical (or conceptual) level—how users interpret the relation schemas
and the meaning of their attributes. Having good relation schemas at this level
enables users to understand clearly the meaning of the data in the relations, and
hence to formulate their queries correctly. The second is the implementation (or
From Chapter 15 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
physical storage) level—how the tuples in a base relation are stored and updated.
This level applies only to schemas of base relations—which will be physically stored
as files—whereas at the logical level we are interested in schemas of both base rela-
tions and views (virtual relations). The relational database design theory developed
in this chapter applies mainly to base relations, although some criteria of appropri-
ateness also apply to views, as shown in Section 1.
As with many design problems, database design may be performed using two
approaches: bottom-up or top-down. A bottom-up design methodology (also
called design by synthesis) considers the basic relationships among individual attrib-
utes as the starting point and uses those to construct relation schemas. This
approach is not very popular in practice1 because it suffers from the problem of
having to collect a large number of binary relationships among attributes as the
starting point. For practical situations, it is next to impossible to capture binary
relationships among all such pairs of attributes. In contrast, a top-down design
methodology (also called design by analysis) starts with a number of groupings of
attributes into relations that exist together naturally, for example, on an invoice, a
form, or a report. The relations are then analyzed individually and collectively, lead-
ing to further decomposition until all desirable properties are met. The theory
described in this chapter is applicable to both the top-down and bottom-up design
approaches, but is more appropriate when used with the top-down approach.
Relational database design ultimately produces a set of relations. The implicit goals
of the design activity are information preservation and minimum redundancy.
Information is very hard to quantify—hence we consider information preservation
in terms of maintaining all concepts, including attribute types, entity types, and
relationship types as well as generalization/specialization relationships, which are
described using a model such as the EER model. Thus, the relational design must
preserve all of these concepts, which are originally captured in the conceptual
design after the conceptual to logical design mapping. Minimizing redundancy
implies minimizing redundant storage of the same information and reducing the
need for multiple updates to maintain consistency across multiple copies of the
same information in response to real-world events that require making an update.
We start this chapter by informally discussing some criteria for good and bad rela-
tion schemas in Section 1. In Section 2, we define the concept of functional depend-
ency, a formal constraint among attributes that is the main tool for formally
measuring the appropriateness of attribute groupings into relation schemas. In
Section 3, we discuss normal forms and the process of normalization using func-
tional dependencies. Successive normal forms are defined to meet a set of desirable
constraints expressed using functional dependencies. The normalization procedure
consists of applying a series of tests to relations to meet these increasingly stringent
requirements and decompose the relations when necessary. In Section 4, we discuss
1An exception in which this approach is used in practice is based on a model called the binary relational
model. An example is the NIAM methodology (Verheijen and VanBekkum, 1982).
more general definitions of normal forms that can be directly applied to any given
design and do not require step-by-step analysis and normalization. Sections 5 to 7
discuss further normal forms up to the fifth normal form. In Section 6 we introduce
the multivalued dependency (MVD), followed by the join dependency (JD) in
Section 7. Section 8 summarizes the chapter.
Further study should continue the development of the theory related to the design
of good relational schemas. This includes: desirable properties of relational decom-
position—nonadditive join property and functional dependency preservation
property; a general algorithm that tests whether or not a decomposition has the
nonadditive (or lossless) join property; properties of functional dependencies and
the concept of a minimal cover of dependencies; the bottom-up approach to data-
base design consisting of a set of algorithms to design relations in a desired normal
form; and additional types of dependencies that further enhance the evaluation of
the goodness of relation schemas.
1 Informal Design Guidelines
for Relation Schemas
Before discussing the formal theory of relational database design, we discuss four
informal guidelines that may be used as measures to determine the quality of relation
schema design:
■ Making sure that the semantics of the attributes is clear in the schema
■ Reducing the redundant information in tuples
■ Reducing the NULL values in tuples
■ Disallowing the possibility of generating spurious tuples
These measures are not always independent of one another, as we will see.
1.1 Imparting Clear Semantics to Attributes in Relations
Whenever we group attributes to form a relation schema, we assume that attributes
belonging to one relation have certain real-world meaning and a proper interpreta-
tion associated with them. The semantics of a relation refers to its meaning result-
ing from the interpretation of attribute values in a tuple. If the conceptual design is
done carefully and the mapping procedure is followed systematically, the relational
schema design should have a clear meaning.
Figure 1
A simplified COMPANY relational database schema.

EMPLOYEE(Ename, Ssn, Bdate, Address, Dnumber)   P.K.: Ssn; F.K.: Dnumber
DEPARTMENT(Dname, Dnumber, Dmgr_ssn)            P.K.: Dnumber; F.K.: Dmgr_ssn
DEPT_LOCATIONS(Dnumber, Dlocation)              P.K.: {Dnumber, Dlocation}; F.K.: Dnumber
PROJECT(Pname, Pnumber, Plocation, Dnum)        P.K.: Pnumber; F.K.: Dnum
WORKS_ON(Ssn, Pnumber, Hours)                   P.K.: {Ssn, Pnumber}; F.K.: Ssn, Pnumber
In general, the easier it is to explain the semantics of the relation, the better the rela-
tion schema design will be. To illustrate this, consider Figure 1, a simplified version
of the COMPANY relational database schema, and Figure 2, which presents an example
of populated relation states of this schema. The meaning of the EMPLOYEE relation
schema is quite simple: Each tuple represents an employee, with values for the
employee’s name (Ename), Social Security number (Ssn), birth date (Bdate), and
address (Address), and the number of the department that the employee works for
(Dnumber). The Dnumber attribute is a foreign key that represents an implicit rela-
tionship between EMPLOYEE and DEPARTMENT. The semantics of the
DEPARTMENT and PROJECT schemas are also straightforward: Each DEPARTMENT
tuple represents a department entity, and each PROJECT tuple represents a project
entity. The attribute Dmgr_ssn of DEPARTMENT relates a department to the
employee who is its manager, while Dnum of PROJECT relates a project to its con-
trolling department; both are foreign key attributes. The ease with which the mean-
ing of a relation’s attributes can be explained is an informal measure of how well the
relation is designed.
Figure 2
Sample database state for the relational database schema in Figure 1, showing populated states of the EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, and WORKS_ON relations.
Figure 3
Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.

(a) EMP_DEPT(Ename, Ssn, Bdate, Address, Dnumber, Dname, Dmgr_ssn)
(b) EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation), with its functional dependencies diagrammed as FD1, FD2, and FD3
The semantics of the other two relation schemas in Figure 1 are slightly more com-
plex. Each tuple in DEPT_LOCATIONS gives a department number (Dnumber) and
one of the locations of the department (Dlocation). Each tuple in WORKS_ON gives
an employee Social Security number (Ssn), the project number of one of the proj-
ects that the employee works on (Pnumber), and the number of hours per week that
the employee works on that project (Hours). However, both schemas have a well-
defined and unambiguous interpretation. The schema DEPT_LOCATIONS repre-
sents a multivalued attribute of DEPARTMENT, whereas WORKS_ON represents an
M:N relationship between EMPLOYEE and PROJECT. Hence, all the relation
schemas in Figure 1 may be considered as easy to explain and therefore good from
the standpoint of having clear semantics. We can thus formulate the following
informal design guideline.
Guideline 1
Design a relation schema so that it is easy to explain its meaning. Do not combine
attributes from multiple entity types and relationship types into a single relation.
Intuitively, if a relation schema corresponds to one entity type or one relationship
type, it is straightforward to interpret and to explain its meaning. Otherwise, if the
relation corresponds to a mixture of multiple entities and relationships, semantic
ambiguities will result and the relation cannot be easily explained.
Examples of Violating Guideline 1. The relation schemas in Figures 3(a) and
3(b) also have clear semantics. (The reader should ignore the lines under the rela-
tions for now; they are used to illustrate functional dependency notation, discussed
in Section 2.) A tuple in the EMP_DEPT relation schema in Figure 3(a) represents a
single employee but includes additional information—namely, the name (Dname)
of the department for which the employee works and the Social Security number
(Dmgr_ssn) of the department manager. For the EMP_PROJ relation in Figure 3(b),
each tuple relates an employee to a project but also includes the employee name
(Ename), project name (Pname), and project location (Plocation). Although there is
nothing wrong logically with these two relations, they violate Guideline 1 by mixing
attributes from distinct real-world entities: EMP_DEPT mixes attributes of employ-
ees and departments, and EMP_PROJ mixes attributes of employees and projects
and the WORKS_ON relationship. Hence, they fare poorly against the above meas-
ure of design quality. They may be used as views, but they cause problems when
used as base relations, as we discuss in the following section.
1.2 Redundant Information in Tuples
and Update Anomalies
One goal of schema design is to minimize the storage space used by the base rela-
tions (and hence the corresponding files). Grouping attributes into relation
schemas has a significant effect on storage space. For example, compare the space
used by the two base relations EMPLOYEE and DEPARTMENT in Figure 2 with that
for an EMP_DEPT base relation in Figure 4, which is the result of applying the
NATURAL JOIN operation to EMPLOYEE and DEPARTMENT. In EMP_DEPT, the
attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn)
are repeated for every employee who works for that department. In contrast, each
department’s information appears only once in the DEPARTMENT relation in Figure
2. Only the department number (Dnumber) is repeated in the EMPLOYEE relation
for each employee who works in that department as a foreign key. Similar com-
ments apply to the EMP_PROJ relation (see Figure 4), which augments the
WORKS_ON relation with additional attributes from EMPLOYEE and PROJECT.
Storing natural joins of base relations leads to an additional problem referred to as
update anomalies. These can be classified into insertion anomalies, deletion anom-
alies, and modification anomalies.2
Insertion Anomalies. Insertion anomalies can be differentiated into two types,
illustrated by the following examples based on the EMP_DEPT relation:
■ To insert a new employee tuple into EMP_DEPT, we must include either the
attribute values for the department that the employee works for, or NULLs (if
the employee does not work for a department as yet). For example, to insert
a new tuple for an employee who works in department number 5, we must
enter all the attribute values of department 5 correctly so that they are
consistent with the corresponding values for department 5 in other tuples in
EMP_DEPT. In the design of Figure 2, we do not have to worry about this
consistency problem because we enter only the department number in the
employee tuple; all other attribute values of department 5 are recorded only
once in the database, as a single tuple in the DEPARTMENT relation.
■ It is difficult to insert a new department that has no employees as yet in the
EMP_DEPT relation. The only way to do this is to place NULL values in the
2These anomalies were identified by Codd (1972a) to justify the need for normalization of relations, as
we shall discuss in Section 3.
Figure 4
Sample states for EMP_DEPT(Ename, Ssn, Bdate, Address, Dnumber, Dname, Dmgr_ssn) and EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation) resulting from applying NATURAL JOIN to the relations in Figure 2. These may be stored as base relations for performance reasons. The department attributes in EMP_DEPT, and the employee name and project attributes in EMP_PROJ, are repeated redundantly across tuples.
attributes for employee. This violates the entity integrity for EMP_DEPT
because Ssn is its primary key. Moreover, when the first employee is assigned
to that department, we do not need this tuple with NULL values any more.
This problem does not occur in the design of Figure 2 because a department
is entered in the DEPARTMENT relation whether or not any employees work
for it, and whenever an employee is assigned to that department, a corre-
sponding tuple is inserted in EMPLOYEE.
Deletion Anomalies. The problem of deletion anomalies is related to the second
insertion anomaly situation just discussed. If we delete from EMP_DEPT an
employee tuple that happens to represent the last employee working for a particular
department, the information concerning that department is lost from the database.
This problem does not occur in the database of Figure 2 because DEPARTMENT
tuples are stored separately.
Modification Anomalies. In EMP_DEPT, if we change the value of one of the
attributes of a particular department—say, the manager of department 5—we must
update the tuples of all employees who work in that department; otherwise, the
database will become inconsistent. If we fail to update some tuples, the same depart-
ment will be shown to have two different values for manager in different employee
tuples, which would be wrong.3
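To make the modification anomaly concrete, the following sketch (illustrative only; it assumes a PEAR DB connection $d as in the preceding chapter, and an arbitrary new manager Ssn) contrasts the two designs when the manager of department 5 changes:

// Design of Figure 2: the change touches exactly one DEPARTMENT tuple
$d->query('UPDATE DEPARTMENT SET Dmgr_ssn = ? WHERE Dnumber = 5', array('123456789'));

// EMP_DEPT stored as a base relation (Figure 4): the same change must be propagated
// to every employee tuple of department 5, or the redundant copies become inconsistent
$d->query('UPDATE EMP_DEPT SET Dmgr_ssn = ? WHERE Dnumber = 5', array('123456789'));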
It is easy to see that these three types of anomalies are undesirable: they make it
difficult to maintain the consistency of the data and they require unnecessary updates
that could otherwise be avoided; hence, we can state the next guideline as follows.
Guideline 2
Design the base relation schemas so that no insertion, deletion, or modification
anomalies are present in the relations. If any anomalies are present,4 note them
clearly and make sure that the programs that update the database will operate
correctly.
The second guideline is consistent with and, in a way, a restatement of the first
guideline. We can also see the need for a more formal approach to evaluating
whether a design meets these guidelines. Sections 2 through 4 provide these needed
formal concepts. It is important to note that these guidelines may sometimes have to
be violated in order to improve the performance of certain queries. If EMP_DEPT is
used as a stored relation (otherwise known as a materialized view) in addition to the
base relations of EMPLOYEE and DEPARTMENT, the anomalies in EMP_DEPT must
be noted and accounted for (for example, by using triggers or stored procedures
that would make automatic updates). This way, whenever the base relation is
updated, we do not end up with inconsistencies. In general, it is advisable to use
anomaly-free base relations and to specify views that include the joins for placing
together the attributes frequently referenced in important queries.
1.3 NULL Values in Tuples
In some schema designs we may group many attributes together into a “fat” rela-
tion. If many of the attributes do not apply to all tuples in the relation, we end up
with many NULLs in those tuples. This can waste space at the storage level and may
3This is not as serious as the other problems, because all tuples can be updated by a single SQL query.
4Other application considerations may dictate and make certain anomalies unavoidable. For example, the
EMP_DEPT relation may correspond to a query or a report that is frequently required.
also lead to problems with understanding the meaning of the attributes and with
specifying JOIN operations at the logical level.5 Another problem with NULLs is how
to account for them when aggregate operations such as COUNT or SUM are applied.
SELECT and JOIN operations involve comparisons; if NULL values are present, the
results may become unpredictable.6 Moreover, NULLs can have multiple interpreta-
tions, such as the following:
■ The attribute does not apply to this tuple. For example, Visa_status may not
apply to U.S. students.
■ The attribute value for this tuple is unknown. For example, the Date_of_birth
may be unknown for an employee.
■ The value is known but absent; that is, it has not been recorded yet. For exam-
ple, the Home_Phone_Number for an employee may exist, but may not be
available and recorded yet.
Having the same representation for all NULLs compromises the different meanings
they may have. Therefore, we may state another guideline.
Guideline 3
As far as possible, avoid placing attributes in a base relation whose values may fre-
quently be NULL. If NULLs are unavoidable, make sure that they apply in exceptional
cases only and do not apply to a majority of tuples in the relation.
Using space efficiently and avoiding joins with NULL values are the two overriding
criteria that determine whether to include the columns that may have NULLs in a
relation or to have a separate relation for those columns (with the appropriate key
columns). For example, if only 15 percent of employees have individual offices,
there is little justification for including an attribute Office_number in the EMPLOYEE
relation; rather, a relation EMP_OFFICES(Essn, Office_number) can be created to
include tuples for only the employees with individual offices.
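As an illustrative sketch of this guideline (the column types and the PEAR DB connection $d are assumptions, not from the text), the separate relation could be declared as:

$d->query('CREATE TABLE EMP_OFFICES
           ( Essn           CHAR(9)      NOT NULL,
             Office_number  VARCHAR(10)  NOT NULL,
             PRIMARY KEY (Essn),
             FOREIGN KEY (Essn) REFERENCES EMPLOYEE (Ssn) )');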
1.4 Generation of Spurious Tuples
Consider the two relation schemas EMP_LOCS and EMP_PROJ1 in Figure 5(a),
which can be used instead of the single EMP_PROJ relation in Figure 3(b). A tuple in
EMP_LOCS means that the employee whose name is Ename works on some project
whose location is Plocation. A tuple in EMP_PROJ1 refers to the fact that the
employee whose Social Security number is Ssn works Hours per week on the project
whose name, number, and location are Pname, Pnumber, and Plocation. Figure 5(b)
shows relation states of EMP_LOCS and EMP_PROJ1 corresponding to the
5This is because inner and outer joins produce different results when NULLs are involved in joins. The
users must thus be aware of the different meanings of the various types of joins. Although this is reason-
able for sophisticated users, it may be difficult for others.
6Recall comparisons involving NULL values where the outcome (in three-valued logic) are TRUE,
FALSE, and UNKNOWN.
Figure 5
Particularly poor design for the EMP_PROJ relation in Figure 3(b). (a) The two relation schemas EMP_LOCS(Ename, Plocation) and EMP_PROJ1(Ssn, Pnumber, Hours, Pname, Plocation). (b) The result of projecting the extension of EMP_PROJ from Figure 4 onto the relations EMP_LOCS and EMP_PROJ1.
EMP_PROJ relation in Figure 4, which are obtained by applying the appropriate
PROJECT (π) operations to EMP_PROJ (ignore the dashed lines in Figure 5(b) for
now).
Suppose that we used EMP_PROJ1 and EMP_LOCS as the base relations instead of
EMP_PROJ. This produces a particularly bad schema design because we cannot
recover the information that was originally in EMP_PROJ from EMP_PROJ1 and
EMP_LOCS. If we attempt a NATURAL JOIN operation on EMP_PROJ1 and
EMP_LOCS, the result produces many more tuples than the original set of tuples in
EMP_PROJ. In Figure 6, the result of applying the join to only the tuples above the
dashed lines in Figure 5(b) is shown (to reduce the size of the resulting relation).
Additional tuples that were not in EMP_PROJ are called spurious tuples because
Figure 6
Result of applying NATURAL JOIN to the tuples above the dashed lines in EMP_PROJ1 and EMP_LOCS of Figure 5. Generated spurious tuples are marked by asterisks.
they represent spurious information that is not valid. The spurious tuples are
marked by asterisks (*) in Figure 6.
Decomposing EMP_PROJ into EMP_LOCS and EMP_PROJ1 is undesirable because
when we JOIN them back using NATURAL JOIN, we do not get the correct original
information. This is because in this case Plocation is the attribute that relates
EMP_LOCS and EMP_PROJ1, and Plocation is neither a primary key nor a foreign
key in either EMP_LOCS or EMP_PROJ1. We can now informally state another
design guideline.
Guideline 4
Design relation schemas so that they can be joined with equality conditions on
attributes that are appropriately related (primary key, foreign key) pairs in a way
that guarantees that no spurious tuples are generated. Avoid relations that contain
matching attributes that are not (foreign key, primary key) combinations because
joining on such attributes may produce spurious tuples.
This informal guideline obviously needs to be stated more formally. There is a for-
mal condition called the nonadditive (or lossless) join property that guarantees that
certain joins do not produce spurious tuples.
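To see the problem concretely, the following self-contained sketch (illustrative only; the three tuples use values taken from Figure 5) computes a NATURAL JOIN of EMP_LOCS and EMP_PROJ1 stand-ins on their only common attribute, Plocation, which is not a key of either relation:

// Two EMP_LOCS tuples and one EMP_PROJ1 tuple, with values taken from Figure 5
$emp_locs  = array(array('Ename' => 'Smith, John B.',    'Plocation' => 'Bellaire'),
                   array('Ename' => 'English, Joyce A.', 'Plocation' => 'Bellaire'));
$emp_proj1 = array(array('Ssn' => '123456789', 'Pnumber' => 1, 'Hours' => 32.5,
                         'Pname' => 'ProductX', 'Plocation' => 'Bellaire'));

// NATURAL JOIN: combine tuples that agree on the shared attribute Plocation
$joined = array();
foreach ($emp_proj1 as $p) {
  foreach ($emp_locs as $l) {
    if ($p['Plocation'] == $l['Plocation']) {
      $joined[] = array_merge($p, array('Ename' => $l['Ename']));
    }
  }
}
print_r($joined);
// The second result row pairs Ssn 123456789 (Smith) with the name 'English, Joyce A.'
// -- a spurious tuple that was never in the original EMP_PROJ.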
1.5 Summary and Discussion of Design Guidelines
In Sections 1.1 through 1.4, we informally discussed situations that lead to prob-
lematic relation schemas and we proposed informal guidelines for a good relational
design. The problems we pointed out, which can be detected without additional
tools of analysis, are as follows:
■ Anomalies that cause redundant work to be done during insertion into and
modification of a relation, and that may cause accidental loss of information
during a deletion from a relation
■ Waste of storage space due to NULLs and the difficulty of performing selec-
tions, aggregation operations, and joins due to NULL values
■ Generation of invalid and spurious data during joins on base relations with
matched attributes that may not represent a proper (foreign key, primary
key) relationship
In the rest of this chapter we present formal concepts and theory that may be used
to define the goodness and badness of individual relation schemas more precisely.
First we discuss functional dependency as a tool for analysis. Then we specify the
three normal forms and Boyce-Codd normal form (BCNF) for relation schemas.
The strategy for achieving a good design is to decompose a badly designed relation
appropriately. We also briefly introduce additional normal forms that deal with
additional dependencies.
2 Functional Dependencies
So far we have dealt with the informal measures of database design. We now intro-
duce a formal tool for analysis of relational schemas that enables us to detect and
describe some of the above-mentioned problems in precise terms. The single most
important concept in relational schema design theory is that of a functional
dependency. In this section we formally define the concept, and in Section 3 we see
how it can be used to define normal forms for relation schemas.
2.1 Definition of Functional Dependency
A functional dependency is a constraint between two sets of attributes from the
database. Suppose that our relational database schema has n attributes A1, A2, …,
An; let us think of the whole database as being described by a single universal
relation schema R = {A1, A2, …, An}.7 We do not imply that we will actually store the
database as a single universal table; we use this concept only in developing the for-
mal theory of data dependencies.8
Definition. A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible
tuples that can form a relation state r of R. The constraint is that, for any two
tuples t1 and t2 in r that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are
determined by, the values of the X component; alternatively, the values of the X com-
ponent of a tuple uniquely (or functionally) determine the values of the Y compo-
nent. We also say that there is a functional dependency from X to Y, or that Y is
functionally dependent on X. The abbreviation for functional dependency is FD or
f.d. The set of attributes X is called the left-hand side of the FD, and Y is called the
right-hand side.
Thus, X functionally determines Y in a relation schema R if, and only if, whenever
two tuples of r(R) agree on their X-value, they must necessarily agree on their Y-
value. Note the following:
■ If a constraint on R states that there cannot be more than one tuple with a
given X-value in any relation instance r(R)—that is, X is a candidate key of
R—this implies that X → Y for any subset of attributes Y of R (because the
key constraint implies that no two tuples in any legal state r(R) will have the
same value of X). If X is a candidate key of R, then X → R.
■ If X → Y in R, this does not say whether or not Y → X in R.
A functional dependency is a property of the semantics or meaning of the attrib-
utes. The database designers will use their understanding of the semantics of the
attributes of R—that is, how they relate to one another—to specify the functional
dependencies that should hold on all relation states (extensions) r of R. Whenever
the semantics of two sets of attributes in R indicate that a functional dependency
should hold, we specify the dependency as a constraint. Relation extensions r(R)
that satisfy the functional dependency constraints are called legal relation states (or
legal extensions) of R. Hence, the main use of functional dependencies is to
describe further a relation schema R by specifying constraints on its attributes that
must hold at all times. Certain FDs can be specified without referring to a specific
relation, but as a property of those attributes given their commonly understood
meaning. For example, {State, Driver_license_number} → Ssn should hold for any
adult in the United States and hence should hold whenever these attributes appear
in a relation. It is also possible that certain functional dependencies may cease to
7This concept of a universal relation is important in the discussion of algorithms for relational database
design.
8This assumption implies that every attribute in the database should have a distinct name.
Figure 7
A relation state of TEACH with a possible functional dependency TEXT → COURSE. However, TEACHER → COURSE is ruled out.

TEACH
Teacher   Course            Text
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffman
Brown     Data Structures   Horowitz
exist in the real world if the relationship changes. For example, the FD Zip_code →
Area_code used to exist as a relationship between postal codes and telephone num-
ber codes in the United States, but with the proliferation of telephone area codes it
is no longer true.
Consider the relation schema EMP_PROJ in Figure 3(b); from the semantics of the
attributes and the relation, we know that the following functional dependencies
should hold:
a. Ssn → Ename
b. Pnumber →{Pname, Plocation}
c. {Ssn, Pnumber} → Hours
These functional dependencies specify that (a) the value of an employee’s Social
Security number (Ssn) uniquely determines the employee name (Ename), (b) the
value of a project’s number (Pnumber) uniquely determines the project name
(Pname) and location (Plocation), and (c) a combination of Ssn and Pnumber values
uniquely determines the number of hours the employee currently works on the
project per week (Hours). Alternatively, we say that Ename is functionally determined
by (or functionally dependent on) Ssn, or given a value of Ssn, we know the value of
Ename, and so on.
A functional dependency is a property of the relation schema R, not of a particular
legal relation state r of R. Therefore, an FD cannot be inferred automatically from a
given relation extension r but must be defined explicitly by someone who knows the
semantics of the attributes of R. For example, Figure 7 shows a particular state of the
TEACH relation schema. Although at first glance we may think that Text → Course,
we cannot confirm this unless we know that it is true for all possible legal states of
TEACH. It is, however, sufficient to demonstrate a single counterexample to disprove
a functional dependency. For example, because ‘Smith’ teaches both ‘Data
Structures’ and ‘Data Management,’ we can conclude that Teacher does not function-
ally determine Course.
Given a populated relation, one cannot determine which FDs hold and which do
not unless the meaning of and the relationships among the attributes are known. All
one can say is that a certain FD may exist if it holds in that particular extension. One
cannot guarantee its existence until the meaning of the corresponding attributes is
clearly understood. One can, however, emphatically state that a certain FD does not
Figure 8
A relation R (A, B, C, D)
with its extension.
A B C D
a1 b1 c1 d1
a1 b2 c2 d2
a2 b2 c2 d3
a3 b3 c4 d3
hold if there are tuples that show the violation of such an FD. See the illustrative
example relation in Figure 8. Here, the following FDs may hold because the four
tuples in the current extension have no violation of these constraints: B → C;
C → B; {A, B} → C; {A, B} → D; and {C, D} → B. However, the following do not
hold because we already have violations of them in the given extension: A → B
(tuples 1 and 2 violate this constraint); B → A (tuples 2 and 3 violate this con-
straint); D → C (tuples 3 and 4 violate it).
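These claims can be checked mechanically. The following sketch (illustrative, not from the text) tests whether an FD X → Y is violated in a given extension by searching for two tuples that agree on X but disagree on Y; recall that such a test can only disprove an FD, never establish it as a schema constraint:

// Returns true if no pair of tuples in the extension violates X -> Y
function fd_holds($tuples, $X, $Y) {
  foreach ($tuples as $t1) {
    foreach ($tuples as $t2) {
      $agree_on_X = true;
      foreach ($X as $a) { if ($t1[$a] != $t2[$a]) { $agree_on_X = false; break; } }
      if (!$agree_on_X) continue;
      foreach ($Y as $a) { if ($t1[$a] != $t2[$a]) { return false; } }   // counterexample found
    }
  }
  return true;
}

// The extension of Figure 8
$r = array(array('A' => 'a1', 'B' => 'b1', 'C' => 'c1', 'D' => 'd1'),
           array('A' => 'a1', 'B' => 'b2', 'C' => 'c2', 'D' => 'd2'),
           array('A' => 'a2', 'B' => 'b2', 'C' => 'c2', 'D' => 'd3'),
           array('A' => 'a3', 'B' => 'b3', 'C' => 'c4', 'D' => 'd3'));

var_dump(fd_holds($r, array('B'), array('C')));   // true: B -> C has no violation here
var_dump(fd_holds($r, array('A'), array('B')));   // false: tuples 1 and 2 violate A -> B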
Figure 3 introduces a diagrammatic notation for displaying FDs: Each FD is dis-
played as a horizontal line. The left-hand-side attributes of the FD are connected by
vertical lines to the line representing the FD, while the right-hand-side attributes are
connected by the lines with arrows pointing toward the attributes.
We denote by F the set of functional dependencies that are specified on relation
schema R. Typically, the schema designer specifies the functional dependencies that
are semantically obvious; usually, however, numerous other functional dependencies
hold in all legal relation instances among sets of attributes that can be derived from
and satisfy the dependencies in F. Those other dependencies can be inferred or
deduced from the FDs in F.
3 Normal Forms Based on Primary Keys
Having introduced functional dependencies, we are now ready to use them to spec-
ify some aspects of the semantics of relation schemas. We assume that a set of func-
tional dependencies is given for each relation, and that each relation has a
designated primary key; this information combined with the tests (conditions) for
normal forms drives the normalization process for relational schema design. Most
practical relational design projects take one of the following two approaches:
■ Perform a conceptual schema design using a conceptual model such as ER or
EER and map the conceptual design into a set of relations
■ Design the relations based on external knowledge derived from an existing
implementation of files or forms or reports
Following either of these approaches, it is then useful to evaluate the relations for
goodness and decompose them further as needed to achieve higher normal forms,
using the normalization theory presented in this chapter and the next. We focus in
this section on the first three normal forms for relation schemas and the intuition
behind them, and discuss how they were developed historically. More general defi-
nitions of these normal forms, which take into account all candidate keys of a rela-
tion rather than just the primary key, are deferred to Section 4.
We start by informally discussing normal forms and the motivation behind their
development, as well as reviewing some definitions that are needed here. Then we
discuss the first normal form (1NF) in Section 3.4, and present the definitions of
second normal form (2NF) and third normal form (3NF), which are based on pri-
mary keys, in Sections 3.5 and 3.6, respectively.
3.1 Normalization of Relations
The normalization process, as first proposed by Codd (1972a), takes a relation
schema through a series of tests to certify whether it satisfies a certain normal form.
The process, which proceeds in a top-down fashion by evaluating each relation
against the criteria for normal forms and decomposing relations as necessary, can
thus be considered as relational design by analysis. Initially, Codd proposed three
normal forms, which he called first, second, and third normal form. A stronger def-
inition of 3NF—called Boyce-Codd normal form (BCNF)—was proposed later by
Boyce and Codd. All these normal forms are based on a single analytical tool: the
functional dependencies among the attributes of a relation. Later, a fourth normal
form (4NF) and a fifth normal form (5NF) were proposed, based on the concepts of
multivalued dependencies and join dependencies, respectively; these are briefly dis-
cussed in Sections 6 and 7.
Normalization of data can be considered a process of analyzing the given relation
schemas based on their FDs and primary keys to achieve the desirable properties of
(1) minimizing redundancy and (2) minimizing the insertion, deletion, and update
anomalies discussed in Section 1.2. It can be considered as a “filtering” or “purifica-
tion” process to make the design have successively better quality. Unsatisfactory
relation schemas that do not meet certain conditions—the normal form tests—are
decomposed into smaller relation schemas that meet the tests and hence possess the
desirable properties. Thus, the normalization procedure provides database design-
ers with the following:
■ A formal framework for analyzing relation schemas based on their keys and
on the functional dependencies among their attributes
■ A series of normal form tests that can be carried out on individual relation
schemas so that the relational database can be normalized to any desired
degree
Definition. The normal form of a relation refers to the highest normal form
condition that it meets, and hence indicates the degree to which it has been nor-
malized.
Normal forms, when considered in isolation from other factors, do not guarantee a
good database design. It is generally not sufficient to check separately that each
relation schema in the database is, say, in BCNF or 3NF. Rather, the process of nor-
malization through decomposition must also confirm the existence of additional
properties that the relational schemas, taken together, should possess. These would
include two properties:
■ The nonadditive join or lossless join property, which guarantees that the
spurious tuple generation problem discussed in Section 1.4 does not occur
with respect to the relation schemas created after decomposition.
■ The dependency preservation property, which ensures that each functional
dependency is represented in some individual relation resulting after
decomposition.
The nonadditive join property is extremely critical and must be achieved at any
cost, whereas the dependency preservation property, although desirable, is some-
times sacrificed.
3.2 Practical Use of Normal Forms
Most practical design projects acquire existing designs of databases from previous
designs, designs in legacy models, or from existing files. Normalization is carried
out in practice so that the resulting designs are of high quality and meet the desir-
able properties stated previously. Although several higher normal forms have been
defined, such as the 4NF and 5NF that we discuss in Sections 6 and 7, the practical
utility of these normal forms becomes questionable when the constraints on which
they are based are rare, and hard to understand or to detect by the database design-
ers and users who must discover these constraints. Thus, database design as prac-
ticed in industry today pays particular attention to normalization only up to 3NF,
BCNF, or at most 4NF.
Another point worth noting is that the database designers need not normalize to the
highest possible normal form. Relations may be left in a lower normalization status,
such as 2NF, for performance reasons, such as those discussed at the end of Section
1.2. Doing so incurs the corresponding penalties of dealing with the anomalies.
Definition. Denormalization is the process of storing the join of higher nor-
mal form relations as a base relation, which is in a lower normal form.
3.3 Definitions of Keys and Attributes
Participating in Keys
Before proceeding further, let’s look again at the definitions of keys of a relation
schema.
Definition. A superkey of a relation schema R = {A1, A2, … , An} is a set of
attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal rela-
tion state r of R will have t1[S] = t2[S]. A key K is a superkey with the additional
property that removal of any attribute from K will cause K not to be a superkey
any more.
The difference between a key and a superkey is that a key has to be minimal; that is,
if we have a key K = {A1, A2, …, Ak} of R, then K – {Ai} is not a key of R for any Ai, 1
≤ i ≤ k. In Figure 1, {Ssn} is a key for EMPLOYEE, whereas {Ssn}, {Ssn, Ename}, {Ssn,
Ename, Bdate}, and any set of attributes that includes Ssn are all superkeys.
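Because a superkey S must functionally determine every attribute of R, the fd_holds sketch from Section 2 can also be reused to test a candidate superkey against an extension; as before, such a test can only rule a superkey out, never prove the constraint. A hedged example over the Figure 8 extension $r defined earlier:

// A violation exists if two tuples agree on S but differ on some other attribute
function is_superkey($tuples, $S) {
  $all_attributes = array_keys($tuples[0]);
  return fd_holds($tuples, $S, $all_attributes);
}

var_dump(is_superkey($r, array('A', 'B')));   // true: no violation in this extension
var_dump(is_superkey($r, array('A')));        // false: tuples 1 and 2 agree on A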
If a relation schema has more than one key, each is called a candidate key. One of
the candidate keys is arbitrarily designated to be the primary key, and the others are
called secondary keys. In a practical relational database, each relation schema must
have a primary key. If no candidate key is known for a relation, the entire relation
can be treated as a default superkey. In Figure 1, {Ssn} is the only candidate key for
EMPLOYEE, so it is also the primary key.
Definition. An attribute of relation schema R is called a prime attribute of R if
it is a member of some candidate key of R. An attribute is called nonprime if it
is not a prime attribute—that is, if it is not a member of any candidate key.
In Figure 1, both Ssn and Pnumber are prime attributes of WORKS_ON, whereas
other attributes of WORKS_ON are nonprime.
We now present the first three normal forms: 1NF, 2NF, and 3NF. These were pro-
posed by Codd (1972a) as a sequence to achieve the desirable state of 3NF relations
by progressing through the intermediate states of 1NF and 2NF if needed. As we
shall see, 2NF and 3NF attack different problems. However, for historical reasons, it
is customary to follow them in that sequence; hence, by definition a 3NF relation
already satisfies 2NF.
3.4 First Normal Form
First normal form (1NF) is now considered to be part of the formal definition of a
relation in the basic (flat) relational model; historically, it was defined to disallow
multivalued attributes, composite attributes, and their combinations. It states that
the domain of an attribute must include only atomic (simple, indivisible) values and
that the value of any attribute in a tuple must be a single value from the domain of
that attribute. Hence, 1NF disallows having a set of values, a tuple of values, or a
combination of both as an attribute value for a single tuple. In other words, 1NF dis-
allows relations within relations or relations as attribute values within tuples. The only
attribute values permitted by 1NF are single atomic (or indivisible) values.
Consider the DEPARTMENT relation schema shown in Figure 1, whose primary key
is Dnumber, and suppose that we extend it by including the Dlocations attribute as
shown in Figure 9(a). We assume that each department can have a number of loca-
tions. The DEPARTMENT schema and a sample relation state are shown in Figure 9.
As we can see, this is not in 1NF because Dlocations is not an atomic attribute, as
illustrated by the first tuple in Figure 9(b). There are two ways we can look at the
Dlocations attribute:
■ The domain of Dlocations contains atomic values, but some tuples can have a
set of these values. In this case, Dlocations is not functionally dependent on
the primary key Dnumber.
Figure 9
Normalization into 1NF. (a) A relation schema that is not in 1NF. (b) Sample state of relation DEPARTMENT. (c) 1NF version of the same relation with redundancy.

(a) DEPARTMENT(Dname, Dnumber, Dmgr_ssn, Dlocations)

(b) DEPARTMENT
    Dname            Dnumber   Dmgr_ssn    Dlocations
    Research         5         333445555   {Bellaire, Sugarland, Houston}
    Administration   4         987654321   {Stafford}
    Headquarters     1         888665555   {Houston}

(c) DEPARTMENT
    Dname            Dnumber   Dmgr_ssn    Dlocation
    Research         5         333445555   Bellaire
    Research         5         333445555   Sugarland
    Research         5         333445555   Houston
    Administration   4         987654321   Stafford
    Headquarters     1         888665555   Houston
■ The domain of Dlocations contains sets of values and hence is nonatomic. In
this case, Dnumber → Dlocations because each set is considered a single mem-
ber of the attribute domain.9
In either case, the DEPARTMENT relation in Figure 9 is not in 1NF; in fact, it does
not even qualify as a relation. There are three main techniques to achieve first nor-
mal form for such a relation:
1. Remove the attribute Dlocations that violates 1NF and place it in a separate
relation DEPT_LOCATIONS along with the primary key Dnumber of
DEPARTMENT. The primary key of this relation is the combination
{Dnumber, Dlocation}, as shown in Figure 2. A distinct tuple in
DEPT_LOCATIONS exists for each location of a department. This decomposes
the non-1NF relation into two 1NF relations.
9In this case we can consider the domain of Dlocations to be the power set of the set of single loca-
tions; that is, the domain is made up of all possible subsets of the set of single locations.
2. Expand the key so that there will be a separate tuple in the original
DEPARTMENT relation for each location of a DEPARTMENT, as shown in
Figure 9(c). In this case, the primary key becomes the combination
{Dnumber, Dlocation}. This solution has the disadvantage of introducing
redundancy in the relation.
3. If a maximum number of values is known for the attribute—for example, if it
is known that at most three locations can exist for a department—replace the
Dlocations attribute by three atomic attributes: Dlocation1, Dlocation2, and
Dlocation3. This solution has the disadvantage of introducing NULL values if
most departments have fewer than three locations. It further introduces spu-
rious semantics about the ordering among the location values that is not
originally intended. Querying on this attribute becomes more difficult; for
example, consider how you would write the query: List the departments that
have ‘Bellaire’ as one of their locations in this design.
Of the three solutions above, the first is generally considered best because it does
not suffer from redundancy and it is completely general, having no limit placed on
a maximum number of values. In fact, if we choose the second solution, it will be
decomposed further during subsequent normalization steps into the first solution.
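To make the first solution concrete, the following short Python sketch (our own illustration; the dictionaries and variable names are not part of the text) flattens a non-1NF DEPARTMENT state, shaped like the sample state in Figure 9(b), into the two 1NF relations DEPARTMENT and DEPT_LOCATIONS.

# Sketch of solution 1: remove the multivalued attribute Dlocations and place
# it in a separate relation DEPT_LOCATIONS together with the primary key Dnumber.
non_1nf_department = [
    {"Dname": "Research", "Dnumber": 5, "Dmgr_ssn": "333445555",
     "Dlocations": {"Bellaire", "Sugarland", "Houston"}},
    {"Dname": "Administration", "Dnumber": 4, "Dmgr_ssn": "987654321",
     "Dlocations": {"Stafford"}},
    {"Dname": "Headquarters", "Dnumber": 1, "Dmgr_ssn": "888665555",
     "Dlocations": {"Houston"}},
]

# 1NF relation DEPARTMENT keeps only the single-valued attributes.
department = [
    {"Dname": t["Dname"], "Dnumber": t["Dnumber"], "Dmgr_ssn": t["Dmgr_ssn"]}
    for t in non_1nf_department
]

# 1NF relation DEPT_LOCATIONS holds one tuple per (department, location) pair;
# its primary key is the combination {Dnumber, Dlocation}.
dept_locations = [
    {"Dnumber": t["Dnumber"], "Dlocation": loc}
    for t in non_1nf_department
    for loc in sorted(t["Dlocations"])
]

print(department)
print(dept_locations)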
First normal form also disallows multivalued attributes that are themselves com-
posite. These are called nested relations because each tuple can have a relation
within it. Figure 10 shows how the EMP_PROJ relation could appear if nesting is
allowed. Each tuple represents an employee entity, and a relation PROJS(Pnumber,
Hours) within each tuple represents the employee’s projects and the hours per week
that employee works on each project. The schema of this EMP_PROJ relation can be
represented as follows:
EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)})
The set braces { } identify the attribute PROJS as multivalued, and we list the com-
ponent attributes that form PROJS between parentheses ( ). Interestingly, recent
trends for supporting complex objects and XML data attempt to allow and formal-
ize nested relations within relational database systems, which were disallowed early
on by 1NF.
Notice that Ssn is the primary key of the EMP_PROJ relation in Figures 10(a) and
(b), while Pnumber is the partial key of the nested relation; that is, within each tuple,
the nested relation must have unique values of Pnumber. To normalize this into 1NF,
we remove the nested relation attributes into a new relation and propagate the pri-
mary key into it; the primary key of the new relation will combine the partial key
with the primary key of the original relation. Decomposition and primary key
propagation yield the schemas EMP_PROJ1 and EMP_PROJ2, as shown in Figure
10(c).
This procedure can be applied recursively to a relation with multiple-level nesting
to unnest the relation into a set of 1NF relations. This is useful in converting an
unnormalized relation schema with many levels of nesting into 1NF relations.
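As a rough illustration of this unnesting step, the following Python sketch (hypothetical data in the shape of Figure 10, not the full sample extension shown there) removes the nested PROJS attribute and propagates the primary key Ssn.

# Sketch: unnest EMP_PROJ(Ssn, Ename, {PROJS(Pnumber, Hours)}) into
# EMP_PROJ1(Ssn, Ename) and EMP_PROJ2(Ssn, Pnumber, Hours).
emp_proj = [
    {"Ssn": "123456789", "Ename": "Smith, John B.",
     "Projs": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}]},
    {"Ssn": "666884444", "Ename": "Narayan, Ramesh K.",
     "Projs": [{"Pnumber": 3, "Hours": 40.0}]},
]

emp_proj1 = [{"Ssn": t["Ssn"], "Ename": t["Ename"]} for t in emp_proj]

# The primary key of the new relation combines the partial key Pnumber
# with the propagated primary key Ssn of the original relation.
emp_proj2 = [
    {"Ssn": t["Ssn"], "Pnumber": p["Pnumber"], "Hours": p["Hours"]}
    for t in emp_proj
    for p in t["Projs"]
]

print(emp_proj1)
print(emp_proj2)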
Figure 10
Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into relations EMP_PROJ1(Ssn, Ename) and EMP_PROJ2(Ssn, Pnumber, Hours) by propagating the primary key.
The existence of more than one multivalued attribute in one relation must be handled
carefully. As an example, consider the following non-1NF relation:
PERSON (Ss#, {Car_lic#}, {Phone#})
This relation represents the fact that a person has multiple cars and multiple
phones. If strategy 2 above is followed, it results in an all-key relation:
PERSON_IN_1NF (Ss#, Car_lic#, Phone#)
To avoid introducing any extraneous relationship between Car_lic# and Phone#, all
possible combinations of values are represented for every Ss#, giving rise to redun-
dancy. This leads to the problems handled by multivalued dependencies and 4NF,
which we will discuss in Section 6. The right way to deal with the two multivalued
attributes in PERSON shown previously is to decompose it into two separate rela-
tions, using strategy 1 discussed above: P1(Ss#, Car_lic#) and P2(Ss#, Phone#).
3.5 Second Normal Form
Second normal form (2NF) is based on the concept of full functional dependency. A
functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold any more; that is, for
any attribute A ∈ X, (X – {A}) does not functionally determine Y. A functional
dependency X → Y is a partial dependency if some attribute A ∈ X can be removed
from X and the dependency still holds; that is, for some A ∈ X, (X – {A}) → Y. In
Figure 3(b), {Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor
Pnumber → Hours holds). However, the dependency {Ssn, Pnumber} → Ename is par-
tial because Ssn → Ename holds.
Definition. A relation schema R is in 2NF if every nonprime attribute A in R is
fully functionally dependent on the primary key of R.
The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key. If the primary key contains a single attribute,
the test need not be applied at all. The EMP_PROJ relation in Figure 3(b) is in 1NF
but is not in 2NF. The nonprime attribute Ename violates 2NF because of FD2, as do
the nonprime attributes Pname and Plocation because of FD3. The functional
dependencies FD2 and FD3 make Ename, Pname, and Plocation partially dependent
on the primary key {Ssn, Pnumber} of EMP_PROJ, thus violating the 2NF test.
If a relation schema is not in 2NF, it can be second normalized or 2NF normalized
into a number of 2NF relations in which nonprime attributes are associated only
with the part of the primary key on which they are fully functionally dependent.
Therefore, the functional dependencies FD1, FD2, and FD3 in Figure 3(b) lead to the
decomposition of EMP_PROJ into the three relation schemas EP1, EP2, and EP3
shown in Figure 11(a), each of which is in 2NF.
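A simple way to mechanize this primary-key-based 2NF test is sketched below in Python (an illustration only; it inspects just the FDs that are explicitly listed, using the FDs of EMP_PROJ described above: FD1: {Ssn, Pnumber} → Hours, FD2: Ssn → Ename, FD3: Pnumber → {Pname, Plocation}).

# Sketch: a nonprime attribute violates 2NF (with respect to the primary key)
# if it is functionally determined by a proper subset of the primary key.
primary_key = {"Ssn", "Pnumber"}              # primary key of EMP_PROJ
fds = [                                       # FD1, FD2, FD3
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]

def partial_dependencies(fds, primary_key):
    """Return (lhs, nonprime attributes) pairs that violate the 2NF test."""
    violations = []
    for lhs, rhs in fds:
        nonprime_rhs = rhs - primary_key
        if lhs < primary_key and nonprime_rhs:   # lhs is a proper subset of the key
            violations.append((lhs, nonprime_rhs))
    return violations

print(partial_dependencies(fds, primary_key))
# Ename is partially dependent on Ssn, and Pname, Plocation on Pnumber,
# so EMP_PROJ is not in 2NF.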
3.6 Third Normal Form
Third normal form (3NF) is based on the concept of transitive dependency. A
functional dependency X → Y in a relation schema R is a transitive dependency if
there exists a set of attributes Z in R that is neither a candidate key nor a subset of
any key of R,10 and both X → Z and Z → Y hold. The dependency Ssn → Dmgr_ssn
is transitive through Dnumber in EMP_DEPT in Figure 3(a), because both the
dependencies Ssn → Dnumber and Dnumber → Dmgr_ssn hold and Dnumber is
neither a key itself nor a subset of the key of EMP_DEPT. Intuitively, we can see that
the dependency of Dmgr_ssn on Dnumber is undesirable in EMP_DEPT since
Dnumber is not a key of EMP_DEPT.
10This is the general definition of transitive dependency. Because we are concerned only with primary
keys in this section, we allow transitive dependencies where X is the primary key but Z may be (a subset
of) a candidate key.
Figure 11
Normalizing into 2NF and 3NF. (a) Normalizing EMP_PROJ into the 2NF relations EP1(Ssn, Pnumber, Hours), EP2(Ssn, Ename), and EP3(Pnumber, Pname, Plocation). (b) Normalizing EMP_DEPT into the 3NF relations ED1(Ssn, Ename, Bdate, Address, Dnumber) and ED2(Dnumber, Dname, Dmgr_ssn).
Definition. According to Codd’s original definition, a relation schema R is in
3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent
on the primary key.
The relation schema EMP_DEPT in Figure 3(a) is in 2NF, since no partial depen-
dencies on a key exist. However, EMP_DEPT is not in 3NF because of the transitive
dependency of Dmgr_ssn (and also Dname) on Ssn via Dnumber. We can normalize
Table 1 Summary of Normal Forms Based on Primary Keys and Corresponding Normalization

First (1NF)
Test: Relation should have no multivalued attributes or nested relations.
Remedy: Form new relations for each multivalued attribute or nested relation.

Second (2NF)
Test: For relations where the primary key contains multiple attributes, no nonkey attribute should be functionally dependent on a part of the primary key.
Remedy: Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it.

Third (3NF)
Test: Relation should not have a nonkey attribute functionally determined by another nonkey attribute (or by a set of nonkey attributes); that is, there should be no transitive dependency of a nonkey attribute on the primary key.
Remedy: Decompose and set up a relation that includes the nonkey attribute(s) that functionally determine(s) other nonkey attribute(s).
EMP_DEPT by decomposing it into the two 3NF relation schemas ED1 and ED2
shown in Figure 11(b). Intuitively, we see that ED1 and ED2 represent independent
entity facts about employees and departments. A NATURAL JOIN operation on ED1
and ED2 will recover the original relation EMP_DEPT without generating spurious
tuples.
Intuitively, we can see that any functional dependency in which the left-hand side is
part (a proper subset) of the primary key, or any functional dependency in which
the left-hand side is a nonkey attribute, is a problematic FD. 2NF and 3NF normal-
ization remove these problem FDs by decomposing the original relation into new
relations. In terms of the normalization process, it is not necessary to remove the
partial dependencies before the transitive dependencies, but historically, 3NF has
been defined with the assumption that a relation is tested for 2NF first before it is
tested for 3NF. Table 1 informally summarizes the three normal forms based on pri-
mary keys, the tests used in each case, and the corresponding remedy or normaliza-
tion performed to achieve the normal form.
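The two kinds of problem FDs just described can also be recognized mechanically. The Python sketch below is our own simplified illustration (it looks only at FDs that are explicitly given); it labels the FDs of EMP_DEPT discussed above with respect to its primary key Ssn.

# Sketch: with respect to the primary key, an FD is "partial" if its left-hand
# side is a proper subset of the key, and "transitive" if its left-hand side
# consists entirely of nonkey attributes.
def classify_fds(fds, primary_key):
    labels = {}
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        if lhs < primary_key:
            labels[(lhs, rhs)] = "partial dependency (violates 2NF)"
        elif not lhs & primary_key:
            labels[(lhs, rhs)] = "transitive dependency (violates 3NF)"
        else:
            labels[(lhs, rhs)] = "dependency on the key"
    return labels

emp_dept_fds = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),        # nonkey attribute as left-hand side
]
for fd, label in classify_fds(emp_dept_fds, frozenset({"Ssn"})).items():
    print(fd, "->", label)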
4 General Definitions of Second
and Third Normal Forms
In general, we want to design our relation schemas so that they have neither partial
nor transitive dependencies because these types of dependencies cause the update
anomalies discussed in Section 1.2. The steps for normalization into 3NF relations
that we have discussed so far disallow partial and transitive dependencies on the
primary key. The normalization procedure described so far is useful for analysis in
practical situations for a given database where primary keys have already been
defined. These definitions, however, do not take other candidate keys of a relation, if
any, into account. In this section we give the more general definitions of 2NF and
3NF that take all candidate keys of a relation into account. Notice that this does not
affect the definition of 1NF since it is independent of keys and functional depen-
dencies. As a general definition of prime attribute, an attribute that is part of any
candidate key will be considered as prime. Partial and full functional dependencies
and transitive dependencies will now be considered with respect to all candidate keys
of a relation.
4.1 General Definition of Second Normal Form
Definition. A relation schema R is in second normal form (2NF) if every non-
prime attribute A in R is not partially dependent on any key of R.11
The test for 2NF involves testing for functional dependencies whose left-hand side
attributes are part of the primary key. If the primary key contains a single attribute,
the test need not be applied at all. Consider the relation schema LOTS shown in
Figure 12(a), which describes parcels of land for sale in various counties of a state.
Suppose that there are two candidate keys: Property_id# and {County_name, Lot#};
that is, lot numbers are unique only within each county, but Property_id# numbers
are unique across counties for the entire state.
Based on the two candidate keys Property_id# and {County_name, Lot#}, the func-
tional dependencies FD1 and FD2 in Figure 12(a) hold. We choose Property_id# as
the primary key, so it is underlined in Figure 12(a), but no special consideration will
be given to this key over the other candidate key. Suppose that the following two
additional functional dependencies hold in LOTS:
FD3: County_name → Tax_rate
FD4: Area → Price
In words, the dependency FD3 says that the tax rate is fixed for a given county (does
not vary lot by lot within the same county), while FD4 says that the price of a lot is
determined by its area regardless of which county it is in. (Assume that this is the
price of the lot for tax purposes.)
The LOTS relation schema violates the general definition of 2NF because Tax_rate is
partially dependent on the candidate key {County_name, Lot#}, due to FD3. To nor-
malize LOTS into 2NF, we decompose it into the two relations LOTS1 and LOTS2,
shown in Figure 12(b). We construct LOTS1 by removing the attribute Tax_rate that
violates 2NF from LOTS and placing it with County_name (the left-hand side of FD3
that causes the partial dependency) into another relation LOTS2. Both LOTS1 and
LOTS2 are in 2NF. Notice that FD4 does not violate 2NF and is carried over to
LOTS1.
11This definition can be restated as follows: A relation schema R is in 2NF if every nonprime attribute A
in R is fully functionally dependent on every key of R.
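The general 2NF test can be sketched in Python as follows (illustrative only; the candidate keys are given explicitly and only the listed FDs FD1 through FD4 of LOTS are inspected).

# Sketch: a nonprime attribute violates the general 2NF definition if some FD
# determines it from a proper subset of any candidate key.
candidate_keys = [{"Property_id#"}, {"County_name", "Lot#"}]
prime = set().union(*candidate_keys)          # prime attributes of LOTS

fds = {
    "FD1": ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price", "Tax_rate"}),
    "FD2": ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price", "Tax_rate"}),
    "FD3": ({"County_name"}, {"Tax_rate"}),
    "FD4": ({"Area"}, {"Price"}),
}

for name, (lhs, rhs) in fds.items():
    nonprime_rhs = rhs - prime
    partial = any(lhs < key for key in candidate_keys)
    if partial and nonprime_rhs:
        print(name, "violates 2NF:", nonprime_rhs, "partially dependent on", lhs)
# Only FD3 is reported: Tax_rate is partially dependent on {County_name, Lot#}.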
Figure 12
Normalization into 2NF and 3NF. (a) The LOTS relation LOTS(Property_id#, County_name, Lot#, Area, Price, Tax_rate) with its functional dependencies FD1 through FD4. (b) Decomposing into the 2NF relations LOTS1(Property_id#, County_name, Lot#, Area, Price) and LOTS2(County_name, Tax_rate). (c) Decomposing LOTS1 into the 3NF relations LOTS1A(Property_id#, County_name, Lot#, Area) and LOTS1B(Area, Price). (d) Summary of the progressive normalization of LOTS.
4.2 General Definition of Third Normal Form
Definition. A relation schema R is in third normal form (3NF) if, whenever a
nontrivial functional dependency X → A holds in R, either (a) X is a superkey of
R, or (b) A is a prime attribute of R.
According to this definition, LOTS2 (Figure 12(b)) is in 3NF. However, FD4 in LOTS1
violates 3NF because Area is not a superkey and Price is not a prime attribute in
LOTS1. To normalize LOTS1 into 3NF, we decompose it into the relation schemas
LOTS1A and LOTS1B shown in Figure 12(c). We construct LOTS1A by removing the
attribute Price that violates 3NF from LOTS1 and placing it with Area (the left-hand
side of FD4 that causes the transitive dependency) into another relation LOTS1B.
Both LOTS1A and LOTS1B are in 3NF.
Two points are worth noting about this example and the general definition of 3NF:
■ LOTS1 violates 3NF because Price is transitively dependent on each of the
candidate keys of LOTS1 via the nonprime attribute Area.
■ This general definition can be applied directly to test whether a relation
schema is in 3NF; it does not have to go through 2NF first. If we apply the
above 3NF definition to LOTS with the dependencies FD1 through FD4, we
find that both FD3 and FD4 violate 3NF. Therefore, we could decompose
LOTS into LOTS1A, LOTS1B, and LOTS2 directly. Hence, the transitive and
partial dependencies that violate 3NF can be removed in any order.
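A direct test of this general 3NF definition is sketched below in Python for LOTS1 (an illustration under the simplifying assumption that all candidate keys are known, so that a superkey is any superset of a candidate key; only the listed FDs are checked).

# Sketch: for every nontrivial FD X -> A, require that X is a superkey or A is prime.
candidate_keys = [{"Property_id#"}, {"County_name", "Lot#"}]
prime = set().union(*candidate_keys)

def is_superkey(attrs):
    return any(key <= set(attrs) for key in candidate_keys)

lots1_fds = {                                  # FDs that carry over to LOTS1
    "FD1": ({"Property_id#"}, {"County_name", "Lot#", "Area", "Price"}),
    "FD2": ({"County_name", "Lot#"}, {"Property_id#", "Area", "Price"}),
    "FD4": ({"Area"}, {"Price"}),
}

for name, (lhs, rhs) in lots1_fds.items():
    for a in rhs - set(lhs):                   # only the nontrivial part of the FD
        if not is_superkey(lhs) and a not in prime:
            print(name, "violates 3NF:", lhs, "->", a)
# Only FD4 is reported: Area is not a superkey and Price is not a prime attribute.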
4.3 Interpreting the General Definition
of Third Normal Form
A relation schema R violates the general definition of 3NF if a functional depen-
dency X → A holds in R that does not meet either condition—meaning that it vio-
lates both conditions (a) and (b) of 3NF. This can occur due to two types of
problematic functional dependencies:
■ A nonprime attribute determines another nonprime attribute. Here we typ-
ically have a transitive dependency that violates 3NF.
■ A proper subset of a key of R functionally determines a nonprime attribute.
Here we have a partial dependency that violates 3NF (and also 2NF).
Therefore, we can state a general alternative definition of 3NF as follows:
Alternative Definition. A relation schema R is in 3NF if every nonprime attribute
of R meets both of the following conditions:
■ It is fully functionally dependent on every key of R.
■ It is nontransitively dependent on every key of R.
Figure 13
Boyce-Codd normal form. (a) BCNF normalization of LOTS1A into LOTS1AX(Property_id#, Area, Lot#) and LOTS1AY(Area, County_name), with the functional dependency FD2 being lost in the decomposition. (b) A schematic relation R(A, B, C) with its FDs; it is in 3NF, but not in BCNF.
5 Boyce-Codd Normal Form
Boyce-Codd normal form (BCNF) was proposed as a simpler form of 3NF, but it
was found to be stricter than 3NF. That is, every relation in BCNF is also in 3NF;
however, a relation in 3NF is not necessarily in BCNF. Intuitively, we can see the need
for a stronger normal form than 3NF by going back to the LOTS relation schema in
Figure 12(a) with its four functional dependencies FD1 through FD4. Suppose that
we have thousands of lots in the relation but the lots are from only two counties:
DeKalb and Fulton. Suppose also that lot sizes in DeKalb County are only 0.5, 0.6,
0.7, 0.8, 0.9, and 1.0 acres, whereas lot sizes in Fulton County are restricted to 1.1,
1.2, …, 1.9, and 2.0 acres. In such a situation we would have the additional func-
tional dependency FD5: Area → County_name. If we add this to the other dependen-
cies, the relation schema LOTS1A still is in 3NF because County_name is a prime
attribute.
The area of a lot that determines the county, as specified by FD5, can be represented
by 16 tuples in a separate relation R(Area, County_name), since there are only 16 pos-
sible Area values (see Figure 13). This representation reduces the redundancy of
repeating the same information in the thousands of LOTS1A tuples. BCNF is a
stronger normal form that would disallow LOTS1A and suggest the need for decom-
posing it.
Definition. A relation schema R is in BCNF if whenever a nontrivial functional
dependency X → A holds in R, then X is a superkey of R.
Figure 14
A relation TEACH(Student, Course, Instructor) that is in 3NF but not BCNF.
The formal definition of BCNF differs from the definition of 3NF in that condition
(b) of 3NF, which allows A to be prime, is absent from BCNF. That makes BCNF a
stronger normal form compared to 3NF. In our example, FD5 violates BCNF in
LOTS1A because Area is not a superkey of LOTS1A. Note that FD5 satisfies 3NF in
LOTS1A because County_name is a prime attribute (condition b), but this condition
does not exist in the definition of BCNF. We can decompose LOTS1A into two BCNF
relations LOTS1AX and LOTS1AY, shown in Figure 13(a). This decomposition loses
the functional dependency FD2 because its attributes no longer coexist in the same
relation after decomposition.
In practice, most relation schemas that are in 3NF are also in BCNF. Only if X → A
holds in a relation schema R with X not being a superkey and A being a prime
attribute will R be in 3NF but not in BCNF. The relation schema R shown in Figure
13(b) illustrates the general case of such a relation. Ideally, relational database
design should strive to achieve BCNF or 3NF for every relation schema. Achieving
the normalization status of just 1NF or 2NF is not considered adequate, since they
were developed historically as stepping stones to 3NF and BCNF.
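The contrast between the two normal forms can be sketched in Python for LOTS1A after FD5 is added (again an illustration that assumes the candidate keys are known and checks only the listed FDs).

# Sketch: BCNF drops condition (b) of 3NF, so every nontrivial FD must have a
# superkey as its left-hand side.
candidate_keys = [{"Property_id#"}, {"County_name", "Lot#"}]
prime = set().union(*candidate_keys)

def is_superkey(attrs):
    return any(key <= set(attrs) for key in candidate_keys)

lots1a_fds = {
    "FD1": ({"Property_id#"}, {"County_name", "Lot#", "Area"}),
    "FD2": ({"County_name", "Lot#"}, {"Property_id#", "Area"}),
    "FD5": ({"Area"}, {"County_name"}),
}

for name, (lhs, rhs) in lots1a_fds.items():
    nontrivial_rhs = rhs - set(lhs)
    in_3nf = is_superkey(lhs) or nontrivial_rhs <= prime
    in_bcnf = is_superkey(lhs) or not nontrivial_rhs
    print(name, "satisfies 3NF:", in_3nf, " satisfies BCNF:", in_bcnf)
# FD5 satisfies 3NF (County_name is prime) but violates BCNF (Area is not a superkey).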
As another example, consider Figure 14, which shows a relation TEACH with the fol-
lowing dependencies:
FD1: {Student, Course} → Instructor
FD2:12 Instructor → Course
12This dependency expresses the constraint that each instructor teaches only one course in this application.
Note that {Student, Course} is a candidate key for this relation and that the depen-
dencies shown follow the pattern in Figure 13(b), with Student as A, Course as B,
and Instructor as C. Hence this relation is in 3NF but not BCNF. Decomposition of
this relation schema into two schemas is not straightforward because it may be
decomposed into one of the three following possible pairs:
1. {Student, Instructor} and {Student, Course}.
2. {Course, Instructor} and {Course, Student}.
3. {Instructor, Course} and {Instructor, Student}.
All three decompositions lose the functional dependency FD1. The desirable
decomposition among those just shown is 3, because it does not generate spurious
tuples after a join.
A test to determine whether a decomposition is nonadditive (or lossless) is
discussed under Property NJB. In general, a relation not in BCNF should be
decomposed so as to meet this property. We make sure that we meet this property,
because a nonadditive decomposition is a must during normalization; to achieve it,
we may have to forgo the preservation of all functional dependencies in the
decomposed relations, as is the case in this example. Applying this criterion to
TEACH gives decomposition 3, which yields the two BCNF relations:
(Instructor, Course) and (Instructor, Student)
Note that if we designate (Student, Instructor) as a primary key of the relation
TEACH, the FD Instructor → Course causes a partial (non-full-functional) depend-
ency of Course on a part of this key. This FD may be removed as a part of second
normalization yielding exactly the same two relations in the result. This is an
example of a case where we may reach the same ultimate BCNF design via alternate
paths of normalization.
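The binary nonadditive-join test referred to above (Property NJB, stated fully elsewhere in the text) checks whether the common attributes of the two projections functionally determine R1 − R2 or R2 − R1. A Python sketch of this check, applied to the three candidate decompositions of TEACH under FD1 and FD2, is shown below; the helper names are our own.

def closure(attrs, fds):
    """Attribute closure of attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def nonadditive(r1, r2, fds):
    common = r1 & r2
    return (r1 - r2) <= closure(common, fds) or (r2 - r1) <= closure(common, fds)

teach_fds = [({"Student", "Course"}, {"Instructor"}),   # FD1
             ({"Instructor"}, {"Course"})]              # FD2
decompositions = {
    1: ({"Student", "Instructor"}, {"Student", "Course"}),
    2: ({"Course", "Instructor"}, {"Course", "Student"}),
    3: ({"Instructor", "Course"}, {"Instructor", "Student"}),
}
for number, (r1, r2) in decompositions.items():
    print("decomposition", number, "is nonadditive:", nonadditive(r1, r2, teach_fds))
# Only decomposition 3 passes: the common attribute Instructor determines Course.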
6 Multivalued Dependency
and Fourth Normal Form
So far we have discussed the concept of functional dependency, which is by far the
most important type of dependency in relational database design theory, and nor-
mal forms based on functional dependencies. However, in many cases relations have
constraints that cannot be specified as functional dependencies. In this section, we
discuss the concept of multivalued dependency (MVD) and define fourth normal
form, which is based on this dependency. Multivalued dependencies are a conse-
quence of first normal form (1NF) (see Section 3.4), which disallows an attribute in
a tuple from having a set of values, and the accompanying process of converting an
unnormalized relation into 1NF. If we have two or more multivalued independent
attributes in the same relation schema, we get into a problem of having to repeat
every value of one of the attributes with every value of the other attribute to keep
the relation state consistent and to maintain the independence among the attributes
involved. This constraint is specified by a multivalued dependency.
Figure 15
Fourth and fifth normal forms. (a) The EMP relation with two MVDs: Ename →→ Pname and Ename →→ Dname. (b) Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS. (c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3). (d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, R3.
For example, consider the relation EMP shown in Figure 15(a). A tuple in this EMP
relation represents the fact that an employee whose name is Ename works on the
project whose name is Pname and has a dependent whose name is Dname. An
employee may work on several projects and may have several dependents, and the
employee’s projects and dependents are independent of one another.13 To keep the
relation state consistent, and to avoid any spurious relationship between the two
independent attributes, we must have a separate tuple to represent every combina-
tion of an employee’s dependent and an employee’s project.
13In an ER diagram, each would be represented as a multivalued attribute or as a weak entity type.
This constraint is specified as a multivalued dependency on the EMP relation, which we define in this
section. Informally, whenever two independent 1:N relationships A:B and A:C are
mixed in the same relation, R(A, B, C), an MVD may arise.14
6.1 Formal Definition of Multivalued Dependency
Definition. A multivalued dependency X →→ Y specified on relation schema R,
where X and Y are both subsets of R, specifies the following constraint on any
relation state r of R: If two tuples t1 and t2 exist in r such that t1[X] = t2[X], then
two tuples t3 and t4 should also exist in r with the following properties,15 where
we use Z to denote (R – (X ∪ Y)):16
■ t3[X] = t4[X] = t1[X] = t2[X].
■ t3[Y] = t1[Y] and t4[Y] = t2[Y].
■ t3[Z] = t2[Z] and t4[Z] = t1[Z].
Whenever X →→ Y holds, we say that X multidetermines Y. Because of the symme-
try in the definition, whenever X →→ Y holds in R, so does X →→ Z. Hence, X →→ Y
implies X →→ Z, and therefore it is sometimes written as X →→ Y|Z.
An MVD X →→ Y in R is called a trivial MVD if (a) Y is a subset of X, or (b) X ∪ Y
= R. For example, the relation EMP_PROJECTS in Figure 15(b) has the trivial MVD
Ename →→ Pname. An MVD that satisfies neither (a) nor (b) is called a nontrivial
MVD. A trivial MVD will hold in any relation state r of R; it is called trivial because
it does not specify any significant or meaningful constraint on R.
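The definition can be checked directly against a relation state. The Python sketch below (an illustration; the helper is our own) tests the tuple-swapping condition for the EMP state of Figure 15(a) and the MVD Ename →→ Pname.

from itertools import product

def mvd_holds(tuples, attrs, x, y):
    """Check X ->> Y on a relation state given as a list of dictionaries."""
    z = [a for a in attrs if a not in x and a not in y]
    rows = {tuple(t[a] for a in attrs) for t in tuples}
    def proj(t, cols):
        return tuple(t[a] for a in cols)
    for t1, t2 in product(tuples, repeat=2):
        if proj(t1, x) == proj(t2, x):
            # a tuple t3 with X and Y values from t1 and Z values from t2 must exist
            t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
            if tuple(t3[a] for a in attrs) not in rows:
                return False
    return True

emp = [  # EMP state of Figure 15(a)
    {"Ename": "Smith", "Pname": "X", "Dname": "John"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "X", "Dname": "Anna"},
    {"Ename": "Smith", "Pname": "Y", "Dname": "John"},
]
print(mvd_holds(emp, ["Ename", "Pname", "Dname"], ["Ename"], ["Pname"]))   # True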
If we have a nontrivial MVD in a relation, we may have to repeat values redundantly
in the tuples. In the EMP relation of Figure 15(a), the values ‘X’ and ‘Y’ of Pname are
repeated with each value of Dname (or, by symmetry, the values ‘John’ and ‘Anna’ of
Dname are repeated with each value of Pname). This redundancy is clearly undesir-
able. However, the EMP schema is in BCNF because no functional dependencies
hold in EMP. Therefore, we need to define a fourth normal form that is stronger
than BCNF and disallows relation schemas such as EMP. Notice that relations con-
taining nontrivial MVDs tend to be all-key relations—that is, their key is all their
attributes taken together. Furthermore, it is rare that such all-key relations with a
combinatorial occurrence of repeated values would be designed in practice.
However, recognition of MVDs as a potential problematic dependency is essential
in relational design.
We now present the definition of fourth normal form (4NF), which is violated
when a relation has undesirable multivalued dependencies, and hence can be used
to identify and decompose such relations.
14This MVD is denoted as A →→ B|C.
15The tuples t1, t2, t3, and t4 are not necessarily distinct.
16Z is shorthand for the attributes in R after the attributes in (X ∪ Y) are removed from R.
Definition. A relation schema R is in 4NF with respect to a set of dependencies
F (that includes functional dependencies and multivalued dependencies) if, for
every nontrivial multivalued dependency X →→ Y in F+,17 X is a superkey for R.
We can state the following points:
■ An all-key relation is always in BCNF since it has no FDs.
■ An all-key relation such as the EMP relation in Figure 15(a), which has no
FDs but has the MVD Ename →→ Pname | Dname, is not in 4NF.
■ A relation that is not in 4NF due to a nontrivial MVD must be decomposed
to convert it into a set of relations in 4NF.
■ The decomposition removes the redundancy caused by the MVD.
The process of normalizing a relation with nontrivial MVDs that is not in 4NF
consists of decomposing it so that each MVD is represented by a separate relation
in which it becomes a trivial MVD. Consider the EMP relation in Figure 15(a). EMP
is not in 4NF because of the nontrivial MVDs Ename →→ Pname and Ename →→
Dname, and because Ename is not a superkey of EMP. We decompose EMP into
EMP_PROJECTS and EMP_DEPENDENTS, shown in Figure 15(b). Both
EMP_PROJECTS and EMP_DEPENDENTS are in 4NF, because the MVDs Ename
→→ Pname in EMP_PROJECTS and Ename →→ Dname in EMP_DEPENDENTS are
trivial MVDs. No other nontrivial MVDs hold in either EMP_PROJECTS or
EMP_DEPENDENTS. No FDs hold in these relation schemas either.
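The decomposition, and the fact that it is nonadditive, can be illustrated with a few lines of Python on the same EMP state (an informal check, not a general algorithm).

# Sketch: project EMP on {Ename, Pname} and {Ename, Dname}, then verify that
# the natural join over Ename recovers exactly the original tuples.
emp = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}   # (Ename, Pname, Dname)

emp_projects = {(e, p) for e, p, d in emp}       # Ename ->> Pname is now trivial
emp_dependents = {(e, d) for e, p, d in emp}     # Ename ->> Dname is now trivial

rejoined = {(e1, p, d) for (e1, p) in emp_projects
                       for (e2, d) in emp_dependents if e1 == e2}
print(rejoined == emp)    # True: no spurious tuples and no lost tuples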
7 Join Dependencies
and Fifth Normal Form
In our discussion so far, we have pointed out the problematic functional dependen-
cies and showed how they were eliminated by a process of repeated binary decom-
position to remove them during the process of normalization to achieve 1NF, 2NF,
3NF and BCNF. These binary decompositions must obey the NJB property that we
referenced while discussing the decomposition to achieve BCNF. Achieving 4NF
typically involves eliminating MVDs by repeated binary decompositions as well.
However, in some cases there may be no nonadditive join decomposition of R into
two relation schemas, but there may be a nonadditive join decomposition into more
than two relation schemas. Moreover, there may be no functional dependency in R
that violates any normal form up to BCNF, and there may be no nontrivial MVD
present in R either that violates 4NF. We then resort to another dependency called
the join dependency and, if it is present, carry out a multiway decomposition into fifth
normal form (5NF). It is important to note that such a dependency is a very pecu-
liar semantic constraint that is very difficult to detect in practice; therefore, normal-
ization into 5NF is very rarely done in practice.
17F+ refers to the closure of the set of dependencies F; that is, all dependencies that are implied by F.
Definition. A join dependency (JD), denoted by JD(R1, R2, …, Rn), specified on
relation schema R, specifies a constraint on the states r of R. The constraint
states that every legal state r of R should have a nonadditive join decomposition
into R1, R2, …, Rn. Hence, for every such r we have
∗ (πR1(r), πR2(r), …, πRn(r)) = r
Notice that an MVD is a special case of a JD where n = 2. That is, a JD denoted as
JD(R1, R2) implies an MVD (R1 ∩ R2) →→ (R1 – R2) (or, by symmetry, (R1 ∩ R2)
→→(R2 – R1)). A join dependency JD(R1, R2, …, Rn), specified on relation schema R,
is a trivial JD if one of the relation schemas Ri in JD(R1, R2, …, Rn) is equal to R.
Such a dependency is called trivial because it has the nonadditive join property for
any relation state r of R and thus does not specify any constraint on R. We can now
define fifth normal form, which is also called project-join normal form.
Definition. A relation schema R is in fifth normal form (5NF) (or project-join
normal form (PJNF)) with respect to a set F of functional, multivalued, and
join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in
F+ (that is, implied by F),18 every Ri is a superkey of R.
For an example of a JD, consider once again the SUPPLY all-key relation in Figure
15(c). Suppose that the following additional constraint always holds: Whenever a
supplier s supplies part p, and a project j uses part p, and the supplier s supplies at
least one part to project j, then supplier s will also be supplying part p to project j.
This constraint can be restated in other ways and specifies a join dependency JD(R1,
R2, R3) among the three projections R1(Sname, Part_name), R2(Sname, Proj_name),
and R3(Part_name, Proj_name) of SUPPLY. If this constraint holds, the tuples below
the dashed line in Figure 15(c) must exist in any legal state of the SUPPLY relation
that also contains the tuples above the dashed line. Figure 15(d) shows how the
SUPPLY relation with the join dependency is decomposed into three relations R1, R2,
and R3 that are each in 5NF. Notice that applying a natural join to any two of these
relations produces spurious tuples, but applying a natural join to all three together
does not. The reader should verify this on the sample relation in Figure 15(c) and its
projections in Figure 15(d). This is because only the JD exists, but no MVDs are
specified. Notice, too, that the JD(R1, R2, R3) is specified on all legal relation states,
not just on the one shown in Figure 15(c).
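The reader can carry out this verification mechanically. The Python sketch below is our own illustration; it uses the seven SUPPLY tuples of Figure 15(c), joins the projections, and confirms that joining only two of them produces spurious tuples while the three-way join is nonadditive.

supply = {("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
          ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
          ("Adamsky", "Nail", "ProjX"),
          # tuples below the dashed line, required by JD(R1, R2, R3):
          ("Adamsky", "Bolt", "ProjX"), ("Smith", "Bolt", "ProjY")}

r1 = {(s, p) for s, p, j in supply}       # projection on (Sname, Part_name)
r2 = {(s, j) for s, p, j in supply}       # projection on (Sname, Proj_name)
r3 = {(p, j) for s, p, j in supply}       # projection on (Part_name, Proj_name)

r1_join_r2 = {(s, p, j) for (s, p) in r1 for (s2, j) in r2 if s == s2}
three_way = {t for t in r1_join_r2 if (t[1], t[2]) in r3}

print(r1_join_r2 - supply)        # spurious tuples from joining only two projections
print(three_way == supply)        # True: the join of all three projections is nonadditive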
Discovering JDs in practical databases with hundreds of attributes is next to
impossible. It can be done only with a great degree of intuition about the data on
the part of the designer. Therefore, the current practice of database design pays
scant attention to them.
18Again, F+ refers to the closure of the set of dependencies F; that is, all dependencies that are implied by F.
8 Summary
In this chapter we discussed several pitfalls in relational database design using
intuitive arguments. We identified informally some of the measures for indicating
whether a relation schema is good or bad, and provided informal guidelines for a
good design. These guidelines are based on doing a careful conceptual design in the
ER and EER model, following mapping procedure correctly to map entities and
relationships into relations. Proper enforcement of these guidelines and lack of
redundancy will avoid the insertion/deletion/update anomalies, and generation of
spurious data. We recommended limiting NULL values, which cause problems dur-
ing SELECT, JOIN, and aggregation operations. Then we presented some formal
concepts that allow us to do relational design in a top-down fashion by analyzing
relations individually. We defined this process of design by analysis and decomposi-
tion by introducing the process of normalization.
We defined the concept of functional dependency, which is the basic tool for analyz-
ing relational schemas, and discussed some of its properties. Functional dependen-
cies specify semantic constraints among the attributes of a relation schema. Next we
described the normalization process for achieving good designs by testing relations
for undesirable types of problematic functional dependencies. We provided a treat-
ment of successive normalization based on a predefined primary key in each rela-
tion, and then relaxed this requirement and provided more general definitions of
second normal form (2NF) and third normal form (3NF) that take all candidate
keys of a relation into account. We presented examples to illustrate how by using the
general definition of 3NF a given relation may be analyzed and decomposed to
eventually yield a set of relations in 3NF.
We presented Boyce-Codd normal form (BCNF) and discussed how it is a stronger
form of 3NF. We also illustrated how the decomposition of a non-BCNF relation
must be done by considering the nonadditive decomposition requirement. Then we
introduced the fourth normal form based on multivalued dependencies that typi-
cally arise due to mixing independent multivalued attributes into a single relation.
Finally, we introduced the fifth normal form, which is based on join dependency,
and which identifies a peculiar constraint that causes a relation to be decomposed
into several components so that they always yield the original relation back after a
join. In practice, most commercial designs have followed the normal forms up to
BCNF. The need to decompose into 5NF rarely arises in practice, and join
dependencies are difficult to detect in most practical situations, making 5NF
mostly of theoretical value.
Review Questions
1. Discuss attribute semantics as an informal measure of goodness for a rela-
tion schema.
2. Discuss insertion, deletion, and modification anomalies. Why are they con-
sidered bad? Illustrate with examples.
3. Why should NULLs in a relation be avoided as much as possible? Discuss the
problem of spurious tuples and how we may prevent it.
4. State the informal guidelines for relation schema design that we discussed.
Illustrate how violation of these guidelines may be harmful.
5. What is a functional dependency? What are the possible sources of the infor-
mation that defines the functional dependencies that hold among the attrib-
utes of a relation schema?
6. Why can we not infer a functional dependency automatically from a partic-
ular relation state?
7. What does the term unnormalized relation refer to? How did the normal
forms develop historically from first normal form up to Boyce-Codd normal
form?
8. Define first, second, and third normal forms when only primary keys are
considered. How do the general definitions of 2NF and 3NF, which consider
all keys of a relation, differ from those that consider only primary keys?
9. What undesirable dependencies are avoided when a relation is in 2NF?
10. What undesirable dependencies are avoided when a relation is in 3NF?
11. In what way do the generalized definitions of 2NF and 3NF extend the defi-
nitions beyond primary keys?
12. Define Boyce-Codd normal form. How does it differ from 3NF? Why is it
considered a stronger form of 3NF?
13. What is multivalued dependency? When does it arise?
14. Does a relation with two or more columns always have an MVD? Show with
an example.
15. Define fourth normal form. When is it violated? When is it typically
applicable?
16. Define join dependency and fifth normal form.
17. Why is 5NF also called project-join normal form (PJNF)?
18. Why do practical database designs typically aim for BCNF and not aim for
higher normal forms?
Exercises
19. Suppose that we have the following requirements for a university database
that is used to keep track of students’ transcripts:
a. The university keeps track of each student’s name (Sname), student num-
ber (Snum), Social Security number (Ssn), current address (Sc_addr) and
phone (Sc_phone), permanent address (Sp_addr) and phone (Sp_phone),
birth date (Bdate), sex (Sex), class (Class) (‘freshman’, ‘sophomore’, … ,
‘graduate’), major department (Major_code), minor department
(Minor_code) (if any), and degree program (Prog) (‘b.a.’, ‘b.s.’, … , ‘ph.d.’).
Both Ssn and student number have unique values for each student.
b. Each department is described by a name (Dname), department code
(Dcode), office number (Doffice), office phone (Dphone), and college
(Dcollege). Both name and code have unique values for each department.
c. Each course has a course name (Cname), description (Cdesc), course
number (Cnum), number of semester hours (Credit), level (Level), and
offering department (Cdept). The course number is unique for each
course.
d. Each section has an instructor (Iname), semester (Semester), year (Year),
course (Sec_course), and section number (Sec_num). The section number
distinguishes different sections of the same course that are taught during
the same semester/year; its values are 1, 2, 3, …, up to the total number of
sections taught during each semester.
e. A grade record refers to a student (Ssn), a particular section, and a grade
(Grade).
Design a relational database schema for this database application. First show
all the functional dependencies that should hold among the attributes. Then
design relation schemas for the database that are each in 3NF or BCNF.
Specify the key attributes of each relation. Note any unspecified require-
ments, and make appropriate assumptions to render the specification
complete.
20. What update anomalies occur in the EMP_PROJ and EMP_DEPT relations of
Figures 3 and 4?
21. In what normal form is the LOTS relation schema in Figure 12(a) with
respect to the restrictive interpretations of normal form that take only the
primary key into account? Would it be in the same normal form if the gen-
eral definitions of normal form were used?
22. Prove that any relation schema with two attributes is in BCNF.
23. Why do spurious tuples occur in the result of joining the EMP_PROJ1 and
EMP_ LOCS relations in Figure 5 (result shown in Figure 6)?
24. Consider the universal relation R = {A, B, C, D, E, F, G, H, I, J} and the set of
functional dependencies F = { {A, B}→{C}, {A}→{D, E}, {B}→{F}, {F}→{G,
H}, {D}→{I, J} }. What is the key for R? Decompose R into 2NF and then
3NF relations.
25. Repeat Exercise 24 for the following different set of functional dependencies
G = {{A, B}→{C}, {B, D}→{E, F}, {A, D}→{G, H}, {A}→{I}, {H}→{J} }.
26. Consider the following relation:

A     B     C     TUPLE#
10    b1    c1    1
10    b2    c2    2
11    b4    c1    3
12    b3    c4    4
13    b1    c1    5
14    b3    c4    6

a. Given the previous extension (state), which of the following dependencies
may hold in the above relation? If the dependency cannot hold, explain
why by specifying the tuples that cause the violation.
i. A → B, ii. B → C, iii. C → B, iv. B → A, v. C → A
b. Does the above relation have a potential candidate key? If it does, what is
it? If it does not, why not?
27. Consider a relation R(A, B, C, D, E) with the following dependencies:
AB → C, CD → E, DE → B
Is AB a candidate key of this relation? If not, is ABD? Explain your answer.
28. Consider the relation R, which has attributes that hold schedules of courses
and sections at a university; R = {Course_no, Sec_no, Offering_dept,
Credit_hours, Course_level, Instructor_ssn, Semester, Year, Days_hours, Room_no,
No_of_students}. Suppose that the following functional dependencies hold
on R:
{Course_no} → {Offering_dept, Credit_hours, Course_level}
{Course_no, Sec_no, Semester, Year} → {Days_hours, Room_no,
No_of_students, Instructor_ssn}
{Room_no, Days_hours, Semester, Year} → {Instructor_ssn, Course_no,
Sec_no}
Try to determine which sets of attributes form keys of R. How would you
normalize this relation?
29. Consider the following relations for an order-processing application data-
base at ABC, Inc.
ORDER (O#, Odate, Cust#, Total_amount)
ORDER_ITEM(O#, I#, Qty_ordered, Total_price, Discount%)
Assume that each item has a different discount. The Total_price refers to one
item, Odate is the date on which the order was placed, and the Total_amount is
the amount of the order. If we apply a natural join on the relations
ORDER_ITEM and ORDER in this database, what does the resulting relation
schema look like? What will be its key? Show the FDs in this resulting rela-
tion. Is it in 2NF? Is it in 3NF? Why or why not? (State assumptions, if you
make any.)
30. Consider the following relation:
CAR_SALE(Car#, Date_sold, Salesperson#, Commission%, Discount_amt)
Assume that a car may be sold by multiple salespeople, and hence {Car#,
Salesperson#} is the primary key. Additional dependencies are
Date_sold → Discount_amt and
Salesperson# → Commission%
Based on the given primary key, is this relation in 1NF, 2NF, or 3NF? Why or
why not? How would you successively normalize it completely?
31. Consider the following relation for published books:
BOOK (Book_title, Author_name, Book_type, List_price, Author_affil,
Publisher)
Author_affil refers to the affiliation of author. Suppose the following depen-
dencies exist:
Book_title → Publisher, Book_type
Book_type → List_price
Author_name → Author_affil
a. What normal form is the relation in? Explain your answer.
b. Apply normalization until you cannot decompose the relations further.
State the reasons behind each decomposition.
32. This exercise asks you to convert business statements into dependencies.
Consider the relation DISK_DRIVE (Serial_number, Manufacturer, Model, Batch,
Capacity, Retailer). Each tuple in the relation DISK_DRIVE contains informa-
tion about a disk drive with a unique Serial_number, made by a manufacturer,
with a particular model number, released in a certain batch, which has a cer-
tain storage capacity and is sold by a certain retailer. For example, the tuple
Disk_drive (‘1978619’, ‘WesternDigital’, ‘A2235X’, ‘765234’, 500, ‘CompUSA’)
specifies that WesternDigital made a disk drive with serial number 1978619
and model number A2235X, released in batch 765234; it is 500GB and sold
by CompUSA.
Write each of the following dependencies as an FD:
a. The manufacturer and serial number uniquely identify the drive.
b. A model number is registered by a manufacturer and therefore can’t be
used by another manufacturer.
c. All disk drives in a particular batch are the same model.
d. All disk drives of a certain model of a particular manufacturer have
exactly the same capacity.
33. Consider the following relation:
R (Doctor#, Patient#, Date, Diagnosis, Treat_code, Charge)
In the above relation, a tuple describes a visit of a patient to a doctor along
with a treatment code and daily charge. Assume that diagnosis is determined
(uniquely) for each patient by a doctor. Assume that each treatment code has
a fixed charge (regardless of patient). Is this relation in 2NF? Justify your
answer and decompose if necessary. Then argue whether further normaliza-
tion to 3NF is necessary, and if so, perform it.
34. Consider the following relation:
CAR_SALE (Car_id, Option_type, Option_listprice, Sale_date,
Option_discountedprice)
This relation refers to options installed in cars (e.g., cruise control) that were
sold at a dealership, and the list and discounted prices of the options.
If Car_id → Sale_date, Option_type → Option_listprice, and {Car_id, Option_type}
→ Option_discountedprice, argue using the generalized definition of 3NF that this
relation is not in 3NF. Then argue from your knowledge of 2NF why it is not even
in 2NF.
35. Consider the relation:
BOOK (Book_Name, Author, Edition, Year)
with the data:
Book_Name          Author    Edition    Copyright_Year
DB_fundamentals    Navathe   4          2004
DB_fundamentals    Elmasri   4          2004
DB_fundamentals    Elmasri   5          2007
DB_fundamentals    Navathe   5          2007

a. Based on a common-sense understanding of the above data, what are the
possible candidate keys of this relation?
b. Justify that this relation has the MVD { Book } →→ { Author } | { Edition, Year }.
c. What would be the decomposition of this relation based on the above
MVD? Evaluate each resulting relation for the highest normal form it
possesses.
36. Consider the following relation:
TRIP (Trip_id, Start_date, Cities_visited, Cards_used)
This relation refers to business trips made by company salespeople. Suppose
the TRIP has a single Start_date, but involves many Cities and salespeople may
use multiple credit cards on the trip. Make up a mock-up population of the
table.
a. Discuss what FDs and/or MVDs exist in this relation.
b. Show how you will go about normalizing it.
Laboratory Exercise
Note: The following exercise uses the DBD (Data Base Designer) system that is
described in the laboratory manual. The relational schema R and the set of functional
dependencies F need to be coded as lists. As an example, R and F for Exercise 24 are
coded as:
R = [a, b, c, d, e, f, g, h, i, j]
F = [[[a, b],[c]],
[[a],[d, e]],
[[b],[f]],
[[f],[g, h]],
[[d],[i, j]]]
Since DBD is implemented in Prolog, use of uppercase terms is reserved for vari-
ables in the language and therefore lowercase constants are used to code the attrib-
utes. For further details on using the DBD system, please refer to the laboratory
manual.
37. Using the DBD system, verify your answers to the following exercises:
a. Exercise 24 (3NF only)
b. Exercise 25
c. Exercise 27
d. Exercise 28
Selected Bibliography
Functional dependencies were originally introduced by Codd (1970). The original
definitions of first, second, and third normal form were also defined in Codd
(1972a), where a discussion on update anomalies can be found. Boyce-Codd nor-
mal form was defined in Codd (1974). The alternative definition of third normal
form is given in Ullman (1988), as is the definition of BCNF that we give here.
Ullman (1988), Maier (1983), and Atzeni and De Antonellis (1993) contain many of
the theorems and proofs concerning functional dependencies.
Relational Database Design Algorithms and Further Dependencies
A top-down relational design technique involves designing an ER or EER conceptual schema, then
mapping it to the relational model. Primary keys are assigned to each relation based
on known functional dependencies. In the subsequent process, which may be called
relational design by analysis, initially designed relations from the above proce-
dure—or those inherited from previous files, forms, and other sources—are ana-
lyzed to detect undesirable functional dependencies. These dependencies are
removed by a successive normalization procedure.
In this chapter we use the theory of normal forms and functional, multivalued, and
join dependencies and build upon it while maintaining three different thrusts. First,
we discuss the concept of inferring new functional dependencies from a given set
and discuss notions including cover, minimal cover, and equivalence. Conceptually,
we need to capture the semantics of attributes within a relation completely and
succinctly, and the minimal cover allows us to do it. Second, we discuss the desirable
properties of nonadditive (lossless) joins and preservation of functional dependen-
cies. A general algorithm to test for nonadditivity of joins among a set of relations is
presented. Third, we present an approach to relational design by synthesis of func-
tional dependencies. This is a bottom-up approach to design that presupposes that
the known functional dependencies among sets of attributes in the Universe of
Discourse (UoD) have been given as input. We present algorithms to achieve the
desirable normal forms, namely 3NF and BCNF, and achieve one or both of the
desirable properties of nonadditivity of joins and functional dependency preserva-
tion. Although the synthesis approach is theoretically appealing as a formal
approach, it has not been used in practice for large database design projects because
of the difficulty of providing all possible functional dependencies up front before
the design can be attempted. Alternately, successive decompositions and ongoing
refinements to design become more manageable and may evolve over time. The
final goal of this chapter is to discuss further the multivalued dependency (MVD)
concept and briefly point out other types of dependencies.
In Section 1 we discuss the rules of inference for functional dependencies and use
them to define the concepts of a cover, equivalence, and minimal cover among func-
tional dependencies. In Section 2, first we describe the two desirable properties of
decompositions, namely, the dependency preservation property and the nonaddi-
tive (or lossless) join property, which are both used by the design algorithms to
achieve desirable decompositions. It is important to note that it is insufficient to test
the relation schemas independently of one another for compliance with higher nor-
mal forms like 2NF, 3NF, and BCNF. The resulting relations must collectively satisfy
these two additional properties to qualify as a good design. Section 3 is devoted to
the development of relational design algorithms that start off with one giant rela-
tion schema called the universal relation, which is a hypothetical relation contain-
ing all the attributes. This relation is decomposed (or in other words, the given
functional dependencies are synthesized) into relations that satisfy a certain normal
form like 3NF or BCNF and also meet one or both of the desirable properties.
In Section 5 we discuss the multivalued dependency (MVD) concept further by
applying the notions of inference and equivalence to MVDs. Finally, in Section 6 we
complete the discussion on dependencies among data by introducing inclusion
dependencies and template dependencies. Inclusion dependencies can represent
referential integrity constraints and class/subclass constraints across relations.
Template dependencies are a way of representing any generalized constraint on
attributes. We also describe some situations where a procedure or function is
needed to state and verify a functional dependency among attributes. Then we
briefly discuss domain-key normal form (DKNF), which is considered the most
general normal form. Section 7 summarizes this chapter.
It is possible to skip some or all of Sections 3, 4, and 5 in an introductory database
course.
1 Further Topics in Functional Dependencies:
Inference Rules, Equivalence, and Minimal
Cover
In the chapter “Basics of Functional Dependencies and Normalization for Relational
Databases,” we introduced the concept of functional dependencies (FDs), illustrated
it with some examples, and developed a notation to denote multiple FDs over a sin-
gle relation. We identified and discussed problematic functional dependencies and
showed how they can be eliminated by a proper decomposition of a relation. This
process was described as normalization and we showed how to achieve the first
through third normal forms (1NF through 3NF) given primary keys. We provided
generalized tests for 2NF (Second normal form), 3NF (Third normal form), and
BCNF (Boyce-Codd normal form) given any number of candidate keys in a relation
and showed how to achieve them. Now we return to the study of functional depen-
dencies and show how new dependencies can be inferred from a given set and discuss
the concepts of closure, equivalence, and minimal cover that we will need when we
later consider a synthesis approach to design of relations given a set of FDs.
1.1 Inference Rules for Functional Dependencies
We denote by F the set of functional dependencies that are specified on relation
schema R. Typically, the schema designer specifies the functional dependencies that
are semantically obvious; usually, however, numerous other functional dependencies
hold in all legal relation instances among sets of attributes that can be derived from
and satisfy the dependencies in F. Those other dependencies can be inferred or
deduced from the FDs in F.
In real life, it is impossible to specify all possible functional dependencies for a given
situation. For example, if each department has one manager, so that Dept_no
uniquely determines Mgr_ssn (Dept_no → Mgr_ssn), and a manager has a unique
phone number called Mgr_phone (Mgr_ssn → Mgr_phone), then these two dependen-
cies together imply that Dept_no → Mgr_phone. This is an inferred FD and need not
be explicitly stated in addition to the two given FDs. Therefore, it is useful to define
a concept called closure formally that includes all possible dependencies that can be
inferred from the given set F.
Definition. Formally, the set of all dependencies that include F as well as all
dependencies that can be inferred from F is called the closure of F; it is denoted
by F+.
For example, suppose that we specify the following set F of obvious functional
dependencies on a relation schema:
F = {Ssn → {Ename, Bdate, Address, Dnumber}, Dnumber → {Dname, Dmgr_ssn} }
Some of the additional functional dependencies that we can infer from F are the fol-
lowing:
Ssn → {Dname, Dmgr_ssn}
Ssn → Ssn
Dnumber → Dname
An FD X → Y is inferred from a set of dependencies F specified on R if X → Y
holds in every legal relation state r of R; that is, whenever r satisfies all the depend-
encies in F, X → Y also holds in r. The closure F+ of F is the set of all functional
dependencies that can be inferred from F. To determine a systematic way to infer
dependencies, we must discover a set of inference rules that can be used to infer
new dependencies from a given set of dependencies. We consider some of these
inference rules next. We use the notation F |= X → Y to denote that the functional
dependency X → Y is inferred from the set of functional dependencies F.
In the following discussion, we use an abbreviated notation when discussing func-
tional dependencies. We concatenate attribute variables and drop the commas for
convenience. Hence, the FD {X,Y} → Z is abbreviated to XY → Z, and the FD {X, Y,
Z} → {U, V} is abbreviated to XYZ → UV. The following six rules IR1 through IR6
are well-known inference rules for functional dependencies:
IR1 (reflexive rule)1: If X ⊇ Y, then X →Y.
IR2 (augmentation rule)2: {X → Y} |=XZ → YZ.
IR3 (transitive rule): {X → Y, Y → Z} |=X → Z.
IR4 (decomposition, or projective, rule): {X → YZ} |=X → Y.
IR5 (union, or additive, rule): {X → Y, X → Z} |=X → YZ.
IR6 (pseudotransitive rule): {X → Y, WY → Z} |=WX → Z.
The reflexive rule (IR1) states that a set of attributes always determines itself or any
of its subsets, which is obvious. Because IR1 generates dependencies that are always
true, such dependencies are called trivial. Formally, a functional dependency X → Y
is trivial if X ⊇ Y; otherwise, it is nontrivial. The augmentation rule (IR2) says that
adding the same set of attributes to both the left- and right-hand sides of a depen-
dency results in another valid dependency. According to IR3, functional dependen-
cies are transitive. The decomposition rule (IR4) says that we can remove attributes
from the right-hand side of a dependency; applying this rule repeatedly can decom-
pose the FD X → {A1, A2, …, An} into the set of dependencies {X → A1, X → A2, …,
X → An}. The union rule (IR5) allows us to do the opposite; we can combine a set of
dependencies {X → A1, X → A2, …, X → An} into the single FD X → {A1, A2, …, An}.
The pseudotransitive rule (IR6) allows us to replace a set of attributes Y on the left
hand side of a dependency with another set X that functionally determines Y, and
can be derived from IR2 and IR3 if we augment the first functional dependency
X → Y with W (the augmentation rule) and then apply the transitive rule.
One cautionary note regarding the use of these rules: although X → A and X → B
together imply X → AB by the union rule stated above, X → A and Y → B do not imply
that XY → AB. Also, XY → A does not necessarily imply either X → A or Y → A.
1The reflexive rule can also be stated as X → X; that is, any set of attributes functionally determines itself.
2The augmentation rule can also be stated as X → Y |=XZ → Y; that is, augmenting the left-hand side
attributes of an FD produces another valid FD.
Each of the preceding inference rules can be proved from the definition of func-
tional dependency, either by direct proof or by contradiction. A proof by contradic-
tion assumes that the rule does not hold and shows that this is not possible. We now
prove that the first three rules IR1 through IR3 are valid. The second proof is by con-
tradiction.
Proof of IR1. Suppose that X ⊇ Y and that two tuples t1 and t2 exist in some rela-
tion instance r of R such that t1 [X] = t2 [X]. Then t1[Y] = t2[Y] because X ⊇ Y;
hence, X → Y must hold in r.
Proof of IR2 (by contradiction). Assume that X → Y holds in a relation instance
r of R but that XZ → YZ does not hold. Then there must exist two tuples t1 and
t2 in r such that (1) t1 [X] = t2 [X], (2) t1 [Y] = t2 [Y], (3) t1 [XZ] = t2 [XZ], and
(4) t1 [YZ] ≠ t2 [YZ]. This is not possible because from (1) and (3) we deduce
(5) t1 [Z] = t2 [Z], and from (2) and (5) we deduce (6) t1 [YZ] = t2 [YZ], contra-
dicting (4).
Proof of IR3. Assume that (1) X → Y and (2) Y → Z both hold in a relation r.
Then for any two tuples t1 and t2 in r such that t1 [X] = t2 [X], we must have (3)
t1 [Y] = t2 [Y], from assumption (1); hence we must also have (4) t1 [Z] = t2 [Z]
from (3) and assumption (2); thus X → Z must hold in r.
Using similar proof arguments, we can prove the inference rules IR4 to IR6 and any
additional valid inference rules. However, a simpler way to prove that an inference
rule for functional dependencies is valid is to prove it by using inference rules that
have already been shown to be valid. For example, we can prove IR4 through IR6 by
using IR1 through IR3 as follows.
Proof of IR4 (Using IR1 through IR3).
1. X → YZ (given).
2. YZ → Y (using IR1 and knowing that YZ ⊇ Y).
3. X → Y (using IR3 on 1 and 2).
Proof of IR5 (using IR1 through IR3).
1. X →Y (given).
2. X → Z (given).
3. X → XY (using IR2 on 1 by augmenting with X; notice that XX = X).
4. XY → YZ (using IR2 on 2 by augmenting with Y).
5. X → YZ (using IR3 on 3 and 4).
Proof of IR6 (using IR1 through IR3).
1. X → Y (given).
2. WY → Z (given).
3. WX → WY (using IR2 on 1 by augmenting with W).
4. WX → Z (using IR3 on 3 and 2).
It has been shown by Armstrong (1974) that inference rules IR1 through IR3 are
sound and complete. By sound, we mean that given a set of functional dependencies
F specified on a relation schema R, any dependency that we can infer from F by
using IR1 through IR3 holds in every relation state r of R that satisfies the dependen-
cies in F. By complete, we mean that using IR1 through IR3 repeatedly to infer
dependencies until no more dependencies can be inferred results in the complete
set of all possible dependencies that can be inferred from F. In other words, the set of
dependencies F+, which we called the closure of F, can be determined from F by
using only inference rules IR1 through IR3. Inference rules IR1 through IR3 are
known as Armstrong’s inference rules.3
Typically, database designers first specify the set of functional dependencies F that
can easily be determined from the semantics of the attributes of R; then IR1, IR2,
and IR3 are used to infer additional functional dependencies that will also hold on
R. A systematic way to determine these additional functional dependencies is first to
determine each set of attributes X that appears as a left-hand side of some func-
tional dependency in F and then to determine the set of all attributes that are
dependent on X.
Definition. For each such set of attributes X, we determine the set X+ of attrib-
utes that are functionally determined by X based on F; X+ is called the closure
of X under F. Algorithm 1 can be used to calculate X+.
Algorithm 1. Determining X+, the Closure of X under F
Input: A set F of FDs on a relation schema R, and a set of attributes X, which is
a subset of R.
X+ := X;
repeat
oldX+ := X+;
for each functional dependency Y → Z in F do
if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+);
Algorithm 1 starts by setting X+ to all the attributes in X. By IR1, we know that all
these attributes are functionally dependent on X. Using inference rules IR3 and IR4,
we add attributes to X+, using each functional dependency in F. We keep going
through all the dependencies in F (the repeat loop) until no more attributes are
added to X+ during a complete cycle (of the for loop) through the dependencies in F.
For example, consider the relation schema EMP_PROJ; from the semantics of the
attributes, we specify the following set F of functional dependencies that should
hold on EMP_PROJ:
F = {Ssn → Ename,
Pnumber → {Pname, Plocation},
{Ssn, Pnumber} → Hours}
3They are actually known as Armstrong’s axioms. In the strict mathematical sense, the axioms (given
facts) are the functional dependencies in F, since we assume that they are correct, whereas IR1 through
IR3 are the inference rules for inferring new functional dependencies (new facts).
Using Algorithm 1, we calculate the following closure sets with respect to F:
{Ssn}+ = {Ssn, Ename}
{Pnumber}+ = {Pnumber, Pname, Plocation}
{Ssn, Pnumber}+ = {Ssn, Pnumber, Ename, Pname, Plocation, Hours}
Intuitively, the set of attributes in the right-hand side in each line above represents
all those attributes that are functionally dependent on the set of attributes in the
left-hand side based on the given set F.
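The repeat loop of Algorithm 1 translates almost directly into code. The following Python sketch (the function name closure and the representation of F as a list of (left-hand side, right-hand side) set pairs are our own illustrative choices, not part of the algorithm as stated) computes X+ and reproduces the closures listed above for EMP_PROJ.

    # A minimal sketch of Algorithm 1. Each FD is a pair (lhs, rhs) of attribute sets.
    def closure(X, F):
        x_plus = set(X)                      # X+ := X
        changed = True
        while changed:                       # repeat ... until (X+ = oldX+)
            changed = False
            for lhs, rhs in F:               # for each functional dependency Y -> Z in F
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs            # if X+ includes Y then X+ := X+ union Z
                    changed = True
        return x_plus

    F = [({'Ssn'}, {'Ename'}),
         ({'Pnumber'}, {'Pname', 'Plocation'}),
         ({'Ssn', 'Pnumber'}, {'Hours'})]
    print(closure({'Ssn'}, F))               # the set {Ssn, Ename}
    print(closure({'Pnumber'}, F))           # the set {Pnumber, Pname, Plocation}
    print(closure({'Ssn', 'Pnumber'}, F))    # all six attributes of EMP_PROJ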
1.2 Equivalence of Sets of Functional Dependencies
In this section we discuss the equivalence of two sets of functional dependencies.
First, we give some preliminary definitions.
Definition. A set of functional dependencies F is said to cover another set of
functional dependencies E if every FD in E is also in F+; that is, if every depen-
dency in E can be inferred from F; alternatively, we can say that E is covered by F.
Definition. Two sets of functional dependencies E and F are equivalent if
E+ = F+. Therefore, equivalence means that every FD in E can be inferred from
F, and every FD in F can be inferred from E; that is, E is equivalent to F if both
the conditions—E covers F and F covers E—hold.
We can determine whether F covers E by calculating X+ with respect to F for each FD
X → Y in E, and then checking whether this X+ includes the attributes in Y. If this is
the case for every FD in E, then F covers E. We determine whether E and F are equiv-
alent by checking that E covers F and F covers E. It is left to the reader as an exercise
to show that the following two sets of FDs are equivalent:
F = {A → C, AC → D, E → AD, E → H}
and G = {A → CD, E → AH}.
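The covering test just described is easy to mechanize on top of the closure sketch above. The fragment below (again using our illustrative set-of-pairs representation of FDs) checks both directions and, when run, confirms that the sets F and G given above are indeed equivalent.

    # Sketch of the covering and equivalence tests for sets of FDs.
    def closure(X, F):                       # repeated from the Algorithm 1 sketch
        x_plus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs
                    changed = True
        return x_plus

    def covers(F, E):
        # F covers E if every FD X -> Y in E has Y contained in X+ computed under F.
        return all(rhs <= closure(lhs, F) for lhs, rhs in E)

    def equivalent(E, F):
        return covers(F, E) and covers(E, F)

    F = [({'A'}, {'C'}), ({'A', 'C'}, {'D'}), ({'E'}, {'A', 'D'}), ({'E'}, {'H'})]
    G = [({'A'}, {'C', 'D'}), ({'E'}, {'A', 'H'})]
    print(equivalent(F, G))                  # True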
1.3 Minimal Sets of Functional Dependencies
Informally, a minimal cover of a set of functional dependencies E is a set of func-
tional dependencies F that satisfies the property that every dependency in E is in the
closure F+ of F. In addition, this property is lost if any dependency from the set F is
removed; F must have no redundancies in it, and the dependencies in F are in a
standard form. To satisfy these properties, we can formally define a set of functional
dependencies F to be minimal if it satisfies the following conditions:
1. Every dependency in F has a single attribute for its right-hand side.
2. We cannot replace any dependency X → A in F with a dependency Y → A,
where Y is a proper subset of X, and still have a set of dependencies that is
equivalent to F.
3. We cannot remove any dependency from F and still have a set of dependen-
cies that is equivalent to F.
We can think of a minimal set of dependencies as being a set of dependencies in a
standard or canonical form and with no redundancies. Condition 1 just represents
every dependency in a canonical form with a single attribute on the right-hand
side.4 Conditions 2 and 3 ensure that there are no redundancies in the dependencies
either by having redundant attributes on the left-hand side of a dependency
(Condition 2) or by having a dependency that can be inferred from the remaining
FDs in F (Condition 3).
Definition. A minimal cover of a set of functional dependencies E is a minimal
set of dependencies (in the standard canonical form and without redundancy)
that is equivalent to E. We can always find at least one minimal cover F for any
set of dependencies E using Algorithm 2.
If several sets of FDs qualify as minimal covers of E by the definition above, it is cus-
tomary to use additional criteria for minimality. For example, we can choose the
minimal set with the smallest number of dependencies or with the smallest total
length (the total length of a set of dependencies is calculated by concatenating the
dependencies and treating them as one long character string).
Algorithm 2. Finding a Minimal Cover F for a Set of Functional Dependencies E
Input: A set of functional dependencies E.
1. Set F := E.
2. Replace each functional dependency X → {A1, A2, …, An} in F by the n func-
tional dependencies X →A1, X →A2, …, X → An.
3. For each functional dependency X → A in F
for each attribute B that is an element of X
if { {F – {X → A} } ∪ { (X – {B} ) → A} } is equivalent to F
then replace X → A with (X – {B} ) → A in F.
4. For each remaining functional dependency X → A in F
if {F – {X → A} } is equivalent to F,
then remove X → A from F.
We illustrate the above algorithm with the following:
Let the given set of FDs be E : {B → A, D → A, AB → D}. We have to find the mini-
mal cover of E.
■ All the above dependencies are in canonical form (that is, they have only one
attribute on the right-hand side), so steps 1 and 2 of Algorithm 2 are already done
and we can proceed to step 3. In step 3 we need to determine whether AB → D has
any redundant attribute on the left-hand side; that is, can it be replaced by
B → D or A → D?
4This is a standard form to simplify the conditions and algorithms that ensure no redundancy exists in F.
By using the inference rule IR4, we can convert a single dependency with multiple attributes on the right-
hand side into a set of dependencies with single attributes on the right-hand side.
■ Since B → A, by augmenting with B on both sides (IR2), we have BB → AB,
or B → AB (i). However, AB → D as given (ii).
■ Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Thus
AB → D may be replaced by B → D.
■ We now have a set equivalent to the original E, say E′: {B → A, D → A, B → D}.
No further reduction is possible in step 3 since all FDs now have a single attribute
on the left-hand side.
■ In step 4 we look for a redundant FD in E′. By using the transitive rule on
B → D and D → A, we derive B → A. Hence B → A is redundant in E′ and
can be eliminated.
■ Therefore, the minimal cover of E is {B → D, D → A}.
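The minimal cover computation can be sketched as follows; the helper functions repeat the earlier closure and equivalence sketches so that the fragment runs on its own, and all names are illustrative rather than part of the text. Applied to E = {B → A, D → A, AB → D}, it produces the two dependencies B → D and D → A derived above.

    # Sketch of Algorithm 2. FDs are (lhs, rhs) pairs; lhs is a frozenset so pairs compare cleanly.
    def closure(X, F):                       # as in the Algorithm 1 sketch
        x_plus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs
                    changed = True
        return x_plus

    def equivalent(E, F):
        covers = lambda F, E: all(rhs <= closure(lhs, F) for lhs, rhs in E)
        return covers(F, E) and covers(E, F)

    def minimal_cover(E):
        # Steps 1 and 2: copy E and split every right-hand side into single attributes.
        F = [(frozenset(lhs), frozenset({a})) for lhs, rhs in E for a in rhs]
        # Step 3: try to drop attributes from left-hand sides while preserving equivalence.
        changed = True
        while changed:
            changed = False
            for i, (lhs, rhs) in enumerate(F):
                for b in lhs:
                    reduced = F[:i] + [(lhs - {b}, rhs)] + F[i + 1:]
                    if equivalent(reduced, F):
                        F = reduced
                        changed = True
                        break
                if changed:
                    break
        # Step 4: remove dependencies that can be inferred from the remaining ones.
        i = 0
        while i < len(F):
            rest = F[:i] + F[i + 1:]
            if equivalent(rest, F):
                F = rest
            else:
                i += 1
        return F

    E = [({'B'}, {'A'}), ({'D'}, {'A'}), ({'A', 'B'}, {'D'})]
    for lhs, rhs in minimal_cover(E):
        print(set(lhs), '->', set(rhs))      # prints B -> D and D -> A (in some order)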
In Section 3 we will see how relations can be synthesized from a given set of depend-
encies E by first finding the minimal cover F for E.
Next, we provide a simple algorithm to determine the key of a relation:
Algorithm 2(a). Finding a Key K for R Given a Set F of Functional Dependencies
Input: A relation R and a set of functional dependencies F on the attributes of
R.
1. Set K := R.
2. For each attribute A in K
{compute (K – A)+ with respect to F;
if (K – A)+ contains all the attributes in R, then set K := K – {A} };
In Algorithm 2(a), we start by setting K to all the attributes of R; we then remove one
attribute at a time and check whether the remaining attributes still form a superkey.
Note that Algorithm 2(a) determines only one key out of the possible candidate keys
for R; the key returned depends on the order in which attributes are removed from R
in step 2.
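A direct transcription of Algorithm 2(a) is shown below. The function names and the choice to scan attributes in sorted order are illustrative assumptions; as noted above, a different removal order can return a different candidate key. Run on the EMP_PROJ schema and FDs from Section 1.1, it returns the key {Ssn, Pnumber}.

    # Sketch of Algorithm 2(a): start from K = R and drop attributes that are not needed.
    def closure(X, F):                       # as in the Algorithm 1 sketch
        x_plus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs
                    changed = True
        return x_plus

    def find_key(R, F):
        K = set(R)
        for a in sorted(R):                  # the key found depends on this removal order
            if closure(K - {a}, F) >= set(R):
                K.discard(a)                 # (K - A)+ still contains all of R, so A is not needed
        return K

    R = {'Ssn', 'Ename', 'Pnumber', 'Pname', 'Plocation', 'Hours'}
    F = [({'Ssn'}, {'Ename'}),
         ({'Pnumber'}, {'Pname', 'Plocation'}),
         ({'Ssn', 'Pnumber'}, {'Hours'})]
    print(find_key(R, F))                    # the key {Ssn, Pnumber}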
2 Properties of Relational Decompositions
We now turn our attention to the process of decomposing relations in order to get rid
of unwanted dependencies and achieve higher normal forms. In
Section 2.1 we give examples to show that looking at an individual relation to test
whether it is in a higher normal form does not, on its own, guarantee a good design;
rather, a set of relations that together form the relational database schema must pos-
sess certain additional properties to ensure a good design. In Sections 2.2 and 2.3 we
discuss two of these properties: the dependency preservation property and the non-
additive (or lossless) join property. Section 2.4 discusses binary decompositions and
Section 2.5 discusses successive nonadditive join decompositions.
2.1 Relation Decomposition and Insufficiency
of Normal Forms
The relational database design algorithms that we present in Section 3 start from a
single universal relation schema R = {A1, A2, …, An} that includes all the attributes
of the database. We implicitly make the universal relation assumption, which states
that every attribute name is unique. The set F of functional dependencies that
should hold on the attributes of R is specified by the database designers and is made
available to the design algorithms. Using the functional dependencies, the algo-
rithms decompose the universal relation schema R into a set of relation schemas D
= {R1, R2, …, Rm} that will become the relational database schema; D is called a
decomposition of R.
We must make sure that each attribute in R will appear in at least one relation
schema Ri in the decomposition so that no attributes are lost; formally, we have
R1 ∪ R2 ∪ … ∪ Rm = R
This is called the attribute preservation condition of a decomposition.
Another goal is to have each individual relation Ri in the decomposition D be in
BCNF or 3NF. However, this condition is not sufficient to guarantee a good data-
base design on its own. We must consider the decomposition of the universal rela-
tion as a whole, in addition to looking at the individual relations. To illustrate this
point, consider the EMP_LOCS(Ename, Plocation) relation in Figure 5 from the chap-
ter “Basics of Functional Dependencies and Normalization for Relational
Databases” which is in 3NF and also in BCNF. In fact, any relation schema with only
two attributes is automatically in BCNF.5 Although EMP_LOCS is in BCNF, it still
gives rise to spurious tuples when joined with EMP_PROJ1(Ssn, Pnumber, Hours,
Pname, Plocation), which is not in BCNF (see the result of the natural join in Figure
6 from the same chapter). Hence, EMP_LOCS represents a particularly bad relation
schema because of its convoluted semantics by which Plocation gives the location of
one of the projects on which an employee works. Joining EMP_LOCS with
PROJECT(Pname, Pnumber, Plocation, Dnum) in Figure 2 from the chapter “Basics of
Functional Dependencies and Normalization for Relational Databases”—which is
in BCNF—using Plocation as a joining attribute also gives rise to spurious tuples.
This underscores the need for other criteria that, together with the conditions of
3NF or BCNF, prevent such bad designs. In the next three subsections we discuss
such additional conditions that should hold on a decomposition D as a whole.
2.2 Dependency Preservation Property
of a Decomposition
It would be useful if each functional dependency X→Y specified in F either
appeared directly in one of the relation schemas Ri in the decomposition D or could
be inferred from the dependencies that appear in some Ri. Informally, this is the
dependency preservation condition. We want to preserve the dependencies because
5As an exercise, the reader should prove that this statement is true.
each dependency in F represents a constraint on the database. If one of the depen-
dencies is not represented in some individual relation Ri of the decomposition, we
cannot enforce this constraint by dealing with an individual relation. We may have
to join multiple relations so as to include all attributes involved in that dependency.
It is not necessary that the exact dependencies specified in F appear themselves in
individual relations of the decomposition D. It is sufficient that the union of the
dependencies that hold on the individual relations in D be equivalent to F. We now
define these concepts more formally.
Definition. Given a set of dependencies F on R, the projection of F on Ri,
denoted by πRi(F) where Ri is a subset of R, is the set of dependencies X → Y in
F+ such that the attributes in X ∪ Y are all contained in Ri. Hence, the projec-
tion of F on each relation schema Ri in the decomposition D is the set of func-
tional dependencies in F+, the closure of F, such that all their left- and
right-hand-side attributes are in Ri. We say that a decomposition D = {R1,
R2, …, Rm} of R is dependency-preserving with respect to F if the union of the
projections of F on each Ri in D is equivalent to F; that is, ((πR1(F)) ∪ … ∪ (πRm(F)))+ = F+.
If a decomposition is not dependency-preserving, some dependency is lost in the
decomposition. To check that a lost dependency holds, we must take the JOIN of
two or more relations in the decomposition to get a relation that includes all left-
and right-hand-side attributes of the lost dependency, and then check that the
dependency holds on the result of the JOIN—an option that is not practical.
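The text notes that re-checking a lost dependency by joining relations is impractical. A standard alternative, which is not given in the text and is sketched here only under our set-of-pairs representation, tests each FD X → Y in F directly against the decomposition by repeatedly intersecting attribute closures with the schemas of D; the dependency is preserved exactly when Y ends up in the accumulated set. Applied to the TEACH relation discussed just below, with the dependencies {Student, Course} → Instructor and Instructor → Course assumed as in the normalization chapter, and to its binary decomposition {Instructor, Course}, {Instructor, Student} from Section 2.4, it reports the first dependency as lost.

    # Sketch of a dependency-preservation test that avoids computing F+ and avoids joins.
    def closure(X, F):                       # as in the Algorithm 1 sketch
        x_plus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs
                    changed = True
        return x_plus

    def preserved(D, F, lhs, rhs):
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for Ri in D:                     # grow z using only attributes visible inside Ri
                gained = closure(z & Ri, F) & Ri
                if not gained <= z:
                    z |= gained
                    changed = True
        return rhs <= z                      # X -> Y is preserved iff Y ends up in z

    F = [({'Student', 'Course'}, {'Instructor'}), ({'Instructor'}, {'Course'})]
    D = [{'Instructor', 'Course'}, {'Instructor', 'Student'}]
    for lhs, rhs in F:
        print(sorted(lhs), '->', sorted(rhs), preserved(D, F, lhs, rhs))
    # {Student, Course} -> Instructor is reported as lost; Instructor -> Course is preserved.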
An example of a decomposition that does not preserve dependencies is shown in
Figure 13 (a) from the chapter “Basics of Functional Dependencies and
Normalization for Relational Databases,” in which the functional dependency FD2
is lost when LOTS1A is decomposed into {LOTS1AX, LOTS1AY}. From the same
chapter, the decompositions in Figure 12, however, are dependency-preserving. In
contrast, for the example in Figure 14, no matter which of the three decompositions
provided in the text is chosen for the relation TEACH(Student, Course, Instructor),
one or both of the dependencies originally present are bound to be lost. We state a
claim below related to this property without providing any proof.
Claim 1. It is always possible to find a dependency-preserving decomposition
D with respect to F such that each relation Ri in D is in 3NF.
In Section 3.1, we describe Algorithm 4, which creates a dependency-
preserving decomposition D = {R1, R2, …, Rm} of a universal relation R based on a
set of functional dependencies F, such that each Ri in D is in 3NF.
2.3 Nonadditive (Lossless) Join Property
of a Decomposition
Another property that a decomposition D should possess is the nonadditive join
property, which ensures that no spurious tuples are generated when a NATURAL
JOIN operation is applied to the relations resulting from the decomposition. We
already illustrated this problem in Section 1.4 from the chapter “Basics of
Functional Dependencies and Normalization for Relational Databases” with the
example from that chapter in Figures 5 and 6. Because this is a property of a decom-
position of relation schemas, the condition of no spurious tuples should hold on
every legal relation state—that is, every relation state that satisfies the functional
dependencies in F. Hence, the lossless join property is always defined with respect to
a specific set F of dependencies.
Definition. Formally, a decomposition D = {R1, R2, …, Rm} of R has the lossless
(nonadditive) join property with respect to the set of dependencies F on R if,
for every relation state r of R that satisfies F, the following holds, where * is the
NATURAL JOIN of all the relations in D: *(πR1(r), …, πRm(r)) = r.
The word loss in lossless refers to loss of information, not to loss of tuples. If a decom-
position does not have the lossless join property, we may get additional spurious
tuples after the PROJECT (π) and NATURAL JOIN (*) operations are applied; these
additional tuples represent erroneous or invalid information. We prefer the term
nonadditive join because it describes the situation more accurately. Although the
term lossless join has been popular in the literature, we will henceforth use the term
nonadditive join, which is self-explanatory and unambiguous. The nonadditive join
property ensures that no spurious tuples result after the application of PROJECT and
JOIN operations. We may, however, sometimes use the term lossy design to refer to a
design that represents a loss of information (see example at the end of Algorithm 4).
The decomposition of EMP_PROJ(Ssn, Pnumber, Hours, Ename, Pname, Plocation) in
Figure 3 from the chapter “Basics of Functional Dependencies and Normalization
for Relational Databases” into EMP_LOCS(Ename, Plocation) and EMP_PROJ1(Ssn,
Pnumber, Hours, Pname, Plocation) in Figure 5 (same chapter) obviously does not
have the nonadditive join property, as illustrated by Figure 6 (same chapter). We will
use a general procedure for testing whether any decomposition D of a relation into
n relations is nonadditive with respect to a set of given functional dependencies F in
the relation; it is presented as Algorithm 3 below. It is possible to apply a simpler test
to check if the decomposition is nonadditive for binary decompositions; that test is
described in Section 2.4.
Algorithm 3. Testing for Nonadditive Join Property
Input: A universal relation R, a decomposition D = {R1, R2, …, Rm} of R, and a
set F of functional dependencies.
Note: Explanatory comments are given at the end of some of the steps. They fol-
low the format: (* comment *).
1. Create an initial matrix S with one row i for each relation Ri in D, and one
column j for each attribute Aj in R.
2. Set S(i, j):= bij for all matrix entries. (* each bij is a distinct symbol associated
with indices (i, j) *).
3. For each row i representing relation schema Ri
{for each column j representing attribute Aj
{if (relation Ri includes attribute Aj) then set S(i, j):= aj;};}; (* each aj is a
distinct symbol associated with index ( j) *).
4. Repeat the following loop until a complete loop execution results in no
changes to S
{for each functional dependency X → Y in F
{for all rows in S that have the same symbols in the columns corresponding
to attributes in X
{make the symbols in each column that correspond to an attribute in Y
be the same in all these rows as follows: If any of the rows has an a sym-
bol for the column, set the other rows to that same a symbol in the col-
umn. If no a symbol exists for the attribute in any of the rows, choose
one of the b symbols that appears in one of the rows for the attribute
and set the other rows to that same b symbol in the column ;} ; } ;};
5. If a row is made up entirely of a symbols, then the decomposition has the
nonadditive join property; otherwise, it does not.
Given a relation R that is decomposed into a number of relations R1, R2, …, Rm,
Algorithm 3 begins by constructing the matrix S, which we consider to be some relation state r of R.
Row i in S represents a tuple ti (corresponding to relation Ri) that has a symbols in
the columns that correspond to the attributes of Ri and b symbols in the remaining
columns. The algorithm then transforms the rows of this matrix (during the loop in
step 4) so that they represent tuples that satisfy all the functional dependencies in F.
At the end of step 4, any two rows in S—which represent two tuples in r—that agree
in their values for the left-hand-side attributes X of a functional dependency X → Y
in F will also agree in their values for the right-hand-side attributes Y. It can be
shown that after applying the loop of step 4, if any row in S ends up with all a sym-
bols, then the decomposition D has the nonadditive join property with respect to F.
If, on the other hand, no row ends up being all a symbols, D does not satisfy the
lossless join property. In this case, the relation state r represented by S at the end of
the algorithm will be an example of a relation state r of R that satisfies the depend-
encies in F but does not satisfy the nonadditive join condition. Thus, this relation
serves as a counterexample that proves that D does not have the nonadditive join
property with respect to F. Note that the a and b symbols have no special meaning
at the end of the algorithm.
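The matrix construction and the chase of step 4 can be sketched as follows; the encoding of the a and b symbols as tuples is our own illustrative choice. Running it on the two decompositions of EMP_PROJ discussed next reproduces the outcomes shown in Figure 1.

    # Sketch of Algorithm 3. Symbols are encoded as tuples: ('a', j) and ('b', i, j).
    def nonadditive(R, D, F):
        attrs = sorted(R)
        # Steps 1-3: one row per relation schema, a symbols where Ri contains the attribute.
        S = [{A: (('a', A) if A in Ri else ('b', i, A)) for A in attrs}
             for i, Ri in enumerate(D)]
        changed = True
        while changed:                       # step 4: equate symbols forced by each FD
            changed = False
            for lhs, rhs in F:
                groups = {}
                for row in S:                # group rows that agree on the X columns
                    key = tuple(row[A] for A in sorted(lhs))
                    groups.setdefault(key, []).append(row)
                for rows in groups.values():
                    if len(rows) < 2:
                        continue
                    for A in rhs:
                        symbols = {row[A] for row in rows}
                        if len(symbols) > 1:
                            a_syms = [s for s in symbols if s[0] == 'a']
                            chosen = a_syms[0] if a_syms else min(symbols)
                            for row in rows:
                                row[A] = chosen
                            changed = True
        # Step 5: nonadditive iff some row consists entirely of a symbols.
        return any(all(row[A][0] == 'a' for A in attrs) for row in S)

    R = {'Ssn', 'Ename', 'Pnumber', 'Pname', 'Plocation', 'Hours'}
    F = [({'Ssn'}, {'Ename'}),
         ({'Pnumber'}, {'Pname', 'Plocation'}),
         ({'Ssn', 'Pnumber'}, {'Hours'})]
    D1 = [{'Ename', 'Plocation'}, {'Ssn', 'Pnumber', 'Hours', 'Pname', 'Plocation'}]
    D2 = [{'Ssn', 'Ename'}, {'Pnumber', 'Pname', 'Plocation'}, {'Ssn', 'Pnumber', 'Hours'}]
    print(nonadditive(R, D1, F))             # False, as in Figure 1(a)
    print(nonadditive(R, D2, F))             # True, as in Figure 1(c)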
Figure 1(a) shows how we apply Algorithm 3 to the decomposition of the
EMP_PROJ relation schema from Figure 3(b) from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases” into the two relation
schemas EMP_PROJ1 and EMP_LOCS in Figure 5(a) of the same chapter. The loop
in step 4 of the algorithm cannot change any b symbols to a symbols; hence, the
resulting matrix S does not have a row with all a symbols, and so the decomposition
does not have the nonadditive join property.
Figure 1(b) shows another decomposition of EMP_PROJ (into EMP, PROJECT, and
WORKS_ON) that does have the nonadditive join property, and Figure 1(c) shows
how we apply the algorithm to that decomposition. Once a row consists only of a
symbols, we conclude that the decomposition has the nonadditive join property,
and we can stop applying the functional dependencies (step 4 in the algorithm) to
the matrix S.
[Figure 1 content: (a) R = {Ssn, Ename, Pnumber, Pname, Plocation, Hours}, D = {R1, R2} with R1 = EMP_LOCS = {Ename, Plocation} and R2 = EMP_PROJ1 = {Ssn, Pnumber, Hours, Pname, Plocation}, and F = {Ssn → Ename; Pnumber → {Pname, Plocation}; {Ssn, Pnumber} → Hours}; no changes to the matrix S occur after applying the functional dependencies, and no row is all a symbols. (b) The relation schemas EMP(Ssn, Ename), PROJECT(Pnumber, Pname, Plocation), and WORKS_ON(Ssn, Pnumber, Hours). (c) For D = {R1 = EMP, R2 = PROJECT, R3 = WORKS_ON}, the matrix S after applying the first two functional dependencies has a last row consisting entirely of a symbols, so the test stops.]
Figure 1
Nonadditive join test for n-ary decompositions. (a) Case 1: Decomposition of EMP_PROJ
into EMP_PROJ1 and EMP_LOCS fails test. (b) A decomposition of EMP_PROJ that has
the lossless join property. (c) Case 2: Decomposition of EMP_PROJ into EMP, PROJECT,
and WORKS_ON satisfies test.
2.4 Testing Binary Decompositions for the Nonadditive
Join Property
Algorithm 3 allows us to test whether a particular decomposition D into n relations
obeys the nonadditive join property with respect to a set of functional dependencies
F. There is a special case of a decomposition called a binary decomposition—
decomposition of a relation R into two relations. For this case there is a test that is
easier to apply than Algorithm 3; while it is very handy to use, it is limited to binary
decompositions only.
Property NJB (Nonadditive Join Test for Binary Decompositions). A
decomposition D = {R1, R2} of R has the lossless (nonadditive) join property
with respect to a set of functional dependencies F on R if and only if either
■ The FD ((R1 ∩ R2) → (R1 – R2)) is in F+, or
■ The FD ((R1 ∩ R2) → (R2 – R1)) is in F+.
You should verify that this property holds with respect to our informal successive
normalization examples in Sections 3 and 4 from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases.” In Section 5 of the
same chapter we decomposed LOTS1A into two BCNF relations LOTS1AX and
LOTS1AY, and decomposed the TEACH relation in Figure 14 of that chapter into the
two relations {Instructor, Course} and {Instructor, Student}. These are valid decompo-
sitions because they are nonadditive per the above test.
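Property NJB reduces the test to two attribute closures. A small sketch (with illustrative names, reusing the closure helper from the Algorithm 1 sketch) is shown below, applied to the TEACH decomposition just mentioned, using its Instructor → Course dependency (stated later in Section 3.2), and to the failing EMP_LOCS/EMP_PROJ1 decomposition of Figure 1(a).

    # Sketch of the NJB test for a binary decomposition D = {R1, R2}.
    def closure(X, F):                       # as in the Algorithm 1 sketch
        x_plus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs
                    changed = True
        return x_plus

    def njb(R1, R2, F):
        common_plus = closure(set(R1) & set(R2), F)
        return (set(R1) - set(R2) <= common_plus) or (set(R2) - set(R1) <= common_plus)

    # TEACH decomposed into {Instructor, Course} and {Instructor, Student}:
    print(njb({'Instructor', 'Course'}, {'Instructor', 'Student'},
              [({'Instructor'}, {'Course'})]))                           # True (nonadditive)

    # EMP_LOCS and EMP_PROJ1 from Figure 1(a):
    F_emp = [({'Ssn'}, {'Ename'}),
             ({'Pnumber'}, {'Pname', 'Plocation'}),
             ({'Ssn', 'Pnumber'}, {'Hours'})]
    print(njb({'Ename', 'Plocation'},
              {'Ssn', 'Pnumber', 'Hours', 'Pname', 'Plocation'}, F_emp)) # False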
2.5 Successive Nonadditive Join Decompositions
We saw the successive decomposition of relations during the process of second and
third normalization in Sections 3 and 4 from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases.” To verify that these
decompositions are nonadditive, we need to ensure another property, as set forth in
Claim 2.
Claim 2 (Preservation of Nonadditivity in Successive Decompositions). If a
decomposition D = {R1, R2, …, Rm} of R has the nonadditive (lossless) join
property with respect to a set of functional dependencies F on R, and if a
decomposition Di = {Q1, Q2, …, Qk} of Ri has the nonadditive join property
with respect to the projection of F on Ri, then the decomposition D2 = {R1, R2,
…, Ri−1, Q1, Q2, …, Qk, Ri+1, …, Rm} of R has the nonadditive join property with
respect to F.
3 Algorithms for Relational Database Schema
Design
We now give three algorithms for creating a relational decomposition from a uni-
versal relation. Each algorithm has specific properties, as we discuss next.
3.1 Dependency-Preserving Decomposition
into 3NF Schemas
Algorithm 4 creates a dependency-preserving decomposition D = {R1, R2, …, Rm} of
a universal relation R based on a set of functional dependencies F, such that each Ri
in D is in 3NF. It guarantees only the dependency-preserving property; it does not
guarantee the nonadditive join property. The first step of Algorithm 4 is to find a
minimal cover G for F; Algorithm 2 can be used for this step. Note that multiple
minimal covers may exist for a given set F (as we illustrate later in the example after
Algorithm 4). In such cases the algorithms can potentially yield multiple alternative
designs.
Algorithm 4. Relational Synthesis into 3NF with Dependency Preservation
Input: A universal relation R and a set of functional dependencies F on the
attributes of R.
1. Find a minimal cover G for F (use Algorithm 2);
2. For each left-hand-side X of a functional dependency that appears in G, cre-
ate a relation schema in D with attributes {X ∪ {A1} ∪ {A2} … ∪ {Ak} },
where X → A1, X → A2, …, X → Ak are the only dependencies in G with X as
the left-hand-side (X is the key of this relation);
3. Place any remaining attributes (that have not been placed in any relation) in
a single relation schema to ensure the attribute preservation property.
Example of Algorithm 4. Consider the following universal relation:
U(Emp_ssn, Pno, Esal, Ephone, Dno, Pname, Plocation)
Emp_ssn, Esal, Ephone refer to the Social Security number, salary, and phone number
of the employee. Pno, Pname, and Plocation refer to the number, name, and location
of the project. Dno is department number.
The following dependencies are present:
FD1: Emp_ssn → {Esal, Ephone, Dno}
FD2: Pno → { Pname, Plocation}
FD3: Emp_ssn, Pno → {Esal, Ephone, Dno, Pname, Plocation}
By virtue of FD3, the attribute set {Emp_ssn, Pno} represents a key of the universal
relation. Hence F, the set of given FDs, includes {Emp_ssn → Esal, Ephone, Dno;
Pno → Pname, Plocation; Emp_ssn, Pno → Esal, Ephone, Dno, Pname, Plocation}.
By applying the minimal cover Algorithm 2, in step 3 we see that Pno is a redundant
attribute in Emp_ssn, Pno → Esal, Ephone, Dno. Moreover, Emp_ssn is redundant in
Emp_ssn, Pno → Pname, Plocation. Hence the minimal cover consists of FD1 and FD2
only (FD3 being completely redundant) as follows (if we group attributes with the
same left-hand side into one FD):
Minimal cover G: {Emp_ssn → Esal, Ephone, Dno; Pno → Pname, Plocation}
6See Maier (1983) or Ullman (1982) for a proof.
By applying Algorithm 4 to the above Minimal cover G, we get a 3NF design consist-
ing of two relations with keys Emp_ssn and Pno as follows:
R1 (Emp_ssn, Esal, Ephone, Dno)
R2 (Pno, Pname, Plocation)
An observant reader would notice easily that these two relations have lost the original
information contained in the key of the universal relation U (namely, that there are
certain employees working on certain projects in a many-to-many relationship).
Thus, while the algorithm does preserve the original dependencies, it makes no guar-
antee of preserving all of the information. Hence, the resulting design is a lossy design.
Claim 3. Every relation schema created by Algorithm 4 is in 3NF. (We will not
provide a formal proof here;6 the proof depends on G being a minimal set of
dependencies.)
It is obvious that all the dependencies in G are preserved by the algorithm because
each dependency appears in one of the relations Ri in the decomposition D. Since G
is equivalent to F, all the dependencies in F are either preserved directly in the
decomposition or are derivable using the inference rules from Section 1.1 from
those in the resulting relations, thus ensuring the dependency preservation prop-
erty. Algorithm 4 is called a relational synthesis algorithm, because each relation
schema Ri in the decomposition is synthesized (constructed) from the set of func-
tional dependencies in G with the same left-hand-side X.
3.2 Nonadditive Join Decomposition into BCNF Schemas
The next algorithm decomposes a universal relation schema R = {A1, A2, …, An} into
a decomposition D = {R1, R2, …, Rm} such that each Ri is in BCNF and the decom-
position D has the lossless join property with respect to F. Algorithm 5 utilizes
Property NJB and Claim 2 (preservation of nonadditivity in successive decomposi-
tions) to create a nonadditive join decomposition D = {R1, R2, …, Rm} of a universal
relation R based on a set of functional dependencies F, such that each Ri in D is in
BCNF.
Algorithm 5. Relational Decomposition into BCNF with Nonadditive
Join Property
Input: A universal relation R and a set of functional dependencies F on the
attributes of R.
1. Set D := {R} ;
2. While there is a relation schema Q in D that is not in BCNF do
{
choose a relation schema Q in D that is not in BCNF;
find a functional dependency X → Y in Q that violates BCNF;
replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);
} ;
Each time through the loop in Algorithm 5, we decompose one relation schema Q
that is not in BCNF into two relation schemas. According to Property NJB for
binary decompositions and Claim 2, the decomposition D has the nonadditive join
property. At the end of the algorithm, all relation schemas in D will be in BCNF. The
reader can check that the normalization example in Figures 12 and 13 from the
chapter “Basics of Functional Dependencies and Normalization for Relational
Databases” basically follows this algorithm. The functional dependencies FD3, FD4,
and later FD5 violate BCNF, so the LOTS relation is decomposed appropriately into
BCNF relations, and the decomposition then satisfies the nonadditive join property.
Similarly, if we apply the algorithm to the TEACH relation schema from Figure 14
that same chapter, it is decomposed into TEACH1(Instructor, Student) and
TEACH2(Instructor, Course) because the dependency FD2 Instructor → Course vio-
lates BCNF.
In step 2 of Algorithm 5, it is necessary to determine whether a relation schema Q is
in BCNF or not. One method for doing this is to test, for each functional depend-
ency X → Y in Q, whether X+ fails to include all the attributes in Q, thereby deter-
mining whether or not X is a (super)key in Q. Another technique is based on an
observation that whenever a relation schema Q has a BCNF violation, there exists a
pair of attributes A and B in Q such that {Q – {A, B} } → A; by computing the clo-
sure {Q – {A, B} }+ for each pair of attributes {A, B} of Q, and checking whether the
closure includes A (or B), we can determine whether Q is in BCNF.
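The first of the two tests just described, checking whether X+ covers all of Q, can be sketched as follows. For brevity this fragment examines only the dependencies of F whose left-hand side lies inside Q; a full implementation of step 2 of Algorithm 5 would also have to consider the projection of F+ onto Q. The TEACH dependencies are assumed as before, with {Student, Course} → Instructor taken to be the other dependency of that relation.

    # Sketch of the closure-based BCNF check used in step 2 of Algorithm 5.
    def closure(X, F):                       # as in the Algorithm 1 sketch
        x_plus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs
                    changed = True
        return x_plus

    def bcnf_violation(Q, F):
        for lhs, rhs in F:
            nontrivial = not ((rhs & Q) <= lhs)
            if lhs <= Q and nontrivial and not closure(lhs, F) >= Q:
                return lhs, rhs              # X -> Y violates BCNF in Q; use it to split Q
        return None

    Q = {'Student', 'Course', 'Instructor'}
    F = [({'Student', 'Course'}, {'Instructor'}), ({'Instructor'}, {'Course'})]
    print(bcnf_violation(Q, F))              # reports Instructor -> Course as the violation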
3.3 Dependency-Preserving and Nonadditive (Lossless)
Join Decomposition into 3NF Schemas
So far, in Algorithm 4 we showed how to achieve a 3NF design with the potential for
loss of information, and in Algorithm 5 we showed how to achieve a BCNF design
with the potential loss of certain functional dependencies. By now we know that it is
not possible to have all three of the following: (1) guaranteed nonlossy design, (2)
guaranteed dependency preservation, and (3) all relations in BCNF. As we have said
before, the first condition is a must and cannot be compromised. The second condi-
tion is desirable, but not a must, and may have to be relaxed if we insist on achiev-
ing BCNF. Now we give an alternative algorithm where we achieve conditions 1 and
2 and only guarantee 3NF. A simple modification to Algorithm 4, shown as
Algorithm 6, yields a decomposition D of R that does the following:
■ Preserves dependencies
■ Has the nonadditive join property
■ Is such that each resulting relation schema in the decomposition is in 3NF
Because Algorithm 6 achieves both of the desirable properties, rather than only
functional dependency preservation as guaranteed by Algorithm 4, it is preferred
over Algorithm 4.
Algorithm 6. Relational Synthesis into 3NF with Dependency Preservation
and Nonadditive Join Property
Input: A universal relation R and a set of functional dependencies F on the
attributes of R.
7Step 3 of Algorithm 4 is not needed in Algorithm 6 to preserve attributes because the key will include
any unplaced attributes; these are the attributes that do not participate in any functional dependency.
8Note that there is an additional type of dependency: R is a projection of the join of two or more relations
in the schema. This type of redundancy is considered join dependency. Hence, technically, it may con-
tinue to exist without disturbing the 3NF status for the schema.
1. Find a minimal cover G for F (use Algorithm 2).
2. For each left-hand-side X of a functional dependency that appears in G, cre-
ate a relation schema in D with attributes {X ∪ {A1} ∪ {A2} … ∪ {Ak} },
where X → A1, X → A2, …, X → Ak are the only dependencies in G with X as
left-hand-side (X is the key of this relation).
3. If none of the relation schemas in D contains a key of R, then create one
more relation schema in D that contains attributes that form a key of R.7
(Algorithm 2(a) may be used to find a key.)
4. Eliminate redundant relations from the resulting set of relations in the rela-
tional database schema. A relation R is considered redundant if R is a projec-
tion of another relation S in the schema; alternately, R is subsumed by S.8
Step 3 of Algorithm 6 involves identifying a key K of R. Algorithm 2(a) can be used
to identify a key K of R based on the set of given functional dependencies F. Notice
that the set of functional dependencies used to determine a key in Algorithm 2(a)
could be either F or G, since they are equivalent.
Example 1 of Algorithm 6. Let us revisit the example given earlier at the end of
Algorithm 4. The minimal cover G holds as before. The second step produces rela-
tions R1 and R2 as before. However, now in step 3, we will generate a relation corre-
sponding to the key {Emp_ssn, Pno}. Hence, the resulting design contains:
R1 (Emp_ssn , Esal, Ephone, Dno)
R2 (Pno, Pname, Plocation)
R3 (Emp_ssn, Pno)
This design achieves both the desirable properties of dependency preservation and
nonadditive join.
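Steps 2 through 4 of Algorithm 6 can be sketched as follows, assuming the minimal cover G has already been computed (for instance with the Algorithm 2 sketch given earlier); the helper find_key follows Algorithm 2(a), and all names are illustrative. Run on the universal relation U of Example 1, it reproduces the schemas of R1, R2, and R3.

    # Sketch of steps 2-4 of Algorithm 6, given a minimal cover G of the FDs on R.
    def closure(X, F):                       # as in the Algorithm 1 sketch
        x_plus = set(X)
        changed = True
        while changed:
            changed = False
            for lhs, rhs in F:
                if lhs <= x_plus and not rhs <= x_plus:
                    x_plus |= rhs
                    changed = True
        return x_plus

    def find_key(R, F):                      # Algorithm 2(a)
        K = set(R)
        for a in sorted(R):
            if closure(K - {a}, F) >= set(R):
                K.discard(a)
        return K

    def synthesize_3nf(R, G):
        # Step 2: one relation per distinct left-hand side X, holding X and every A with X -> A in G.
        schemas = []
        for X in {frozenset(lhs) for lhs, _ in G}:
            attrs = set(X)
            for lhs, rhs in G:
                if frozenset(lhs) == X:
                    attrs |= set(rhs)
            schemas.append(attrs)
        # Step 3: if no schema contains a key of R, add one relation formed by such a key.
        key = find_key(R, G)
        if not any(key <= s for s in schemas):
            schemas.append(set(key))
        # Step 4: drop duplicates and any relation whose attributes are a proper subset of another's.
        schemas = [set(s) for s in {frozenset(s) for s in schemas}]
        schemas = [s for s in schemas if not any(s < t for t in schemas)]
        return schemas

    R = {'Emp_ssn', 'Pno', 'Esal', 'Ephone', 'Dno', 'Pname', 'Plocation'}
    G = [({'Emp_ssn'}, {'Esal', 'Ephone', 'Dno'}), ({'Pno'}, {'Pname', 'Plocation'})]
    for s in synthesize_3nf(R, G):
        print(sorted(s))                     # the schemas of R1, R2, and R3 from Example 1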
Example 2 of Algorithm 6 (Case X ). Consider the relation schema LOTS1A
shown in Figure 13(a) from the chapter “Basics of Functional Dependencies and
Normalization for Relational Databases.” Assume that this relation is given as a uni-
versal relation with the following functional dependencies:
FD1: Property_id → Lot#, County, Area
FD2: Lot#, County → Area, Property_id
FD3: Area → County
These were called FD1, FD2, and FD5 in Figure 13(a) of that chapter. For ease of ref-
erence, let us abbreviate the above attributes with the first letter for each and repre-
sent the functional dependencies as the set
F : { P → LCA, LC → AP, A → C }.
If we apply the minimal cover Algorithm 2 to F, (in step 2) we first represent the set
F as
F : {P → L, P → C, P → A, LC → A, LC → P, A → C}.
In the set F, P → A can be inferred from P → L and P → C (which together give P → LC
by the union rule) and LC → A; hence P → A follows by transitivity and is therefore
redundant. Thus, one possible minimal cover is
Minimal cover GX: {P → LC, LC → AP, A → C }.
In step 2 of Algorithm 6 we produce design X (before removing redundant rela-
tions) using the above minimal cover as
Design X: R1 (P, L, C), R2 (L, C, A, P), and R3 (A, C).
In step 4 of the algorithm, we find that R3 is subsumed by R2 (that is, R3 is always a
projection of R2); R1 is a projection of R2 as well. Hence both of those relations
are redundant. Thus the 3NF schema that achieves both of the desirable properties
is (after removing redundant relations)
Design X: R2 (L, C, A, P).
or, in other words, it is identical to the relation LOTS1A(Lot#, County, Area,
Property_id) that was determined to be in 3NF.
Example 2 of Algorithm 6 (Case Y ). Starting with LOTS1A as the universal rela-
tion and with the same given set of functional dependencies, the second step of the
minimal cover Algorithm 2 produces, as before
F: {P → C, P → A, P → L, LC → A, LC → P, A → C}.
The FD LC → A may be considered redundant because LC → P and P → A imply
LC → A by transitivity. Also, P → C may be considered redundant because P → A
and A → C imply P → C by transitivity. This gives a different minimal cover as
Minimal cover GY: { P → LA, LC → P, A → C }.
The alternative design Y produced by the algorithm now is
Design Y: S1 (P, A, L), S2 (L, C, P), and S3 (A, C).
Note that this design has three 3NF relations, none of which can be considered as
redundant by the condition in step 4. All FDs in the original set F are preserved. The
reader will notice that out of the above three relations, relations S1 and S3 were pro-
duced as the BCNF design (implying that S2 is redundant in the presence of S1 and
S3). However, we cannot eliminate relation S2 from the set of three 3NF relations
above since it is not a projection of either S1 or S3. Design Y therefore remains as one
possible final result of applying Algorithm 6 to the given universal relation that pro-
vides relations in 3NF.
It is important to note that the theory of nonadditive join decompositions is based
on the assumption that no NULL values are allowed for the join attributes. The next
section discusses some of the problems that NULLs may cause in relational decom-
positions and provides a general discussion of the algorithms for relational design
by synthesis presented in this section.
4 About Nulls, Dangling Tuples,
and Alternative Relational Designs
In this section we will discuss a few general issues related to problems that arise
when relational design is not approached properly.
4.1 Problems with NULL Values and Dangling Tuples
We must carefully consider the problems associated with NULLs when designing a
relational database schema. There is no fully satisfactory relational design theory as
yet that includes NULL values. One problem occurs when some tuples have NULL
values for attributes that will be used to join individual relations in the decomposi-
tion. To illustrate this, consider the database shown in Figure 2(a), where two rela-
tions EMPLOYEE and DEPARTMENT are shown. The last two employee
tuples—‘Berger’ and ‘Benitez’—represent newly hired employees who have not yet
been assigned to a department (assume that this does not violate any integrity con-
straints). Now suppose that we want to retrieve a list of (Ename, Dname) values for
all the employees. If we apply the NATURAL JOIN operation on EMPLOYEE and
DEPARTMENT (Figure 2(b)), the two aforementioned tuples will not appear in the
result. The OUTER JOIN operation can deal with this problem. Recall that if we take
the LEFT OUTER JOIN of EMPLOYEE with DEPARTMENT, tuples in EMPLOYEE that
have NULL for the join attribute will still appear in the result, joined with an
imaginary tuple in DEPARTMENT that has NULLs for all its attribute values. Figure
2(c) shows the result.
In general, whenever a relational database schema is designed in which two or more
relations are interrelated via foreign keys, particular care must be devoted to watch-
ing for potential NULL values in foreign keys. This can cause unexpected loss of
information in queries that involve joins on that foreign key. Moreover, if NULLs
occur in other attributes, such as Salary, their effect on built-in functions such as
SUM and AVERAGE must be carefully evaluated.
A related problem is that of dangling tuples, which may occur if we carry a decom-
position too far. Suppose that we decompose the EMPLOYEE relation in Figure 2(a)
further into EMPLOYEE_1 and EMPLOYEE_2, shown in Figure 3(a) and 3(b).9 If we
apply the NATURAL JOIN operation to EMPLOYEE_1 and EMPLOYEE_2, we get the
original EMPLOYEE relation. However, we may use the alternative representation,
shown in Figure 3(c), where we do not include a tuple in EMPLOYEE_3 if the
9This sometimes happens when we apply vertical fragmentation to a relation in the context of a distrib-
uted database.
[Figure 2 content: (a) relation states for EMPLOYEE(Ename, Ssn, Bdate, Address, Dnum) and DEPARTMENT(Dname, Dnum, Dmgr_ssn); the EMPLOYEE tuples for Berger, Anders C. and Benitez, Carlos M. have NULL for Dnum, and DEPARTMENT lists Research (Dnum 5), Administration (Dnum 4), and Headquarters (Dnum 1) with their manager Ssns. (b) The NATURAL JOIN result, in which the Berger and Benitez tuples do not appear. (c) The LEFT OUTER JOIN result, in which those two tuples appear with NULLs for Dnum, Dname, and Dmgr_ssn.]
Figure 2
Issues with NULL-value joins. (a) Some EMPLOYEE tuples have NULL for the join attribute Dnum. (b) Result of applying NATURAL JOIN to the EMPLOYEE and DEPARTMENT relations. (c) Result of applying LEFT OUTER JOIN to EMPLOYEE and DEPARTMENT.
[Figure 3 content: relation states for EMPLOYEE_1(Ename, Ssn, Bdate, Address), EMPLOYEE_2(Ssn, Dnum), and EMPLOYEE_3(Ssn, Dnum); in EMPLOYEE_2 the tuples for Berger and Benitez carry NULL for Dnum, and EMPLOYEE_3 omits those two tuples.]
Figure 3
The dangling tuple problem. (a) The relation EMPLOYEE_1 (includes all attributes of EMPLOYEE from Figure 2(a) except Dnum). (b) The relation EMPLOYEE_2 (includes the Dnum attribute with NULL values). (c) The relation EMPLOYEE_3 (includes the Dnum attribute but does not include tuples for which Dnum has NULL values).
employee has not been assigned a department (instead of including a tuple with
NULL for Dnum as in EMPLOYEE_2). If we use EMPLOYEE_3 instead of EMPLOYEE_2
and apply a NATURAL JOIN on EMPLOYEE_1 and EMPLOYEE_3, the tuples for
Berger and Benitez will not appear in the result; these are called dangling tuples in
EMPLOYEE_1 because they are represented in only one of the two relations that rep-
resent employees, and hence are lost if we apply an (INNER) JOIN operation.
4.2 Discussion of Normalization Algorithms
and Alternative Relational Designs
One of the problems with the normalization algorithms we described is that the
database designer must first specify all the relevant functional dependencies among
the database attributes. This is not a simple task for a large database with hundreds
of attributes. Failure to specify one or two important dependencies may result in an
undesirable design. Another problem is that these algorithms are not deterministic
in general. For example, the synthesis algorithms (Algorithms 4 and 6) require the
specification of a minimal cover G for the set of functional dependencies F. Because
there may be in general many minimal covers corresponding to F, as we illustrated
in Example 2 of Algorithm 6 above, the algorithm can give different designs
depending on the particular minimal cover used. Some of these designs may not be
desirable. The decomposition algorithm to achieve BCNF (Algorithm 5) depends
on the order in which the functional dependencies are supplied to the algorithm to
check for BCNF violation. Again, it is possible that many different designs may arise
corresponding to the same set of functional dependencies, depending on the order
in which such dependencies are considered for violation of BCNF. Some of the
designs may be preferred, whereas others may be undesirable.
It is not always possible to find a decomposition into relation schemas that pre-
serves dependencies and allows each relation schema in the decomposition to be in
BCNF (instead of 3NF as in Algorithm 6). We can check the 3NF relation schemas
in the decomposition individually to see whether each satisfies BCNF. If some rela-
tion schema Ri is not in BCNF, we can choose to decompose it further or to leave it
as it is in 3NF (with some possible update anomalies).
To illustrate the above points, let us revisit the LOTS1A relation in Figure 13(a) from
the chapter “Basics of Functional Dependencies and Normalization for Relational
Databases.” It is a relation in 3NF, which is not in BCNF as was shown in Section 5
of that chapter. We also showed that, starting with the functional dependencies
(FD1, FD2, and FD5 in Figure 13(a) of that chapter) and using the bottom-up approach
to design by applying Algorithm 6, it is possible to come up with either the LOTS1A
relation as the 3NF design (which was called design X previously) or an alternative
design Y, which consists of the three relations S1, S2, and S3, each of which is a
3NF relation. Note that if we test design Y further for BCNF, each of S1, S2, and S3
turns out to be in BCNF individually. The design X, however, when tested for BCNF,
fails the test. It yields the two relations S1 and S3 by applying Algorithm 5 (because
of the violating functional dependency A → C). Thus, the bottom-up design proce-
dure of applying Algorithm 6 to design 3NF relations to achieve both properties and
then applying Algorithm 5 to achieve BCNF with the nonadditive join property
(and sacrificing functional dependency preservation) yields S1, S2, S3 as the final
BCNF design by one route (Y design route) and S1, S3 by the other route (X design
route). This happens due to the multiple minimal covers for the original set of func-
tional dependencies. Note that S2 is a redundant relation in the Y design; however, it
does not violate the nonadditive join constraint. It is easy to see that S2 is a valid and
meaningful relation that has the two candidate keys (L, C) and P placed side by side.
Table 1 summarizes the properties of the algorithms discussed in this chapter so far.
Table 1 Summary of the Algorithms Discussed in This Chapter
Algorithm 1. Input: an attribute or a set of attributes X, and a set of FDs F.
  Output: the set of attributes in the closure of X with respect to F.
  Properties/Purpose: determine all the attributes that can be functionally determined from X.
  Remarks: the closure of a key is the entire relation.
Algorithm 2. Input: a set of functional dependencies F.
  Output: the minimal cover of the set of functional dependencies.
  Properties/Purpose: determine the minimal cover of a set of dependencies F.
  Remarks: multiple minimal covers may exist, depending on the order of selecting functional dependencies.
Algorithm 2a. Input: a relation schema R with a set of functional dependencies F.
  Output: a key K of R.
  Properties/Purpose: find a key K (that is a subset of R).
  Remarks: the entire relation R is always a default superkey.
Algorithm 3. Input: a decomposition D of R and a set F of functional dependencies.
  Output: Boolean result (yes or no) for the nonadditive join property.
  Properties/Purpose: testing for nonadditive join decomposition.
  Remarks: see the simpler test NJB in Section 2.4 for binary decompositions.
Algorithm 4. Input: a relation R and a set of functional dependencies F.
  Output: a set of relations in 3NF.
  Properties/Purpose: dependency preservation.
  Remarks: no guarantee of satisfying the lossless join property.
Algorithm 5. Input: a relation R and a set of functional dependencies F.
  Output: a set of relations in BCNF.
  Properties/Purpose: nonadditive join decomposition.
  Remarks: no guarantee of dependency preservation.
Algorithm 6. Input: a relation R and a set of functional dependencies F.
  Output: a set of relations in 3NF.
  Properties/Purpose: nonadditive join and dependency-preserving decomposition.
  Remarks: may not achieve BCNF, but achieves all desirable properties and 3NF.
Algorithm 7. Input: a relation R and a set of functional and multivalued dependencies.
  Output: a set of relations in 4NF.
  Properties/Purpose: nonadditive join decomposition.
  Remarks: no guarantee of dependency preservation.
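To make the summary concrete, the following sketch implements the attribute-closure computation of Algorithm 1 in Python. It is an illustration only; the representation of F as a list of (left-hand side, right-hand side) pairs of attribute sets is an assumption, not notation used in this chapter.

def closure(X, F):
    # X: a set of attributes; F: a list of (lhs, rhs) pairs, each side a set of attributes.
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            # If the left-hand side is already in the closure, add the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

# Example: with F = {A -> BC, B -> D}, the closure of {A} is {A, B, C, D}.
F = [({'A'}, {'B', 'C'}), ({'B'}, {'D'})]
print(closure({'A'}, F))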
5 Discussion of Multivalued Dependencies and 4NF
We now revisit multivalued dependencies (MVDs) in order to state the rules of inference
that apply to them.
5.1 Inference Rules for Functional
and Multivalued Dependencies
As with functional dependencies (FDs), inference rules for multivalued dependen-
cies (MVDs) have been developed. It is better, though, to develop a unified frame-
work that includes both FDs and MVDs so that both types of constraints can be
considered together. The following inference rules IR1 through IR8 form a sound
and complete set for inferring functional and multivalued dependencies from a
given set of dependencies. Assume that all attributes are included in a universal rela-
tion schema R = {A1, A2, …, An} and that X, Y, Z, and W are subsets of R.
IR1 (reflexive rule for FDs): If X ⊇ Y, then X → Y.
IR2 (augmentation rule for FDs): {X → Y} |= XZ → YZ.
IR3 (transitive rule for FDs): {X → Y, Y → Z} |= X → Z.
IR4 (complementation rule for MVDs): {X →→ Y} |= {X →→ (R – (X ∪ Y))}.
IR5 (augmentation rule for MVDs): If X →→ Y and W ⊇ Z, then WX →→ YZ.
IR6 (transitive rule for MVDs): {X →→ Y, Y →→ Z} |= X →→ (Z – Y).
IR7 (replication rule for FD to MVD): {X → Y} |= X →→ Y.
IR8 (coalescence rule for FDs and MVDs): If X →→ Y and there exists W with the
properties that (a) W ∩ Y is empty, (b) W → Z, and (c) Y ⊇ Z, then X → Z.
IR1 through IR3 are Armstrong’s inference rules for FDs alone. IR4 through IR6 are
inference rules pertaining to MVDs only. IR7 and IR8 relate FDs and MVDs. In par-
ticular, IR7 says that a functional dependency is a special case of a multivalued
dependency; that is, every FD is also an MVD because it satisfies the formal defini-
tion of an MVD. However, this equivalence has a catch: An FD X → Y is an MVD
X →→ Y with the additional implicit restriction that at most one value of Y is associ-
ated with each value of X.10 Given a set F of functional and multivalued dependen-
cies specified on R = {A1, A2, …, An}, we can use IR1 through IR8 to infer the
(complete) set of all dependencies (functional or multivalued) F+ that will hold in
every relation state r of R that satisfies F. We again call F+ the closure of F.
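As a concrete companion to these rules, the following sketch tests whether an MVD X →→ Y holds in a given relation state by checking the defining condition: for each X-value, the (Y, Z) combinations that appear must form the full cross product of the Y-values and Z-values seen with that X-value, where Z = R – X – Y. The dictionary-based tuple representation is an assumption made only for this illustration.

from itertools import product

def mvd_holds(r, X, Y, R):
    # r: a relation state as a list of dicts; X, Y: sets of attribute names; R: all attributes.
    Z = set(R) - set(X) - set(Y)
    proj = lambda t, A: tuple(sorted((a, t[a]) for a in A))
    groups = {}
    for t in r:
        groups.setdefault(proj(t, X), set()).add((proj(t, Y), proj(t, Z)))
    for pairs in groups.values():
        ys = {y for y, _ in pairs}
        zs = {z for _, z in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True

# The EMP relation state shown in Figure 4(a) below: Ename ->-> Pname (and Ename ->-> Dname) both hold.
EMP = [{'Ename': e, 'Pname': p, 'Dname': d}
       for e, ps, ds in [('Smith', 'XY', ['John', 'Anna']),
                         ('Brown', 'WXYZ', ['Jim', 'Joan', 'Bob'])]
       for p in ps for d in ds]
print(mvd_holds(EMP, {'Ename'}, {'Pname'}, {'Ename', 'Pname', 'Dname'}))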
5.2 Fourth Normal Form Revisited
The definition of fourth normal form (4NF) is:
Definition. A relation schema R is in 4NF with respect to a set of dependen-
cies F (that includes functional dependencies and multivalued dependencies)
if, for every nontrivial multivalued dependency X →→ Y in F+, X is a superkey
for R.
10That is, the set of values of Y determined by a value of X is restricted to being a singleton set with only
one value. Hence, in practice, we never view an FD as an MVD.
Figure 4
Decomposing a relation state of EMP that is not in 4NF. (a) EMP(Ename, Pname, Dname)
with additional tuples: the 16 tuples pairing Smith with every combination of his projects
{X, Y} and dependents {John, Anna}, and Brown with every combination of his projects
{W, X, Y, Z} and dependents {Jim, Joan, Bob}. (b) The two corresponding 4NF relations:
EMP_PROJECTS(Ename, Pname) with the 6 tuples (Smith, X), (Smith, Y), (Brown, W),
(Brown, X), (Brown, Y), (Brown, Z), and EMP_DEPENDENTS(Ename, Dname) with the
5 tuples (Smith, John), (Smith, Anna), (Brown, Jim), (Brown, Joan), (Brown, Bob).
To illustrate the importance of 4NF, Figure 4(a) shows the EMP relation with an
additional employee, ‘Brown’, who has three dependents (‘Jim’, ‘Joan’, and ‘Bob’) and
works on four different projects (‘W’, ‘X’, ‘Y’, and ‘Z’). There are 16 tuples in EMP in
Figure 4(a). If we decompose EMP into EMP_PROJECTS and EMP_DEPENDENTS,
as shown in Figure 4(b), we need to store a total of only 11 tuples in both relations.
Not only would the decomposition save on storage, but the update anomalies asso-
ciated with multivalued dependencies would also be avoided. For example, if
‘Brown’ starts working on a new additional project ‘P,’ we must insert three tuples in
EMP—one for each dependent. If we forget to insert any one of those, the relation
violates the MVD and becomes inconsistent in that it incorrectly implies a relation-
ship between project and dependent.
If the relation has nontrivial MVDs, then insert, delete, and update operations on
single tuples may cause additional tuples to be modified besides the one in question.
If the update is handled incorrectly, the meaning of the relation may change.
However, after normalization into 4NF, these update anomalies disappear. For
example, to add the information that ‘Brown’ will be assigned to project ‘P’, only a
single tuple need be inserted in the 4NF relation EMP_PROJECTS.
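The saving can be seen directly by projecting the relation state of Figure 4(a) onto the two 4NF schemas and counting tuples; the set-based sketch below (variable names are hypothetical) reproduces the 16-versus-11 comparison made above.

# EMP(Ename, Pname, Dname) of Figure 4(a) as a set of tuples.
emp = {(e, p, d)
       for e, ps, ds in [('Smith', 'XY', ['John', 'Anna']),
                         ('Brown', 'WXYZ', ['Jim', 'Joan', 'Bob'])]
       for p in ps for d in ds}
emp_projects = {(e, p) for e, p, _ in emp}      # 6 tuples
emp_dependents = {(e, d) for e, _, d in emp}    # 5 tuples
print(len(emp), len(emp_projects) + len(emp_dependents))   # prints 16 11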
The EMP relation in Figure 15(a) from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases” is not in 4NF because it
represents two independent 1:N relationships—one between employees and the
projects they work on and the other between employees and their dependents. We
sometimes have a relationship among three entities that depends on all three partic-
ipating entities, such as the SUPPLY relation shown in Figure 15(c) from the same
chapter. (Consider only the tuples in Figure 15(c) of that chapter above the dashed
line for now.) In this case a tuple represents a supplier supplying a specific part to a
particular project, so there are no nontrivial MVDs. Hence, the SUPPLY all-key rela-
tion is already in 4NF and should not be decomposed.
5.3 Nonadditive Join Decomposition into 4NF Relations
Whenever we decompose a relation schema R into R1 = (X ∪ Y) and R2 = (R – Y)
based on an MVD X →→ Y that holds in R, the decomposition has the nonadditive
join property. It can be shown that this is a necessary and sufficient condition for
decomposing a schema into two schemas that have the nonadditive join property, as
given by Property NJB′, which is a further generalization of Property NJB given earlier.
Property NJB dealt with FDs only, whereas NJB′ deals with both FDs and MVDs
(recall that an FD is also an MVD).
Property NJB′. The relation schemas R1 and R2 form a nonadditive join
decomposition of R with respect to a set F of functional and multivalued
dependencies if and only if
(R1 ∩ R2) →→ (R1 – R2)
or, by symmetry, if and only if
(R1 ∩ R2) →→ (R2 – R1).
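As an illustrative check of NJB′ on the decomposition of Figure 4 (using hypothetical Python set variables), the intersection of the two schemas multidetermines the attributes unique to either side, so the join is nonadditive:

R1 = {'Ename', 'Pname'}    # EMP_PROJECTS
R2 = {'Ename', 'Dname'}    # EMP_DEPENDENTS
print(R1 & R2, R1 - R2, R2 - R1)   # {'Ename'} {'Pname'} {'Dname'}
# Ename ->-> Pname (equivalently Ename ->-> Dname) holds in EMP,
# so the condition (R1 ∩ R2) ->-> (R1 - R2) of Property NJB' is satisfied.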
We can use a slight modification of Algorithm 5 to develop Algorithm 7, which cre-
ates a nonadditive join decomposition into relation schemas that are in 4NF (rather
than in BCNF). As with Algorithm 5, Algorithm 7 does not necessarily produce a
decomposition that preserves FDs.
Algorithm 7. Relational Decomposition into 4NF Relations with Nonadditive
Join Property
Input: A universal relation R and a set of functional and multivalued depend-
encies F.
1. Set D:= { R };
2. While there is a relation schema Q in D that is not in 4NF, do
{ choose a relation schema Q in D that is not in 4NF;
find a nontrivial MVD X →→ Y in Q that violates 4NF;
replace Q in D by two relation schemas (Q – Y) and (X ∪ Y);
};
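A minimal Python rendering of Algorithm 7 follows. It assumes a helper find_4nf_violation(Q, F), hypothetical and not defined in this chapter, that returns a nontrivial MVD X →→ Y violating 4NF in schema Q, or None if Q is already in 4NF.

def decompose_4nf(R, F, find_4nf_violation):
    # R: a set of attributes; F: a set of functional and multivalued dependencies.
    D = [set(R)]
    while True:
        for i, Q in enumerate(D):
            violation = find_4nf_violation(Q, F)   # returns (X, Y) as sets, or None
            if violation is not None:
                X, Y = violation
                # Step 2 of Algorithm 7: replace Q by (Q - Y) and (X ∪ Y).
                D[i:i + 1] = [Q - Y, X | Y]
                break
        else:
            return D   # every schema in D is in 4NF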
6 Other Dependencies and Normal Forms
Another type of dependency is the join dependency (JD), which arises when a rela-
tion is decomposable into a set of projected relations that can be joined back to
yield the original relation; the fifth normal form (5NF) is defined based on it. In the
present section we introduce some other types of dependencies and a related normal form.
6.1 Inclusion Dependencies
Inclusion dependencies were defined in order to formalize two types of interrela-
tional constraints:
■ The foreign key (or referential integrity) constraint cannot be specified as a
functional or multivalued dependency because it relates attributes across
relations.
■ The constraint between two relations that represent a class/subclass relation-
ship also has no formal definition in terms of the functional, multivalued,
and join dependencies.
Definition. An inclusion dependency R.X < S.Y between two sets of attrib-
utes—X of relation schema R, and Y of relation schema S—specifies the con-
straint that, at any specific time when r is a relation state of R and s a relation
state of S, we must have
πX(r(R)) ⊆ πY(s(S))
The ⊆ (subset) relationship does not necessarily have to be a proper subset.
Obviously, the sets of attributes on which the inclusion dependency is specified—X
of R and Y of S—must have the same number of attributes. In addition, the
domains for each pair of corresponding attributes should be compatible. For exam-
ple, if X = {A1, A2, ..., An} and Y = {B1, B2, ..., Bn}, one possible correspondence is to
have dom(Ai) compatible with dom(Bi) for 1 ≤ i ≤ n. In this case, we say that Ai
corresponds to Bi.
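The defining condition πX(r(R)) ⊆ πY(s(S)) can be tested directly on relation states. The sketch below does so for attribute lists X and Y given in corresponding order; the dict-based tuples and the sample data are assumptions made for illustration.

def inclusion_holds(r, X, s, Y):
    # r, s: relation states as lists of dicts; X, Y: attribute lists of equal length,
    # with X[i] corresponding to Y[i].
    proj_r = {tuple(t[a] for a in X) for t in r}
    proj_s = {tuple(t[a] for a in Y) for t in s}
    return proj_r <= proj_s

# Example: checking DEPARTMENT.Dmgr_ssn < EMPLOYEE.Ssn on made-up states.
employee = [{'Ssn': '123456789'}, {'Ssn': '333445555'}]
department = [{'Dname': 'Research', 'Dmgr_ssn': '333445555'}]
print(inclusion_holds(department, ['Dmgr_ssn'], employee, ['Ssn']))   # True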
For example, we can specify the following inclusion dependencies on the relational
schema in Figure 1 from the chapter “Basics of Functional Dependencies and
Normalization for Relational Databases”:
DEPARTMENT.Dmgr_ssn < EMPLOYEE.Ssn
WORKS_ON.Ssn < EMPLOYEE.Ssn
EMPLOYEE.Dnumber < DEPARTMENT.Dnumber
PROJECT.Dnum < DEPARTMENT.Dnumber
WORKS_ON.Pnumber < PROJECT.Pnumber
DEPT_LOCATIONS.Dnumber < DEPARTMENT.Dnumber
All the preceding inclusion dependencies represent referential integrity
constraints. We can also use inclusion dependencies to represent class/subclass
relationships. For example, in the relational schema of Figure A.1 (in the appendix of
figures at the end of this chapter), we can specify the following inclusion depend-
encies:
EMPLOYEE.Ssn < PERSON.Ssn
ALUMNUS.Ssn < PERSON.Ssn
STUDENT.Ssn < PERSON.Ssn
As with other types of dependencies, there are inclusion dependency inference rules
(IDIRs). The following are three examples:
IDIR1 (reflexivity): R.X < R.X.
IDIR2 (attribute correspondence): If R.X < S.Y, where X = {A1, A2, ..., An} and
Y = {B1, B2, ..., Bn} and Ai corresponds to Bi, then R.Ai < S.Bi for 1 ≤ i ≤ n.
IDIR3 (transitivity): If R.X < S.Y and S.Y < T.Z, then R.X < T.Z.
The preceding inference rules were shown to be sound and complete for inclusion
dependencies. So far, no normal forms have been developed based on inclusion
dependencies.
6.2 Template Dependencies
Template dependencies provide a technique for representing constraints in relations
that typically have no easy or formal definition. No matter how many types of
dependencies we develop, some peculiar constraint may come up based on the
semantics of attributes within relations that cannot be represented by any of them.
The idea behind template dependencies is to specify a template—or example—that
defines each constraint or dependency.
There are two types of templates: tuple-generating templates and constraint-
generating templates. A template consists of a number of hypothesis tuples that are
meant to show an example of the tuples that may appear in one or more relations.
The other part of the template is the template conclusion. For tuple-generating
templates, the conclusion is a set of tuples that must also exist in the relations if the
hypothesis tuples are there. For constraint-generating templates, the template con-
clusion is a condition that must hold on the hypothesis tuples. Using constraint-
generating templates, we are able to define semantic constraints—those that are
beyond the scope of the relational model in terms of its data definition language
and notation.
Figure 5 shows how we may define functional, multivalued, and inclusion depend-
encies by templates. Figure 6 shows how we may specify the constraint that an
employee’s salary cannot be higher than the salary of his or her direct supervisor on the
relation schema EMPLOYEE in Figure A.2.
Figure 5
Templates for some common types of dependencies. (a) Template for the functional
dependency X → Y, with R = {A, B, C, D}, X = {A, B}, and Y = {C, D}: the hypothesis
consists of two tuples that agree on X, and the conclusion is the condition c1 = c2 and
d1 = d2. (b) Template for the multivalued dependency X →→ Y, with R = {A, B, C, D},
X = {A, B}, and Y = {C}: the hypothesis consists of two tuples that agree on X, and the
conclusion is a set of additional tuples that must also exist in the relation. (c) Template
for the inclusion dependency R.X < S.Y, with R = {A, B, C, D}, S = {E, F, G}, X = {C, D},
and Y = {E, F}: the hypothesis is a tuple of R, and the conclusion is a corresponding
tuple that must exist in S.
Figure 6
Template for the constraint that an employee's salary must be less than the supervisor's
salary, on EMPLOYEE = {Name, Ssn, . . . , Salary, Supervisor_ssn}: the hypothesis consists
of an employee tuple and the tuple of that employee's supervisor, and the conclusion is
the condition that the employee's Salary value is less than the supervisor's Salary value.
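A constraint-generating template such as the one in Figure 6 can be enforced by a check over pairs of tuples. The sketch below, with hypothetical sample data, tests the salary constraint on an EMPLOYEE state keyed by Ssn.

def salary_constraint_ok(employees):
    # employees: a list of dicts containing at least Ssn, Salary, and Supervisor_ssn.
    by_ssn = {e['Ssn']: e for e in employees}
    for e in employees:
        sup = by_ssn.get(e['Supervisor_ssn'])
        # Conclusion of the template: the employee's salary must be below the supervisor's.
        if sup is not None and not e['Salary'] < sup['Salary']:
            return False
    return True

emps = [{'Ssn': '111', 'Salary': 40000, 'Supervisor_ssn': '222'},
        {'Ssn': '222', 'Salary': 55000, 'Supervisor_ssn': None}]
print(salary_constraint_ok(emps))   # True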
6.3 Functional Dependencies Based on Arithmetic Functions
and Procedures
Sometimes the attributes in a relation may be related via an arithmetic func-
tion or a more complicated functional relationship. As long as a unique value of Y is
associated with every X, we can still consider that the FD X → Y exists. For example,
in the relation
ORDER_LINE (Order#, Item#, Quantity, Unit_price, Extended_price,
Discounted_price)
each tuple represents an item from an order with a particular quantity, and the price
per unit for that item. In this relation, (Quantity, Unit_price ) → Extended_price by the
formula
Extended_price = Unit_price * Quantity.
Hence, there is a unique value for Extended_price for every pair (Quantity, Unit_price ),
and thus it conforms to the definition of functional dependency.
Moreover, there may be a procedure that takes into account the quantity discounts,
the type of item, and so on and computes a discounted price for the total quantity
ordered for that item. Therefore, we can say
(Item#, Quantity, Unit_price ) → Discounted_price, or
(Item#, Quantity, Extended_price) → Discounted_price.
To check the above FD, a more complex procedure COMPUTE_TOTAL_PRICE may
have to be called into play. Although the above kinds of FDs are technically present
in most relations, they are not given particular attention during normalization.
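As an illustration, such FDs amount to computing the dependent attribute from its determinants; the sketch below uses hypothetical values and a simplified stand-in for the discount procedure, not the actual COMPUTE_TOTAL_PRICE logic.

def extended_price(quantity, unit_price):
    # (Quantity, Unit_price) -> Extended_price
    return quantity * unit_price

def discounted_price(item_no, quantity, unit_price):
    # Stand-in for a more complex procedure; here, an assumed flat 10% discount
    # on orders of 100 units or more.
    total = extended_price(quantity, unit_price)
    return total * 0.9 if quantity >= 100 else total

print(extended_price(120, 2.50))            # 300.0
print(discounted_price('I-1', 120, 2.50))   # 270.0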
6.4 Domain-Key Normal Form
There is no hard-and-fast rule about defining normal forms only up to 5NF.
Historically, the process of normalization and the process of discovering undesir-
able dependencies were carried through 5NF, but it has been possible to define
stricter normal forms that take into account additional types of dependencies and
constraints. The idea behind domain-key normal form (DKNF) is to specify (theo-
retically, at least) the ultimate normal form that takes into account all possible types
of dependencies and constraints. A relation schema is said to be in DKNF if all con-
straints and dependencies that should hold on the valid relation states can be
enforced simply by enforcing the domain constraints and key constraints on the
relation. For a relation in DKNF, it becomes very straightforward to enforce all data-
base constraints by simply checking that each attribute value in a tuple is of the
appropriate domain and that every key constraint is enforced.
However, because of the difficulty of including complex constraints in a DKNF rela-
tion, its practical utility is limited, since it may be quite difficult to specify general
integrity constraints. For example, consider a relation CAR(Make, Vin#) (where Vin#
is the vehicle identification number) and another relation MANUFACTURE(Vin#,
Country) (where Country is the country of manufacture). A general constraint may be
of the following form: If the Make is either ‘Toyota’ or ‘Lexus,’ then the first character
of the Vin# is a ‘J’ if the country of manufacture is ‘Japan’; if the Make is ‘Honda’ or
‘Acura,’ the second character of the Vin# is a ‘J’ if the country of manufacture is ‘Japan.’
There is no simplified way to represent such constraints short of writing a proce-
dure (or general assertions) to test them. The procedure COMPUTE_TOTAL_PRICE
above is an example of such procedures needed to enforce an appropriate integrity
constraint.
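A general constraint of this kind would typically be coded as a procedure or assertion rather than captured by domains and keys. A hedged sketch of such a check, with made-up sample data, might look as follows.

def vin_constraint_ok(make, vin, country):
    # If Make is Toyota/Lexus and the country of manufacture is Japan, the first
    # character of Vin# must be 'J'; if Make is Honda/Acura and the country is
    # Japan, the second character must be 'J'.
    if country == 'Japan':
        if make in ('Toyota', 'Lexus'):
            return vin[0] == 'J'
        if make in ('Honda', 'Acura'):
            return vin[1] == 'J'
    return True

print(vin_constraint_ok('Toyota', 'JT123456789012345', 'Japan'))   # True
print(vin_constraint_ok('Honda', 'XJ123456789012345', 'Japan'))    # True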
7 Summary
In this chapter we presented a further set of topics related to dependencies, a discus-
sion of decomposition, and several algorithms related to them as well as to normal-
ization. In Section 1 we presented inference rules for functional dependencies
(FDs), the notion of closure of an attribute, closure of a set of functional dependen-
cies, equivalence among sets of functional dependencies, and algorithms for finding
the closure of an attribute (Algorithm 1) and the minimal cover of a set of FDs
(Algorithm 2). We then discussed two important properties of decompositions: the
nonadditive join property and the dependency-preserving property. An algorithm
to test for nonadditive decomposition (Algorithm 3), and a simpler test for check-
ing the losslessness of binary decompositions (Property NJB) were described. We
then discussed relational design by synthesis, based on a set of given functional
dependencies. The relational synthesis algorithms (such as Algorithms 4 and 6) cre-
ate 3NF relations from a universal relation schema based on a given set of functional
dependencies that has been specified by the database designer. The relational decom-
position algorithms (such as Algorithms 5 and 7) create BCNF (or 4NF) relations by
successive nonadditive decomposition of unnormalized relations into two compo-
nent relations at a time. We saw that it is possible to synthesize 3NF relation schemas
that meet both of the above properties; however, in the case of BCNF, it is possible
to aim only for the nonadditiveness of joins; dependency preservation cannot
necessarily be guaranteed. If the designer has to aim for one of these two, the nonaddi-
tive join condition is an absolute must. In Section 4 we showed how certain difficul-
ties arise in a collection of relations due to null values that may exist in relations in
spite of the relations being individually in 3NF or BCNF. Sometimes when decom-
position is improperly carried too far, certain “dangling tuples” may result that do
not participate in results of joins and hence may become invisible. We also showed
how it is possible to have alternative designs that meet a given desired normal form.
Then we revisited multivalued dependencies (MVDs) in Section 5, which arise from
an improper combination of two or more independent multivalued attributes in
the same relation, and that result in a combinatorial expansion of the tuples; MVDs are used to
define fourth normal form (4NF). We discussed inference rules applicable to MVDs
and discussed the importance of 4NF. Finally, in Section 6 we discussed inclusion
dependencies, which are used to specify referential integrity and class/subclass con-
straints, and template dependencies, which can be used to specify arbitrary types of
constraints. We pointed out the need for arithmetic functions or more complex
procedures to enforce certain functional dependency constraints. We concluded
with a brief discussion of the domain-key normal form (DKNF).
Review Questions
1. What is the role of Armstrong’s inference rules (inference rules IR1 through
IR3) in the development of the theory of relational design?
2. What is meant by the completeness and soundness of Armstrong’s inference
rules?
3. What is meant by the closure of a set of functional dependencies? Illustrate
with an example.
4. When are two sets of functional dependencies equivalent? How can we
determine their equivalence?
5. What is a minimal set of functional dependencies? Does every set of depen-
dencies have a minimal equivalent set? Is it always unique?
6. What is meant by the attribute preservation condition on a decomposition?
7. Why are normal forms alone insufficient as a condition for a good schema
design?
8. What is the dependency preservation property for a decomposition? Why is
it important?
9. Why can we not guarantee that BCNF relation schemas will be produced by
dependency-preserving decompositions of non-BCNF relation schemas?
Give a counterexample to illustrate this point.
10. What is the lossless (or nonadditive) join property of a decomposition? Why
is it important?
11. Between the properties of dependency preservation and losslessness, which
one must definitely be satisfied? Why?
12. Discuss the NULL value and dangling tuple problems.
13. Illustrate how the process of creating first normal form relations may lead to
multivalued dependencies. How should the first normalization be done
properly so that MVDs are avoided?
14. What types of constraints are inclusion dependencies meant to represent?
15. How do template dependencies differ from the other types of dependencies
we discussed?
16. Why is the domain-key normal form (DKNF) known as the ultimate normal
form?
Exercises
17. Show that the relation schemas produced by Algorithm 4 are in 3NF.
18. Show that, if the matrix S resulting from Algorithm 3 does not have a row
that is all a symbols, projecting S on the decomposition and joining it back
will always produce at least one spurious tuple.
19. Show that the relation schemas produced by Algorithm 5 are in BCNF.
20. Show that the relation schemas produced by Algorithm 6 are in 3NF.
21. Specify a template dependency for join dependencies.
22. Specify all the inclusion dependencies for the relational schema in Figure A.2.
23. Prove that a functional dependency satisfies the formal definition of multi-
valued dependency.
24. Consider the example of normalizing the LOTS relation in Sections 4 and 5
from the chapter “Basics of Functional Dependencies and Normalization for
Relational Databases.” Determine whether the decomposition of LOTS into
{LOTS1AX, LOTS1AY, LOTS1B, LOTS2} has the lossless join property, by
applying Algorithm 3 and also by using the test under Property NJB.
25. Show how the MVDs Ename →→ Pname and Ename →→ Dname in Figure
15(a) from the chapter “Basics of Functional Dependencies and
Normalization for Relational Databases” may arise during normalization into
1NF of a relation, where the attributes Pname and Dname are multivalued.
26. Apply Algorithm 2(a) to the relation in Exercise 24 from the chapter “Basics
of Functional Dependencies and Normalization for Relational Databases” to
determine a key for R. Create a minimal set of dependencies G that is equiv-
alent to F, and apply the synthesis algorithm (Algorithm 6) to decompose R
into 3NF relations.
27. Repeat Exercise 26 for the functional dependencies in Exercise 25 from the
chapter “Basics of Functional Dependencies and Normalization for
Relational Databases.”
28. Apply the decomposition algorithm (Algorithm 5) to the relation R and the
set of dependencies F in Exercise 24 from the chapter “Basics of Functional
Dependencies and Normalization for Relational Databases.” Repeat for the
dependencies G in Exercise 25 from the same chapter.
29. Apply Algorithm 2(a) to the relations in Exercises 27 and 28 from the chap-
ter “Basics of Functional Dependencies and Normalization for Relational
Databases” to determine a key for R. Apply the synthesis algorithm
(Algorithm 6) to decompose R into 3NF relations and the decomposition
algorithm (Algorithm 5) to decompose R into BCNF relations.
30. Write programs that implement Algorithms 5 and 6.
31. Consider the following decompositions for the relation schema R of Exercise
24 from the chapter “Basics of Functional Dependencies and Normalization
for Relational Databases.” Determine whether each decomposition has (1)
the dependency preservation property, and (2) the lossless join property,
with respect to F. Also determine which normal form each relation in the
decomposition is in.
a. D1 = {R1, R2, R3, R4, R5} ; R1 = {A, B, C} , R2 = {A, D, E} , R3 = {B, F} , R4 =
{F, G, H} , R5 = {D, I, J}
b. D2 = {R1, R2, R3} ; R1 = {A, B, C, D, E} , R2 = {B, F, G, H} , R3 = {D, I, J}
c. D3 = {R1, R2, R3, R4, R5} ; R1 = {A, B, C, D} , R2 = {D, E} , R3 = {B, F} , R4 =
{F, G, H} , R5 = {D, I, J}
32. Consider the relation REFRIG(Model#, Year, Price, Manuf_plant, Color), which
is abbreviated as REFRIG(M, Y, P, MP, C), and the following set F of func-
tional dependencies: F = {M → MP, {M, Y} → P, MP → C}
a. Evaluate each of the following as a candidate key for REFRIG, giving rea-
sons why it can or cannot be a key: {M}, {M, Y}, {M, C}.
b. Based on the above key determination, state whether the relation REFRIG
is in 3NF and in BCNF, giving proper reasons.
c. Consider the decomposition of REFRIG into D = {R1(M, Y, P), R2(M, MP,
C)}. Is this decomposition lossless? Show why. (You may consult the test
under Property NJB in Section 2.4.)
Laboratory Exercises
Note: These exercises use the DBD (Data Base Designer) system that is described in
the laboratory manual. The relational schema R and set of functional dependencies
F need to be coded as lists. As an example, R and F for problem 24 from the chapter
“Basics of Functional Dependencies and Normalization for Relational Databases”
are coded as:
R = [a, b, c, d, e, f, g, h, i, j]
F = [[[a, b],[c]],
[[a],[d, e]],
[[b],[f]],
[[f],[g, h]],
[[d],[i, j]]]
Since DBD is implemented in Prolog, use of uppercase terms is reserved for vari-
ables in the language and therefore lowercase constants are used to code the attrib-
utes. For further details on using the DBD system, please refer to the laboratory
manual.
33. Using the DBD system, verify your answers to the following exercises:
a. 24
b. 26
c. 27
d. 28
e. 29
f. 31 (a) and (b)
g. 32 (a) and (c)
Selected Bibliography
The books by Maier (1983) and Atzeni and De Antonellis (1993) include a compre-
hensive discussion of relational dependency theory. The decomposition algorithm
(Algorithm 5) is due to Bernstein (1976). Algorithm 6 is based on the normalization
algorithm presented in Biskup et al. (1979). Tsou and Fischer (1982) give a polyno-
mial-time algorithm for BCNF decomposition.
The theory of dependency preservation and lossless joins is given in Ullman (1988),
where proofs of some of the algorithms discussed here appear. The lossless join
property is analyzed in Aho et al. (1979). Algorithms to determine the keys of a rela-
tion from functional dependencies are given in Osborn (1977); testing for BCNF is
discussed in Osborn (1979). Testing for 3NF is discussed in Tsou and Fischer
(1982). Algorithms for designing BCNF relations are given in Wang (1990) and
Hernandez and Chan (1991).
Multivalued dependencies and fourth normal form are defined in Zaniolo (1976)
and Nicolas (1978). Many of the advanced normal forms are due to Fagin: the
fourth normal form in Fagin (1977), PJNF in Fagin (1979), and DKNF in Fagin
(1981). The set of sound and complete rules for functional and multivalued
dependencies was given by Beeri et al. (1977). Join dependencies are discussed by
Rissanen (1977) and Aho et al. (1979). Inference rules for join dependencies are
given by Sciore (1982). Inclusion dependencies are discussed by Casanova et al.
(1981) and analyzed further in Cosmadakis et al. (1990). Their use in optimizing
relational schemas is discussed in Casanova et al. (1989). Template dependencies are
discussed by Sadri and Ullman (1982). Other dependencies are discussed in Nicolas
(1978), Furtado (1978), and Mendelzon and Maier (1979). Abiteboul et al. (1995)
provides a theoretical treatment of many of the ideas presented in this chapter.
Figure A.1
Mapping an EER specialization lattice using multiple options.
PERSON(Ssn, Name, Birth_date, Sex, Address)
EMPLOYEE(Ssn, Salary, Employee_type, Position, Rank, Percent_time, Ra_flag, Ta_flag, Project, Course)
ALUMNUS(Ssn)
ALUMNUS_DEGREES(Ssn, Year, Degree, Major)
STUDENT(Ssn, Major_dept, Grad_flag, Undergrad_flag, Degree_program, Class, Student_assist_flag)
Figure A.2
Schema diagram for the COMPANY relational database schema.
EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS(Dnumber, Dlocation)
PROJECT(Pname, Pnumber, Plocation, Dnum)
WORKS_ON(Essn, Pno, Hours)
DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)
Disk Storage, Basic File
Structures, and Hashing
Databases are stored physically as files of records, which are typically stored on magnetic disks. This
chapter deals with the organization of databases in storage and the techniques for
accessing them efficiently using various algorithms, some of which require auxiliary
data structures called indexes. These structures are often referred to as physical
database file structures, and are at the physical level of the three-schema architecture.
We start in Section 1 by introducing the concepts of computer storage hierarchies
and how they are used in database systems. Section 2 is devoted to a description of
magnetic disk storage devices and their characteristics, and we also briefly describe
magnetic tape storage devices. After discussing different storage technologies, we
turn our attention to the methods for physically organizing data on disks. Section 3
covers the technique of double buffering, which is used to speed retrieval of multi-
ple disk blocks. In Section 4 we discuss various ways of formatting and storing file
records on disk. Section 5 discusses the various types of operations that are typically
applied to file records. We present three primary methods for organizing file records
on disk: unordered records, in Section 6; ordered records, in Section 7; and hashed
records, in Section 8.
Section 9 briefly introduces files of mixed records and other primary methods for
organizing records, such as B-trees. These are particularly relevant for storage of
object-oriented databases. Section 10 describes RAID (Redundant Arrays of
Inexpensive (or Independent) Disks)—a data storage system architecture that is
commonly used in large organizations for better reliability and performance.
Finally, in Section 11 we describe three developments in the storage systems area:
storage area networks (SAN), network-attached storage (NAS), and iSCSI (Internet
SCSI—Small Computer System Interface), the latest technology, which makes stor-
age area networks more affordable without the use of the Fibre Channel infrastruc-
ture and hence is gaining wide acceptance in industry. Section 12 summarizes
the chapter.
This chapter may be browsed through or even omitted by readers who have already
studied file organizations and indexing in a separate course. The material covered
here, in particular Sections 1 through 8, is necessary for understanding query pro-
cessing and optimization, and database tuning for improving performance of
queries.
1 Introduction
The collection of data that makes up a computerized database must be stored phys-
ically on some computer storage medium. The DBMS software can then retrieve,
update, and process this data as needed. Computer storage media form a storage
hierarchy that includes two main categories:
■ Primary storage. This category includes storage media that can be operated
on directly by the computer’s central processing unit (CPU), such as the com-
puter’s main memory and smaller but faster cache memories. Primary stor-
age usually provides fast access to data but is of limited storage capacity.
Although main memory capacities have been growing rapidly in recent
years, they are still more expensive and have less storage capacity than sec-
ondary and tertiary storage devices.
■ Secondary and tertiary storage. This category includes magnetic disks,
optical disks (CD-ROMs, DVDs, and other similar storage media), and
tapes. Hard-disk drives are classified as secondary storage, whereas remov-
able media such as optical disks and tapes are considered tertiary storage.
These devices usually have a larger capacity, cost less, and provide slower
access to data than do primary storage devices. Data in secondary or tertiary
storage cannot be processed directly by the CPU; first it must be copied into
primary storage and then processed by the CPU.
We first give an overview of the various storage devices used for primary and sec-
ondary storage in Section 1.1 and then discuss how databases are typically handled
in the storage hierarchy in Section 1.2.
1.1 Memory Hierarchies and Storage Devices
In a modern computer system, data resides and is transported throughout a hierar-
chy of storage media. The highest-speed memory is the most expensive and is there-
fore available with the least capacity. The lowest-speed memory is offline tape
storage, which offers essentially unlimited storage capacity.
At the primary storage level, the memory hierarchy includes at the most expensive
end, cache memory, which is a static RAM (Random Access Memory). Cache mem-
ory is typically used by the CPU to speed up execution of program instructions
using techniques such as prefetching and pipelining. The next level of primary stor-
age is DRAM (Dynamic RAM), which provides the main work area for the CPU for
keeping program instructions and data. It is popularly called main memory. The
advantage of DRAM is its low cost, which continues to decrease; the drawback is its
volatility1 and lower speed compared with static RAM. At the secondary and tertiary
storage level, the hierarchy includes magnetic disks, as well as mass storage in the
form of CD-ROM (Compact Disk–Read-Only Memory) and DVD (Digital Video
Disk or Digital Versatile Disk) devices, and finally tapes at the least expensive end of
the hierarchy. The storage capacity is measured in kilobytes (Kbyte or 1000 bytes),
megabytes (MB or 1 million bytes), gigabytes (GB or 1 billion bytes), and even ter-
abytes (1000 GB). The word petabyte (1000 terabytes or 10^15 bytes) is now
becoming relevant in the context of very large repositories of data in physics,
astronomy, earth sciences, and other scientific applications.
Programs reside and execute in DRAM. Generally, large permanent databases reside
on secondary storage, (magnetic disks), and portions of the database are read into
and written from buffers in main memory as needed. Nowadays, personal comput-
ers and workstations have large main memories of hundreds of megabytes of RAM
and DRAM, so it is becoming possible to load a large part of the database into main
memory. Eight to 16 GB of main memory on a single server is becoming common-
place. In some cases, entire databases can be kept in main memory (with a backup
copy on magnetic disk), leading to main memory databases; these are particularly
useful in real-time applications that require extremely fast response times. An
example is telephone switching applications, which store databases that contain
routing and line information in main memory.
Between DRAM and magnetic disk storage, another form of memory, flash mem-
ory, is becoming common, particularly because it is nonvolatile. Flash memories are
high-density, high-performance memories using EEPROM (Electrically Erasable
Programmable Read-Only Memory) technology. The advantage of flash memory is
the fast access speed; the disadvantage is that an entire block must be erased and
written over simultaneously. Flash memory cards are appearing as the data storage
medium in appliances with capacities ranging from a few megabytes to a few giga-
bytes. These are appearing in cameras, MP3 players, cell phones, PDAs, and so on.
USB (Universal Serial Bus) flash drives have become the most portable medium for
carrying data between personal computers; they have a flash memory storage device
integrated with a USB interface.
CD-ROM (Compact Disk – Read Only Memory) disks store data optically and are
read by a laser. CD-ROMs contain prerecorded data that cannot be overwritten.
WORM (Write-Once-Read-Many) disks are a form of optical storage used for
1Volatile memory typically loses its contents in case of a power outage, whereas nonvolatile memory
does not.
archiving data; they allow data to be written once and read any number of times
without the possibility of erasing. They hold about half a gigabyte of data per disk
and last much longer than magnetic disks.2 Optical jukebox memories use an array
of CD-ROM platters, which are loaded onto drives on demand. Although optical
jukeboxes have capacities in the hundreds of gigabytes, their retrieval times are in
the hundreds of milliseconds, quite a bit slower than magnetic disks. This type of
storage is continuing to decline because of the rapid decrease in cost and increase in
capacities of magnetic disks. The DVD is another standard for optical disks allowing
4.5 to 15 GB of storage per disk. Most personal computer disk drives now read CD-
ROM and DVD disks. Typically, drives are CD-R (Compact Disk Recordable) that
can create CD-ROMs and audio CDs (Compact Disks), as well as record on DVDs.
Finally, magnetic tapes are used for archiving and backup storage of data. Tape
jukeboxes—which contain a bank of tapes that are catalogued and can be automat-
ically loaded onto tape drives—are becoming popular as tertiary storage to hold
terabytes of data. For example, NASA’s EOS (Earth Observation Satellite) system
stores archived databases in this fashion.
Many large organizations are already finding it normal to have terabyte-sized data-
bases. The term very large database can no longer be precisely defined because disk
storage capacities are on the rise and costs are declining. Very soon the term may be
reserved for databases containing tens of terabytes.
1.2 Storage of Databases
Databases typically store large amounts of data that must persist over long periods
of time; hence, this data is often referred to as persistent data. Parts of this data are
accessed and processed repeatedly during this period. This contrasts with the notion
of transient data, which persists for only a limited time during program execution.
Most databases are stored permanently (or persistently) on magnetic disk secondary
storage, for the following reasons:
■ Generally, databases are too large to fit entirely in main memory.
■ The circumstances that cause permanent loss of stored data arise less fre-
quently for disk secondary storage than for primary storage. Hence, we refer
to disk—and other secondary storage devices—as nonvolatile storage,
whereas main memory is often called volatile storage.
■ The cost of storage per unit of data is an order of magnitude less for disk sec-
ondary storage than for primary storage.
Some of the newer technologies—such as optical disks, DVDs, and tape juke-
boxes—are likely to provide viable alternatives to the use of magnetic disks. In the
future, databases may therefore reside at different levels of the memory hierarchy
from those described in Section 1.1. However, it is anticipated that magnetic disks
2Their rotational speeds are lower (around 400 rpm), giving higher latency delays and low transfer rates
(around 100 to 200 KB/second).
will continue to be the primary medium of choice for large databases for years to
come. Hence, it is important to study and understand the properties and character-
istics of magnetic disks and the way data files can be organized on disk in order to
design effective databases with acceptable performance.
Magnetic tapes are frequently used as a storage medium for backing up databases
because storage on tape costs even less than storage on disk. However, access to data
on tape is quite slow. Data stored on tapes is offline; that is, some intervention by an
operator—or an automatic loading device—to load a tape is needed before the data
becomes available. In contrast, disks are online devices that can be accessed directly
at any time.
The techniques used to store large amounts of structured data on disk are impor-
tant for database designers, the DBA, and implementers of a DBMS. Database
designers and the DBA must know the advantages and disadvantages of each stor-
age technique when they design, implement, and operate a database on a specific
DBMS. Usually, the DBMS has several options available for organizing the data. The
process of physical database design involves choosing the particular data organiza-
tion techniques that best suit the given application requirements from among the
options. DBMS system implementers must study data organization techniques so
that they can implement them efficiently and thus provide the DBA and users of the
DBMS with sufficient options.
Typical database applications need only a small portion of the database at a time for
processing. Whenever a certain portion of the data is needed, it must be located on
disk, copied to main memory for processing, and then rewritten to the disk if the
data is changed. The data stored on disk is organized as files of records. Each record
is a collection of data values that can be interpreted as facts about entities, their
attributes, and their relationships. Records should be stored on disk in a manner
that makes it possible to locate them efficiently when they are needed.
There are several primary file organizations, which determine how the file records
are physically placed on the disk, and hence how the records can be accessed. A heap file
(or unordered file) places the records on disk in no particular order by appending
new records at the end of the file, whereas a sorted file (or sequential file) keeps the
records ordered by the value of a particular field (called the sort key). A hashed file
uses a hash function applied to a particular field (called the hash key) to determine
a record’s placement on disk. Other primary file organizations, such as B-trees, use
tree structures. We discuss primary file organizations in Sections 6 through 9. A
secondary organization or auxiliary access structure allows efficient access to file
records based on fields other than those that have been used for the primary file
organization. Most of these exist as indexes.
2 Secondary Storage Devices
In this section we describe some characteristics of magnetic disk and magnetic tape
storage devices. Readers who have already studied these devices may simply browse
through this section.
Figure 1
(a) A single-sided disk with read/write hardware. (b) A disk pack with read/write
hardware. (The diagrams label the track, spindle, read/write head, arm, actuator and
its movement, disk rotation, and the imaginary cylinder of tracks.)
2.1 Hardware Description of Disk Devices
Magnetic disks are used for storing large amounts of data. The most basic unit of
data on the disk is a single bit of information. By magnetizing an area on disk in cer-
tain ways, one can make it represent a bit value of either 0 (zero) or 1 (one). To code
information, bits are grouped into bytes (or characters). Byte sizes are typically 4 to
8 bits, depending on the computer and the device. We assume that one character is
stored in a single byte, and we use the terms byte and character interchangeably. The
capacity of a disk is the number of bytes it can store, which is usually very large.
Small floppy disks used with microcomputers typically hold from 400 KB to 1.5
MB; they are rapidly going out of circulation. Hard disks for personal computers
typically hold from several hundred MB up to tens of GB; and large disk packs used
with servers and mainframes have capacities of hundreds of GB. Disk capacities
continue to grow as technology improves.
Whatever their capacity, all disks are made of magnetic material shaped as a thin
circular disk, as shown in Figure 1(a), and protected by a plastic or acrylic cover. A
Figure 2
Different sector organizations on disk. (a) Sectors subtending a fixed angle.
(b) Sectors maintaining a uniform recording density.
disk is single-sided if it stores information on one of its surfaces only and double-
sided if both surfaces are used. To increase storage capacity, disks are assembled into
a disk pack, as shown in Figure 1(b), which may include many disks and therefore
many surfaces. Information is stored on a disk surface in concentric circles of small
width,3 each having a distinct diameter. Each circle is called a track. In disk packs,
tracks with the same diameter on the various surfaces are called a cylinder because
of the shape they would form if connected in space. The concept of a cylinder is
important because data stored on one cylinder can be retrieved much faster than if
it were distributed among different cylinders.
The number of tracks on a disk ranges from a few hundred to a few thousand, and
the capacity of each track typically ranges from tens of Kbytes to 150 Kbytes.
Because a track usually contains a large amount of information, it is divided into
smaller blocks or sectors. The division of a track into sectors is hard-coded on the
disk surface and cannot be changed. One type of sector organization, as shown in
Figure 2(a), calls a portion of a track that subtends a fixed angle at the center a sec-
tor. Several other sector organizations are possible, one of which is to have the sec-
tors subtend smaller angles at the center as one moves away, thus maintaining a
uniform density of recording, as shown in Figure 2(b). A technique called ZBR
(Zone Bit Recording) allows a range of cylinders to have the same number of sectors
per arc. For example, cylinders 0–99 may have one sector per track, 100–199 may
have two per track, and so on. Not all disks have their tracks divided into sectors.
The division of a track into equal-sized disk blocks (or pages) is set by the operat-
ing system during disk formatting (or initialization). Block size is fixed during ini-
tialization and cannot be changed dynamically. Typical disk block sizes range from
512 to 8192 bytes. A disk with hard-coded sectors often has the sectors subdivided
into blocks during initialization. Blocks are separated by fixed-size interblock gaps,
which include specially coded control information written during disk initializa-
tion. This information is used to determine which block on the track follows each
3In some disks, the circles are now connected into a kind of continuous spiral.
Table 1 Specifications of Typical High-End Cheetah Disks from Seagate
Description Cheetah 15K.6 Cheetah NS 10K
Model Number ST3450856SS/FC ST3400755FC
Height 25.4 mm 26.11 mm
Width 101.6 mm 101.85 mm
Length 146.05 mm 147 mm
Weight 0.709 kg 0.771 kg
Capacity
Formatted Capacity 450 Gbytes 400 Gbytes
Configuration
Number of disks (physical) 4 4
Number of heads (physical) 8 8
Performance
Transfer Rates
Internal Transfer Rate (min) 1051 Mb/sec
Internal Transfer Rate (max) 2225 Mb/sec 1211 Mb/sec
Mean Time Between Failure (MTBF) 1.4 M hours
Seek Times
Avg. Seek Time (Read) 3.4 ms (typical) 3.9 ms (typical)
Avg. Seek Time (Write) 3.9 ms (typical) 4.2 ms (typical)
Track-to-track, Seek, Read 0.2 ms (typical) 0.35 ms (typical)
Track-to-track, Seek, Write 0.4 ms (typical) 0.35 ms (typical)
Average Latency 2 ms 2.98 msec
Courtesy Seagate Technology
interblock gap. Table 1 illustrates the specifications of typical disks used on large
servers in industry. The 10K and 15K prefixes on disk names refer to the rotational
speeds in rpm (revolutions per minute).
There is continuous improvement in the storage capacity and transfer rates associ-
ated with disks; they are also progressively getting cheaper—currently costing only a
fraction of a dollar per megabyte of disk storage. Costs are going down so rapidly
that costs as low as 0.025 cent/MB—which translates to $0.25/GB and $250/TB—are
already here.
A disk is a random access addressable device. Transfer of data between main memory
and disk takes place in units of disk blocks. The hardware address of a block—a
combination of a cylinder number, track number (surface number within the cylin-
der on which the track is located), and block number (within the track)—is supplied
to the disk I/O (input/output) hardware. In many modern disk drives, a single num-
ber called LBA (Logical Block Address), which is a number between 0 and n (assum-
ing the total capacity of the disk is n + 1 blocks), is mapped automatically to the
right block by the disk drive controller. The address of a buffer—a contiguous
reserved area in main storage that holds one disk block—is also provided. For a
read command, the disk block is copied into the buffer; whereas for a write com-
mand, the contents of the buffer are copied into the disk block. Sometimes several
contiguous blocks, called a cluster, may be transferred as a unit. In this case, the
buffer size is adjusted to match the number of bytes in the cluster.
The actual hardware mechanism that reads or writes a block is the disk read/write
head, which is part of a system called a disk drive. A disk or disk pack is mounted in
the disk drive, which includes a motor that rotates the disks. A read/write head
includes an electronic component attached to a mechanical arm. Disk packs with
multiple surfaces are controlled by several read/write heads—one for each surface,
as shown in Figure 1(b). All arms are connected to an actuator attached to another
electrical motor, which moves the read/write heads in unison and positions them
precisely over the cylinder of tracks specified in a block address.
Disk drives for hard disks rotate the disk pack continuously at a constant speed
(typically ranging between 5,400 and 15,000 rpm). Once the read/write head is
positioned on the right track and the block specified in the block address moves
under the read/write head, the electronic component of the read/write head is acti-
vated to transfer the data. Some disk units have fixed read/write heads, with as many
heads as there are tracks. These are called fixed-head disks, whereas disk units with
an actuator are called movable-head disks. For fixed-head disks, a track or cylinder
is selected by electronically switching to the appropriate read/write head rather than
by actual mechanical movement; consequently, it is much faster. However, the cost
of the additional read/write heads is quite high, so fixed-head disks are not com-
monly used.
A disk controller, typically embedded in the disk drive, controls the disk drive and
interfaces it to the computer system. One of the standard interfaces used today for
disk drives on PCs and workstations is called SCSI (Small Computer System
Interface). The controller accepts high-level I/O commands and takes appropriate
action to position the arm and causes the read/write action to take place. To transfer
a disk block, given its address, the disk controller must first mechanically position
the read/write head on the correct track. The time required to do this is called the
seek time. Typical seek times are 5 to 10 msec on desktops and 3 to 8 msec on
servers. Following that, there is another delay—called the rotational delay or
latency—while the beginning of the desired block rotates into position under the
read/write head. It depends on the rpm of the disk. For example, at 15,000 rpm, the
time per rotation is 4 msec and the average rotational delay is the time per half rev-
olution, or 2 msec. At 10,000 rpm the average rotational delay increases to 3 msec.
Finally, some additional time is needed to transfer the data; this is called the block
transfer time. Hence, the total time needed to locate and transfer an arbitrary
block, given its address, is the sum of the seek time, rotational delay, and block
transfer time. The seek time and rotational delay are usually much larger than the
block transfer time. To make the transfer of multiple blocks more efficient, it is
common to transfer several consecutive blocks on the same track or cylinder. This
eliminates the seek time and rotational delay for all but the first block and can result
in a substantial saving of time when numerous contiguous blocks are transferred.
Usually, the disk manufacturer provides a bulk transfer rate for calculating the time
required to transfer consecutive blocks.
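As a rough worked example using the 15,000 rpm figures quoted above (with an assumed 3.4 ms average seek time and an assumed 0.5 ms block transfer time, chosen only for illustration), the average random-block access time can be estimated as follows.

def avg_access_time_ms(seek_ms, rpm, transfer_ms):
    # Average rotational delay is the time for half of one revolution.
    rotation_ms = 60_000 / rpm      # one full revolution, in milliseconds
    latency_ms = rotation_ms / 2
    return seek_ms + latency_ms + transfer_ms

# 15,000 rpm: one rotation = 4 ms, so the average rotational delay is 2 ms.
print(avg_access_time_ms(3.4, 15_000, 0.5))   # 5.9 ms per random block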
The time needed to locate and transfer a disk block is in the order of milliseconds,
usually ranging from 9 to 60 msec. For contiguous blocks, locating the first block
takes from 9 to 60 msec, but transferring subsequent blocks may take only 0.4 to 2
msec each. Many search techniques take advantage of consecutive retrieval of blocks
when searching for data on disk. In any case, a transfer time in the order of millisec-
onds is considered quite high compared with the time required to process data in
main memory by current CPUs. Hence, locating data on disk is a major bottleneck in
database applications. The file structures we discuss here attempt to minimize the
number of block transfers needed to locate and transfer the required data from disk
to main memory. Placing “related information” on contiguous blocks is the basic
goal of any storage organization on disk.
2.2 Magnetic Tape Storage Devices
Disks are random access secondary storage devices because an arbitrary disk block
may be accessed at random once we specify its address. Magnetic tapes are sequen-
tial access devices; to access the nth block on tape, first we must scan the preceding
n – 1 blocks. Data is stored on reels of high-capacity magnetic tape, somewhat sim-
ilar to audiotapes or videotapes. A tape drive is required to read the data from or
write the data to a tape reel. Usually, each group of bits that forms a byte is stored
across the tape, and the bytes themselves are stored consecutively on the tape.
A read/write head is used to read or write data on tape. Data records on tape are also
stored in blocks—although the blocks may be substantially larger than those for
disks, and interblock gaps are also quite large. With typical tape densities of 1600 to
6250 bytes per inch, a typical interblock gap4 of 0.6 inch corresponds to 960 to 3750
bytes of wasted storage space. It is customary to group many records together in one
block for better space utilization.
The main characteristic of a tape is its requirement that we access the data blocks in
sequential order. To get to a block in the middle of a reel of tape, the tape is
mounted and then scanned until the required block gets under the read/write head.
For this reason, tape access can be slow and tapes are not used to store online data,
except for some specialized applications. However, tapes serve a very important
function—backing up the database. One reason for backup is to keep copies of disk
files in case the data is lost due to a disk crash, which can happen if the disk
read/write head touches the disk surface because of mechanical malfunction. For
this reason, disk files are copied periodically to tape. For many online critical appli-
cations, such as airline reservation systems, to avoid any downtime, mirrored sys-
tems are used to keep three sets of identical disks—two in online operation and one
4Called interrecord gaps in tape terminology.
as backup. Here, offline disks become a backup device. The three are rotated so that
they can be switched in case there is a failure on one of the live disk drives. Tapes can
also be used to store excessively large database files. Database files that are seldom
used or are outdated but required for historical record keeping can be archived on
tape. Originally, half-inch reel tape drives were used for data storage employing the
so-called 9 track tapes. Later, smaller 8-mm magnetic tapes (similar to those used in
camcorders) that can store up to 50 GB, as well as 4-mm helical scan data cartridges
and writable CDs and DVDs, became popular media for backing up data files from
PCs and workstations. They are also used for storing images and system libraries.
Backing up enterprise databases so that no transaction information is lost is a major
undertaking. Currently, tape libraries with slots for several hundred cartridges are
used with Digital and Superdigital Linear Tapes (DLTs and SDLTs) having capacities
in hundreds of gigabytes that record data on linear tracks. Robotic arms are used to
write on multiple cartridges in parallel using multiple tape drives with automatic
labeling software to identify the backup cartridges. An example of a giant library is
the SL8500 model of Sun Storage Technology that can store up to 70 petabytes
(petabyte = 1000 TB) of data using up to 448 drives with a maximum throughput
rate of 193.2 TB/hour. We defer the discussion of disk storage technology called
RAID, and of storage area networks, network-attached storage, and iSCSI storage
systems to the end of the chapter.
3 Buffering of Blocks
When several blocks need to be transferred from disk to main memory and all the
block addresses are known, several buffers can be reserved in main memory to
speed up the transfer. While one buffer is being read or written, the CPU can
process data in the other buffer because an independent disk I/O processor (con-
troller) exists that, once started, can proceed to transfer a data block between mem-
ory and disk independent of and in parallel to CPU processing.
Figure 3 illustrates how two processes can proceed in parallel. Processes A and B are
running concurrently in an interleaved fashion, whereas processes C and D are
running concurrently in a parallel fashion. When a single CPU controls multiple
processes, parallel execution is not possible. However, the processes can still run
concurrently in an interleaved way. Buffering is most useful when processes can run
concurrently in a parallel fashion, either because a separate disk I/O processor is
available or because multiple CPU processors exist.
Figure 4 illustrates how reading and processing can proceed in parallel when the
time required to process a disk block in memory is less than the time required to
read the next block and fill a buffer. The CPU can start processing a block once its
transfer to main memory is completed; at the same time, the disk I/O processor can
be reading and transferring the next block into a different buffer. This technique is
called double buffering and can also be used to read a continuous stream of blocks
from disk to memory. Double buffering permits continuous reading or writing of
data on consecutive disk blocks, which eliminates the seek time and rotational delay
Figure 3
Interleaved concurrency versus parallel execution: operations A and B run concurrently in an
interleaved fashion on one processor, while operations C and D run concurrently in parallel on
separate processors.
Figure 4
Use of two buffers, A and B, for reading from disk: while the CPU processes the block in one
buffer, the I/O processor fills the other buffer with the next disk block.
for all but the first block transfer. Moreover, data is kept ready for processing, thus
reducing the waiting time in the programs.
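To make the technique concrete, the following C sketch (not taken from the text; the block size, block count, and the simulated read_block and process_block routines are assumptions) uses a separate I/O thread and two buffers synchronized by semaphores, so that the next block is being filled while the current block is being processed:

/* Sketch of double buffering with a separate I/O thread (assumed block size
   and dummy read/process routines; real code would read an actual file). */
#include <pthread.h>
#include <semaphore.h>
#include <string.h>
#include <stdio.h>

#define BLOCK_SIZE 512
#define NUM_BLOCKS 8
#define NUM_BUFFERS 2

static char buffers[NUM_BUFFERS][BLOCK_SIZE];
static sem_t empty[NUM_BUFFERS];   /* buffer may be (re)filled by the I/O thread */
static sem_t full[NUM_BUFFERS];    /* buffer holds a block ready for processing  */

/* Simulated disk read: fills the buffer with the block number. */
static void read_block(int block_no, char *buf) {
    memset(buf, 0, BLOCK_SIZE);
    snprintf(buf, BLOCK_SIZE, "block %d", block_no);
}

/* Simulated CPU processing of one block. */
static void process_block(const char *buf) {
    printf("processing %s\n", buf);
}

/* I/O thread: fills buffers 0, 1, 0, 1, ... as they become free. */
static void *io_thread(void *arg) {
    (void)arg;
    for (int b = 0; b < NUM_BLOCKS; b++) {
        int i = b % NUM_BUFFERS;
        sem_wait(&empty[i]);          /* wait until the CPU has released buffer i */
        read_block(b, buffers[i]);
        sem_post(&full[i]);           /* hand the filled buffer to the CPU */
    }
    return NULL;
}

int main(void) {
    pthread_t tid;
    for (int i = 0; i < NUM_BUFFERS; i++) {
        sem_init(&empty[i], 0, 1);    /* both buffers start out empty */
        sem_init(&full[i], 0, 0);
    }
    pthread_create(&tid, NULL, io_thread, NULL);
    for (int b = 0; b < NUM_BLOCKS; b++) {
        int i = b % NUM_BUFFERS;
        sem_wait(&full[i]);           /* wait for block b to arrive in buffer i */
        process_block(buffers[i]);    /* meanwhile the I/O thread fills the other buffer */
        sem_post(&empty[i]);
    }
    pthread_join(tid, NULL);
    return 0;
}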
4 Placing File Records on Disk
In this section, we define the concepts of records, record types, and files. Then we
discuss techniques for placing file records on disk.
4.1 Records and Record Types
Data is usually stored in the form of records. Each record consists of a collection of
related data values or items, where each value is formed of one or more bytes and
corresponds to a particular field of the record. Records usually describe entities and
their attributes. For example, an EMPLOYEE record represents an employee entity,
and each field value in the record specifies some attribute of that employee, such as
Name, Birth_date, Salary, or Supervisor. A collection of field names and their corre-
sponding data types constitutes a record type or record format definition. A data
type, associated with each field, specifies the types of values a field can take.
The data type of a field is usually one of the standard data types used in program-
ming. These include numeric (integer, long integer, or floating point), string of
characters (fixed-length or varying), Boolean (having 0 and 1 or TRUE and FALSE
values only), and sometimes specially coded date and time data types. The number
of bytes required for each data type is fixed for a given computer system. An integer
may require 4 bytes, a long integer 8 bytes, a real number 4 bytes, a Boolean 1 byte,
a date 10 bytes (assuming a format of YYYY-MM-DD), and a fixed-length string of
k characters k bytes. Variable-length strings may require as many bytes as there are
characters in each field value. For example, an EMPLOYEE record type may be
defined—using the C programming language notation—as the following structure:
struct employee {
    char name[30];        /* employee name (fixed-length string of 30 characters)   */
    char ssn[9];          /* Social Security number (9 characters)                  */
    int  salary;          /* salary (integer)                                        */
    int  job_code;        /* job code (integer)                                      */
    char department[20];  /* department name (fixed-length string of 20 characters) */
};
In some database applications, the need may arise for storing data items that consist
of large unstructured objects, which represent images, digitized video or audio
streams, or free text. These are referred to as BLOBs (binary large objects). A BLOB
data item is typically stored separately from its record in a pool of disk blocks, and a
pointer to the BLOB is included in the record.
4.2 Files, Fixed-Length Records,
and Variable-Length Records
A file is a sequence of records. In many cases, all records in a file are of the same
record type. If every record in the file has exactly the same size (in bytes), the file is
said to be made up of fixed-length records. If different records in the file have dif-
ferent sizes, the file is said to be made up of variable-length records. A file may have
variable-length records for several reasons:
■ The file records are of the same record type, but one or more of the fields are
of varying size (variable-length fields). For example, the Name field of
EMPLOYEE can be a variable-length field.
■ The file records are of the same record type, but one or more of the fields
may have multiple values for individual records; such a field is called a
repeating field and a group of values for the field is often called a repeating
group.
■ The file records are of the same record type, but one or more of the fields are
optional; that is, they may have values for some but not all of the file records
(optional fields).
Figure 5
Three record storage formats. (a) A fixed-length record with six fields (Name, Ssn, Salary,
Job_code, Department, Hire_date) and size of 71 bytes. (b) A record with two variable-length
fields and three fixed-length fields. (c) A variable-field record with three types of separator
characters: one separates a field name from its value, one separates fields, and one terminates
the record.
■ The file contains records of different record types and hence of varying size
(mixed file). This would occur if related records of different types were
clustered (placed together) on disk blocks; for example, the GRADE_REPORT
records of a particular student may be placed following that STUDENT’s
record.
The fixed-length EMPLOYEE records in Figure 5(a) have a record size of 71 bytes.
Every record has the same fields, and field lengths are fixed, so the system can iden-
tify the starting byte position of each field relative to the starting position of the
record. This facilitates locating field values by programs that access such files. Notice
that it is possible to represent a file that logically should have variable-length records
as a fixed-length records file. For example, in the case of optional fields, we could
have every field included in every file record but store a special NULL value if no value
exists for that field. For a repeating field, we could allocate as many spaces in each
record as the maximum possible number of occurrences of the field. In either case,
space is wasted when certain records do not have values for all the physical spaces
provided in each record. Now we consider other options for formatting records of a
file of variable-length records.
For variable-length fields, each record has a value for each field, but we do not know
the exact length of some field values. To determine the bytes within a particular
record that represent each field, we can use special separator characters (such as ? or
% or $)—which do not appear in any field value—to terminate variable-length
fields, as shown in Figure 5(b), or we can store the length in bytes of the field in the
record, preceding the field value.
A file of records with optional fields can be formatted in different ways. If the total
number of fields for the record type is large, but the number of fields that actually
appear in a typical record is small, we can include in each record a sequence of
<field-name, field-value> pairs rather than just the field values. Three types of sep-
arator characters are used in Figure 5(c), although we could use the same separator
character for the first two purposes—separating the field name from the field value
and separating one field from the next field. A more practical option is to assign a
short field type code—say, an integer number—to each field and include in each
record a sequence of <field-type, field-value> pairs rather than <field-name, field-value> pairs.
A repeating field needs one separator character to separate the repeating values of
the field and another separator character to indicate termination of the field.
Finally, for a file that includes records of different types, each record is preceded by a
record type indicator. Understandably, programs that process files of variable-
length records—which are usually part of the file system and hence hidden from the
typical programmers—need to be more complex than those for fixed-length
records, where the starting position and size of each field are known and fixed.5
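As an illustration of the length-prefix option mentioned above, the sketch below (an assumed layout, not the text's own format; the two-byte length field and the helper names are invented for the example) packs and scans a record whose fields are each preceded by their length in bytes:

/* Sketch: a variable-length record in which each field value is preceded
   by a 2-byte length (an assumed layout, one of the options in the text). */
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Append one field to the record buffer; returns the new record length. */
static size_t put_field(unsigned char *rec, size_t pos, const char *value) {
    uint16_t len = (uint16_t)strlen(value);
    memcpy(rec + pos, &len, sizeof len);          /* length prefix */
    memcpy(rec + pos + sizeof len, value, len);   /* field value   */
    return pos + sizeof len + len;
}

/* Read the field starting at pos into out; returns the position of the next field. */
static size_t get_field(const unsigned char *rec, size_t pos, char *out) {
    uint16_t len;
    memcpy(&len, rec + pos, sizeof len);
    memcpy(out, rec + pos + sizeof len, len);
    out[len] = '\0';
    return pos + sizeof len + len;
}

int main(void) {
    unsigned char rec[256];
    char field[128];
    size_t end = 0, pos = 0;

    /* Build a record with three variable-length fields. */
    end = put_field(rec, end, "Smith, John");
    end = put_field(rec, end, "123456789");
    end = put_field(rec, end, "Computer");

    /* Scan the fields back in order. */
    while (pos < end) {
        pos = get_field(rec, pos, field);
        printf("%s\n", field);
    }
    return 0;
}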
4.3 Record Blocking and Spanned
versus Unspanned Records
The records of a file must be allocated to disk blocks because a block is the unit of
data transfer between disk and memory. When the block size is larger than the
record size, each block will contain numerous records, although some files may have
unusually large records that cannot fit in one block. Suppose that the block size is B
bytes. For a file of fixed-length records of size R bytes, with B ≥ R, we can fit bfr =
⎣B/R⎦ records per block, where the ⎣(x)⎦ (floor function) rounds down the number x
to an integer. The value bfr is called the blocking factor for the file. In general, R
may not divide B exactly, so we have some unused space in each block equal to
B − (bfr * R) bytes
To utilize this unused space, we can store part of a record on one block and the rest
on another. A pointer at the end of the first block points to the block containing the
remainder of the record in case it is not the next consecutive block on disk. This
organization is called spanned because records can span more than one block.
Whenever a record is larger than a block, we must use a spanned organization. If
records are not allowed to cross block boundaries, the organization is called
unspanned. This is used with fixed-length records having B > R because it makes
5Other schemes are also possible for representing variable-length records.
Figure 6
Types of record organization. (a) Unspanned: every record fits entirely within one block.
(b) Spanned: a record may continue on another block, with a pointer P at the end of a block
leading to the block that holds the rest of the record.
each record start at a known location in the block, simplifying record processing. For
variable-length records, either a spanned or an unspanned organization can be used.
If the average record is large, it is advantageous to use spanning to reduce the lost
space in each block. Figure 6 illustrates spanned versus unspanned organization.
For variable-length records using spanned organization, each block may store a dif-
ferent number of records. In this case, the blocking factor bfr represents the average
number of records per block for the file. We can use bfr to calculate the number of
blocks b needed for a file of r records:
b = ⎡(r/bfr)⎤ blocks
where the ⎡(x)⎤ (ceiling function) rounds the value x up to the next integer.
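A small worked example (the block size, record size, and record count are assumed values) applies these formulas directly:

/* Sketch: blocking factor, wasted space per block, and file size in blocks
   for unspanned fixed-length records (B, R, and r are assumed examples). */
#include <stdio.h>

int main(void) {
    int  B = 512;          /* block size in bytes (assumed)            */
    int  R = 71;           /* record size in bytes, as in Figure 5(a)  */
    long r = 30000;        /* number of records in the file (assumed)  */

    int  bfr    = B / R;               /* bfr = floor(B/R) records per block */
    int  unused = B - bfr * R;         /* unused bytes per block             */
    long b      = (r + bfr - 1) / bfr; /* b = ceil(r/bfr) blocks needed      */

    printf("bfr = %d records/block, %d unused bytes/block, b = %ld blocks\n",
           bfr, unused, b);
    return 0;
}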
4.4 Allocating File Blocks on Disk
There are several standard techniques for allocating the blocks of a file on disk. In
contiguous allocation, the file blocks are allocated to consecutive disk blocks. This
makes reading the whole file very fast using double buffering, but it makes expand-
ing the file difficult. In linked allocation, each file block contains a pointer to the
next file block. This makes it easy to expand the file but makes it slow to read the
whole file. A combination of the two allocates clusters of consecutive disk blocks,
and the clusters are linked. Clusters are sometimes called file segments or extents.
Another possibility is to use indexed allocation, where one or more index blocks
contain pointers to the actual file blocks. It is also common to use combinations of
these techniques.
4.5 File Headers
A file header or file descriptor contains information about a file that is needed by
the system programs that access the file records. The header includes information to
determine the disk addresses of the file blocks as well as to record format descrip-
tions, which may include field lengths and the order of fields within a record for
fixed-length unspanned records and field type codes, separator characters, and
record type codes for variable-length records.
To search for a record on disk, one or more blocks are copied into main memory
buffers. Programs then search for the desired record or records within the buffers,
using the information in the file header. If the address of the block that contains the
desired record is not known, the search programs must do a linear search through
the file blocks. Each file block is copied into a buffer and searched until the record is
located or all the file blocks have been searched unsuccessfully. This can be very
time-consuming for a large file. The goal of a good file organization is to locate the
block that contains a desired record with a minimal number of block transfers.
5 Operations on Files
Operations on files are usually grouped into retrieval operations and update oper-
ations. The former do not change any data in the file, but only locate certain records
so that their field values can be examined and processed. The latter change the file
by insertion or deletion of records or by modification of field values. In either case,
we may have to select one or more records for retrieval, deletion, or modification
based on a selection condition (or filtering condition), which specifies criteria that
the desired record or records must satisfy.
Consider an EMPLOYEE file with fields Name, Ssn, Salary, Job_code, and Department.
A simple selection condition may involve an equality comparison on some field
value—for example, (Ssn = ‘123456789’) or (Department = ‘Research’). More com-
plex conditions can involve other types of comparison operators, such as > or ≥; an
example is (Salary ≥ 30000). The general case is to have an arbitrary Boolean expres-
sion on the fields of the file as the selection condition.
Search operations on files are generally based on simple selection conditions. A
complex condition must be decomposed by the DBMS (or the programmer) to
extract a simple condition that can be used to locate the records on disk. Each
located record is then checked to determine whether it satisfies the full selection
condition. For example, we may extract the simple condition (Department =
‘Research’) from the complex condition ((Salary ≥ 30000) AND (Department =
‘Research’)); each record satisfying (Department = ‘Research’) is located and then
tested to see if it also satisfies (Salary ≥ 30000).
When several file records satisfy a search condition, the first record—with respect to
the physical sequence of file records—is initially located and designated the current
record. Subsequent search operations commence from this record and locate the
next record in the file that satisfies the condition.
Actual operations for locating and accessing file records vary from system to system.
Below, we present a set of representative operations. Typically, high-level programs,
such as DBMS software programs, access records by using these commands, so we
sometimes refer to program variables in the following descriptions:
■ Open. Prepares the file for reading or writing. Allocates appropriate buffers
(typically at least two) to hold file blocks from disk, and retrieves the file
header. Sets the file pointer to the beginning of the file.
■ Reset. Sets the file pointer of an open file to the beginning of the file.
■ Find (or Locate). Searches for the first record that satisfies a search condi-
tion. Transfers the block containing that record into a main memory buffer
(if it is not already there). The file pointer points to the record in the buffer
and it becomes the current record. Sometimes, different verbs are used to
indicate whether the located record is to be retrieved or updated.
■ Read (or Get). Copies the current record from the buffer to a program vari-
able in the user program. This command may also advance the current
record pointer to the next record in the file, which may necessitate reading
the next file block from disk.
■ FindNext. Searches for the next record in the file that satisfies the search
condition. Transfers the block containing that record into a main memory
buffer (if it is not already there). The record is located in the buffer and
becomes the current record. Various forms of FindNext (for example, Find
Next record within a current parent record, Find Next record of a given type,
or Find Next record where a complex condition is met) are available in
legacy DBMSs based on the hierarchical and network models.
■ Delete. Deletes the current record and (eventually) updates the file on disk
to reflect the deletion.
■ Modify. Modifies some field values for the current record and (eventually)
updates the file on disk to reflect the modification.
■ Insert. Inserts a new record in the file by locating the block where the record
is to be inserted, transferring that block into a main memory buffer (if it is
not already there), writing the record into the buffer, and (eventually) writ-
ing the buffer to disk to reflect the insertion.
■ Close. Completes the file access by releasing the buffers and performing any
other needed cleanup operations.
The preceding (except for Open and Close) are called record-at-a-time operations
because each operation applies to a single record. It is possible to streamline the
operations Find, FindNext, and Read into a single operation, Scan, whose descrip-
tion is as follows:
■ Scan. If the file has just been opened or reset, Scan returns the first record;
otherwise it returns the next record. If a condition is specified with the oper-
ation, the returned record is the first or next record satisfying the condition.
In database systems, additional set-at-a-time higher-level operations may be
applied to a file. Examples of these are as follows:
■ FindAll. Locates all the records in the file that satisfy a search condition.
■ Find (or Locate) n. Searches for the first record that satisfies a search condi-
tion and then continues to locate the next n – 1 records satisfying the same
condition. Transfers the blocks containing the n records to the main memory
buffer (if not already there).
■ FindOrdered. Retrieves all the records in the file in some specified order.
■ Reorganize. Starts the reorganization process. As we shall see, some file
organizations require periodic reorganization. An example is to reorder the
file records by sorting them on a specified field.
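For reference, the record-at-a-time operations listed above can be summarized as a header-style interface. The sketch below is hypothetical (DBFile, Record, Condition, and all function names are invented and correspond to no particular system):

/* Sketch: the record-at-a-time operations expressed as C prototypes.
   Declarations only; DBFile, Record, and Condition are hypothetical types. */
typedef struct DBFile DBFile;        /* open file: buffers, header, current record */
typedef struct Record Record;        /* one file record copied into program space  */
typedef struct Condition Condition;  /* a simple selection condition on one field  */

DBFile *db_open(const char *filename);                 /* Open: allocate buffers, read header  */
void    db_reset(DBFile *f);                           /* Reset: file pointer to the beginning */
int     db_find(DBFile *f, const Condition *c);        /* Find: first record satisfying c      */
int     db_find_next(DBFile *f, const Condition *c);   /* FindNext: next matching record       */
int     db_read(DBFile *f, Record *out);               /* Read: copy current record to program */
int     db_delete(DBFile *f);                          /* Delete: remove the current record    */
int     db_modify(DBFile *f, const Record *newvals);   /* Modify: change the current record    */
int     db_insert(DBFile *f, const Record *rec);       /* Insert: add a new record             */
void    db_close(DBFile *f);                           /* Close: flush and release buffers     */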
At this point, it is worthwhile to note the difference between the terms file organiza-
tion and access method. A file organization refers to the organization of the data of
a file into records, blocks, and access structures; this includes the way records and
blocks are placed on the storage medium and interlinked. An access method, on the
other hand, provides a group of operations—such as those listed earlier—that can
be applied to a file. In general, it is possible to apply several access methods to a file
organization. Some access methods, though, can be applied only to files organized
in certain ways. For example, we cannot apply an indexed access method to a file
without an index.
Usually, we expect to use some search conditions more than others. Some files may
be static, meaning that update operations are rarely performed; other, more
dynamic files may change frequently, so update operations are constantly applied to
them. A successful file organization should perform as efficiently as possible the
operations we expect to apply frequently to the file. For example, consider the
EMPLOYEE file, as shown in Figure 5(a), which stores the records for current
employees in a company. We expect to insert records (when employees are hired),
delete records (when employees leave the company), and modify records (for exam-
ple, when an employee’s salary or job is changed). Deleting or modifying a record
requires a selection condition to identify a particular record or set of records.
Retrieving one or more records also requires a selection condition.
If users expect mainly to apply a search condition based on Ssn, the designer must
choose a file organization that facilitates locating a record given its Ssn value. This
may involve physically ordering the records by Ssn value or defining an index on
Ssn. Suppose that a second application uses the file to generate employees’ pay-
checks and requires that paychecks be grouped by department. For this application,
it is best to order employee records by department and then by name within each
department. The clustering of records into blocks and the organization of blocks on
cylinders would now be different than before. However, this arrangement conflicts
with ordering the records by Ssn values. If both applications are important, the
designer should choose an organization that allows both operations to be done effi-
ciently. Unfortunately, in many cases a single organization does not allow all needed
operations on a file to be implemented efficiently. In such cases, a compromise
must be chosen that takes into account the expected importance and mix of
retrieval and update operations.
In the following sections, we discuss methods for organizing records of a file on
disk. Several general techniques, such as ordering, hashing, and indexing, are used
to create access methods. Additionally, various general techniques for handling
insertions and deletions work with many file organizations.
6 Files of Unordered Records (Heap Files)
In this simplest and most basic type of organization, records are placed in the file in
the order in which they are inserted, so new records are inserted at the end of the
file. Such an organization is called a heap or pile file.6 This organization is often
used with additional access paths, such as secondary indexes. It is also used to collect
and store data records for future use.
Inserting a new record is very efficient. The last disk block of the file is copied into a
buffer, the new record is added, and the block is then rewritten back to disk. The
address of the last file block is kept in the file header. However, searching for a
record using any search condition involves a linear search through the file block by
block—an expensive procedure. If only one record satisfies the search condition,
then, on the average, a program will read into memory and search half the file
blocks before it finds the record. For a file of b blocks, this requires searching (b/2)
blocks, on average. If no records or several records satisfy the search condition, the
program must read and search all b blocks in the file.
To delete a record, a program must first find its block, copy the block into a buffer,
delete the record from the buffer, and finally rewrite the block back to the disk. This
leaves unused space in the disk block. Deleting a large number of records in this way
results in wasted storage space. Another technique used for record deletion is to
have an extra byte or bit, called a deletion marker, stored with each record. A record
is deleted by setting the deletion marker to a certain value. A different value for the
marker indicates a valid (not deleted) record. Search programs consider only valid
records in a block when conducting their search. Both of these deletion techniques
require periodic reorganization of the file to reclaim the unused space of deleted
records. During reorganization, the file blocks are accessed consecutively, and
records are packed by removing deleted records. After such a reorganization, the
blocks are filled to capacity once more. Another possibility is to use the space of
deleted records when inserting new records, although this requires extra bookkeep-
ing to keep track of empty locations.
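A minimal sketch of the deletion-marker idea follows (the record layout, the marker values, and the sample data are assumptions); the search routine simply skips records whose marker is set:

/* Sketch: record deletion using a deletion marker (layout and data are assumed). */
#include <stdio.h>
#include <string.h>

#define VALID   0
#define DELETED 1

struct emp_record {
    char deleted;        /* deletion marker: VALID or DELETED */
    char name[30];
    char ssn[10];
};

/* Linear search that considers only valid (not deleted) records. */
static struct emp_record *find_by_ssn(struct emp_record *recs, int n, const char *ssn) {
    for (int i = 0; i < n; i++)
        if (recs[i].deleted == VALID && strcmp(recs[i].ssn, ssn) == 0)
            return &recs[i];
    return NULL;
}

int main(void) {
    struct emp_record block[3] = {
        {VALID, "Smith, John",    "123456789"},
        {VALID, "Wong, James",    "987654321"},
        {VALID, "Zelaya, Alicia", "999887777"},
    };
    block[1].deleted = DELETED;                 /* logically delete the second record */
    struct emp_record *r = find_by_ssn(block, 3, "987654321");
    printf("%s\n", r ? r->name : "not found (record deleted)");
    return 0;
}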
We can use either spanned or unspanned organization for an unordered file, and it
may be used with either fixed-length or variable-length records. Modifying a vari-
able-length record may require deleting the old record and inserting a modified
record because the modified record may not fit in its old space on disk.
To read all records in order of the values of some field, we create a sorted copy of the
file. Sorting is an expensive operation for a large disk file, and special techniques for
external sorting are used.
For a file of unordered fixed-length records using unspanned blocks and contiguous
allocation, it is straightforward to access any record by its position in the file. If the
file records are numbered 0, 1, 2, …, r − 1 and the records in each block are num-
bered 0, 1, …, bfr − 1, where bfr is the blocking factor, then the ith record of the file
is located in block ⎣(i/bfr)⎦ and is the (i mod bfr)th record in that block. Such a file
is often called a relative or direct file because records can easily be accessed directly
by their relative positions. Accessing a record by its position does not help locate a
record based on a search condition; however, it facilitates the construction of access
paths on the file, such as indexes.
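For example, with an assumed blocking factor, the position calculation is just integer division and remainder:

/* Sketch: locating the ith record of a relative (direct) file of unspanned
   fixed-length records; bfr and i are assumed example values. */
#include <stdio.h>

int main(void) {
    int  bfr = 7;           /* blocking factor (records per block), assumed */
    long i   = 30;          /* record number, counting from 0               */

    long block_no   = i / bfr;   /* record i is in block floor(i/bfr)       */
    long rec_in_blk = i % bfr;   /* ... as the (i mod bfr)th record         */

    printf("record %ld is record %ld of block %ld\n", i, rec_in_blk, block_no);
    return 0;
}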
6Sometimes this organization is called a sequential file.
7 Files of Ordered Records (Sorted Files)
We can physically order the records of a file on disk based on the values of one of
their fields—called the ordering field. This leads to an ordered or sequential file.7
If the ordering field is also a key field of the file—a field guaranteed to have a
unique value in each record—then the field is called the ordering key for the file.
Figure 7 shows an ordered file with Name as the ordering key field (assuming that
employees have distinct names).
Ordered records have some advantages over unordered files. First, reading the records
in order of the ordering key values becomes extremely efficient because no sorting is
required. Second, finding the next record from the current one in order of the order-
ing key usually requires no additional block accesses because the next record is in the
same block as the current one (unless the current record is the last one in the block).
Third, using a search condition based on the value of an ordering key field results in
faster access when the binary search technique is used, which constitutes an improve-
ment over linear searches, although it is not often used for disk files. Ordered files are
blocked and stored on contiguous cylinders to minimize the seek time.
A binary search for disk files can be done on the blocks rather than on the records.
Suppose that the file has b blocks numbered 1, 2, …, b; the records are ordered by
ascending value of their ordering key field; and we are searching for a record whose
ordering key field value is K. Assuming that disk addresses of the file blocks are avail-
able in the file header, the binary search can be described by Algorithm 1. A binary
search usually accesses log2(b) blocks, whether the record is found or not—an
improvement over linear searches, where, on the average, (b/2) blocks are accessed
when the record is found and b blocks are accessed when the record is not found.
Algorithm 1. Binary Search on an Ordering Key of a Disk File
l ← 1; u ← b; (* b is the number of file blocks *)
while (u ≥ l ) do
begin i ← (l + u) div 2;
read block i of the file into the buffer;
if K < (ordering key field value of the first record in block i )
then u ← i – 1
else if K > (ordering key field value of the last record in block i )
then l ← i + 1
else if the record with ordering key field value = K is in the buffer
then goto found
else goto notfound;
end;
goto notfound;
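A runnable rendering of Algorithm 1 appears below as a sketch; the "disk file" is simulated by an in-memory array of blocks of sorted keys, and the block count, blocking factor, and key values are assumptions:

/* Sketch: block-level binary search (Algorithm 1) over a simulated disk file.
   Each "block" is an array of sorted keys; sizes and data are assumed examples. */
#include <stdio.h>

#define BFR 3   /* records per block */
#define B   4   /* number of blocks  */

/* Simulated file: blocks are in ascending order of the ordering key. */
static const int file[B][BFR] = {
    { 2,  5,  9},
    {12, 15, 18},
    {21, 25, 30},
    {33, 40, 47},
};

/* Returns the block number (0-based) containing key K, or -1 if not found. */
static int binary_search_blocks(int K) {
    int l = 0, u = B - 1;
    while (u >= l) {
        int i = (l + u) / 2;
        /* "read block i into the buffer" -- here we simply inspect it in memory */
        if (K < file[i][0])
            u = i - 1;                         /* K precedes this block     */
        else if (K > file[i][BFR - 1])
            l = i + 1;                         /* K follows this block      */
        else {
            for (int j = 0; j < BFR; j++)      /* search within the block   */
                if (file[i][j] == K)
                    return i;
            return -1;                         /* falls in range but absent */
        }
    }
    return -1;
}

int main(void) {
    printf("key 25 -> block %d\n", binary_search_blocks(25));  /* found in block 2 */
    printf("key 26 -> block %d\n", binary_search_blocks(26));  /* not found: -1    */
    return 0;
}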
A search criterion involving the conditions >, <, ≥, and ≤ on the ordering field
is quite efficient, since the physical ordering of records means that all records
7The term sequential file has also been used to refer to unordered files, although it is more appropriate
for ordered files.
Figure 7
Some blocks of an ordered (sequential) file of EMPLOYEE records with Name as the
ordering key field.
satisfying the condition are contiguous in the file. For example, referring to Figure 7,
if the search criterion is (Name < ‘G’)—where < means alphabetically before—the
records satisfying the search criterion are those from the beginning of the file up to
the first record that has a Name value starting with the letter ‘G’.
Ordering does not provide any advantages for random or ordered access of the
records based on values of the other nonordering fields of the file. In these cases, we
do a linear search for random access. To access the records in order based on a
nonordering field, it is necessary to create another sorted copy—in a different
order—of the file.
Inserting and deleting records are expensive operations for an ordered file because
the records must remain physically ordered. To insert a record, we must find its cor-
rect position in the file, based on its ordering field value, and then make space in the
file to insert the record in that position. For a large file this can be very time-
consuming because, on the average, half the records of the file must be moved to
make space for the new record. This means that half the file blocks must be read and
rewritten after records are moved among them. For record deletion, the problem is
less severe if deletion markers and periodic reorganization are used.
One option for making insertion more efficient is to keep some unused space in each
block for new records. However, once this space is used up, the original problem
resurfaces. Another frequently used method is to create a temporary unordered file
called an overflow or transaction file. With this technique, the actual ordered file is
called the main or master file. New records are inserted at the end of the overflow file
rather than in their correct position in the main file. Periodically, the overflow file is
sorted and merged with the master file during file reorganization. Insertion becomes
very efficient, but at the cost of increased complexity in the search algorithm. The
overflow file must be searched using a linear search if, after the binary search, the
record is not found in the main file. For applications that do not require the most up-
to-date information, overflow records can be ignored during a search.
Modifying a field value of a record depends on two factors: the search condition to
locate the record and the field to be modified. If the search condition involves the
ordering key field, we can locate the record using a binary search; otherwise we must
do a linear search. A nonordering field can be modified by changing the record and
rewriting it in the same physical location on disk—assuming fixed-length records.
Modifying the ordering field means that the record can change its position in the
file. This requires deletion of the old record followed by insertion of the modified
record.
Reading the file records in order of the ordering field is quite efficient if we ignore
the records in overflow, since the blocks can be read consecutively using double
buffering. To include the records in overflow, we must merge them in their correct
positions; in this case, first we can reorganize the file, and then read its blocks
sequentially. To reorganize the file, first we sort the records in the overflow file, and
then merge them with the master file. The records marked for deletion are removed
during the reorganization.
Table 2 summarizes the average access time in block accesses to find a specific
record in a file with b blocks.
Ordered files are rarely used in database applications unless an additional access
path, called a primary index, is used; this results in an indexed-sequential file. This
Table 2  Average Access Times for a File of b Blocks under Basic File Organizations

Type of Organization    Access/Search Method               Average Blocks to Access a Specific Record
Heap (unordered)        Sequential scan (linear search)    b/2
Ordered                 Sequential scan                    b/2
Ordered                 Binary search                      log2 b
further improves the random access time on the ordering key field. If the ordering
attribute is not a key, the file is called a clustered file.
8 Hashing Techniques
Another type of primary file organization is based on hashing, which provides very
fast access to records under certain search conditions. This organization is usually
called a hash file.8 The search condition must be an equality condition on a single
field, called the hash field. In most cases, the hash field is also a key field of the file,
in which case it is called the hash key. The idea behind hashing is to provide a func-
tion h, called a hash function or randomizing function, which is applied to the
hash field value of a record and yields the address of the disk block in which the
record is stored. A search for the record within the block can be carried out in a
main memory buffer. For most records, we need only a single-block access to
retrieve that record.
Hashing is also used as an internal search structure within a program whenever a
group of records is accessed exclusively by using the value of one field. We describe
the use of hashing for internal files in Section 8.1; then we show how it is modified
to store external files on disk in Section 8.2. In Section 8.3 we discuss techniques for
extending hashing to dynamically growing files.
8.1 Internal Hashing
For internal files, hashing is typically implemented as a hash table through the use
of an array of records. Suppose that the array index range is from 0 to M – 1, as
shown in Figure 8(a); then we have M slots whose addresses correspond to the array
indexes. We choose a hash function that transforms the hash field value into an inte-
ger between 0 and M − 1. One common hash function is the h(K) = K mod M func-
tion, which returns the remainder of an integer hash field value K after division by
M; this value is then used for the record address.
8A hash file has also been called a direct file.
Noninteger hash field values can be transformed into integers before the mod
function is applied. For character strings, the numeric (ASCII) codes associated
with characters can be used in the transformation—for example, by multiplying
those code values. For a hash field whose data type is a string of 20 characters,
Algorithm 2(a) can be used to calculate the hash address. We assume that the code
function returns the numeric code of a character and that we are given a hash field
value K of type K: array [1..20] of char (in Pascal) or char K[20] (in C).
Figure 8
Internal hashing data structures. (a) Array of M positions for use in internal hashing.
(b) Collision resolution by chaining records: each location holds data fields plus an overflow
pointer (−1 is the null pointer), and collided records are linked through an overflow space
beyond the M main positions.
Algorithm 2. Two simple hashing algorithms: (a) Applying the mod hash
function to a character string K. (b) Collision resolution by open addressing.
(a) temp ← 1;
for i ← 1 to 20 do temp ← temp * code(K[i ] ) mod M ;
hash_address ← temp mod M;
(b) i ← hash_address(K); a ← i;
if location i is occupied
then begin i ← (i + 1) mod M;
while (i ≠ a) and location i is occupied
do i ← (i + 1) mod M;
if (i = a) then all positions are full
else new_hash_address ← i;
end;
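The two parts of Algorithm 2 can be written in C as follows (a sketch; the table size M, the slot array, and the sample key are assumptions):

/* Sketch: C versions of Algorithm 2 -- (a) mod hashing of a 20-character key
   and (b) collision resolution by open addressing (table size M is assumed). */
#include <stdio.h>
#include <string.h>

#define M 97                      /* number of slots (a prime, assumed) */
static char occupied[M];          /* 1 if the slot holds a record       */

/* (a) Hash a fixed-length 20-character key by multiplying character codes mod M. */
static int hash_address(const char K[20]) {
    long temp = 1;
    for (int i = 0; i < 20; i++)
        temp = (temp * (unsigned char)K[i]) % M;
    return (int)(temp % M);
}

/* (b) Open addressing: probe forward from the hash address; -1 means the table is full. */
static int resolve(int addr) {
    int a = addr, i = addr;
    if (!occupied[i]) return i;
    i = (i + 1) % M;
    while (i != a && occupied[i])
        i = (i + 1) % M;
    return (i == a) ? -1 : i;
}

int main(void) {
    char key[20];
    memset(key, ' ', sizeof key);
    memcpy(key, "Smith, John", 11);          /* pad the key to 20 characters */

    int addr = hash_address(key);
    int slot = resolve(addr);
    if (slot >= 0) occupied[slot] = 1;
    printf("hash address %d, stored in slot %d\n", addr, slot);
    return 0;
}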
Other hashing functions can be used. One technique, called folding, involves apply-
ing an arithmetic function such as addition or a logical function such as exclusive or
to different portions of the hash field value to calculate the hash address (for exam-
ple, with an address space from 0 to 999 to store 1,000 keys, a 6-digit key 235469
may be folded and stored at the address: (235+964) mod 1000 = 199). Another tech-
nique involves picking some digits of the hash field value—for instance, the third,
fifth, and eighth digits—to form the hash address (for example, storing 1,000
employees with nine-digit Social Security numbers into a hash file with 1,000 posi-
tions would give the Social Security number 301-67-8923 a hash value of 172 by this
hash function).9 The problem with most hashing functions is that they do not guar-
antee that distinct values will hash to distinct addresses, because the hash field
space—the number of possible values a hash field can take—is usually much larger
than the address space—the number of available addresses for records. The hashing
function maps the hash field space to the address space.
A collision occurs when the hash field value of a record that is being inserted hashes
to an address that already contains a different record. In this situation, we must
insert the new record in some other position, since its hash address is occupied. The
process of finding another position is called collision resolution. There are numer-
ous methods for collision resolution, including the following:
■ Open addressing. Proceeding from the occupied position specified by the
hash address, the program checks the subsequent positions in order until an
unused (empty) position is found. Algorithm 2(b) may be used for this pur-
pose.
■ Chaining. For this method, various overflow locations are kept, usually by
extending the array with a number of overflow positions. Additionally, a
pointer field is added to each record location. A collision is resolved by plac-
ing the new record in an unused overflow location and setting the pointer of
the occupied hash address location to the address of that overflow location.
9A detailed discussion of hashing functions is outside the scope of our presentation.
A linked list of overflow records for each hash address is thus maintained, as
shown in Figure 8(b).
■ Multiple hashing. The program applies a second hash function if the first
results in a collision. If another collision results, the program uses open
addressing or applies a third hash function and then uses open addressing if
necessary.
Each collision resolution method requires its own algorithms for insertion,
retrieval, and deletion of records. The algorithms for chaining are the simplest.
Deletion algorithms for open addressing are rather tricky. Data structures textbooks
discuss internal hashing algorithms in more detail.
The goal of a good hashing function is to distribute the records uniformly over the
address space so as to minimize collisions while not leaving many unused locations.
Simulation and analysis studies have shown that it is usually best to keep a hash
table between 70 and 90 percent full so that the number of collisions remains low
and we do not waste too much space. Hence, if we expect to have r records to store
in the table, we should choose M locations for the address space such that (r/M) is
between 0.7 and 0.9. It may also be useful to choose a prime number for M, since it
has been demonstrated that this distributes the hash addresses better over the
address space when the mod hashing function is used. Other hash functions may
require M to be a power of 2.
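As a quick sizing sketch (the expected record count is an assumed example), M can be chosen as a prime near r/0.8 so that the table runs at roughly 80 percent full:

/* Sketch: choosing the number of slots M as a prime near r/0.8
   (r is an assumed example; 0.8 keeps the table 70-90 percent full). */
#include <stdio.h>

static int is_prime(long n) {
    if (n < 2) return 0;
    for (long d = 2; d * d <= n; d++)
        if (n % d == 0) return 0;
    return 1;
}

int main(void) {
    long r = 5000;                    /* expected number of records (assumed) */
    long M = r * 10 / 8;              /* target size for roughly 80% load     */
    while (!is_prime(M)) M++;         /* round up to the next prime           */
    printf("r = %ld records -> M = %ld slots (load factor %.2f)\n",
           r, M, (double)r / M);
    return 0;
}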
8.2 External Hashing for Disk Files
Hashing for disk files is called external hashing. To suit the characteristics of disk
storage, the target address space is made of buckets, each of which holds multiple
records. A bucket is either one disk block or a cluster of contiguous disk blocks. The
hashing function maps a key into a relative bucket number, rather than assigning an
absolute block address to the bucket. A table maintained in the file header converts
the bucket number into the corresponding disk block address, as illustrated in
Figure 9.
The collision problem is less severe with buckets, because as many records as will fit
in a bucket can hash to the same bucket without causing problems. However, we
must make provisions for the case where a bucket is filled to capacity and a new
record being inserted hashes to that bucket. We can use a variation of chaining in
which a pointer is maintained in each bucket to a linked list of overflow records for
the bucket, as shown in Figure 10. The pointers in the linked list should be record
pointers, which include both a block address and a relative record position within
the block.
Hashing provides the fastest possible access for retrieving an arbitrary record given
the value of its hash field. Although most good hash functions do not maintain
records in order of hash field values, some functions—called order preserving—
do. A simple example of an order preserving hash function is to use the leftmost
three digits of an invoice number field as the hash address (which yields the bucket
address) and to keep the records sorted by invoice number within each bucket. Another
Figure 9
Matching bucket numbers to disk block addresses: a table maps bucket numbers
0, 1, 2, ..., M − 1 to the corresponding block addresses on disk.
example is to use an integer hash key directly as an index to a relative file, if the hash
key values fill up a particular interval; for example, if employee numbers in a com-
pany are assigned as 1, 2, 3, ... up to the total number of employees, we can use the
identity hash function that maintains order. Unfortunately, this only works if keys
are generated in order by some application.
The hashing scheme described so far is called static hashing because a fixed number
of buckets M is allocated. This can be a serious drawback for dynamic files. Suppose
that we allocate M buckets for the address space and let m be the maximum number
of records that can fit in one bucket; then at most (m * M) records will fit in the allo-
cated space. If the number of records turns out to be substantially fewer than
(m * M), we are left with a lot of unused space. On the other hand, if the number of
records increases to substantially more than (m * M), numerous collisions will
result and retrieval will be slowed down because of the long lists of overflow
records. In either case, we may have to change the number of blocks M allocated and
then use a new hashing function (based on the new value of M) to redistribute the
records. These reorganizations can be quite time-consuming for large files. Newer
dynamic file organizations based on hashing allow the number of buckets to vary
dynamically with only localized reorganization (see Section 8.3).
When using external hashing, searching for a record given a value of some field
other than the hash field is as expensive as in the case of an unordered file. Record
deletion can be implemented by removing the record from its bucket. If the bucket
has an overflow chain, we can move one of the overflow records into the bucket to
replace the deleted record. If the record to be deleted is already in overflow, we sim-
ply remove it from the linked list. Notice that removing an overflow record implies
that we should keep track of empty positions in overflow. This is done easily by
maintaining a linked list of unused overflow locations.
Figure 10
Handling overflow for buckets by chaining: each main bucket ends with a pointer into the
overflow buckets, where the overflow records for that bucket are linked by record pointers
(NULL terminates a chain).
Modifying a specific record’s field value depends on two factors: the search condi-
tion to locate that specific record and the field to be modified. If the search condi-
tion is an equality comparison on the hash field, we can locate the record efficiently
by using the hashing function; otherwise, we must do a linear search. A nonhash
field can be modified by changing the record and rewriting it in the same bucket.
Modifying the hash field means that the record can move to another bucket, which
requires deletion of the old record followed by insertion of the modified record.
8.3 Hashing Techniques That Allow Dynamic File Expansion
A major drawback of the static hashing scheme just discussed is that the hash
address space is fixed. Hence, it is difficult to expand or shrink the file dynamically.
The schemes described in this section attempt to remedy this situation. The first
scheme—extendible hashing—stores an access structure in addition to the file, and
hence is somewhat similar to indexing. The main difference is that the access struc-
ture is based on the values that result after application of the hash function to the
search field. In indexing, the access structure is based on the values of the search
field itself. The second technique, called linear hashing, does not require additional
access structures. Another scheme, called dynamic hashing, uses an access structure
based on binary tree data structures.
These hashing schemes take advantage of the fact that the result of applying a hash-
ing function is a nonnegative integer and hence can be represented as a binary num-
ber. The access structure is built on the binary representation of the hashing
function result, which is a string of bits. We call this the hash value of a record.
Records are distributed among buckets based on the values of the leading bits in
their hash values.
Extendible Hashing. In extendible hashing, a type of directory—an array of 2^d
bucket addresses—is maintained, where d is called the global depth of the direc-
tory. The integer value corresponding to the first (high-order) d bits of a hash value
is used as an index to the array to determine a directory entry, and the address in
that entry determines the bucket in which the corresponding records are stored.
However, there does not have to be a distinct bucket for each of the 2^d directory
locations. Several directory locations with the same first d′ bits for their hash values
may contain the same bucket address if all the records that hash to these locations fit
in a single bucket. A local depth d′—stored with each bucket—specifies the number
of bits on which the bucket contents are based. Figure 11 shows a directory with
global depth d = 3.
The value of d can be increased or decreased by one at a time, thus doubling or halv-
ing the number of entries in the directory array. Doubling is needed if a bucket,
whose local depth d′ is equal to the global depth d, overflows. Halving occurs if d >
d′ for all the buckets after some deletions occur. Most record retrievals require two
block accesses—one to the directory and the other to the bucket.
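The directory lookup itself is simple: the first d bits of the hash value index the directory array. The sketch below (the 32-bit hash width, the sample hash value, and the directory contents mirroring Figure 11 are assumptions) illustrates the idea:

/* Sketch: extendible-hashing directory lookup using the high-order d bits of
   a 32-bit hash value (global depth, local depths, and buckets are assumed). */
#include <stdint.h>
#include <stdio.h>

struct bucket {
    int local_depth;      /* d': number of leading bits shared by records here */
    /* ... records would be stored here ... */
};

struct directory {
    int global_depth;     /* d: the directory has 2^d entries */
    struct bucket **entry;
};

/* Index of the directory entry for a given hash value. */
static uint32_t dir_index(const struct directory *dir, uint32_t hash) {
    return hash >> (32 - dir->global_depth);   /* first (high-order) d bits */
}

int main(void) {
    /* A directory with global depth d = 3, as in Figure 11: entries 010 and 011
       share one bucket of local depth 2; the others have their own buckets.   */
    struct bucket b000 = {3}, b001 = {3}, b01 = {2}, b10 = {2}, b110 = {3}, b111 = {3};
    struct bucket *entries[8] = {&b000, &b001, &b01, &b01, &b10, &b10, &b110, &b111};
    struct directory dir = {3, entries};

    uint32_t hash = 0x5A000000u;               /* binary 0101 1010 ...          */
    uint32_t idx  = dir_index(&dir, hash);     /* first 3 bits = 010 -> entry 2 */
    printf("hash prefix -> directory entry %u, bucket local depth %d\n",
           (unsigned)idx, dir.entry[idx]->local_depth);
    return 0;
}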
To illustrate bucket splitting, suppose that a new inserted record causes overflow in
the bucket whose hash values start with 01—the third bucket in Figure 11. The
records will be distributed between two buckets: the first contains all records whose
hash values start with 010, and the second all those whose hash values start with
011. Now the two directory locations for 010 and 011 point to the two new distinct
buckets. Before the split, they pointed to the same bucket. The local depth d′ of the
two new buckets is 3, which is one more than the local depth of the old bucket.
If a bucket that overflows and is split used to have a local depth d′ equal to the global
depth d of the directory, then the size of the directory must now be doubled so that
we can use an extra bit to distinguish the two new buckets. For example, if the
bucket for records whose hash values start with 111 in Figure 11 overflows, the two
new buckets need a directory with global depth d = 4, because the two buckets are
now labeled 1110 and 1111, and hence their local depths are both 4. The directory
size is hence doubled, and each of the other original locations in the directory is also
Figure 11
Structure of the extendible hashing scheme: a directory with global depth d = 3 (entries 000
through 111) points to the data file buckets. The buckets for hash prefixes 000, 001, 110, and
111 have local depth d′ = 3, while the buckets for prefixes 01 and 10 have local depth d′ = 2
and are each shared by two directory entries.
split into two locations, both of which have the same pointer value as did the origi-
nal location.
The main advantage of extendible hashing that makes it attractive is that the per-
formance of the file does not degrade as the file grows, as opposed to static external
hashing where collisions increase and the corresponding chaining effectively
increases the average number of accesses per key. Additionally, no space is allocated
in extendible hashing for future growth, but additional buckets can be allocated
dynamically as needed. The space overhead for the directory table is negligible. The
maximum directory size is 2^k, where k is the number of bits in the hash value.
Another advantage is that splitting causes minor reorganization in most cases, since
only the records in one bucket are redistributed to the two new buckets. The only
time reorganization is more expensive is when the directory has to be doubled (or
halved). A disadvantage is that the directory must be searched before accessing the
buckets themselves, resulting in two block accesses instead of one in static hashing.
This performance penalty is considered minor and thus the scheme is considered
quite desirable for dynamic files.
Dynamic Hashing. A precursor to extendible hashing was dynamic hashing, in
which the addresses of the buckets were either the n high-order bits or n − 1 high-
order bits, depending on the total number of keys belonging to the respective
bucket. The eventual storage of records in buckets for dynamic hashing is somewhat
similar to extendible hashing. The major difference is in the organization of the
directory. Whereas extendible hashing uses the notion of global depth (high-order d
bits) for the flat directory and then combines adjacent collapsible buckets into a
bucket of local depth d − 1, dynamic hashing maintains a tree-structured directory
with two types of nodes:
■ Internal nodes that have two pointers—the left pointer corresponding to the
0 bit (in the hashed address) and a right pointer corresponding to the 1 bit.
■ Leaf nodes—these hold a pointer to the actual bucket with records.
An example of dynamic hashing appears in Figure 12. Four buckets are shown
(“000”, “001”, “110”, and “111”) with high-order 3-bit addresses (corresponding to
the global depth of 3), and two buckets (“01” and “10”) are shown with high-order
2-bit addresses (corresponding to the local depth of 2). The latter two are the result
of collapsing “010” and “011” into “01” and collapsing “100” and “101” into “10”.
Note that the directory nodes are used implicitly to determine the “global” and
“local” depths of buckets in dynamic hashing. The search for a record given the
hashed address involves traversing the directory tree, which leads to the bucket
holding that record. It is left to the reader to develop algorithms for insertion, dele-
tion, and searching of records for the dynamic hashing scheme.
Linear Hashing. The idea behind linear hashing is to allow a hash file to expand
and shrink its number of buckets dynamically without needing a directory. Suppose
that the file starts with M buckets numbered 0, 1, …, M − 1 and uses the mod hash
function h(K) = K mod M; this hash function is called the initial hash function hi.
Overflow because of collisions is still needed and can be handled by maintaining
individual overflow chains for each bucket. However, when a collision leads to an
overflow record in any file bucket, the first bucket in the file—bucket 0—is split into
two buckets: the original bucket 0 and a new bucket M at the end of the file. The
records originally in bucket 0 are distributed between the two buckets based on a
different hashing function hi+1(K) = K mod 2M. A key property of the two hash
Figure 12
Structure of the dynamic hashing scheme: a tree-structured directory with internal nodes (each
having a 0 pointer and a 1 pointer) and leaf nodes that point to the data file buckets for hash
prefixes 000, 001, 01, 10, 110, and 111.
functions hi and hi+1 is that any records that hashed to bucket 0 based on hi will hash
to either bucket 0 or bucket M based on hi+1; this is necessary for linear hashing to
work.
As further collisions lead to overflow records, additional buckets are split in the
linear order 1, 2, 3, …. If enough overflows occur, all the original file buckets 0, 1, …,
M − 1 will have been split, so the file now has 2M instead of M buckets, and all buck-
ets use the hash function hi+1. Hence, the records in overflow are eventually redis-
tributed into regular buckets, using the function hi+1 via a delayed split of their
buckets. There is no directory; only a value n—which is initially set to 0 and is incre-
mented by 1 whenever a split occurs—is needed to determine which buckets have
been split. To retrieve a record with hash key value K, first apply the function hi to K;
if hi(K) < n, then apply the function hi+1 on K because the bucket is already split.
Initially, n = 0, indicating that the function hi applies to all buckets; n grows linearly
as buckets are split.
When n = M after being incremented, this signifies that all the original buckets have
been split and the hash function hi+1 applies to all records in the file. At this point, n
is reset to 0 (zero), and any new collisions that cause overflow lead to the use of a
new hashing function hi+2(K) = K mod 4M. In general, a sequence of hashing func-
tions hi+j(K) = K mod (2^j * M) is used, where j = 0, 1, 2, ...; a new hashing function
hi+j+1 is needed whenever all the buckets 0, 1, ..., (2^j * M) − 1 have been split and n is
reset to 0. The search for a record with hash key value K is given by Algorithm 3.
Splitting can be controlled by monitoring the file load factor instead of by splitting
whenever an overflow occurs. In general, the file load factor l can be defined as l =
r/(bfr * N), where r is the current number of file records, bfr is the maximum num-
ber of records that can fit in a bucket, and N is the current number of file buckets.
Buckets that have been split can also be recombined if the load factor of the file falls
below a certain threshold. Blocks are combined linearly, and N is decremented
appropriately. The file load can be used to trigger both splits and combinations; in
this manner the file load can be kept within a desired range. Splits can be triggered
when the load exceeds a certain threshold—say, 0.9—and combinations can be trig-
gered when the load falls below another threshold—say, 0.7. The main advantages
of linear hashing are that it maintains the load factor fairly constantly while the file
grows and shrinks, and it does not require a directory.10
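To make the load-factor control concrete, here is a small illustrative helper (an assumption-laden sketch, not part of the text) that computes l = r/(bfr * N) and applies the example thresholds of 0.9 and 0.7 mentioned above.

def load_factor(r, bfr, N):
    # r = current number of records, bfr = records per bucket, N = current number of buckets
    return r / (bfr * N)

def maintenance_action(r, bfr, N, split_at=0.9, combine_at=0.7):
    l = load_factor(r, bfr, N)
    if l > split_at:
        return 'split next bucket'
    if l < combine_at and N > 1:
        return 'combine last split bucket'
    return 'no action'

print(maintenance_action(8500, 10, 1000))     # l = 0.85, so 'no action'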
Algorithm 3. The Search Procedure for Linear Hashing
if n = 0
then m ← hj(K) (* m is the hash value of record with hash key K *)
else begin
    m ← hj(K);
    if m < n then m ← hj+1(K)
end;
search the bucket whose hash value is m (and its overflow, if any);
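A direct Python rendering of Algorithm 3 might look as follows; it is a sketch under the assumption that the buckets (with their overflow records folded in) are available as a list of lists of (key, record) pairs, and it writes the round-j hash function as h_j(K) = K mod (2^j M).

def linear_hash_search(buckets, key, M, j, n):
    # M = initial number of buckets, j = current round, n = next bucket to be split
    m = key % (2 ** j * M)                    # h_j(K)
    if m < n:                                 # bucket m was already split in this round
        m = key % (2 ** (j + 1) * M)          # h_(j+1)(K)
    return [rec for k, rec in buckets[m] if k == key]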
9 Other Primary File Organizations
9.1 Files of Mixed Records
The file organizations we have studied so far assume that all records of a particular
file are of the same record type. The records could be of EMPLOYEEs, PROJECTs,
STUDENTs, or DEPARTMENTs, but each file contains records of only one type. In
most database applications, we encounter situations in which numerous types of
entities are interrelated in various ways. Relationships among records in various
files can be represented by connecting fields.11 For example, a STUDENT record can
have a connecting field Major_dept whose value gives the name of the DEPARTMENT
in which the student is majoring. This Major_dept field refers to a DEPARTMENT
entity, which should be represented by a record of its own in the DEPARTMENT file.
If we want to retrieve field values from two related records, we must retrieve one of
the records first. Then we can use its connecting field value to retrieve the related
record in the other file. Hence, relationships are implemented by logical field refer-
ences among the records in distinct files.
10For details of insertion and deletion into linear hashed files, refer to Litwin (1980) and Salzberg
(1988).
11The concept of foreign keys in the relational data model and references among objects in
object-oriented models are examples of connecting fields.
File organizations in object DBMSs, as well as legacy systems such as hierarchical
and network DBMSs, often implement relationships among records as physical
relationships realized by physical contiguity (or clustering) of related records or by
physical pointers. These file organizations typically assign an area of the disk to
hold records of more than one type so that records of different types can be
physically clustered on disk. If a particular relationship is expected to be used fre-
quently, implementing the relationship physically can increase the system’s effi-
ciency at retrieving related records. For example, if the query to retrieve a
DEPARTMENT record and all records for STUDENTs majoring in that department is
frequent, it would be desirable to place each DEPARTMENT record and its cluster of
STUDENT records contiguously on disk in a mixed file. The concept of physical
clustering of object types is used in object DBMSs to store related objects together
in a mixed file.
To distinguish the records in a mixed file, each record has—in addition to its field
values—a record type field, which specifies the type of record. This is typically the
first field in each record and is used by the system software to determine the type of
record it is about to process. Using the catalog information, the DBMS can deter-
mine the fields of that record type and their sizes, in order to interpret the data val-
ues in the record.
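The following sketch shows one way such a record type field could drive the interpretation of a mixed-file record; the CATALOG layouts and field sizes are purely hypothetical stand-ins for the metadata a real DBMS would read from its catalog.

CATALOG = {
    b'S': [('Name', 30), ('Ssn', 9), ('Major_dept', 4)],     # hypothetical STUDENT layout
    b'D': [('Dept_name', 15), ('Dept_number', 4)],           # hypothetical DEPARTMENT layout
}

def parse_mixed_record(raw):
    rec_type = raw[0:1]                       # the record type field comes first
    values, offset = {}, 1
    for name, size in CATALOG[rec_type]:      # catalog supplies field names and sizes
        values[name] = raw[offset:offset + size].rstrip(b' ').decode()
        offset += size
    return rec_type.decode(), values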
9.2 B-Trees and Other Data Structures as Primary Organization
Other data structures can be used for primary file organizations. For example, if both
the record size and the number of records in a file are small, some DBMSs offer the
option of a B-tree data structure as the primary file organization. In general, any data
structure that can be adapted to the characteristics of disk devices can be used as a
primary file organization for record placement on disk. Recently, column-based stor-
age of data has been proposed as a primary method for storage of relations in rela-
tional databases.
10 Parallelizing Disk Access Using RAID Technology
With the exponential growth in the performance and capacity of semiconductor
devices and memories, faster microprocessors with larger and larger primary mem-
ories are continually becoming available. To match this growth, it is natural to
expect that secondary storage technology must also take steps to keep up with
processor technology in performance and reliability.

Figure 13
Striping of data across multiple disks. (a) Bit-level striping across four disks.
(b) Block-level striping across four disks.
A major advance in secondary storage technology is represented by the develop-
ment of RAID, which originally stood for Redundant Arrays of Inexpensive Disks.
More recently, the I in RAID is said to stand for Independent. The RAID idea
received a very positive industry endorsement and has been developed into an elab-
orate set of alternative RAID architectures (RAID levels 0 through 6). We highlight
the main features of the technology in this section.
The main goal of RAID is to even out the widely different rates of performance
improvement of disks against those in memory and microprocessors.12 While RAM
capacities have quadrupled every two to three years, disk access times are improving
at less than 10 percent per year, and disk transfer rates are improving at roughly 20
percent per year. Disk capacities are indeed improving at more than 50 percent per
year, but the speed and access time improvements are of a much smaller magnitude.
A second qualitative disparity exists between the ability of special microprocessors
that cater to new applications involving video, audio, image, and spatial data pro-
cessing and the corresponding lack of fast access to large, shared data sets.
The natural solution is a large array of small independent disks acting as a single
higher-performance logical disk. A concept called data striping is used, which uti-
lizes parallelism to improve disk performance. Data striping distributes data trans-
parently over multiple disks to make them appear as a single large, fast disk. Figure
13 shows a file distributed or striped over four disks. Striping improves overall I/O
performance by allowing multiple I/Os to be serviced in parallel, thus providing
high overall transfer rates. Data striping also accomplishes load balancing among
disks. Moreover, by storing redundant information on disks using parity or some
other error-correction code, reliability can be improved. In Sections 10.1 and 10.2,
we discuss how RAID achieves the two important objectives of improved reliability
and higher performance. Section 10.3 discusses RAID organizations and levels.
12This was predicted by Gordon Bell to be about 40 percent every year between 1974 and 1984 and is
now supposed to exceed 50 percent per year.
10.1 Improving Reliability with RAID
For an array of n disks, the likelihood of failure is n times as much as that for one
disk. Hence, if the MTBF (Mean Time Between Failures) of a disk drive is assumed to
be 200,000 hours or about 22.8 years (for the disk drive in Table 1 called Cheetah NS,
it is 1.4 million hours), the MTBF for a bank of 100 disk drives becomes only 2,000
hours or 83.3 days (for 1,000 Cheetah NS disks it would be 1,400 hours or 58.33
days). Keeping a single copy of data in such an array of disks will cause a significant
loss of reliability. An obvious solution is to employ redundancy of data so that disk
failures can be tolerated. The disadvantages are many: additional I/O operations for
write, extra computation to maintain redundancy and to do recovery from errors,
and additional disk capacity to store redundant information.
One technique for introducing redundancy is called mirroring or shadowing. Data
is written redundantly to two identical physical disks that are treated as one logical
disk. When data is read, it can be retrieved from the disk with shorter queuing, seek,
and rotational delays. If a disk fails, the other disk is used until the first is repaired.
Suppose the mean time to repair is 24 hours; then the mean time to data loss of a
mirrored disk system using 100 disks with MTBF of 200,000 hours each is
(200,000)^2/(2 * 24) = 8.33 * 10^8 hours, which is 95,028 years.13 Disk mirroring also
doubles the rate at which read requests are handled, since a read can go to either disk.
The transfer rate of each read, however, remains the same as that for a single disk.
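The figure quoted above can be checked with a few lines of code; the formula used is the one given in the text (and in Chen et al. 1994), mean time to data loss ≈ MTBF^2/(2 * mean time to repair), and the small difference from the quoted 95,028 years comes only from rounding.

MTBF_HOURS = 200_000                          # per-disk mean time between failures
MTTR_HOURS = 24                               # mean time to repair a failed disk

mttdl_hours = MTBF_HOURS ** 2 / (2 * MTTR_HOURS)
print(round(mttdl_hours / 1e8, 2), 'x 10^8 hours')   # 8.33 x 10^8 hours
print(round(mttdl_hours / (24 * 365)), 'years')      # roughly 95,000 years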
Another solution to the problem of reliability is to store extra information that is not
normally needed but that can be used to reconstruct the lost information in case of
disk failure. The incorporation of redundancy must consider two problems: selecting
a technique for computing the redundant information, and selecting a method of
distributing the redundant information across the disk array. The first problem is
addressed by using error-correcting codes involving parity bits, or specialized codes
such as Hamming codes. Under the parity scheme, a redundant disk may be consid-
ered as having the sum of all the data in the other disks. When a disk fails, the miss-
ing information can be constructed by a process similar to subtraction.
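In practice the "sum" and "subtraction" described here are usually realized as bitwise XOR; the short sketch below (an illustration, not the text's algorithm) computes a parity block for three data blocks and rebuilds a lost block from the survivors.

from functools import reduce

def xor_blocks(blocks):
    # bytewise XOR of equally sized blocks
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b'\x01\x02\x03', b'\x10\x20\x30', b'\xaa\xbb\xcc']   # blocks on three data disks
parity = xor_blocks(data)                                    # stored on the redundant disk

lost = data[1]                                               # suppose disk 1 fails
recovered = xor_blocks([data[0], data[2], parity])           # 'subtract' the survivors
assert recovered == lost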
For the second problem, the two major approaches are either to store the redundant
information on a small number of disks or to distribute it uniformly across all disks.
The latter results in better load balancing. The different levels of RAID choose a
combination of these options to implement redundancy and improve reliability.
10.2 Improving Performance with RAID
The disk arrays employ the technique of data striping to achieve higher transfer rates.
Note that data can be read or written only one block at a time, so a typical transfer
contains 512 to 8192 bytes. Disk striping may be applied at a finer granularity by
breaking up a byte of data into bits and spreading the bits to different disks. Thus,
bit-level data striping consists of splitting a byte of data and writing bit j to the jth
disk. With 8-bit bytes, eight physical disks may be considered as one logical disk with
an eightfold increase in the data transfer rate. Each disk participates in each I/O
request and the total amount of data read per request is eight times as much. Bit-level
striping can be generalized to a number of disks that is either a multiple or a factor of
eight. Thus, in a four-disk array, bit n goes to the disk numbered (n mod 4). Figure
13(a) shows bit-level striping of data.
13The formulas for MTBF calculations appear in Chen et al. (1994).
The granularity of data interleaving can be higher than a bit; for example, blocks of
a file can be striped across disks, giving rise to block-level striping. Figure 13(b)
shows block-level data striping assuming the data file contains four blocks. With
block-level striping, multiple independent requests that access single blocks (small
requests) can be serviced in parallel by separate disks, thus decreasing the queuing
time of I/O requests. Requests that access multiple blocks (large requests) can be
parallelized, thus reducing their response time. In general, the greater the number of
disks in an array, the larger the potential performance benefit. However, assuming
independent failures, the disk array of 100 disks collectively has 1/100th the reliabil-
ity of a single disk. Thus, redundancy via error-correcting codes and disk mirroring
is necessary to provide reliability along with high performance.
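As a simple illustration of block-level striping (assuming the common round-robin placement sketched in Figure 13), the mapping from a file's logical block number to a disk and a stripe row can be written as:

def place_block(i, n_disks=4):
    # logical block i goes to disk (i mod n_disks), in stripe row (i // n_disks)
    return {'disk': i % n_disks, 'stripe': i // n_disks}

for i in range(8):                            # blocks 0..7 of a file striped over four disks
    print('block', i, '->', place_block(i))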
10.3 RAID Organizations and Levels
Different RAID organizations were defined based on different combinations of the
two factors of granularity of data interleaving (striping) and pattern used to com-
pute redundant information. In the initial proposal, levels 1 through 5 of RAID
were proposed, and two additional levels—0 and 6—were added later.
RAID level 0 uses data striping, has no redundant data, and hence has the best write
performance since updates do not have to be duplicated. It splits data evenly across
two or more disks. However, its read performance is not as good as RAID level 1,
which uses mirrored disks. In the latter, performance improvement is possible by
scheduling a read request to the disk with shortest expected seek and rotational
delay. RAID level 2 uses memory-style redundancy by using Hamming codes, which
contain parity bits for distinct overlapping subsets of components. Thus, in one
particular version of this level, three redundant disks suffice for four original disks,
whereas with mirroring—as in level 1—four would be required. Level 2 includes
both error detection and correction, although detection is generally not required
because broken disks identify themselves.
RAID level 3 uses a single parity disk relying on the disk controller to figure out
which disk has failed. Levels 4 and 5 use block-level data striping, with level 5 dis-
tributing data and parity information across all disks. Figure 14(b) shows an illus-
tration of RAID level 5, where parity is shown with subscript p. If one disk fails, the
missing data is calculated based on the parity available from the remaining disks.
Finally, RAID level 6 applies the so-called P + Q redundancy scheme using Reed-
Soloman codes to protect against up to two disk failures by using just two redun-
dant disks.
Figure 14
Some popular levels of RAID. (a) RAID level 1: Mirroring of data on two disks.
(b) RAID level 5: Striping of data with distributed parity across four disks.
Rebuilding in case of disk failure is easiest for RAID level 1. Other levels require the
reconstruction of a failed disk by reading multiple disks. Level 1 is used for critical
applications such as storing logs of transactions. Levels 3 and 5 are preferred for
large volume storage, with level 3 providing higher transfer rates. The most popular
uses of RAID technology currently are level 0 (with striping), level 1 (with mirroring),
and level 5 with an extra drive for parity. A combination of multiple RAID levels is
also used; for example, RAID 0+1 combines striping and mirroring using a minimum of
four disks. Other nonstandard RAID levels include RAID 1.5, RAID 7, RAID-DP,
RAID S or Parity RAID, Matrix RAID, RAID-K, RAID-Z, RAIDn, Linux MD RAID
10, IBM ServeRAID 1E, and unRAID. A discussion of these nonstandard levels is
beyond the scope of this text. Designers of a RAID setup for a given application mix
have to confront many design decisions such as the level of RAID, the number of
disks, the choice of parity schemes, and grouping of disks for block-level striping.
Detailed performance studies on small reads and writes (referring to I/O requests
for one striping unit) and large reads and writes (referring to I/O requests for one
stripe unit from each disk in an error-correction group) have been performed.
11 New Storage Systems
In this section, we describe three recent developments in storage systems that are
becoming an integral part of most enterprises' information system architectures.
11.1 Storage Area Networks
With the rapid growth of electronic commerce, Enterprise Resource Planning
(ERP) systems that integrate application data across organizations, and data ware-
houses that keep historical aggregate information, the demand for storage has gone
up substantially. For today’s Internet-driven organizations, it has become necessary
to move from a static fixed data center-oriented operation to a more flexible and
dynamic infrastructure for their information processing requirements. The total
cost of managing all data is growing so rapidly that in many instances the cost of
managing server-attached storage exceeds the cost of the server itself. Furthermore,
the procurement cost of storage is only a small fraction (typically 10 to 15
percent) of the overall cost of storage management. Many users of RAID systems
cannot use the capacity effectively because it has to be attached in a fixed manner to
one or more servers. Therefore, most large organizations have moved to a concept
called storage area networks (SANs). In a SAN, online storage peripherals are con-
figured as nodes on a high-speed network and can be attached and detached from
servers in a very flexible manner. Several companies have emerged as SAN providers
and supply their own proprietary topologies. They allow storage systems to be
placed at longer distances from the servers and provide different performance and
connectivity options. Existing storage management applications can be ported into
SAN configurations using Fiber Channel networks that encapsulate the legacy SCSI
protocol. As a result, the SAN-attached devices appear as SCSI devices.
Current architectural alternatives for SAN include the following: point-to-point
connections between servers and storage systems via fiber channel; use of a fiber
channel switch to connect multiple RAID systems, tape libraries, and so on to
servers; and the use of fiber channel hubs and switches to connect servers and stor-
age systems in different configurations. Organizations can slowly move up from
simpler topologies to more complex ones by adding servers and storage devices as
needed. We do not provide further details here because they vary among SAN ven-
dors. The main advantages claimed include:
■ Flexible many-to-many connectivity among servers and storage devices
using fiber channel hubs and switches
■ Up to 10 km separation between a server and a storage system using appro-
priate fiber optic cables
■ Better isolation capabilities allowing nondisruptive addition of new periph-
erals and servers
SANs are growing very rapidly, but are still faced with many problems, such as com-
bining storage options from multiple vendors and dealing with evolving standards
of storage management software and hardware. Most major companies are evaluat-
ing SANs as a viable option for database storage.
11.2 Network-Attached Storage
With the phenomenal growth in digital data, particularly generated from multime-
dia and other enterprise applications, the need for high-performance storage solu-
tions at low cost has become extremely important. Network-attached storage
(NAS) devices are among the storage devices being used for this purpose. These
devices are, in fact, servers that do not provide any of the common server services,
but simply allow the addition of storage for file sharing. NAS devices allow vast
amounts of hard-disk storage space to be added to a network and can make that
space available to multiple servers without shutting them down for maintenance
and upgrades. NAS devices can reside anywhere on a local area network (LAN) and
may be combined in different configurations. A single hardware device, often called
the NAS box or NAS head, acts as the interface between the NAS system and net-
work clients. These NAS devices require no monitor, keyboard, or mouse. One or
more disk or tape drives can be attached to many NAS systems to increase total
capacity. Clients connect to the NAS head rather than to the individual storage
devices. An NAS can store any data that appears in the form of files, such as e-mail
boxes, Web content, remote system backups, and so on. In that sense, NAS devices
are being deployed as a replacement for traditional file servers.
NAS systems strive for reliable operation and easy administration. They include
built-in features such as secure authentication, or the automatic sending of e-mail
alerts in case of error on the device. The NAS devices (or appliances, as some ven-
dors refer to them) are being offered with a high degree of scalability, reliability,
flexibility, and performance. Such devices typically support RAID levels 0, 1, and 5.
Traditional storage area networks (SANs) differ from NAS in several ways.
Specifically, SANs often utilize Fiber Channel rather than Ethernet, and a SAN often
incorporates multiple network devices or endpoints on a self-contained or private
LAN, whereas NAS relies on individual devices connected directly to the existing
public LAN. Whereas Windows, UNIX, and NetWare file servers each demand spe-
cific protocol support on the client side, NAS systems claim greater operating sys-
tem independence of clients.
11.3 iSCSI Storage Systems
A new protocol called iSCSI (Internet SCSI) has been proposed recently. It allows
clients (called initiators) to send SCSI commands to SCSI storage devices on remote
channels. The main advantage of iSCSI is that it does not require the special cabling
needed by Fiber Channel and it can run over longer distances using existing network
infrastructure. By carrying SCSI commands over IP networks, iSCSI facilitates data
transfers over intranets and manages storage over long distances. It can transfer data
over local area networks (LANs), wide area networks (WANs), or the Internet.
iSCSI works as follows. When a DBMS needs to access data, the operating system
generates the appropriate SCSI commands and data request, which then go through
encapsulation and, if necessary, encryption procedures. A packet header is added
before the resulting IP packets are transmitted over an Ethernet connection. When a
packet is received, it is decrypted (if it was encrypted before transmission) and dis-
assembled, separating the SCSI commands and request. The SCSI commands go via
the SCSI controller to the SCSI storage device. Because iSCSI is bidirectional, the
protocol can also be used to return data in response to the original request. Cisco
and IBM have marketed switches and routers based on this technology.
iSCSI storage has mainly impacted small- and medium-sized businesses because of
its combination of simplicity, low cost, and the functionality of iSCSI devices. It
allows them to avoid learning the ins and outs of Fiber Channel (FC) technology and
instead to benefit from their familiarity with the IP protocol and Ethernet hardware.
iSCSI implementations in the data centers of very large enterprise businesses are
slow in development due to their prior investment in Fiber Channel-based SANs.
iSCSI is one of two main approaches to storage data transmission over IP networks.
The other method, Fiber Channel over IP (FCIP), translates Fiber Channel control
codes and data into IP packets for transmission between geographically distant
Fiber Channel storage area networks. This protocol, known also as Fiber Channel
tunneling or storage tunneling, can only be used in conjunction with Fiber Channel
technology, whereas iSCSI can run over existing Ethernet networks.
The latest idea to enter the enterprise IP storage race is Fiber Channel over
Ethernet (FCoE), which can be thought of as iSCSI without the IP. It uses many ele-
ments of SCSI and FC (just like iSCSI), but it does not include TCP/IP components.
This promises excellent performance, especially on 10 Gigabit Ethernet (10GbE),
and is relatively easy for vendors to add to their products.
12 Summary
We began this chapter by discussing the characteristics of memory hierarchies and
then concentrated on secondary storage devices. In particular, we focused on mag-
netic disks because they are used most often to store online database files.
Data on disk is stored in blocks; accessing a disk block is expensive because of the
seek time, rotational delay, and block transfer time. To reduce the average block
access time, double buffering can be used when accessing consecutive disk blocks.
We presented different ways of storing file records on disk. File records are grouped
into disk blocks and can be fixed length or variable length, spanned or unspanned,
and of the same record type or mixed types. We discussed the file header, which
describes the record formats and keeps track of the disk addresses of the file blocks.
Information in the file header is used by system software accessing the file records.
Then we presented a set of typical commands for accessing individual file records
and discussed the concept of the current record of a file. We discussed how complex
record search conditions are transformed into simple search conditions that are
used to locate records in the file.
Three primary file organizations were then discussed: unordered, ordered, and
hashed. Unordered files require a linear search to locate records, but record inser-
tion is very simple. We discussed the deletion problem and the use of deletion
markers.
Ordered files shorten the time required to read records in order of the ordering field.
The time required to search for an arbitrary record, given the value of its ordering
key field, is also reduced if a binary search is used. However, maintaining the records
in order makes insertion very expensive; thus the technique of using an unordered
overflow file to reduce the cost of record insertion was discussed. Overflow records
are merged with the master file periodically during file reorganization.
Hashing provides very fast access to an arbitrary record of a file, given the value of
its hash key. The most suitable method for external hashing is the bucket technique,
with one or more contiguous blocks corresponding to each bucket. Collisions caus-
ing bucket overflow are handled by chaining. Access on any nonhash field is slow,
and so is ordered access of the records on any field. We discussed three hashing tech-
niques for files that grow and shrink in the number of records dynamically:
extendible, dynamic, and linear hashing. The first two use the higher-order bits of
the hash address to organize a directory. Linear hashing is geared to keep the load
factor of the file within a given range and adds new buckets linearly.
We briefly discussed other possibilities for primary file organizations, such as B-
trees, and files of mixed records, which implement relationships among records of
different types physically as part of the storage structure. We reviewed the recent
advances in disk technology represented by RAID (Redundant Arrays of
Inexpensive (or Independent) Disks), which has become a standard technique in
large enterprises to provide better reliability and fault tolerance features in storage.
Finally, we reviewed three currently popular options in enterprise storage systems:
storage area networks (SANs), network-attached storage (NAS), and iSCSI storage
systems.
Review Questions
1. What is the difference between primary and secondary storage?
2. Why are disks, not tapes, used to store online database files?
3. Define the following terms: disk, disk pack, track, block, cylinder, sector,
interblock gap, read/write head.
4. Discuss the process of disk initialization.
5. Discuss the mechanism used to read data from or write data to the disk.
6. What are the components of a disk block address?
7. Why is accessing a disk block expensive? Discuss the time components
involved in accessing a disk block.
8. How does double buffering improve block access time?
9. What are the reasons for having variable-length records? What types of sep-
arator characters are needed for each?
10. Discuss the techniques for allocating file blocks on disk.
11. What is the difference between a file organization and an access method?
12. What is the difference between static and dynamic files?
13. What are the typical record-at-a-time operations for accessing a file? Which
of these depend on the current file record?
14. Discuss the techniques for record deletion.
15. Discuss the advantages and disadvantages of using (a) an unordered file, (b)
an ordered file, and (c) a static hash file with buckets and chaining. Which
operations can be performed efficiently on each of these organizations, and
which operations are expensive?
16. Discuss the techniques for allowing a hash file to expand and shrink dynam-
ically. What are the advantages and disadvantages of each?
17. What is the difference between the directories of extendible and dynamic
hashing?
18. What are mixed files used for? What are other types of primary file organiza-
tions?
19. Describe the mismatch between processor and disk technologies.
20. What are the main goals of the RAID technology? How does it achieve them?
21. How does disk mirroring help improve reliability? Give a quantitative
example.
22. What characterizes the levels in RAID organization?
23. What are the highlights of the popular RAID levels 0, 1, and 5?
24. What are storage area networks? What flexibility and advantages do they
offer?
25. Describe the main features of network-attached storage as an enterprise
storage solution.
26. How have new iSCSI systems improved the applicability of storage area net-
works?
Exercises
27. Consider a disk with the following characteristics (these are not parameters
of any particular disk unit): block size B = 512 bytes; interblock gap size G =
128 bytes; number of blocks per track = 20; number of tracks per surface =
400. A disk pack consists of 15 double-sided disks.
a. What is the total capacity of a track, and what is its useful capacity
(excluding interblock gaps)?
b. How many cylinders are there?
c. What are the total capacity and the useful capacity of a cylinder?
d. What are the total capacity and the useful capacity of a disk pack?
e. Suppose that the disk drive rotates the disk pack at a speed of 2400 rpm
(revolutions per minute); what are the transfer rate (tr) in bytes/msec and
the block transfer time (btt) in msec? What is the average rotational delay
(rd) in msec? What is the bulk transfer rate?
f. Suppose that the average seek time is 30 msec. How much time does it
take (on the average) in msec to locate and transfer a single block, given
its block address?
g. Calculate the average time it would take to transfer 20 random blocks,
and compare this with the time it would take to transfer 20 consecutive
blocks using double buffering to save seek time and rotational delay.
28. A file has r = 20,000 STUDENT records of fixed length. Each record has the
following fields: Name (30 bytes), Ssn (9 bytes), Address (40 bytes), PHONE
(10 bytes), Birth_date (8 bytes), Sex (1 byte), Major_dept_code (4 bytes),
Minor_dept_code (4 bytes), Class_code (4 bytes, integer), and Degree_program
(3 bytes). An additional byte is used as a deletion marker. The file is stored on
the disk whose parameters are given in Exercise 27.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assum-
ing an unspanned organization.
c. Calculate the average time it takes to find a record by doing a linear search
on the file if (i) the file blocks are stored contiguously, and double buffer-
ing is used; (ii) the file blocks are not stored contiguously.
d. Assume that the file is ordered by Ssn; by doing a binary search, calculate
the time it takes to search for a record given its Ssn value.
29. Suppose that only 80 percent of the STUDENT records from Exercise 28 have
a value for Phone, 85 percent for Major_dept_code, 15 percent for
Minor_dept_code, and 90 percent for Degree_program; and suppose that we
use a variable-length record file. Each record has a 1-byte field type for each
field in the record, plus the 1-byte deletion marker and a 1-byte end-of-
record marker. Suppose that we use a spanned record organization, where
each block has a 5-byte pointer to the next block (this space is not used for
record storage).
a. Calculate the average record length R in bytes.
b. Calculate the number of blocks needed for the file.
30. Suppose that a disk unit has the following parameters: seek time s = 20 msec;
rotational delay rd = 10 msec; block transfer time btt = 1 msec; block size B =
2400 bytes; interblock gap size G = 600 bytes. An EMPLOYEE file has the fol-
lowing fields: Ssn, 9 bytes; Last_name, 20 bytes; First_name, 20 bytes;
Middle_init, 1 byte; Birth_date, 10 bytes; Address, 35 bytes; Phone, 12 bytes;
Supervisor_ssn, 9 bytes; Department, 4 bytes; Job_code, 4 bytes; deletion
marker, 1 byte. The EMPLOYEE file has r = 30,000 records, fixed-length for-
mat, and unspanned blocking. Write appropriate formulas and calculate the
following values for the above EMPLOYEE file:
a. The record size R (including the deletion marker), the blocking factor bfr,
and the number of disk blocks b.
b. Calculate the wasted space in each disk block because of the unspanned
organization.
c. Calculate the transfer rate tr and the bulk transfer rate btr for this disk
unit (see Appendix B for definitions of tr and btr).
d. Calculate the average number of block accesses needed to search for an
arbitrary record in the file, using linear search.
e. Calculate in msec the average time needed to search for an arbitrary
record in the file, using linear search, if the file blocks are stored on con-
secutive disk blocks and double buffering is used.
f. Calculate in msec the average time needed to search for an arbitrary
record in the file, using linear search, if the file blocks are not stored on
consecutive disk blocks.
g. Assume that the records are ordered via some key field. Calculate the
average number of block accesses and the average time needed to search for
an arbitrary record in the file, using binary search.
31. A PARTS file with Part# as the hash key includes records with the following
Part# values: 2369, 3760, 4692, 4871, 5659, 1821, 1074, 7115, 1620, 2428, 3943,
4750, 6975, 4981, and 9208. The file uses eight buckets, numbered 0 to 7. Each
bucket is one disk block and holds two records. Load these records into the
file in the given order, using the hash function h(K) = K mod 8. Calculate the
average number of block accesses for a random retrieval on Part#.
32. Load the records of Exercise 31 into expandable hash files based on
extendible hashing. Show the structure of the directory at each step, and the
global and local depths. Use the hash function h(K) = K mod 128.
33. Load the records of Exercise 31 into an expandable hash file, using linear
hashing. Start with a single disk block, using the hash function h0 = K mod
2^0, and show how the file grows and how the hash functions change as the
records are inserted. Assume that blocks are split whenever an overflow
occurs, and show the value of n at each stage.
34. Compare the file commands listed in Section 5 to those available on a file
access method you are familiar with.
35. Suppose that we have an unordered file of fixed-length records that uses an
unspanned record organization. Outline algorithms for insertion, deletion,
and modification of a file record. State any assumptions you make.
36. Suppose that we have an ordered file of fixed-length records and an
unordered overflow file to handle insertion. Both files use unspanned
records. Outline algorithms for insertion, deletion, and modification of a file
record and for reorganizing the file. State any assumptions you make.
37. Can you think of techniques other than an unordered overflow file that can
be used to make insertions in an ordered file more efficient?
38. Suppose that we have a hash file of fixed-length records, and suppose that
overflow is handled by chaining. Outline algorithms for insertion, deletion,
and modification of a file record. State any assumptions you make.
39. Can you think of techniques other than chaining to handle bucket overflow
in external hashing?
40. Write pseudocode for the insertion algorithms for linear hashing and for
extendible hashing.
41. Write program code to access individual fields of records under each of the
following circumstances. For each case, state the assumptions you make con-
cerning pointers, separator characters, and so on. Determine the type of
information needed in the file header in order for your code to be general in
each case.
a. Fixed-length records with unspanned blocking
b. Fixed-length records with spanned blocking
c. Variable-length records with variable-length fields and spanned blocking
d. Variable-length records with repeating groups and spanned blocking
e. Variable-length records with optional fields and spanned blocking
f. Variable-length records that allow all three cases in parts c, d, and e
42. Suppose that a file initially contains r = 120,000 records of R = 200 bytes
each in an unsorted (heap) file. The block size B = 2400 bytes, the average
seek time s = 16 ms, the average rotational latency rd = 8.3 ms, and the block
transfer time btt = 0.8 ms. Assume that 1 record is deleted for every 2 records
added until the total number of active records is 240,000.
a. How many block transfers are needed to reorganize the file?
b. How long does it take to find a record right before reorganization?
c. How long does it take to find a record right after reorganization?
43. Suppose we have a sequential (ordered) file of 100,000 records where each
record is 240 bytes. Assume that B = 2400 bytes, s = 16 ms, rd = 8.3 ms, and
btt = 0.8 ms. Suppose we want to make X independent random record reads
from the file. We could make X random block reads or we could perform one
exhaustive read of the entire file looking for those X records. The question is
to decide when it would be more efficient to perform one exhaustive read of
the entire file than to perform X individual random reads. That is, what is
the value for X when an exhaustive read of the file is more efficient than ran-
dom X reads? Develop this as a function of X.
44. Suppose that a static hash file initially has 600 buckets in the primary area
and that records are inserted that create an overflow area of 600 buckets. If
we reorganize the hash file, we can assume that most of the overflow is elim-
inated. If the cost of reorganizing the file is the cost of the bucket transfers
(reading and writing all of the buckets) and the only periodic file operation
is the fetch operation, then how many times would we have to perform a
fetch (successfully) to make the reorganization cost effective? That is, the
reorganization cost and subsequent search cost are less than the search cost
before reorganization. Support your answer. Assume s = 16 ms, rd = 8.3 ms,
and btt = 1 ms.
45. Suppose we want to create a linear hash file with a file load factor of 0.7 and
a blocking factor of 20 records per bucket, which is to contain 112,000
records initially.
a. How many buckets should we allocate in the primary area?
b. What should be the number of bits used for bucket addresses?
Selected Bibliography
Wiederhold (1987) has a detailed discussion and analysis of secondary storage
devices and file organizations as a part of database design. Optical disks are
described in Berg and Roth (1989) and analyzed in Ford and Christodoulakis
(1991). Flash memory is discussed by Dipert and Levy (1993). Ruemmler and
Wilkes (1994) present a survey of the magnetic-disk technology. Most textbooks on
databases include discussions of the material presented here. Most data structures
textbooks, including Knuth (1998), discuss static hashing in more detail; Knuth has
a complete discussion of hash functions and collision resolution techniques, as well
as of their performance comparison. Knuth also offers a detailed discussion of tech-
niques for sorting external files. Textbooks on file structures include Claybrook
(1992), Smith and Barnes (1987), and Salzberg (1988); they discuss additional file
organizations including tree-structured files, and have detailed algorithms for oper-
ations on files. Salzberg et al. (1990) describe a distributed external sorting algo-
rithm. File organizations with a high degree of fault tolerance are described by
Bitton and Gray (1988) and by Gray et al. (1990). Disk striping was proposed in
Salem and Garcia Molina (1986). The first paper on redundant arrays of inexpen-
sive disks (RAID) is by Patterson et al. (1988). Chen and Patterson (1990) and the
excellent survey of RAID by Chen et al. (1994) are additional references.
Grochowski and Hoyt (1996) discuss future trends in disk drives. Various formulas
for the RAID architecture appear in Chen et al. (1994).
Morris (1968) is an early paper on hashing. Extendible hashing is described in Fagin
et al. (1979). Linear hashing is described by Litwin (1980). Algorithms for insertion
and deletion for linear hashing are discussed with illustrations in Salzberg (1988).
Dynamic hashing, which we briefly introduced, was proposed by Larson (1978).
There are many proposed variations for extendible and linear hashing; for exam-
ples, see Cesarini and Soda (1991), Du and Tong (1991), and Hachem and Berra
(1992).
Details of disk storage devices can be found at manufacturer sites (for example,
http://www.seagate.com, http://www.ibm.com, http://www.emc.com, http://www.hp.com,
and http://www.storagetek.com). IBM has a storage technology research center at
IBM Almaden (http://www.almaden.ibm.com/).
Indexing Structures for Files
In this chapter we assume that a file already exists with some primary organization such as the unordered,
ordered, or hashed organizations. We will describe additional auxiliary access
structures called indexes, which are used to speed up the retrieval of records in
response to certain search conditions. The index structures are additional files on
disk that provide secondary access paths, which provide alternative ways to access
the records without affecting the physical placement of records in the primary data
file on disk. They enable efficient access to records based on the indexing fields that
are used to construct the index. Basically, any field of the file can be used to create an
index, and multiple indexes on different fields—as well as indexes on multiple
fields—can be constructed on the same file. A variety of indexes are possible; each of
them uses a particular data structure to speed up the search. To find a record or
records in the data file based on a search condition on an indexing field, the index is
searched, which leads to pointers to one or more disk blocks in the data file where
the required records are located. The most prevalent types of indexes are based on
ordered files (single-level indexes) and tree data structures (multilevel indexes, B+-
trees). Indexes can also be constructed based on hashing or other search data struc-
tures. We also discuss indexes that are vectors of bits called bitmap indexes.
We describe different types of single-level ordered indexes—primary, secondary,
and clustering—in Section 1. By viewing a single-level index as an ordered file,
one can develop additional indexes for it, giving rise to the concept of multilevel
indexes. A popular indexing scheme called ISAM (Indexed Sequential Access
Method) is based on this idea. We discuss multilevel tree-structured indexes in
Section 2. In Section 3 we describe B-trees and B+-trees, which are data structures
that are commonly used in DBMSs to implement dynamically changing multi-
level indexes. B+-trees have become a commonly accepted default structure for
generating indexes on demand in most relational DBMSs. Section 4 is devoted to
alternative ways to access data based on a combination of multiple keys. In
Section 5 we discuss hash indexes and introduce the concept of logical indexes,
which give an additional level of indirection from physical indexes, allowing for
the physical index to be flexible and extensible in its organization. In Section 6 we
discuss multikey indexing and bitmap indexes used for searching on one or more
keys. Section 7 summarizes the chapter.
From Chapter 18 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
1 Types of Single-Level Ordered Indexes
The idea behind an ordered index is similar to that behind the index used in a text-
book, which lists important terms at the end of the book in alphabetical order along
with a list of page numbers where the term appears in the book. We can search the
book index for a certain term in the textbook to find a list of addresses—page num-
bers in this case—and use these addresses to locate the specified pages first and then
search for the term on each specified page. The alternative, if no other guidance is
given, would be to sift slowly through the whole textbook word by word to find the
term we are interested in; this corresponds to doing a linear search, which scans the
whole file. Of course, most books do have additional information, such as chapter
and section titles, which help us find a term without having to search through the
whole book. However, the index is the only exact indication of the pages where each
term occurs in the book.
For a file with a given record structure consisting of several fields (or attributes), an
index access structure is usually defined on a single field of a file, called an indexing
field (or indexing attribute).1 The index typically stores each value of the index
field along with a list of pointers to all disk blocks that contain records with that
field value. The values in the index are ordered so that we can do a binary search on
the index. If both the data file and the index file are ordered, and since the index file
is typically much smaller than the data file, searching the index using a binary
search is a better option. Tree-structured multilevel indexes (see Section 2) extend
the binary search idea, which halves the search space by two-way partitioning at
each search step, into a more efficient approach that divides the search space in the
file n ways at each stage.
There are several types of ordered indexes. A primary index is specified on the
ordering key field of an ordered file of records. Recall that an ordering key field is
used to physically order the file records on disk, and every record has a unique value
for that field. If the ordering field is not a key field—that is, if numerous records in
the file can have the same value for the ordering field—another type of index, called
a clustering index, can be used. The data file is called a clustered file in this latter
case. Notice that a file can have at most one physical ordering field, so it can have at
most one primary index or one clustering index, but not both. A third type of index,
called a secondary index, can be specified on any nonordering field of a file. A data
file can have several secondary indexes in addition to its primary access method. We
discuss these types of single-level indexes in the next three subsections.
1We use the terms field and attribute interchangeably in this chapter.
1.1 Primary Indexes
A primary index is an ordered file whose records are of fixed length with two fields,
and it acts like an access structure to efficiently search for and access the data
records in a data file. The first field is of the same data type as the ordering key
field—called the primary key—of the data file, and the second field is a pointer to a
disk block (a block address). There is one index entry (or index record) in the
index file for each block in the data file. Each index entry has the value of the pri-
mary key field for the first record in a block and a pointer to that block as its two
field values. We will refer to the two field values of index entry i as <K(i), P(i)>.
To create a primary index on the ordered file shown in Figure A.1 (at the end of this
chapter, in Appendix: Figures and Table), we use the Name field as primary key,
because that is the ordering key field of the file (assuming that each value of Name is
unique). Each entry in the index has a Name value and a pointer. The first three
index entries are as follows:
<K(1) = (Aaron, Ed), P(1) = address of block 1>
<K(2) = (Adams, John), P(2) = address of block 2>
<K(3) = (Alexander, Ed), P(3) = address of block 3>
Figure 1 illustrates this primary index. The total number of entries in the index is
the same as the number of disk blocks in the ordered data file. The first record in each
block of the data file is called the anchor record of the block, or simply the block
anchor.2
Indexes can also be characterized as dense or sparse. A dense index has an index
entry for every search key value (and hence every record) in the data file. A sparse
(or nondense) index, on the other hand, has index entries for only some of the
search values. A sparse index has fewer entries than the number of records in the
file. Thus, a primary index is a nondense (sparse) index, since it includes an entry
for each disk block of the data file and the keys of its anchor record rather than for
every search value (or every record).
The index file for a primary index occupies a much smaller space than does the data
file, for two reasons. First, there are fewer index entries than there are records in the
data file. Second, each index entry is typically smaller in size than a data record
because it has only two fields; consequently, more index entries than data records
can fit in one block. Therefore, a binary search on the index file requires fewer block
accesses than a binary search on the data file. Referring to Table A.1, note that the
binary search for an ordered data file required log2b block accesses. But if the pri-
mary index file contains only bi blocks, then to locate a record with a search key
value requires a binary search of that index and access to the block containing that
record: a total of log2bi + 1 accesses.
2We can use a scheme similar to the one described here, with the last record in each block (rather than
the first) as the block anchor. This slightly improves the efficiency of the search algorithm.

Figure 1
Primary index on the ordering key field of the file shown in Figure A.1: an index file of
<K(i), P(i)> entries, each holding a block anchor primary key value (Name) and a
pointer to the corresponding block of the data file.
A record whose primary key value is K lies in the block whose address is P(i), where
K(i) ≤ K < K(i + 1). The ith block in the data file contains all such records because
of the physical ordering of the file records on the primary key field. To retrieve a
record, given the value K of its primary key field, we do a binary search on the index
file to find the appropriate index entry i, and then retrieve the data file block whose
address is P(i).3 Example 1 illustrates the saving in block accesses that is attainable
when a primary index is used to search for a record.
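The lookup just described can be sketched as follows (an illustrative sketch only); the index entries are assumed to be in memory as a sorted list of (anchor key, block address) pairs, and fetch_block stands in for the single data-block read.

import bisect

def primary_index_lookup(index_entries, K, fetch_block):
    # index_entries: [(K(1), P(1)), (K(2), P(2)), ...] sorted by anchor key
    keys = [k for k, _ in index_entries]
    i = bisect.bisect_right(keys, K) - 1      # largest i with K(i) <= K
    if i < 0:
        return None                           # K precedes every block anchor
    block = fetch_block(index_entries[i][1])  # one access to the data file block P(i)
    return next((rec for rec in block if rec[0] == K), None)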
Example 1. Suppose that we have an ordered file with r = 30,000 records stored on
a disk with block size B = 1024 bytes. File records are of fixed size and are
unspanned, with record length R = 100 bytes. The blocking factor for the file would
be bfr = ⎣(B/R)⎦ = ⎣(1024/100)⎦ = 10 records per block. The number of blocks
needed for the file is b = ⎡(r/bfr)⎤ = ⎡(30000/10)⎤ = 3000 blocks. A binary search on
the data file would need approximately ⎡log2b⎤= ⎡(log23000)⎤ = 12 block accesses.
Now suppose that the ordering key field of the file is V = 9 bytes long, a block
pointer is P = 6 bytes long, and we have constructed a primary index for the file. The
size of each index entry is Ri = (9 + 6) = 15 bytes, so the blocking factor for the index
is bfri = ⎣(B/Ri)⎦ = ⎣(1024/15)⎦ = 68 entries per block. The total number of index
entries ri is equal to the number of blocks in the data file, which is 3000. The num-
ber of index blocks is hence bi = ⎡(ri/bfri)⎤ = ⎡(3000/68)⎤ = 45 blocks. To perform a
binary search on the index file would need ⎡(log2bi)⎤ = ⎡(log245)⎤ = 6 block
accesses. To search for a record using the index, we need one additional block access
to the data file for a total of 6 + 1 = 7 block accesses—an improvement over binary
search on the data file, which required 12 disk block accesses.
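The arithmetic of Example 1 can be reproduced directly (values exactly as given in the text):

from math import ceil, floor, log2

B, R, r = 1024, 100, 30_000
bfr = floor(B / R)                            # 10 records per block
b = ceil(r / bfr)                             # 3,000 data blocks
data_search = ceil(log2(b))                   # 12 block accesses (binary search on data file)

V, P = 9, 6
Ri = V + P                                    # 15 bytes per index entry
bfri = floor(B / Ri)                          # 68 entries per index block
ri = b                                        # one index entry per data block
bi = ceil(ri / bfri)                          # 45 index blocks
index_search = ceil(log2(bi)) + 1             # 6 + 1 = 7 block accesses using the index

print(bfr, b, data_search, bfri, bi, index_search)   # 10 3000 12 68 45 7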
A major problem with a primary index—as with any ordered file—is insertion and
deletion of records. With a primary index, the problem is compounded because if
we attempt to insert a record in its correct position in the data file, we must not only
move records to make space for the new record but also change some index entries,
since moving records will change the anchor records of some blocks. Using an
unordered overflow file can reduce this problem. Another possibility is to use a
linked list of overflow records for each block in the data file. This is similar to the
method of dealing with overflow records related to hashing. Records within each
block and its overflow linked list can be sorted to improve retrieval time. Record
deletion is handled using deletion markers.
1.2 Clustering Indexes
If file records are physically ordered on a nonkey field—which does not have a dis-
tinct value for each record—that field is called the clustering field and the data file
is called a clustered file. We can create a different type of index, called a clustering
index, to speed up retrieval of all the records that have the same value for the clus-
tering field. This differs from a primary index, which requires that the ordering field
of the data file have a distinct value for each record.
3Notice that the above formula would not be correct if the data file were ordered on a nonkey field; in
that case the same index value in the block anchor could be repeated in the last records of the previous
block.
A clustering index is also an ordered file with two fields; the first field is of the same
type as the clustering field of the data file, and the second field is a disk block
pointer. There is one entry in the clustering index for each distinct value of the clus-
tering field, and it contains the value and a pointer to the first block in the data file
that has a record with that value for its clustering field. Figure 2 shows an example.
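A retrieval through a clustering index can be sketched as below; the dictionary-shaped records, the Dept_number field name, and read_block are assumptions for illustration, and the loop relies on the data file being physically ordered on the clustering field.

def clustering_index_lookup(index_entries, value, read_block, num_blocks):
    # index_entries maps each distinct clustering-field value to its first block number
    if value not in index_entries:
        return []
    results, blk = [], index_entries[value]
    while blk < num_blocks:
        records = read_block(blk)
        results.extend(rec for rec in records if rec['Dept_number'] == value)
        if records and records[-1]['Dept_number'] > value:
            break                             # ordered file: past the value, so stop
        blk += 1
    return results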
Notice that record insertion and deletion still cause problems because the data
records are physically ordered. To alleviate the problem of insertion, it is common to
reserve a whole block (or a cluster of contiguous blocks) for each value of the clus-
tering field; all records with that value are placed in the block (or block cluster). This
makes insertion and deletion relatively straightforward. Figure 3 shows this scheme.
A clustering index is another example of a nondense index because it has an entry
for every distinct value of the indexing field, which is a nonkey by definition and
hence has duplicate values rather than a unique value for every record in the file.
There is some similarity between Figures 1, 2, and 3 and Figures A.2 and A.3. An
index is somewhat similar to dynamic hashing and to the directory structures used
for extendible hashing. Both are searched to find a pointer to the data block con-
taining the desired record. A main difference is that an index search uses the values
of the search field itself, whereas a hash directory search uses the binary hash value
that is calculated by applying the hash function to the search field.
1.3 Secondary Indexes
A secondary index provides a secondary means of accessing a data file for which
some primary access already exists. The data file records could be ordered,
unordered, or hashed. The secondary index may be created on a field that is a can-
didate key and has a unique value in every record, or on a nonkey field with dupli-
cate values. The index is again an ordered file with two fields. The first field is of the
same data type as some nonordering field of the data file that is an indexing field.
The second field is either a block pointer or a record pointer. Many secondary
indexes (and hence, indexing fields) can be created for the same file—each repre-
sents an additional means of accessing that file based on some specific field.
First we consider a secondary index access structure on a key (unique) field that has
a distinct value for every record. Such a field is sometimes called a secondary key; in
the relational model, this would correspond to any UNIQUE key attribute or to the
primary key attribute of a table. In this case there is one index entry for each record
in the data file, which contains the value of the field for the record and a pointer
either to the block in which the record is stored or to the record itself. Hence, such
an index is dense.
Figure 2: A clustering index on the Dept_number ordering nonkey field of an EMPLOYEE file.
Again we refer to the two field values of index entry i as <K(i), P(i)>. The entries are
ordered by value of K(i), so we can perform a binary search. Because the records of
the data file are not physically ordered by values of the secondary key field, we cannot
use block anchors. That is why an index entry is created for each record in the data
file, rather than for each block, as in the case of a primary index. Figure 4 illustrates a
secondary index in which the pointers P(i) in the index entries are block pointers, not
record pointers. Once the appropriate disk block is transferred to a main memory
buffer, a search for the desired record within the block can be carried out.

Figure 3: Clustering index with a separate block cluster for each group of records that share the same value for the clustering field.

Figure 4: A dense secondary index (with block pointers) on a nonordering key field of a file.
A secondary index usually needs more storage space and longer search time than
does a primary index, because of its larger number of entries. However, the
improvement in search time for an arbitrary record is much greater for a secondary
index than for a primary index, since we would have to do a linear search on the data
file if the secondary index did not exist. For a primary index, we could still use a
binary search on the main file, even if the index did not exist. Example 2 illustrates
the improvement in number of blocks accessed.
Example 2. Consider the file of Example 1 with r = 30,000 fixed-length records of
size R = 100 bytes stored on a disk with block size B = 1024 bytes. The file has b =
3000 blocks, as calculated in Example 1. Suppose we want to search for a record with
a specific value for the secondary key—a nonordering key field of the file that is V =
9 bytes long. Without the secondary index, to do a linear search on the file would
require b/2 = 3000/2 = 1500 block accesses on the average. Suppose that we con-
struct a secondary index on that nonordering key field of the file. As in Example 1, a
block pointer is P = 6 bytes long, so each index entry is Ri = (9 + 6) = 15 bytes, and
the blocking factor for the index is bfri = ⎣(B/Ri)⎦ = ⎣(1024/15)⎦ = 68 entries per
block. In a dense secondary index such as this, the total number of index entries ri is
equal to the number of records in the data file, which is 30,000. The number of blocks
needed for the index is hence bi = ⎡(ri /bfri)⎤ = ⎡(30,000/68)⎤ = 442 blocks.
A binary search on this secondary index needs ⎡(log2bi)⎤ = ⎡(log2442)⎤ = 9 block
accesses. To search for a record using the index, we need an additional block access
to the data file for a total of 9 + 1 = 10 block accesses—a vast improvement over the
1500 block accesses needed on the average for a linear search, but slightly worse than
the 7 block accesses required for the primary index. This difference arose because
the primary index was nondense and hence shorter, with only 45 blocks in length.
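The arithmetic in Example 2 can be checked with a few lines of Python (a sketch of ours; the variable names are not from the text):

import math

B, R, r = 1024, 100, 30000          # block size, record size, number of records
V, P = 9, 6                         # secondary key size and block pointer size
b = r // (B // R)                   # 3000 data blocks (bfr = 10 records per block)
Ri = V + P                          # 15 bytes per index entry
bfri = B // Ri                      # 68 index entries per block
bi = math.ceil(r / bfri)            # 442 index blocks (dense index: r entries)
linear = b // 2                     # 1500 block accesses on average
with_index = math.ceil(math.log2(bi)) + 1   # 9 + 1 = 10 block accesses
print(b, bfri, bi, linear, with_index)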
We can also create a secondary index on a nonkey, nonordering field of a file. In this
case, numerous records in the data file can have the same value for the indexing
field. There are several options for implementing such an index:
■ Option 1 is to include duplicate index entries with the same K(i) value—one
for each record. This would be a dense index.
■ Option 2 is to have variable-length records for the index entries, with a
repeating field for the pointer. We keep a list of pointers <P(i, 1), …, P(i, k)>
in the index entry for K(i)—one pointer to each block that contains a record
whose indexing field value equals K(i). In either option 1 or option 2, the
binary search algorithm on the index must be modified appropriately to
account for a variable number of index entries per index key value.
■ Option 3, which is more commonly used, is to keep the index entries them-
selves at a fixed length and have a single entry for each index field value, but
to create an extra level of indirection to handle the multiple pointers. In this
nondense scheme, the pointer P(i) in index entry <K(i), P(i)> points to a
disk block, which contains a set of record pointers; each record pointer in that
disk block points to one of the data file records with value K(i) for the index-
ing field. If some value K(i) occurs in too many records, so that their record
pointers cannot fit in a single disk block, a cluster or linked list of blocks is
used. This technique is illustrated in Figure 5. Retrieval via the index requires
one or more additional block accesses because of the extra level, but the
algorithms for searching the index and (more importantly) for inserting
new records in the data file are straightforward. In addition, retrievals on
complex selection conditions may be handled by referring to the record
pointers, without having to retrieve many unnecessary records from the data
file (see Exercise 23); a minimal sketch of this indirection appears below.

Figure 5: A secondary index (with record pointers) on a nonkey field implemented using one level of indirection so that index entries are of fixed length and have unique field values.
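The following Python sketch (ours, with assumed names) illustrates the option 3 indirection for a secondary index on a nonkey field: each fixed-length index entry holds one distinct value and a pointer to a block of record pointers.

from collections import defaultdict

# Minimal sketch of option 3: one fixed-length index entry per distinct value;
# the entry points to a block of record pointers rather than to the records.
def build_nonkey_secondary_index(records, field):
    pointer_blocks = defaultdict(list)        # value -> block of record pointers
    for rec_ptr, record in enumerate(records):
        pointer_blocks[record[field]].append(rec_ptr)
    # The index itself is ordered on the field value (one entry per value).
    return sorted((value, block) for value, block in pointer_blocks.items())

records = [{'Dept_number': 3}, {'Dept_number': 5}, {'Dept_number': 3},
           {'Dept_number': 1}, {'Dept_number': 5}]
for value, rec_ptrs in build_nonkey_secondary_index(records, 'Dept_number'):
    print(value, rec_ptrs)
# 1 [3]
# 3 [0, 2]
# 5 [1, 4]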
Notice that a secondary index provides a logical ordering on the records by the
indexing field. If we access the records in order of the entries in the secondary index,
we get them in order of the indexing field. The primary and clustering indexes
assume that the field used for physical ordering of records in the file is the same as
the indexing field.
1.4 Summary
To conclude this section, we summarize the discussion of index types in two tables.
Table 1 shows the index field characteristics of each type of ordered single-level
index discussed—primary, clustering, and secondary. Table 2 summarizes the prop-
erties of each type of index by comparing the number of index entries and specify-
ing which indexes are dense and which use block anchors of the data file.
Table 1   Types of Indexes Based on the Properties of the Indexing Field

                             Index Field Used for            Index Field Not Used for
                             Physical Ordering of the File   Physical Ordering of the File
Indexing field is key        Primary index                   Secondary index (Key)
Indexing field is nonkey     Clustering index                Secondary index (NonKey)
Table 2   Properties of Index Types

Type of Index        Number of (First-level) Index Entries                Dense or Nondense (Sparse)   Block Anchoring on the Data File
Primary              Number of blocks in data file                        Nondense                     Yes
Clustering           Number of distinct index field values                Nondense                     Yes/no (a)
Secondary (key)      Number of records in data file                       Dense                        No
Secondary (nonkey)   Number of records (b) or number of distinct
                     index field values (c)                               Dense or Nondense            No

(a) Yes if every distinct value of the ordering field starts a new block; no otherwise.
(b) For option 1.
(c) For options 2 and 3.
2 Multilevel Indexes
The indexing schemes we have described thus far involve an ordered index file. A
binary search is applied to the index to locate pointers to a disk block or to a record
(or records) in the file having a specific index field value. A binary search requires
approximately (log2bi) block accesses for an index with bi blocks because each step
of the algorithm reduces the part of the index file that we continue to search by a
factor of 2. This is why we take the log function to the base 2. The idea behind a
multilevel index is to reduce the part of the index that we continue to search by bfri,
the blocking factor for the index, which is larger than 2. Hence, the search space is
reduced much faster. The value bfri is called the fan-out of the multilevel index, and
we will refer to it by the symbol fo. Whereas we divide the record search space into
two halves at each step during a binary search, we divide it n-ways (where n = the
fan-out) at each search step using the multilevel index. Searching a multilevel index
requires approximately (logfobi) block accesses, which is a substantially smaller
number than for a binary search if the fan-out is larger than 2. In most cases, the
fan-out is much larger than 2.
A multilevel index considers the index file, which we will now refer to as the first (or
base) level of a multilevel index, as an ordered file with a distinct value for each K(i).
Therefore, by considering the first-level index file as a sorted data file, we can create
a primary index for the first level; this index to the first level is called the second
level of the multilevel index. Because the second level is a primary index, we can use
block anchors so that the second level has one entry for each block of the first level.
The blocking factor bfri for the second level—and for all subsequent levels—is the
same as that for the first-level index because all index entries are the same size; each
has one field value and one block address. If the first level has r1 entries, and the
blocking factor—which is also the fan-out—for the index is bfri = fo, then the first
level needs ⎡(r1/fo)⎤ blocks, which is therefore the number of entries r2 needed at the
second level of the index.
We can repeat this process for the second level. The third level, which is a primary
index for the second level, has an entry for each second-level block, so the number
of third-level entries is r3 = ⎡(r2/fo)⎤. Notice that we require a second level only if the
first level needs more than one block of disk storage, and, similarly, we require a
third level only if the second level needs more than one block. We can repeat the
preceding process until all the entries of some index level t fit in a single block. This
block at the tth level is called the top index level.4 Each level reduces the number of
entries at the previous level by a factor of fo—the index fan-out—so we can use the
formula 1 ≤ (r1/((fo)^t)) to calculate t. Hence, a multilevel index with r1 first-level
entries will have approximately t levels, where t = ⎡(logfo(r1))⎤. When searching the
index, a single disk block is retrieved at each level. Hence, t disk blocks are accessed
for an index search, where t is the number of index levels.

4The numbering scheme for index levels used here is the reverse of the way levels are commonly
defined for tree data structures. In tree data structures, t is referred to as level 0 (zero), t – 1 is level 1,
and so on.
The multilevel scheme described here can be used on any type of index—whether it
is primary, clustering, or secondary—as long as the first-level index has distinct val-
ues for K(i) and fixed-length entries. Figure 6 shows a multilevel index built over a
primary index. Example 3 illustrates the improvement in number of blocks accessed
when a multilevel index is used to search for a record.
Example 3. Suppose that the dense secondary index of Example 2 is converted into
a multilevel index. We calculated the index blocking factor bfri = 68 index entries
per block, which is also the fan-out fo for the multilevel index; the number of first-
level blocks b1 = 442 blocks was also calculated. The number of second-level blocks
will be b2 = ⎡(b1/fo)⎤ = ⎡(442/68)⎤ = 7 blocks, and the number of third-level blocks
will be b3 = ⎡(b2/fo)⎤ = ⎡(7/68)⎤ = 1 block. Hence, the third level is the top level of
the index, and t = 3. To access a record by searching the multilevel index, we must
access one block at each level plus one block from the data file, so we need t + 1 = 3
+ 1 = 4 block accesses. Compare this to Example 2, where 10 block accesses were
needed when a single-level index and binary search were used.
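The level-by-level calculation in Example 3 can be reproduced with a short sketch (ours):

import math

def multilevel_index_levels(first_level_blocks, fan_out):
    # Repeatedly build a higher level until one block remains (the top level).
    levels, blocks = 1, first_level_blocks
    while blocks > 1:
        blocks = math.ceil(blocks / fan_out)
        levels += 1
    return levels

fo, b1 = 68, 442
t = multilevel_index_levels(b1, fo)
print(t, t + 1)    # 3 index levels, 4 block accesses including the data block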
Notice that we could also have a multilevel primary index, which would be non-
dense. Exercise 18(c) illustrates this case, where we must access the data block from
the file before we can determine whether the record being searched for is in the file.
For a dense index, this can be determined by accessing the first index level (without
having to access a data block), since there is an index entry for every record in the
file.
A common file organization used in business data processing is an ordered file with
a multilevel primary index on its ordering key field. Such an organization is called
an indexed sequential file and was used in a large number of early IBM systems.
IBM’s ISAM organization incorporates a two-level index that is closely related to
the organization of the disk in terms of cylinders and tracks. The first level is a cylin-
der index, which has the key value of an anchor record for each cylinder of a disk
pack occupied by the file and a pointer to the track index for the cylinder. The track
index has the key value of an anchor record for each track in the cylinder and a
pointer to the track. The track can then be searched sequentially for the desired
record or block. Insertion is handled by some form of overflow file that is merged
periodically with the data file. The index is recreated during file reorganization.
Algorithm 1 outlines the search procedure for a record in a data file that uses a non-
dense multilevel primary index with t levels. We refer to entry i at level j of the index
as <Kj(i), Pj(i)>, and we search for a record whose primary key value is K. We
assume that any overflow records are ignored. If the record is in the file, there must
be some entry at level 1 with K1(i) ≤ K < K1(i + 1) and the record will be in the block
of the data file whose address is P1(i). Exercise 23 discusses modifying the search
algorithm for other types of indexes.
Figure 6: A two-level primary index resembling ISAM (Indexed Sequential Access Method) organization.
Algorithm 1. Searching a Nondense Multilevel Primary Index with t Levels
(* We assume the index entry to be a block anchor that is the first key per block. *)
p ← address of top-level block of index;
for j ← t step – 1 to 1 do
begin
read the index block (at jth index level) whose address is p;
search block p for entry i such that Kj(i) ≤ K < Kj(i + 1)
(* if Kj(i) is the last entry in the block, it is sufficient to satisfy Kj(i) ≤ K *);
p ← Pj(i ) (* picks appropriate pointer at jth index level *)
end;
read the data file block whose address is p;
search block p for record with key = K;
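A runnable Python rendering of Algorithm 1 follows. It is only a sketch under data-layout assumptions of our own: each index level is a list of blocks, each block is a sorted list of (anchor key, pointer) pairs, the levels are given from level t down to level 1, and level-1 pointers are data block numbers.

import bisect

def search_nondense_multilevel(index_levels, data_blocks, K):
    # index_levels[0] is the top level t; each block is a sorted list of
    # (anchor_key, pointer) pairs, where pointer is a block number in the
    # next-lower level (or in the data file for level 1).
    p = 0                                  # address of the single top-level block
    for level in index_levels:             # from level t down to level 1
        block = level[p]
        keys = [k for k, _ in block]
        i = bisect.bisect_right(keys, K) - 1   # entry i with Kj(i) <= K < Kj(i+1)
        if i < 0:
            return None                    # K smaller than every anchor: not in file
        p = block[i][1]
    for record in data_blocks[p]:          # finally search the data block itself
        if record[0] == K:
            return record
    return None

Because the index is nondense, the entry chosen at each level is simply the last one whose anchor key does not exceed K; whether the record exists is only known after the data block itself is searched.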
As we have seen, a multilevel index reduces the number of blocks accessed when
searching for a record, given its indexing field value. We are still faced with the prob-
lems of dealing with index insertions and deletions, because all index levels are
physically ordered files. To retain the benefits of using multilevel indexing while
reducing index insertion and deletion problems, designers adopted a multilevel
index called a dynamic multilevel index that leaves some space in each of its blocks
for inserting new entries and uses appropriate insertion/deletion algorithms for cre-
ating and deleting new index blocks when the data file grows and shrinks. It is often
implemented by using data structures called B-trees and B+-trees, which we
describe in the next section.
3 Dynamic Multilevel Indexes Using
B-Trees and B+-Trees
B-trees and B+-trees are special cases of the well-known search data structure
known as a tree. We briefly introduce the terminology used in discussing tree data
structures. A tree is formed of nodes. Each node in the tree, except for a special
node called the root, has one parent node and zero or more child nodes. The root
node has no parent. A node that does not have any child nodes is called a leaf node;
a nonleaf node is called an internal node. The level of a node is always one more
than the level of its parent, with the level of the root node being zero.5 A subtree of
a node consists of that node and all its descendant nodes—its child nodes, the child
nodes of its child nodes, and so on. A precise recursive definition of a subtree is that
it consists of a node n and the subtrees of all the child nodes of n. Figure 7 illustrates
a tree data structure. In this figure the root node is A, and its child nodes are B, C,
and D. Nodes E, J, C, G, H, and K are leaf nodes. Since the leaf nodes are at different
levels of the tree, this tree is called unbalanced.
5This standard definition of the level of a tree node, which we use throughout Section 3, is different from
the one we gave for multilevel indexes in Section 2.
Figure 7: A tree data structure that shows an unbalanced tree (nodes E, J, C, G, H, and K are leaf nodes of the tree).
In Section 3.1, we introduce search trees and then discuss B-trees, which can be used
as dynamic multilevel indexes to guide the search for records in a data file. B-tree
nodes are kept between 50 and 100 percent full, and pointers to the data blocks are
stored in both internal nodes and leaf nodes of the B-tree structure. In Section 3.2
we discuss B+-trees, a variation of B-trees in which pointers to the data blocks of a
file are stored only in leaf nodes, which can lead to fewer levels and higher-capacity
indexes. In the DBMSs prevalent in the market today, the common structure used
for indexing is B+-trees.
3.1 Search Trees and B-Trees
A search tree is a special type of tree that is used to guide the search for a record,
given the value of one of the record’s fields. The multilevel indexes discussed in
Section 2 can be thought of as a variation of a search tree; each node in the multi-
level index can have as many as fo pointers and fo key values, where fo is the index
fan-out. The index field values in each node guide us to the next node, until we
reach the data file block that contains the required records. By following a pointer,
we restrict our search at each level to a subtree of the search tree and ignore all
nodes not in this subtree.
Search Trees. A search tree is slightly different from a multilevel index. A search
tree of order p is a tree such that each node contains at most p − 1 search values and
p pointers in the order <P1, K1, P2, K2, …, Pq−1, Kq−1, Pq>, where q ≤ p. Each Pi is a
pointer to a child node (or a NULL pointer), and each Ki is a search value from some
Figure 8: A node in a search tree with pointers to subtrees below it.

Figure 9: A search tree of order p = 3.
ordered set of values. All search values are assumed to be unique.6 Figure 8 illus-
trates a node in a search tree. Two constraints must hold at all times on the search
tree:
1. Within each node, K1 < K2 < ... < Kq−1.
2. For all values X in the subtree pointed at by Pi, we have Ki−1 < X < Ki for 1 <
i < q; X < Ki for i = 1; and Ki−1 < X for i = q (see Figure 8).
Whenever we search for a value X, we follow the appropriate pointer Pi according to
the formulas in condition 2 above. Figure 9 illustrates a search tree of order p = 3
and integer search values. Notice that some of the pointers Pi in a node may be NULL
pointers.
We can use a search tree as a mechanism to search for records stored in a disk file.
The values in the tree can be the values of one of the fields of the file, called
the search field (which is the same as the index field if a multilevel index guides the
search). Each key value in the tree is associated with a pointer to the record in the
data file having that value. Alternatively, the pointer could be to the disk block con-
taining that record. The search tree itself can be stored on disk by assigning each tree
node to a disk block. When a new record is inserted in the file, we must update the
search tree by inserting an entry in the tree containing the search field value of the
new record and a pointer to the new record.
6This restriction can be relaxed. If the index is on a nonkey field, duplicate search values may exist and
the node structure and the navigation rules for the tree may be modified.
Algorithms are necessary for inserting and deleting search values into and from the
search tree while maintaining the preceding two constraints. In general, these algo-
rithms do not guarantee that a search tree is balanced, meaning that all of its leaf
nodes are at the same level.7 The tree in Figure 7 is not balanced because it has leaf
nodes at levels 1, 2, and 3. The goals for balancing a search tree are as follows:
■ To guarantee that nodes are evenly distributed, so that the depth of the tree
is minimized for the given set of keys and that the tree does not get skewed
with some nodes being at very deep levels
■ To make the search speed uniform, so that the average time to find any ran-
dom key is roughly the same
While minimizing the number of levels in the tree is one goal, another implicit goal
is to make sure that the index tree does not need too much restructuring as records
are inserted into and deleted from the main file. Thus we want the nodes to be as full
as possible and do not want any nodes to be empty if there are too many deletions.
Record deletion may leave some nodes in the tree nearly empty, thus wasting storage
space and increasing the number of levels. The B-tree addresses both of these prob-
lems by specifying additional constraints on the search tree.
B-Trees. The B-tree has additional constraints that ensure that the tree is always
balanced and that the space wasted by deletion, if any, never becomes excessive. The
algorithms for insertion and deletion, though, become more complex in order to
maintain these constraints. Nonetheless, most insertions and deletions are simple
processes; they become complicated only under special circumstances—namely,
whenever we attempt an insertion into a node that is already full or a deletion from
a node that makes it less than half full. More formally, a B-tree of order p, when
used as an access structure on a key field to search for records in a data file, can be
defined as follows:
1. Each internal node in the B-tree (Figure 10(a)) is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, …, <Kq−1, Prq−1>, Pq>
where q ≤ p. Each Pi is a tree pointer—a pointer to another node in the B-
tree. Each Pri is a data pointer8—a pointer to the record whose search key
field value is equal to Ki (or to the data file block containing that record).
2. Within each node, K1 < K2 < ... < Kq−1.
3. For all search key field values X in the subtree pointed at by Pi (the ith sub-
tree, see Figure 10(a)), we have:
Ki–1 < X < Ki for 1 < i < q; X < Ki for i = 1; and Ki–1 < X for i = q.
4. Each node has at most p tree pointers.
7The definition of balanced is different for binary trees. Balanced binary trees are known as AVL trees.
8A data pointer is either a block address or a record address; the latter is essentially a block address and
a record offset within the block.
Figure 10: B-tree structures. (a) A node in a B-tree with q – 1 search values. (b) A B-tree of order p = 3. The values were inserted in the order 8, 5, 1, 7, 3, 12, 9, 6.
5. Each node, except the root and leaf nodes, has at least ⎡(p/2)⎤ tree pointers.
The root node has at least two tree pointers unless it is the only node in the
tree.
6. A node with q tree pointers, q ≤ p, has q – 1 search key field values (and hence
has q – 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure as
internal nodes except that all of their tree pointers Pi are NULL.
Figure 10(b) illustrates a B-tree of order p = 3. Notice that all search values K in the
B-tree are unique because we assumed that the tree is used as an access structure on
a key field. If we use a B-tree on a nonkey field, we must change the definition of the
file pointers Pri to point to a block—or a cluster of blocks—that contain the point-
ers to the file records. This extra level of indirection is similar to option 3, discussed
in Section 1.3, for secondary indexes.
A B-tree starts with a single root node (which is also a leaf node) at level 0 (zero).
Once the root node is full with p – 1 search key values and we attempt to insert
another entry in the tree, the root node splits into two nodes at level 1. Only the
middle value is kept in the root node, and the rest of the values are split evenly
between the other two nodes. When a nonroot node is full and a new entry is
inserted into it, that node is split into two nodes at the same level, and the middle
entry is moved to the parent node along with two pointers to the new split nodes. If
the parent node is full, it is also split. Splitting can propagate all the way to the root
node, creating a new level if the root is split. We do not discuss algorithms for B-
trees in detail in this book,9 but we outline search and insertion procedures for
B+-trees in the next section.
If deletion of a value causes a node to be less than half full, it is combined with its
neighboring nodes, and this can also propagate all the way to the root. Hence, dele-
tion can reduce the number of tree levels. It has been shown by analysis and simula-
tion that, after numerous random insertions and deletions on a B-tree, the nodes
are approximately 69 percent full when the number of values in the tree stabilizes.
This is also true of B+-trees. If this happens, node splitting and combining will
occur only rarely, so insertion and deletion become quite efficient. If the number of
values grows, the tree will expand without a problem—although splitting of nodes
may occur, so some insertions will take more time. Each B-tree node can have at
most p tree pointers, p – 1 data pointers, and p – 1 search key field values (see Figure
10(a)).
In general, a B-tree node may contain additional information needed by the algo-
rithms that manipulate the tree, such as the number of entries q in the node and a
pointer to the parent node. Next, we illustrate how to calculate the number of blocks
and levels for a B-tree.
Example 4. Suppose that the search field is a nonordering key field, and we con-
struct a B-tree on this field with p = 23. Assume that each node of the B-tree is 69
percent full. Each node, on the average, will have p * 0.69 = 23 * 0.69 or approxi-
mately 16 pointers and, hence, 15 search key field values. The average fan-out fo =
16. We can start at the root and see how many values and pointers can exist, on the
average, at each subsequent level:
Root: 1 node 15 key entries 16 pointers
Level 1: 16 nodes 240 key entries 256 pointers
Level 2: 256 nodes 3840 key entries 4096 pointers
Level 3: 4096 nodes 61,440 key entries
At each level, we calculated the number of key entries by multiplying the total num-
ber of pointers at the previous level by 15, the average number of entries in each
node. Hence, for the given block size, pointer size, and search key field size, a two-
level B-tree holds 3840 + 240 + 15 = 4095 entries on the average; a three-level B-tree
holds 65,535 entries on the average.
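The capacity figures of Example 4 follow from a simple level-by-level calculation, sketched here in Python (our code, not part of the text):

def btree_capacity(p, levels_below_root, fill=0.69):
    pointers = round(p * fill)        # ~16 pointers per node for p = 23
    keys = pointers - 1               # ~15 key entries per node
    total, nodes = 0, 1
    for _ in range(levels_below_root + 1):   # root plus the given number of levels
        total += nodes * keys
        nodes *= pointers
    return total

print(btree_capacity(23, 2))   # 4095  (two-level B-tree in Example 4)
print(btree_capacity(23, 3))   # 65535 (three-level B-tree)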
B-trees are sometimes used as primary file organizations. In this case, whole records
are stored within the B-tree nodes rather than just the <search key, record pointer>
entries. This works well for files with a relatively small number of records and a small
9For details on insertion and deletion algorithms for B-trees, consult Ramakrishnan and Gehrke [2003].
record size. Otherwise, the fan-out and the number of levels become too great to
permit efficient access.
In summary, B-trees provide a multilevel access structure that is a balanced tree
structure in which each node is at least half full. Each node in a B-tree of order p can
have at most p − 1 search values.
3.2 B+-Trees
Most implementations of a dynamic multilevel index use a variation of the B-tree
data structure called a B+-tree. In a B-tree, every value of the search field appears
once at some level in the tree, along with a data pointer. In a B+-tree, data pointers
are stored only at the leaf nodes of the tree; hence, the structure of leaf nodes differs
from the structure of internal nodes. The leaf nodes have an entry for every value of
the search field, along with a data pointer to the record (or to the block that contains
this record) if the search field is a key field. For a nonkey search field, the pointer
points to a block containing pointers to the data file records, creating an extra level
of indirection.
The leaf nodes of the B+-tree are usually linked to provide ordered access on the
search field to the records. These leaf nodes are similar to the first (base) level of an
index. Internal nodes of the B+-tree correspond to the other levels of a multilevel
index. Some search field values from the leaf nodes are repeated in the internal
nodes of the B+-tree to guide the search. The structure of the internal nodes of a B+-
tree of order p (Figure 11(a)) is as follows:
1. Each internal node is of the form <P1, K1, P2, K2, …, Kq−1, Pq>,
where q ≤ p and each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < ... < Kq−1.
3. For all search field values X in the subtree pointed at by Pi, we have Ki−1 < X
≤ Ki for 1 < i < q; X ≤ Ki for i = 1; and Ki−1 < X for i = q (see Figure 11(a)).10
4. Each internal node has at most p tree pointers.
5. Each internal node, except the root, has at least ⎡(p/2)⎤ tree pointers. The
root node has at least two tree pointers if it is an internal node.
6. An internal node with q pointers, q ≤ p, has q − 1 search field values.
The structure of the leaf nodes of a B+-tree of order p (Figure 11(b)) is as follows:
1. Each leaf node is of the form
<<K1, Pr1>, <K2, Pr2>, …, <Kq−1, Prq−1>, Pnext>
where q ≤ p, each Pri is a data pointer, and Pnext points to the next leaf node of
the B+-tree.
10Our definition follows Knuth (1998). One can define a B+-tree differently by exchanging the < and ≤
symbols (Ki−1 ≤ X < Ki; Kq−1 ≤ X), but the principles remain the same.
Figure 11: The nodes of a B+-tree. (a) Internal node of a B+-tree with q – 1 search values. (b) Leaf node of a B+-tree with q – 1 search values and q – 1 data pointers.
2. Within each leaf node, K1 ≤ K2 ≤ … ≤ Kq−1, q ≤ p.
3. Each Pri is a data pointer that points to the record whose search field value is
Ki or to a file block containing the record (or to a block of record pointers
that point to records whose search field value is Ki if the search field is not a
key).
4. Each leaf node has at least ⎡(p/2)⎤ values.
5. All leaf nodes are at the same level.
The pointers in internal nodes are tree pointers to blocks that are tree nodes, whereas
the pointers in leaf nodes are data pointers to the data file records or blocks—except
for the Pnext pointer, which is a tree pointer to the next leaf node. By starting at the
leftmost leaf node, it is possible to traverse leaf nodes as a linked list, using the Pnext
pointers. This provides ordered access to the data records on the indexing field. A
Pprevious pointer can also be included. For a B+-tree on a nonkey field, an extra level
of indirection is needed similar to the one shown in Figure 5, so the Pr pointers are
block pointers to blocks that contain a set of record pointers to the actual records in
the data file, as discussed in option 3 of Section 1.3.
Because entries in the internal nodes of a B+-tree include search values and tree
pointers without any data pointers, more entries can be packed into an internal node
of a B+-tree than for a similar B-tree. Thus, for the same block (node) size, the order
p will be larger for the B+-tree than for the B-tree, as we illustrate in Example 5. This
can lead to fewer B+-tree levels, improving search time. Because the structures for
internal and for leaf nodes of a B+-tree are different, the order p can be different. We
will use p to denote the order for internal nodes and pleaf to denote the order for leaf
nodes, which we define as being the maximum number of data pointers in a leaf
node.
Example 5. To calculate the order p of a B+-tree, suppose that the search key field
is V = 9 bytes long, the block size is B = 512 bytes, a record pointer is Pr = 7 bytes,
and a block pointer is P = 6 bytes. An internal node of the B+-tree can have up to p
tree pointers and p – 1 search field values; these must fit into a single block. Hence,
we have:
(p * P) + ((p – 1) * V) ≤ B
(p * 6) + ((p − 1) * 9) ≤ 512
(15 * p) ≤ 521
We can choose p to be the largest value satisfying the above inequality, which gives
p = 34. This is larger than the value of 23 for the B-tree (it is left to the reader to
compute the order of the B-tree assuming same size pointers), resulting in a larger
fan-out and more entries in each internal node of a B+-tree than in the correspon-
ding B-tree. The leaf nodes of the B+-tree will have the same number of values and
pointers, except that the pointers are data pointers and a next pointer. Hence, the
order pleaf for the leaf nodes can be calculated as follows:
(pleaf * (Pr + V)) + P ≤ B
(pleaf * (7 + 9)) + 6 ≤ 512
(16 * pleaf) ≤ 506
It follows that each leaf node can hold up to pleaf = 31 key value/data pointer combi-
nations, assuming that the data pointers are record pointers.
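The orders computed in Example 5 can be obtained directly with a short calculation (a sketch of ours; names are not from the text):

B, V, Pr, P = 512, 9, 7, 6     # block, key, record pointer, block pointer sizes

# Internal node: p tree pointers and p - 1 keys must fit in one block.
p = (B + V) // (P + V)         # largest p with p*P + (p - 1)*V <= B  -> 34
# Leaf node: pleaf (key, record pointer) pairs plus one next pointer.
pleaf = (B - P) // (Pr + V)    # largest pleaf with pleaf*(Pr + V) + P <= B -> 31
print(p, pleaf)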
As with the B-tree, we may need additional information—to implement the inser-
tion and deletion algorithms—in each node. This information can include the type
of node (internal or leaf), the number of current entries q in the node, and pointers
to the parent and sibling nodes. Hence, before we do the above calculations for p
and pleaf, we should reduce the block size by the amount of space needed for all such
information. The next example illustrates how we can calculate the number of
entries in a B+-tree.
Example 6. Suppose that we construct a B+-tree on the field in Example 5. To cal-
culate the approximate number of entries in the B+-tree, we assume that each node
is 69 percent full. On the average, each internal node will have 34 * 0.69 or approxi-
mately 23 pointers, and hence 22 values. Each leaf node, on the average, will hold
0.69 * pleaf = 0.69 * 31 or approximately 21 data record pointers. A B+-tree will have
the following average number of entries at each level:
Root: 1 node 22 key entries 23 pointers
Level 1: 23 nodes 506 key entries 529 pointers
Level 2: 529 nodes 11,638 key entries 12,167 pointers
Leaf level: 12,167 nodes 255,507 data record pointers
For the block size, pointer size, and search field size given above, a three-level B+-
tree holds up to 255,507 record pointers, with the average 69 percent occupancy of
nodes. Compare this to the 65,535 entries for the corresponding B-tree in Example
4. This is the main reason that B+-trees are preferred to B-trees as indexes to data-
base files.
Search, Insertion, and Deletion with B+-Trees. Algorithm 2 outlines the pro-
cedure using the B+-tree as the access structure to search for a record. Algorithm 3
illustrates the procedure for inserting a record in a file with a B+-tree access struc-
ture. These algorithms assume the existence of a key search field, and they must be
modified appropriately for the case of a B+-tree on a nonkey field. We illustrate
insertion and deletion with an example.
Algorithm 2. Searching for a Record with Search Key Field Value K, Using a
B+-tree
n ← block containing root node of B+-tree;
read block n;
while (n is not a leaf node of the B+-tree) do
begin
q ← number of tree pointers in node n;
if K ≤ n.K1 (*n.Ki refers to the ith search field value in node n*)
then n ← n.P1 (*n.Pi refers to the ith tree pointer in node n*)
else if K > n.Kq−1
then n ← n.Pq
else begin
search node n for an entry i such that n.Ki−1 < K ≤ n.Ki;
n ← n.Pi
end;
read block n
end;
search block n for entry (Ki, Pri) with K = Ki; (* search leaf node *)
if found
then read data file block with address Pri and retrieve record
else the record with search field value K is not in the data file;
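A compact Python rendering of Algorithm 2 follows (ours; the node representation is an assumption made for the sketch, not the text's):

# Assumed node layout: internal nodes are ('internal', [K1..Kq-1], [P1..Pq]),
# leaf nodes are ('leaf', [(K1, Pr1), ...], next_leaf).
def bplus_search(node, K):
    while node[0] == 'internal':
        keys, pointers = node[1], node[2]
        i = 0
        while i < len(keys) and K > keys[i]:   # find first key with K <= Ki
            i += 1
        node = pointers[i]                     # rightmost pointer if K > all keys
    for key, rec_ptr in node[1]:               # search the leaf node
        if key == K:
            return rec_ptr
    return None                                # K is not in the data file

leaf2 = ('leaf', [(7, 'r7'), (8, 'r8')], None)
leaf1 = ('leaf', [(1, 'r1'), (5, 'r5')], leaf2)   # Pnext links the leaves
root = ('internal', [5], [leaf1, leaf2])
print(bplus_search(root, 7))   # 'r7'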
Algorithm 3. Inserting a Record with Search Key Field Value K in a B+-tree of
Order p
n ← block containing root node of B+-tree;
read block n; set stack S to empty;
while (n is not a leaf node of the B+-tree) do
begin
push address of n on stack S;
(*stack S holds parent nodes that are needed in case of split*)
q ← number of tree pointers in node n;
if K ≤ n.K1 (*n.Ki refers to the ith search field value in node n*)
then n ← n.P1 (*n.Pi refers to the ith tree pointer in node n*)
else if K > n.Kq−1
then n ← n.Pq
else begin
search node n for an entry i such that n.Ki−1 < K ≤ n.Ki;
n ← n.Pi
end;
read block n
end;
search block n for entry (Ki,Pri) with K = Ki; (*search leaf node n*)
if found
then record already in file; cannot insert
else (*insert entry in B+-tree to point to record*)
begin
create entry (K, Pr) where Pr points to the new record;
if leaf node n is not full
then insert entry (K, Pr) in correct position in leaf node n
else begin (*leaf node n is full with pleaf record pointers; is split*)
copy n to temp (*temp is an oversize leaf node to hold extra
entries*);
insert entry (K, Pr) in temp in correct position;
(*temp now holds pleaf + 1 entries of the form (Ki, Pri)*)
new ← a new empty leaf node for the tree; new.Pnext ← n.Pnext ;
j ← ⎡(pleaf + 1)/2 ⎤ ;
n ← first j entries in temp (up to entry (Kj, Prj)); n.Pnext ← new;
new ← remaining entries in temp; K ← Kj ;
(*now we must move (K, new) and insert in parent internal node;
however, if parent is full, split may propagate*)
finished ← false;
repeat
if stack S is empty
then (*no parent node; new root node is created for the tree*)
begin
root ← a new empty internal node for the tree;
root ← <n, K, new>; finished ← true;
end
else begin
n ← pop stack S;
if internal node n is not full
then
begin (*parent node not full; no split*)
insert (K, new) in correct position in internal node n;
finished ← true
end
else begin (*internal node n is full with p tree pointers;
overflow condition; node is split*)
copy n to temp (*temp is an oversize internal node*);
insert (K, new) in temp in correct position;
(*temp now has p + 1 tree pointers*)
new ← a new empty internal node for the tree;
j ← ⎣((p + 1)/2)⎦ ;
n ← entries up to tree pointer Pj in temp;
(*n contains <P1, K1, P2, K2, …, Pj−1, Kj−1, Pj>*)
new ← entries from tree pointer Pj+1 in temp;
(*new contains < Pj+1, Kj+1, ..., Kp−1, Pp, Kp, Pp+1 >*)
K ← Kj
(*now we must move (K, new) and insert in parent
internal node*)
end
end
until finished
end;
end;
Figure 12 illustrates insertion of records in a B+-tree of order p = 3 and pleaf = 2.
First, we observe that the root is the only node in the tree, so it is also a leaf node. As
soon as more than one level is created, the tree is divided into internal nodes and
leaf nodes. Notice that every key value must exist at the leaf level, because all data
pointers are at the leaf level. However, only some values exist in internal nodes to
guide the search. Notice also that every value appearing in an internal node also
appears as the rightmost value in the leaf level of the subtree pointed at by the tree
pointer to the left of the value.
When a leaf node is full and a new entry is inserted there, the node overflows and
must be split. The first j = ⎡((pleaf + 1)/2)⎤ entries in the original node are kept
there, and the remaining entries are moved to a new leaf node. The jth search value
is replicated in the parent internal node, and an extra pointer to the new node is cre-
ated in the parent. These must be inserted in the parent node in their correct
sequence. If the parent internal node is full, the new value will cause it to overflow
also, so it must be split. The entries in the internal node up to Pj—the jth tree
pointer after inserting the new value and pointer, where j = ⎣((p + 1)/2)⎦—are kept,
while the jth search value is moved to the parent, not replicated. A new internal
node will hold the entries from Pj+1 to the end of the entries in the node (see
Algorithm 3). This splitting can propagate all the way up to create a new root node
and hence a new level for the B+-tree.
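The split rules just described can be made concrete with a small sketch (ours; entries are modeled as sorted Python lists):

import math

def split_leaf(entries, p_leaf):
    # entries: sorted list of (key, record_pointer) pairs, length p_leaf + 1
    j = math.ceil((p_leaf + 1) / 2)
    left, right = entries[:j], entries[j:]
    return left, right, left[-1][0]     # Kj is *replicated* in the parent

def split_internal(keys, p):
    # keys: sorted list of the p search values in an overflowing internal node
    j = math.floor((p + 1) / 2)
    return keys[:j - 1], keys[j:], keys[j - 1]   # Kj is *moved up*, not replicated

print(split_leaf([(1, 'r1'), (3, 'r3'), (5, 'r5')], 2))
# ([(1, 'r1'), (3, 'r3')], [(5, 'r5')], 3)
print(split_internal([3, 5, 8], 3))
# ([3], [8], 5)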
Figure 13 illustrates deletion from a B+-tree. When an entry is deleted, it is always
removed from the leaf level. If it happens to occur in an internal node, it must also
be removed from there. In the latter case, the value to its left in the leaf node must
replace it in the internal node because that value is now the rightmost entry in the
subtree. Deletion may cause underflow by reducing the number of entries in the
leaf node to below the minimum required. In this case, we try to find a sibling leaf
node—a leaf node directly to the left or to the right of the node with underflow—
and redistribute the entries among the node and its sibling so that both are at least
half full; otherwise, the node is merged with its siblings and the number of leaf
nodes is reduced. A common method is to try to redistribute entries with the left
sibling; if this is not possible, an attempt to redistribute with the right sibling is
made. If this is also not possible, the three nodes are merged into two leaf nodes. In
such a case, underflow may propagate to internal nodes because one fewer tree
pointer and search value are needed. This can propagate and reduce the tree levels.

Figure 12: An example of insertion in a B+-tree with p = 3 and pleaf = 2. Insertion sequence: 8, 5, 1, 7, 3, 12, 9, 6. (Inserting 1 causes an overflow that creates a new level; inserting 3 causes a split; inserting 6 causes a split that propagates; inserting 12 causes a split that propagates and creates a new level.)

Figure 13: An example of deletion from a B+-tree. Deletion sequence: 5, 12, 9. (Deleting 12 causes an underflow handled by redistribution; deleting 9 causes an underflow handled by merging with the left sibling and redistributing.)
Notice that implementing the insertion and deletion algorithms may require parent
and sibling pointers for each node, or the use of a stack as in Algorithm 3. Each node
should also include the number of entries in it and its type (leaf or internal).
Another alternative is to implement insertion and deletion as recursive
procedures.11
Variations of B-Trees and B+-Trees. To conclude this section, we briefly men-
tion some variations of B-trees and B+-trees. In some cases, constraint 5 on the B-
tree (or for the internal nodes of the B+–tree, except the root node), which requires
each node to be at least half full, can be changed to require each node to be at least
two-thirds full. In this case the B-tree has been called a B*-tree. In general, some
systems allow the user to choose a fill factor between 0.5 and 1.0, where the latter
means that the B-tree (index) nodes are to be completely full. It is also possible to
specify two fill factors for a B+-tree: one for the leaf level and one for the internal
nodes of the tree. When the index is first constructed, each node is filled up to
approximately the fill factors specified. Some investigators have suggested relaxing
the requirement that a node be half full, and instead allow a node to become com-
pletely empty before merging, to simplify the deletion algorithm. Simulation studies
show that this does not waste too much additional space under randomly distrib-
uted insertions and deletions.
4 Indexes on Multiple Keys
In our discussion so far, we have assumed that the primary or secondary keys on
which files were accessed were single attributes (fields). In many retrieval and
update requests, multiple attributes are involved. If a certain combination of attrib-
utes is used frequently, it is advantageous to set up an access structure to provide
efficient access by a key value that is a combination of those attributes.
For example, consider an EMPLOYEE file containing attributes Dno (department
number), Age, Street, City, Zip_code, Salary and Skill_code, with the key of Ssn (Social
Security number). Consider the query: List the employees in department number 4
whose age is 59. Note that both Dno and Age are nonkey attributes, which means that
a search value for either of these will point to multiple records. The following alter-
native search strategies may be considered:
1. Assuming Dno has an index, but Age does not, access the records having
Dno = 4 using the index, and then select from among them those records that
satisfy Age = 59.
2. Alternately, if Age is indexed but Dno is not, access the records having Age =
59 using the index, and then select from among them those records that sat-
isfy Dno = 4.
3. If indexes have been created on both Dno and Age, both indexes may be used;
each gives a set of records or a set of pointers (to blocks or records). An inter-
section of these sets of records or pointers yields those records or pointers
that satisfy both conditions.

11For more details on insertion and deletion algorithms for B+-trees, consult Ramakrishnan and Gehrke
[2003].
All of these alternatives eventually give the correct result. However, if the set of
records that meet each condition (Dno = 4 or Age = 59) individually are large, yet
only a few records satisfy the combined condition, then none of the above is an effi-
cient technique for the given search request. A number of possibilities exist that
would treat the combination < Dno, Age> or < Age, Dno> as a search key made up of
multiple attributes. We briefly outline these techniques in the following sections. We
will refer to keys containing multiple attributes as composite keys.
4.1 Ordered Index on Multiple Attributes
All the discussion in this chapter so far still applies if we create an index on a search
key field that is a combination of <Dno, Age>. The search key is a pair of values <4,
59> in the above example. In general, if an index is created on attributes <A1, A2, …, An>, the search key values are tuples with n values: <v1, v2, …, vn>.
A lexicographic ordering of these tuple values establishes an order on this compos-
ite search key. For our example, all of the department keys for department number
3 precede those for department number 4. Thus <3, n> precedes <4, m> for any val-
ues of m and n. The ascending key order for keys with Dno = 4 would be <4, 18>, <4,
19>, <4, 20>, and so on. Lexicographic ordering works similarly to ordering of
character strings. An index on a composite key of n attributes works similarly to any
index discussed in this chapter so far.
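Python tuples already compare lexicographically, so the ordering described here can be demonstrated directly (a small illustration of ours):

# Composite search keys <Dno, Age> ordered lexicographically.
keys = [(4, 59), (3, 61), (4, 18), (3, 20), (4, 20)]
print(sorted(keys))
# [(3, 20), (3, 61), (4, 18), (4, 20), (4, 59)]  -- all <3, n> precede all <4, m>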
4.2 Partitioned Hashing
Partitioned hashing is an extension of static external hashing that allows access on
multiple keys. It is suitable only for equality comparisons; range queries are not sup-
ported. In partitioned hashing, for a key consisting of n components, the hash func-
tion is designed to produce a result with n separate hash addresses. The bucket
address is a concatenation of these n addresses. It is then possible to search for the
required composite search key by looking up the appropriate buckets that match the
parts of the address in which we are interested.
For example, consider the composite search key . If Dno and Age are
hashed into a 3-bit and 5-bit address respectively, we get an 8-bit bucket address.
Suppose that Dno = 4 has a hash address ‘100’ and Age = 59 has hash address ‘10101’.
Then to search for the combined search value, Dno = 4 and Age = 59, one goes to
bucket address 100 10101; just to search for all employees with Age = 59, all buckets
(eight of them) will be searched whose addresses are ‘000 10101’, ‘001 10101’, … and
so on. An advantage of partitioned hashing is that it can be easily extended to any
number of attributes. The bucket addresses can be designed so that high-order bits
in the addresses correspond to more frequently accessed attributes. Additionally, no
separate access structure needs to be maintained for the individual attributes. The
main drawback of partitioned hashing is that it cannot handle range queries on any
of the component attributes.

Figure 14: Example of a grid array on Dno and Age attributes. (Linear scale for Dno: 0 → {1, 2}, 1 → {3, 4}, 2 → {5}, 3 → {6, 7}, 4 → {8}, 5 → {9, 10}. Linear scale for Age: 0 → under 20, 1 → 21–25, 2 → 26–30, 3 → 31–40, 4 → 41–50, 5 → over 50.)
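Returning to partitioned hashing, the following Python sketch (ours; the 3-bit and 5-bit component hash functions are arbitrary stand-ins, so the bit patterns differ from the ones quoted above) shows how a bucket address is formed by concatenation and how a search on Age alone must enumerate all buckets that share the Age part of the address:

def hash_dno(dno):              # assumed 3-bit component hash
    return format(dno % 8, '03b')

def hash_age(age):              # assumed 5-bit component hash
    return format(age % 32, '05b')

def bucket_address(dno, age):
    return hash_dno(dno) + hash_age(age)    # 8-bit concatenated bucket address

# Exact match on both attributes: a single bucket.
print(bucket_address(4, 59))                # '10011011' under these stand-in hashes

# Search on Age alone: enumerate all 2**3 buckets that share the Age part.
age_part = hash_age(59)
candidate_buckets = [format(i, '03b') + age_part for i in range(8)]
print(candidate_buckets)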
4.3 Grid Files
Another alternative is to organize the EMPLOYEE file as a grid file. If we want to
access a file on two keys, say Dno and Age as in our example, we can construct a grid
array with one linear scale (or dimension) for each of the search attributes. Figure
14 shows a grid array for the EMPLOYEE file with one linear scale for Dno and
another for the Age attribute. The scales are made in a way as to achieve a uniform
distribution of that attribute. Thus, in our example, we show that the linear scale for
Dno has Dno = 1, 2 combined as one value 0 on the scale, while Dno = 5 corresponds
to the value 2 on that scale. Similarly, Age is divided into its scale of 0 to 5 by group-
ing ages so as to distribute the employees uniformly by age. The grid array shown
for this file has a total of 36 cells. Each cell points to some bucket address where the
records corresponding to that cell are stored. Figure 14 also shows the assignment of
cells to buckets (only partially).
Thus our request for Dno = 4 and Age = 59 maps into the cell (1, 5) corresponding
to the grid array. The records for this combination will be found in the correspond-
ing bucket. This method is particularly useful for range queries that would map into
a set of cells corresponding to a group of values along the linear scales. If a range
query corresponds to a match on some of the grid cells, it can be processed by
accessing exactly the buckets for those grid cells. For example, a query for Dno ≤ 5
and Age > 40 refers to the data in the top bucket shown in Figure 14. The grid file
concept can be applied to any number of search keys. For example, for n search keys,
the grid array would have n dimensions. The grid array thus allows a partitioning of
the file along the dimensions of the search key attributes and provides an access by
combinations of values along those dimensions. Grid files perform well in terms of
reduction in time for multiple key access. However, they represent a space overhead
in terms of the grid array structure. Moreover, with dynamic files, a frequent reor-
ganization of the file adds to the maintenance cost.12
5 Other Types of Indexes
5.1 Hash Indexes
It is also possible to create access structures similar to indexes that are based on
hashing. The hash index is a secondary structure to access the file by using hashing
on a search key other than the one used for the primary data file organization. The
index entries are of the type or , where Pr is a pointer to the record
containing the key, or P is a pointer to the block containing the record for that key.
The index file with these index entries can be organized as a dynamically expand-
able hash file; searching for an entry uses the hash search algorithm on K. Once an
entry is found, the pointer Pr (or P) is used to locate the corresponding record in the
data file. Figure 15 illustrates a hash index on the Emp_id field for a file that has been
stored as a sequential file ordered by Name. The Emp_id is hashed to a bucket num-
ber by using a hashing function: the sum of the digits of Emp_id modulo 10. For
example, to find Emp_id 51024, the hash function results in bucket number 2; that
bucket is accessed first. It contains the index entry < 51024, Pr >; the pointer Pr
leads us to the actual record in the file. In a practical application, there may be thou-
sands of buckets; the bucket number, which may be several bits long, would be sub-
jected to the directory schemes related to dynamic hashing. Other search structures
can also be used as indexes.
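A minimal sketch of the hash index of Figure 15 follows (the code and the record-pointer labels such as 'Pr7' are ours; the digit-sum hash matches the one described above):

def emp_id_bucket(emp_id, num_buckets=10):
    # Sum of the digits of Emp_id modulo the number of buckets.
    return sum(int(d) for d in str(emp_id)) % num_buckets

def build_hash_index(records):
    # records: list of (Emp_id, record_pointer) pairs for a file ordered by Name.
    buckets = {}
    for emp_id, rec_ptr in records:
        buckets.setdefault(emp_id_bucket(emp_id), []).append((emp_id, rec_ptr))
    return buckets

index = build_hash_index([(51024, 'Pr7'), (12676, 'Pr1'), (13646, 'Pr2')])
# Look up Emp_id 51024: hash to bucket 2, then scan that bucket's entries.
bucket = index[emp_id_bucket(51024)]
print(next(ptr for k, ptr in bucket if k == 51024))   # 'Pr7'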
5.2 Bitmap Indexes
The bitmap index is another popular data structure that facilitates querying on
multiple keys. Bitmap indexing is used for relations that contain a large number of
rows. It creates an index for one or more columns, and each value or value range in
those columns is indexed. Typically, a bitmap index is created for those columns
that contain a fairly small number of unique values. To build a bitmap index on a set
of records in a relation, the records must be numbered from 0 to n with an id (a
record id or a row id) that can be mapped to a physical address made of a block
number and a record offset within the block.
12Insertion/deletion algorithms for grid files may be found in Nievergelt et al. (1984).
Figure 15: Hash-based indexing.
A bitmap index is built on one particular value of a particular field (the column in a
relation) and is just an array of bits. Consider a bitmap index for the column C and
a value V for that column. For a relation with n rows, it contains n bits. The ith bit is
set to 1 if the row i has the value V for column C; otherwise it is set to a 0. If C con-
tains the valueset <v1, v2, …, vm> with m distinct values, then m bitmap indexes
would be created for that column. Figure 16 shows the relation EMPLOYEE with
columns Emp_id, Lname, Sex, Zipcode, and Salary_grade (with just 8 rows for illustra-
tion) and a bitmap index for the Sex and Zipcode columns. As an example, in the
bitmap for Sex = F, the bits for Row_ids 1, 3, 4, and 7 are set to 1, and the rest of the
bits are set to 0. The bitmap indexes could have the following query applications:
■ For the query C1 = V1 , the corresponding bitmap for value V1 returns the
Row_ids containing the rows that qualify.
EMPLOYEE
Row_id Emp_id Lname Sex Zipcode Salary_grade
0 51024 Bass M 94040 ..
1 23402 Clarke F 30022 ..
2 62104 England M 19046 ..
3 34723 Ferragamo F 30022 ..
4 81165 Gucci F 19046 ..
5 13646 Hanson M 19046 ..
6 12676 Marcus M 30022 ..
7 41301 Zara F 94040 ..
Bitmap index for Sex
M F
10100110 01011001
Bitmap index for Zipcode
Zipcode 19046 Zipcode 30022 Zipcode 94040
00101100 01010010 10000001
Figure 16: Bitmap indexes for Sex and Zipcode.
■ For the query C1= V1 and C2 = V2 (a multikey search request), the two cor-
responding bitmaps are retrieved and intersected (logically AND-ed) to
yield the set of Row_ids that qualify. In general, k bitvectors can be intersected
to deal with k equality conditions. Complex AND-OR conditions can also be
supported using bitmap indexing.
■ To retrieve a count of rows that qualify for the condition C1 = V1, the “1”
entries in the corresponding bitvector are counted.
■ Queries with negation, such as C1 ≠ V1, can be handled by applying the
Boolean complement operation on the corresponding bitmap.
Consider the example in Figure 16. To find employees with Sex = F and
Zipcode = 30022, we intersect the bitmaps “01011001” and “01010010” yielding
Row_ids 1 and 3. Employees who do not live in Zipcode = 94040 are obtained by
complementing the bitvector “10000001”, which yields Row_ids 1 through 6. In general,
if we assume a uniform distribution of values for a given column, and if one column
has 5 distinct values and another has 10 distinct values, a combined (ANDed) condition
on these two columns can be considered to have a selectivity of 1/50 (= 1/5 * 1/10). Hence, only
about 2 percent of the records would actually have to be retrieved. If a column has
only a few values, like the Sex column in Figure 16, retrieval of the Sex = M condi-
tion on average would retrieve 50 percent of the rows; in such cases, it is better to do
a complete scan rather than use bitmap indexing.
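The bitmap operations described above can be sketched in a few lines of Python; the following example (purely illustrative, not tied to any DBMS) represents each bitmap of Figure 16 as an arbitrary-precision integer in which bit i stands for Row_id i, and answers the AND, negation, and COUNT queries with bitwise operators.

# Minimal sketch of bitmap indexing on the 8-row EMPLOYEE relation of Figure 16.
# Bit i of each Python integer stands for Row_id i.

rows = [  # Emp_id, Lname, Sex, Zipcode (Row_id is the list position)
    (51024, "Bass",      "M", "94040"),
    (23402, "Clarke",    "F", "30022"),
    (62104, "England",   "M", "19046"),
    (34723, "Ferragamo", "F", "30022"),
    (81165, "Gucci",     "F", "19046"),
    (13646, "Hanson",    "M", "19046"),
    (12676, "Marcus",    "M", "30022"),
    (41301, "Zara",      "F", "94040"),
]
n = len(rows)
ALL = (1 << n) - 1          # existence bitmap: every row currently exists

def build_bitmaps(col):
    # One bitmap (as an integer) per distinct value of the given column index.
    bitmaps = {}
    for row_id, row in enumerate(rows):
        bitmaps.setdefault(row[col], 0)
        bitmaps[row[col]] |= 1 << row_id
    return bitmaps

sex_bm = build_bitmaps(2)       # bitmaps for 'M' and 'F'
zip_bm = build_bitmaps(3)       # one bitmap per Zipcode value

def row_ids(bitmap):
    return [i for i in range(n) if bitmap & (1 << i)]

# Sex = 'F' AND Zipcode = '30022'  ->  intersect (AND) the two bitmaps
print(row_ids(sex_bm["F"] & zip_bm["30022"]))    # [1, 3]

# Negation: Zipcode <> '94040'  ->  complement within the existence bitmap
print(row_ids(~zip_bm["94040"] & ALL))           # [1, 2, 3, 4, 5, 6]

# Count of rows with Sex = 'M'  ->  count the 1 bits
print(bin(sex_bm["M"]).count("1"))               # 4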
In general, bitmap indexes are efficient in terms of the storage space that they need.
If we consider a file of 1 million rows (records) with record size of 100 bytes per row,
each bitmap index would take up only one bit per row and hence would use 1 mil-
lion bits or 125 Kbytes. Suppose this relation is for 1 million residents of a state, and
they are spread over 200 ZIP Codes; the 200 bitmaps over Zipcodes contribute 200
bits (or 25 bytes) worth of space per row; hence, the 200 bitmaps occupy only 25
percent as much space as the data file. They allow an exact retrieval of all residents
who live in a given ZIP Code by yielding their Row_ids.
When records are deleted, renumbering rows and shifting bits in bitmaps becomes
expensive. Another bitmap, called the existence bitmap, can be used to avoid this
expense. This bitmap has a 0 bit for rows that have been deleted (whose positions are
retained) and a 1 bit for rows that actually exist. Whenever a row is inserted in the
relation, an entry must be made in all the bitmaps of all the columns that have a
bitmap index; rows typically are appended to the relation or may replace deleted
rows. This process represents an indexing overhead.
Large bitvectors are handled by treating them as a series of 32-bit or 64-bit vectors,
and corresponding AND, OR, and NOT operators are used from the instruction set
to deal with 32- or 64-bit input vectors in a single instruction. This makes bitvector
operations computationally very efficient.
Bitmaps for B+-Tree Leaf Nodes. Bitmaps can also be used on the leaf nodes of
B+-tree indexes to point to the set of records that contain each specific
value of the indexed field in the leaf node. When the B+-tree is built on a nonkey
search field, the leaf record must contain a list of record pointers alongside each
value of the indexed attribute. For values that occur very frequently, that is, in a
large percentage of the relation, a bitmap index may be stored instead of the point-
ers. As an example, for a relation with n rows, suppose a value occurs in 10 percent
of the file records. A bitvector would have n bits, having the “1” bit for those Row_ids
that contain that search value, which is n/8 or 0.125n bytes in size. If the record
pointer takes up 4 bytes (32 bits), then the n/10 record pointers would take up
4 * n/10 or 0.4n bytes. Since 0.4n is more than 3 times larger than 0.125n, it is better
to store the bitmap index rather than the record pointers. Hence for search values
that occur more frequently than a certain ratio (in this case that would be 1/32), it is
beneficial to use bitmaps as a compressed storage mechanism for representing the
record pointers in B+-trees that index a nonkey field.
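The break-even ratio of 1/32 quoted above can be verified with a few lines of arithmetic; the sketch below (an illustration under the stated assumptions of 4-byte record pointers and a value occurring in a fraction f of the n records) compares the two leaf-level storage options.

# Compare leaf-level storage: a bitmap of n bits versus a list of 4-byte record
# pointers, for a value that occurs in a fraction f of the n records.

def bitmap_bytes(n):
    return n / 8                 # one bit per row

def pointer_bytes(n, f, ptr_size=4):
    return ptr_size * f * n      # one pointer per qualifying record

n, f = 1_000_000, 0.10           # example from the text: 10% of the records
print(bitmap_bytes(n), pointer_bytes(n, f))   # 125000.0 and 400000.0 bytes

# Break-even frequency: ptr_size * f * n == n / 8  ->  f == 1 / (8 * ptr_size)
print(1 / (8 * 4))               # 0.03125, i.e., 1/32 as stated in the text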
5.3 Function-Based Indexing
In this section we discuss a new type of indexing, called function-based indexing,
that has been introduced in the Oracle relational DBMS as well as in some other
commercial products.13
The idea behind function-based indexing is to create an index such that the value
that results from applying some function on a field or a collection of fields becomes
the key to the index. The following examples show how to create and use function-
based indexes.
Example 1. The following statement creates a function-based index on the
EMPLOYEE table based on an uppercase representation of the Lname column, which
can be entered in many ways but is always queried by its uppercase representation.
CREATE INDEX upper_ix ON Employee (UPPER(Lname));
13Rafi Ahmed contributed most of this section.
This statement will create an index based on the function UPPER(Lname), which
returns the last name in uppercase letters; for example, UPPER(‘Smith’) will
return ‘SMITH’.
Function-based indexes allow the Oracle Database system to use the index
rather than perform a full table scan, even when a function is used in the search
predicate of a query. For example, the following query will use the index:
SELECT First_name, Lname
FROM Employee
WHERE UPPER(Lname) = 'SMITH';
Without the function-based index, an Oracle Database might perform a full table
scan, since a B+-tree index is searched only by using the column value directly; the
use of any function on a column prevents such an index from being used.
Example 2. In this example, the EMPLOYEE table is assumed to contain two
fields—Salary and Commission_pct (commission percentage)—and an index is
created on the total income, that is, the salary plus the commission computed from Commission_pct.
CREATE INDEX income_ix
ON Employee(Salary + (Salary*Commission_pct));
The following query can use the income_ix index even though the expression in the
WHERE clause is written with its terms in the reverse order of the expression in
the index definition.
SELECT First_name, Lname
FROM Employee
WHERE ((Salary*Commission_pct) + Salary ) > 15000;
Example 3. This is a more advanced example of using function-based indexing to
define conditional uniqueness. The following statement creates a unique function-
based index on the ORDERS table that prevents a customer from taking advantage of
a promotion id (“blowout sale”) more than once. It creates a composite index on the
Customer_id and Promotion_id fields together, and it allows only one entry in the index
for a given Customer_id with the Promotion_id of “2” by declaring it as a unique index.
CREATE UNIQUE INDEX promo_ix ON Orders
(CASE WHEN Promotion_id = 2 THEN Customer_id ELSE NULL END,
CASE WHEN Promotion_id = 2 THEN Promotion_id ELSE NULL END);
Note that by using the CASE expression, the objective is to exclude from the index any
rows where Promotion_id is not equal to 2. Oracle Database does not store in the B+-
tree index any rows where all the keys are NULL. Therefore, in this example, we map
both Customer_id and Promotion_id to NULL unless Promotion_id is equal to 2. The
result is that the uniqueness constraint is violated only when two rows with the same
Customer_id value and with Promotion_id equal to 2 are inserted (or their insertion is attempted).
6 Some General Issues
Concerning Indexing
6.1 Logical versus Physical Indexes
In the earlier discussion, we have assumed that the index entries <K, Pr> (or <K, P>) always include a physical pointer Pr (or P) that specifies the physical record
address on disk as a block number and offset. This is sometimes called a physical
index, and it has the disadvantage that the pointer must be changed if the record is
moved to another disk location. For example, suppose that a primary file organiza-
tion is based on linear hashing or extendible hashing; then, each time a bucket is
split, some records are allocated to new buckets and hence have new physical
addresses. If there was a secondary index on the file, the pointers to those records
would have to be found and updated, which is a difficult task.
To remedy this situation, we can use a structure called a logical index, whose index
entries are of the form <K, Kp>. Each entry has one value K for the secondary index-
ing field matched with the value Kp of the field used for the primary file organiza-
tion. By searching the secondary index on the value of K, a program can locate the
corresponding value of Kp and use this to access the record through the primary file
organization. Logical indexes thus introduce an additional level of indirection
between the access structure and the data. They are used when physical record
addresses are expected to change frequently. The cost of this indirection is the extra
search based on the primary file organization.
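A minimal sketch of this indirection is shown below; the dictionaries that stand in for the primary file organization and the logical secondary index are purely illustrative.

# Logical index: the secondary key K is mapped to the primary organization key
# Kp, not to a physical address, so records may move on disk without any
# updates to the secondary index.

# Primary file organization, e.g., a hash file keyed on Ssn (here just a dict).
primary_file = {
    "123456789": {"Ssn": "123456789", "Lname": "Smith", "Dno": 5},
    "987654321": {"Ssn": "987654321", "Lname": "Wong",  "Dno": 4},
}

# Logical secondary index on Lname: entries of the form <K, Kp> = <Lname, Ssn>.
logical_index = {"Smith": "123456789", "Wong": "987654321"}

def lookup_by_lname(lname):
    kp = logical_index[lname]        # first search: secondary index on K
    return primary_file[kp]          # second search: primary organization on Kp

print(lookup_by_lname("Smith"))      # two lookups: the cost of the indirection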
6.2 Discussion
In many systems, an index is not an integral part of the data file but can be created
and discarded dynamically. That is why it is often called an access structure.
Whenever we expect to access a file frequently based on some search condition
involving a particular field, we can request the DBMS to create an index on that
field. Usually, a secondary index is created to avoid physical ordering of the records
in the data file on disk.
The main advantage of secondary indexes is that—theoretically, at least—they can
be created in conjunction with virtually any primary record organization. Hence, a
secondary index could be used to complement other primary access methods such
as ordering or hashing, or it could even be used with mixed files. To create a B+-tree
secondary index on some field of a file, we must go through all records in the file to
create the entries at the leaf level of the tree. These entries are then sorted and filled
according to the specified fill factor; simultaneously, the other index levels are cre-
ated. It is more expensive and much harder to create primary indexes and clustering
indexes dynamically, because the records of the data file must be physically sorted
on disk in order of the indexing field. However, some systems allow users to create
these indexes dynamically on their files by sorting the file during index creation.
It is common to use an index to enforce a key constraint on an attribute. While
searching the index to insert a new record, it is straightforward to check at the same
time whether another record in the file—and hence in the index tree—has the same
key attribute value as the new record. If so, the insertion can be rejected.
If an index is created on a nonkey field, duplicates occur; handling of these dupli-
cates is an issue the DBMS product vendors have to deal with and affects data stor-
age as well as index creation and management. Data records for the duplicate key
may be contained in the same block or may span multiple blocks where many dupli-
cates are possible. Some systems add a row id to the record so that records with
duplicate keys have their own unique identifiers. In such cases, the B+-tree index
may regard a <key, Row_id> combination as the de facto key for the index, turning
the index into a unique index with no duplicates. The deletion of a key K from such
an index would involve deleting all occurrences of that key K—hence the deletion
algorithm has to account for this.
In actual DBMS products, deletion from B+-tree indexes is also handled in various
ways to improve performance and response times. Deleted records may be marked
as deleted and the corresponding index entries may also not be removed until a
garbage collection process reclaims the space in the data file; the index is rebuilt
online after garbage collection.
A file that has a secondary index on every one of its fields is often called a fully
inverted file. Because all indexes are secondary, new records are inserted at the end
of the file; therefore, the data file itself is an unordered (heap) file. The indexes are
usually implemented as B+-trees, so they are updated dynamically to reflect inser-
tion or deletion of records. Some commercial DBMSs, such as Software AG’s
Adabas, use this method extensively.
We referred to the popular IBM file organization called ISAM in Section 2. Another
IBM method, the virtual storage access method (VSAM), is somewhat similar to the
B+–tree access structure and is still being used in many commercial systems.
6.3 Column-Based Storage of Relations
There has been a recent trend to consider a column-based storage of relations as an
alternative to the traditional way of storing relations row by row. Commercial rela-
tional DBMSs have offered B+-tree indexing on primary as well as secondary keys as
an efficient mechanism to support access to data by various search criteria and the
ability to write a row or a set of rows to disk at a time to produce write-optimized
systems. For data warehouses, which are read-only databases, the column-based
storage offers particular advantages for read-only queries. Typically, the column-
store RDBMSs consider storing each column of data individually and afford per-
formance advantages in the following areas:
■ Vertically partitioning the table column by column, so that a two-column
table can be constructed for every attribute and thus only the needed
columns can be accessed
■ Use of column-wise indexes (similar to the bitmap indexes discussed in
Section 5.2) and join indexes on multiple tables to answer queries without
having to access the data tables
■ Use of materialized views to support queries on multiple columns
Column-wise storage of data affords additional freedom in the creation of indexes,
such as the bitmap indexes discussed earlier. The same column may be present in
multiple projections of a table and indexes may be created on each projection. To
store the values in the same column, strategies for data compression, null-value sup-
pression, dictionary encoding techniques (where distinct values in the column are
assigned shorter codes), and run-length encoding techniques have been devised.
MonetDB/X100, C-Store, and Vertica are examples of such systems. Further discus-
sion on column-store DBMSs can be found in the references mentioned in this
chapter’s Selected Bibliography.
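As a rough illustration of two of the compression techniques just mentioned, the following sketch (a toy example, not the scheme used by any of the systems named above) applies dictionary encoding and then run-length encoding to a small column of Zipcode values.

# Toy column-store compression: dictionary encoding assigns each distinct value
# a small integer code; run-length encoding then stores (code, run length)
# pairs, which works well when the column contains long runs of equal values.

column = ["30022", "30022", "30022", "19046", "19046", "94040", "94040", "94040"]

# Dictionary encoding: distinct values -> small integer codes.
dictionary = {v: code for code, v in enumerate(sorted(set(column)))}
encoded = [dictionary[v] for v in column]

# Run-length encoding of the code sequence.
runs = []
for code in encoded:
    if runs and runs[-1][0] == code:
        runs[-1][1] += 1
    else:
        runs.append([code, 1])

print(dictionary)   # {'19046': 0, '30022': 1, '94040': 2}
print(runs)         # [[1, 3], [0, 2], [2, 3]]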
7 Summary
In this chapter we presented file organizations that involve additional access struc-
tures, called indexes, to improve the efficiency of retrieval of records from a data file.
These access structures may be used in conjunction with primary file organizations,
which are used to organize the file records themselves on disk.
Three types of ordered single-level indexes were introduced: primary, clustering, and
secondary. Each index is specified on a field of the file. Primary and clustering
indexes are constructed on the physical ordering field of a file, whereas secondary
indexes are specified on nonordering fields as additional access structures to improve
performance of queries and transactions. The field for a primary index must also be
a key of the file, whereas it is a nonkey field for a clustering index. A single-level index
is an ordered file and is searched using a binary search. We showed how multilevel
indexes can be constructed to improve the efficiency of searching an index.
Next we showed how multilevel indexes can be implemented as B-trees and B+-
trees, which are dynamic structures that allow an index to expand and shrink
dynamically. The nodes (blocks) of these index structures are kept between half full
and completely full by the insertion and deletion algorithms. Nodes eventually sta-
bilize at an average occupancy of 69 percent full, allowing space for insertions with-
out requiring reorganization of the index for the majority of insertions. B+-trees
can generally hold more entries in their internal nodes than can B-trees, so they may
have fewer levels or hold more entries than does a corresponding B-tree.
We gave an overview of multiple key access methods, and showed how an index can
be constructed based on hash data structures. We discussed the hash index in some
detail—it is a secondary structure to access the file by using hashing on a search key
other than that used for the primary organization. Bitmap indexing is another
important type of indexing used for querying by multiple keys and is particularly
applicable on fields with a small number of unique values. Bitmaps can also be used
at the leaf nodes of B+-tree indexes. We also discussed function-based index-
ing, which is being provided by relational vendors to allow special indexes on a
function of one or more attributes.
We introduced the concept of a logical index and compared it with the physical
indexes we described before. They allow an additional level of indirection in index-
ing in order to permit greater freedom for movement of actual record locations on
disk. We also reviewed some general issues related to indexing, and commented on
column-based storage of relations, which has particular advantages for read-only
databases. Finally, we discussed how combinations of the above organizations can
be used. For example, secondary indexes are often used with mixed files, as well as
with unordered and ordered files.
Review Questions
1. Define the following terms: indexing field, primary key field, clustering field,
secondary key field, block anchor, dense index, and nondense (sparse) index.
2. What are the differences among primary, secondary, and clustering indexes?
How do these differences affect the ways in which these indexes are imple-
mented? Which of the indexes are dense, and which are not?
3. Why can we have at most one primary or clustering index on a file, but sev-
eral secondary indexes?
4. How does multilevel indexing improve the efficiency of searching an index
file?
5. What is the order p of a B-tree? Describe the structure of B-tree nodes.
6. What is the order p of a B+-tree? Describe the structure of both internal and
leaf nodes of a B+-tree.
7. How does a B-tree differ from a B+-tree? Why is a B+-tree usually preferred
as an access structure to a data file?
8. Explain what alternative choices exist for accessing a file based on multiple
search keys.
9. What is partitioned hashing? How does it work? What are its limitations?
10. What is a grid file? What are its advantages and disadvantages?
11. Show an example of constructing a grid array on two attributes on some file.
12. What is a fully inverted file? What is an indexed sequential file?
13. How can hashing be used to construct an index?
14. What is bitmap indexing? Create a relation with two columns and sixteen
tuples and show an example of a bitmap index on one or both.
15. What is the concept of function-based indexing? What additional purpose
does it serve?
16. What is the difference between a logical index and a physical index?
17. What is column-based storage of a relational database?
Exercises
18. Consider a disk with block size B = 512 bytes. A block pointer is P = 6 bytes
long, and a record pointer is PR = 7 bytes long. A file has r = 30,000
EMPLOYEE records of fixed length. Each record has the following fields: Name
(30 bytes), Ssn (9 bytes), Department_code (9 bytes), Address (40 bytes),
Phone (10 bytes), Birth_date (8 bytes), Sex (1 byte), Job_code (4 bytes), and
Salary (4 bytes, real number). An additional byte is used as a deletion marker.
a. Calculate the record size R in bytes.
b. Calculate the blocking factor bfr and the number of file blocks b, assum-
ing an unspanned organization.
c. Suppose that the file is ordered by the key field Ssn and we want to con-
struct a primary index on Ssn. Calculate (i) the index blocking factor bfri
(which is also the index fan-out fo); (ii) the number of first-level index
entries and the number of first-level index blocks; (iii) the number of lev-
els needed if we make it into a multilevel index; (iv) the total number of
blocks required by the multilevel index; and (v) the number of block
accesses needed to search for and retrieve a record from the file—given its
Ssn value—using the primary index.
d. Suppose that the file is not ordered by the key field Ssn and we want to
construct a secondary index on Ssn. Repeat the previous exercise (part c)
for the secondary index and compare with the primary index.
e. Suppose that the file is not ordered by the nonkey field Department_code
and we want to construct a secondary index on Department_code, using
option 3 of Section 1.3, with an extra level of indirection that stores
record pointers. Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed
among these values. Calculate (i) the index blocking factor bfri (which is
also the index fan-out fo); (ii) the number of blocks needed by the level of
indirection that stores record pointers; (iii) the number of first-level
index entries and the number of first-level index blocks; (iv) the number
of levels needed if we make it into a multilevel index; (v) the total number
of blocks required by the multilevel index and the blocks used in the extra
level of indirection; and (vi) the approximate number of block accesses
needed to search for and retrieve all records in the file that have a specific
Department_code value, using the index.
f. Suppose that the file is ordered by the nonkey field Department_code and
we want to construct a clustering index on Department_code that uses
block anchors (every new value of Department_code starts at the beginning
of a new block). Assume there are 1,000 distinct values of
Department_code and that the EMPLOYEE records are evenly distributed
among these values. Calculate (i) the index blocking factor bfri (which is
also the index fan-out fo); (ii) the number of first-level index entries and
the number of first-level index blocks; (iii) the number of levels needed if
we make it into a multilevel index; (iv) the total number of blocks
required by the multilevel index; and (v) the number of block accesses
needed to search for and retrieve all records in the file that have a specific
Department_code value, using the clustering index (assume that multiple
blocks in a cluster are contiguous).
g. Suppose that the file is not ordered by the key field Ssn and we want to
construct a B+-tree access structure (index) on Ssn. Calculate (i) the
orders p and pleaf of the B+-tree; (ii) the number of leaf-level blocks
needed if blocks are approximately 69 percent full (rounded up for con-
venience); (iii) the number of levels needed if internal nodes are also 69
percent full (rounded up for convenience); (iv) the total number of blocks
required by the B+-tree; and (v) the number of block accesses needed to
search for and retrieve a record from the file—given its Ssn value—using
the B+-tree.
h. Repeat part g, but for a B-tree rather than for a B+-tree. Compare your
results for the B-tree and for the B+-tree.
19. A PARTS file with Part# as the key field includes records with the following
Part# values: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20,
24, 28, 39, 43, 47, 50, 69, 75, 8, 49, 33, 38. Suppose that the search field values
are inserted in the given order in a B+-tree of order p = 4 and pleaf = 3; show
how the tree will expand and what the final tree will look like.
20. Repeat Exercise 19, but use a B-tree of order p = 4 instead of a B+-tree.
21. Suppose that the following search field values are deleted, in the given order,
from the B+-tree of Exercise 19; show how the tree will shrink and show the
final tree. The deleted values are 65, 75, 43, 18, 20, 92, 59, 37.
22. Repeat Exercise 21, but for the B-tree of Exercise 20.
23. Algorithm 1 outlines the procedure for searching a nondense multilevel pri-
mary index to retrieve a file record. Adapt the algorithm for each of the fol-
lowing cases:
a. A multilevel secondary index on a nonkey nonordering field of a file.
Assume that option 3 of Section 1.3 is used, where an extra level of indi-
rection stores pointers to the individual records with the corresponding
index field value.
b. A multilevel secondary index on a nonordering key field of a file.
c. A multilevel clustering index on a nonkey ordering field of a file.
24. Suppose that several secondary indexes exist on nonkey fields of a file,
implemented using option 3 of Section 1.3; for example, we could have sec-
ondary indexes on the fields Department_code, Job_code, and Salary of the
EMPLOYEE file of Exercise 18. Describe an efficient way to search for and
retrieve records satisfying a complex selection condition on these fields, such
as (Department_code = 5 AND Job_code = 12 AND Salary = 50,000), using the
record pointers in the indirection level.
25. Adapt Algorithms 2 and 3, which outline search and insertion procedures for
a B+-tree, to a B-tree.
26. It is possible to modify the B+-tree insertion algorithm to delay the case
where a new level is produced by checking for a possible redistribution of val-
ues among the leaf nodes. Figure 17 (next page) illustrates how this could be
done for our example in Figure 12; rather than splitting the leftmost leaf
node when 12 is inserted, we do a left redistribution by moving 7 to the leaf
node to its left (if there is space in this node). Figure 17 shows how the tree
would look when redistribution is considered. It is also possible to consider
right redistribution. Try to modify the B+-tree insertion algorithm to take
redistribution into account.
27. Outline an algorithm for deletion from a B+-tree.
28. Repeat Exercise 27 for a B-tree.
Selected Bibliography
Bayer and McCreight (1972) introduced B-trees and associated algorithms. Comer
(1979) provides an excellent survey of B-trees and their history, and variations of B-
trees. Knuth (1998) provides detailed analysis of many search techniques, including
B-trees and some of their variations. Nievergelt (1974) discusses the use of binary
search trees for file organization. Textbooks on file structures including Claybrook
(1992), Smith and Barnes (1987), and Salzberg (1988), the algorithms and data
structures textbook by Wirth (1985), as well as the database textbook by
Ramakrishnan and Gehrke (2003) discuss indexing in detail and may be consulted
for search, insertion, and deletion algorithms for B-trees and B+-trees. Larson
(1981) analyzes index-sequential files, and Held and Stonebraker (1978) compare
static multilevel indexes with B-tree dynamic indexes. Lehman and Yao (1981) and
Srinivasan and Carey (1991) did further analysis of concurrent access to B-trees.
The books by Wiederhold (1987), Smith and Barnes (1987), and Salzberg (1988),
among others, discuss many of the search techniques described in this chapter. Grid
files are introduced in Nievergelt et al. (1984). Partial-match retrieval, which uses
partitioned hashing, is discussed in Burkhard (1976, 1979).
New techniques and applications of indexes and B+-trees are discussed in Lanka
and Mays (1991), Zobel et al. (1992), and Faloutsos and Jagadish (1992). Mohan
and Narang (1992) discuss index creation. The performance of various B–tree and
B+-tree algorithms is assessed in Baeza-Yates and Larson (1989) and Johnson and
Shasha (1993). Buffer management for indexes is discussed in Chan et al. (1992).
Column-based storage of databases was proposed by Stonebraker et al. (2005) in the
C-Store database system; MonetDB/X100 by Boncz et al. (2008) is another imple-
mentation of the idea. Abadi et al. (2008) discuss the advantages of column stores
over row-stored databases for read-only database applications.
[Figure 17: B+-tree insertion with left redistribution. Panels show the example tree after Insert 12 (overflow handled by left redistribution), Insert 9 (overflow producing a new level), and Insert 6 (overflow handled by a split).]
[Figure A.1: Some blocks of an ordered (sequential) file of EMPLOYEE records with Name as the ordering key field. Each block holds several records ordered by Name (e.g., Block 1: Aaron, Ed; Abbott, Diane; ...; Acosta, Marc), with fields Name, Ssn, Birth_date, Job, Salary, and Sex.]
Table A.1 Average Access Times for a File of b Blocks under Basic File Organizations

Type of Organization   Access/Search Method              Average Blocks to Access a Specific Record
Heap (unordered)       Sequential scan (linear search)   b/2
Ordered                Sequential scan                   b/2
Ordered                Binary search                     log2 b
[Figure A.2: Structure of the extendible hashing scheme. A directory of global depth d = 3 (entries 000 through 111) points to data file buckets; each bucket has a local depth d′ and holds the records whose hash values start with a particular prefix (000, 001, 01, 10, 110, or 111).]
[Figure A.3: Structure of the dynamic hashing scheme. A binary tree directory of internal and leaf directory nodes leads to the data file buckets for records whose hash values start with 000, 001, 01, 10, 110, or 111.]
Algorithms for Query
Processing and Optimization
In this chapter we discuss the techniques used internally by a DBMS to process, optimize, and execute
high-level queries. A query expressed in a high-level query language such as SQL
must first be scanned, parsed, and validated.1 The scanner identifies the query
tokens—such as SQL keywords, attribute names, and relation names—that appear
in the text of the query, whereas the parser checks the query syntax to determine
whether it is formulated according to the syntax rules (rules of grammar) of the
query language. The query must also be validated by checking that all attribute and
relation names are valid and semantically meaningful names in the schema of the
particular database being queried. An internal representation of the query is then
created, usually as a tree data structure called a query tree. It is also possible to rep-
resent the query using a graph data structure called a query graph. The DBMS must
then devise an execution strategy or query plan for retrieving the results of the
query from the database files. A query typically has many possible execution strate-
gies, and the process of choosing a suitable one for processing a query is known as
query optimization.
Figure 1 shows the different steps of processing a high-level query. The query opti-
mizer module has the task of producing a good execution plan, and the code gener-
ator generates the code to execute that plan. The runtime database processor has
the task of running (executing) the query code, whether in compiled or interpreted
mode, to produce the query result. If a runtime error results, an error message is
generated by the runtime database processor.
1We will not discuss the parsing and syntax-checking phase of query processing here; this material is
discussed in compiler textbooks.
From Chapter 19 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
The term optimization is actually a misnomer because in some cases the chosen exe-
cution plan is not the optimal (or absolute best) strategy—it is just a reasonably effi-
cient strategy for executing the query. Finding the optimal strategy is usually too
time-consuming—except for the simplest of queries. In addition, trying to find the
optimal query execution strategy may require detailed information on how the files
are implemented and even on the contents of the files—information that may not
be fully available in the DBMS catalog. Hence, planning of a good execution strategy
may be a more accurate description than query optimization.
For lower-level navigational database languages in legacy systems—such as the
network DML or the hierarchical DL/1—the programmer must choose the query
execution strategy while writing a database program. If a DBMS provides only a
navigational language, there is limited need or opportunity for extensive query opti-
mization by the DBMS; instead, the programmer is given the capability to choose
the query execution strategy. On the other hand, a high-level query language—
such as SQL for relational DBMSs (RDBMSs) or OQL for object DBMSs
(ODBMSs)—is more declarative in nature because it specifies what the intended
results of the query are, rather than identifying the details of how the result should
be obtained. Query optimization is thus necessary for queries that are specified in
a high-level query language.
We will concentrate on describing query optimization in the context of an RDBMS
because many of the techniques we describe have also been adapted for other types
[Figure 1: Typical steps when processing a high-level query. The query in a high-level language is scanned, parsed, and validated into an intermediate form; the query optimizer produces an execution plan; the query code generator produces code to execute the query; and the runtime database processor runs that code (either directly in interpreted mode, or stored and executed later in compiled mode) to produce the query result.]
2There are some query optimization problems and techniques that are pertinent only to ODBMSs.
However, we do not discuss them here because we give only an introduction to query optimization.
of database management systems, such as ODBMSs.2 A relational DBMS must sys-
tematically evaluate alternative query execution strategies and choose a reasonably
efficient or near-optimal strategy. Each DBMS typically has a number of general
database access algorithms that implement relational algebra operations such as
SELECT or JOIN or combinations of these operations. Only execution strategies that
can be implemented by the DBMS access algorithms and that apply to the particu-
lar query, as well as to the particular physical database design, can be considered by
the query optimization module.
This chapter starts with a general discussion of how SQL queries are typically trans-
lated into relational algebra queries and then optimized in Section 1. Then we dis-
cuss algorithms for implementing relational algebra operations in Sections 2
through 6. Following this, we give an overview of query optimization strategies.
There are two main techniques that are employed during query optimization. The
first technique is based on heuristic rules for ordering the operations in a query
execution strategy. A heuristic is a rule that works well in most cases but is not guar-
anteed to work well in every case. The rules typically reorder the operations in a
query tree. The second technique involves systematically estimating the cost of dif-
ferent execution strategies and choosing the execution plan with the lowest cost esti-
mate. These techniques are usually combined in a query optimizer. We discuss
heuristic optimization in Section 7 and cost estimation in Section 8. Then we pro-
vide a brief overview of the factors considered during query optimization in the
Oracle commercial RDBMS in Section 9. Section 10 introduces the topic of seman-
tic query optimization, in which known constraints are used as an aid to devising
efficient query execution strategies.
The topics covered in this chapter require that the reader be familiar with SQL, rela-
tional algebra, and file structures and indexing. Also, it is important to note that the
topic of query processing and optimization is vast, and we can only give an intro-
duction to the basic principles and techniques in this chapter.
1 Translating SQL Queries into Relational
Algebra
In practice, SQL is the query language that is used in most commercial RDBMSs. An
SQL query is first translated into an equivalent extended relational algebra expres-
sion—represented as a query tree data structure—that is then optimized. Typically,
SQL queries are decomposed into query blocks, which form the basic units that can
be translated into the algebraic operators and optimized. A query block contains a
single SELECT-FROM-WHERE expression, as well as GROUP BY and HAVING clauses
if these are part of the block. Hence, nested queries within a query are identified as
separate query blocks. Because SQL includes aggregate operators—such as MAX,
MIN, SUM, and COUNT—these operators must also be included in the extended
algebra.
Consider the following SQL query on the EMPLOYEE relation in Figure A.1 (in
Appendix: Figures at the end of this chapter):
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
FROM EMPLOYEE
WHERE Dno=5 );
This query retrieves the names of employees (from any department in the com-
pany) who earn a salary that is greater than the highest salary in department 5. The
query includes a nested subquery and hence would be decomposed into two blocks.
The inner block is:
( SELECT MAX (Salary)
FROM EMPLOYEE
WHERE Dno=5 )
This retrieves the highest salary in department 5. The outer query block is:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > c
where c represents the result returned from the inner block. The inner block could
be translated into the following extended relational algebra expression:
ℑMAX Salary(σDno=5(EMPLOYEE))
and the outer block into the expression:
πLname,Fname(σSalary>c(EMPLOYEE))
The query optimizer would then choose an execution plan for each query block.
Notice that in the above example, the inner block needs to be evaluated only once to
produce the maximum salary of employees in department 5, which is then used—as
the constant c—by the outer block. We call this a nested query (without correlation
with the outer query). It is much harder to optimize the more complex correlated
nested queries, where a tuple variable from the outer query block appears in the
WHERE-clause of the inner query block.
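The two-block evaluation just described can be mimicked in a few lines of Python over an in-memory list of tuples; the sample data below is illustrative, and the attribute names follow the EMPLOYEE relation.

# Evaluate the uncorrelated nested query in two steps, mirroring the two query
# blocks: the inner block runs once and its result feeds the outer block.

employees = [  # (Lname, Fname, Salary, Dno) -- illustrative sample data
    ("Smith",   "John",     30000, 5),
    ("Wong",    "Franklin", 40000, 5),
    ("Zelaya",  "Alicia",   25000, 4),
    ("Wallace", "Jennifer", 43000, 4),
    ("Borg",    "James",    55000, 1),
]

# Inner block:  SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5
c = max(e[2] for e in employees if e[3] == 5)          # 40000

# Outer block:  SELECT Lname, Fname FROM EMPLOYEE WHERE Salary > c
result = [(e[0], e[1]) for e in employees if e[2] > c]
print(result)    # [('Wallace', 'Jennifer'), ('Borg', 'James')]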
2 Algorithms for External Sorting
Sorting is one of the primary algorithms used in query processing. For example,
whenever an SQL query specifies an ORDER BY-clause, the query result must be
sorted. Sorting is also a key component in sort-merge algorithms used for JOIN and
other operations (such as UNION and INTERSECTION), and in duplicate elimination
algorithms for the PROJECT operation (when an SQL query specifies the DISTINCT
option in the SELECT clause). We will discuss one of these algorithms in this sec-
tion. Note that sorting of a particular file may be avoided if an appropriate index—
such as a primary or clustering index—exists on the desired file attribute to allow
ordered access to the records of the file.
External sorting refers to sorting algorithms that are suitable for large files of
records stored on disk that do not fit entirely in main memory, such as most data-
base files.3 The typical external sorting algorithm uses a sort-merge strategy, which
starts by sorting small subfiles—called runs—of the main file and then merges the
sorted runs, creating larger sorted subfiles that are merged in turn. The sort-merge
algorithm, like other database algorithms, requires buffer space in main memory,
where the actual sorting and merging of the runs is performed. The basic algorithm,
outlined in Figure 2, consists of two phases: the sorting phase and the merging
phase. The buffer space in main memory is part of the DBMS cache—an area in the
computer’s main memory that is controlled by the DBMS. The buffer space is
divided into individual buffers, where each buffer is the same size in bytes as the size
of one disk block. Thus, one buffer can hold the contents of exactly one disk block.
In the sorting phase, runs (portions or pieces) of the file that can fit in the available
buffer space are read into main memory, sorted using an internal sorting algorithm,
and written back to disk as temporary sorted subfiles (or runs). The size of each run
and the number of initial runs (nR) are dictated by the number of file blocks (b)
and the available buffer space (nB). For example, if the number of available main
memory buffers nB = 5 disk blocks and the size of the file b = 1024 disk blocks, then
nR= ⎡(b/nB)⎤ or 205 initial runs each of size 5 blocks (except the last run which will
have only 4 blocks). Hence, after the sorting phase, 205 sorted runs (or 205 sorted
subfiles of the original file) are stored as temporary subfiles on disk.
In the merging phase, the sorted runs are merged during one or more merge
passes. Each merge pass can have one or more merge steps. The degree of merging
(dM) is the number of sorted subfiles that can be merged in each merge step. During
each merge step, one buffer block is needed to hold one disk block from each of the
sorted subfiles being merged, and one additional buffer is needed for containing
one disk block of the merge result, which will produce a larger sorted file that is the
result of merging several smaller sorted subfiles. Hence, dM is the smaller of (nB − 1)
and nR, and the number of merge passes is ⎡(logdM(nR))⎤. In our example where nB =
5, dM = 4 (four-way merging), so the 205 initial sorted runs would be merged 4 at a
time in each step into 52 larger sorted subfiles at the end of the first merge pass.
These 52 sorted files are then merged 4 at a time into 13 sorted files, which are then
merged into 4 sorted files, and then finally into 1 fully sorted file, which means that
four passes are needed.
3Internal sorting algorithms are suitable for sorting data structures, such as tables and lists, that can fit
entirely in main memory. These algorithms are described in detail in data structures and algorithms
books, and include techniques such as quick sort, heap sort, bubble sort, and many others. We do not dis-
cuss these here.
set i ← 1;
j ← b; {size of the file in blocks}
k ← nB; {size of buffer in blocks}
m ← ⎡( j/k)⎤;
{Sorting Phase}
while (i ≤ m)
do {
read next k blocks of the file into the buffer or if there are less than k blocks
remaining, then read in the remaining blocks;
sort the records in the buffer and write as a temporary subfile;
i ← i + 1;
}
{Merging Phase: merge subfiles until only 1 remains}
set i ← 1;
p ← ⎡logk–1m⎤ {p is the number of passes for the merging phase}
j ← m;
while (i ≤ p)
do {
n ← 1;
q ← ⎡ j/(k–1)⎤; {number of subfiles to write in this pass}
while (n ≤ q)
do {
read next k–1 subfiles or remaining subfiles (from previous pass)
one block at a time;
merge and write as new subfile one block at a time;
n ← n + 1;
}
j ← q;
i ← i + 1;
}
Figure 2
Outline of the sort-merge algorithm for external sorting.
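For readers who prefer runnable code, the following Python sketch implements the same sort-merge strategy under simplifying assumptions: the file holds one integer per line, a run of run_size records plays the role of nB blocks' worth of records, and merge_degree plays the role of dM. It illustrates the algorithm of Figure 2; it is not DBMS code, and it assumes a nonempty input file.

import heapq
import os
import tempfile

def _write_run(sorted_records):
    # Write an already-sorted iterable of integers to a temporary run file.
    fd, path = tempfile.mkstemp(text=True)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{r}\n" for r in sorted_records)
    return path

def _read_run(path):
    with open(path) as f:
        for line in f:
            yield int(line)

def external_sort(input_path, output_path, run_size, merge_degree):
    # Sorting phase: read run_size records at a time, sort in memory, write a run.
    runs, chunk = [], []
    with open(input_path) as f:
        for line in f:
            chunk.append(int(line))
            if len(chunk) == run_size:
                runs.append(_write_run(sorted(chunk)))
                chunk = []
    if chunk:
        runs.append(_write_run(sorted(chunk)))

    # Merging phase: merge merge_degree runs at a time until one run remains.
    while len(runs) > 1:
        merged_runs = []
        for i in range(0, len(runs), merge_degree):
            group = runs[i:i + merge_degree]
            merged_runs.append(_write_run(heapq.merge(*map(_read_run, group))))
            for path in group:
                os.remove(path)
        runs = merged_runs

    os.replace(runs[0], output_path)

With run_size equal to nB blocks' worth of records and merge_degree = nB − 1, the function follows the buffer-usage assumptions of the text: one input stream per run being merged plus one output stream, here approximated by reading and writing one line at a time.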
The performance of the sort-merge algorithm can be measured in the number of
disk block reads and writes (between the disk and main memory) before the sorting
of the whole file is completed. The following formula approximates this cost:
(2 * b) + (2 * b * (logdM nR))
The first term (2 * b) represents the number of block accesses for the sorting phase,
since each file block is accessed twice: once for reading into a main memory buffer
and once for writing the sorted records back to disk into one of the sorted subfiles.
The second term represents the number of block accesses for the merging phase.
During each merge pass, a number of disk blocks approximately equal to the origi-
nal file blocks b is read and written. Since the number of merge passes is (logdM nR),
we get the total merge cost of (2 * b * (logdM nR)).
The minimum number of main memory buffers needed is nB = 3, which gives a dM
of 2 and an nR of ⎡(b/3)⎤. The minimum dM of 2 gives the worst-case performance
of the algorithm, which is:
(2 * b) + (2 * (b * (log2 nR))).
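Plugging the running example (b = 1024, nB = 5) into these formulas gives a quick check of the numbers quoted above (a sketch using Python's math module):

from math import ceil, log

b, nB = 1024, 5                     # file blocks and available buffer blocks
nR = ceil(b / nB)                   # 205 initial runs
dM = min(nB - 1, nR)                # degree of merging: 4
passes = ceil(log(nR, dM))          # 4 merge passes (205 -> 52 -> 13 -> 4 -> 1)
cost = (2 * b) + (2 * b * passes)   # 2048 + 8192 = 10240 block reads and writes
print(nR, dM, passes, cost)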
The following sections discuss the various algorithms for the operations of the rela-
tional algebra.
3 Algorithms for SELECT and JOIN Operations
3.1 Implementing the SELECT Operation
There are many algorithms for executing a SELECT operation, which is basically a
search operation to locate the records in a disk file that satisfy a certain condition.
Some of the search algorithms depend on the file having specific access paths, and
they may apply only to certain types of selection conditions. We discuss some of the
algorithms for implementing SELECT in this section. We will use the following
operations, specified on the relational database in Figure A.1, to illustrate our dis-
cussion:
OP1: σSsn = ‘123456789’ (EMPLOYEE)
OP2: σDnumber > 5 (DEPARTMENT)
OP3: σDno = 5 (EMPLOYEE)
OP4: σDno = 5 AND Salary > 30000 AND Sex = ‘F’ (EMPLOYEE)
OP5: σEssn=‘123456789’ AND Pno =10(WORKS_ON)
Search Methods for Simple Selection. A number of search algorithms are
possible for selecting records from a file. These are also known as file scans, because
they scan the records of a file to search for and retrieve records that satisfy a selec-
tion condition.4 If the search algorithm involves the use of an index, the index
search is called an index scan. The following search methods (S1 through S6) are
examples of some of the search algorithms that can be used to implement a select
operation:
■ S1—Linear search (brute force algorithm). Retrieve every record in the file,
and test whether its attribute values satisfy the selection condition. Since the
records are grouped into disk blocks, each disk block is read into a main
memory buffer, and then a search through the records within the disk block
is conducted in main memory.
4A selection operation is sometimes called a filter, since it filters out the records in the file that do not
satisfy the selection condition.
■ S2—Binary search. If the selection condition involves an equality compari-
son on a key attribute on which the file is ordered, binary search—which is
more efficient than linear search—can be used. An example is OP1 if Ssn is
the ordering attribute for the EMPLOYEE file.5
■ S3a—Using a primary index. If the selection condition involves an equality
comparison on a key attribute with a primary index—for example, Ssn =
‘123456789’ in OP1—use the primary index to retrieve the record. Note that
this condition retrieves a single record (at most).
■ S3b—Using a hash key. If the selection condition involves an equality com-
parison on a key attribute with a hash key—for example, Ssn = ‘123456789’
in OP1—use the hash key to retrieve the record. Note that this condition
retrieves a single record (at most).
■ S4—Using a primary index to retrieve multiple records. If the comparison
condition is >, >=, <, or <= on a key field with a primary index—for exam-
ple, Dnumber > 5 in OP2—use the index to find the record satisfying the cor-
responding equality condition (Dnumber = 5), then retrieve all subsequent
records in the (ordered) file. For the condition Dnumber < 5, retrieve all the
preceding records.
■ S5—Using a clustering index to retrieve multiple records. If the selection
condition involves an equality comparison on a nonkey attribute with a
clustering index—for example, Dno = 5 in OP3—use the index to retrieve all
the records satisfying the condition.
■ S6—Using a secondary (B+-tree) index on an equality comparison. This
search method can be used to retrieve a single record if the indexing field is a
key (has unique values) or to retrieve multiple records if the indexing field is
not a key. This can also be used for comparisons involving >, >=, <, or <=.
In Section 8, we discuss how to develop formulas that estimate the access cost of
these search methods in terms of the number of block accesses and access time.
Method S1 (linear search) applies to any file, but all the other methods depend on
having the appropriate access path on the attribute used in the selection condition.
Method S2 (binary search) requires the file to be sorted on the search attribute. The
methods that use an index (S3a, S4, S5, and S6) are generally referred to as index
searches, and they require the appropriate index to exist on the search attribute.
Methods S4 and S6 can be used to retrieve records in a certain range—for example,
30000 <= Salary <= 35000. Queries involving such conditions are called range
queries.
Search Methods for Complex Selection. If a condition of a SELECT operation
is a conjunctive condition—that is, if it is made up of several simple conditions
5Generally, binary search is not used in database searches because ordered files are not used unless
they also have a corresponding primary index.
connected with the AND logical connective such as OP4 above—the DBMS can use
the following additional methods to implement the operation:
■ S7—Conjunctive selection using an individual index. If an attribute
involved in any single simple condition in the conjunctive select condition
has an access path that permits the use of one of the methods S2 to S6, use
that condition to retrieve the records and then check whether each retrieved
record satisfies the remaining simple conditions in the conjunctive select
condition.
■ S8—Conjunctive selection using a composite index. If two or more attrib-
utes are involved in equality conditions in the conjunctive select condition
and a composite index (or hash structure) exists on the combined fields—
for example, if an index has been created on the composite key (Essn, Pno) of
the WORKS_ON file for OP5—we can use the index directly.
■ S9—Conjunctive selection by intersection of record pointers.6 If second-
ary indexes (or other access paths) are available on more than one of the
fields involved in simple conditions in the conjunctive select condition, and
if the indexes include record pointers (rather than block pointers), then each
index can be used to retrieve the set of record pointers that satisfy the indi-
vidual condition. The intersection of these sets of record pointers gives the
record pointers that satisfy the conjunctive select condition, which are then
used to retrieve those records directly. If only some of the conditions have
secondary indexes, each retrieved record is further tested to determine
whether it satisfies the remaining conditions.7 In general, method S9
assumes that each of the indexes is on a nonkey field of the file, because if one
of the conditions is an equality condition on a key field, only one record will
satisfy the whole condition.
Whenever a single condition specifies the selection—such as OP1, OP2, or OP3—
the DBMS can only check whether or not an access path exists on the attribute
involved in that condition. If an access path (such as index or hash key or sorted file)
exists, the method corresponding to that access path is used; otherwise, the brute
force, linear search approach of method S1 can be used. Query optimization for a
SELECT operation is needed mostly for conjunctive select conditions whenever
more than one of the attributes involved in the conditions have an access path. The
optimizer should choose the access path that retrieves the fewest records in the most
efficient way by estimating the different costs (see Section 8) and choosing the
method with the least estimated cost.
Selectivity of a Condition. When the optimizer is choosing between multiple
simple conditions in a conjunctive select condition, it typically considers the
6A record pointer uniquely identifies a record and provides the address of the record on disk; hence, it is
also called the record identifier or record id.
7The technique can have many variations—for example, if the indexes are logical indexes that store pri-
mary key values instead of record pointers.
selectivity of each condition. The selectivity (sl) is defined as the ratio of the num-
ber of records (tuples) that satisfy the condition to the total number of records
(tuples) in the file (relation), and thus is a number between zero and one. Zero selec-
tivity means none of the records in the file satisfies the selection condition, and a
selectivity of one means that all the records in the file satisfy the condition. In gen-
eral, the selectivity will not be either of these two extremes, but will be a fraction
that estimates the percentage of file records that will be retrieved.
Although exact selectivities of all conditions may not be available, estimates of
selectivities are often kept in the DBMS catalog and are used by the optimizer. For
example, for an equality condition on a key attribute of relation r(R), sl = 1/|r(R)|,
where |r(R)| is the number of tuples in relation r(R). For an equality condition on a
nonkey attribute with i distinct values, sl can be estimated by (|r(R)|/i)/|r(R)| or 1/i,
assuming that the records are evenly or uniformly distributed among the distinct
values.8 Under this assumption, |r(R)|/i records will satisfy an equality condition on
this attribute. In general, the number of records satisfying a selection condition with
selectivity sl is estimated to be |r(R)| * sl. The smaller this estimate is, the higher the
desirability of using that condition first to retrieve records. In certain cases, the
actual distribution of records among the various distinct values of the attribute is
kept by the DBMS in the form of a histogram, in order to get more accurate esti-
mates of the number of records that satisfy a particular condition.
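A small sketch of how an optimizer might use such estimates to order the simple conditions of a conjunctive selection is shown below; the catalog numbers (number of tuples and distinct values) are hypothetical, and real optimizers rely on histograms and many more statistics.

# Estimate selectivities for the simple conditions of OP4 and order them so
# that the condition with the smallest estimated result is applied first.

num_tuples = 10_000                         # |r(EMPLOYEE)|, hypothetical
distinct = {"Dno": 50, "Sex": 2}            # distinct values per nonkey attribute

conditions = {
    "Dno = 5":        1 / distinct["Dno"],  # equality on a nonkey attribute: 1/i
    "Salary > 30000": 0.3,                  # range condition: rough guess
    "Sex = 'F'":      1 / distinct["Sex"],  # equality on a nonkey attribute: 1/i
}

for cond, sl in sorted(conditions.items(), key=lambda item: item[1]):
    print(f"{cond:<16} selectivity {sl:.3f}  ~{round(sl * num_tuples)} tuples")

# Dno = 5 is estimated to qualify about 200 tuples, so it would be applied first
# (via its access path), with the other conditions checked on the result, as in
# method S7.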
Disjunctive Selection Conditions. Compared to a conjunctive selection condi-
tion, a disjunctive condition (where simple conditions are connected by the OR
logical connective rather than by AND) is much harder to process and optimize. For
example, consider OP4′:
OP4′: σDno=5 OR Salary > 30000 OR Sex=‘F’ (EMPLOYEE)
With such a condition, little optimization can be done, because the records satisfy-
ing the disjunctive condition are the union of the records satisfying the individual
conditions. Hence, if any one of the conditions does not have an access path, we are
compelled to use the brute force, linear search approach. Only if an access path
exists on every simple condition in the disjunction can we optimize the selection by
retrieving the records satisfying each condition—or their record ids—and then
applying the union operation to eliminate duplicates.
A DBMS will have available many of the methods discussed above, and typically
many additional methods. The query optimizer must choose the appropriate one
for executing each SELECT operation in a query. This optimization uses formulas
that estimate the costs for each available access method, as we will discuss in Section
8. The optimizer chooses the access method with the lowest estimated cost.
8In more sophisticated optimizers, histograms representing the distribution of the records among the dif-
ferent attribute values can be kept in the catalog.
3.2 Implementing the JOIN Operation
The JOIN operation is one of the most time-consuming operations in query pro-
cessing. Many of the join operations encountered in queries are of the EQUIJOIN
and NATURAL JOIN varieties, so we consider just these two here since we are only
giving an overview of query processing and optimization. For the remainder of this
chapter, the term join refers to an EQUIJOIN (or NATURAL JOIN).
There are many possible ways to implement a two-way join, which is a join on two
files. Joins involving more than two files are called multiway joins. The number of
possible ways to execute multiway joins grows very rapidly. In this section we dis-
cuss techniques for implementing only two-way joins. To illustrate our discussion,
we refer to the relational schema in Figure A.1 once more—specifically, to the
EMPLOYEE, DEPARTMENT, and PROJECT relations. The algorithms we discuss next
are for a join operation of the form:
R ⋈A=B S
where A and B are the join attributes, which should be domain-compatible attrib-
utes of R and S, respectively. The methods we discuss can be extended to more gen-
eral forms of join. We illustrate four of the most common techniques for
performing such a join, using the following sample operations:
OP6: EMPLOYEE ⋈Dno=Dnumber DEPARTMENT
OP7: DEPARTMENT ⋈Mgr_ssn=Ssn EMPLOYEE
Methods for Implementing Joins.
■ J1—Nested-loop join (or nested-block join). This is the default (brute
force) algorithm, as it does not require any special access paths on either file
in the join. For each record t in R (outer loop), retrieve every record s from S
(inner loop) and test whether the two records satisfy the join condition
t[A] = s[B].9
■ J2—Single-loop join (using an access structure to retrieve the matching
records). If an index (or hash key) exists for one of the two join attributes—
say, attribute B of file S—retrieve each record t in R (loop over file R), and
then use the access structure (such as an index or a hash key) to retrieve
directly all matching records s from S that satisfy s[B] = t[A]. (A minimal
sketch of methods J1 and J2 is given after this list.)
■ J3—Sort-merge join. If the records of R and S are physically sorted (ordered)
by value of the join attributes A and B, respectively, we can implement the join
in the most efficient way possible. Both files are scanned concurrently in order
of the join attributes, matching the records that have the same values for A and
B. If the files are not sorted, they may be sorted first by using external sorting
(see Section 2). In this method, pairs of file blocks are copied into memory
buffers in order and the records of each file are scanned only once each for
9For disk files, it is obvious that the loops will be over disk blocks, so this technique has also been called
nested-block join.
matching with the other file—unless both A and B are nonkey attributes, in
which case the method needs to be modified slightly. A sketch of the sort-
merge join algorithm is given in Figure 3(a). We use R(i) to refer to the ith
record in file R. A variation of the sort-merge join can be used when secondary
indexes exist on both join attributes. The indexes provide the ability to access
(scan) the records in order of the join attributes, but the records themselves are
physically scattered all over the file blocks, so this method may be quite ineffi-
cient, as every record access may involve accessing a different disk block.
■ J4—Partition-hash join. The records of files R and S are partitioned into
smaller files. The partitioning of each file is done using the same hashing
function h on the join attribute A of R (for partitioning file R) and B of S (for
partitioning file S). First, a single pass through the file with fewer records (say,
R) hashes its records to the various partitions of R; this is called the
partitioning phase, since the records of R are partitioned into the hash buck-
ets. In the simplest case, we assume that the smaller file can fit entirely in
main memory after it is partitioned, so that the partitioned subfiles of R are
all kept in main memory. The collection of records with the same value of
h(A) are placed in the same partition, which is a hash bucket in a hash table
in main memory. In the second phase, called the probing phase, a single pass
through the other file (S) then hashes each of its records using the same hash
function h(B) to probe the appropriate bucket, and that record is combined
with all matching records from R in that bucket. This simplified description
of partition-hash join assumes that the smaller of the two files fits entirely into
memory buckets after the first phase. We will discuss the general case of
partition-hash join that does not require this assumption below. In practice,
techniques J1 to J4 are implemented by accessing whole disk blocks of a file,
rather than individual records. Depending on the available number of buffers
in memory, the number of blocks read in from the file can be adjusted.
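The following minimal, record-at-a-time sketch illustrates methods J1 and J2 using in-memory lists of dictionaries; it deliberately ignores blocking, buffering, and disk access, which are exactly the factors analyzed in the rest of this section, and the sample attribute names follow OP6.

def nested_loop_join(R, S, a, b):
    # J1: for each record t in R (outer loop), scan all of S and test t[a] == s[b].
    result = []
    for t in R:
        for s in S:
            if t[a] == s[b]:
                result.append({**t, **s})
    return result

def single_loop_join(R, S, a, b):
    # J2: loop over R only, using an access structure on S.b to find matches directly.
    index_on_b = {}                      # simulates an index (or hash key) on S.b
    for s in S:
        index_on_b.setdefault(s[b], []).append(s)
    result = []
    for t in R:
        for s in index_on_b.get(t[a], []):
            result.append({**t, **s})
    return result

EMPLOYEE = [{"Ssn": "123", "Dno": 5}, {"Ssn": "456", "Dno": 4}]
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 4, "Dname": "Administration"}]
print(nested_loop_join(EMPLOYEE, DEPARTMENT, "Dno", "Dnumber"))
print(single_loop_join(EMPLOYEE, DEPARTMENT, "Dno", "Dnumber"))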
How Buffer Space and Choice of Outer-Loop File Affect Performance of
Nested-Loop Join. The buffer space available has an important effect on some of
the join algorithms. First, let us consider the nested-loop approach (J1). Looking
again at the operation OP6 above, assume that the number of buffers available in
main memory for implementing the join is nB = 7 blocks (buffers). Recall that we
assume that each memory buffer is the same size as one disk block. For illustration,
assume that the DEPARTMENT file consists of rD = 50 records stored in bD = 10 disk
blocks and that the EMPLOYEE file consists of rE = 6000 records stored in bE = 2000
disk blocks. It is advantageous to read as many blocks as possible at a time into
memory from the file whose records are used for the outer loop (that is, nB − 2
blocks). The algorithm can then read one block at a time for the inner-loop file and
use its records to probe (that is, search) the outer-loop blocks that are currently in
main memory for matching records. This reduces the total number of block
accesses. An extra buffer in main memory is needed to contain the resulting records
after they are joined, and the contents of this result buffer can be appended to the
result file—the disk file that will contain the join result—whenever it is filled. This
result buffer block then is reused to hold additional join result records.
(a) sort the tuples in R on attribute A; (* assume R has n tuples (records) *)
sort the tuples in S on attribute B; (* assume S has m tuples (records) *)
set i ← 1, j ← 1;
while (i ≤ n) and ( j ≤ m)
do { if R( i ) [A] > S( j ) [B]
then set j ← j + 1
elseif R( i ) [A] < S( j ) [B]
then set i ← i + 1
else { (* R( i ) [A] = S( j ) [B], so we output a matched tuple *)
output the combined tuple to T;
(* output other tuples that match R(i), if any *)
set l ← j + 1;
while (l ≤ m) and (R( i ) [A] = S( l ) [B])
do { output the combined tuple to T;
set l ← l + 1
}
(* output other tuples that match S(j), if any *)
set k ← i + 1;
while (k ≤ n) and (R(k ) [A] = S( j ) [B])
do { output the combined tuple to T;
set k ← k + 1
}
set i ← k, j ← l
}
}
(b) create a tuple t[<attribute list>] in T′ for each tuple t in R;
(* T′ contains the projection results before duplicate elimination *)
if <attribute list> includes a key of R
then T ← T′
else { sort the tuples in T′;
set i ← 1, j ← 2;
while i ≤ n
do { output the tuple T′[ i ] to T;
while T′[ i ] = T′[ j ] and j ≤ n do j ← j + 1; (* eliminate duplicates *)
i ← j; j ← i + 1
}
}
(* T contains the projection result after duplicate elimination *) (continues)
Figure 3
Implementing JOIN, PROJECT, UNION, INTERSECTION, and SET DIFFERENCE by
using sort-merge, where R has n tuples and S has m tuples. (a) Implementing the
operation T ← R ⋈A=B S. (b) Implementing the operation T ← π<attribute list>(R).
(c) sort the tuples in R and S using the same unique sort attributes;
set i ← 1, j ← 1;
while (i ≤ n) and (j ≤ m)
do { if R( i ) > S( j )
then { output S( j ) to T;
set j ← j + 1
}
elseif R( i ) < S( j )
then { output R( i ) to T;
set i ← i + 1
}
else set j ← j + 1 (* R(i )=S ( j ) , so we skip one of the duplicate tuples *)
}
if (i ≤ n) then add tuples R( i ) to R(n) to T;
if (j ≤ m) then add tuples S( j ) to S(m) to T;
(d) sort the tuples in R and S using the same unique sort attributes;
set i ← 1, j ← 1;
while ( i ≤ n) and ( j ≤ m)
do { if R( i ) > S( j )
then set j ← j + 1
elseif R( i ) < S( j )
then set i ← i + 1
else { output R( i ) to T; (* R( i ) = S( j ), so we output the tuple *)
set i ← i + 1, j ← j + 1
}
}
(e) sort the tuples in R and S using the same unique sort attributes;
set i ← 1, j ← 1;
while (i ≤ n) and ( j ≤ m)
do { if R( i ) > S(j)
then set j ← j + 1
elseif R(i) < S( j )
then { output R( i ) to T; (* R( i ) has no matching S( j ) , so output R( i ) *)
set i ← i + 1
}
else set i ← i + 1, j ← j + 1
}
if (i ≤ n) then add tuples R( i ) to R(n ) to T;
Figure 3 (continued)
Implementing JOIN, PROJECT, UNION, INTERSECTION, and SET DIFFERENCE by using
sort-merge, where R has n tuples and S has m tuples. (c) Implementing the operation T ← R ∪
S. (d) Implementing the operation T ← R ∩ S. (e) Implementing the operation T ← R – S.
In the nested-loop join, it makes a difference which file is chosen for the outer loop
and which for the inner loop. If EMPLOYEE is used for the outer loop, each block of
EMPLOYEE is read once, and the entire DEPARTMENT file (each of its blocks) is read
once for each time we read in (nB – 2) blocks of the EMPLOYEE file. We get the follow-
ing formulas for the number of disk blocks that are read from disk to main memory:
Total number of blocks accessed (read) for outer-loop file = bE
Number of times (nB − 2) blocks of outer file are loaded into main memory
= ⎡bE/(nB – 2)⎤
Total number of blocks accessed (read) for inner-loop file = bD * ⎡bE/(nB – 2)⎤
Hence, we get the following total number of block read accesses:
bE + ( ⎡bE/(nB – 2)⎤ * bD) = 2000 + ( ⎡(2000/5)⎤ * 10) = 6000 block accesses
On the other hand, if we use the DEPARTMENT records in the outer loop, by symme-
try we get the following total number of block accesses:
bD + ( ⎡bD/(nB – 2)⎤ * bE) = 10 + ( ⎡(10/5)⎤ * 2000) = 4010 block accesses
The join algorithm uses a buffer to hold the joined records of the result file. Once
the buffer is filled, it is written to disk and its contents are appended to the result
file, and then refilled with join result records.10
If the result file of the join operation has bRES disk blocks, each block is written once
to disk, so an additional bRES block accesses (writes) should be added to the preced-
ing formulas in order to estimate the total cost of the join operation. The same
holds for the formulas developed later for other join algorithms. As this example
shows, it is advantageous to use the file with fewer blocks as the outer-loop file in the
nested-loop join.
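The block-access formulas above are easy to check numerically. The short sketch below simply evaluates them for the sample figures used in the text (bE = 2000, bD = 10, nB = 7) and confirms that using the smaller DEPARTMENT file in the outer loop is cheaper.

import math

def nested_loop_block_accesses(b_outer, b_inner, nB):
    # b_outer + ceil(b_outer / (nB - 2)) * b_inner, result-file writes excluded.
    return b_outer + math.ceil(b_outer / (nB - 2)) * b_inner

bE, bD, nB = 2000, 10, 7
print(nested_loop_block_accesses(bE, bD, nB))   # EMPLOYEE as outer-loop file: 6000
print(nested_loop_block_accesses(bD, bE, nB))   # DEPARTMENT as outer-loop file: 4010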
How the Join Selection Factor Affects Join Performance. Another factor
that affects the performance of a join, particularly the single-loop method J2, is the
fraction of records in one file that will be joined with records in the other file. We
call this the join selection factor11 of a file with respect to an equijoin condition
with another file. This factor depends on the particular equijoin condition between
the two files. To illustrate this, consider the operation OP7, which joins each
DEPARTMENT record with the EMPLOYEE record for the manager of that depart-
ment. Here, each DEPARTMENT record (there are 50 such records in our example)
will be joined with a single EMPLOYEE record, but many EMPLOYEE records (the
5,950 of them that do not manage a department) will not be joined with any record
from DEPARTMENT.
Suppose that secondary indexes exist on both the attributes Ssn of EMPLOYEE and
Mgr_ssn of DEPARTMENT, with the number of index levels xSsn = 4 and xMgr_ssn= 2,
10If we reserve two buffers for the result file, double buffering can be used to speed the algorithm.
11This is different from the join selectivity, which we will discuss in Section 8.
respectively. We have two options for implementing method J2. The first retrieves
each EMPLOYEE record and then uses the index on Mgr_ssn of DEPARTMENT to find
a matching DEPARTMENT record. In this case, no matching record will be found for
employees who do not manage a department. The number of block accesses for this
case is approximately:
bE + (rE * (xMgr_ssn + 1)) = 2000 + (6000 * 3) = 20,000 block accesses
The second option retrieves each DEPARTMENT record and then uses the index on
Ssn of EMPLOYEE to find a matching manager EMPLOYEE record. In this case, every
DEPARTMENT record will have one matching EMPLOYEE record. The number of
block accesses for this case is approximately:
bD + (rD * (xSsn + 1)) = 10 + (50 * 5) = 260 block accesses
The second option is more efficient because the join selection factor of
DEPARTMENT with respect to the join condition Ssn = Mgr_ssn is 1 (every record in
DEPARTMENT will be joined), whereas the join selection factor of EMPLOYEE with
respect to the same join condition is (50/6000), or 0.008 (only 0.8 percent of the
records in EMPLOYEE will be joined). For method J2, either the smaller file or the
file that has a match for every record (that is, the file with the high join selection fac-
tor) should be used in the (single) join loop. It is also possible to create an index
specifically for performing the join operation if one does not already exist.
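The two J2 cost estimates can be reproduced directly from the pattern b + r * (x + 1); the sketch below just plugs in the sample statistics from the text.

def single_loop_join_cost(b_outer, r_outer, x_inner_index):
    # b + r * (x + 1): read the outer file once, then one index probe per record.
    return b_outer + r_outer * (x_inner_index + 1)

bE, rE, bD, rD = 2000, 6000, 10, 50
x_mgr_ssn, x_ssn = 2, 4
print(single_loop_join_cost(bE, rE, x_mgr_ssn))   # loop over EMPLOYEE: 20000
print(single_loop_join_cost(bD, rD, x_ssn))       # loop over DEPARTMENT: 260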
The sort-merge join J3 is quite efficient if both files are already sorted by their join
attribute. Only a single pass is made through each file. Hence, the number of blocks
accessed is equal to the sum of the numbers of blocks in both files. For this method,
both OP6 and OP7 would need bE + bD = 2000 + 10 = 2010 block accesses. However,
both files are required to be ordered by the join attributes; if one or both are not, a
sorted copy of each file must be created specifically for performing the join opera-
tion. If we roughly estimate the cost of sorting an external file by (b log2b) block
accesses, and if both files need to be sorted, the total cost of a sort-merge join can be
estimated by (bE + bD + bE log2bE + bD log2bD).12
General Case for Partition-Hash Join. The hash-join method J4 is also quite
efficient. In this case only a single pass is made through each file, whether or not the
files are ordered. If the hash table for the smaller of the two files can be kept entirely
in main memory after hashing (partitioning) on its join attribute, the implementa-
tion is straightforward. If, however, the partitions of both files must be stored on
disk, the method becomes more complex, and a number of variations to improve
the efficiency have been proposed. We discuss two techniques: the general case of
partition-hash join and a variation called hybrid hash-join algorithm, which has been
shown to be quite efficient.
In the general case of partition-hash join, each file is first partitioned into M parti-
tions using the same partitioning hash function on the join attributes. Then, each
12We can use the more accurate formulas from Section 2 if we know the number of available buffers for
sorting.
pair of corresponding partitions is joined. For example, suppose we are joining rela-
tions R and S on the join attributes R.A and S.B:
R ⋈A=B S
In the partitioning phase, R is partitioned into the M partitions R1, R2, ..., RM, and
S into the M partitions S1, S2, ..., SM. The property of each pair of corresponding
partitions Ri, Si with respect to the join operation is that records in Ri only need to be
joined with records in Si, and vice versa. This property is ensured by using the same
hash function to partition both files on their join attributes—attribute A for R and
attribute B for S. The minimum number of in-memory buffers needed for the
partitioning phase is M + 1. Each of the files R and S is partitioned separately.
During partitioning of a file, M in-memory buffers are allocated to store the records
that hash to each partition, and one additional buffer is needed to hold one block at
a time of the input file being partitioned. Whenever the in-memory buffer for a par-
tition gets filled, its contents are appended to a disk subfile that stores the partition.
The partitioning phase has two iterations. After the first iteration, the first file R is
partitioned into the subfiles R1, R2, ..., RM, where all the records that hashed to the
same buffer are in the same partition. After the second iteration, the second file S is
similarly partitioned.
In the second phase, called the joining or probing phase, M iterations are needed.
During iteration i, two corresponding partitions Ri and Si are joined. The minimum
number of buffers needed for iteration i is the number of blocks in the smaller of
the two partitions, say Ri, plus two additional buffers. If we use a nested-loop join
during iteration i, the records from the smaller of the two partitions Ri are copied
into memory buffers; then all blocks from the other partition Si are read—one at a
time—and each record is used to probe (that is, search) partition Ri for matching
record(s). Any matching records are joined and written into the result file. To
improve the efficiency of in-memory probing, it is common to use an in-memory
hash table for storing the records in partition Ri by using a different hash function
from the partitioning hash function.13
We can approximate the cost of this partition hash-join as 3 * (bR + bS) + bRES for
our example, since each record is read once and written back to disk once during the
partitioning phase. During the joining (probing) phase, each record is read a second
time to perform the join. The main difficulty of this algorithm is to ensure that the
partitioning hash function is uniform—that is, that the partitions are nearly equal
in size. If the partitioning function is skewed (nonuniform), then some partitions
may be too large to fit in the available memory space for the second joining phase.
Notice that if the available in-memory buffer space nB > (bR + 2), where bR is the
number of blocks for the smaller of the two files being joined, say R, then there is no
reason to do partitioning since in this case the join can be performed entirely in
memory using some variation of the nested-loop join based on hashing and probing.
13If the hash function used for partitioning is used again, all records in a partition will hash to the same
bucket again.
For illustration, assume we are performing the join operation OP6, repeated below:
OP6: EMPLOYEE ⋈Dno=Dnumber DEPARTMENT
In this example, the smaller file is the DEPARTMENT file; hence, if the number of
available memory buffers nB > (bD + 2), the whole DEPARTMENT file can be read
into main memory and organized into a hash table on the join attribute. Each
EMPLOYEE block is then read into a buffer, and each EMPLOYEE record in the buffer
is hashed on its join attribute and is used to probe the corresponding in-memory
bucket in the DEPARTMENT hash table. If a matching record is found, the records
are joined, and the result record(s) are written to the result buffer and eventually to
the result file on disk. The cost in terms of block accesses is hence (bD + bE), plus
bRES—the cost of writing the result file.
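A compact sketch of partition-hash join under simplifying assumptions: both files are partitioned in memory with the same hash function, and each pair of corresponding partitions is then joined by building a small hash table on the partition from the smaller file and probing it with the other. A real implementation would write the partitions to disk subfiles and handle partitions that do not fit in memory; the sample data is made up.

def partition(records, key, M):
    # Partitioning phase: hash each record on its join attribute into one of M partitions.
    parts = [[] for _ in range(M)]
    for rec in records:
        parts[hash(rec[key]) % M].append(rec)
    return parts

def partition_hash_join(R, S, a, b, M=4):
    R_parts, S_parts = partition(R, a, M), partition(S, b, M)
    result = []
    for Ri, Si in zip(R_parts, S_parts):       # probing phase: M iterations
        table = {}
        for r in Ri:                           # build a hash table on the R partition
            table.setdefault(r[a], []).append(r)
        for s in Si:                           # probe with the corresponding S partition
            for r in table.get(s[b], []):
                result.append({**r, **s})
    return result

DEPARTMENT = [{"Dnumber": d, "Dname": "D" + str(d)} for d in range(1, 6)]
EMPLOYEE = [{"Ssn": str(i), "Dno": 1 + i % 5} for i in range(10)]
print(len(partition_hash_join(DEPARTMENT, EMPLOYEE, "Dnumber", "Dno")))   # 10 joined records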
Hybrid Hash-Join. The hybrid hash-join algorithm is a variation of partition
hash-join, where the joining phase for one of the partitions is included in the
partitioning phase. To illustrate this, let us assume that the size of a memory buffer
is one disk block; that nB such buffers are available; and that the partitioning hash
function used is h(K) = K mod M, so that M partitions are being created, where M
< nB. For illustration, assume we are performing the join operation OP6. In the first
pass of the partitioning phase, when the hybrid hash-join algorithm is partitioning
the smaller of the two files (DEPARTMENT in OP6), the algorithm divides the buffer
space among the M partitions such that all the blocks of the first partition of
DEPARTMENT completely reside in main memory. For each of the other partitions,
only a single in-memory buffer—whose size is one disk block—is allocated; the
remainder of the partition is written to disk as in the regular partition-hash join.
Hence, at the end of the first pass of the partitioning phase, the first partition of
DEPARTMENT resides wholly in main memory, whereas each of the other partitions
of DEPARTMENT resides in a disk subfile.
For the second pass of the partitioning phase, the records of the second file being
joined—the larger file, EMPLOYEE in OP6—are being partitioned. If a record
hashes to the first partition, it is joined with the matching record in DEPARTMENT
and the joined records are written to the result buffer (and eventually to disk). If an
EMPLOYEE record hashes to a partition other than the first, it is partitioned nor-
mally and stored to disk. Hence, at the end of the second pass of the partitioning
phase, all records that hash to the first partition have been joined. At this point,
there are M − 1 pairs of partitions on disk. Therefore, during the second joining or
probing phase, M − 1 iterations are needed instead of M. The goal is to join as many
records as possible during the partitioning phase, thereby saving the cost of storing
those records on disk and then rereading them a second time during the joining phase.
4 Algorithms for PROJECT and Set Operations
A PROJECT operation π<attribute list>(R) is straightforward to implement if
<attribute list> includes a key of relation R, because in this case the result of the
operation will have the same number of tuples as R, but with only the values for the
attributes in <attribute list> in each tuple. If <attribute list> does not include a key of R, duplicate
tuples must be eliminated. This can be done by sorting the result of the operation and
then eliminating duplicate tuples, which appear consecutively after sorting. A sketch
of the algorithm is given in Figure 3(b). Hashing can also be used to eliminate dupli-
cates: as each record is hashed and inserted into a bucket of the hash file in memory,
it is checked against those records already in the bucket; if it is a duplicate, it is not
inserted in the bucket. It is useful to recall here that in SQL queries, the default is not
to eliminate duplicates from the query result; duplicates are eliminated from the
query result only if the keyword DISTINCT is included.
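Both duplicate-elimination strategies mentioned above, sorting and hashing, can be sketched in a few lines; tuples are represented as Python tuples so that they can be sorted and hashed directly, and the sample data is illustrative.

def project_sort(records, positions):
    # Project, sort, then drop duplicates, which appear consecutively after sorting.
    projected = sorted(tuple(rec[p] for p in positions) for rec in records)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result

def project_hash(records, positions):
    # Project and insert into a hash structure, skipping tuples already present.
    seen, result = set(), []
    for rec in records:
        t = tuple(rec[p] for p in positions)
        if t not in seen:
            seen.add(t)
            result.append(t)
    return result

EMPLOYEE = [("Smith", 5, 30000), ("Wong", 5, 40000), ("Smith", 5, 25000)]
print(project_sort(EMPLOYEE, [0, 1]))   # [('Smith', 5), ('Wong', 5)]
print(project_hash(EMPLOYEE, [0, 1]))   # [('Smith', 5), ('Wong', 5)]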
Set operations—UNION, INTERSECTION, SET DIFFERENCE, and CARTESIAN
PRODUCT—are sometimes expensive to implement. In particular, the CARTESIAN
PRODUCT operation R × S is quite expensive because its result includes a record for
each combination of records from R and S. Also, each record in the result includes
all attributes of R and S. If R has n records and j attributes, and S has m records and
k attributes, the result relation for R × S will have n * m records and each record will
have j + k attributes. Hence, it is important to avoid the CARTESIAN PRODUCT
operation and to substitute other operations such as join during query optimization
(see Section 7).
The other three set operations—UNION, INTERSECTION, and SET
DIFFERENCE14—apply only to type-compatible (or union-compatible) relations,
which have the same number of attributes and the same attribute domains. The cus-
tomary way to implement these operations is to use variations of the sort-merge
technique: the two relations are sorted on the same attributes, and, after sorting, a
single scan through each relation is sufficient to produce the result. For example, we
can implement the UNION operation, R ∪ S, by scanning and merging both sorted
files concurrently, and whenever the same tuple exists in both relations, only one is
kept in the merged result. For the INTERSECTION operation, R ∩ S, we keep in the
merged result only those tuples that appear in both sorted relations. Figure 3(c) to
(e) sketches the implementation of these operations by sorting and merging. Some
of the details are not included in these algorithms.
Hashing can also be used to implement UNION, INTERSECTION, and SET DIFFER-
ENCE. One table is first scanned and then partitioned into an in-memory hash table
with buckets, and the records in the other table are then scanned one at a time and
used to probe the appropriate partition. For example, to implement R ∪ S, first hash
(partition) the records of R; then, hash (probe) the records of S, but do not insert
duplicate records in the buckets. To implement R ∩ S, first partition the records of
R to the hash file. Then, while hashing each record of S, probe to check if an identi-
cal record from R is found in the bucket, and if so add the record to the result file. To
implement R – S, first hash the records of R to the hash file buckets. While hashing
(probing) each record of S, if an identical record is found in the bucket, remove that
record from the bucket.
14SET DIFFERENCE is called EXCEPT in SQL.
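A minimal sketch of the hashing-based UNION, INTERSECTION, and SET DIFFERENCE just described; a Python set stands in for the in-memory hash file with its buckets, and the input lists are assumed to be duplicate-free, type-compatible tuples.

def union(R, S):
    buckets = set(R)               # hash (partition) the records of R
    result = list(buckets)
    for s in S:                    # probe with each record of S; skip duplicates
        if s not in buckets:
            buckets.add(s)
            result.append(s)
    return result

def intersection(R, S):
    buckets = set(R)
    return [s for s in S if s in buckets]   # keep S records with an identical R record

def difference(R, S):              # R - S
    buckets = set(R)
    for s in S:
        buckets.discard(s)         # remove any identical record of R from its bucket
    return list(buckets)

R = [(1, "a"), (2, "b"), (3, "c")]
S = [(2, "b"), (4, "d")]
print(sorted(union(R, S)))         # [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
print(sorted(intersection(R, S)))  # [(2, 'b')]
print(sorted(difference(R, S)))    # [(1, 'a'), (3, 'c')]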
In SQL, there are two variations of these set operations. The operations UNION,
INTERSECT, and EXCEPT (the SQL keywords for the UNION, INTERSECTION, and
SET DIFFERENCE operations, respectively) apply to traditional sets, where no
duplicate records exist in the result. The
operations UNION ALL, INTERSECT ALL, and EXCEPT ALL apply to multisets (or
bags), and duplicates are fully considered. Variations of the above algorithms can be
used for the multiset operations in SQL. We leave these as an exercise for the reader.
5 Implementing Aggregate Operations
and OUTER JOINs
5.1 Implementing Aggregate Operations
The aggregate operators (MIN, MAX, COUNT, AVERAGE, SUM), when applied to an
entire table, can be computed by a table scan or by using an appropriate index, if
available. For example, consider the following SQL query:
SELECT MAX(Salary)
FROM EMPLOYEE;
If an (ascending) B+-tree index on Salary exists for the EMPLOYEE relation, then the
optimizer can decide on using the Salary index to search for the largest Salary value
in the index by following the rightmost pointer in each index node from the root to
the rightmost leaf. That node would include the largest Salary value as its last entry.
In most cases, this would be more efficient than a full table scan of EMPLOYEE, since
no actual records need to be retrieved. The MIN function can be handled in a similar
manner, except that the leftmost pointer in the index is followed from the root to
the leftmost leaf. That node would include the smallest Salary value as its first entry.
The index could also be used for the AVERAGE and SUM aggregate functions, but
only if it is a dense index—that is, if there is an index entry for every record in the
main file. In this case, the associated computation would be applied to the values in
the index. For a nondense index, the actual number of records associated with each
index value must be used for a correct computation. This can be done if the number
of records associated with each value in the index is stored in each index entry. For the
COUNT aggregate function, the number of values can also be computed from the
index in a similar manner. If a COUNT(*) function is applied to a whole relation, the
number of records currently in the relation is typically stored in the catalog, and
so the result can be retrieved directly from the catalog.
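A toy sketch of answering MAX, MIN, COUNT(*), and AVERAGE without retrieving any records, given a dense ordered index (here simply a sorted list of (value, record id) pairs) and a catalog entry holding the record count; both structures are hypothetical stand-ins.

salary_index = sorted([(30000, "r1"), (25000, "r2"), (55000, "r3"), (40000, "r4")])
catalog = {"EMPLOYEE": {"num_records": 4}}

max_salary = salary_index[-1][0]    # rightmost entry of the ordered index: 55000
min_salary = salary_index[0][0]     # leftmost entry of the ordered index: 25000
count_star = catalog["EMPLOYEE"]["num_records"]        # COUNT(*) from the catalog: 4
sum_salary = sum(value for value, _ in salary_index)   # valid only because the index is dense
avg_salary = sum_salary / count_star                   # 37500.0
print(max_salary, min_salary, count_star, avg_salary)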
When a GROUP BY clause is used in a query, the aggregate operator must be applied
separately to each group of tuples as partitioned by the grouping attribute. Hence,
the table must first be partitioned into subsets of tuples, where each partition
(group) has the same value for the grouping attributes. In this case, the computa-
tion is more complex. Consider the following query:
SELECT Dno, AVG(Salary)
FROM EMPLOYEE
GROUP BY Dno;
The usual technique for such queries is to first use either sorting or hashing on the
grouping attributes to partition the file into the appropriate groups. Then the algo-
rithm computes the aggregate function for the tuples in each group, which have the
same grouping attribute(s) value. In the sample query, the set of EMPLOYEE tuples
for each department number would be grouped together in a partition and the aver-
age salary computed for each group.
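For the sample query, the hashing-based grouping might look like the following sketch; a dictionary keyed on the grouping attribute Dno stands in for the hash partitioning, and the EMPLOYEE data is made up.

EMPLOYEE = [
    {"Ssn": "1", "Dno": 5, "Salary": 30000},
    {"Ssn": "2", "Dno": 5, "Salary": 40000},
    {"Ssn": "3", "Dno": 4, "Salary": 25000},
]

groups = {}                                   # Dno -> list of salaries in that group
for emp in EMPLOYEE:                          # partition (hash) on the grouping attribute
    groups.setdefault(emp["Dno"], []).append(emp["Salary"])

result = {dno: sum(salaries) / len(salaries) for dno, salaries in groups.items()}
print(result)                                 # {5: 35000.0, 4: 25000.0}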
Notice that if a clustering index exists on the grouping attribute(s), then the
records are already partitioned (grouped) into the appropriate subsets. In this case,
it is only necessary to apply the computation to each group.
5.2 Implementing OUTER JOINs
The outer join operation has three variations: left outer join, right outer join, and full
outer join. These operations can be specified in SQL. The following is an example of
a left outer join operation in SQL:
SELECT Lname, Fname, Dname
FROM (EMPLOYEE LEFT OUTER JOIN DEPARTMENT ON Dno=Dnumber);
The result of this query is a table of employee names and their associated depart-
ments. It is similar to a regular (inner) join result, with the exception that if an
EMPLOYEE tuple (a tuple in the left relation) does not have an associated department,
the employee’s name will still appear in the resulting table, but the department
name would be NULL for such tuples in the query result.
Outer join can be computed by modifying one of the join algorithms, such as
nested-loop join or single-loop join. For example, to compute a left outer join, we
use the left relation as the outer loop or single-loop because every tuple in the left
relation must appear in the result. If there are matching tuples in the other relation,
the joined tuples are produced and saved in the result. However, if no matching
tuple is found, the tuple is still included in the result but is padded with NULL
value(s). The sort-merge and hash-join algorithms can also be extended to compute
outer joins.
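A minimal sketch of a left outer join computed in the single-loop style described above: every tuple of the left relation is emitted, padded with None (standing in for NULL) when no matching tuple is found. The helper and sample data are illustrative only.

def left_outer_join(left, right, left_key, right_key, right_attr):
    lookup = {}                                # single-loop style lookup on the right relation
    for r in right:
        lookup.setdefault(r[right_key], []).append(r)
    result = []
    for l in left:
        matches = lookup.get(l[left_key], [])
        if matches:
            for r in matches:
                result.append((l["Lname"], l["Fname"], r[right_attr]))
        else:
            result.append((l["Lname"], l["Fname"], None))   # pad with NULL
    return result

EMPLOYEE = [{"Lname": "Smith", "Fname": "John", "Dno": 5},
            {"Lname": "Borg", "Fname": "James", "Dno": None}]
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"}]
print(left_outer_join(EMPLOYEE, DEPARTMENT, "Dno", "Dnumber", "Dname"))
# [('Smith', 'John', 'Research'), ('Borg', 'James', None)]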
Theoretically, outer join can also be computed by executing a combination of rela-
tional algebra operators. For example, the left outer join operation shown above is
equivalent to the following sequence of relational operations:
1. Compute the (inner) JOIN of the EMPLOYEE and DEPARTMENT tables.
TEMP1 ← πLname, Fname, Dname (EMPLOYEE ⋈Dno=Dnumber DEPARTMENT)
2. Find the EMPLOYEE tuples that do not appear in the (inner) JOIN result.
TEMP2 ← πLname, Fname (EMPLOYEE) – πLname, Fname (TEMP1)
3. Pad each tuple in TEMP2 with a NULL Dname field.
TEMP2 ← TEMP2 × NULL
4. Apply the UNION operation to TEMP1, TEMP2 to produce the LEFT OUTER
JOIN result.
RESULT ← TEMP1 ∪ TEMP2
The cost of the outer join as computed above would be the sum of the costs of the
associated steps (inner join, projections, set difference, and union). However, note
that step 3 can be done as the temporary relation is being constructed in step 2; that
is, we can simply pad each resulting tuple with a NULL. In addition, in step 4, we
know that the two operands of the union are disjoint (no common tuples), so there
is no need for duplicate elimination.
6 Combining Operations Using Pipelining
A query specified in SQL will typically be translated into a relational algebra expres-
sion that is a sequence of relational operations. If we execute a single operation at a
time, we must generate temporary files on disk to hold the results of these intermediate
operations, creating excessive overhead. Generating and storing large tempo-
rary files on disk is time-consuming and can be unnecessary in many cases, since
these files will immediately be used as input to the next operation. To reduce the
number of temporary files, it is common to generate query execution code that cor-
responds to algorithms for combinations of operations in a query.
For example, rather than being implemented separately, a JOIN can be combined
with two SELECT operations on the input files and a final PROJECT operation on
the resulting file; all this is implemented by one algorithm with two input files and a
single output file. Rather than creating four temporary files, we apply the algorithm
directly and get just one result file. In Section 7.2, we discuss how heuristic rela-
tional algebra optimization can group operations together for execution. This is
called pipelining or stream-based processing.
It is common to create the query execution code dynamically to implement multiple
operations. The generated code for producing the query combines several algo-
rithms that correspond to individual operations. As the result tuples from one oper-
ation are produced, they are provided as input for subsequent operations. For
example, if a join operation follows two select operations on base relations, the
tuples resulting from each select are provided as input for the join algorithm in a
stream or pipeline as they are produced.
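Python generators give a compact way to illustrate pipelined (stream-based) processing: each operator pulls tuples from its input as they are produced, and no intermediate file is materialized. The operator functions below are illustrative sketches, not any particular system's executor.

def select(records, predicate):
    for rec in records:
        if predicate(rec):
            yield rec                          # each qualifying tuple is streamed onward

def pipelined_join(left_stream, right, left_key, right_key):
    lookup = {}
    for r in right:                            # the smaller relation is materialized once
        lookup.setdefault(r[right_key], []).append(r)
    for l in left_stream:                      # consume the pipelined left input lazily
        for r in lookup.get(l[left_key], []):
            yield {**l, **r}

def project(stream, attrs):
    for rec in stream:
        yield tuple(rec[a] for a in attrs)

EMPLOYEE = [{"Lname": "Smith", "Salary": 30000, "Dno": 5},
            {"Lname": "Borg", "Salary": 55000, "Dno": 1}]
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research"}, {"Dnumber": 1, "Dname": "Headquarters"}]

pipeline = project(
    pipelined_join(select(EMPLOYEE, lambda e: e["Salary"] > 40000),
                   DEPARTMENT, "Dno", "Dnumber"),
    ["Lname", "Dname"])
print(list(pipeline))                          # [('Borg', 'Headquarters')]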
7 Using Heuristics in Query Optimization
In this section we discuss optimization techniques that apply heuristic rules to
modify the internal representation of a query—which is usually in the form of a
query tree or a query graph data structure—to improve its expected performance.
The scanner and parser of an SQL query first generate a data structure that corre-
sponds to an initial query representation, which is then optimized according to
heuristic rules. This leads to an optimized query representation, which corresponds
to the query execution strategy. Following that, a query execution plan is generated
to execute groups of operations based on the access paths available on the files
involved in the query.
One of the main heuristic rules is to apply SELECT and PROJECT operations before
applying the JOIN or other binary operations, because the size of the file resulting
from a binary operation—such as JOIN—is usually a multiplicative function of the
sizes of the input files. The SELECT and PROJECT operations reduce the size of a file
and hence should be applied before a join or other binary operation.
In Section 7.1 we discuss query tree and query graph notations in the context of
relational algebra and calculus. These can be used as the basis for the data structures
that are used for internal representation of queries. A query tree is used to represent
a relational algebra or extended relational algebra expression, whereas a query graph
is used to represent a relational calculus expression. Then in Section 7.2 we show how
heuristic optimization rules are applied to convert an initial query tree into an
equivalent query tree, which represents a different relational algebra expression
that is more efficient to execute but gives the same result as the original tree. We also
discuss the equivalence of various relational algebra expressions. Finally, Section 7.3
discusses the generation of query execution plans.
7.1 Notation for Query Trees and Query Graphs
A query tree is a tree data structure that corresponds to a relational algebra expres-
sion. It represents the input relations of the query as leaf nodes of the tree, and rep-
resents the relational algebra operations as internal nodes. An execution of the
query tree consists of executing an internal node operation whenever its operands
are available and then replacing that internal node by the relation that results from
executing the operation. The order of execution of operations starts at the leaf nodes,
which represents the input database relations for the query, and ends at the root
node, which represents the final operation of the query. The execution terminates
when the root node operation is executed and produces the result relation for the
query.
Figure 4a shows a query tree: For every project located in ‘Stafford’, retrieve the proj-
ect number, the controlling department number, and the department manager’s last
name, address, and birthdate. This query is specified on the COMPANY relational
schema in Figure A.1 and corresponds to the following relational algebra expres-
sion:
πPnumber, Dnum, Lname, Address, Bdate (((σPlocation=‘Stafford’(PROJECT))
⋈Dnum=Dnumber (DEPARTMENT)) ⋈Mgr_ssn=Ssn (EMPLOYEE))
This corresponds to the following SQL query:
Q2: SELECT P.Pnumber, P.Dnum, E.Lname, E.Address, E.Bdate
FROM PROJECT AS P, DEPARTMENT AS D, EMPLOYEE AS E
WHERE P.Dnum=D.Dnumber AND D.Mgr_ssn=E.Ssn AND
P.Plocation= ‘Stafford’;
Figure 4
Two query trees for the query Q2. (a) Query tree corresponding to the relational algebra
expression for Q2. (b) Initial (canonical) query tree for SQL query Q2. (c) Query graph for Q2.
In Figure 4a, the leaf nodes P, D, and E represent the three relations PROJECT,
DEPARTMENT, and EMPLOYEE, respectively, and the internal tree nodes represent
the relational algebra operations of the expression. When this query tree is executed,
the node marked (1) in Figure 4a must begin execution before node (2) because
some resulting tuples of operation (1) must be available before we can begin execut-
ing operation (2). Similarly, node (2) must begin executing and producing results
before node (3) can start execution, and so on.
As we can see, the query tree represents a specific order of operations for executing
a query. A more neutral data structure for representation of a query is the query
graph notation. Figure 4c shows the query graph for query Q2. Relations in the
query are represented by relation nodes, which are displayed as single circles.
Constant values, typically from the query selection conditions, are represented by
constant nodes, which are displayed as double circles or ovals. Selection and join
conditions are represented by the graph edges, as shown in Figure 4c. Finally, the
attributes to be retrieved from each relation are displayed in square brackets above
each relation.
The query graph representation does not indicate an order in which the operations
are to be performed; there is only a single graph corresponding to each query.15 Although
some optimization techniques were based on query graphs, it is now generally
accepted that query trees are preferable because, in practice, the query optimizer
needs to show the order of operations for query execution, which is not possible in
query graphs.
7.2 Heuristic Optimization of Query Trees
In general, many different relational algebra expressions—and hence many different
query trees—can be equivalent; that is, they can represent the same query.16
The query parser will typically generate a standard initial query tree to correspond
to an SQL query, without doing any optimization. For example, for a SELECT-
PROJECT-JOIN query, such as Q2, the initial tree is shown in Figure 4(b). The
CARTESIAN PRODUCT of the relations specified in the FROM clause is first applied;
then the selection and join conditions of the WHERE clause are applied, followed by
the projection on the SELECT clause attributes. Such a canonical query tree repre-
sents a relational algebra expression that is very inefficient if executed directly,
because of the CARTESIAN PRODUCT (×) operations. For example, if the PROJECT,
DEPARTMENT, and EMPLOYEE relations had record sizes of 100, 50, and 150 bytes
and contained 100, 20, and 5,000 tuples, respectively, the result of the CARTESIAN
PRODUCT would contain 10 million tuples of record size 300 bytes each. However,
the initial query tree in Figure 4(b) is in a simple standard form that can be easily
created from the SQL query. It will never be executed. The heuristic query opti-
mizer will transform this initial query tree into an equivalent final query tree that is
efficient to execute.
The optimizer must include rules for equivalence among relational algebra expres-
sions that can be applied to transform the initial tree into the final, optimized query
tree. First we discuss informally how a query tree is transformed by using heuristics,
and then we discuss general transformation rules and show how they can be used in
an algebraic heuristic optimizer.
Example of Transforming a Query. Consider the following query Q on the
database in Figure A.1: Find the last names of employees born after 1957 who work on
a project named ‘Aquarius’. This query can be specified in SQL as follows:
15Hence, a query graph corresponds to a relational calculus expression.
16The same query may also be stated in various ways in a high-level query language such as SQL.
Q: SELECT Lname
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE Pname=‘Aquarius’ AND Pnumber=Pno AND Essn=Ssn
AND Bdate > ‘1957-12-31’;
The initial query tree for Q is shown in Figure 5(a). Executing this tree directly first
creates a very large file containing the CARTESIAN PRODUCT of the entire
EMPLOYEE, WORKS_ON, and PROJECT files. That is why the initial query tree is
never executed, but is transformed into another equivalent tree that is efficient to
execute.
Figure 5
Steps in converting a query tree during heuristic optimization.
(a) Initial (canonical) query tree for SQL query Q.
(b) Moving SELECT operations down the query tree.
(c) Applying the more restrictive SELECT operation first.
(d) Replacing CARTESIAN PRODUCT and SELECT with JOIN operations.
(e) Moving PROJECT operations down the query tree.
This particular query needs only one record from the PROJECT relation—
for the ‘Aquarius’ project—and only the EMPLOYEE records for those whose date of
birth is after ‘1957-12-31’. Figure 5(b) shows an improved query tree that first
applies the SELECT operations to reduce the number of tuples that appear in the
CARTESIAN PRODUCT.
A further improvement is achieved by switching the positions of the EMPLOYEE and
PROJECT relations in the tree, as shown in Figure 5(c). This uses the information
that Pnumber is a key attribute of the PROJECT relation, and hence the SELECT
operation on the PROJECT relation will retrieve a single record only. We can further
improve the query tree by replacing any CARTESIAN PRODUCT operation that is
followed by a join condition with a JOIN operation, as shown in Figure 5(d).
Another improvement is to keep only the attributes needed by subsequent opera-
tions in the intermediate relations, by including PROJECT (π) operations as early as
possible in the query tree, as shown in Figure 5(e). This reduces the attributes
(columns) of the intermediate relations, whereas the SELECT operations reduce the
number of tuples (records).
As the preceding example demonstrates, a query tree can be transformed step by
step into an equivalent query tree that is more efficient to execute. However, we
must make sure that the transformation steps always lead to an equivalent query
tree. To do this, the query optimizer must know which transformation rules preserve
this equivalence. We discuss some of these transformation rules next.
General Transformation Rules for Relational Algebra Operations. There
are many rules for transforming relational algebra operations into equivalent ones.
For query optimization purposes, we are interested in the meaning of the opera-
tions and the resulting relations. Hence, if two relations have the same set of attrib-
utes in a different order but the two relations represent the same information, we
consider the relations to be equivalent. There is an alternative definition of relation
that makes the order of attributes unimportant; we will use this definition here. We
will state some transformation rules that are useful in query optimization, without
proving them:
1. Cascade of σ. A conjunctive selection condition can be broken up into a cas-
cade (that is, a sequence) of individual σ operations:
σc1 AND c2 AND ... AND cn(R) ≡ σc1(σc2(...(σcn(R))...))
2. Commutativity of σ. The σ operation is commutative:
σc1(σc2(R)) ≡ σc2(σc1(R))
3. Cascade of π. In a cascade (sequence) of π operations, all but the last one can
be ignored:
πList1(πList2(...(πListn(R))...)) ≡ πList1(R)
4. Commuting σ with π. If the selection condition c involves only those attrib-
utes A1, ..., An in the projection list, the two operations can be commuted:
πA1, A2, ..., An(σc(R)) ≡ σc(πA1, A2, ..., An(R))
5. Commutativity of ⋈ (and ×). The join operation is commutative, as is the
× operation:
R ⋈c S ≡ S ⋈c R
R × S ≡ S × R
Notice that although the order of attributes may not be the same in the rela-
tions resulting from the two joins (or two Cartesian products), the meaning
is the same because the order of attributes is not important in the alternative
definition of relation.
6. Commuting σ with ⋈ (or ×). If all the attributes in the selection condition
c involve only the attributes of one of the relations being joined—say, R—the
two operations can be commuted as follows:
σc(R ⋈ S) ≡ (σc(R)) ⋈ S
Alternatively, if the selection condition c can be written as (c1 AND c2), where
condition c1 involves only the attributes of R and condition c2 involves only
the attributes of S, the operations commute as follows:
σc(R ⋈ S) ≡ (σc1(R)) ⋈ (σc2(S))
The same rules apply if the ⋈ is replaced by a × operation.
7. Commuting π with ⋈ (or ×). Suppose that the projection list is L = {A1, ...,
An, B1, ..., Bm}, where A1, ..., An are attributes of R and B1, ..., Bm are attrib-
utes of S. If the join condition c involves only attributes in L, the two opera-
tions can be commuted as follows:
πL(R ⋈c S) ≡ (πA1, ..., An(R)) ⋈c (πB1, ..., Bm(S))
If the join condition c contains additional attributes not in L, these must be
added to the projection list, and a final π operation is needed. For example, if
attributes An+1, ..., An+k of R and Bm+1, ..., Bm+p of S are involved in the join
condition c but are not in the projection list L, the operations commute as
follows:
πL(R ⋈c S) ≡ πL((πA1, ..., An, An+1, ..., An+k(R)) ⋈c (πB1, ..., Bm, Bm+1, ..., Bm+p(S)))
For ×, there is no condition c, so the first transformation rule always applies
by replacing ⋈c with ×.
8. Commutativity of set operations. The set operations ∪ and ∩ are commu-
tative but − is not.
9. Associativity of ⋈, ×, ∪, and ∩. These four operations are individually
associative; that is, if θ stands for any one of these four operations (through-
out the expression), we have:
(R θ S) θ T ≡ R θ (S θ T)
10. Commuting σ with set operations. The σ operation commutes with ∪, ∩,
and −. If θ stands for any one of these three operations (throughout the
expression), we have:
σc(R θ S) ≡ (σc(R)) θ (σc(S))
11. The π operation commutes with ∪.
πL(R ∪ S) ≡ (πL(R)) ∪ (πL(S))
12. Converting a (σ, ×) sequence into ⋈. If the condition c of a σ that follows a
× corresponds to a join condition, convert the (σ, ×) sequence into a ⋈ as
follows:
(σc(R × S)) ≡ (R ⋈c S)
There are other possible transformations. For example, a selection or join condition
c can be converted into an equivalent condition by using the following standard
rules from Boolean algebra (DeMorgan’s laws):
NOT (c1 AND c2) ≡ (NOT c1) OR (NOT c2)
NOT (c1 OR c2) ≡ (NOT c1) AND (NOT c2)
We discuss next how transformations can be used in heuristic optimization.
Outline of a Heuristic Algebraic Optimization Algorithm. We can now out-
line the steps of an algorithm that utilizes some of the above rules to transform an
initial query tree into a final tree that is more efficient to execute (in most cases).
The algorithm will lead to transformations similar to those discussed in our exam-
ple in Figure 5. The steps of the algorithm are as follows:
1. Using Rule 1, break up any SELECT operations with conjunctive conditions
into a cascade of SELECT operations. This permits a greater degree of free-
dom in moving SELECT operations down different branches of the tree.
2. Using Rules 2, 4, 6, and 10 concerning the commutativity of SELECT with
other operations, move each SELECT operation as far down the query tree as
is permitted by the attributes involved in the select condition. If the condi-
tion involves attributes from only one table, which means that it represents a
selection condition, the operation is moved all the way to the leaf node that
represents this table. If the condition involves attributes from two tables,
which means that it represents a join condition, the condition is moved to a
location down the tree after the two tables are combined.
3. Using Rules 5 and 9 concerning commutativity and associativity of binary
operations, rearrange the leaf nodes of the tree using the following criteria.
First, position the leaf node relations with the most restrictive SELECT oper-
ations so they are executed first in the query tree representation. The defini-
tion of most restrictive SELECT can mean either the one that produces a
relation with the fewest tuples or the one that produces a relation with the
smallest absolute size.17 Another
possibility is to define the most restrictive SELECT as the one with the small-
est selectivity; this is more practical because estimates of selectivities are
often available in the DBMS catalog. Second, make sure that the ordering of
leaf nodes does not cause CARTESIAN PRODUCT operations; for example, if
17Either definition can be used, since these rules are heuristic.
the two relations with the most restrictive SELECT do not have a direct join
condition between them, it may be desirable to change the order of leaf
nodes to avoid Cartesian products.18
4. Using Rule 12, combine a CARTESIAN PRODUCT operation with a subse-
quent SELECT operation in the tree into a JOIN operation, if the condition
represents a join condition.
5. Using Rules 3, 4, 7, and 11 concerning the cascading of PROJECT and the
commuting of PROJECT with other operations, break down and move lists
of projection attributes down the tree as far as possible by creating new
PROJECT operations as needed. Only those attributes needed in the query
result and in subsequent operations in the query tree should be kept after
each PROJECT operation.
6. Identify subtrees that represent groups of operations that can be executed by
a single algorithm.
In our example, Figure 5(b) shows the tree in Figure 5(a) after applying steps 1 and
2 of the algorithm; Figure 5(c) shows the tree after step 3; Figure 5(d) after step 4;
and Figure 5(e) after step 5. In step 6 we may group together the operations in the
subtree whose root is the operation πEssn into a single algorithm. We may also group
the remaining operations into another subtree, where the tuples resulting from the
first algorithm replace the subtree whose root is the operation πEssn, because the
first grouping means that this subtree is executed first.
Summary of Heuristics for Algebraic Optimization. The main heuristic is to
apply first the operations that reduce the size of intermediate results. This includes
performing as early as possible SELECT operations to reduce the number of tuples
and PROJECT operations to reduce the number of attributes—by moving SELECT
and PROJECT operations as far down the tree as possible. Additionally, the SELECT
and JOIN operations that are most restrictive—that is, result in relations with the
fewest tuples or with the smallest absolute size—should be executed before other
similar operations. The latter rule is accomplished through reordering the leaf
nodes of the tree among themselves while avoiding Cartesian products, and adjust-
ing the rest of the tree appropriately.
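To make the tree transformations concrete, the sketch below represents a query tree with a few node classes and implements one rewrite used by the heuristic algorithm, Rule 12 (converting a σ over a × into a join). The node classes and the crude join-condition test are illustrative assumptions, not any particular optimizer's data structures.

from dataclasses import dataclass
from typing import Union

@dataclass
class Relation:
    name: str

@dataclass
class Select:
    condition: str
    child: "Node"

@dataclass
class Cartesian:
    left: "Node"
    right: "Node"

@dataclass
class Join:
    condition: str
    left: "Node"
    right: "Node"

Node = Union[Relation, Select, Cartesian, Join]

def is_join_condition(condition: str) -> bool:
    # Crude illustrative test: an equality whose right side is an attribute, not a constant.
    lhs, _, rhs = condition.partition("=")
    return bool(rhs) and not rhs.strip("'").isdigit() and "'" not in rhs

def apply_rule_12(node: Node) -> Node:
    # Convert a (sigma, x) sequence into a join, recursing over the whole tree.
    if isinstance(node, Select):
        child = apply_rule_12(node.child)
        if isinstance(child, Cartesian) and is_join_condition(node.condition):
            return Join(node.condition, child.left, child.right)
        return Select(node.condition, child)
    if isinstance(node, Cartesian):
        return Cartesian(apply_rule_12(node.left), apply_rule_12(node.right))
    if isinstance(node, Join):
        return Join(node.condition, apply_rule_12(node.left), apply_rule_12(node.right))
    return node

tree = Select("Dnum=Dnumber", Cartesian(Relation("PROJECT"), Relation("DEPARTMENT")))
print(apply_rule_12(tree))
# Join(condition='Dnum=Dnumber', left=Relation(name='PROJECT'), right=Relation(name='DEPARTMENT'))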
7.3 Converting Query Trees into Query Execution Plans
An execution plan for a relational algebra expression represented as a query tree
includes information about the access methods available for each relation as well as
the algorithms to be used in computing the relational operators represented in the
tree. As a simple example, consider query Q1, whose corresponding relational alge-
bra expression is
πFname, Lname, Address(σDname=‘Research’(DEPARTMENT) ⋈Dnumber=Dno EMPLOYEE)
18Note that a CARTESIAN PRODUCT is acceptable in some cases—for example, if each relation has
only a single tuple because each had a previous select condition on a key field.
Figure 6
A query tree for query Q1.
The query tree is shown in Figure 6. To convert this into an execution plan, the opti-
mizer might choose an index search for the SELECT operation on DEPARTMENT
(assuming one exists), a single-loop join algorithm that loops over the records in the
result of the SELECT operation on DEPARTMENT for the join operation (assuming
an index exists on the Dno attribute of EMPLOYEE), and a scan of the JOIN result for
input to the PROJECT operator. Additionally, the approach taken for executing the
query may specify a materialized or a pipelined evaluation, although in general a
pipelined evaluation is preferred whenever feasible.
With materialized evaluation, the result of an operation is stored as a temporary
relation (that is, the result is physically materialized). For instance, the JOIN opera-
tion can be computed and the entire result stored as a temporary relation, which is
then read as input by the algorithm that computes the PROJECT operation, which
would produce the query result table. On the other hand, with pipelined
evaluation, as the resulting tuples of an operation are produced, they are forwarded
directly to the next operation in the query sequence. For example, as the selected
tuples from DEPARTMENT are produced by the SELECT operation, they are placed
in a buffer; the JOIN operation algorithm would then consume the tuples from the
buffer, and those tuples that result from the JOIN operation are pipelined to the pro-
jection operation algorithm. The advantage of pipelining is the cost savings in not
having to write the intermediate results to disk and not having to read them back for
the next operation.
8 Using Selectivity and Cost Estimates
in Query Optimization
A query optimizer does not depend solely on heuristic rules; it also estimates and
compares the costs of executing a query using different execution strategies and
algorithms, and it then chooses the strategy with the lowest cost estimate. For this
approach to work, accurate cost estimates are required so that different strategies can
be compared fairly and realistically. In addition, the optimizer must limit the num-
ber of execution strategies to be considered; otherwise, too much time will be spent
making cost estimates for the many possible execution strategies. Hence, this
approach is more suitable for compiled queries where the optimization is done at
compile time and the resulting execution strategy code is stored and executed
directly at runtime. For interpreted queries, where the entire process shown in
Figure 1 occurs at runtime, a full-scale optimization may slow down the response
time. A more elaborate optimization is indicated for compiled queries, whereas a
partial, less time-consuming optimization works best for interpreted queries.
This approach is generally referred to as cost-based query optimization.19 It uses
traditional optimization techniques that search the solution space to a problem for a
solution that minimizes an objective (cost) function. The cost functions used in
query optimization are estimates and not exact cost functions, so the optimization
may select a query execution strategy that is not the optimal (absolute best) one. In
Section 8.1 we discuss the components of query execution cost. In Section 8.2 we
discuss the type of information needed in cost functions. This information is kept
in the DBMS catalog. In Section 8.3 we give examples of cost functions for the
SELECT operation, and in Section 8.4 we discuss cost functions for two-way JOIN
operations. Section 8.5 discusses multiway joins, and Section 8.6 gives an example.
8.1 Cost Components for Query Execution
The cost of executing a query includes the following components:
1. Access cost to secondary storage. This is the cost of transferring (reading
and writing) data blocks between secondary disk storage and main memory
buffers. This is also known as disk I/O (input/output) cost. The cost of search-
ing for records in a disk file depends on the type of access structures on that
file, such as ordering, hashing, and primary or secondary indexes. In addi-
tion, factors such as whether the file blocks are allocated contiguously on the
same disk cylinder or scattered on the disk affect the access cost.
2. Disk storage cost. This is the cost of storing on disk any intermediate files
that are generated by an execution strategy for the query.
3. Computation cost. This is the cost of performing in-memory operations on
the records within the data buffers during query execution. Such operations
include searching for and sorting records, merging records for a join or a sort
operation, and performing computations on field values. This is also known
as CPU (central processing unit) cost.
4. Memory usage cost. This is the cost pertaining to the number of main mem-
ory buffers needed during query execution.
5. Communication cost. This is the cost of shipping the query and its results
from the database site to the site or terminal where the query originated. In
distributed databases, it would also include the cost of transferring tables and
results among various computers during query evaluation.
For large databases, the main emphasis is often on minimizing the access cost to sec-
ondary storage. Simple cost functions ignore other factors and compare different
query execution strategies in terms of the number of block transfers between disk
19This approach was first used in the optimizer of SYSTEM R, an experimental DBMS developed at IBM (Selinger et al. 1979).
and main memory buffers. For smaller databases, where most of the data in the files
involved in the query can be completely stored in memory, the emphasis is on min-
imizing computation cost. In distributed databases, where many sites are involved,
communication cost must be minimized also. It is difficult to include all the cost
components in a (weighted) cost function because of the difficulty of assigning suit-
able weights to the cost components. That is why some cost functions consider a
single factor only—disk access. In the next section we discuss some of the informa-
tion that is needed for formulating cost functions.
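As a concrete, purely illustrative rendering of the weighted cost function mentioned above, the sketch below combines the five components with arbitrary example weights; setting every weight except the disk-access weight to zero reduces it to the single-factor cost functions used in the rest of this section.

# A hedged sketch of a weighted cost function over the five components.
# The weights are illustrative assumptions, not values from the text.
def query_cost(block_accesses, temp_blocks_written, cpu_ops,
               buffers_used, messages_sent,
               w_io=1.0, w_storage=0.0, w_cpu=0.0, w_mem=0.0, w_comm=0.0):
    return (w_io * block_accesses +
            w_storage * temp_blocks_written +
            w_cpu * cpu_ops +
            w_mem * buffers_used +
            w_comm * messages_sent)

# With the default weights only disk access counts, matching the simple
# cost functions used in Sections 8.3 and 8.4.
print(query_cost(block_accesses=2000, temp_blocks_written=0,
                 cpu_ops=0, buffers_used=0, messages_sent=0))   # 2000.0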
8.2 Catalog Information Used in Cost Functions
To estimate the costs of various execution strategies, we must keep track of any
information that is needed for the cost functions. This information may be stored in
the DBMS catalog, where it is accessed by the query optimizer. First, we must know
the size of each file. For a file whose records are all of the same type, the number of
records (tuples) (r), the (average) record size (R), and the number of file blocks (b)
(or close estimates of them) are needed. The blocking factor (bfr) for the file may
also be needed. We must also keep track of the primary file organization for each file.
The records of a file may be unordered, ordered by an attribute with or without a primary or clustering index, or hashed (using static hashing or one of the dynamic hashing methods) on a key attribute. Information is also kept on all
primary, secondary, or clustering indexes and their indexing attributes. The number
of levels (x) of each multilevel index (primary, secondary, or clustering) is needed
for cost functions that estimate the number of block accesses that occur during
query execution. In some cost functions the number of first-level index blocks
(bI1) is needed.
Another important parameter is the number of distinct values (d) of an attribute
and the attribute selectivity (sl), which is the fraction of records satisfying an equal-
ity condition on the attribute. This allows estimation of the selection cardinality (s
= sl*r) of an attribute, which is the average number of records that will satisfy an
equality selection condition on that attribute. For a key attribute, d = r, sl = 1/r and s
= 1. For a nonkey attribute, by making an assumption that the d distinct values are
uniformly distributed among the records, we estimate sl = (1/d) and so s = (r/d).20
Information such as the number of index levels is easy to maintain because it does
not change very often. However, other information may change frequently; for
example, the number of records r in a file changes every time a record is inserted or
deleted. The query optimizer will need reasonably close but not necessarily com-
pletely up-to-the-minute values of these parameters for use in estimating the cost of
various execution strategies.
For a nonkey attribute with d distinct values, it is often the case that the records are
not uniformly distributed among these values. For example, suppose that a com-
pany has 5 departments numbered 1 through 5, and 200 employees who are distributed
20More accurate optimizers store histograms of the distribution of records over the data values for an
attribute.
among the departments as follows: (1, 5), (2, 25), (3, 70), (4, 40), (5, 60). In
such cases, the optimizer can store a histogram that reflects the distribution of
employee records over different departments in a table with the two attributes (Dno,
Selectivity), which would contain the following values for our example: (1, 0.025), (2,
0.125), (3, 0.35), (4, 0.2), (5, 0.3). The selectivity values stored in the histogram can
also be estimates if the employee table changes frequently.
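Both estimation approaches (the uniform-distribution assumption and the stored histogram) are easy to state in code; the sketch below uses the department counts from the example, with illustrative function and variable names.

# Selection-cardinality estimation for Dno, first under the uniform
# assumption and then using the stored histogram from the example.
r = 200          # number of EMPLOYEE records in this example
d_dno = 5        # number of distinct Dno values

# Uniform assumption: sl = 1/d and s = r/d.
sl_uniform = 1 / d_dno
s_uniform = r * sl_uniform            # 40 records per department

# Histogram of (Dno, Selectivity) pairs as stored by the optimizer.
histogram = {1: 0.025, 2: 0.125, 3: 0.35, 4: 0.2, 5: 0.3}

def selection_cardinality(dno):
    # Estimated number of records satisfying Dno = dno.
    return r * histogram[dno]

print(s_uniform)                      # 40.0
print(selection_cardinality(3))       # 70.0 (versus 40 under the uniform assumption)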
In the next two sections we examine how some of these parameters are used in cost
functions for a cost-based query optimizer.
8.3 Examples of Cost Functions for SELECT
We now give cost functions for the selection algorithms S1 to S8 discussed in
Section 3.1 in terms of number of block transfers between memory and disk.
Algorithm S9 involves an intersection of record pointers after they have been
retrieved by some other means, such as algorithm S6, and so the cost function will
be based on the cost for S6. These cost functions are estimates that ignore compu-
tation time, storage cost, and other factors. The cost for method Si is referred to as
CSi block accesses.
■ S1—Linear search (brute force) approach. We search all the file blocks to
retrieve all records satisfying the selection condition; hence, CS1a = b. For an
equality condition on a key attribute, only half the file blocks are searched on
the average before finding the record, so a rough estimate for CS1b = (b/2) if
the record is found; if no record is found that satisfies the condition, CS1b = b.
■ S2—Binary search. This search accesses approximately CS2 = log2b +
⎡(s/bfr)⎤ − 1 file blocks. This reduces to log2b if the equality condition is on a
unique (key) attribute, because s = 1 in this case.
■ S3a—Using a primary index to retrieve a single record. For a primary
index, retrieve one disk block at each index level, plus one disk block from
the data file. Hence, the cost is one more disk block than the number of
index levels: CS3a = x + 1.
■ S3b—Using a hash key to retrieve a single record. For hashing, only one
disk block needs to be accessed in most cases. The cost function is approxi-
mately CS3b = 1 for static hashing or linear hashing, and it is 2 disk block
accesses for extendible hashing.
■ S4—Using an ordering index to retrieve multiple records. If the compari-
son condition is >, >=, <, or <= on a key field with an ordering index,
roughly half the file records will satisfy the condition. This gives a cost func-
tion of CS4 = x + (b/2). This is a very rough estimate, and although it may be
correct on the average, it may be quite inaccurate in individual cases. A more
accurate estimate is possible if the distribution of records is stored in a his-
togram.
■ S5—Using a clustering index to retrieve multiple records. One disk block
is accessed at each index level, which gives the address of the first file disk
block in the cluster. Given an equality condition on the indexing attribute, s
records will satisfy the condition, where s is the selection cardinality of the
indexing attribute. This means that ⎡(s/bfr)⎤ file blocks will be in the cluster
of file blocks that hold all the selected records, giving CS5 = x + ⎡(s/bfr)⎤.
■ S6—Using a secondary (B+-tree) index. For a secondary index on a key
(unique) attribute, the cost is x + 1 disk block accesses. For a secondary index
on a nonkey (nonunique) attribute, s records will satisfy an equality condition,
where s is the selection cardinality of the indexing attribute. However, because
the index is nonclustering, each of the records may reside on a different disk
block, so the (worst case) cost estimate is CS6a = x + 1 + s. The additional 1 is
to account for the disk block that contains the record pointers after the index is
searched. If the comparison condition is >, >=, <, or <= and half the file
records are assumed to satisfy the condition, then (very roughly) half the first-
level index blocks are accessed, plus half the file records via the index. The cost
estimate for this case, approximately, is CS6b = x + (bI1/2) + (r/2). The r/2 fac-
tor can be refined if better selectivity estimates are available through a his-
togram. The latter method CS6b can be very costly.
■ S7—Conjunctive selection. We can use either S1 or one of the methods S2
to S6 discussed above. In the latter case, we use one condition to retrieve the
records and then check in the main memory buffers whether each retrieved
record satisfies the remaining conditions in the conjunction. If multiple
indexes exist, the search of each index can produce a set of record pointers
(record ids) in the main memory buffers. The intersection of the sets of
record pointers (referred to in S9) can be computed in main memory, and
then the resulting records are retrieved based on their record ids.
■ S8—Conjunctive selection using a composite index. Same as S3a, S5, or
S6a, depending on the type of index.
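Several of these estimates translate directly into small helper functions. The sketch below restates CS1, CS2, CS3a, CS5, CS6a, and CS6b in Python; the parameter names follow the symbols b, bfr, x, s, bI1, and r defined in Section 8.2, and the functions are an illustrative restatement rather than code from the text.

from math import ceil, log2

# Illustrative restatement of some SELECT cost functions (block accesses).
# b: file blocks, bfr: blocking factor, x: index levels,
# s: selection cardinality, bI1: first-level index blocks, r: records.

def cs1_linear(b, key_equality=False, found=True):
    # S1: scan all blocks; on average half of them for an equality
    # condition on a key attribute when the record is found.
    return b / 2 if (key_equality and found) else b

def cs2_binary(b, s, bfr):
    # S2: binary search on a file ordered by the selection attribute.
    return log2(b) + ceil(s / bfr) - 1

def cs3a_primary_index(x):
    # S3a: one block per index level plus one data block.
    return x + 1

def cs5_clustering_index(x, s, bfr):
    # S5: index levels plus the cluster of blocks holding the s records.
    return x + ceil(s / bfr)

def cs6a_secondary_index_equality(x, s=1, key=True):
    # S6 (equality): x + 1 for a key attribute; worst case one extra
    # data block per qualifying record for a nonkey attribute.
    return x + 1 if key else x + 1 + s

def cs6b_secondary_index_range(x, bI1, r):
    # S6 (range): roughly half the first-level index blocks plus half
    # the records retrieved individually through the index.
    return x + bI1 / 2 + r / 2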
Example of Using the Cost Functions. In a query optimizer, it is common to
enumerate the various possible strategies for executing a query and to estimate the
costs for different strategies. An optimization technique, such as dynamic program-
ming, may be used to find the optimal (least) cost estimate efficiently, without hav-
ing to consider all possible execution strategies. We do not discuss optimization
algorithms here; rather, we use a simple example to illustrate how cost estimates
may be used. Suppose that the EMPLOYEE file in Figure A.1 has rE = 10,000 records
stored in bE = 2000 disk blocks with blocking factor bfrE = 5 records/block and the
following access paths:
1. A clustering index on Salary, with levels xSalary = 3 and average selection car-
dinality sSalary = 20. (This corresponds to a selectivity of slSalary = 0.002).
2. A secondary index on the key attribute Ssn, with xSsn = 4 (sSsn = 1, slSsn =
0.0001).
3. A secondary index on the nonkey attribute Dno, with xDno = 2 and first-level
index blocks bI1Dno = 4. There are dDno = 125 distinct values for Dno, so the
selectivity of Dno is slDno = (1/dDno) = 0.008, and the selection cardinality is
sDno = (rE * slDno) = (rE/dDno) = 80.
4. A secondary index on Sex, with xSex = 1. There are dSex = 2 values for the Sex
attribute, so the average selection cardinality is sSex = (rE/dSex) = 5000. (Note
that in this case, a histogram giving the percentage of male and female
employees may be useful, unless they are approximately equal.)
We illustrate the use of cost functions with the following examples:
OP1: σSsn=‘123456789’(EMPLOYEE)
OP2: σDno>5(EMPLOYEE)
OP3: σDno=5(EMPLOYEE)
OP4: σDno=5 AND Salary>30000 AND Sex=‘F’(EMPLOYEE)
The cost of the brute force (linear search or file scan) option S1 will be estimated as
CS1a = bE = 2000 (for a selection on a nonkey attribute) or CS1b = (bE/2) = 1000
(average cost for a selection on a key attribute). For OP1 we can use either method
S1 or method S6a; the cost estimate for S6a is CS6a = xSsn + 1 = 4 + 1 = 5, and it is
chosen over method S1, whose average cost is CS1b = 1000. For OP2 we can use
either method S1 (with estimated cost CS1a = 2000) or method S6b (with estimated
cost CS6b = xDno + (bI1Dno/2) + (rE /2) = 2 + (4/2) + (10,000/2) = 5004), so we choose
the linear search approach for OP2. For OP3 we can use either method S1 (with esti-
mated cost CS1a = 2000) or method S6a (with estimated cost CS6a = xDno + sDno = 2
+ 80 = 82), so we choose method S6a.
Finally, consider OP4, which has a conjunctive selection condition. We need to esti-
mate the cost of using any one of the three components of the selection condition to
retrieve the records, plus the linear search approach. The latter gives cost estimate
CS1a = 2000. Using the condition (Dno = 5) first gives the cost estimate CS6a = 82.
Using the condition (Salary > 30,000) first gives a cost estimate CS4 = xSalary + (bE/2)
= 3 + (2000/2) = 1003. Using the condition (Sex = ‘F’) first gives a cost estimate CS6a
= xSex + sSex = 1 + 5000 = 5001. The optimizer would then choose method S6a on
the secondary index on Dno because it has the lowest cost estimate. The condition
(Dno = 5) is used to retrieve the records, and the remaining part of the conjunctive
condition (Salary > 30,000 AND Sex = ‘F’) is checked for each selected record after it
is retrieved into memory. Only the records that satisfy these additional conditions
are included in the result of the operation.
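The optimizer's final step for OP4 amounts to taking the minimum of the four estimates just computed. A small sketch of that comparison, using the numbers quoted above (the option labels are illustrative):

# Choosing an access path for OP4 by comparing the cost estimates above.
op4_options = {
    'linear search (S1)': 2000,
    'secondary index on Dno (S6a)': 82,
    'clustering index on Salary (S4)': 1003,
    'secondary index on Sex (S6a)': 5001,
}
best = min(op4_options, key=op4_options.get)
print(best, op4_options[best])    # secondary index on Dno (S6a) 82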
8.4 Examples of Cost Functions for JOIN
To develop reasonably accurate cost functions for JOIN operations, we need to have
an estimate for the size (number of tuples) of the file that results after the JOIN oper-
ation. This is usually kept as a ratio of the size (number of tuples) of the resulting
join file to the size of the CARTESIAN PRODUCT file, if both are applied to the same
input files, and it is called the join selectivity ( js). If we denote the number of tuples
of a relation R by |R|, we have:
js = |(R ⋈c S)| / |(R × S)| = |(R ⋈c S)| / (|R| * |S|)
If there is no join condition c, then js = 1 and the join is the same as the CARTESIAN
PRODUCT. If no tuples from the relations satisfy the join condition, then js = 0. In
general, 0 ≤ js ≤ 1. For a join where the condition c is an equality comparison R.A =
S.B, we get the following two special cases:
1. If A is a key of R, then |(R ⋈c S)| ≤ |S|, so js ≤ (1/|R|). This is because each
record in file S will be joined with at most one record in file R, since A is a key
of R. A special case of this condition is when attribute B is a foreign key of S
that references the primary key A of R. In addition, if the foreign key B has
the NOT NULL constraint, then js = (1/|R|), and the result file of the join
will contain |S| records.
2. If B is a key of S, then |(R ⋈c S)| ≤ |R|, so js ≤ (1/|S|).
Having an estimate of the join selectivity for commonly occurring join conditions
enables the query optimizer to estimate the size of the resulting file after the join
operation, given the sizes of the two input files, by using the formula |(R ⋈c S)| = js
* |R| * |S|. We can now give some sample approximate cost functions for estimating
the cost of some of the join algorithms given in Section 3.2. The join operations are
of the form:
R ⋈A=B S
where A and B are domain-compatible attributes of R and S, respectively. Assume
that R has bR blocks and that S has bS blocks:
■ J1—Nested-loop join. Suppose that we use R for the outer loop; then we get
the following cost function to estimate the number of block accesses for this
method, assuming three memory buffers. We assume that the blocking factor
for the resulting file is bfrRS and that the join selectivity is known:
CJ1 = bR + (bR * bS) + (( js * |R| * |S|)/bfrRS)
The last part of the formula is the cost of writing the resulting file to disk.
This cost formula can be modified to take into account different numbers of
memory buffers, as presented in Section 3.2. If nB main memory buffers are
available to perform the join, the cost formula becomes:
CJ1 = bR + ( ⎡bR/(nB – 2)⎤ * bS) + ((js * |R| * |S|)/bfrRS)
■ J2—Single-loop join (using an access structure to retrieve the matching
record(s)). If an index exists for the join attribute B of S with index levels xB,
we can retrieve each record s in R and then use the index to retrieve all the
matching records t from S that satisfy t[B] = s[A]. The cost depends on the
type of index. For a secondary index where sB is the selection cardinality for
the join attribute B of S,21 we get:
CJ2a = bR + (|R| * (xB + 1 + sB)) + (( js * |R| * |S|)/bfrRS)
For a clustering index where sB is the selection cardinality of B, we get
CJ2b = bR + (|R| * (xB + (sB/bfrB))) + (( js * |R| * |S|)/bfrRS)
For a primary index, we get
21Selection cardinality was defined as the average number of records that satisfy an equality condition on
an attribute, which is the average number of records that have the same value for the attribute and hence
will be joined to a single record in the other file.
CJ2c = bR + (|R| * (xB + 1)) + ((js * |R| * |S|)/bfrRS)
If a hash key exists for one of the two join attributes—say, B of S—we get
CJ2d = bR + (|R| * h) + ((js * |R| * |S|)/bfrRS)
where h ≥ 1 is the average number of block accesses to retrieve a record,
given its hash key value. Usually, h is estimated to be 1 for static and linear
hashing and 2 for extendible hashing.
■ J3—Sort-merge join. If the files are already sorted on the join attributes, the
cost function for this method is
CJ3a = bR + bS + ((js * |R| * |S|)/bfrRS)
If we must sort the files, the cost of sorting must be added. We can use the
formulas from Section 2 to estimate the sorting cost.
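The JOIN cost formulas can be written down in the same style. The sketch below restates CJ1 (with a configurable number of buffers nB), CJ2a, CJ2c, and CJ3a; it is an illustrative restatement rather than code from the text.

from math import ceil

# Join cost functions in block accesses. bR, bS: blocks of R and S;
# cardR, cardS: |R| and |S|; js: join selectivity; bfrRS: blocking
# factor of the result file. The last term is the cost of writing the result.

def result_write_cost(js, cardR, cardS, bfrRS):
    return (js * cardR * cardS) / bfrRS

def cj1_nested_loop(bR, bS, js, cardR, cardS, bfrRS, nB=3):
    # J1: R is the outer relation; nB - 2 buffers hold blocks of R.
    return bR + ceil(bR / (nB - 2)) * bS + result_write_cost(js, cardR, cardS, bfrRS)

def cj2a_single_loop_secondary(bR, cardR, xB, sB, js, cardS, bfrRS):
    # J2a: secondary index on the join attribute B of S.
    return bR + cardR * (xB + 1 + sB) + result_write_cost(js, cardR, cardS, bfrRS)

def cj2c_single_loop_primary(bR, cardR, xB, js, cardS, bfrRS):
    # J2c: primary index on the join attribute B of S.
    return bR + cardR * (xB + 1) + result_write_cost(js, cardR, cardS, bfrRS)

def cj3a_sort_merge(bR, bS, js, cardR, cardS, bfrRS):
    # J3a: both files already sorted on the join attributes.
    return bR + bS + result_write_cost(js, cardR, cardS, bfrRS)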
Example of Using the Cost Functions. Suppose that we have the EMPLOYEE
file described in the example in the previous section, and assume that the
DEPARTMENT file in Figure A.1 consists of rD = 125 records stored in bD = 13 disk
blocks. Consider the following two join operations:
OP6: EMPLOYEE ⋈Dno=Dnumber DEPARTMENT
OP7: DEPARTMENT ⋈Mgr_ssn=Ssn EMPLOYEE
Suppose that we have a primary index on Dnumber of DEPARTMENT with xDnumber= 1
level and a secondary index on Mgr_ssn of DEPARTMENT with selection cardinality
sMgr_ssn= 1 and levels xMgr_ssn= 2. Assume that the join selectivity for OP6 is jsOP6 =
(1/|DEPARTMENT|) = 1/125 because Dnumber is a key of DEPARTMENT. Also assume
that the blocking factor for the resulting join file is bfrED= 4 records per block. We
can estimate the worst-case costs for the JOIN operation OP6 using the applicable
methods J1 and J2 as follows:
1. Using method J1 with EMPLOYEE as outer loop:
CJ1 = bE + (bE * bD) + (( jsOP6 * rE * rD)/bfrED)
= 2000 + (2000 * 13) + (((1/125) * 10,000 * 125)/4) = 30,500
2. Using method J1 with DEPARTMENT as outer loop:
CJ1 = bD + (bE * bD) + (( jsOP6 * rE * rD)/bfrED)
= 13 + (13 * 2000) + (((1/125) * 10,000 * 125)/4) = 28,513
3. Using method J2 with EMPLOYEE as outer loop:
CJ2c = bE + (rE * (xDnumber + 1)) + ((jsOP6 * rE * rD)/bfrED)
= 2000 + (10,000 * 2) + (((1/125) * 10,000 * 125)/4) = 24,500
4. Using method J2 with DEPARTMENT as outer loop:
CJ2a = bD + (rD * (xDno + sDno)) + (( jsOP6 * rE * rD)/bfrED)
= 13 + (125 * (2 + 80)) + (((1/125) * 10,000 * 125)/4) = 12,763
Case 4 has the lowest cost estimate and will be chosen. Notice that in case 2 above, if
15 memory buffers (or more) were available for executing the join instead of just 3,
13 of them could be used to hold the entire DEPARTMENT relation (outer loop
Figure 7
Two left-deep (JOIN) query trees: (((R1 ⋈ R2) ⋈ R3) ⋈ R4) and (((R4 ⋈ R3) ⋈ R2) ⋈ R1).
relation) in memory, one could be used as buffer for the result, and one would be
used to hold one block at a time of the EMPLOYEE file (inner loop file), and the cost
for case 2 could be drastically reduced to just bE + bD + (( jsOP6 * rE * rD)/bfrED) or
4,513, as discussed in Section 3.2. If some other number of main memory buffers
was available, say nB = 10, then the cost for case 2 would be calculated as follows,
which would also give better performance than case 4:
CJ1 = bD + ( ⎡bD/(nB – 2)⎤ * bE) + ((jsOP6 * rE * rD)/bfrED)
= 13 + ( ⎡13/8⎤ * 2000) + (((1/125) * 10,000 * 125)/4)
= 13 + (2 * 2000) + 2500 = 6,513
As an exercise, the reader should perform a similar analysis for OP7.
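The arithmetic for the four cases, and for the ten-buffer variant of case 2, can be laid out compactly as follows; the expressions simply restate the computations above (case 4 is written the way the text computes it), with illustrative variable names.

from math import ceil

# Block-access estimates for OP6, using the statistics given above
# (jsOP6 = 1/125, bfrED = 4 records per block).
bE, bD, rE, rD = 2000, 13, 10000, 125
write_result = ((1 / 125) * rE * rD) / 4                    # 2500 blocks for the join result
case1 = bE + bE * bD + write_result                          # J1, EMPLOYEE outer: 30,500
case2 = bD + bD * bE + write_result                          # J1, DEPARTMENT outer: 28,513
case3 = bE + rE * (1 + 1) + write_result                     # J2c via primary index on Dnumber: 24,500
case4 = bD + rD * (2 + 80) + write_result                    # J2a via secondary index on Dno: 12,763
case2_nB10 = bD + ceil(bD / (10 - 2)) * bE + write_result    # case 2 with nB = 10 buffers: 6,513
print(case1, case2, case3, case4, case2_nB10)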
8.5 Multiple Relation Queries and JOIN Ordering
The algebraic transformation rules in Section 7.2 include a commutative rule and
an associative rule for the join operation. With these rules, many equivalent join
expressions can be produced. As a result, the number of alternative query trees
grows very rapidly as the number of joins in a query increases. A query that joins n
relations will have n − 1 join operations and, counting left-deep trees alone, as many as n! different join orders. Estimating the cost of every possible join tree for a query with
a large number of joins will require a substantial amount of time by the query opti-
mizer. Hence, some pruning of the possible query trees is needed. Query optimizers
typically limit the structure of a (join) query tree to that of left-deep (or right-deep)
trees. A left-deep tree is a binary tree in which the right child of each nonleaf node
is always a base relation. The optimizer would choose the particular left-deep tree
with the lowest estimated cost. Two examples of left-deep trees are shown in Figure
7. (Note that the trees in Figure 5 are also left-deep trees.)
With left-deep trees, the right child is considered to be the inner relation when exe-
cuting a nested-loop join, or the probing relation when executing a single-loop join.
One advantage of left-deep (or right-deep) trees is that they are amenable to
pipelining, as discussed in Section 6. For instance, consider the first left-deep tree in
Figure 7 and assume that the join algorithm is the single-loop method; in this case,
a disk page of tuples of the outer relation is used to probe the inner relation for
matching tuples. As resulting tuples (records) are produced from the join of R1 and
R2, they can be used to probe R3 to locate their matching records for joining.
Likewise, as resulting tuples are produced from this join, they could be used to
probe R4. Another advantage of left-deep (or right-deep) trees is that having a base
relation as one of the inputs of each join allows the optimizer to utilize any access
paths on that relation that may be useful in executing the join.
If materialization is used instead of pipelining (see Sections 6 and 7.3), the join
results could be materialized and stored as temporary relations. The key idea from
the optimizer’s standpoint with respect to join ordering is to find an ordering that
will reduce the size of the temporary results, since the temporary results (pipelined
or materialized) are used by subsequent operators and hence affect the execution
cost of those operators.
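For small n this search can even be done by brute force. The following sketch enumerates every left-deep order and keeps the cheapest one under a caller-supplied cost model; the cost-model function is a placeholder assumption, and a real optimizer would prune the space, for example with dynamic programming, rather than enumerate all n! orders.

from itertools import permutations

def best_left_deep_order(relations, join_cost):
    # relations: list of relation names.
    # join_cost(left_input, right_relation): caller-supplied estimate of
    # joining the accumulated left-deep result with one more base relation.
    best_order, best_cost = None, float('inf')
    for order in permutations(relations):
        cost, left = 0, [order[0]]
        for rel in order[1:]:
            cost += join_cost(left, rel)   # right child is always a base relation
            left.append(rel)
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost

# Example call: 4 relations give 4! = 24 candidate left-deep orders.
# my_cost_model is a hypothetical cost-estimating function supplied by the caller.
# best_left_deep_order(['R1', 'R2', 'R3', 'R4'], my_cost_model)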
8.6 Example to Illustrate Cost-Based Query Optimization
We will consider query Q2 and its query tree shown in Figure 4(a) to illustrate cost-
based query optimization:
Q2: SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum=Dnumber AND Mgr_ssn=Ssn AND
Plocation=‘Stafford’;
Suppose we have the information about the relations shown in Figure 8. The
LOW_VALUE and HIGH_VALUE statistics have been normalized for clarity. The tree
in Figure 4(a) is assumed to represent the result of the algebraic heuristic optimiza-
tion process and the start of cost-based optimization (in this example, we assume
that the heuristic optimizer does not push the projection operations down the tree).
The first cost-based optimization to consider is join ordering. As previously men-
tioned, we assume the optimizer considers only left-deep trees, so the potential join
orders—without CARTESIAN PRODUCT—are:
1. PROJECT ⋈ DEPARTMENT ⋈ EMPLOYEE
2. DEPARTMENT ⋈ PROJECT ⋈ EMPLOYEE
3. DEPARTMENT ⋈ EMPLOYEE ⋈ PROJECT
4. EMPLOYEE ⋈ DEPARTMENT ⋈ PROJECT
Assume that the selection operation has already been applied to the PROJECT rela-
tion. If we assume a materialized approach, then a new temporary relation is
created after each join operation. To examine the cost of join order (1), the first
join is between PROJECT and DEPARTMENT. Both the join method and the access
methods for the input relations must be determined. Since DEPARTMENT has no
index according to Figure 8, the only available access method is a table scan (that is,
a linear search). The PROJECT relation will have the selection operation performed
before the join, so two options exist: table scan (linear search) or utilizing
its PROJ_PLOC index, and the optimizer must compare their estimated costs. The
Figure 8
Sample statistical information for relations in Q2. (a) Column information. (b) Table information. (c) Index information.

(a) Column information
Table_name    Column_name   Num_distinct   Low_value   High_value
PROJECT       Plocation     200            1           200
PROJECT       Pnumber       2000           1           2000
PROJECT       Dnum          50             1           50
DEPARTMENT    Dnumber       50             1           50
DEPARTMENT    Mgr_ssn       50             1           50
EMPLOYEE      Ssn           10000          1           10000
EMPLOYEE      Dno           50             1           50
EMPLOYEE      Salary        500            1           500

(b) Table information
Table_name    Num_rows   Blocks
PROJECT       2000       100
DEPARTMENT    50         5
EMPLOYEE      10000      2000

(c) Index information
Index_name   Uniqueness   Blevel*   Leaf_blocks   Distinct_keys
PROJ_PLOC    NONUNIQUE    1         4             200
EMP_SSN      UNIQUE       1         50            10000
EMP_SAL      NONUNIQUE    1         50            500
*Blevel is the number of levels without the leaf level.
statistical information on the PROJ_PLOC index (see Figure 8) shows the number
of index levels x = 2 (root plus leaf levels). The index is nonunique (because
Plocation is not a key of PROJECT), so the optimizer assumes a uniform data distri-
bution and estimates the number of record pointers for each Plocation value to be
10. This is computed from the tables in Figure 8 by multiplying Selectivity *
Num_rows, where Selectivity is estimated by 1/Num_distinct. So the cost of using the
index and accessing the records is estimated to be 12 block accesses (2 for the index
and 10 for the data blocks). The cost of a table scan is estimated to be 100 block
accesses, so the index access is more efficient as expected.
In the materialized approach, a temporary file TEMP1 of size 1 block is created to
hold the result of the selection operation. The file size is calculated by determining
the blocking factor using the formula Num_rows/Blocks, which gives 2000/100 or 20
rows per block. Hence, the 10 records selected from the PROJECT relation will fit
into a single block. Now we can compute the estimated cost of the first join. We will
consider only the nested-loop join method, where the outer relation is the tempo-
rary file, TEMP1, and the inner relation is DEPARTMENT. Since the entire TEMP1 file
fits in the available buffer space, we need to read each of the DEPARTMENT table’s
five blocks only once, so the join cost is six block accesses plus the cost of writing the
temporary result file, TEMP2. The optimizer would have to determine the size of
TEMP2. Since the join attribute Dnumber is the key for DEPARTMENT, any Dnum
value from TEMP1 will join with at most one record from DEPARTMENT, so the
number of rows in TEMP2 will be equal to the number of rows in TEMP1, which is
10. The optimizer would determine the record size for TEMP2 and the number of
blocks needed to store these 10 rows. For brevity, assume that the blocking factor for
TEMP2 is five rows per block, so a total of two blocks are needed to store TEMP2.
Finally, the cost of the last join needs to be estimated. We can use a single-loop join
on TEMP2 since in this case the index EMP_SSN (see Figure 8) can be used to probe
and locate matching records from EMPLOYEE. Hence, the join method would
involve reading in each block of TEMP2 and looking up each of the five Mgr_ssn val-
ues using the EMP_SSN index. Each index lookup would require a root access, a leaf
access, and a data block access (x+1, where the number of levels x is 2). So, 10
lookups require 30 block accesses. Adding the two block accesses for TEMP2 gives a
total of 32 block accesses for this join.
For the final projection, assume pipelining is used to produce the final result, which
does not require additional block accesses, so the total cost for join order (1) is esti-
mated as the sum of the previous costs. The optimizer would then estimate costs in
a similar manner for the other three join orders and choose the one with the lowest
estimate. We leave this as an exercise for the reader.
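Summing the estimates just derived gives a total for join order (1). The tally below is illustrative, since the text leaves the final sum to the reader and conventions differ on whether the one-block write of TEMP1 is charged.

# Tallying the block accesses estimated above for join order (1).
select_via_index = 2 + 10     # PROJ_PLOC index levels plus the 10 data blocks
write_temp1      = 1          # 10 selected rows at 20 rows/block
first_join       = 1 + 5      # read TEMP1 plus read DEPARTMENT
write_temp2      = 2          # 10 rows at 5 rows/block
second_join      = 2 + 10*3   # read TEMP2 plus 10 EMP_SSN lookups (x + 1 = 3 each)
final_project    = 0          # pipelined into the final result

total = (select_via_index + write_temp1 + first_join +
         write_temp2 + second_join + final_project)
print(total)                   # 53 block accesses under these assumptions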
9 Overview of Query Optimization
in Oracle
The Oracle DBMS22 provides two different approaches to query optimization: rule-
based and cost-based. With the rule-based approach, the optimizer chooses execu-
tion plans based on heuristically ranked operations. Oracle maintains a table of 15
ranked access paths, where a lower ranking implies a more efficient approach. The
access paths range from table access by ROWID (the most efficient)—where ROWID
specifies the record’s physical address that includes the data file, data block, and row
offset within the block—to a full table scan (the least efficient)—where all rows in
the table are searched by doing multiblock reads. However, the rule-based approach
is being phased out in favor of the cost-based approach, where the optimizer exam-
ines alternative access paths and operator algorithms and chooses the execution
plan with the lowest estimated cost. The estimated query cost is proportional to the
expected elapsed time needed to execute the query with the given execution plan.
22The discussion in this section is primarily based on version 7 of Oracle. More optimization techniques
have been added to subsequent versions.
The Oracle optimizer calculates this cost based on the estimated usage of resources,
such as I/O, CPU time, and memory needed. The goal of cost-based optimization in
Oracle is to minimize the elapsed time to process the entire query.
An interesting addition to the Oracle query optimizer is the capability for an appli-
cation developer to specify hints to the optimizer.23 The idea is that an application
developer might know more information about the data than the optimizer. For
example, consider the EMPLOYEE table shown in Figure A.2. The Sex column of that
table has only two distinct values. If there are 10,000 employees, then the optimizer
would estimate that half are male and half are female, assuming a uniform data dis-
tribution. If a secondary index exists, it would more than likely not be used.
However, if the application developer knows that there are only 100 male employ-
ees, a hint could be specified in an SQL query whose WHERE-clause condition is Sex
= ‘M’ so that the associated index would be used in processing the query. Various
hints can be specified, such as:
■ The optimization approach for an SQL statement
■ The access path for a table accessed by the statement
■ The join order for a join statement
■ A particular join operation in a join statement
The cost-based optimization of Oracle 8 and later versions is a good example of the
sophisticated approach taken to optimize SQL queries in commercial RDBMSs.
10 Semantic Query Optimization
A different approach to query optimization, called semantic query optimization,
has been suggested. This technique, which may be used in combination with the
techniques discussed previously, uses constraints specified on the database
schema—such as unique attributes and other more complex constraints—in order
to modify one query into another query that is more efficient to execute. We will not
discuss this approach in detail but we will illustrate it with a simple example.
Consider the SQL query:
SELECT E.Lname, M.Lname
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.Super_ssn=M.Ssn AND E.Salary > M.Salary
This query retrieves the names of employees who earn more than their supervisors.
Suppose that we had a constraint on the database schema that stated that no
employee can earn more than his or her direct supervisor. If the semantic query
optimizer checks for the existence of this constraint, it does not need to execute the
query at all because it knows that the result of the query will be empty. This may
save considerable time if the constraint checking can be done efficiently. However,
searching through many constraints to find those that are applicable to a given
23Such hints have also been called query annotations.
query and that may semantically optimize it can also be quite time-consuming.
With the inclusion of active rules and additional metadata in database systems,
semantic query optimization techniques are being gradually incorporated into the
DBMSs.
11 Summary
In this chapter we gave an overview of the techniques used by DBMSs in processing
and optimizing high-level queries. We first discussed how SQL queries are trans-
lated into relational algebra and then how various relational algebra operations may
be executed by a DBMS. We saw that some operations, particularly SELECT and
JOIN, may have many execution options. We also discussed how operations can be
combined during query processing to create pipelined or stream-based execution
instead of materialized execution.
Following that, we described heuristic approaches to query optimization, which use
heuristic rules and algebraic techniques to improve the efficiency of query execu-
tion. We showed how a query tree that represents a relational algebra expression can
be heuristically optimized by reorganizing the tree nodes and transforming it
into another equivalent query tree that is more efficient to execute. We also gave
equivalence-preserving transformation rules that may be applied to a query tree.
Then we introduced query execution plans for SQL queries, which add method exe-
cution plans to the query tree operations.
We discussed the cost-based approach to query optimization. We showed how cost
functions are developed for some database access algorithms and how these cost
functions are used to estimate the costs of different execution strategies. We pre-
sented an overview of the Oracle query optimizer, and we mentioned the technique
of semantic query optimization.
Review Questions
1. Discuss the reasons for converting SQL queries into relational algebra
queries before optimization is done.
2. Discuss the different algorithms for implementing each of the following
relational operators and the circumstances under which each algorithm can
be used: SELECT, JOIN, PROJECT, UNION, INTERSECT, SET DIFFERENCE,
CARTESIAN PRODUCT.
3. What is a query execution plan?
4. What is meant by the term heuristic optimization? Discuss the main heuris-
tics that are applied during query optimization.
5. How does a query tree represent a relational algebra expression? What is
meant by an execution of a query tree? Discuss the rules for transformation
of query trees and identify when each rule should be applied during opti-
mization.
6. How many different join orders are there for a query that joins 10 relations?
7. What is meant by cost-based query optimization?
8. What is the difference between pipelining and materialization?
9. Discuss the cost components for a cost function that is used to estimate
query execution cost. Which cost components are used most often as the
basis for cost functions?
10. Discuss the different types of parameters that are used in cost functions.
Where is this information kept?
11. List the cost functions for the SELECT and JOIN methods discussed in
Section 8.
12. What is meant by semantic query optimization? How does it differ from
other query optimization techniques?
Exercises
13. Consider SQL queries Q1, Q8, Q1B, and Q4 in the chapter Basic SQL and
Q27 in the chapter More SQL: Complex Queries, Triggers, Views, and
Schema Modification.
a. Draw at least two query trees that can represent each of these queries.
Under what circumstances would you use each of your query trees?
b. Draw the initial query tree for each of these queries, and then show how
the query tree is optimized by the algorithm outlined in Section 7.
c. For each query, compare your own query trees of part (a) and the initial
and final query trees of part (b).
14. A file of 4096 blocks is to be sorted with an available buffer space of 64
blocks. How many passes will be needed in the merge phase of the external
sort-merge algorithm?
15. Develop cost functions for the PROJECT, UNION, INTERSECTION, SET DIF-
FERENCE, and CARTESIAN PRODUCT algorithms discussed in Section 4.
16. Develop cost functions for an algorithm that consists of two SELECTs, a
JOIN, and a final PROJECT, in terms of the cost functions for the individual
operations.
17. Can a nondense index be used in the implementation of an aggregate opera-
tor? Why or why not?
18. Calculate the cost functions for different options of executing the JOIN oper-
ation OP7 discussed in Section 3.2.
19. Develop formulas for the hybrid hash-join algorithm for calculating the size
of the buffer for the first bucket. Develop more accurate cost estimation for-
mulas for the algorithm.
20. Estimate the cost of operations OP6 and OP7, using the formulas developed
in Exercise 9.
21. Extend the sort-merge join algorithm to implement the LEFT OUTER JOIN
operation.
22. Compare the cost of two different query plans for the following query:
σSalary>40000(EMPLOYEE ⋈Dno=Dnumber DEPARTMENT)
Use the database statistics in Figure 8.
Selected Bibliography
A detailed algorithm for relational algebra optimization is given by Smith and
Chang (1975). The Ph.D. thesis of Kooi (1980) provides a foundation for query pro-
cessing techniques. A survey paper by Jarke and Koch (1984) gives a taxonomy of
query optimization and includes a bibliography of work in this area. A survey by
Graefe (1993) discusses query execution in database systems and includes an exten-
sive bibliography.
Whang (1985) discusses query optimization in OBE (Office-By-Example), which is
a system based on the language QBE. Cost-based optimization was introduced in
the SYSTEM R experimental DBMS and is discussed in Astrahan et al. (1976).
Selinger et al. (1979) is a classic paper that discussed cost-based optimization of
multiway joins in SYSTEM R. Join algorithms are discussed in Gotlieb (1975),
Blasgen and Eswaran (1976), and Whang et al. (1982). Hashing algorithms for
implementing joins are described and analyzed in DeWitt et al. (1984),
Bratbergsengen (1984), Shapiro (1986), Kitsuregawa et al. (1989), and Blakeley and
Martin (1990), among others. Approaches to finding a good join order are pre-
sented in Ioannidis and Kang (1990) and in Swami and Gupta (1989). A discussion
of the implications of left-deep and bushy join trees is presented in Ioannidis and
Kang (1991). Kim (1982) discusses transformations of nested SQL queries into
canonical representations. Optimization of aggregate functions is discussed in Klug
(1982) and Muralikrishna (1992). Salzberg et al. (1990) describe a fast external sort-
ing algorithm. Estimating the size of temporary relations is crucial for query opti-
mization. Sampling-based estimation schemes are presented in Haas et al. (1995)
and in Haas and Swami (1995). Lipton et al. (1990) also discuss selectivity estima-
tion. Having the database system store and use more detailed statistics in the form
of histograms is the topic of Muralikrishna and DeWitt (1988) and Poosala et al.
(1996).
Kim et al. (1985) discuss advanced topics in query optimization. Semantic query
optimization is discussed in King (1981) and Malley and Zdonick (1986). Work on
semantic query optimization is reported in Chakravarthy et al. (1990), Shenoy and
Ozsoyoglu (1989), and Siegel et al. (1992).
Figure A.1
Schema diagram for the COMPANY relational database schema:
EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS(Dnumber, Dlocation)
PROJECT(Pname, Pnumber, Plocation, Dnum)
WORKS_ON(Essn, Pno, Hours)
DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)
Figure A.2
One possible database state for the COMPANY relational database schema (sample tuples for the EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, PROJECT, WORKS_ON, and DEPENDENT relations).
Physical Database Design and Tuning
Various techniques by which queries can be processed efficiently by the DBMS are mostly
internal to the DBMS and invisible to the programmer. In this chapter we discuss
additional issues that affect the performance of an application running on a DBMS.
In particular, we discuss some of the options available to database administrators
and programmers for storing databases, and some of the heuristics, rules, and tech-
niques that they can use to tune the database for performance improvement. First,
in Section 1, we discuss the issues that arise in physical database design dealing with
storage and access of data. Then, in Section 2, we discuss how to improve database
performance through tuning, indexing of data, database design, and the queries
themselves.
1 Physical Database Design
in Relational Databases
In this section, we begin by discussing the physical design factors that affect the per-
formance of applications and transactions, and then we comment on the specific
guidelines for RDBMSs.
1.1 Factors That Influence Physical Database Design
Physical design is an activity where the goal is not only to create the appropriate
structuring of data in storage, but also to do so in a way that guarantees good per-
formance. For a given conceptual schema, there are many physical design alterna-
tives in a given DBMS. It is not possible to make meaningful physical design
From Chapter 20 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
decisions and performance analyses until the database designer knows the mix of
queries, transactions, and applications that are expected to run on the database.
This is called the job mix for the particular set of database system applications. The
database administrators/designers must analyze these applications, their expected
frequencies of invocation, any timing constraints on their execution speed, the
expected frequency of update operations, and any unique constraints on attributes.
We discuss each of these factors next.
A. Analyzing the Database Queries and Transactions. Before undertaking
the physical database design, we must have a good idea of the intended use of the
database by defining in a high-level form the queries and transactions that are
expected to run on the database. For each retrieval query, the following informa-
tion about the query would be needed:
1. The files that will be accessed by the query.1
2. The attributes on which any selection conditions for the query are specified.
3. Whether the selection condition is an equality, inequality, or a range condi-
tion.
4. The attributes on which any join conditions or conditions to link multiple
tables or objects for the query are specified.
5. The attributes whose values will be retrieved by the query.
The attributes listed in items 2 and 4 above are candidates for the definition of
access structures, such as indexes, hash keys, or sorting of the file.
For each update operation or update transaction, the following information
would be needed:
1. The files that will be updated.
2. The type of operation on each file (insert, update, or delete).
3. The attributes on which selection conditions for a delete or update are spec-
ified.
4. The attributes whose values will be changed by an update operation.
Again, the attributes listed in item 3 are candidates for access structures on the files,
because they would be used to locate the records that will be updated or deleted. On
the other hand, the attributes listed in item 4 are candidates for avoiding an access
structure, since modifying them will require updating the access structures.
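During this analysis it can help to record the listed items for each query in a small structured form. The sketch below is one hypothetical way to do so (the class and field names are illustrative, not prescribed by the text), and the frequency field anticipates the information gathered in step B.

from dataclasses import dataclass
from typing import List

@dataclass
class QueryProfile:
    # Information items 1 through 5 for a retrieval query.
    name: str
    files_accessed: List[str]
    selection_attributes: List[str]          # candidates for access structures
    selection_condition_types: List[str]     # 'equality', 'inequality', or 'range'
    join_attributes: List[str]               # also candidates for access structures
    retrieved_attributes: List[str]
    expected_frequency: float = 0.0          # filled in during step B

q = QueryProfile(
    name='employees_by_department',
    files_accessed=['EMPLOYEE', 'DEPARTMENT'],
    selection_attributes=['Dname'],
    selection_condition_types=['equality'],
    join_attributes=['Dno', 'Dnumber'],
    retrieved_attributes=['Fname', 'Lname', 'Salary'],
)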
B. Analyzing the Expected Frequency of Invocation of Queries and
Transactions. Besides identifying the characteristics of expected retrieval queries
and update transactions, we must consider their expected rates of invocation. This
frequency information, along with the attribute information collected on each
query and transaction, is used to compile a cumulative list of the expected fre-
quency of use for all queries and transactions. This is expressed as the expected fre-
quency of using each attribute in each file as a selection attribute or a join attribute,
1For simplicity we use the term files here, but this can also mean tables or relations.
over all the queries and transactions. Generally, for large volumes of processing, the
informal 80–20 rule can be used: approximately 80 percent of the processing is
accounted for by only 20 percent of the queries and transactions. Therefore, in prac-
tical situations, it is rarely necessary to collect exhaustive statistics and invocation
rates on all the queries and transactions; it is sufficient to determine the 20 percent
or so most important ones.
C. Analyzing the Time Constraints of Queries and Transactions. Some
queries and transactions may have stringent performance constraints. For example,
a transaction may have the constraint that it should terminate within 5 seconds on
95 percent of the occasions when it is invoked, and that it should never take more
than 20 seconds. Such timing constraints place further priorities on the attributes
that are candidates for access paths. The selection attributes used by queries and
transactions with time constraints become higher-priority candidates for primary
access structures for the files, because the primary access structures are generally the
most efficient for locating records in a file.
D. Analyzing the Expected Frequencies of Update Operations. A minimum
number of access paths should be specified for a file that is frequently updated,
because updating the access paths themselves slows down the update operations. For
example, if a file that has frequent record insertions has 10 indexes on 10 different
attributes, each of these indexes must be updated whenever a new record is inserted.
The overhead for updating 10 indexes can slow down the insert operations.
E. Analyzing the Uniqueness Constraints on Attributes. Access paths should
be specified on all candidate key attributes—or sets of attributes—that are either the
primary key of a file or unique attributes. The existence of an index (or other access
path) makes it sufficient to only search the index when checking this uniqueness
constraint, since all values of the attribute will exist in the leaf nodes of the index.
For example, when inserting a new record, if a key attribute value of the new record
already exists in the index, the insertion of the new record should be rejected, since it
would violate the uniqueness constraint on the attribute.
Once the preceding information is compiled, it is possible to address the physical
database design decisions, which consist mainly of deciding on the storage struc-
tures and access paths for the database files.
1.2 Physical Database Design Decisions
Most relational systems represent each base relation as a physical database file. The
access path options include specifying the type of primary file organization for each
relation and the attributes on which indexes should be defined. At most, one of
the indexes on each file may be a primary or a clustering index. Any number of
additional secondary indexes can be created.2
2The reader should review the various types of indexes and be familiar with the algorithms for query
processing.
Design Decisions about Indexing. The attributes whose values are required in equality or range conditions (selection operation), and those that are keys or that participate in join conditions (join operation), require access paths such as indexes.
The performance of queries largely depends upon what indexes or hashing schemes
exist to expedite the processing of selections and joins. On the other hand, during
insert, delete, or update operations, the existence of indexes adds to the overhead.
This overhead must be justified in terms of the gain in efficiency by expediting
queries and transactions.
The physical design decisions for indexing fall into the following categories:
1. Whether to index an attribute. The general rules for creating an index on an
attribute are that the attribute must either be a key (unique), or there must
be some query that uses that attribute either in a selection condition (equal-
ity or range of values) or in a join condition. One reason for creating multi-
ple indexes is that some operations can be processed by just scanning the
indexes, without having to access the actual data file.
2. What attribute or attributes to index on. An index can be constructed on a
single attribute, or on more than one attribute if it is a composite index. If
multiple attributes from one relation are involved together in several queries
(for example, (Garment_style_#, Color) in a garment inventory database), a
multiattribute (composite) index is warranted. The ordering of attributes
within a multiattribute index must correspond to the queries. For instance,
the above index assumes that queries would be based on an ordering of col-
ors within a Garment_style_# rather than vice versa.
3. Whether to set up a clustered index. At most, one index per table can be a
primary or clustering index, because this implies that the file be physically
ordered on that attribute. In most RDBMSs, this is specified by the keyword
CLUSTER. (If the attribute is a key, a primary index is created, whereas a
clustering index is created if the attribute is not a key.) If a table requires sev-
eral indexes, the decision about which one should be the primary or cluster-
ing index depends upon whether keeping the table ordered on that attribute
is needed. Range queries benefit a great deal from clustering. If several
attributes require range queries, relative benefits must be evaluated before
deciding which attribute to cluster on. If a query is to be answered by doing
an index search only (without retrieving data records), the corresponding
index should not be clustered, since the main benefit of clustering is achieved
when retrieving the records themselves. A clustering index may be set up as a
multiattribute index if range retrieval by that composite key is useful in
report creation (for example, an index on Zip_code, Store_id, and Product_id
may be a clustering index for sales data).
4. Whether to use a hash index over a tree index. In general, RDBMSs use B+-
trees for indexing. However, ISAM and hash indexes are also provided in
some systems. B+-trees support both equality and range queries on the
attribute used as the search key. Hash indexes work well with equality
conditions, particularly during joins to find a matching record(s), but they
do not support range queries.
5. Whether to use dynamic hashing for the file. For files that are very
volatile—that is, those that grow and shrink continuously—one of the
dynamic hashing schemes would be suitable. Currently, they are not offered
by many commercial RDBMSs.
How to Create an Index. Many RDBMSs have a similar type of command for
creating an index, although it is not part of the SQL standard. The general form of
this command is:
CREATE [ UNIQUE ] INDEX <index name>
ON <table name> ( <column name> [ <order> ] { , <column name> [ <order> ] } )
[ CLUSTER ] ;
The keywords UNIQUE and CLUSTER are optional. The keyword CLUSTER is used
when the index to be created should also sort the data file records on the indexing
attribute. Thus, specifying CLUSTER on a key (unique) attribute would create some
variation of a primary index, whereas specifying CLUSTER on a nonkey
(nonunique) attribute would create some variation of a clustering index. The value
for <order> can be either ASC (ascending) or DESC (descending), and specifies
whether the data file should be ordered in ascending or descending values of the
indexing attribute. The default is ASC. For example, the following would create a
clustering (ascending) index on the nonkey attribute Dno of the EMPLOYEE file:
CREATE INDEX DnoIndex
ON EMPLOYEE (Dno)
CLUSTER ;
Denormalization as a Design Decision for Speeding Up Queries. The ulti-
mate goal during normalization is to separate attributes into tables to minimize
redundancy, and thereby avoid the update anomalies that lead to an extra process-
ing overhead to maintain consistency in the database. The ideals that are typically
followed are the third or Boyce-Codd normal forms.
The above ideals are sometimes sacrificed in favor of faster execution of frequently
occurring queries and transactions. This process of storing the logical database
design (which may be in BCNF or 4NF) in a weaker normal form, say 2NF or 1NF,
is called denormalization. Typically, the designer includes certain attributes from a
table S into another table R. The reason is that the attributes from S that are
included in R are frequently needed—along with other attributes in R—for answer-
ing queries or producing reports. By including these attributes, a join of R with S is
avoided for these frequently occurring queries and reports. This reintroduces
redundancy in the base tables by including the same attributes in both tables R and
S. A partial functional dependency or a transitive dependency now exists in the table
R, thereby creating the associated redundancy problems. A tradeoff exists between
the additional updating needed for maintaining consistency of redundant attributes
versus the effort needed to perform a join to incorporate the additional attributes
needed in the result. For example, consider the following relation:
ASSIGN (Emp_id, Proj_id, Emp_name, Emp_job_title, Percent_assigned, Proj_name,
Proj_mgr_id, Proj_mgr_name),
which corresponds exactly to the headers in a report called The Employee
Assignment Roster.
This relation is only in 1NF because of the following functional dependencies:
Proj_id → Proj_name, Proj_mgr_id
Proj_mgr_id → Proj_mgr_name
Emp_id → Emp_name, Emp_job_title
This relation may be preferred over the design in 2NF (and 3NF) consisting of the
following three relations:
EMP (Emp_id, Emp_name, Emp_job_title)
PROJ (Proj_id, Proj_name, Proj_mgr_id)
EMP_PROJ (Emp_id, Proj_id, Percent_assigned)
This is because to produce The Employee Assignment Roster report (with all
fields shown in ASSIGN above), the latter multirelation design requires two
NATURAL JOIN (indicated with *) operations (between EMP and EMP_PROJ, and
between PROJ and EMP_PROJ), plus a final JOIN between PROJ and EMP to retrieve
the Proj_mgr_name from the Proj_mgr_id. Thus the following JOINs would be needed
(the final join would also require renaming (aliasing) of the last EMP table, which is
not shown):
((EMP_PROJ * EMP) * PROJ) ⋈(PROJ.Proj_mgr_id = EMP.Emp_id) EMP
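In SQL, a sketch of the query that produces the roster from the normalized design might look as follows (the second copy of EMP, aliased as M, supplies the manager name; with the denormalized ASSIGN table, the same report is a single-table retrieval):
SELECT E.Emp_id, E.Emp_name, E.Emp_job_title, EP.Percent_assigned,
P.Proj_id, P.Proj_name, P.Proj_mgr_id, M.Emp_name AS Proj_mgr_name
FROM EMP_PROJ EP, EMP E, PROJ P, EMP M
WHERE EP.Emp_id = E.Emp_id AND EP.Proj_id = P.Proj_id
AND P.Proj_mgr_id = M.Emp_id;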
It is also possible to create a view for the ASSIGN table. This does not mean that
the join operations will be avoided, but that the user need not specify the joins. If
the view table is materialized, the joins would be avoided, but if the virtual view
table is not stored as a materialized file, the join computations would still be nec-
essary. Other forms of denormalization consist of storing extra tables to maintain
original functional dependencies that are lost during BCNF decomposition. For
example, Figure A.1 (in Appendix: Figure at the end of this chapter) shows the
TEACH(Student, Course, Instructor) relation with the functional dependencies
{{Student, Course} → Instructor, Instructor → Course}. A lossless decomposition of
TEACH into T1(Student, Instructor) and T2(Instructor, Course) does not allow queries
of the form what course did student Smith take from instructor Navathe to be
answered without joining T1 and T2. Therefore, storing T1, T2, and TEACH may be
a possible solution, which reduces the design from BCNF to 3NF. Here, TEACH is a
materialized join of the other two tables, representing an extreme redundancy. Any
updates to T1 and T2 would have to be applied to TEACH. An alternate strategy is
to create T1 and T2 as updatable base tables, and to create TEACH as a view (virtual
table) on T1 and T2 that can only be queried.
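A minimal sketch of this last strategy in SQL, assuming T1 and T2 are created as base tables (the data types shown are illustrative only), might be:
CREATE TABLE T1 ( Student VARCHAR(30), Instructor VARCHAR(30),
PRIMARY KEY (Student, Instructor) );
CREATE TABLE T2 ( Instructor VARCHAR(30) PRIMARY KEY, Course VARCHAR(30) );
CREATE VIEW TEACH (Student, Course, Instructor) AS
SELECT T1.Student, T2.Course, T1.Instructor
FROM T1, T2
WHERE T1.Instructor = T2.Instructor;
Because TEACH is defined by a join, most systems will treat it as a read-only (nonupdatable) view, which matches the intent that it can only be queried.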
2 An Overview of Database Tuning in Relational Systems
After a database is deployed and is in operation, actual use of the applications, trans-
actions, queries, and views reveals factors and problem areas that may not have been
accounted for during the initial physical design. The inputs to physical design listed
in Section 1.1 can be revised by gathering actual statistics about usage patterns.
Resource utilization as well as internal DBMS processing—such as query optimiza-
tion—can be monitored to reveal bottlenecks, such as contention for the same data
or devices. Volumes of activity and sizes of data can be better estimated. Therefore, it
is necessary to monitor and revise the physical database design constantly—an activ-
ity referred to as database tuning. The goals of tuning are as follows:
■ To make applications run faster.
■ To improve (lower) the response time of queries and transactions.
■ To improve the overall throughput of transactions.
The dividing line between physical design and tuning is very thin. The same design
decisions that we discussed in Section 1.2 are revisited during database tuning,
which is a continual adjustment of the physical design. We give a brief overview of
the tuning process below.3 The inputs to the tuning process include statistics related
to the same factors mentioned in Section 1.1. In particular, DBMSs can internally
collect the following statistics:
■ Sizes of individual tables.
■ Number of distinct values in a column.
■ The number of times a particular query or transaction is submitted and exe-
cuted in an interval of time.
■ The times required for different phases of query and transaction processing
(for a given set of queries or transactions).
These and other statistics create a profile of the contents and use of the database.
Other information obtained from monitoring the database system activities and
processes includes the following:
■ Storage statistics. Data about allocation of storage into tablespaces, index-
spaces, and buffer pools.
■ I/O and device performance statistics. Total read/write activity (paging) on
disk extents and disk hot spots.
■ Query/transaction processing statistics. Execution times of queries and
transactions, and optimization times during query optimization.
3Interested readers should consult Shasha and Bonnet (2002) for a detailed discussion of tuning.
■ Locking/logging related statistics. Rates of issuing different types of locks,
transaction throughput rates, and log records activity.
■ Index statistics. Number of levels in an index, number of noncontiguous
leaf pages, and so on.
Some of the above statistics relate to transactions, concurrency control, and recov-
ery. Tuning a database involves dealing with the following types of problems:
■ How to avoid excessive lock contention, thereby increasing concurrency
among transactions.
■ How to minimize the overhead of logging and unnecessary dumping of data.
■ How to optimize the buffer size and scheduling of processes.
■ How to allocate resources such as disks, RAM, and processes for most effi-
cient utilization.
Most of the previously mentioned problems can be solved by the DBA by setting
appropriate physical DBMS parameters, changing configurations of devices, chang-
ing operating system parameters, and other similar activities. The solutions tend to
be closely tied to specific systems. The DBAs are typically trained to handle these
tuning problems for the specific DBMS. We briefly discuss the tuning of various
physical database design decisions below.
2.1 Tuning Indexes
The initial choice of indexes may have to be revised for the following reasons:
■ Certain queries may take too long to run for lack of an index.
■ Certain indexes may not get utilized at all.
■ Certain indexes may undergo too much updating because the index is on an
attribute that undergoes frequent changes.
Most DBMSs have a command or trace facility, which can be used by the DBA to ask
the system to show how a query was executed—what operations were performed in
what order and what secondary access structures (indexes) were used. By analyzing
these execution plans, it is possible to diagnose the causes of the above problems.
Some indexes may be dropped and some new indexes may be created based on the
tuning analysis.
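The command for displaying an execution plan is system specific; as a rough illustration, Oracle provides EXPLAIN PLAN and several other systems provide some form of EXPLAIN, along the following lines (the query itself is only an example):
EXPLAIN PLAN FOR
SELECT Fname, Lname
FROM EMPLOYEE
WHERE Dno = 5 AND Salary > 30000 ;
The resulting plan shows, for example, whether an index on Dno or on Salary was chosen or whether a full table scan was performed.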
The goal of tuning is to dynamically evaluate the requirements, which sometimes
fluctuate seasonally or during different times of the month or week, and to reorgan-
ize the indexes and file organizations to yield the best overall performance.
Dropping and building new indexes is an overhead that can be justified in terms of
performance improvements. Updating of a table is generally suspended while an
index is dropped or created; this loss of service must be accounted for. Besides drop-
ping or creating indexes and changing from a nonclustered to a clustered index and
vice versa, rebuilding the index may improve performance. Most RDBMSs use
B+-trees for an index. If there are many deletions on the index key, index pages may
contain wasted space, which can be claimed during a rebuild operation. Similarly,
too many insertions may cause overflows in a clustered index that affect perfor-
mance. Rebuilding a clustered index amounts to reorganizing the entire table
ordered on that key.
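For example, an unused index may be dropped and a heavily updated one rebuilt. As noted earlier, these index commands are supported by most RDBMSs even though they are not part of standard SQL; the rebuild syntax shown here is an Oracle-style illustration, and other systems simply drop and re-create the index:
DROP INDEX DnoIndex ;
-- Oracle-style rebuild; reclaims wasted space in the index pages
ALTER INDEX DnoIndex REBUILD ;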
The available options for indexing, and the way indexes are defined, created, and
reorganized, vary from system to system. As an illustration, consider sparse and
dense indexes. A sparse index such as a primary index will have one index pointer for
each page (disk block) in the data file; a dense index such as a unique secondary
index will have an index pointer for each record. Sybase provides clustering indexes
as sparse indexes in the form of B+-trees, whereas INGRES provides sparse clustering
indexes as ISAM files and dense clustering indexes as B+-trees. In some versions of
Oracle and DB2, the option of setting up a clustering index is limited to a dense
index (with many more index entries), and the DBA has to work with this limitation.
2.2 Tuning the Database Design
In Section 1.2, we discussed the need for a possible denormalization, which is a
departure from keeping all tables as BCNF relations. If a given physical database
design does not meet the expected objectives, the DBA may revert to the logical
database design, make adjustments such as denormalizations to the logical schema,
and remap it to a new set of physical tables and indexes.
As discussed, the entire database design has to be driven by the processing require-
ments as much as by data requirements. If the processing requirements are dynam-
ically changing, the design needs to respond by making changes to the conceptual
schema if necessary and to reflect those changes into the logical schema and physi-
cal design. These changes may be of the following nature:
■ Existing tables may be joined (denormalized) because certain attributes
from two or more tables are frequently needed together: This reduces the
normalization level from BCNF to 3NF, 2NF, or 1NF.4
■ For the given set of tables, there may be alternative design choices, all of
which achieve 3NF or BCNF. One normalized design may be replaced by
another.
■ A relation of the form R(K, A, B, C, D, …)—with K as a set of key attributes—
that is in BCNF can be stored in multiple tables that are also in BCNF—for
example, R1(K, A, B), R2(K, C, D), R3(K, …)—by replicating the key K in each
table. Such a process is known as vertical partitioning. Each table groups
4Note that 3NF and 2NF address different types of problem dependencies that are independent of each
other; hence, the normalization (or denormalization) order between them is arbitrary.
sets of attributes that are accessed together. For example, the table
EMPLOYEE(Ssn, Name, Phone, Grade, Salary) may be split into two tables:
EMP1(Ssn, Name, Phone) and EMP2(Ssn, Grade, Salary). If the original table
has a large number of rows (say 100,000) and queries about phone numbers
and salary information are totally distinct and occur with very different frequencies,
then this separation of tables may work better (a sketch of such a split
appears at the end of this subsection).
■ Attribute(s) from one table may be repeated in another even though this cre-
ates redundancy and a potential anomaly. For example, Part_name may be
replicated in tables wherever the Part# appears (as foreign key), but there
may be one master table called PART_MASTER(Part#, Part_name, …) where
the Part_name is guaranteed to be up-to-date.
■ Just as vertical partitioning splits a table vertically into multiple tables,
horizontal partitioning takes horizontal slices of a table and stores them as
distinct tables. For example, product sales data may be separated into ten
tables based on ten product lines. Each table has the same set of columns
(attributes) but contains a distinct set of products (tuples). If a query or
transaction applies to all product data, it may have to run against all the
tables and the results may have to be combined (see also the sketch at the
end of this subsection).
These types of adjustments designed to meet the high volume of queries or transac-
tions, with or without sacrificing the normal forms, are commonplace in practice.
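As a sketch of the partitioning options above (the data types and the SALES_LINE table names are illustrative assumptions, not part of the original examples), vertical partitioning stores the EMPLOYEE attributes in two tables that share the key, and horizontal partitioning recombines per-product-line tables with UNION ALL when a query spans all of them:
-- Vertical partitioning of EMPLOYEE(Ssn, Name, Phone, Grade, Salary):
CREATE TABLE EMP1 ( Ssn CHAR(9) PRIMARY KEY, Name VARCHAR(30), Phone VARCHAR(15) );
CREATE TABLE EMP2 ( Ssn CHAR(9) PRIMARY KEY, Grade INT, Salary DECIMAL(10,2) );
-- Reassembling a complete employee row when needed:
SELECT EMP1.Ssn, Name, Phone, Grade, Salary
FROM EMP1, EMP2
WHERE EMP1.Ssn = EMP2.Ssn;
-- Horizontal partitioning: recombining two of the product-line tables:
SELECT * FROM SALES_LINE1
UNION ALL
SELECT * FROM SALES_LINE2;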
2.3 Tuning Queries
We already discussed how query performance is dependent upon the appropriate
selection of indexes, and how indexes may have to be tuned after analyzing queries
that give poor performance by using the commands in the RDBMS that show the
execution plan of the query. There are mainly two indications that suggest that
query tuning may be needed:
1. A query issues too many disk accesses (for example, an exact match query
scans an entire table).
2. The query plan shows that relevant indexes are not being used.
Some typical instances of situations prompting query tuning include the following:
1. Many query optimizers do not use indexes in the presence of arithmetic
expressions (such as Salary/365 > 10.50), numerical comparisons of attrib-
utes of different sizes and precision (such as Aqty = Bqty where Aqty is of type
INTEGER and Bqty is of type SMALLINTEGER), NULL comparisons (such as
Bdate IS NULL), and substring comparisons (such as Lname LIKE ‘%mann’).
2. Indexes are often not used for nested queries using IN; for example, the fol-
lowing query:
SELECT Ssn FROM EMPLOYEE
WHERE Dno IN ( SELECT Dnumber FROM DEPARTMENT
WHERE Mgr_ssn = ‘333445555’ );
may not use the index on Dno in EMPLOYEE, whereas using Dno = Dnumber
in the WHERE-clause with a single block query may cause the index to be
used (an unnested version is sketched after this list).
3. Some DISTINCTs may be redundant and can be avoided without changing
the result. A DISTINCT often causes a sort operation and must be avoided as
much as possible.
4. Unnecessary use of temporary result tables can be avoided by collapsing
multiple queries into a single query unless the temporary relation is needed
for some intermediate processing.
5. In some situations involving the use of correlated queries, temporaries are
useful. Consider the following query, which retrieves the highest paid
employee in each department:
SELECT Ssn
FROM EMPLOYEE E
WHERE Salary = ( SELECT MAX (Salary)
FROM EMPLOYEE AS M
WHERE M.Dno = E.Dno );
This has the potential danger of searching all of the inner EMPLOYEE table M
for each tuple from the outer EMPLOYEE table E. To make the execution
more efficient, the process can be broken into two queries, where the first
query just computes the maximum salary in each department as follows:
SELECT MAX (Salary) AS High_salary, Dno INTO TEMP
FROM EMPLOYEE
GROUP BY Dno;
SELECT EMPLOYEE.Ssn
FROM EMPLOYEE, TEMP
WHERE EMPLOYEE.Salary = TEMP.High_salary
AND EMPLOYEE.Dno = TEMP.Dno;
6. If multiple options for a join condition are possible, choose one that uses a
clustering index and avoid those that contain string comparisons. For exam-
ple, assuming that the Name attribute is a candidate key in EMPLOYEE and
STUDENT, it is better to use EMPLOYEE.Ssn = STUDENT.Ssn as a join condi-
tion rather than EMPLOYEE.Name = STUDENT.Name if Ssn has a clustering
index in one or both tables.
7. One idiosyncrasy with some query optimizers is that the order of tables in
the FROM-clause may affect the join processing. If that is the case, one may
have to switch this order so that the smaller of the two relations is scanned
and the larger relation is used with an appropriate index.
8. Some query optimizers perform worse on nested queries compared to their
equivalent unnested counterparts. There are four types of nested queries:
■ Uncorrelated subqueries with aggregates in an inner query.
■ Uncorrelated subqueries without aggregates.
■ Correlated subqueries with aggregates in an inner query.
■ Correlated subqueries without aggregates.
Of the four types above, the first one typically presents no problem, since
most query optimizers evaluate the inner query once. However, for a query
of the second type, such as the example in item 2, most query optimizers
may not use an index on Dno in EMPLOYEE. However, the same optimizers
may do so if the query is written as an unnested query. Transformation of
correlated subqueries may involve setting temporary tables. Detailed exam-
ples are outside our scope here.5
9. Finally, many applications are based on views that define the data of interest
to those applications. Sometimes, these views become overkill, because a
query may be posed directly against a base table, rather than going through a
view that is defined by a JOIN.
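As referenced in item 2 above, the nested query on EMPLOYEE and DEPARTMENT can usually be rewritten as an equivalent single-block join query, which gives the optimizer a better chance of using an index on Dno:
SELECT Ssn
FROM EMPLOYEE, DEPARTMENT
WHERE Dno = Dnumber AND Mgr_ssn = ‘333445555’;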
2.4 Additional Query Tuning Guidelines
Additional techniques for improving queries apply in certain situations as follows:
1. A query with multiple selection conditions that are connected via OR may
not be prompting the query optimizer to use any index. Such a query may be
split up and expressed as a union of queries, each with a condition on an
attribute that causes an index to be used. For example,6
SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Age > 45 OR Salary < 50000;
may be executed using a sequential scan, giving poor performance. Splitting it
up as
SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Age > 45
UNION
SELECT Fname, Lname, Salary, Age
FROM EMPLOYEE
WHERE Salary < 50000;
may utilize indexes on Age as well as on Salary.
2. To help expedite a query, the following transformations may be tried (a brief sketch follows this list):
■ NOT condition may be transformed into a positive expression.
■ Embedded SELECT blocks using IN, = ALL, = ANY, and = SOME may be
replaced by joins.
■ If an equality join is set up between two tables, the range predicate (selec-
tion condition) on the joining attribute set up in one table may be
repeated for the other table.
5For further details, see Shasha and Bonnet (2002).
6We modified the schema and used Age in EMPLOYEE instead of Bdate.
3. WHERE conditions may be rewritten to utilize the indexes on multiple
columns. For example,
SELECT Region#, Prod_type, Month, Sales
FROM SALES_STATISTICS
WHERE Region# = 3 AND ( (Prod_type BETWEEN 1 AND 3)
OR (Prod_type BETWEEN 8 AND 10) );
may use an index only on Region# and search through all leaf pages of the
index for a match on Prod_type. Instead, using
SELECT Region#, Prod_type, Month, Sales
FROM SALES_STATISTICS
WHERE (Region# = 3 AND (Prod_type BETWEEN 1 AND 3))
OR (Region# = 3 AND (Prod_type BETWEEN 8 AND 10));
may use a composite index on (Region#, Prod_type) and work much more
efficiently.
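As a brief sketch of the transformations in item 2 (the predicates and range values are illustrative only; the second example uses the WORKS_ON table of the COMPANY database), a NOT condition can be replaced by an equivalent positive condition, and a range predicate on the joining attribute can be repeated for both tables of an equality join:
-- NOT (Salary > 50000) rewritten as a positive condition:
SELECT Fname, Lname FROM EMPLOYEE WHERE Salary <= 50000;
-- Repeating a range predicate on the join attribute for both tables:
SELECT Fname, Lname, Hours
FROM EMPLOYEE, WORKS_ON
WHERE Ssn = Essn
AND Ssn BETWEEN ‘100000000’ AND ‘500000000’
AND Essn BETWEEN ‘100000000’ AND ‘500000000’;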
In this section, we have covered many of the common instances where the ineffi-
ciency of a query may be fixed by some simple corrective action such as using a tem-
porary table, avoiding certain types of query constructs, or avoiding the use of
views. The goal is to have the RDBMS use existing single attribute or composite
attribute indexes as much as possible. This avoids full scans of data blocks or entire
scanning of index leaf nodes. Redundant processes like sorting must be avoided at
any cost. The problems and the remedies will depend upon the workings of a query
optimizer within an RDBMS. RDBMS vendors publish detailed database tuning
guidelines for database administrators. Major relational DBMS vendors such as
Oracle, IBM, and Microsoft encourage their large customers to share tuning ideas at
annual expos and other forums so that the entire industry benefits from these
performance enhancement techniques, which are typically available in trade
literature and on various Web sites.
3 Summary
In this chapter, we discussed the factors that affect physical database design deci-
sions and provided guidelines for choosing among physical design alternatives. We
discussed changes to logical design such as denormalization, as well as modifica-
tions of indexes, and changes to queries to illustrate different techniques for data-
base performance tuning. These are only a representative sample of a large number
of measures and techniques adopted in the design of large commercial applications
of relational DBMSs.
Review Questions
1. What are the important factors that influence physical database design?
2. Discuss the decisions made during physical database design.
3. Discuss the guidelines for physical database design in RDBMSs.
4. Discuss the types of modifications that may be applied to the logical data-
base design of a relational database.
5. Under what situations would denormalization of a database schema be
used? Give examples of denormalization.
6. Discuss the tuning of indexes for relational databases.
7. Discuss the considerations for reevaluating and modifying SQL queries.
8. Illustrate the types of changes to SQL queries that may be worth considering
for improving the performance during database tuning.
Selected Bibliography
Wiederhold (1987) covers issues related to physical design. O’Neil and O’Neil
(2001) has a detailed discussion of physical design and transaction issues in refer-
ence to commercial RDBMSs. Navathe and Kerschberg (1986) discuss all phases of
database design and point out the role of data dictionaries. Rozen and Shasha
(1991) and Carlis and March (1984) present different models for the problem of
physical database design. Shasha and Bonnet (2002) has an elaborate discussion of
guidelines for database tuning. Niemiec (2008) is one among several books available
for Oracle database administration and tuning; Schneider (2006) is focused on
designing and tuning MySQL databases.
Figure A.1
A relation TEACH that is in 3NF but not BCNF.

TEACH
Student     Course              Instructor
Narayan     Database            Mark
Smith       Database            Navathe
Smith       Operating Systems   Ammar
Smith       Theory              Schulman
Wallace     Database            Mark
Wallace     Operating Systems   Ahamad
Wong        Database            Omiecinski
Zelaya      Database            Navathe
Narayan     Operating Systems   Ammar
Introduction to Transaction Processing Concepts and Theory
The concept of transaction provides a mechanism for describing logical units of database processing.
Transaction processing systems are systems with large databases and hundreds of
concurrent users executing database transactions. Examples of such systems include
airline reservations, banking, credit card processing, online retail purchasing, stock
markets, supermarket checkouts, and many other applications. These systems
require high availability and fast response time for hundreds of concurrent users. In
this chapter we present the concepts that are needed in transaction processing sys-
tems. We define the concept of a transaction, which is used to represent a logical
unit of database processing that must be completed in its entirety to ensure correct-
ness. A transaction is typically implemented by a computer program, which
includes database commands such as retrievals, insertions, deletions, and updates.
In this chapter, we focus on the basic concepts and theory that are needed to ensure
the correct executions of transactions. We discuss the concurrency control problem,
which occurs when multiple transactions submitted by various users interfere with
one another in a way that produces incorrect results. We also discuss the problems
that can occur when transactions fail, and how the database system can recover
from various types of failures.
This chapter is organized as follows. Section 1 informally discusses why concur-
rency control and recovery are necessary in a database system. Section 2 defines the
term transaction and discusses additional concepts related to transaction processing
in database systems. Section 3 presents the important properties of atomicity, con-
sistency preservation, isolation, and durability or permanency—called the ACID
From Chapter 21 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
properties—that are considered desirable in transaction processing systems. Section
4 introduces the concept of schedules (or histories) of executing transactions and
characterizes the recoverability of schedules. Section 5 discusses the notion of
serializability of concurrent transaction execution, which can be used to define cor-
rect execution sequences (or schedules) of concurrent transactions. In Section 6, we
present some of the commands that support the transaction concept in SQL.
Section 7 summarizes the chapter.
1 Introduction to Transaction Processing
In this section we discuss the concepts of concurrent execution of transactions and
recovery from transaction failures. Section 1.1 compares single-user and multiuser
database systems and demonstrates how concurrent execution of transactions can
take place in multiuser systems. Section 1.2 defines the concept of transaction and
presents a simple model of transaction execution based on read and write database
operations. This model is used as the basis for defining and formalizing concur-
rency control and recovery concepts. Section 1.3 uses informal examples to show
why concurrency control techniques are needed in multiuser systems. Finally,
Section 1.4 discusses why techniques are needed to handle recovery from system
and transaction failures by discussing the different ways in which transactions can
fail while executing.
1.1 Single-User versus Multiuser Systems
One criterion for classifying a database system is according to the number of users
who can use the system concurrently. A DBMS is single-user if at most one user at
a time can use the system, and it is multiuser if many users can use the system—and
hence access the database—concurrently. Single-user DBMSs are mostly restricted
to personal computer systems; most other DBMSs are multiuser. For example, an
airline reservations system is used by hundreds of travel agents and reservation
clerks concurrently. Database systems used in banks, insurance agencies, stock
exchanges, supermarkets, and many other applications are multiuser systems. In
these systems, hundreds or thousands of users are typically operating on the data-
base by submitting transactions concurrently to the system.
Multiple users can access databases—and use computer systems—simultaneously
because of the concept of multiprogramming, which allows the operating system
of the computer to execute multiple programs—or processes—at the same time. A
single central processing unit (CPU) can only execute at most one process at a time.
However, multiprogramming operating systems execute some commands from
one process, then suspend that process and execute some commands from the next
Figure 1
Interleaved processing versus parallel processing of concurrent transactions.
(Processes A and B execute concurrently in an interleaved fashion; processes C and
D execute in parallel on CPU1 and CPU2; the time axis runs from t1 to t4.)
process, and so on. A process is resumed at the point where it was suspended when-
ever it gets its turn to use the CPU again. Hence, concurrent execution of processes
is actually interleaved, as illustrated in Figure 1, which shows two processes, A and
B, executing concurrently in an interleaved fashion. Interleaving keeps the CPU
busy when a process requires an input or output (I/O) operation, such as reading a
block from disk. The CPU is switched to execute another process rather than
remaining idle during I/O time. Interleaving also prevents a long process from
delaying other processes.
If the computer system has multiple hardware processors (CPUs), parallel process-
ing of multiple processes is possible, as illustrated by processes C and D in Figure 1.
Most of the theory concerning concurrency control in databases is developed in
terms of interleaved concurrency, so for the remainder of this chapter we assume
this model. In a multiuser DBMS, the stored data items are the primary resources
that may be accessed concurrently by interactive users or application programs,
which are constantly retrieving information from and modifying the database.
1.2 Transactions, Database Items, Read
and Write Operations, and DBMS Buffers
A transaction is an executing program that forms a logical unit of database process-
ing. A transaction includes one or more database access operations—these can
include insertion, deletion, modification, or retrieval operations. The database
operations that form a transaction can either be embedded within an application
program or they can be specified interactively via a high-level query language such
as SQL. One way of specifying the transaction boundaries is by specifying explicit
begin transaction and end transaction statements in an application program; in
this case, all database access operations between the two are considered as forming
one transaction. A single application program may contain more than one transac-
tion if it contains several transaction boundaries. If the database operations in a
transaction do not update the database but only retrieve data, the transaction is
called a read-only transaction; otherwise it is known as a read-write transaction.
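As a hedged illustration (the exact statements and defaults vary across DBMSs, and the ACCOUNT table here is hypothetical), explicit transaction boundaries around a read-write transaction might be specified in SQL as follows:
START TRANSACTION;   -- some systems use BEGIN TRANSACTION or start a transaction implicitly
UPDATE ACCOUNT SET Balance = Balance - 100 WHERE Acct_no = ‘A-123’;
UPDATE ACCOUNT SET Balance = Balance + 100 WHERE Acct_no = ‘B-456’;
COMMIT;              -- or ROLLBACK, to undo all of the transaction’s changes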
The database model that is used to present transaction processing concepts is quite
simple when compared to data models, such as the relational model or the object
model. A database is basically represented as a collection of named data items. The
size of a data item is called its granularity. A data item can be a database record, but
it can also be a larger unit such as a whole disk block, or even a smaller unit such as
an individual field (attribute) value of some record in the database. The transaction
processing concepts we discuss are independent of the data item granularity (size)
and apply to data items in general. Each data item has a unique name, but this name
is not typically used by the programmer; rather, it is just a means to uniquely iden-
tify each data item. For example, if the data item granularity is one disk block, then
the disk block address can be used as the data item name. Using this simplified data-
base model, the basic database access operations that a transaction can include are
as follows:
■ read_item(X). Reads a database item named X into a program variable. To
simplify our notation, we assume that the program variable is also named X.
■ write_item(X). Writes the value of program variable X into the database
item named X.
The basic unit of data transfer from disk to main memory is one block. Executing a
read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not
already in some main memory buffer).
3. Copy item X from the buffer to the program variable named X.
Executing a write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not
already in some main memory buffer).
3. Copy item X from the program variable named X into its correct location in
the buffer.
4. Store the updated block from the buffer back to disk (either immediately or
at some later point in time).
It is step 4 that actually updates the database on disk. In some cases the buffer is not
immediately stored to disk, in case additional changes are to be made to the buffer.
Usually, the decision about when to store a modified disk block whose contents are
in a main memory buffer is handled by the recovery manager of the DBMS in coop-
eration with the underlying operating system. The DBMS will maintain in the
database cache a number of data buffers in main memory. Each buffer typically
holds the contents of one database disk block, which contains some of the database
items being processed. When these buffers are all occupied, and additional database
disk blocks must be copied into memory, some buffer replacement policy is used to
Figure 2
Two sample transactions. (a) Transaction T1. (b) Transaction T2.

(a) T1:                      (b) T2:
    read_item(X );               read_item(X );
    X := X – N;                  X := X + M;
    write_item(X );              write_item(X );
    read_item(Y );
    Y := Y + N;
    write_item(Y );
choose which of the current buffers is to be replaced. If the chosen buffer has been
modified, it must be written back to disk before it is reused.1
A transaction includes read_item and write_item operations to access and update the
database. Figure 2 shows examples of two very simple transactions. The read-set of
a transaction is the set of all items that the transaction reads, and the write-set is the
set of all items that the transaction writes. For example, the read-set of T1 in Figure
2 is {X, Y} and its write-set is also {X, Y}.
Concurrency control and recovery mechanisms are mainly concerned with the
database commands in a transaction. Transactions submitted by the various users
may execute concurrently and may access and update the same database items. If
this concurrent execution is uncontrolled, it may lead to problems, such as an incon-
sistent database. In the next section we informally introduce some of the problems
that may occur.
1.3 Why Concurrency Control Is Needed
Several problems can occur when concurrent transactions execute in an uncon-
trolled manner. We illustrate some of these problems by referring to a much simpli-
fied airline reservations database in which a record is stored for each airline flight.
Each record includes the number of reserved seats on that flight as a named (uniquely
identifiable) data item, among other information. Figure 2(a) shows a transaction
T1 that transfers N reservations from one flight whose number of reserved seats is
stored in the database item named X to another flight whose number of reserved
seats is stored in the database item named Y. Figure 2(b) shows a simpler transac-
tion T2 that just reserves M seats on the first flight (X) referenced in transaction T1.2
To simplify our example, we do not show additional portions of the transactions,
such as checking whether a flight has enough seats available before reserving addi-
tional seats.
1We will not discuss buffer replacement policies here because they are typically discussed in operating
systems textbooks.
2A similar, more commonly used example assumes a bank database, with one transaction doing a trans-
fer of funds from account X to account Y and the other transaction doing a deposit to account X.
When a database access program is written, it has the flight number, flight date, and
the number of seats to be booked as parameters; hence, the same program can be
used to execute many different transactions, each with a different flight number,
date, and number of seats to be booked. For concurrency control purposes, a trans-
action is a particular execution of a program on a specific date, flight, and number of
seats. In Figure 2(a) and (b), the transactions T1 and T2 are specific executions of the
programs that refer to the specific flights whose numbers of seats are stored in data
items X and Y in the database. Next we discuss the types of problems we may
encounter with these two simple transactions if they run concurrently.
The Lost Update Problem. This problem occurs when two transactions that
access the same database items have their operations interleaved in a way that makes
the value of some database items incorrect. Suppose that transactions T1 and T2 are
submitted at approximately the same time, and suppose that their operations are
interleaved as shown in Figure 3(a); then the final value of item X is incorrect
because T2 reads the value of X before T1 changes it in the database, and hence the
updated value resulting from T1 is lost. For example, if X = 80 at the start (originally
there were 80 reservations on the flight), N = 5 (T1 transfers 5 seat reservations from
the flight corresponding to X to the flight corresponding to Y), and M = 4 (T2
reserves 4 seats on X), the final result should be X = 79. However, in the interleaving
of operations shown in Figure 3(a), it is X = 84 because the update in T1 that
removed the five seats from X was lost.
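In SQL terms, a sketch of the read-then-write pattern that allows the lost update (using a hypothetical FLIGHT table and embedded-SQL host variables :x and :y, none of which appear in the original example) might look like this; if T2 reads Seat_count after T1 reads it but before T1 writes it back, T1's update is overwritten:
-- Transaction T1: transfer N = 5 reservations from flight X to flight Y
SELECT Seat_count INTO :x FROM FLIGHT WHERE Flight_no = ‘X’;
UPDATE FLIGHT SET Seat_count = :x - 5 WHERE Flight_no = ‘X’;
SELECT Seat_count INTO :y FROM FLIGHT WHERE Flight_no = ‘Y’;
UPDATE FLIGHT SET Seat_count = :y + 5 WHERE Flight_no = ‘Y’;
-- Transaction T2: reserve M = 4 seats on flight X
SELECT Seat_count INTO :x FROM FLIGHT WHERE Flight_no = ‘X’;
UPDATE FLIGHT SET Seat_count = :x + 4 WHERE Flight_no = ‘X’;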
The Temporary Update (or Dirty Read) Problem. This problem occurs when
one transaction updates a database item and then the transaction fails for some rea-
son (see Section 1.4). Meanwhile, the updated item is accessed (read) by another
transaction before it is changed back to its original value. Figure 3(b) shows an
example where T1 updates item X and then fails before completion, so the system
must change X back to its original value. Before it can do so, however, transaction T2
reads the temporary value of X, which will not be recorded permanently in the data-
base because of the failure of T1. The value of item X that is read by T2 is called dirty
data because it has been created by a transaction that has not completed and com-
mitted yet; hence, this problem is also known as the dirty read problem.
The Incorrect Summary Problem. If one transaction is calculating an aggregate
summary function on a number of database items while other transactions are
updating some of these items, the aggregate function may calculate some values
before they are updated and others after they are updated. For example, suppose
that a transaction T3 is calculating the total number of reservations on all the flights;
meanwhile, transaction T1 is executing. If the interleaving of operations shown in
Figure 3(c) occurs, the result of T3 will be off by an amount N because T3 reads the
value of X after N seats have been subtracted from it but reads the value of Y before
those N seats have been added to it.
Figure 3
Some problems that occur when concurrent execution is uncontrolled.
(a) The lost update problem. Operations of T1 and T2 are interleaved as follows:
      T1: read_item(X ); X := X – N;
      T2: read_item(X ); X := X + M;
      T1: write_item(X ); read_item(Y );
      T2: write_item(X );
      T1: Y := Y + N; write_item(Y );
    Item X has an incorrect value because its update by T1 is lost (overwritten).
(b) The temporary update problem. Operations of T1 and T2 are interleaved as follows:
      T1: read_item(X ); X := X – N; write_item(X );
      T2: read_item(X ); X := X + M; write_item(X );
      T1: read_item(Y );
    Transaction T1 fails and must change the value of X back to its old value;
    meanwhile T2 has read the temporary incorrect value of X.
(c) The incorrect summary problem. Operations of T1 and T3 are interleaved as follows:
      T3: sum := 0; read_item(A); sum := sum + A;
      T1: read_item(X ); X := X – N; write_item(X );
      T3: read_item(X ); sum := sum + X; read_item(Y ); sum := sum + Y;
      T1: read_item(Y ); Y := Y + N; write_item(Y );
    T3 reads X after N is subtracted and reads Y before N is added; a wrong
    summary is the result (off by N ).
The Unrepeatable Read Problem. Another problem that may occur is called
unrepeatable read, where a transaction T reads the same item twice and the item is
changed by another transaction T′ between the two reads. Hence, T receives
different values for its two reads of the same item. This may occur, for example, if
during an airline reservation transaction, a customer inquires about seat availability
on several flights. When the customer decides on a particular flight, the transaction
then reads the number of seats on that flight a second time before completing the
reservation, and it may end up reading a different value for the item.
1.4 Why Recovery Is Needed
Whenever a transaction is submitted to a DBMS for execution, the system is respon-
sible for making sure that either all the operations in the transaction are completed
successfully and their effect is recorded permanently in the database, or that the
transaction does not have any effect on the database or any other transactions. In
the first case, the transaction is said to be committed, whereas in the second case,
the transaction is aborted. The DBMS must not permit some operations of a trans-
action T to be applied to the database while other operations of T are not, because
the whole transaction is a logical unit of database processing. If a transaction fails
after executing some of its operations but before executing all of them, the opera-
tions already executed must be undone and have no lasting effect.
Types of Failures. Failures are generally classified as transaction, system, and
media failures. There are several possible reasons for a transaction to fail in the mid-
dle of execution:
1. A computer failure (system crash). A hardware, software, or network error
occurs in the computer system during transaction execution. Hardware
crashes are usually media failures—for example, main memory failure.
2. A transaction or system error. Some operation in the transaction may cause
it to fail, such as integer overflow or division by zero. Transaction failure may
also occur because of erroneous parameter values or because of a logical
programming error.3 Additionally, the user may interrupt the transaction
during its execution.
3. Local errors or exception conditions detected by the transaction. During
transaction execution, certain conditions may occur that necessitate cancel-
lation of the transaction. For example, data for the transaction may not be
found. An exception condition,4 such as insufficient account balance in a
banking database, may cause a transaction, such as a fund withdrawal, to be
canceled. This exception could be programmed in the transaction itself, and
in such a case would not be considered as a transaction failure.
3In general, a transaction should be thoroughly tested to ensure that it does not have any bugs (logical
programming errors).
4Exception conditions, if programmed correctly, do not constitute transaction failures.
4. Concurrency control enforcement. The concurrency control method may
decide to abort a transaction because it violates serializability (see Section 5),
or it may abort one or more transactions to resolve a state of deadlock
among several transactions. Transactions aborted because of serializability
violations or deadlocks are typically restarted automatically at a later time.
5. Disk failure. Some disk blocks may lose their data because of a read or write
malfunction or because of a disk read/write head crash. This may happen
during a read or a write operation of the transaction.
6. Physical problems and catastrophes. This refers to an endless list of prob-
lems that includes power or air-conditioning failure, fire, theft, sabotage,
overwriting disks or tapes by mistake, and mounting of a wrong tape by the
operator.
Failures of types 1, 2, 3, and 4 are more common than those of types 5 or 6.
Whenever a failure of type 1 through 4 occurs, the system must keep sufficient
information to quickly recover from the failure. Disk failure or other catastrophic
failures of type 5 or 6 do not happen frequently; if they do occur, recovery is a major
task.
The concept of transaction is fundamental to many techniques for concurrency
control and recovery from failures.
2 Transaction and System Concepts
In this section we discuss additional concepts relevant to transaction processing.
Section 2.1 describes the various states a transaction can be in, and discusses other
operations needed in transaction processing. Section 2.2 discusses the system log,
which keeps information about transactions and data items that will be needed for
recovery. Section 2.3 describes the concept of commit points of transactions, and
why they are important in transaction processing.
2.1 Transaction States and Additional Operations
A transaction is an atomic unit of work that should either be completed in its
entirety or not done at all. For recovery purposes, the system needs to keep track of
when each transaction starts, terminates, and commits or aborts (see Section 2.3).
Therefore, the recovery manager of the DBMS needs to keep track of the following
operations:
■ BEGIN_TRANSACTION. This marks the beginning of transaction execution.
■ READ or WRITE. These specify read or write operations on the database
items that are executed as part of a transaction.
■ END_TRANSACTION. This specifies that READ and WRITE transaction oper-
ations have ended and marks the end of transaction execution. However, at
this point it may be necessary to check whether the changes introduced by
Figure 4
State transition diagram illustrating the states for transaction execution. (Begin
transaction takes a transaction to the Active state, where Read and Write operations
execute; End transaction takes it to the Partially committed state; Commit takes it
to the Committed state; Abort takes it from the Active or Partially committed state
to the Failed state; Committed and Failed transactions then move to the Terminated
state.)
the transaction can be permanently applied to the database (committed) or
whether the transaction has to be aborted because it violates serializability
(see Section 5) or for some other reason.
■ COMMIT_TRANSACTION. This signals a successful end of the transaction so
that any changes (updates) executed by the transaction can be safely
committed to the database and will not be undone.
■ ROLLBACK (or ABORT). This signals that the transaction has ended unsuc-
cessfully, so that any changes or effects that the transaction may have applied
to the database must be undone.
Figure 4 shows a state transition diagram that illustrates how a transaction moves
through its execution states. A transaction goes into an active state immediately after
it starts execution, where it can execute its READ and WRITE operations. When the
transaction ends, it moves to the partially committed state. At this point, some
recovery protocols need to ensure that a system failure will not result in an inability
to record the changes of the transaction permanently (usually by recording changes
in the system log, discussed in the next section).5 Once this check is successful, the
transaction is said to have reached its commit point and enters the committed state.
Commit points are discussed in more detail in Section 2.3. When a transaction is
committed, it has concluded its execution successfully and all its changes must be
recorded permanently in the database, even if a system failure occurs.
However, a transaction can go to the failed state if one of the checks fails or if the
transaction is aborted during its active state. The transaction may then have to be
rolled back to undo the effect of its WRITE operations on the database. The
terminated state corresponds to the transaction leaving the system. The transaction
information that is maintained in system tables while the transaction has been run-
ning is removed when the transaction terminates. Failed or aborted transactions
may be restarted later—either automatically or after being resubmitted by the
user—as brand new transactions.
5Optimistic concurrency control also requires that certain checks are made at this point to ensure that
the transaction did not interfere with other executing transactions.
2.2 The System Log
To be able to recover from failures that affect transactions, the system maintains a
log6 to keep track of all transaction operations that affect the values of database
items, as well as other transaction information that may be needed to permit recov-
ery from failures. The log is a sequential, append-only file that is kept on disk, so it
is not affected by any type of failure except for disk or catastrophic failure. Typically,
one (or more) main memory buffers hold the last part of the log file, so that log
entries are first added to the main memory buffer. When the log buffer is filled, or
when certain other conditions occur, the log buffer is appended to the end of the log
file on disk. In addition, the log file from disk is periodically backed up to archival
storage (tape) to guard against catastrophic failures. The following are the types of
entries—called log records—that are written to the log file and the corresponding
action for each log record. In these entries, T refers to a unique transaction-id that
is generated automatically by the system for each transaction and that is used to
identify each transaction:
1. [start_transaction, T]. Indicates that transaction T has started execution.
2. [write_item, T, X, old_value, new_value]. Indicates that transaction T has
changed the value of database item X from old_value to new_value.
3. [read_item, T, X]. Indicates that transaction T has read the value of database
item X.
4. [commit, T]. Indicates that transaction T has completed successfully, and
affirms that its effect can be committed (recorded permanently) to the data-
base.
5. [abort, T]. Indicates that transaction T has been aborted.
Protocols for recovery that avoid cascading rollbacks (see Section 4.2)—which
include nearly all practical protocols—do not require that READ operations are writ-
ten to the system log. However, if the log is also used for other purposes—such as
auditing (keeping track of all database operations)—then such entries can be
included. Additionally, some recovery protocols require simpler WRITE entries that
include only one of new_value and old_value, rather than both (see Section 4.2).
Notice that we are assuming that all permanent changes to the database occur
within transactions, so the notion of recovery from a transaction failure amounts to
either undoing or redoing transaction operations individually from the log. If the
system crashes, we can recover to a consistent database state by examining the log
(and using techniques not detailed here). Because the log contains a record of every
WRITE operation that changes the value of some database item, it is possible to
undo the effect of these WRITE operations of a transaction T by tracing backward
through the log and resetting all items changed by a WRITE operation of T to their
old_values. Redo of an operation may also be necessary if a transaction has its
updates recorded in the log but a failure occurs before the system can be sure that all
6The log has sometimes been called the DBMS journal.
these new_values have been written to the actual database on disk from the main
memory buffers.
2.3 Commit Point of a Transaction
A transaction T reaches its commit point when all its operations that access the
database have been executed successfully and the effect of all the transaction opera-
tions on the database have been recorded in the log. Beyond the commit point, the
transaction is said to be committed, and its effect must be permanently recorded in
the database. The transaction then writes a commit record [commit, T] into the log.
If a system failure occurs, we can search back in the log for all transactions T that
have written a [start_transaction, T] record into the log but have not written their
[commit, T] record yet; these transactions may have to be rolled back to undo their
effect on the database during the recovery process. Transactions that have written
their commit record in the log must also have recorded all their WRITE operations
in the log, so their effect on the database can be redone from the log records.
Notice that the log file must be kept on disk. Updating a disk file involves copying
the appropriate block of the file from disk to a buffer in main memory, updating the
buffer in main memory, and copying the buffer to disk. It is common to keep one or
more blocks of the log file in main memory buffers, called the log buffer, until they
are filled with log entries and then to write them back to disk only once, rather than
writing to disk every time a log entry is added. This saves the overhead of multiple
disk writes of the same log file buffer. At the time of a system crash, only the log
entries that have been written back to disk are considered in the recovery process
because the contents of main memory may be lost. Hence, before a transaction
reaches its commit point, any portion of the log that has not been written to the disk
yet must now be written to the disk. This process is called force-writing the log
buffer before committing a transaction.
3 Desirable Properties of Transactions
Transactions should possess several properties, often called the ACID properties;
they should be enforced by the concurrency control and recovery methods of the
DBMS. The following are the ACID properties:
■ Atomicity. A transaction is an atomic unit of processing; it should either be
performed in its entirety or not performed at all.
■ Consistency preservation. A transaction should be consistency preserving,
meaning that if it is completely executed from beginning to end without
interference from other transactions, it should take the database from one
consistent state to another.
■ Isolation. A transaction should appear as though it is being executed in iso-
lation from other transactions, even though many transactions are executing
concurrently. That is, the execution of a transaction should not be interfered
with by any other transactions executing concurrently.
■ Durability or permanency. The changes applied to the database by a com-
mitted transaction must persist in the database. These changes must not be
lost because of any failure.
The atomicity property requires that we execute a transaction to completion. It is the
responsibility of the transaction recovery subsystem of a DBMS to ensure atomicity.
If a transaction fails to complete for some reason, such as a system crash in the
midst of transaction execution, the recovery technique must undo any effects of the
transaction on the database. On the other hand, write operations of a committed
transaction must be eventually written to disk.
The preservation of consistency is generally considered to be the responsibility of the
programmers who write the database programs or of the DBMS module that
enforces integrity constraints. Recall that a database state is a collection of all the
stored data items (values) in the database at a given point in time. A consistent state
of the database satisfies the constraints specified in the schema as well as any other
constraints on the database that should hold. A database program should be written
in a way that guarantees that, if the database is in a consistent state before executing
the transaction, it will be in a consistent state after the complete execution of the
transaction, assuming that no interference with other transactions occurs.
The isolation property is enforced by the concurrency control subsystem of the DBMS.
If every transaction does not make its updates (write operations) visible to other
transactions until it is committed, one form of isolation is enforced that solves the
temporary update problem and eliminates cascading rollbacks but does not elimi-
nate all other problems. There have been attempts to define the level of isolation of
a transaction. A transaction is said to have level 0 (zero) isolation if it does not over-
write the dirty reads of higher-level transactions. Level 1 (one) isolation has no lost
updates, and level 2 isolation has no lost updates and no dirty reads. Finally, level 3
isolation (also called true isolation) has, in addition to level 2 properties, repeatable
reads.7
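These levels correspond closely to the isolation levels that can be requested in SQL (see Section 6); for example, a transaction may request the strongest level as follows, with the exact set of supported levels and the default varying by DBMS:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- weaker standard levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ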
And last, the durability property is the responsibility of the recovery subsystem of the
DBMS. We will introduce how recovery protocols enforce durability and atomicity
in the next section.
4 Characterizing Schedules Based
on Recoverability
When transactions are executing concurrently in an interleaved fashion, then the
order of execution of operations from all the various transactions is known as a
schedule (or history). In this section, first we define the concept of schedules, and
7The SQL syntax for isolation level discussed later in Section 6 is closely related to these levels.
then we characterize the types of schedules that facilitate recovery when failures
occur. In Section 5, we characterize schedules in terms of the interference of partic-
ipating transactions, leading to the concepts of serializability and serializable sched-
ules.
4.1 Schedules (Histories) of Transactions
A schedule (or history) S of n transactions T1, T2, ..., Tn is an ordering of the oper-
ations of the transactions. Operations from different transactions can be interleaved
in the schedule S. However, for each transaction Ti that participates in the schedule
S, the operations of Ti in S must appear in the same order in which they occur in Ti.
The order of operations in S is considered to be a total ordering, meaning that for
any two operations in the schedule, one must occur before the other. It is possible
theoretically to deal with schedules whose operations form partial orders (as we
discuss later), but we will assume for now total ordering of the operations in a
schedule.
For the purpose of recovery and concurrency control, we are mainly interested in
the read_item and write_item operations of the transactions, as well as the commit and
abort operations. A shorthand notation for describing a schedule uses the symbols b,
r, w, e, c, and a for the operations begin_transaction, read_item, write_item, end_transac-
tion, commit, and abort, respectively, and appends as a subscript the transaction id
(transaction number) to each operation in the schedule. In this notation, the data-
base item X that is read or written follows the r and w operations in parentheses. In
some schedules, we will only show the read and write operations, whereas in other
schedules, we will show all the operations. For example, the schedule in Figure 3(a),
which we shall call Sa, can be written as follows in this notation:
Sa: r1(X); r2(X); w1(X); r1(Y); w2(X); w1(Y);
Similarly, the schedule for Figure 3(b), which we call Sb, can be written as follows, if
we assume that transaction T1 aborted after its read_item(Y) operation:
Sb: r1(X); w1(X); r2(X); w2(X); r1(Y); a1;
Two operations in a schedule are said to conflict if they satisfy all three of the fol-
lowing conditions: (1) they belong to different transactions; (2) they access the same
item X; and (3) at least one of the operations is a write_item(X). For example, in
schedule Sa, the operations r1(X) and w2(X) conflict, as do the operations r2(X) and
w1(X), and the operations w1(X) and w2(X). However, the operations r1(X) and
r2(X) do not conflict, since they are both read operations; the operations w2(X)
and w1(Y) do not conflict because they operate on distinct data items X and Y; and
the operations r1(X) and w1(X) do not conflict because they belong to the same
transaction.
Intuitively, two operations are conflicting if changing their order can result in a dif-
ferent outcome. For example, if we change the order of the two operations r1(X);
w2(X) to w2(X); r1(X), then the value of X that is read by transaction T1 changes,
because in the second order the value of X is changed by w2(X) before it is read by
r1(X), whereas in the first order the value is read before it is changed. This is called a
read-write conflict. The other type is called a write-write conflict, and is illustrated
by the case where we change the order of two operations such as w1(X); w2(X) to
w2(X); w1(X). For a write-write conflict, the last value of X will differ because in one
case it is written by T2 and in the other case by T1. Notice that two read operations
are not conflicting because changing their order makes no difference in outcome.
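The three conditions of the conflict test translate directly into a small helper, again our own sketch over the tuple representation introduced above:

def conflicts(op1, op2):
    """True if two schedule operations conflict: they belong to different transactions,
    access the same item, and at least one of them is a write_item."""
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and 'w' in (a1, a2)

# In schedule Sa: r1(X) and w2(X) conflict, but r1(X) and r2(X) do not.
print(conflicts((1, 'r', 'X'), (2, 'w', 'X')))   # True
print(conflicts((1, 'r', 'X'), (2, 'r', 'X')))   # False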
The rest of this section covers some theoretical definitions concerning schedules. A
schedule S of n transactions T1, T2, ..., Tn is said to be a complete schedule if the
following conditions hold:
1. The operations in S are exactly those operations in T1, T2, ..., Tn, including a
commit or abort operation as the last operation for each transaction in the
schedule.
2. For any pair of operations from the same transaction Ti, their relative order
of appearance in S is the same as their order of appearance in Ti.
3. For any two conflicting operations, one of the two must occur before the
other in the schedule.8
The preceding condition (3) allows for two nonconflicting operations to occur in the
schedule without defining which occurs first, thus leading to the definition of a
schedule as a partial order of the operations in the n transactions.9 However, a total
order must be specified in the schedule for any pair of conflicting operations (con-
dition 3) and for any pair of operations from the same transaction (condition 2).
Condition 1 simply states that all operations in the transactions must appear in the
complete schedule. Since every transaction has either committed or aborted, a com-
plete schedule will not contain any active transactions at the end of the schedule.
In general, it is difficult to encounter complete schedules in a transaction processing
system because new transactions are continually being submitted to the system.
Hence, it is useful to define the concept of the committed projection C(S) of a
schedule S, which includes only the operations in S that belong to committed trans-
actions—that is, transactions Ti whose commit operation ci is in S.
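Computing C(S) from the shorthand is equally direct; the following sketch (ours, not an algorithm from the text) keeps only the operations of transactions whose commit appears in S:

def committed_projection(schedule):
    """C(S): the operations of S that belong to transactions whose commit ('c') is in S."""
    committed = {t for t, a, _ in schedule if a == 'c'}
    return [op for op in schedule if op[0] in committed]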
4.2 Characterizing Schedules Based on Recoverability
For some schedules it is easy to recover from transaction and system failures,
whereas for other schedules the recovery process can be quite involved. In some
cases, it is even not possible to recover correctly after a failure. Hence, it is important
to characterize the types of schedules for which recovery is possible, as well as those
for which recovery is relatively simple. These characterizations do not actually pro-
vide the recovery algorithm; they only attempt to theoretically characterize the dif-
ferent types of schedules.
8Theoretically, it is not necessary to determine an order between pairs of nonconflicting operations.
9In practice, most schedules have a total order of operations. If parallel processing is employed, it is theo-
retically possible to have schedules with partially ordered nonconflicting operations.
First, we would like to ensure that, once a transaction T is committed, it should
never be necessary to roll back T. This ensures that the durability property of trans-
actions is not violated (see Section 3). The schedules that theoretically meet this cri-
terion are called recoverable schedules; those that do not are called nonrecoverable
and hence should not be permitted by the DBMS. The definition of recoverable
schedule is as follows: A schedule S is recoverable if no transaction T in S commits
until all transactions T′ that have written some item X that T reads have committed.
A transaction T reads from transaction T′ in a schedule S if some item X is first
written by T′ and later read by T. In addition, T′ should not have been aborted
before T reads item X, and there should be no transactions that write X after T′
writes it and before T reads it (unless those transactions, if any, have aborted before
T reads X).
Some recoverable schedules may require a complex recovery process as we shall see,
but if sufficient information is kept (in the log), a recovery algorithm can be devised
for any recoverable schedule. The (partial) schedules Sa and Sb from the preceding
section are both recoverable, since they satisfy the above definition. Consider the
schedule Sa′ given below, which is the same as schedule Sa except that two commit
operations have been added to Sa:
Sa′: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;
Sa′ is recoverable, even though it suffers from the lost update problem; this problem
is handled by serializability theory (see Section 5). However, consider the three (par-
tial) schedules Sc, Sd, and Se that follow:
Sc: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
Sd: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); c1; c2;
Se: r1(X); w1(X); r2(X); r1(Y); w2(X); w1(Y); a1; a2;
Sc is not recoverable because T2 reads item X from T1, but T2 commits before T1
commits. The problem arises if T1 aborts after the c2 operation in Sc: the value of X
that T2 read is then no longer valid, and T2 would have to be aborted after it has
already committed, which violates recoverability. For the schedule to be recoverable, the c2
operation in Sc must be postponed until after T1 commits, as shown in Sd. If T1
aborts instead of committing, then T2 should also abort as shown in Se, because the
value of X it read is no longer valid. In Se, aborting T2 is acceptable since it has not
committed yet, which is not the case for the nonrecoverable schedule Sc.
In a recoverable schedule, no committed transaction ever needs to be rolled back,
and so the definition of committed transaction as durable is not violated. However,
it is possible for a phenomenon known as cascading rollback (or cascading abort)
to occur in some recoverable schedules, where an uncommitted transaction has to be
rolled back because it read an item from a transaction that failed. This is illustrated
in schedule Se, where transaction T2 has to be rolled back because it read item X
from T1, and T1 then aborted.
Because cascading rollback can be quite time-consuming—since numerous transac-
tions can be rolled back—it is important to characterize the schedules where this
phenomenon is guaranteed not to occur. A schedule is said to be cascadeless, or to
avoid cascading rollback, if every transaction in the schedule reads only items that
were written by committed transactions. In this case, no value that has been read can
later be discarded by a rollback, so no cascading rollback will occur. To satisfy this criterion, the r2(X) com-
mand in schedules Sd and Se must be postponed until after T1 has committed (or
aborted), thus delaying T2 but ensuring no cascading rollback if T1 aborts.
Finally, there is a third, more restrictive type of schedule, called a strict schedule, in
which transactions can neither read nor write an item X until the last transaction that
wrote X has committed (or aborted). Strict schedules simplify the recovery process.
In a strict schedule, the process of undoing a write_item(X) operation of an aborted
transaction is simply to restore the before image (old_value or BFIM) of data item X.
This simple procedure always works correctly for strict schedules, but it may not
work for recoverable or cascadeless schedules. For example, consider schedule Sf :
Sf : w1(X, 5); w2(X, 8); a1;
Suppose that the value of X was originally 9, which is the before image stored in the
system log along with the w1(X, 5) operation. If T1 aborts, as in Sf , the recovery pro-
cedure that restores the before image of an aborted write operation will restore the
value of X to 9, even though it has already been changed to 8 by transaction T2, thus
leading to potentially incorrect results. Although schedule Sf is cascadeless, it is not
a strict schedule, since it permits T2 to write item X even though the transaction T1
that last wrote X had not yet committed (or aborted). A strict schedule does not
have this problem.
It is important to note that any strict schedule is also cascadeless, and any cascade-
less schedule is also recoverable. Suppose we have i transactions T1, T2, ..., Ti, and
whose numbers of operations are n1, n2, ..., ni, respectively. If we make a set of all pos-
sible schedules of these transactions, we can divide the schedules into two disjoint
subsets: recoverable and nonrecoverable. The cascadeless schedules will be a subset
of the recoverable schedules, and the strict schedules will be a subset of the cascade-
less schedules. Thus, all strict schedules are cascadeless, and all cascadeless schedules
are recoverable.
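These definitions can be checked mechanically. The following sketch is our own illustration rather than an algorithm from the text; it uses the (transaction, action, item) tuples from the earlier sketches, reports the strictest of the four classes that a complete schedule satisfies, takes the reads-from source of a read to be the most recent earlier writer of the same item, and does not special-case aborted intermediate writers.

def classify(schedule):
    """Return 'strict', 'cascadeless', 'recoverable', or 'nonrecoverable' (the
    strictest condition satisfied). Actions are 'r', 'w', 'c', 'a'; item is None
    for commit and abort."""
    def ended_before(tid, pos):       # tid committed or aborted before position pos
        return any(t == tid and a in ('c', 'a') for t, a, _ in schedule[:pos])

    def committed_before(tid, pos):   # tid committed before position pos
        return any(t == tid and a == 'c' for t, a, _ in schedule[:pos])

    commit_pos = {t: p for p, (t, a, _) in enumerate(schedule) if a == 'c'}
    strict = cascadeless = recoverable = True

    for pos, (tj, action, x) in enumerate(schedule):
        if action not in ('r', 'w'):
            continue
        # most recent earlier write of x (by any transaction)
        writer = next((ti for ti, a, y in reversed(schedule[:pos])
                       if a == 'w' and y == x), None)
        if writer is None or writer == tj:
            continue
        if not ended_before(writer, pos):              # strictness violated
            strict = False
        if action == 'r':
            if not committed_before(writer, pos):      # cascading rollback possible
                cascadeless = False
            cj = commit_pos.get(tj)
            if cj is not None and not committed_before(writer, cj):
                recoverable = False                    # tj commits before its source
    if strict:
        return 'strict'
    if cascadeless:
        return 'cascadeless'
    return 'recoverable' if recoverable else 'nonrecoverable'

# Sc and Sd from this section in the tuple shorthand:
Sc = [(1,'r','X'), (1,'w','X'), (2,'r','X'), (1,'r','Y'), (2,'w','X'), (2,'c',None), (1,'a',None)]
Sd = [(1,'r','X'), (1,'w','X'), (2,'r','X'), (1,'r','Y'), (2,'w','X'), (1,'w','Y'), (1,'c',None), (2,'c',None)]
print(classify(Sc), classify(Sd))   # nonrecoverable recoverable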
5 Characterizing Schedules Based
on Serializability
In the previous section, we characterized schedules based on their recoverability
properties. Now we characterize the types of schedules that are always considered to
be correct when concurrent transactions are executing. Such schedules are known as
serializable schedules. Suppose that two users—for example, two airline reservations
agents—submit to the DBMS transactions T1 and T2 in Figure 2 at approximately
the same time. If no interleaving of operations is permitted, there are only two pos-
sible outcomes:
1. Execute all the operations of transaction T1 (in sequence) followed by all the
operations of transaction T2 (in sequence).
2. Execute all the operations of transaction T2 (in sequence) followed by all the
operations of transaction T1 (in sequence).
[Figure 5: Examples of serial and nonserial schedules involving transactions T1 and T2. (a) Serial schedule A: T1 followed by T2. (b) Serial schedule B: T2 followed by T1. (c) Two nonserial schedules C and D with interleaving of operations.]
These two schedules—called serial schedules—are shown in Figure 5(a) and (b),
respectively. If interleaving of operations is allowed, there will be many possible
orders in which the system can execute the individual operations of the transac-
tions. Two possible schedules are shown in Figure 5(c). The concept of
serializability of schedules is used to identify which schedules are correct when
transaction executions have interleaving of their operations in the schedules. This
section defines serializability and discusses how it may be used in practice.
5.1 Serial, Nonserial, and Conflict-Serializable Schedules
Schedules A and B in Figure 5(a) and (b) are called serial because the operations of
each transaction are executed consecutively, without any interleaved operations
from the other transaction. In a serial schedule, entire transactions are performed in
serial order: T1 and then T2 in Figure 5(a), and T2 and then T1 in Figure 5(b).
Schedules C and D in Figure 5(c) are called nonserial because each sequence inter-
leaves operations from the two transactions.
Formally, a schedule S is serial if, for every transaction T participating in the sched-
ule, all the operations of T are executed consecutively in the schedule; otherwise, the
schedule is called nonserial. Therefore, in a serial schedule, only one transaction at
a time is active—the commit (or abort) of the active transaction initiates execution
of the next transaction. No interleaving occurs in a serial schedule. One reasonable
assumption we can make, if we consider the transactions to be independent, is that
every serial schedule is considered correct. We can assume this because every transac-
tion is assumed to be correct if executed on its own (according to the consistency
preservation property of Section 3). Hence, it does not matter which transaction is
executed first. As long as every transaction is executed from beginning to end in iso-
lation from the operations of other transactions, we get a correct end result on the
database.
The problem with serial schedules is that they limit concurrency by prohibiting
interleaving of operations. In a serial schedule, if a transaction waits for an I/O
operation to complete, we cannot switch the CPU processor to another transaction,
thus wasting valuable CPU processing time. Additionally, if some transaction T is
quite long, the other transactions must wait for T to complete all its operations
before starting. Hence, serial schedules are considered unacceptable in practice.
However, if we can determine which other schedules are equivalent to a serial sched-
ule, we can allow these schedules to occur.
To illustrate our discussion, consider the schedules in Figure 5, and assume that the
initial values of database items are X = 90 and Y = 90 and that N = 3 and M = 2.
After executing transactions T1 and T2, we would expect the database values to be X
= 89 and Y = 93, according to the meaning of the transactions. Sure enough, execut-
ing either of the serial schedules A or B gives the correct results. Now consider the
nonserial schedules C and D. Schedule C (which is the same as Figure 3(a)) gives the
results X = 92 and Y = 93, in which the X value is erroneous, whereas schedule D
gives the correct results.
Schedule C gives an erroneous result because of the lost update problem discussed in
Section 1.3; transaction T2 reads the value of X before it is changed by transaction
T1, so only the effect of T2 on X is reflected in the database. The effect of T1 on X is
lost, overwritten by T2, leading to the incorrect result for item X. However, some
nonserial schedules give the correct expected result, such as schedule D. We would
like to determine which of the nonserial schedules always give a correct result and
which may give erroneous results. The concept used to characterize schedules in this
manner is that of serializability of a schedule.
[Figure 6: Two schedules that are result equivalent for the initial value of X = 100 but are not result equivalent in general. S1: read_item(X); X := X + 10; write_item(X). S2: read_item(X); X := X * 1.1; write_item(X).]
The definition of serializable schedule is as follows: A schedule S of n transactions is
serializable if it is equivalent to some serial schedule of the same n transactions. We
will define the concept of equivalence of schedules shortly. Notice that there are n!
possible serial schedules of n transactions and many more possible nonserial sched-
ules. We can form two disjoint groups of the nonserial schedules—those that are
equivalent to one (or more) of the serial schedules and hence are serializable, and
those that are not equivalent to any serial schedule and hence are not serializable.
Saying that a nonserial schedule S is serializable is equivalent to saying that it is cor-
rect, because it is equivalent to a serial schedule, which is considered correct. The
remaining question is: When are two schedules considered equivalent?
There are several ways to define schedule equivalence. The simplest but least satis-
factory definition involves comparing the effects of the schedules on the database.
Two schedules are called result equivalent if they produce the same final state of the
database. However, two different schedules may accidentally produce the same final
state. For example, in Figure 6, schedules S1 and S2 will produce the same final data-
base state if they execute on a database with an initial value of X = 100; however, for
other initial values of X, the schedules are not result equivalent. Additionally, these
schedules execute different transactions, so they definitely should not be considered
equivalent. Hence, result equivalence alone cannot be used to define equivalence of
schedules. The safest and most general approach to defining schedule equivalence is
not to make any assumptions about the types of operations included in the transac-
tions. For two schedules to be equivalent, the operations applied to each data item
affected by the schedules should be applied to that item in both schedules in the
same order. Two definitions of equivalence of schedules are generally used: conflict
equivalence and view equivalence. We discuss conflict equivalence next, which is the
more commonly used definition.
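A quick check of the Figure 6 example (our arithmetic, with rounding only to sidestep floating-point representation) shows why the result equivalence there is accidental:

x = 100
print(x + 10, round(x * 1.1, 2))   # 110 and 110.0: the same final state when X starts at 100
x = 200
print(x + 10, round(x * 1.1, 2))   # 210 versus 220.0: different final states for other values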
The definition of conflict equivalence of schedules is as follows: Two schedules are
said to be conflict equivalent if the order of any two conflicting operations is the
same in both schedules. Recall from Section 4.1 that two operations in a schedule
are said to conflict if they belong to different transactions, access the same database
item, and either both are write_item operations or one is a write_item and the other a
read_item. If two conflicting operations are applied in different orders in two sched-
ules, the effect can be different on the database or on the transactions in the sched-
ule, and hence the schedules are not conflict equivalent. For example, as we
discussed in Section 4.1, if a read and write operation occur in the order r1(X),
w2(X) in schedule S1, and in the reverse order w2(X), r1(X) in schedule S2, the value
read by r1(X) can be different in the two schedules. Similarly, if two write operations
occur in the order w1(X), w2(X) in S1, and in the reverse order w2(X), w1(X) in S2,
the next r(X) operation in the two schedules will read potentially different values; or
if these are the last operations writing item X in the schedules, the final value of item
X in the database will be different.
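Conflict equivalence of two given schedules can be checked by comparing the relative order of every conflicting pair. The sketch below is ours; the transcription of schedules A and D from Figure 5 into tuples is also ours, and it assumes each (transaction, action, item) triple occurs at most once per schedule, which holds for the textbook examples.

def conflict_equivalent(s1, s2):
    """True if two schedules contain the same read/write operations and order every
    pair of conflicting operations identically."""
    ops1 = [op for op in s1 if op[1] in ('r', 'w')]
    ops2 = [op for op in s2 if op[1] in ('r', 'w')]
    if sorted(ops1) != sorted(ops2):
        return False
    pos2 = {op: i for i, op in enumerate(ops2)}
    for i, (t1, a1, x1) in enumerate(ops1):
        for t2, a2, x2 in ops1[i + 1:]:
            conflict = t1 != t2 and x1 == x2 and 'w' in (a1, a2)
            if conflict and pos2[(t1, a1, x1)] > pos2[(t2, a2, x2)]:
                return False
    return True

# Schedule D is conflict equivalent to serial schedule A (T1 followed by T2):
D = [(1,'r','X'), (1,'w','X'), (2,'r','X'), (2,'w','X'), (1,'r','Y'), (1,'w','Y')]
A = [(1,'r','X'), (1,'w','X'), (1,'r','Y'), (1,'w','Y'), (2,'r','X'), (2,'w','X')]
print(conflict_equivalent(D, A))   # True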
Using the notion of conflict equivalence, we define a schedule S to be conflict seri-
alizable10 if it is (conflict) equivalent to some serial schedule S′. In such a case, we
can reorder the nonconflicting operations in S until we form the equivalent serial
schedule S′. According to this definition, schedule D in Figure 5(c) is equivalent to
the serial schedule A in Figure 5(a). In both schedules, the read_item(X) of T2 reads
the value of X written by T1, while the other read_item operations read the database
values from the initial database state. Additionally, T1 is the last transaction to write
Y, and T2 is the last transaction to write X in both schedules. Because A is a serial
schedule and schedule D is equivalent to A, D is a serializable schedule. Notice that
the operations r1(Y) and w1(Y) of schedule D do not conflict with the operations
r2(X) and w2(X), since they access different data items. Therefore, we can move
r1(Y), w1(Y) before r2(X), w2(X), leading to the equivalent serial schedule T1, T2.
Schedule C in Figure 5(c) is not equivalent to either of the two possible serial sched-
ules A and B, and hence is not serializable. Trying to reorder the operations of sched-
ule C to find an equivalent serial schedule fails because r2(X) and w1(X) conflict,
which means that we cannot move r2(X) down to get the equivalent serial schedule
T1, T2. Similarly, because w1(X) and w2(X) conflict, we cannot move w1(X) down to
get the equivalent serial schedule T2, T1.
Another, more complex definition of equivalence—called view equivalence, which
leads to the concept of view serializability—is discussed in Section 5.4.
5.2 Testing for Conflict Serializability of a Schedule
There is a simple algorithm for determining whether a particular schedule is con-
flict serializable or not. Most concurrency control methods do not actually test for
serializability. Rather, protocols, or rules, are developed that guarantee that any
schedule that follows these rules will be serializable. We discuss the algorithm for
testing conflict serializability of schedules here to gain a better understanding of
these concurrency control protocols.
Algorithm 1 can be used to test a schedule for conflict serializability. The algorithm
looks at only the read_item and write_item operations in a schedule to construct a
precedence graph (or serialization graph), which is a directed graph G = (N, E)
that consists of a set of nodes N = {T1, T2, ..., Tn } and a set of directed edges E = {e1,
e2, ..., em }. There is one node in the graph for each transaction Ti in the schedule.
Each edge ei in the graph is of the form (Tj → Tk ), 1 ≤ j ≤ n, 1 ≤ k ≤ n, where Tj is the
starting node of ei and Tk is the ending node of ei. Such an edge from node Tj to
10We will use serializable to mean conflict serializable. Another definition of serializable used in practice
(see Section 6) is to have repeatable reads, no dirty reads, and no phantom records.
node Tk is created by the algorithm if one of the operations in Tj appears in the
schedule before some conflicting operation in Tk.
Algorithm 1. Testing Conflict Serializability of a Schedule S
1. For each transaction Ti participating in schedule S, create a node labeled Ti
in the precedence graph.
2. For each case in S where Tj executes a read_item(X) after Ti executes a
write_item(X), create an edge (Ti → Tj) in the precedence graph.
3. For each case in S where Tj executes a write_item(X) after Ti executes a
read_item(X), create an edge (Ti → Tj) in the precedence graph.
4. For each case in S where Tj executes a write_item(X) after Ti executes a
write_item(X), create an edge (Ti → Tj) in the precedence graph.
5. The schedule S is serializable if and only if the precedence graph has no
cycles.
The precedence graph is constructed as described in Algorithm 1. If there is a cycle
in the precedence graph, schedule S is not (conflict) serializable; if there is no cycle,
S is serializable. A cycle in a directed graph is a sequence of edges C = ((Tj → Tk),
(Tk → Tp), ..., (Ti → Tj)) with the property that the starting node of each edge—
except the first edge—is the same as the ending node of the previous edge, and the
starting node of the first edge is the same as the ending node of the last edge (the
sequence starts and ends at the same node).
In the precedence graph, an edge from Ti to Tj means that transaction Ti must come
before transaction Tj in any serial schedule that is equivalent to S, because two con-
flicting operations appear in the schedule in that order. If there is no cycle in the
precedence graph, we can create a serial schedule S′ that is equivalent
to S by ordering the transactions that participate in S as follows: Whenever an edge
exists in the precedence graph from Ti to Tj, Ti must appear before Tj in the equiva-
lent serial schedule S′.11 Notice that the edges (Ti → Tj) in a precedence graph can
optionally be labeled by the name(s) of the data item(s) that led to creating the
edge. Figure 7 shows such labels on the edges.
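Algorithm 1 is short enough to state directly in code. The following sketch is our own rendering of it over the tuple representation introduced in Section 4.1; the equivalent serial order is produced by a standard topological sort (Kahn's algorithm), and None is returned when the graph has a cycle. The transcription of schedules C and D from Figure 5 is also ours.

from collections import defaultdict

def precedence_graph(schedule):
    """Steps 1-4 of Algorithm 1: one node per transaction, and an edge Ti -> Tj whenever
    an operation of Ti precedes a conflicting operation of Tj in the schedule."""
    ops = [op for op in schedule if op[1] in ('r', 'w')]
    nodes = {t for t, _, _ in ops}
    edges = set()
    for i, (ti, ai, xi) in enumerate(ops):
        for tj, aj, xj in ops[i + 1:]:
            if ti != tj and xi == xj and 'w' in (ai, aj):
                edges.add((ti, tj))
    return nodes, edges

def equivalent_serial_order(nodes, edges):
    """Step 5: return a topological order of the transactions (an equivalent serial
    schedule), or None if the precedence graph contains a cycle."""
    succ, indeg = defaultdict(set), {t: 0 for t in nodes}
    for u, v in edges:
        succ[u].add(v)
        indeg[v] += 1
    ready = [t for t in nodes if indeg[t] == 0]
    order = []
    while ready:
        t = ready.pop()
        order.append(t)
        for v in succ[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order if len(order) == len(nodes) else None

# Schedules C and D of Figure 5 in our shorthand transcription:
C = [(1,'r','X'), (2,'r','X'), (1,'w','X'), (1,'r','Y'), (2,'w','X'), (1,'w','Y')]
D = [(1,'r','X'), (1,'w','X'), (2,'r','X'), (2,'w','X'), (1,'r','Y'), (1,'w','Y')]
print(equivalent_serial_order(*precedence_graph(C)))   # None: cycle, not serializable
print(equivalent_serial_order(*precedence_graph(D)))   # [1, 2]: equivalent to T1, T2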
In general, several serial schedules can be equivalent to S if the precedence graph for
S has no cycle. However, if the precedence graph has a cycle, it is easy to show that
we cannot create any equivalent serial schedule, so S is not serializable. The prece-
dence graphs created for schedules A to D, respectively, in Figure 5 appear in Figure
7(a) to (d). The graph for schedule C has a cycle, so it is not serializable. The graph
for schedule D has no cycle, so it is serializable, and the equivalent serial schedule is
T1 followed by T2. The graphs for schedules A and B have no cycles, as expected,
because the schedules are serial and hence serializable.
11This process of ordering the nodes of an acyclic graph is known as topological sorting.
[Figure 7: Constructing the precedence graphs for schedules A to D from Figure 5 to test for conflict serializability. (a) Precedence graph for serial schedule A. (b) Precedence graph for serial schedule B. (c) Precedence graph for schedule C (not serializable). (d) Precedence graph for schedule D (serializable, equivalent to schedule A).]
Another example, in which three transactions participate, is shown in Figure 8.
Figure 8(a) shows the read_item and write_item operations in each transaction. Two
schedules E and F for these transactions are shown in Figure 8(b) and (c), respec-
tively, and the precedence graphs for schedules E and F are shown in parts (d) and
(e). Schedule E is not serializable because the corresponding precedence graph has
cycles. Schedule F is serializable, and the serial schedule equivalent to F is shown in
Figure 8(e). Although only one equivalent serial schedule exists for F, in general
there may be more than one equivalent serial schedule for a serializable schedule.
Figure 8(f) shows a precedence graph representing a schedule that has two equiva-
lent serial schedules. To find an equivalent serial schedule, start with a node that
does not have any incoming edges, and then make sure that the node order for every
edge is not violated.
5.3 How Serializability Is Used for Concurrency Control
As we discussed earlier, saying that a schedule S is (conflict) serializable—that is, S is
(conflict) equivalent to a serial schedule—is tantamount to saying that S is correct.
Being serializable is distinct from being serial, however. A serial schedule represents
inefficient processing because no interleaving of operations from different transac-
tions is permitted. This can lead to low CPU utilization while a transaction waits for
disk I/O, or for another transaction to terminate, thus slowing down processing
considerably. A serializable schedule gives the benefits of concurrent execution
without giving up any correctness. In practice, it is quite difficult to test for the seri-
alizability of a schedule. The interleaving of operations from concurrent transac-
tions—which are usually executed as processes by the operating system—is
typically determined by the operating system scheduler, which allocates resources to
all processes. Factors such as system load, time of transaction submission, and pri-
orities of processes contribute to the ordering of operations in a schedule. Hence, it
is difficult to determine how the operations of a schedule will be interleaved before-
hand to ensure serializability.
[Figure 8: Another example of serializability testing. (a) The read and write operations of three transactions: T1: r1(X); w1(X); r1(Y); w1(Y). T2: r2(Z); r2(Y); w2(Y); r2(X); w2(X). T3: r3(Y); r3(Z); w3(Y); w3(Z). (b) Schedule E. (c) Schedule F.]
[Figure 8 (continued): (d) Precedence graph for schedule E; the graph contains cycles, so there is no equivalent serial schedule. (e) Precedence graph for schedule F. (f) Precedence graph with two equivalent serial schedules.]
If transactions are executed at will and then the resulting schedule is tested for seri-
alizability, we must cancel the effect of the schedule if it turns out not to be serializ-
able. This is a serious problem that makes this approach impractical. Hence, the
approach taken in most practical systems is to determine methods or protocols that
ensure serializability, without having to test the schedules themselves. The approach
taken in most commercial DBMSs is to design protocols (sets of rules) that—if fol-
lowed by every individual transaction or if enforced by a DBMS concurrency con-
trol subsystem—will ensure serializability of all schedules in which the transactions
participate.
Another problem appears here: When transactions are submitted continuously to
the system, it is difficult to determine when a schedule begins and when it ends.
Serializability theory can be adapted to deal with this problem by considering only
the committed projection of a schedule S. Recall from Section 4.1 that the
committed projection C(S) of a schedule S includes only the operations in S that
belong to committed transactions. We can theoretically define a schedule S to be
serializable if its committed projection C(S) is equivalent to some serial schedule,
since only committed transactions are guaranteed by the DBMS.
A number of different concurrency control protocols guarantee serializability. The
most common technique, called two-phase locking, is based on locking data items to
prevent concurrent transactions from interfering with one another, and enforcing
an additional condition that guarantees serializability. This is used in the majority
of commercial DBMSs. Other protocols have been proposed;12 these include
timestamp ordering, where each transaction is assigned a unique timestamp and the
protocol ensures that any conflicting operations are executed in the order of the
transaction timestamps; multiversion protocols, which are based on maintaining
multiple versions of data items; and optimistic (also called certification or validation)
protocols, which check for possible serializability violations after the transactions
terminate but before they are permitted to commit.
5.4 View Equivalence and View Serializability
In Section 5.1 we defined the concepts of conflict equivalence of schedules and con-
flict serializability. Another less restrictive definition of equivalence of schedules is
called view equivalence. This leads to another definition of serializability called view
serializability. Two schedules S and S′ are said to be view equivalent if the following
three conditions hold:
1. The same set of transactions participates in S and S′, and S and S′ include the
same operations of those transactions.
2. For any operation ri(X) of Ti in S, if the value of X read by the operation has
been written by an operation wj(X) of Tj (or if it is the original value of X
before the schedule started), the same condition must hold for the value of X
read by operation ri(X) of Ti in S′.
3. If the operation wk(Y) of Tk is the last operation to write item Y in S, then
wk(Y) of Tk must also be the last operation to write item Y in S′.
The idea behind view equivalence is that, as long as each read operation of a trans-
action reads the result of the same write operation in both schedules, the write
operations of each transaction must produce the same results. The read operations
are hence said to see the same view in both schedules. Condition 3 ensures that the
final write operation on each data item is the same in both schedules, so the data-
base state should be the same at the end of both schedules. A schedule S is said to be
view serializable if it is view equivalent to a serial schedule.
The definitions of conflict serializability and view serializability are similar if a con-
dition known as the constrained write assumption (or no blind writes) holds on
all transactions in the schedule. This condition states that any write operation wi(X)
in Ti is preceded by a ri(X) in Ti and that the value written by wi(X) in Ti depends
only on the value of X read by ri(X). This assumes that computation of the new
value of X is a function f(X) based on the old value of X read from the database. A
blind write is a write operation in a transaction T on an item X that is not depen-
dent on the value of X, so it is not preceded by a read of X in the transaction T.
12These other protocols have not been incorporated much into commercial systems; most relational
DBMSs use some variation of the two-phase locking protocol.
The definition of view serializability is less restrictive than that of conflict serializ-
ability under the unconstrained write assumption, where the value written by an
operation wi(X) in Ti can be independent of its old value from the database. This is
possible when blind writes are allowed, and it is illustrated by the following schedule
Sg of three transactions T1: r1(X); w1(X); T2: w2(X); and T3: w3(X):
Sg: r1(X); w2(X); w1(X); w3(X); c1; c2; c3;
In Sg the operations w2(X) and w3(X) are blind writes, since T2 and T3 do not read
the value of X. The schedule Sg is view serializable, since it is view equivalent to the
serial schedule T1, T2, T3. However, Sg is not conflict serializable, since it is not
conflict equivalent to any serial schedule. It has been shown that any conflict-
serializable schedule is also view serializable but not vice versa, as illustrated by the
preceding example. There is an algorithm to test whether a schedule S is view serial-
izable or not. However, the problem of testing for view serializability has been
shown to be NP-hard, meaning that finding an efficient polynomial time algorithm
for this problem is highly unlikely.
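Although deciding whether a schedule is view serializable is NP-hard, checking whether two given schedules are view equivalent is straightforward: compare the reads-from relation and the final writer of each item. The sketch below is our own illustration, and it assumes each transaction reads a given item at most once, which holds for the examples in this section.

def view_equivalent(s1, s2):
    """Check the three view-equivalence conditions for two schedules of tuples."""
    def rw(s):
        return sorted(op for op in s if op[1] in ('r', 'w'))
    if rw(s1) != rw(s2):
        return False                       # condition 1: same transactions and operations

    def reads_from_and_final_writes(s):
        last_writer, src = {}, {}
        for t, a, x in s:
            if a == 'r':
                src[(t, x)] = last_writer.get(x)   # None means the initial value of x
            elif a == 'w':
                last_writer[x] = t
        return src, last_writer
    src1, fin1 = reads_from_and_final_writes(s1)
    src2, fin2 = reads_from_and_final_writes(s2)
    return src1 == src2 and fin1 == fin2   # conditions 2 and 3

# Sg is view equivalent to the serial schedule T1, T2, T3:
Sg     = [(1,'r','X'), (2,'w','X'), (1,'w','X'), (3,'w','X'), (1,'c',None), (2,'c',None), (3,'c',None)]
serial = [(1,'r','X'), (1,'w','X'), (1,'c',None), (2,'w','X'), (2,'c',None), (3,'w','X'), (3,'c',None)]
print(view_equivalent(Sg, serial))   # True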
5.5 Other Types of Equivalence of Schedules
Serializability of schedules is sometimes considered to be too restrictive as a condi-
tion for ensuring the correctness of concurrent executions. Some applications can
produce schedules that are correct by satisfying conditions less stringent than either
conflict serializability or view serializability. An example is the type of transactions
known as debit-credit transactions—for example, those that apply deposits and
withdrawals to a data item whose value is the current balance of a bank account.
The semantics of debit-credit operations is that they update the value of a data item
X by either subtracting from or adding to the value of the data item. Because addi-
tion and subtraction operations are commutative—that is, they can be applied in
any order—it is possible to produce correct schedules that are not serializable. For
example, consider the following transactions, each of which may be used to transfer
an amount of money between two bank accounts:
T1: r1(X); X := X − 10; w1(X); r1(Y); Y := Y + 10; w1(Y);
T2: r2(Y); Y := Y − 20; w2(Y); r2(X); X := X + 20; w2(X);
Consider the following nonserializable schedule Sh for the two transactions:
Sh: r1(X); w1(X); r2(Y); w2(Y); r1(Y); w1(Y); r2(X); w2(X);
With the additional knowledge, or semantics, that the operations between each ri(I)
and wi(I) are commutative, we know that the order of executing the sequences con-
sisting of (read, update, write) is not important as long as each (read, update, write)
sequence by a particular transaction Ti on a particular item I is not interrupted by
conflicting operations. Hence, the schedule Sh is considered to be correct even
though it is not serializable. Researchers have been working on extending concur-
rency control theory to deal with cases where serializability is considered to be too
restrictive as a condition for correctness of schedules. Also, in certain domains of
applications such as computer aided design (CAD) of complex systems like aircraft,
design transactions last over a long time period. In such applications, more relaxed
schemes of concurrency control have been proposed to maintain consistency of the
database.
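A tiny simulation (our own illustration) makes the commutativity argument concrete: executing Sh and both serial orders on the same initial balances leaves X and Y with identical final values, even though Sh is not conflict serializable. Each step collapses one (read, update, write) sequence into a single atomic action, which is exactly the semantic assumption made above.

def run(steps, x0=100, y0=100):
    """Apply debit-credit steps (transaction, amount, item); each step stands for one
    uninterrupted read-update-write sequence on a single account."""
    db = {'X': x0, 'Y': y0}
    for _tid, amount, item in steps:
        db[item] += amount
    return db

# T1 moves 10 from X to Y; T2 moves 20 from Y to X, as in the transactions above.
Sh       = [(1, -10, 'X'), (2, -20, 'Y'), (1, +10, 'Y'), (2, +20, 'X')]
serial12 = [(1, -10, 'X'), (1, +10, 'Y'), (2, -20, 'Y'), (2, +20, 'X')]
serial21 = [(2, -20, 'Y'), (2, +20, 'X'), (1, -10, 'X'), (1, +10, 'Y')]
print(run(Sh) == run(serial12) == run(serial21))   # True: all three give X = 110, Y = 90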
6 Transaction Support in SQL
In this section, we give a brief introduction to transaction support in SQL. There are
many more details, and the newer standards have more commands for transaction
processing. The basic definition of an SQL transaction is similar to our already
defined concept of a transaction. That is, it is a logical unit of work and is guaran-
teed to be atomic. A single SQL statement is always considered to be atomic—either
it completes execution without an error or it fails and leaves the database
unchanged.
With SQL, there is no explicit Begin_Transaction statement. Transaction initiation is
done implicitly when particular SQL statements are encountered. However, every
transaction must have an explicit end statement, which is either a COMMIT or a
ROLLBACK. Every transaction has certain characteristics attributed to it. These
characteristics are specified by a SET TRANSACTION statement in SQL. The charac-
teristics are the access mode, the diagnostic area size, and the isolation level.
The access mode can be specified as READ ONLY or READ WRITE. The default is
READ WRITE, unless the isolation level of READ UNCOMMITTED is specified (see
below), in which case READ ONLY is assumed. A mode of READ WRITE allows select,
update, insert, delete, and create commands to be executed. A mode of READ ONLY,
as the name implies, is simply for data retrieval.
The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n,
which indicates the number of conditions that can be held simultaneously in the
diagnostic area. These conditions supply feedback information (errors or excep-
tions) to the user or program on the n most recently executed SQL statements.
The isolation level option is specified using the statement ISOLATION LEVEL
<isolation>, where the value for <isolation> can be READ UNCOMMITTED, READ
COMMITTED, REPEATABLE READ, or SERIALIZABLE.13 The default isolation level is
SERIALIZABLE, although some systems use READ COMMITTED as their default. The
use of the term SERIALIZABLE here is based on not allowing violations that cause
dirty read, unrepeatable read, and phantoms,14 and it is thus not identical to the way
serializability was defined earlier in Section 5. If a transaction executes at a lower
isolation level than SERIALIZABLE, then one or more of the following three viola-
tions may occur:
1. Dirty read. A transaction T1 may read the update of a transaction T2, which
has not yet committed. If T2 fails and is aborted, then T1 would have read a
value that does not exist and is incorrect.
13These are similar to the isolation levels discussed briefly at the end of Section 3.
14The dirty read and unrepeatable read problems were discussed in Section 1.3.
Table 1 Possible Violations Based on Isolation Levels as Defined in SQL

                         Type of Violation
Isolation Level        Dirty Read    Nonrepeatable Read    Phantom
READ UNCOMMITTED       Yes           Yes                   Yes
READ COMMITTED         No            Yes                   Yes
REPEATABLE READ        No            No                    Yes
SERIALIZABLE           No            No                    No
2. Nonrepeatable read. A transaction T1 may read a given value from a table. If
another transaction T2 later updates that value and T1 reads that value again,
T1 will see a different value.
3. Phantoms. A transaction T1 may read a set of rows from a table, perhaps
based on some condition specified in the SQL WHERE-clause. Now suppose
that a transaction T2 inserts a new row that also satisfies the WHERE-clause
condition used in T1, into the table used by T1. If T1 is repeated, then T1 will
see a phantom, a row that previously did not exist.
Table 1 summarizes the possible violations for the different isolation levels. An entry
of Yes indicates that a violation is possible and an entry of No indicates that it is not
possible. READ UNCOMMITTED is the most forgiving, and SERIALIZABLE is the
most restrictive in that it avoids all three of the problems mentioned above.
A sample SQL transaction might look like the following:
EXEC SQL WHENEVER SQLERROR GOTO UNDO;
EXEC SQL SET TRANSACTION
READ WRITE
DIAGNOSTIC SIZE 5
ISOLATION LEVEL SERIALIZABLE;
EXEC SQL INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno, Salary)
VALUES (‘Robert’, ‘Smith’, ‘991004321’, 2, 35000);
EXEC SQL UPDATE EMPLOYEE
SET Salary = Salary * 1.1 WHERE Dno = 2;
EXEC SQL COMMIT;
GOTO THE_END;
UNDO: EXEC SQL ROLLBACK;
THE_END: … ;
The above transaction consists of first inserting a new row in the EMPLOYEE table
and then updating the salary of all employees who work in department 2. If an error
occurs on any of the SQL statements, the entire transaction is rolled back. This
implies that any updated salary (by this transaction) would be restored to its previ-
ous value and that the newly inserted row would be removed.
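For comparison, here is a rough Python analogue of the embedded SQL transaction above, written against the psycopg2 PostgreSQL driver; the choice of driver, the placeholder connection string, and the assumption that the same EMPLOYEE table exists are ours and not part of the text.

import psycopg2

conn = psycopg2.connect("dbname=company")           # placeholder connection string
conn.set_session(isolation_level="SERIALIZABLE")    # analogous to SET TRANSACTION ... ISOLATION LEVEL
try:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO EMPLOYEE (Fname, Lname, Ssn, Dno, Salary) "
            "VALUES (%s, %s, %s, %s, %s)",
            ('Robert', 'Smith', '991004321', 2, 35000))
        cur.execute("UPDATE EMPLOYEE SET Salary = Salary * 1.1 WHERE Dno = %s", (2,))
    conn.commit()                                    # end the transaction successfully
except Exception:
    conn.rollback()                                  # any error undoes both statements
finally:
    conn.close()

As with the embedded SQL version, there is no explicit begin: the transaction starts implicitly with the first statement and ends only at commit() or rollback().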
As we have seen, SQL provides a number of transaction-oriented features. The DBA
or database programmers can take advantage of these options to try improving
transaction performance by relaxing serializability if that is acceptable for their
applications.
7 Summary
In this chapter we discussed DBMS concepts for transaction processing. We intro-
duced the concept of a database transaction and the operations relevant to transac-
tion processing. We compared single-user systems to multiuser systems and then
presented examples of how uncontrolled execution of concurrent transactions in a
multiuser system can lead to incorrect results and database values. We also discussed
the various types of failures that may occur during transaction execution.
Next we introduced the typical states that a transaction passes through during execu-
tion, and discussed several concepts that are used in recovery and concurrency con-
trol methods. The system log keeps track of database accesses, and the system uses
this information to recover from failures. A transaction either succeeds and reaches
its commit point or it fails and has to be rolled back. A committed transaction has its
changes permanently recorded in the database. We presented an overview of the
desirable properties of transactions—atomicity, consistency preservation, isolation,
and durability—which are often referred to as the ACID properties.
Then we defined a schedule (or history) as an execution sequence of the operations
of several transactions with possible interleaving. We characterized schedules in
terms of their recoverability. Recoverable schedules ensure that, once a transaction
commits, it never needs to be undone. Cascadeless schedules add an additional con-
dition to ensure that no aborted transaction requires the cascading abort of other
transactions. Strict schedules provide an even stronger condition that allows a sim-
ple recovery scheme consisting of restoring the old values of items that have been
changed by an aborted transaction.
We defined equivalence of schedules and saw that a serializable schedule is equiva-
lent to some serial schedule. We defined the concepts of conflict equivalence and
view equivalence, which led to definitions for conflict serializability and view serial-
izability. A serializable schedule is considered correct. We presented an algorithm
for testing the (conflict) serializability of a schedule. We discussed why testing for
serializability is impractical in a real system, although it can be used to define and
verify concurrency control protocols, and we briefly mentioned less restrictive defi-
nitions of schedule equivalence. Finally, we gave a brief overview of how transaction
concepts are used in practice within SQL.
Review Questions
1. What is meant by the concurrent execution of database transactions in a
multiuser system? Discuss why concurrency control is needed, and give
informal examples.
2. Discuss the different types of failures. What is meant by catastrophic failure?
3. Discuss the actions taken by the read_item and write_item operations on a
database.
4. Draw a state diagram and discuss the typical states that a transaction goes
through during execution.
5. What is the system log used for? What are the typical kinds of records in a
system log? What are transaction commit points, and why are they impor-
tant?
6. Discuss the atomicity, durability, isolation, and consistency preservation
properties of a database transaction.
7. What is a schedule (history)? Define the concepts of recoverable, cascadeless,
and strict schedules, and compare them in terms of their recoverability.
8. Discuss the different measures of transaction equivalence. What is the differ-
ence between conflict equivalence and view equivalence?
9. What is a serial schedule? What is a serializable schedule? Why is a serial
schedule considered correct? Why is a serializable schedule considered cor-
rect?
10. What is the difference between the constrained write and the unconstrained
write assumptions? Which is more realistic?
11. Discuss how serializability is used to enforce concurrency control in a data-
base system. Why is serializability sometimes considered too restrictive as a
measure of correctness for schedules?
12. Describe the four levels of isolation in SQL.
13. Define the violations caused by each of the following: dirty read, nonrepeat-
able read, and phantoms.
Exercises
14. Change transaction T2 in Figure 2(b) to read
read_item(X);
X := X + M;
if X > 90 then exit
else write_item(X);
Discuss the final result of the different schedules in Figure 3(a) and (b),
where M = 2 and N = 2, with respect to the following questions: Does adding
the above condition change the final outcome? Does the outcome obey the
implied consistency rule (that the capacity of X is 90)?
15. Repeat Exercise 14, adding a check in T1 so that Y does not exceed 90.
16. Add the operation commit at the end of each of the transactions T1 and T2 in
Figure 2, and then list all possible schedules for the modified transactions.
Determine which of the schedules are recoverable, which are cascadeless,
and which are strict.
17. List all possible schedules for transactions T1 and T2 in Figure 2, and deter-
mine which are conflict serializable (correct) and which are not.
18. How many serial schedules exist for the three transactions in Figure 8(a)?
What are they? What is the total number of possible schedules?
19. Write a program to create all possible schedules for the three transactions in
Figure 8(a), and to determine which of those schedules are conflict serializ-
able and which are not. For each conflict-serializable schedule, your program
should print the schedule and list all equivalent serial schedules.
20. Why is an explicit transaction end statement needed in SQL but not an
explicit begin statement?
21. Describe situations where each of the different isolation levels would be use-
ful for transaction processing.
22. Which of the following schedules is (conflict) serializable? For each serializ-
able schedule, determine the equivalent serial schedules.
a. r1(X); r3(X); w1(X); r2(X); w3(X);
b. r1(X); r3(X); w3(X); w1(X); r2(X);
c. r3(X); r2(X); w3(X); r1(X); w1(X);
d. r3(X); r2(X); r1(X); w3(X); w1(X);
23. Consider the three transactions T1, T2, and T3, and the schedules S1 and S2
given below. Draw the serializability (precedence) graphs for S1 and S2, and
state whether each schedule is serializable or not. If a schedule is serializable,
write down the equivalent serial schedule(s).
T1: r1 (X); r1 (Z); w1 (X);
T2: r2 (Z); r2 (Y); w2 (Z); w2 (Y);
T3: r3 (X); r3 (Y); w3 (Y);
S1: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2 (Y); w2 (Z); w2 (Y);
S2: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); w2 (Z); w3 (Y); w2 (Y);
24. Consider schedules S3, S4, and S5 below. Determine whether each schedule is
strict, cascadeless, recoverable, or nonrecoverable. (Determine the strictest
recoverability condition that each schedule satisfies.)
S3: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); c1; w3 (Y); c3; r2 (Y); w2 (Z);
w2 (Y); c2;
S4: r1 (X); r2 (Z); r1 (Z); r3 (X); r3 (Y); w1 (X); w3 (Y); r2 (Y); w2 (Z); w2 (Y); c1;
c2; c3;
S5: r1 (X); r2 (Z); r3 (X); r1 (Z); r2 (Y); r3 (Y); w1 (X); c1; w2 (Z); w3 (Y); w2 (Y);
c3; c2;
Selected Bibliography
The concept of serializability and related ideas to maintain consistency in a database
were introduced in Gray et al. (1975). The concept of the database transaction was
first discussed in Gray (1981). Gray won the coveted ACM Turing Award in 1998 for
his work on database transactions and implementation of transactions in relational
DBMSs. Bernstein, Hadzilacos, and Goodman (1988) focus on concurrency control
and recovery techniques in both centralized and distributed database systems; it is
an excellent reference. Papadimitriou (1986) offers a more theoretical perspective. A
large reference book of more than a thousand pages by Gray and Reuter (1993)
offers a more practical perspective of transaction processing concepts and tech-
niques. Elmagarmid (1992) offers collections of research papers on transaction pro-
cessing for advanced applications. Transaction support in SQL is described in Date
and Darwen (1997). View serializability is defined in Yannakakis (1984).
Recoverability of schedules and reliability in databases is discussed in Hadzilacos
(1983, 1988).
Concurrency Control
Techniques
In this chapter we discuss a number of concurrency control techniques that are used to ensure the nonin-
terference or isolation property of concurrently executing transactions. Most of
these techniques ensure serializability of schedules—using concurrency control
protocols (sets of rules) that guarantee serializability. One important set of proto-
cols—known as two-phase locking protocols—employ the technique of locking data
items to prevent multiple transactions from accessing the items concurrently; a
number of locking protocols are described in Sections 1 and 3.2. Locking protocols
are used in most commercial DBMSs. Another set of concurrency control protocols
use timestamps. A timestamp is a unique identifier for each transaction, generated
by the system. Timestamp values are generated in the same order as the transaction
start times. Concurrency control protocols that use timestamp ordering to ensure
serializability are introduced in Section 2. In Section 3 we discuss multiversion con-
currency control protocols that use multiple versions of a data item. One multiver-
sion protocol extends timestamp order to multiversion timestamp ordering
(Section 3.1), and another extends two-phase locking (Section 3.2). In Section 4 we
present a protocol based on the concept of validation or certification of a transac-
tion after it executes its operations; these are sometimes called optimistic
protocols, and also assume that multiple versions of a data item can exist.
Another factor that affects concurrency control is the granularity of the data
items—that is, what portion of the database a data item represents. An item can be
as small as a single attribute (field) value or as large as a disk block, or even a whole
file or the entire database. We discuss granularity of items and a multiple granular-
ity concurrency control protocol, which is an extension of two-phase locking, in
Section 5. In Section 6 we describe concurrency control issues that arise when
indexes are used to process transactions, and in Section 7 we discuss some addi-
tional concurrency control concepts. Section 8 summarizes the chapter.
It is sufficient to read Sections 1, 5, 6, and 7, and possibly 3.2, if your main interest is
an introduction to the concurrency control techniques that are based on locking,
which are used most often in practice. The other techniques are mainly of theoreti-
cal interest.
1 Two-Phase Locking Techniques
for Concurrency Control
Some of the main techniques used to control concurrent execution of transactions
are based on the concept of locking data items. A lock is a variable associated with a
data item that describes the status of the item with respect to possible operations
that can be applied to it. Generally, there is one lock for each data item in the data-
base. Locks are used as a means of synchronizing the access by concurrent transac-
tions to the database items. In Section 1.1 we discuss the nature and types of locks.
Then, in Section 1.2 we present protocols that use locking to guarantee serializabil-
ity of transaction schedules. Finally, in Section 1.3 we describe two problems associ-
ated with the use of locks—deadlock and starvation—and show how these
problems are handled in concurrency control protocols.
1.1 Types of Locks and System Lock Tables
Several types of locks are used in concurrency control. To introduce locking con-
cepts gradually, first we discuss binary locks, which are simple, but are also too
restrictive for database concurrency control purposes, and so are not used in practice.
Then we discuss shared/exclusive locks—also known as read/write locks—which
provide more general locking capabilities and are used in practical database locking
schemes. In Section 3.2 we describe an additional type of lock called a certify lock,
and show how it can be used to improve performance of locking protocols.
Binary Locks. A binary lock can have two states or values: locked and unlocked (or
1 and 0, for simplicity). A distinct lock is associated with each database item X. If the
value of the lock on X is 1, item X cannot be accessed by a database operation that
requests the item. If the value of the lock on X is 0, the item can be accessed when
requested, and the lock value is changed to 1. We refer to the current value (or state)
of the lock associated with item X as LOCK(X).
Two operations, lock_item and unlock_item, are used with binary locking. A transaction
requests access to an item X by first issuing a lock_item(X) operation. If LOCK(X) =
1, the transaction is forced to wait. If LOCK(X) = 0, it is set to 1 (the transaction locks
the item) and the transaction is allowed to access item X. When the transaction is
through using the item, it issues an unlock_item(X) operation, which sets LOCK(X)
back to 0 (unlocks the item) so that X may be accessed by other transactions. Hence,
a binary lock enforces mutual exclusion on the data item. A description of the
lock_item(X) and unlock_item(X) operations is shown in Figure 1.
lock_item(X):
B: if LOCK(X) = 0 (* item is unlocked *)
then LOCK(X) ←1 (* lock the item *)
else
begin
wait (until LOCK(X) = 0
and the lock manager wakes up the transaction);
go to B
end;
unlock_item(X):
LOCK(X) ← 0; (* unlock the item *)
if any transactions are waiting
then wakeup one of the waiting transactions;
Figure 1
Lock and unlock oper-
ations for binary locks.
Notice that the lock_item and unlock_item operations must be implemented as indi-
visible units (known as critical sections in operating systems); that is, no interleav-
ing should be allowed once a lock or unlock operation is started until the operation
terminates or the transaction waits. In Figure 1, the wait command within the
lock_item(X) operation is usually implemented by putting the transaction in a wait-
ing queue for item X until X is unlocked and the transaction can be granted access
to it. Other transactions that also want to access X are placed in the same queue.
Hence, the wait command is considered to be outside the lock_item operation.
It is quite simple to implement a binary lock; all that is needed is a binary-valued
variable, LOCK, associated with each data item X in the database. In its simplest
form, each lock can be a record with three fields: <Data_item_name, LOCK, Locking_transaction>, plus a queue for transactions that are waiting to access the item.
The system needs to maintain only these records for the items that are currently locked
in a lock table, which could be organized as a hash file on the item name. Items not
in the lock table are considered to be unlocked. The DBMS has a lock manager sub-
system to keep track of and control access to locks.
If the simple binary locking scheme described here is used, every transaction must
obey the following rules:
1. A transaction T must issue the operation lock_item(X) before any
read_item(X) or write_item(X) operations are performed in T.
2. A transaction T must issue the operation unlock_item(X) after all read_item(X)
and write_item(X) operations are completed in T.
3. A transaction T will not issue a lock_item(X) operation if it already holds the
lock on item X.1
4. A transaction T will not issue an unlock_item(X) operation unless it already
holds the lock on item X.
1This rule may be removed if we modify the lock_item (X) operation in Figure 1 so that if the item is cur-
rently locked by the requesting transaction, the lock is granted.
These rules can be enforced by the lock manager module of the DBMS. Between the
lock_item(X) and unlock_item(X) operations in transaction T, T is said to hold the
lock on item X. At most one transaction can hold the lock on a particular item.
Thus no two transactions can access the same item concurrently.
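As a concrete illustration, the following is a minimal Python sketch of the lock_item and unlock_item operations of Figure 1. It keeps the lock table as a dictionary (an entry exists only while an item is locked) and uses one condition variable per item to realize the wait and wakeup steps; the class and method names are our own and are not taken from any particular DBMS.

import threading

class BinaryLockManager:
    # Minimal sketch of binary locking (Figure 1). An entry exists in the
    # lock table only while the item is locked; absence means LOCK(X) = 0.
    def __init__(self):
        self._mutex = threading.Lock()   # makes lock_item/unlock_item indivisible
        self._locked = {}                # item name -> id of the holding transaction
        self._waiters = {}               # item name -> condition variable (wait queue)

    def _cond(self, item):
        if item not in self._waiters:
            self._waiters[item] = threading.Condition(self._mutex)
        return self._waiters[item]

    def lock_item(self, item, tid):
        with self._mutex:
            cond = self._cond(item)
            while item in self._locked:  # LOCK(item) = 1: wait until it is unlocked
                cond.wait()
            self._locked[item] = tid     # LOCK(item) <- 1

    def unlock_item(self, item, tid):
        with self._mutex:
            assert self._locked.get(item) == tid   # rule 4: must hold the lock
            del self._locked[item]                 # LOCK(item) <- 0
            self._cond(item).notify()              # wake up one waiting transaction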
Shared/Exclusive (or Read/Write) Locks. The preceding binary locking
scheme is too restrictive for database items because at most one transaction can
hold a lock on a given item. We should allow several transactions to access the same
item X if they all access X for reading purposes only. This is because read operations
on the same item by different transactions are not conflicting. However, if a transac-
tion is to write an item X, it must have exclusive access to X. For this purpose, a dif-
ferent type of lock called a multiple-mode lock is used. In this scheme—called
shared/exclusive or read/write locks—there are three locking operations:
read_lock(X), write_lock(X), and unlock(X). A lock associated with an item X,
LOCK(X), now has three possible states: read-locked, write-locked, or unlocked. A
read-locked item is also called share-locked because other transactions are allowed
to read the item, whereas a write-locked item is called exclusive-locked because a
single transaction exclusively holds the lock on the item.
One method for implementing the preceding operations on a read/write lock is to
keep track of the number of transactions that hold a shared (read) lock on an item
in the lock table. Each record in the lock table will have four fields: <Data_item_name, LOCK, No_of_reads, Locking_transaction(s)>. Again, to save space, the system needs to
maintain lock records only for locked items in the lock table. The value (state) of
LOCK is either read-locked or write-locked, suitably coded (if we assume no records
are kept in the lock table for unlocked items). If LOCK(X)=write-locked, the value of
locking_transaction(s) is a single transaction that holds the exclusive (write) lock
on X. If LOCK(X)=read-locked, the value of locking transaction(s) is a list of one or
more transactions that hold the shared (read) lock on X. The three operations
read_lock(X), write_lock(X), and unlock(X) are described in Figure 2.2 As before, each
of the three locking operations should be considered indivisible; no interleaving
should be allowed once one of the operations is started until either the operation
terminates by granting the lock or the transaction is placed in a waiting queue for
the item.
When we use the shared/exclusive locking scheme, the system must enforce the fol-
lowing rules:
1. A transaction T must issue the operation read_lock(X) or write_lock(X) before
any read_item(X) operation is performed in T.
2. A transaction T must issue the operation write_lock(X) before any
write_item(X) operation is performed in T.
2These algorithms do not allow upgrading or downgrading of locks, as described later in this section. The
reader can extend the algorithms to allow these additional operations.
read_lock(X):
B: if LOCK(X) = “unlocked”
then begin LOCK(X) ← “read-locked”;
no_of_reads(X) ← 1
end
else if LOCK(X) = “read-locked”
then no_of_reads(X) ← no_of_reads(X) + 1
else begin
wait (until LOCK(X) = “unlocked”
and the lock manager wakes up the transaction);
go to B
end;
write_lock(X):
B: if LOCK(X) = “unlocked”
then LOCK(X) ← “write-locked”
else begin
wait (until LOCK(X) = “unlocked”
and the lock manager wakes up the transaction);
go to B
end;
unlock (X):
if LOCK(X) = “write-locked”
then begin LOCK(X) ← “unlocked”;
wakeup one of the waiting transactions, if any
end
else if LOCK(X) = “read-locked”
then begin
no_of_reads(X) ← no_of_reads(X) − 1;
if no_of_reads(X) = 0
then begin LOCK(X) ← “unlocked”;
wakeup one of the waiting transactions, if any
end
end;
Figure 2
Locking and unlocking
operations for two-
mode (read-write or
shared-exclusive)
locks.
3. A transaction T must issue the operation unlock(X) after all read_item(X) and
write_item(X) operations are completed in T.3
4. A transaction T will not issue a read_lock(X) operation if it already holds a
read (shared) lock or a write (exclusive) lock on item X. This rule may be
relaxed, as we discuss shortly.
3This rule may be relaxed to allow a transaction to unlock an item, then lock it again later.
5. A transaction T will not issue a write_lock(X) operation if it already holds a
read (shared) lock or write (exclusive) lock on item X. This rule may also be
relaxed, as we discuss shortly.
6. A transaction T will not issue an unlock(X) operation unless it already holds
a read (shared) lock or a write (exclusive) lock on item X.
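The record structure and the three operations just described can be made concrete with a small Python sketch. It is a simplification with names of our own choosing: each lock-table entry mirrors the four-field record (the item name keys the table, and the entry stores the lock state, no_of_reads, and the set of holding transactions), a single condition variable stands in for the per-item wait queues, and lock conversion (discussed next) is omitted.

import threading

class SharedExclusiveLockManager:
    # Sketch of read/write locking (Figure 2). Entries exist only for locked
    # items: item -> {'state': 'read' or 'write', 'no_of_reads': int,
    #                 'holders': set of transaction ids}.
    def __init__(self):
        self._mutex = threading.Lock()
        self._table = {}
        self._cond = threading.Condition(self._mutex)  # one wait queue, for brevity

    def read_lock(self, item, tid):
        with self._mutex:
            while item in self._table and self._table[item]['state'] == 'write':
                self._cond.wait()                      # write-locked: must wait
            entry = self._table.setdefault(
                item, {'state': 'read', 'no_of_reads': 0, 'holders': set()})
            entry['no_of_reads'] += 1
            entry['holders'].add(tid)

    def write_lock(self, item, tid):
        with self._mutex:
            while item in self._table:                 # any holder blocks a writer
                self._cond.wait()
            self._table[item] = {'state': 'write', 'no_of_reads': 0, 'holders': {tid}}

    def unlock(self, item, tid):
        with self._mutex:
            entry = self._table[item]
            entry['holders'].discard(tid)
            if entry['state'] == 'read':
                entry['no_of_reads'] -= 1
                if entry['no_of_reads'] == 0:          # last reader releases the item
                    del self._table[item]
                    self._cond.notify_all()
            else:                                      # the single writer releases it
                del self._table[item]
                self._cond.notify_all()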
Conversion of Locks. Sometimes it is desirable to relax conditions 4 and 5 in the
preceding list in order to allow lock conversion; that is, a transaction that already
holds a lock on item X is allowed under certain conditions to convert the lock from
one locked state to another. For example, it is possible for a transaction T to issue a
read_lock(X) and then later to upgrade the lock by issuing a write_lock(X) operation.
If T is the only transaction holding a read lock on X at the time it issues the
write_lock(X) operation, the lock can be upgraded; otherwise, the transaction must
wait. It is also possible for a transaction T to issue a write_lock(X) and then later to
downgrade the lock by issuing a read_lock(X) operation. When upgrading and
downgrading of locks is used, the lock table must include transaction identifiers in
the record structure for each lock (in the locking_transaction(s) field) to store the
information on which transactions hold locks on the item. The descriptions of the
read_lock(X) and write_lock(X) operations in Figure 2 must be changed appropri-
ately to allow for lock upgrading and downgrading. We leave this as an exercise for
the reader.
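As a hedged illustration of how the exercise might be approached, the following extends the lock-manager sketch given earlier with an upgrade_lock operation that succeeds only when the requesting transaction is the sole reader of the item, and a downgrade_lock operation that turns a write lock into a read lock; the method names are ours. Note that two transactions that both hold read locks on the same item and both attempt to upgrade will block each other, which is one way the deadlocks discussed in Section 1.3 can arise.

class ConvertibleLockManager(SharedExclusiveLockManager):
    # Adds lock conversion to the earlier sketch (illustrative only).
    def upgrade_lock(self, item, tid):
        with self._mutex:
            # Upgrade only if tid is the single transaction holding a read lock;
            # otherwise wait until the other readers release the item.
            while not (self._table[item]['state'] == 'read'
                       and self._table[item]['holders'] == {tid}):
                self._cond.wait()
            self._table[item] = {'state': 'write', 'no_of_reads': 0,
                                 'holders': {tid}}

    def downgrade_lock(self, item, tid):
        with self._mutex:
            # The writer keeps the item but now holds it in shared mode,
            # so readers waiting on the item can be admitted.
            self._table[item] = {'state': 'read', 'no_of_reads': 1,
                                 'holders': {tid}}
            self._cond.notify_all()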
Using binary locks or read/write locks in transactions, as described earlier, does not
guarantee serializability of schedules on its own. Figure 3 shows an example where
the preceding locking rules are followed but a nonserializable schedule may result.
This is because in Figure 3(a) the items Y in T1 and X in T2 were unlocked too early.
This allows a schedule such as the one shown in Figure 3(c) to occur, which is not a
serializable schedule and hence gives incorrect results. To guarantee serializability,
we must follow an additional protocol concerning the positioning of locking and
unlocking operations in every transaction. The best-known protocol, two-phase
locking, is described in the next section.
1.2 Guaranteeing Serializability by Two-Phase Locking
A transaction is said to follow the two-phase locking protocol if all locking opera-
tions (read_lock, write_lock) precede the first unlock operation in the transaction.4
Such a transaction can be divided into two phases: an expanding or growing (first)
phase, during which new locks on items can be acquired but none can be released;
and a shrinking (second) phase, during which existing locks can be released but no
new locks can be acquired. If lock conversion is allowed, then upgrading of locks
(from read-locked to write-locked) must be done during the expanding phase, and
downgrading of locks (from write-locked to read-locked) must be done in the
shrinking phase. Hence, a read_lock(X) operation that downgrades an already held
write lock on X can appear only in the shrinking phase.

4This is unrelated to the two-phase commit protocol for recovery in distributed databases.

Figure 3
Transactions that do not obey two-phase locking. (a) Two transactions T1 and T2.
(b) Results of possible serial schedules of T1 and T2. (c) A nonserializable schedule S
that uses locks.

(a) The two transactions:

T1:                      T2:
read_lock(Y);            read_lock(X);
read_item(Y);            read_item(X);
unlock(Y);               unlock(X);
write_lock(X);           write_lock(Y);
read_item(X);            read_item(Y);
X := X + Y;              Y := X + Y;
write_item(X);           write_item(Y);
unlock(X);               unlock(Y);

(b) Initial values: X=20, Y=30.
Result of serial schedule T1 followed by T2: X=50, Y=80.
Result of serial schedule T2 followed by T1: X=70, Y=50.

(c) Schedule S (time proceeds downward); result of schedule S: X=50, Y=50 (nonserializable):

T1: read_lock(Y); read_item(Y); unlock(Y);
T2: read_lock(X); read_item(X); unlock(X); write_lock(Y); read_item(Y);
    Y := X + Y; write_item(Y); unlock(Y);
T1: write_lock(X); read_item(X); X := X + Y; write_item(X); unlock(X);
Transactions T1 and T2 in Figure 3(a) do not follow the two-phase locking protocol
because the write_lock(X) operation follows the unlock(Y) operation in T1, and simi-
larly the write_lock(Y) operation follows the unlock(X) operation in T2. If we enforce
two-phase locking, the transactions can be rewritten as T1′ and T2′, as shown in
Figure 4. Now, the schedule shown in Figure 3(c) is not permitted for T1′ and T2′
(with their modified order of locking and unlocking operations) under the rules of
locking described in Section 1.1 because T1′ will issue its write_lock(X) before it
unlocks item Y; consequently, when T2′ issues its read_lock(X), it is forced to wait
until T1′ releases the lock by issuing an unlock(X) in the schedule.
T1′:                     T2′:
read_lock(Y);            read_lock(X);
read_item(Y);            read_item(X);
write_lock(X);           write_lock(Y);
unlock(Y);               unlock(X);
read_item(X);            read_item(Y);
X := X + Y;              Y := X + Y;
write_item(X);           write_item(Y);
unlock(X);               unlock(Y);

Figure 4
Transactions T1′ and T2′, which are the same as T1 and T2 in Figure 3 but follow
the two-phase locking protocol. Note that they can produce a deadlock.
It can be proved that, if every transaction in a schedule follows the two-phase lock-
ing protocol, the schedule is guaranteed to be serializable, obviating the need to test
for serializability of schedules. The locking protocol, by enforcing two-phase lock-
ing rules, also enforces serializability.
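The two-phase property itself is easy to check mechanically. The following small Python helper, using our own encoding of a transaction as a list of (operation, item) pairs, verifies that no locking operation follows the first unlock:

def follows_two_phase_locking(operations):
    # operations: list of (op, item) pairs issued by one transaction, where
    # op is 'read_lock', 'write_lock', or 'unlock'. Returns True if every
    # locking operation precedes the first unlock (basic 2PL).
    shrinking = False
    for op, _item in operations:
        if op == 'unlock':
            shrinking = True
        elif op in ('read_lock', 'write_lock') and shrinking:
            return False            # a lock was acquired after an unlock
    return True

# T1 of Figure 3 violates 2PL; T1' of Figure 4 satisfies it:
t1 = [('read_lock', 'Y'), ('unlock', 'Y'), ('write_lock', 'X'), ('unlock', 'X')]
t1_prime = [('read_lock', 'Y'), ('write_lock', 'X'), ('unlock', 'Y'), ('unlock', 'X')]
assert not follows_two_phase_locking(t1)
assert follows_two_phase_locking(t1_prime)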
Two-phase locking may limit the amount of concurrency that can occur in a sched-
ule because a transaction T may not be able to release an item X after it is through
using it if T must lock an additional item Y later; or conversely, T must lock the
additional item Y before it needs it so that it can release X. Hence, X must remain
locked by T until all items that the transaction needs to read or write have been
locked; only then can X be released by T. Meanwhile, another transaction seeking to
access X may be forced to wait, even though T is done with X; conversely, if Y is
locked earlier than it is needed, another transaction seeking to access Y is forced to
wait even though T is not using Y yet. This is the price for guaranteeing serializabil-
ity of all schedules without having to check the schedules themselves.
Although the two-phase locking protocol guarantees serializability (that is, every
schedule that is permitted is serializable), it does not permit all possible serializable
schedules (that is, some serializable schedules will be prohibited by the protocol).
Basic, Conservative, Strict, and Rigorous Two-Phase Locking. There are a
number of variations of two-phase locking (2PL). The technique just described is
known as basic 2PL. A variation known as conservative 2PL (or static 2PL)
requires a transaction to lock all the items it accesses before the transaction begins
execution, by predeclaring its read-set and write-set. Recall that the read-set of a
transaction is the set of all items that the transaction reads, and the write-set is the
set of all items that it writes. If any of the predeclared items needed cannot be
locked, the transaction does not lock any item; instead, it waits until all the items are
available for locking. Conservative 2PL is a deadlock-free protocol, as we will see in
Section 1.3 when we discuss the deadlock problem. However, it is difficult to use in
practice because of the need to predeclare the read-set and write-set, which is not
possible in many situations.
In practice, the most popular variation of 2PL is strict 2PL, which guarantees strict
schedules. In this variation, a transaction T does not release any of its exclusive
(write) locks until after it commits or aborts. Hence, no other transaction can read
or write an item that is written by T unless T has committed, leading to a strict
schedule for recoverability. Strict 2PL is not deadlock-free. A more restrictive varia-
tion of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In this
variation, a transaction T does not release any of its locks (exclusive or shared) until
after it commits or aborts, and so it is easier to implement than strict 2PL. Notice
the difference between conservative and rigorous 2PL: the former must lock all its
items before it starts, so once the transaction starts it is in its shrinking phase; the lat-
ter does not unlock any of its items until after it terminates (by committing or abort-
ing), so the transaction is in its expanding phase until it ends.
In many cases, the concurrency control subsystem itself is responsible for generat-
ing the read_lock and write_lock requests. For example, suppose the system is to
enforce the strict 2PL protocol. Then, whenever transaction T issues a read_item(X),
the system calls the read_lock(X) operation on behalf of T. If the state of LOCK(X) is
write_locked by some other transaction T′, the system places T in the waiting queue
for item X; otherwise, it grants the read_lock(X) request and permits the
read_item(X) operation of T to execute. On the other hand, if transaction T issues a
write_item(X), the system calls the write_lock(X) operation on behalf of T. If the state
of LOCK(X) is write_locked or read_locked by some other transaction T′, the system
places T in the waiting queue for item X; if the state of LOCK(X) is read_locked and
T itself is the only transaction holding the read lock on X, the system upgrades the
lock to write_locked and permits the write_item(X) operation by T. Finally, if the
state of LOCK(X) is unlocked, the system grants the write_lock(X) request and per-
mits the write_item(X) operation to execute. After each action, the system must
update its lock table appropriately.
The use of locks can cause two additional problems: deadlock and starvation. We
discuss these problems and their solutions in the next section.
1.3 Dealing with Deadlock and Starvation
Deadlock occurs when each transaction T in a set of two or more transactions is
waiting for some item that is locked by some other transaction T′ in the set. Hence,
each transaction in the set is in a waiting queue, waiting for one of the other trans-
actions in the set to release the lock on an item. But because the other transaction is
also waiting, it will never release the lock. A simple example is shown in Figure 5(a),
where the two transactions T1′ and T2′ are deadlocked in a partial schedule; T1′ is in
the waiting queue for X, which is locked by T2′, while T2′ is in the waiting queue for
Y, which is locked by T1′. Meanwhile, neither T1′ nor T2′ nor any other transaction
can access items X and Y.
Deadlock Prevention Protocols. One way to prevent deadlock is to use a
deadlock prevention protocol.5 One deadlock prevention protocol, which is used
in conservative two-phase locking, requires that every transaction lock all the items
it needs in advance (which is generally not a practical assumption); if any of the
items cannot be obtained, none of the items are locked. Rather, the transaction waits
and then tries again to lock all the items it needs. Obviously this solution further
limits concurrency. A second protocol, which also limits concurrency, involves
ordering all the items in the database and making sure that a transaction that needs
several items will lock them according to that order. This requires that the program-
mer (or the system) is aware of the chosen order of the items, which is also not prac-
tical in the database context.

5These protocols are not generally used in practice, either because of unrealistic assumptions or
because of their possible overhead. Deadlock detection and timeouts (covered in the following sections)
are more practical.

Figure 5
Illustrating the deadlock problem. (a) A partial schedule of T1′ and T2′ that is in a
state of deadlock. (b) A wait-for graph for the partial schedule in (a).

(a) Partial schedule (time proceeds downward):

T1′: read_lock(Y); read_item(Y);
T2′: read_lock(X); read_item(X);
T1′: write_lock(X);    (T1′ waits; X is locked by T2′)
T2′: write_lock(Y);    (T2′ waits; Y is locked by T1′)

(b) Wait-for graph: an edge T1′ → T2′ labeled X and an edge T2′ → T1′ labeled Y,
forming a cycle.
A number of other deadlock prevention schemes have been proposed that make a
decision about what to do with a transaction involved in a possible deadlock situa-
tion: Should it be blocked and made to wait or should it be aborted, or should the
transaction preempt and abort another transaction? Some of these techniques use
the concept of transaction timestamp TS(T), which is a unique identifier assigned
to each transaction. The timestamps are typically based on the order in which trans-
actions are started; hence, if transaction T1 starts before transaction T2, then TS(T1)
< TS(T2). Notice that the older transaction (which starts first) has the smaller time-
stamp value. Two schemes that prevent deadlock are called wait-die and wound-
wait. Suppose that transaction Ti tries to lock an item X but is not able to because X
is locked by some other transaction Tj with a conflicting lock. The rules followed by
these schemes are:
■ Wait-die. If TS(Ti) < TS(Tj), then (Ti older than Tj) Ti is allowed to wait;
otherwise (Ti younger than Tj) abort Ti (Ti dies) and restart it later with the
same timestamp.
■ Wound-wait. If TS(Ti) < TS(Tj), then (Ti older than Tj) abort Tj (Ti wounds
Tj) and restart it later with the same timestamp; otherwise (Ti younger than
Tj) Ti is allowed to wait.
In wait-die, an older transaction is allowed to wait for a younger transaction, whereas
a younger transaction requesting an item held by an older transaction is aborted
and restarted. The wound-wait approach does the opposite: A younger transaction
is allowed to wait for an older one, whereas an older transaction requesting an item
held by a younger transaction preempts the younger transaction by aborting it. Both
schemes end up aborting the younger of the two transactions (the transaction that
started later) that may be involved in a deadlock, assuming that this will waste less
processing. It can be shown that these two techniques are deadlock-free, since in
wait-die, transactions only wait for younger transactions so no cycle is created.
Similarly, in wound-wait, transactions only wait for older transactions so no cycle is
created. However, both techniques may cause some transactions to be aborted and
restarted needlessly, even though those transactions may never actually cause a
deadlock.
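The two rules reduce to a small decision function. The sketch below, with hypothetical names, returns what the lock manager should do with a requesting transaction when the requested item is held by another transaction under each scheme; timestamps are assumed to be comparable numbers, with a smaller value meaning an older transaction.

def resolve_conflict(scheme, ts_requester, ts_holder):
    # Decide the fate of a requesting transaction Ti (timestamp ts_requester)
    # blocked by holder Tj (timestamp ts_holder). Returns 'wait',
    # 'abort_requester', or 'abort_holder'.
    if scheme == 'wait-die':
        # Older requester waits; younger requester dies (is aborted and
        # restarted later with the same timestamp).
        return 'wait' if ts_requester < ts_holder else 'abort_requester'
    if scheme == 'wound-wait':
        # Older requester wounds (aborts) the younger holder;
        # younger requester waits.
        return 'abort_holder' if ts_requester < ts_holder else 'wait'
    raise ValueError('unknown scheme')

# Ti older than Tj (TS(Ti) = 5 < TS(Tj) = 9):
assert resolve_conflict('wait-die', 5, 9) == 'wait'
assert resolve_conflict('wound-wait', 5, 9) == 'abort_holder'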
Another group of protocols that prevent deadlock do not require timestamps. These
include the no waiting (NW) and cautious waiting (CW) algorithms. In the no
waiting algorithm, if a transaction is unable to obtain a lock, it is immediately
aborted and then restarted after a certain time delay without checking whether a
deadlock will actually occur or not. In this case, no transaction ever waits, so no
deadlock will occur. However, this scheme can cause transactions to abort and
restart needlessly. The cautious waiting algorithm was proposed to try to reduce the
number of needless aborts/restarts. Suppose that transaction Ti tries to lock an item
X but is not able to do so because X is locked by some other transaction Tj with a
conflicting lock. The cautious waiting rules are as follows:
■ Cautious waiting. If Tj is not blocked (not waiting for some other locked
item), then Ti is blocked and allowed to wait; otherwise abort Ti.
It can be shown that cautious waiting is deadlock-free, because no transaction will
ever wait for another blocked transaction. By considering the time b(T) at which
each blocked transaction T was blocked, if the two transactions Ti and Tj above both
become blocked, and Ti is waiting for Tj, then b(Ti) < b(Tj), since Ti can only wait
for Tj at a time when Tj is not blocked itself. Hence, the blocking times form a total
ordering on all blocked transactions, so no cycle that causes deadlock can occur.
Deadlock Detection. A second, more practical approach to dealing with dead-
lock is deadlock detection, where the system checks if a state of deadlock actually
exists. This solution is attractive if we know there will be little interference among
the transactions—that is, if different transactions will rarely access the same items at
the same time. This can happen if the transactions are short and each transaction
locks only a few items, or if the transaction load is light. On the other hand, if trans-
actions are long and each transaction uses many items, or if the transaction load is
quite heavy, it may be advantageous to use a deadlock prevention scheme.
A simple way to detect a state of deadlock is for the system to construct and main-
tain a wait-for graph. One node is created in the wait-for graph for each transaction
that is currently executing. Whenever a transaction Ti is waiting to lock an item X
that is currently locked by a transaction Tj, a directed edge (Ti → Tj) is created in
the wait-for graph. When Tj releases the lock(s) on the items that Ti was waiting
for, the directed edge is dropped from the wait-for graph. We have a state of dead-
lock if and only if the wait-for graph has a cycle. One problem with this approach is
the matter of determining when the system should check for a deadlock. One possi-
bility is to check for a cycle every time an edge is added to the wait-for graph, but
this may cause excessive overhead. Criteria such as the number of currently execut-
ing transactions or the period of time several transactions have been waiting to lock
items may be used instead to check for a cycle. Figure 5(b) shows the wait-for graph
for the (partial) schedule shown in Figure 5(a).
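A wait-for graph and its cycle test are straightforward to code. The following hedged sketch stores the graph as a dictionary of outgoing edges and detects deadlock with a depth-first search; the function name and representation are illustrative.

def has_deadlock(wait_for):
    # wait_for: dict mapping each transaction to the set of transactions it is
    # waiting for (the edges of the wait-for graph). Returns True iff the graph
    # contains a cycle, i.e. the system is in a state of deadlock.
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {t: WHITE for t in wait_for}

    def visit(t):
        color[t] = GRAY
        for u in wait_for.get(t, ()):
            if color.get(u, WHITE) == GRAY:          # back edge: a cycle exists
                return True
            if color.get(u, WHITE) == WHITE and visit(u):
                return True
        color[t] = BLACK
        return False

    return any(color[t] == WHITE and visit(t) for t in wait_for)

# The wait-for graph of Figure 5(b): T1' waits for T2' (item X) and vice versa (item Y).
assert has_deadlock({"T1'": {"T2'"}, "T2'": {"T1'"}})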
If the system is in a state of deadlock, some of the transactions causing the deadlock
must be aborted. Choosing which transactions to abort is known as victim selec-
tion. The algorithm for victim selection should generally avoid selecting transac-
tions that have been running for a long time and that have performed many
updates, and it should try instead to select transactions that have not made many
changes (younger transactions).
Timeouts. Another simple scheme to deal with deadlock is the use of timeouts.
This method is practical because of its low overhead and simplicity. In this method,
if a transaction waits for a period longer than a system-defined timeout period, the
system assumes that the transaction may be deadlocked and aborts it—regardless of
whether a deadlock actually exists or not.
Starvation. Another problem that may occur when we use locking is starvation,
which occurs when a transaction cannot proceed for an indefinite period of time
while other transactions in the system continue normally. This may occur if the
waiting scheme for locked items is unfair, giving priority to some transactions over
others. One solution for starvation is to have a fair waiting scheme, such as using a
first-come-first-served queue; transactions are enabled to lock an item in the order
in which they originally requested the lock. Another scheme allows some transac-
tions to have priority over others but increases the priority of a transaction the
longer it waits, until it eventually gets the highest priority and proceeds. Starvation
can also occur because of victim selection if the algorithm selects the same transac-
tion as victim repeatedly, thus causing it to abort and never finish execution. The
algorithm can use higher priorities for transactions that have been aborted multiple
times to avoid this problem. The wait-die and wound-wait schemes discussed previ-
ously avoid starvation, because they restart a transaction that has been aborted with
its same original timestamp, so the possibility that the same transaction is aborted
repeatedly is slim.
2 Concurrency Control Based
on Timestamp Ordering
The use of locks, combined with the 2PL protocol, guarantees serializability of
schedules. The serializable schedules produced by 2PL have their equivalent serial
schedules based on the order in which executing transactions lock the items they
acquire. If a transaction needs an item that is already locked, it may be forced to wait
until the item is released. Some transactions may be aborted and restarted because
of the deadlock problem. A different approach that guarantees serializability
involves using transaction timestamps to order transaction execution for an equiva-
lent serial schedule. In Section 2.1 we discuss timestamps, and in Section 2.2 we dis-
cuss how serializability is enforced by ordering transactions based on their time-
stamps.
2.1 Timestamps
Recall that a timestamp is a unique identifier created by the DBMS to identify a
transaction. Typically, timestamp values are assigned in the order in which the
transactions are submitted to the system, so a timestamp can be thought of as the
transaction start time. We will refer to the timestamp of transaction T as TS(T).
Concurrency control techniques based on timestamp ordering do not use locks;
hence, deadlocks cannot occur.
Timestamps can be generated in several ways. One possibility is to use a counter that
is incremented each time its value is assigned to a transaction. The transaction time-
stamps are numbered 1, 2, 3, ... in this scheme. A computer counter has a finite max-
imum value, so the system must periodically reset the counter to zero when no
transactions are executing for some short period of time. Another way to implement
timestamps is to use the current date/time value of the system clock and ensure that
no two timestamp values are generated during the same tick of the clock.
2.2 The Timestamp Ordering Algorithm
The idea for this scheme is to order the transactions based on their timestamps. A
schedule in which the transactions participate is then serializable, and the only
equivalent serial schedule permitted has the transactions in order of their timestamp
values. This is called timestamp ordering (TO). Notice how this differs from 2PL,
where a schedule is serializable by being equivalent to some serial schedule allowed
by the locking protocols. In timestamp ordering, however, the schedule is equivalent
to the particular serial order corresponding to the order of the transaction time-
stamps. The algorithm must ensure that, for each item accessed by conflicting opera-
tions in the schedule, the order in which the item is accessed does not violate the
timestamp order. To do this, the algorithm associates with each database item X two
timestamp (TS) values:
1. read_TS(X). The read timestamp of item X is the largest timestamp among
all the timestamps of transactions that have successfully read item X—that
is, read_TS(X) = TS(T), where T is the youngest transaction that has read X
successfully.
2. write_TS(X). The write timestamp of item X is the largest of all the time-
stamps of transactions that have successfully written item X—that is,
write_TS(X) = TS(T), where T is the youngest transaction that has written X
successfully.
Basic Timestamp Ordering (TO). Whenever some transaction T tries to issue a
read_item(X) or a write_item(X) operation, the basic TO algorithm compares the
timestamp of T with read_TS(X) and write_TS(X) to ensure that the timestamp
order of transaction execution is not violated. If this order is violated, then transac-
tion T is aborted and resubmitted to the system as a new transaction with a new
timestamp. If T is aborted and rolled back, any transaction T1 that may have used a
value written by T must also be rolled back. Similarly, any transaction T2 that may
have used a value written by T1 must also be rolled back, and so on. This effect is
known as cascading rollback and is one of the problems associated with basic TO,
since the schedules produced are not guaranteed to be recoverable. An additional
protocol must be enforced to ensure that the schedules are recoverable, cascadeless,
or strict. We first describe the basic TO algorithm here. The concurrency control
algorithm must check whether conflicting operations violate the timestamp order-
ing in the following two cases:
1. Whenever a transaction T issues a write_item(X) operation, the following is
checked:
a. If read_TS(X) > TS(T) or if write_TS(X) > TS(T), then abort and roll back
T and reject the operation. This should be done because some younger
transaction with a timestamp greater than TS(T)—and hence after T in
the timestamp ordering—has already read or written the value of item X
before T had a chance to write X, thus violating the timestamp ordering.
b. If the condition in part (a) does not occur, then execute the write_item(X)
operation of T and set write_TS(X) to TS(T).
2. Whenever a transaction T issues a read_item(X) operation, the following is
checked:
a. If write_TS(X) > TS(T), then abort and roll back T and reject the opera-
tion. This should be done because some younger transaction with time-
stamp greater than TS(T)—and hence after T in the timestamp
ordering—has already written the value of item X before T had a chance
to read X.
b. If write_TS(X) ≤ TS(T), then execute the read_item(X) operation of T and
set read_TS(X) to the larger of TS(T) and the current read_TS(X).
Whenever the basic TO algorithm detects two conflicting operations that occur in the
incorrect order, it rejects the later of the two operations by aborting the transaction
that issued it. The schedules produced by basic TO are hence guaranteed to be
conflict serializable, like the 2PL protocol. However, some schedules are possible
under each protocol that are not allowed under the other. Thus, neither protocol
allows all possible serializable schedules. As mentioned earlier, deadlock does not
occur with timestamp ordering. However, cyclic restart (and hence starvation) may
occur if a transaction is continually aborted and restarted.
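The two checks can be summarized in a short Python sketch that keeps read_TS and write_TS in dictionaries. The class and the AbortTransaction exception are illustrative names, timestamps are assumed to start at 1 (so 0 stands for "never accessed"), and recoverability is ignored.

class AbortTransaction(Exception):
    # Raised when a transaction must be rolled back and restarted
    # later with a new timestamp.
    pass

class BasicTO:
    # Sketch of the basic timestamp ordering checks (not the full DBMS logic).
    def __init__(self):
        self.read_ts = {}     # item -> largest TS of a successful reader
        self.write_ts = {}    # item -> largest TS of a successful writer

    def write_item(self, item, ts):
        if self.read_ts.get(item, 0) > ts or self.write_ts.get(item, 0) > ts:
            raise AbortTransaction(f'write_item({item}) violates TO for TS={ts}')
        self.write_ts[item] = ts          # perform the write, then record it

    def read_item(self, item, ts):
        if self.write_ts.get(item, 0) > ts:
            raise AbortTransaction(f'read_item({item}) violates TO for TS={ts}')
        self.read_ts[item] = max(self.read_ts.get(item, 0), ts)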
Strict Timestamp Ordering (TO). A variation of basic TO called strict TO
ensures that the schedules are both strict (for easy recoverability) and (conflict)
serializable. In this variation, a transaction T that issues a read_item(X) or
write_item(X) such that TS(T) > write_TS(X) has its read or write operation delayed
until the transaction T′ that wrote the value of X (hence TS(T′) = write_TS(X)) has
committed or aborted. To implement this algorithm, it is necessary to simulate the
locking of an item X that has been written by transaction T′ until T′ is either com-
mitted or aborted. This algorithm does not cause deadlock, since T waits for T′ only
if TS(T) > TS(T′).
Thomas’s Write Rule. A modification of the basic TO algorithm, known as
Thomas’s write rule, does not enforce conflict serializability, but it rejects fewer
write operations by modifying the checks for the write_item(X) operation as
follows:
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), then do not execute the write operation but continue
processing. This is because some transaction with timestamp greater than
TS(T)—and hence after T in the timestamp ordering—has already written
the value of X. Thus, we must ignore the write_item(X) operation of T
because it is already outdated and obsolete. Notice that any conflict arising
from this situation would be detected by case (1).
3. If neither the condition in part (1) nor the condition in part (2) occurs, then
execute the write_item(X) operation of T and set write_TS(X) to TS(T).
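Continuing the basic TO sketch above, only the write check changes under Thomas's write rule; the obsolete write is skipped rather than causing an abort.

class ThomasWriteRuleTO(BasicTO):
    # Sketch: basic TO with Thomas's write rule applied to write_item.
    def write_item(self, item, ts):
        if self.read_ts.get(item, 0) > ts:
            raise AbortTransaction(f'write_item({item}) rejected for TS={ts}')
        if self.write_ts.get(item, 0) > ts:
            return                     # outdated write: ignore it and continue
        self.write_ts[item] = ts       # otherwise perform the write as usual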
3 Multiversion Concurrency
Control Techniques
Other protocols for concurrency control keep the old values of a data item when the
item is updated. These are known as multiversion concurrency control, because
several versions (values) of an item are maintained. When a transaction requires
access to an item, an appropriate version is chosen to maintain the serializability of
the currently executing schedule, if possible. The idea is that some read operations
that would be rejected in other techniques can still be accepted by reading an older
version of the item to maintain serializability. When a transaction writes an item, it
writes a new version and the old version(s) of the item are retained. Some multiver-
sion concurrency control algorithms use the concept of view serializability rather
than conflict serializability.
An obvious drawback of multiversion techniques is that more storage is needed to
maintain multiple versions of the database items. However, older versions may have
to be maintained anyway—for example, for recovery purposes. In addition, some
database applications require older versions to be kept to maintain a history of the
evolution of data item values. The extreme case is a temporal database, which keeps
track of all changes and the times at which they occurred. In such cases, there is no
additional storage penalty for multiversion techniques, since older versions are
already maintained.
Several multiversion concurrency control schemes have been proposed. We discuss
two schemes here, one based on timestamp ordering and the other based on 2PL. In
addition, the validation concurrency control method (see Section 4) also maintains
multiple versions.
3.1 Multiversion Technique Based on Timestamp Ordering
In this method, several versions X1, X2, …, Xk of each data item X are maintained.
For each version, the value of version Xi and the following two timestamps are kept:
1. read_TS(Xi). The read timestamp of Xi is the largest of all the timestamps of
transactions that have successfully read version Xi.
2. write_TS(Xi). The write timestamp of Xi is the timestamp of the transac-
tion that wrote the value of version Xi.
Whenever a transaction T is allowed to execute a write_item(X) operation, a new ver-
sion Xk+1 of item X is created, with both the write_TS(Xk+1) and the read_TS(Xk+1)
set to TS(T). Correspondingly, when a transaction T is allowed to read the value of
version Xi, the value of read_TS(Xi) is set to the larger of the current read_TS(Xi) and
TS(T).
To ensure serializability, the following rules are used:
1. If transaction T issues a write_item(X) operation, and version i of X has the
highest write_TS(Xi) of all versions of X that is also less than or equal to TS(T),
and read_TS(Xi) > TS(T), then abort and roll back transaction T; otherwise,
create a new version Xj of X with read_TS(Xj) = write_TS(Xj) = TS(T).
2. If transaction T issues a read_item(X) operation, find the version i of X that
has the highest write_TS(Xi) of all versions of X that is also less than or equal
to TS(T); then return the value of Xi to transaction T, and set the value of
read_TS(Xi) to the larger of TS(T) and the current read_TS(Xi).
As we can see in case 2, a read_item(X) is always successful, since it finds the appro-
priate version Xi to read based on the write_TS of the various existing versions of X.
In case 1, however, transaction T may be aborted and rolled back. This happens if T
attempts to write a version of X that should have been read by another transaction
T′ whose timestamp is read_TS(Xi); however, T′ has already read version Xi, which
was written by the transaction with timestamp equal to write_TS(Xi). If this conflict
occurs, T is rolled back; otherwise, a new version of X, written by transaction T, is
created. Notice that if T is rolled back, cascading rollback may occur. Hence, to
ensure recoverability, a transaction T should not be allowed to commit until after all
the transactions that have written some version that T has read have committed.
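A hedged sketch of these two rules follows. It reuses the AbortTransaction exception from the basic TO sketch, keeps each version as a small record, and assumes that every item starts with an initial version whose timestamps are 0; the class and attribute names are our own.

class MultiversionTO:
    # Sketch of multiversion timestamp ordering. For each item we keep a list
    # of versions, each a dict with 'value', 'read_ts', and 'write_ts'.
    def __init__(self):
        self.versions = {}   # item -> list of version dicts

    def _version_for(self, item, ts):
        # The version with the largest write_TS that is <= TS(T).
        candidates = [v for v in self.versions.get(item, []) if v['write_ts'] <= ts]
        return max(candidates, key=lambda v: v['write_ts'])

    def read_item(self, item, ts):
        v = self._version_for(item, ts)
        v['read_ts'] = max(v['read_ts'], ts)    # a read always succeeds (rule 2)
        return v['value']

    def write_item(self, item, value, ts):
        v = self._version_for(item, ts)
        if v['read_ts'] > ts:                   # rule 1: a younger reader exists
            raise AbortTransaction(f'{item} already read by a younger transaction')
        if v['write_ts'] == ts:
            v['value'] = value                  # T overwrites its own version
        else:
            self.versions.setdefault(item, []).append(
                {'value': value, 'read_ts': ts, 'write_ts': ts})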
3.2 Multiversion Two-Phase Locking Using Certify Locks
In this multiple-mode locking scheme, there are three locking modes for an item:
read, write, and certify, instead of just the two modes (read, write) discussed previ-
ously. Hence, the state of LOCK(X) for an item X can be one of read-locked, write-
locked, certify-locked, or unlocked. In the standard locking scheme, with only read
and write locks (see Section 1.1), a write lock is an exclusive lock. We can describe
the relationship between read and write locks in the standard scheme by means of
the lock compatibility table shown in Figure 6(a). An entry of Yes means that if a
transaction T holds the type of lock specified in the column header on item X and if
transaction T′ requests the type of lock specified in the row header on the same item
X, then T′ can obtain the lock because the locking modes are compatible. On the
other hand, an entry of No in the table indicates that the locks are not compatible,
so T′ must wait until T releases the lock.

Figure 6
Lock compatibility tables. (a) A compatibility table for the read/write locking scheme.
(b) A compatibility table for the read/write/certify locking scheme.

(a)            Read   Write
    Read       Yes    No
    Write      No     No

(b)            Read   Write   Certify
    Read       Yes    Yes     No
    Write      Yes    No      No
    Certify    No     No      No
In the standard locking scheme, once a transaction obtains a write lock on an item,
no other transactions can access that item. The idea behind multiversion 2PL is to
allow other transactions T′ to read an item X while a single transaction T holds a
write lock on X. This is accomplished by allowing two versions for each item X; one
version must always have been written by some committed transaction. The second
version X′ is created when a transaction T acquires a write lock on the item. Other
transactions can continue to read the committed version of X while T holds the write
lock. Transaction T can write the value of X′ as needed, without affecting the value
of the committed version X. However, once T is ready to commit, it must obtain a
certify lock on all items that it currently holds write locks on before it can commit.
The certify lock is not compatible with read locks, so the transaction may have to
delay its commit until all its write-locked items are released by any reading transac-
tions in order to obtain the certify locks. Once the certify locks (which are exclusive
locks) are acquired, the committed version X of the data item is set to the value of
version X′, version X′ is discarded, and the certify locks are then released. The lock
compatibility table for this scheme is shown in Figure 6(b).
In this multiversion 2PL scheme, reads can proceed concurrently with a single write
operation—an arrangement not permitted under the standard 2PL schemes. The
cost is that a transaction may have to delay its commit until it obtains exclusive cer-
tify locks on all the items it has updated. It can be shown that this scheme avoids cas-
cading aborts, since transactions are only allowed to read the version X that was
written by a committed transaction. However, deadlocks may occur if upgrading of
a read lock to a write lock is allowed, and these must be handled by variations of the
techniques discussed in Section 1.3.
4 Validation (Optimistic) Concurrency Control
Techniques
In all the concurrency control techniques we have discussed so far, a certain degree
of checking is done before a database operation can be executed. For example, in
locking, a check is done to determine whether the item being accessed is locked. In
timestamp ordering, the transaction timestamp is checked against the read and
write timestamps of the item. Such checking represents overhead during transac-
tion execution, with the effect of slowing down the transactions.
In optimistic concurrency control techniques, also known as validation or
certification techniques, no checking is done while the transaction is executing.
Several theoretical concurrency control methods are based on the validation tech-
nique. We will describe only one scheme here. In this scheme, updates in the trans-
action are not applied directly to the database items until the transaction reaches its
end. During transaction execution, all updates are applied to local copies of the data
items that are kept for the transaction.6 At the end of transaction execution, a
validation phase checks whether any of the transaction’s updates violate serializ-
ability. Certain information needed by the validation phase must be kept by the sys-
tem. If serializability is not violated, the transaction is committed and the database
is updated from the local copies; otherwise, the transaction is aborted and then
restarted later.
There are three phases for this concurrency control protocol:
1. Read phase. A transaction can read values of committed data items from the
database. However, updates are applied only to local copies (versions) of the
data items kept in the transaction workspace.
2. Validation phase. Checking is performed to ensure that serializability will
not be violated if the transaction updates are applied to the database.
3. Write phase. If the validation phase is successful, the transaction updates are
applied to the database; otherwise, the updates are discarded and the trans-
action is restarted.
The idea behind optimistic concurrency control is to do all the checks at once;
hence, transaction execution proceeds with a minimum of overhead until the vali-
dation phase is reached. If there is little interference among transactions, most will
be validated successfully. However, if there is much interference, many transactions
that execute to completion will have their results discarded and must be restarted
later. Under these circumstances, optimistic techniques do not work well. The tech-
niques are called optimistic because they assume that little interference will occur
and hence that there is no need to do checking during transaction execution.
The optimistic protocol we describe uses transaction timestamps and also requires
that the write_sets and read_sets of the transactions be kept by the system.
Additionally, start and end times for some of the three phases need to be kept for
each transaction.

6Note that this can be considered as keeping multiple versions of items!
Recall that the write_set of a transaction is the set of items it writes,
and the read_set is the set of items it reads. In the validation phase for transaction Ti,
the protocol checks that Ti does not interfere with any committed transactions or
with any other transactions currently in their validation phase. The validation phase
for Ti checks that, for each such transaction Tj that is either committed or is in its
validation phase, one of the following conditions holds:
1. Transaction Tj completes its write phase before Ti starts its read phase.
2. Ti starts its write phase after Tj completes its write phase, and the read_set
of Ti has no items in common with the write_set of Tj.
3. Both the read_set and write_set of Ti have no items in common with the
write_set of Tj, and Tj completes its read phase before Ti completes its read
phase.
When validating transaction Ti, the first condition is checked first for each transac-
tion Tj, since (1) is the simplest condition to check. Only if condition 1 is false is
condition 2 checked, and only if (2) is false is condition 3—the most complex to
evaluate—checked. If any one of these three conditions holds, there is no interfer-
ence and Ti is validated successfully. If none of these three conditions holds, the val-
idation of transaction Ti fails and it is aborted and restarted later because
interference may have occurred.
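A hedged sketch of this validation test follows. The attribute names (read_set, write_set, read_phase_start, read_phase_end, write_phase_end) are our own conventions, and the transaction being validated is assumed not to have started its write phase yet.

def validate(ti, committed_or_validating):
    # Return True if transaction ti passes validation against every transaction
    # tj that is committed or currently in its validation phase. Each transaction
    # object is expected to carry read_set, write_set, read_phase_start,
    # read_phase_end, and write_phase_end (None if the write phase is unfinished).
    for tj in committed_or_validating:
        if tj is ti:
            continue
        # Condition 1: Tj completed its write phase before Ti started its read phase.
        if tj.write_phase_end is not None and tj.write_phase_end < ti.read_phase_start:
            continue
        # Condition 2: Tj has completed its write phase (hence before Ti's write
        # phase, which has not started), and Ti read nothing that Tj wrote.
        if tj.write_phase_end is not None and not (ti.read_set & tj.write_set):
            continue
        # Condition 3: Tj finished reading before Ti finished reading, and Tj's
        # write_set overlaps neither the read_set nor the write_set of Ti.
        if (tj.read_phase_end < ti.read_phase_end
                and not (ti.read_set & tj.write_set)
                and not (ti.write_set & tj.write_set)):
            continue
        return False   # none of the three conditions holds: possible interference
    return True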
5 Granularity of Data Items and Multiple
Granularity Locking
All concurrency control techniques assume that the database is formed of a number
of named data items. A database item could be chosen to be one of the following:
■ A database record
■ A field value of a database record
■ A disk block
■ A whole file
■ The whole database
The granularity can affect the performance of concurrency control and recovery. In
Section 5.1, we discuss some of the tradeoffs with regard to choosing the granular-
ity level used for locking, and in Section 5.2 we discuss a multiple granularity lock-
ing scheme, where the granularity level (size of the data item) may be changed
dynamically.
5.1 Granularity Level Considerations for Locking
The size of data items is often called the data item granularity. Fine granularity
refers to small item sizes, whereas coarse granularity refers to large item sizes. Several
tradeoffs must be considered in choosing the data item size. We will discuss data
item size in the context of locking, although similar arguments can be made for
other concurrency control techniques.
Figure 7
A granularity hierarchy for illustrating multiple granularity level locking. The root
node db (the whole database) contains two files, f1 and f2; file f1 contains pages p11,
p12, ..., p1n and file f2 contains pages p21, p22, ..., p2m; each page contains records
(for example, page p11 contains records r111 through r11j, and page p2m contains
records r2m1 through r2mk).
First, notice that the larger the data item size is, the lower the degree of concurrency
permitted. For example, if the data item size is a disk block, a transaction T that
needs to lock a record B must lock the whole disk block X that contains B because a
lock is associated with the whole data item (block). Now, if another transaction S
wants to lock a different record C that happens to reside in the same block X in a
conflicting lock mode, it is forced to wait. If the data item size was a single record,
transaction S would be able to proceed, because it would be locking a different data
item (record).
On the other hand, the smaller the data item size is, the more the number of items
in the database. Because every item is associated with a lock, the system will have a
larger number of active locks to be handled by the lock manager. More lock and
unlock operations will be performed, causing a higher overhead. In addition, more
storage space will be required for the lock table. For timestamps, storage is required
for the read_TS and write_TS for each data item, and there will be similar overhead
for handling a large number of items.
Given the above tradeoffs, an obvious question can be asked: What is the best item
size? The answer is that it depends on the types of transactions involved. If a typical
transaction accesses a small number of records, it is advantageous to have the data
item granularity be one record. On the other hand, if a transaction typically accesses
many records in the same file, it may be better to have block or file granularity so
that the transaction will consider all those records as one (or a few) data items.
5.2 Multiple Granularity Level Locking
Since the best granularity size depends on the given transaction, it seems appropri-
ate that a database system should support multiple levels of granularity, where the
granularity level can be different for various mixes of transactions. Figure 7 shows a
simple granularity hierarchy with a database containing two files, each file contain-
ing several disk pages, and each page containing several records. This can be used to
illustrate a multiple granularity level 2PL protocol, where a lock can be requested
at any level. However, additional types of locks will be needed to support such a pro-
tocol efficiently.
Consider the following scenario, with only shared and exclusive lock types, that refers
to the example in Figure 7. Suppose transaction T1 wants to update all the records in
file f1, and T1 requests and is granted an exclusive lock for f1. Then all of f1’s pages (p11
through p1n)—and the records contained on those pages—are locked in exclusive
mode. This is beneficial for T1 because setting a single file-level lock is more efficient
than setting n page-level locks or having to lock each individual record. Now suppose
another transaction T2 only wants to read record r1nj from page p1n of file f1; then T2
would request a shared record-level lock on r1nj. However, the database system (that
is, the transaction manager or more specifically the lock manager) must verify the
compatibility of the requested lock with already held locks. One way to verify this is
to traverse the tree from the leaf r1nj to p1n to f1 to db. If at any time a conflicting lock
is held on any of those items, then the lock request for r1nj is denied and T2 is blocked
and must wait. This traversal would be fairly efficient.
However, what if transaction T2’s request came before transaction T1’s request? In
this case, the shared record lock is granted to T2 for r1nj, but when T1’s file-level lock
is requested, it is quite difficult for the lock manager to check all nodes (pages and
records) that are descendants of node f1 for a lock conflict. This would be very inef-
ficient and would defeat the purpose of having multiple granularity level locks.
To make multiple granularity level locking practical, additional types of locks, called
intention locks, are needed. The idea behind intention locks is for a transaction to
indicate, along the path from the root to the desired node, what type of lock (shared
or exclusive) it will require from one of the node’s descendants. There are three
types of intention locks:
1. Intention-shared (IS) indicates that one or more shared locks will be
requested on some descendant node(s).
2. Intention-exclusive (IX) indicates that one or more exclusive locks will be
requested on some descendant node(s).
3. Shared-intention-exclusive (SIX) indicates that the current node is locked in
shared mode but that one or more exclusive locks will be requested on some
descendant node(s).
The compatibility table of the three intention locks, and the shared and exclusive
locks, is shown in Figure 8. Besides the introduction of the three types of intention
locks, an appropriate locking protocol must be used. The multiple granularity
locking (MGL) protocol consists of the following rules:
1. The lock compatibility (based on Figure 8) must be adhered to.
2. The root of the tree must be locked first, in any mode.
3. A node N can be locked by a transaction T in S or IS mode only if the parent
node N is already locked by transaction T in either IS or IX mode.
4. A node N can be locked by a transaction T in X, IX, or SIX mode only if the
parent of node N is already locked by transaction T in either IX or SIX mode.
5. A transaction T can lock a node only if it has not unlocked any node (to
enforce the 2PL protocol).
6. A transaction T can unlock a node, N, only if none of the children of node N
are currently locked by T.

Figure 8
Lock compatibility matrix for multiple granularity locking.

          IS     IX     S      SIX    X
   IS     Yes    Yes    Yes    Yes    No
   IX     Yes    Yes    No     No     No
   S      Yes    No     Yes    No     No
   SIX    Yes    No     No     No     No
   X      No     No     No     No     No
Rule 1 simply states that conflicting locks cannot be granted. Rules 2, 3, and 4 state
the conditions when a transaction may lock a given node in any of the lock modes.
Rules 5 and 6 of the MGL protocol enforce 2PL rules to produce serializable sched-
ules. To illustrate the MGL protocol with the database hierarchy in Figure 7, con-
sider the following three transactions:
1. T1 wants to update record r111 and record r211.
2. T2 wants to update all records on page p12.
3. T3 wants to read record r11j and the entire f2 file.
Figure 9 shows a possible serializable schedule for these three transactions. Only the
lock and unlock operations are shown. The notation <lock_type>(<item>), for example
IX(db) or X(r111), is used to display the locking operations in the schedule.
The multiple granularity level protocol is especially suited when processing a mix of
transactions that include (1) short transactions that access only a few items (records
or fields) and (2) long transactions that access entire files. In this environment, less
transaction blocking and less locking overhead is incurred by such a protocol when
compared to a single level granularity locking approach.
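The compatibility matrix of Figure 8 together with rules 1 to 4 can be captured in a short routine. In the sketch below, each node is assumed to carry a parent attribute (None for the root db), and held maps (node, transaction) pairs to the lock mode currently held; all names are illustrative, not taken from a particular system.

# Compatibility matrix of Figure 8 (rule 1): COMPAT[requested][held].
COMPAT = {
    'IS':  {'IS': True,  'IX': True,  'S': True,  'SIX': True,  'X': False},
    'IX':  {'IS': True,  'IX': True,  'S': False, 'SIX': False, 'X': False},
    'S':   {'IS': True,  'IX': False, 'S': True,  'SIX': False, 'X': False},
    'SIX': {'IS': True,  'IX': False, 'S': False, 'SIX': False, 'X': False},
    'X':   {'IS': False, 'IX': False, 'S': False, 'SIX': False, 'X': False},
}

def can_lock(node, mode, tid, held):
    # Check rules 1, 3, and 4 of the MGL protocol for transaction tid requesting
    # lock mode `mode` on `node`. `held` maps (node, transaction) -> held mode.
    # Rule 1: the request must be compatible with locks held by other transactions.
    for (n, t), m in held.items():
        if n is node and t != tid and not COMPAT[mode][m]:
            return False
    if node.parent is None:          # rule 2: the root may be locked in any mode
        return True
    parent_mode = held.get((node.parent, tid))
    if mode in ('S', 'IS'):          # rule 3: parent must be held in IS or IX
        return parent_mode in ('IS', 'IX')
    else:                            # rule 4: X, IX, or SIX needs parent IX or SIX
        return parent_mode in ('IX', 'SIX')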
6 Using Locks for Concurrency
Control in Indexes
Two-phase locking can also be applied to indexes, where the nodes of an index cor-
respond to disk pages. However, holding locks on index pages until the shrinking
phase of 2PL could cause an undue amount of transaction blocking because search-
ing an index always starts at the root. Therefore, if a transaction wants to insert a
record (write operation), the root would be locked in exclusive mode, so all other
conflicting lock requests for the index must wait until the transaction enters its
shrinking phase. This blocks all other transactions from accessing the index, so in
practice other approaches to locking an index must be used.
Figure 9
Lock operations to illustrate a serializable schedule. The lock and unlock operations
issued by each of the three transactions are listed below; in the schedule they are
interleaved over time.

T1: IX(db); IX(f1); IX(p11); X(r111); IX(f2); IX(p21); X(r211);
    unlock(r211); unlock(p21); unlock(f2); unlock(r111); unlock(p11);
    unlock(f1); unlock(db)

T2: IX(db); IX(f1); X(p12); unlock(p12); unlock(f1); unlock(db)

T3: IS(db); IS(f1); IS(p11); S(r11j); S(f2); unlock(r11j); unlock(p11);
    unlock(f1); unlock(f2); unlock(db)
The tree structure of the index can be taken advantage of when developing a con-
currency control scheme. For example, when an index search (read operation) is
being executed, a path in the tree is traversed from the root to a leaf. Once a lower-
level node in the path has been accessed, the higher-level nodes in that path will not
be used again. So once a read lock on a child node is obtained, the lock on the par-
ent can be released. When an insertion is being applied to a leaf node (that is, when
a key and a pointer are inserted), then a specific leaf node must be locked in exclu-
sive mode. However, if that node is not full, the insertion will not cause changes to
higher-level index nodes, which implies that they need not be locked exclusively.
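The root-to-leaf traversal with early release of ancestor locks (sometimes called lock coupling, or crabbing) can be outlined as follows. The node interface (is_leaf, child_for) and the lock-manager calls are assumptions made for illustration, reusing the shared/exclusive lock manager sketched in Section 1.1.

def index_search(root, key, locks, tid):
    # Lock-coupling search sketch: hold a shared lock on a node only until the
    # child on the search path has been locked, then release the parent.
    locks.read_lock(root, tid)
    node = root
    while not node.is_leaf:                  # assumed node interface
        child = node.child_for(key)          # child whose subtree may contain key
        locks.read_lock(child, tid)
        locks.unlock(node, tid)              # the parent is no longer needed
        node = child
    return node                              # the leaf is returned still read-locked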
A conservative approach for insertions would be to lock the root node in exclusive
mode and then to access the appropriate child node of the root. If the child node is
not full, then the lock on the root node can be released. This approach can be
applied all the way down the tree to the leaf, which is typically three or four levels
from the root. Although exclusive locks are held, they are soon released. An alterna-
tive, more optimistic approach would be to request and hold shared locks on the
nodes leading to the leaf node, with an exclusive lock on the leaf. If the insertion
causes the leaf to split, insertion will propagate to one or more higher-level nodes.
Then, the locks on the higher-level nodes can be upgraded to exclusive mode.
Another approach to index locking is to use a variant of the B+-tree, called the B-
link tree. In a B-link tree, sibling nodes on the same level are linked at every level.
This allows shared locks to be used when requesting a page and requires that the
lock be released before accessing the child node. For an insert operation, the shared
lock on a node would be upgraded to exclusive mode. If a split occurs, the parent
node must be relocked in exclusive mode. One complication is for search operations
executed concurrently with the update. Suppose that a concurrent update operation
follows the same path as the search, and inserts a new entry into the leaf node.
Additionally, suppose that the insert causes that leaf node to split. When the insert is
done, the search process resumes, following the pointer to the desired leaf, only to
find that the key it is looking for is not present because the split has moved that key
into a new leaf node, which would be the right sibling of the original leaf node.
However, the search process can still succeed if it follows the pointer (link) in the
original leaf node to its right sibling, where the desired key has been moved.
Handling the deletion case, where two or more nodes from the index tree merge, is
also part of the B-link tree concurrency protocol. In this case, locks on the nodes to
be merged are held as well as a lock on the parent of the two nodes to be merged.
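A minimal sketch of the search-side behavior follows, assuming a simplified leaf structure with a high key and a right-sibling pointer; the dictionary layout and names are chosen here for illustration only.

def blink_search(leaf, key):
    """'leaf' is a dict {'keys': [...], 'high_key': ..., 'right': ...}.
    If the key is absent but exceeds the node's high key, the key may have
    been moved to a right sibling by a concurrent split."""
    node = leaf
    while node is not None:
        if key in node['keys']:
            return node
        if node['high_key'] is not None and key > node['high_key']:
            node = node['right']          # follow the link to the right sibling
        else:
            return None                   # key genuinely absent
    return None

# A concurrent split has moved key 45 into a new right sibling:
old_leaf = {'keys': [30, 40], 'high_key': 40,
            'right': {'keys': [45, 50], 'high_key': None, 'right': None}}
print(blink_search(old_leaf, 45)['keys'])   # [45, 50]

The search that misses in the original leaf simply follows the right link and finds the key that the split moved to the new sibling.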
7 Other Concurrency Control Issues
In this section we discuss some other issues relevant to concurrency control. In
Section 7.1, we discuss problems associated with insertion and deletion of records
and the so-called phantom problem, which may occur when records are inserted. In
Section 7.2 we discuss problems that may occur when a transaction outputs some
data to a monitor before it commits, and then the transaction is later aborted.
7.1 Insertion, Deletion, and Phantom Records
When a new data item is inserted in the database, it obviously cannot be accessed
until after the item is created and the insert operation is completed. In a locking
environment, a lock for the item can be created and set to exclusive (write) mode;
the lock can be released at the same time as other write locks would be released,
based on the concurrency control protocol being used. For a timestamp-based pro-
tocol, the read and write timestamps of the new item are set to the timestamp of the
creating transaction.
Next, consider a deletion operation that is applied on an existing data item. For
locking protocols, again an exclusive (write) lock must be obtained before the trans-
action can delete the item. For timestamp ordering, the protocol must ensure that no
later transaction has read or written the item before allowing the item to be deleted.
A situation known as the phantom problem can occur when a new record that is
being inserted by some transaction T satisfies a condition that a set of records
accessed by another transaction T′ must satisfy. For example, suppose that transac-
tion T is inserting a new EMPLOYEE record whose Dno = 5, while transaction T′ is
accessing all EMPLOYEE records whose Dno = 5 (say, to add up all their Salary values
to calculate the personnel budget for department 5). If the equivalent serial order is
T followed by T′, then T′ must read the new EMPLOYEE record and include its Salary
in the sum calculation. For the equivalent serial order T′ followed by T, the new
salary should not be included. Notice that although the transactions logically con-
flict, in the latter case there is really no record (data item) in common between the
two transactions, since T′ may have locked all the records with Dno = 5 before T
inserted the new record. This is because the record that causes the conflict is a
phantom record that has suddenly appeared in the database upon being inserted. If
other operations in the two transactions conflict, the conflict due to the phantom
record may not be recognized by the concurrency control protocol.
One solution to the phantom record problem is to use index locking, as discussed
in Section 6. Recall that an index includes entries that have an attribute value, plus a
set of pointers to all records in the file with that value. For example, an index on Dno
of EMPLOYEE would include an entry for each distinct Dno value, plus a set of
pointers to all EMPLOYEE records with that value. If the index entry is locked before
the record itself can be accessed, then the conflict on the phantom record can be
detected, because transaction T′ would request a read lock on the index entry for
Dno = 5, and T would request a write lock on the same entry before they could place
the locks on the actual records. Since the index locks conflict, the phantom conflict
would be detected.
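The following sketch illustrates the idea with a toy lock table keyed on index entries; the function name, the lock table layout, and the lock modes used are assumptions for the example, not a complete lock manager.

index_entry_locks = {}    # maps an index entry -> (mode, set of holding transactions)

def request_lock(entry, mode, txn):
    """Grant the lock if compatible; otherwise report a conflict (requester waits)."""
    held = index_entry_locks.get(entry)
    if held is None:
        index_entry_locks[entry] = (mode, {txn})
        return True
    held_mode, holders = held
    if mode == 'S' and held_mode == 'S':
        holders.add(txn)
        return True
    return False              # S/X or X/any conflict detected

# T' reads all EMPLOYEE records with Dno = 5 (S lock on the index entry);
# T then tries to insert a new record with Dno = 5 (X lock on the same entry).
print(request_lock(('EMPLOYEE.Dno', 5), 'S', "T'"))   # True
print(request_lock(('EMPLOYEE.Dno', 5), 'X', 'T'))    # False: conflict detected

Because T and T′ now contend for the same index entry, the phantom conflict surfaces as an ordinary lock conflict before either transaction reaches the actual records.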
A more general technique, called predicate locking, would lock access to all records
that satisfy an arbitrary predicate (condition) in a similar manner; however, predi-
cate locks have proved to be difficult to implement efficiently.
7.2 Interactive Transactions
Another problem occurs when interactive transactions read input and write output
to an interactive device, such as a monitor screen, before they are committed. The
problem is that a user can input a value of a data item to a transaction T that is
based on some value written to the screen by transaction T′, which may not have
committed. This dependency between T and T′ cannot be modeled by the system
concurrency control method, since it is only based on the user interacting with the
two transactions.
An approach to dealing with this problem is to postpone output of transactions to
the screen until they have committed.
7.3 Latches
Locks held for a short duration are typically called latches. Latches do not follow the
usual concurrency control protocol such as two-phase locking. For example, a latch
can be used to guarantee the physical integrity of a page when that page is being
written from the buffer to disk. A latch would be acquired for the page, the page
written to disk, and then the latch released.
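A minimal sketch of this use of latches follows, assuming Python's threading.Lock as the latch and simple dictionaries standing in for the buffer pool and the disk; these names are illustrative only.

import threading

page_latches = {}          # page_id -> threading.Lock acting as a short-duration latch

def flush_page(page_id, buffer_pool, disk):
    latch = page_latches.setdefault(page_id, threading.Lock())
    with latch:                                   # held only for the duration of the write
        disk[page_id] = bytes(buffer_pool[page_id])

buffer_pool = {7: bytearray(b'updated page contents')}
disk = {}
flush_page(7, buffer_pool, disk)
print(disk[7])    # b'updated page contents'; the latch is already released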
8 Summary
In this chapter we discussed DBMS techniques for concurrency control. We started
by discussing lock-based protocols, which are by far the most commonly used in
practice. We described the two-phase locking (2PL) protocol and a number of its
variations: basic 2PL, strict 2PL, conservative 2PL, and rigorous 2PL. The strict and
rigorous variations are more common because of their better recoverability proper-
ties. We introduced the concepts of shared (read) and exclusive (write) locks, and
showed how locking can guarantee serializability when used in conjunction with
the two-phase locking rule. We also presented various techniques for dealing with
the deadlock problem, which can occur with locking. In practice, it is common to
use timeouts and deadlock detection (wait-for graphs).
We presented other concurrency control protocols that are not used often in prac-
tice but are important for the theoretical alternatives they show for solving this
problem. These include the timestamp ordering protocol, which ensures serializ-
ability based on the order of transaction timestamps. Timestamps are unique,
system-generated transaction identifiers. We discussed Thomas’s write rule, which
improves performance but does not guarantee conflict serializability. The strict
timestamp ordering protocol was also presented. We discussed two multiversion
protocols, which assume that older versions of data items can be kept in the data-
base. One technique, called multiversion two-phase locking (which has been used in
practice), assumes that two versions can exist for an item and attempts to increase
concurrency by making write and read locks compatible (at the cost of introducing
an additional certify lock mode). We also presented a multiversion protocol based
on timestamp ordering, and an example of an optimistic protocol, which is also
known as a certification or validation protocol.
Then we turned our attention to the important practical issue of data item granu-
larity. We described a multigranularity locking protocol that allows the change of
granularity (item size) based on the current transaction mix, with the goal of
improving the performance of concurrency control. An important practical issue
was then presented, which is to develop locking protocols for indexes so that indexes
do not become a hindrance to concurrent access. Finally, we introduced the phan-
tom problem and problems with interactive transactions, and briefly described the
concept of latches and how it differs from locks.
Review Questions
1. What is the two-phase locking protocol? How does it guarantee serializabil-
ity?
2. What are some variations of the two-phase locking protocol? Why is strict or
rigorous two-phase locking often preferred?
3. Discuss the problems of deadlock and starvation, and the different
approaches to dealing with these problems.
4. Compare binary locks to exclusive/shared locks. Why is the latter type of
lock preferable?
5. Describe the wait-die and wound-wait protocols for deadlock prevention.
6. Describe the cautious waiting, no waiting, and timeout protocols for dead-
lock prevention.
7. What is a timestamp? How does the system generate timestamps?
8. Discuss the timestamp ordering protocol for concurrency control. How does
strict timestamp ordering differ from basic timestamp ordering?
9. Discuss two multiversion techniques for concurrency control.
10. What is a certify lock? What are the advantages and disadvantages of using
certify locks?
11. How do optimistic concurrency control techniques differ from other con-
currency control techniques? Why are they also called validation or certifica-
tion techniques? Discuss the typical phases of an optimistic concurrency
control method.
12. How does the granularity of data items affect the performance of concur-
rency control? What factors affect selection of granularity size for data items?
13. What type of lock is needed for insert and delete operations?
14. What is multiple granularity locking? Under what circumstances is it used?
15. What are intention locks?
16. When are latches used?
17. What is a phantom record? Discuss the problem that a phantom record can
cause for concurrency control.
18. How does index locking resolve the phantom problem?
19. What is a predicate lock?
Exercises
20. Prove that the basic two-phase locking protocol guarantees conflict serializ-
ability of schedules. (Hint: Show that if a serializability graph for a schedule
has a cycle, then at least one of the transactions participating in the schedule
does not obey the two-phase locking protocol.)
21. Modify the data structures for multiple-mode locks and the algorithms for
read_lock(X), write_lock(X), and unlock(X) so that upgrading and downgrad-
ing of locks are possible. (Hint: The lock needs to check the transaction id(s)
that hold the lock, if any.)
22. Prove that strict two-phase locking guarantees strict schedules.
23. Prove that the wait-die and wound-wait protocols avoid deadlock and star-
vation.
24. Prove that cautious waiting avoids deadlock.
25. Apply the timestamp ordering algorithm to the schedules in Figure A.1(b)
and (c) at the end of this chapter, and determine whether the algorithm will
allow the execution of the schedules.
26. Repeat Exercise 25, but use the multiversion timestamp ordering method.
27. Why is two-phase locking not used as a concurrency control method for
indexes such as B+-trees?
28. The compatibility matrix in Figure 8 shows that IS and IX locks are compat-
ible. Explain why this is valid.
29. The MGL protocol states that a transaction T can unlock a node N, only if
none of the children of node N are still locked by transaction T. Show that
without this condition, the MGL protocol would be incorrect.
Selected Bibliography
The two-phase locking protocol and the concept of predicate locks were first pro-
posed by Eswaran et al. (1976). Bernstein et al. (1987), Gray and Reuter (1993), and
Papadimitriou (1986) focus on concurrency control and recovery. Kumar (1996)
focuses on performance of concurrency control methods. Locking is discussed in
Gray et al. (1975), Lien and Weinberger (1978), Kedem and Silbershatz (1980), and
Korth (1983). Deadlocks and wait-for graphs were formalized by Holt (1972), and
the wait-die and wound-wait schemes are presented in Rosenkrantz et al. (1978).
Cautious waiting is discussed in Hsu and Zhang (1992). Helal et al. (1993) com-
pares various locking approaches. Timestamp-based concurrency control tech-
niques are discussed in Bernstein and Goodman (1980) and Reed (1983).
Optimistic concurrency control is discussed in Kung and Robinson (1981) and
Bassiouni (1988). Papadimitriou and Kanellakis (1979) and Bernstein and
Goodman (1983) discuss multiversion techniques. Multiversion timestamp order-
ing was proposed in Reed (1979, 1983), and multiversion two-phase locking is dis-
cussed in Lai and Wilkinson (1984). A method for multiple locking granularities
was proposed in Gray et al. (1975), and the effects of locking granularities are ana-
lyzed in Ries and Stonebraker (1977). Bhargava and Riedl (1988) present an
approach for dynamically choosing among various concurrency control and recov-
ery methods. Concurrency control methods for indexes are presented in Lehman
and Yao (1981) and in Shasha and Goodman (1988). A performance study of vari-
ous B+-tree concurrency control algorithms is presented in Srinivasan and Carey
(1991).
Other work on concurrency control includes semantic-based concurrency control
(Badrinath and Ramamritham, 1992), transaction models for long-running activi-
ties (Dayal et al., 1991), and multilevel transaction management (Hasse and
Weikum, 1991).
Figure A.1
Example of serializability testing. (a) The read and write operations of three transactions: T1 issues read_item(X); write_item(X); read_item(Y); write_item(Y). T2 issues read_item(Z); read_item(Y); write_item(Y); read_item(X); write_item(X). T3 issues read_item(Y); read_item(Z); write_item(Y); write_item(Z). (b) Schedule E. (c) Schedule F. (The particular interleavings that constitute schedules E and F are shown in the figure.)
Database Recovery
Techniques
In this chapter we discuss some of the techniques that can be used for database recovery from failures.
This chapter presents concepts that are relevant to recovery protocols, and provides
an overview of the various database recovery algorithms. We start in Section 1 with
an outline of a typical recovery procedure and a categorization of recovery algo-
rithms, and then we discuss several recovery concepts, including write-ahead log-
ging, in-place versus shadow updates, and the process of rolling back (undoing) the
effect of an incomplete or failed transaction. In Section 2 we present recovery tech-
niques based on deferred update, also known as the NO-UNDO/REDO technique,
where the data on disk is not updated until after a transaction commits. In Section 3
we discuss recovery techniques based on immediate update, where data can be
updated on disk during transaction execution; these include the UNDO/REDO and
UNDO/NO-REDO algorithms. We discuss the technique known as shadowing or
shadow paging, which can be categorized as a NO-UNDO/NO-REDO algorithm in
Section 4. An example of a practical DBMS recovery scheme, called ARIES, is pre-
sented in Section 5. Recovery in multidatabases is briefly discussed in Section 6.
Finally, techniques for recovery from catastrophic failure are discussed in Section 7.
Section 8 summarizes the chapter.
Our emphasis is on conceptually describing several different approaches to recov-
ery. For descriptions of recovery features in specific systems, the reader should con-
sult the bibliographic notes at the end of the chapter and the online and printed
user manuals for those systems. Recovery techniques are often intertwined with the
concurrency control mechanisms. Certain recovery techniques are best used with
specific concurrency control methods. We will discuss recovery concepts indepen-
dently of concurrency control mechanisms, but we will discuss the circumstances
under which a particular recovery mechanism is best used with a certain concur-
rency control protocol.
1 Recovery Concepts
1.1 Recovery Outline and Categorization
of Recovery Algorithms
Recovery from transaction failures usually means that the database is restored to the
most recent consistent state just before the time of failure. To do this, the system
must keep information about the changes that were applied to data items by the
various transactions. This information is typically kept in the system log. A typical
strategy for recovery may be summarized informally as follows:
1. If there is extensive damage to a wide portion of the database due to cata-
strophic failure, such as a disk crash, the recovery method restores a past
copy of the database that was backed up to archival storage (typically tape or
other large capacity offline storage media) and reconstructs a more current
state by reapplying or redoing the operations of committed transactions
from the backed up log, up to the time of failure.
2. When the database on disk is not physically damaged, and a noncatastrophic
failure has occurred, the recovery strategy is to identify any changes that may
cause an inconsistency in the database. For example, a transaction that has
updated some database items on disk but has not been committed needs to
have its changes reversed by undoing its write operations. It may also be nec-
essary to redo some operations in order to restore a consistent state of the
database; for example, if a transaction has committed but some of its write
operations have not yet been written to disk. For noncatastrophic failure, the
recovery protocol does not need a complete archival copy of the database.
Rather, the entries kept in the online system log on disk are analyzed to
determine the appropriate actions for recovery.
Conceptually, we can distinguish two main techniques for recovery from noncata-
strophic transaction failures: deferred update and immediate update. The deferred
update techniques do not physically update the database on disk until after a trans-
action reaches its commit point; then the updates are recorded in the database.
Before reaching commit, all transaction updates are recorded in the local transac-
tion workspace or in the main memory buffers that the DBMS maintains (the
DBMS main memory cache). Before commit, the updates are recorded persistently
in the log, and then after commit, the updates are written to the database on disk.
If a transaction fails before reaching its commit point, it will not have changed the
database in any way, so UNDO is not needed. It may be necessary to REDO the
effect of the operations of a committed transaction from the log, because their
effect may not yet have been recorded in the database on disk. Hence, deferred
update is also known as the NO-UNDO/REDO algorithm. We discuss this tech-
nique in Section 2.
In the immediate update techniques, the database may be updated by some opera-
tions of a transaction before the transaction reaches its commit point. However,
these operations must also be recorded in the log on disk by force-writing before they
are applied to the database on disk, making recovery still possible. If a transaction
fails after recording some changes in the database on disk but before reaching its
commit point, the effect of its operations on the database must be undone; that is,
the transaction must be rolled back. In the general case of immediate update, both
undo and redo may be required during recovery. This technique, known as the
UNDO/REDO algorithm, requires both operations during recovery, and is used
most often in practice. A variation of the algorithm where all updates are required
to be recorded in the database on disk before a transaction commits requires undo
only, so it is known as the UNDO/NO-REDO algorithm. We discuss these techniques
in Section 3.
The UNDO and REDO operations are required to be idempotent—that is, executing
an operation multiple times is equivalent to executing it just once. In fact, the whole
recovery process should be idempotent because if the system were to fail during the
recovery process, the next recovery attempt might UNDO and REDO certain
write_item operations that had already been executed during the first recovery
process. The result of recovery from a system crash during recovery should be the
same as the result of recovering when there is no crash during recovery!
1.2 Caching (Buffering) of Disk Blocks
The recovery process is often closely intertwined with operating system functions—
in particular, the buffering of database disk pages in the DBMS main memory
cache. Typically, multiple disk pages that include the data items to be updated are
cached into main memory buffers and then updated in memory before being writ-
ten back to disk. The caching of disk pages is traditionally an operating system func-
tion, but because of its importance to the efficiency of recovery procedures, it is
handled by the DBMS by calling low-level operating system routines.
In general, it is convenient to consider recovery in terms of the database disk pages
(blocks). Typically a collection of in-memory buffers, called the DBMS cache, is
kept under the control of the DBMS for the purpose of holding copies of the database pages currently in use. A
directory for the cache is used to keep track of which database items are in the
buffers.1 This can be a table of entries.
When the DBMS requests action on some item, first it checks the cache directory to
determine whether the disk page containing the item is in the DBMS cache. If it is
1This is somewhat similar to the concept of page tables used by the operating system.
not, the item must be located on disk, and the appropriate disk pages are copied into
the cache. It may be necessary to replace (or flush) some of the cache buffers to
make space available for the new item. Some page replacement strategy similar to
those used in operating systems, such as least recently used (LRU) or first-in-first-
out (FIFO), or a new strategy that is DBMS-specific can be used to select the buffers
for replacement, such as DBMIN or Least-Likely-to-Use (see bibliographic notes).
The entries in the DBMS cache directory hold additional information relevant to
buffer management. Associated with each buffer in the cache is a dirty bit, which
can be included in the directory entry, to indicate whether or not the buffer has
been modified. When a page is first read from the database disk into a cache buffer,
a new entry is inserted in the cache directory with the new disk page address, and
the dirty bit is set to 0 (zero). As soon as the buffer is modified, the dirty bit for the
corresponding directory entry is set to 1 (one). Additional information, such as the
transaction id(s) of the transaction(s) that modified the buffer can also be kept in
the directory. When the buffer contents are replaced (flushed) from the cache, the
contents must first be written back to the corresponding disk page only if its dirty bit
is 1. Another bit, called the pin-unpin bit, is also needed—a page in the cache is
pinned (bit value 1 (one)) if it cannot be written back to disk as yet. For example,
the recovery protocol may restrict certain buffer pages from being written back to
the disk until the transactions that changed this buffer have committed.
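As an illustration, the following sketch represents a cache directory entry as a small dictionary carrying the dirty and pin-unpin bits just described; the field names and layout are assumptions made for the example, not any specific DBMS format.

cache_directory = {}     # disk page address -> directory entry

def read_page_into_cache(page_addr, contents):
    cache_directory[page_addr] = {'contents': bytearray(contents),
                                  'dirty': 0, 'pinned': 0, 'txn_ids': set()}

def modify_page(page_addr, new_contents, txn_id):
    entry = cache_directory[page_addr]
    entry['contents'][:] = new_contents
    entry['dirty'] = 1                    # buffer no longer matches the disk copy
    entry['txn_ids'].add(txn_id)          # remember who modified it

def flush_if_allowed(page_addr, disk):
    entry = cache_directory[page_addr]
    if entry['pinned']:
        return False                      # pinned pages cannot be written back yet
    if entry['dirty']:
        disk[page_addr] = bytes(entry['contents'])
    del cache_directory[page_addr]        # buffer is now free for replacement
    return True

disk = {}
read_page_into_cache(12, b'old value')
modify_page(12, b'new value', 'T1')
print(flush_if_allowed(12, disk), disk[12])   # True b'new value'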
Two main strategies can be employed when flushing a modified buffer back to disk.
The first strategy, known as in-place updating, writes the buffer to the same original
disk location, thus overwriting the old value of any changed data items on disk.2
Hence, a single copy of each database disk block is maintained. The second strategy,
known as shadowing, writes an updated buffer at a different disk location, so mul-
tiple versions of data items can be maintained, but this approach is not typically
used in practice.
In general, the old value of the data item before updating is called the before image
(BFIM), and the new value after updating is called the after image (AFIM). If shad-
owing is used, both the BFIM and the AFIM can be kept on disk; hence, it is not
strictly necessary to maintain a log for recovery. We briefly discuss recovery based
on shadowing in Section 4.
1.3 Write-Ahead Logging, Steal/No-Steal,
and Force/No-Force
When in-place updating is used, it is necessary to use a log for recovery. In this case,
the recovery mechanism must ensure that the BFIM of the data item is recorded in
the appropriate log entry and that the log entry is flushed to disk before the BFIM is
overwritten with the AFIM in the database on disk. This process is generally known
as write-ahead logging, and is necessary to be able to UNDO the operation if this is
required during recovery. Before we can describe a protocol for write-ahead
2In-place updating is used in most systems in practice.
logging, we need to distinguish between two types of log entry information
included for a write command: the information needed for UNDO and the informa-
tion needed for REDO. A REDO-type log entry includes the new value (AFIM) of
the item written by the operation since this is needed to redo the effect of the oper-
ation from the log (by setting the item value in the database on disk to its AFIM).
The UNDO-type log entries include the old value (BFIM) of the item since this is
needed to undo the effect of the operation from the log (by setting the item value in
the database back to its BFIM). In an UNDO/REDO algorithm, both types of log
entries are combined. Additionally, when cascading rollback is possible, read_item
entries in the log are considered to be UNDO-type entries (see Section 1.5).
As mentioned, the DBMS cache holds the cached database disk blocks in main
memory buffers, which include not only data blocks, but also index blocks and log
blocks from the disk. When a log record is written, it is stored in the current log
buffer in the DBMS cache. The log is simply a sequential (append-only) disk file,
and the DBMS cache may contain several log blocks in main memory buffers (typi-
cally, the last n log blocks of the log file). When an update to a data block—stored in
the DBMS cache—is made, an associated log record is written to the last log buffer
in the DBMS cache. With the write-ahead logging approach, the log buffers (blocks)
that contain the associated log records for a particular data block update must first
be written to disk before the data block itself can be written back to disk from its
main memory buffer.
Standard DBMS recovery terminology includes the terms steal/no-steal and
force/no-force, which specify the rules that govern when a page from the database
can be written to disk from the cache:
1. If a cache buffer page updated by a transaction cannot be written to disk
before the transaction commits, the recovery method is called a no-steal
approach. The pin-unpin bit will be used to indicate if a page cannot be
written back to disk. On the other hand, if the recovery protocol allows writ-
ing an updated buffer before the transaction commits, it is called steal. Steal
is used when the DBMS cache (buffer) manager needs a buffer frame for
another transaction and the buffer manager replaces an existing page that
had been updated but whose transaction has not committed. The no-steal
rule means that UNDO will never be needed during recovery, since no transaction
will have any of its updates recorded on disk before it commits.
2. If all pages updated by a transaction are immediately written to disk before
the transaction commits, it is called a force approach. Otherwise, it is called
no-force. The force rule means that REDO will never be needed during recov-
ery, since any committed transaction will have all its updates on disk before
it is committed.
The deferred update (NO-UNDO) recovery scheme discussed in Section 2 follows a
no-steal approach. However, typical database systems employ a steal/no-force strat-
egy. The advantage of steal is that it avoids the need for a very large buffer space to
store all updated pages in memory. The advantage of no-force is that an updated
page of a committed transaction may still be in the buffer when another transaction
needs to update it, thus eliminating the I/O cost to write that page multiple times to
disk, and possibly to have to read it again from disk. This may provide a substantial
saving in the number of disk I/O operations when a specific page is updated heavily
by multiple transactions.
To permit recovery when in-place updating is used, the appropriate entries required
for recovery must be permanently recorded in the log on disk before changes are
applied to the database. For example, consider the following write-ahead logging
(WAL) protocol for a recovery algorithm that requires both UNDO and REDO:
1. The before image of an item cannot be overwritten by its after image in the
database on disk until all UNDO-type log records for the updating transac-
tion—up to this point—have been force-written to disk.
2. The commit operation of a transaction cannot be completed until all the
REDO-type and UNDO-type log records for that transaction have been force-
written to disk.
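A minimal sketch of how these two rules could be enforced follows, assuming an in-memory log buffer and a Python list standing in for the log file on disk; all names and the log record layout are illustrative assumptions.

log_buffer, log_disk = [], []

def append_log(record):
    log_buffer.append(record)            # the record is in main memory only

def force_log():
    log_disk.extend(log_buffer)          # force-write buffered records to disk
    log_buffer.clear()

def write_item_to_disk(item, afim, db_disk):
    # Rule 1: the UNDO-type entry holding the BFIM (and any other log records
    # written so far) must reach the log on disk before the BFIM is overwritten.
    force_log()
    db_disk[item] = afim

def commit(txn_id):
    # Rule 2: the commit cannot complete until all of the transaction's
    # REDO-type and UNDO-type records have been force-written.
    append_log(('commit', txn_id))
    force_log()

db_disk = {'X': 30}
append_log(('write_item', 'T1', 'X', 30, 35))    # BFIM = 30, AFIM = 35
write_item_to_disk('X', 35, db_disk)
commit('T1')
print(db_disk['X'], log_disk[-1])                # 35 ('commit', 'T1')

Rule 1 is satisfied because force_log runs before the after image reaches the database on disk; rule 2 is satisfied because the commit record itself is force-written before the commit completes.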
To facilitate the recovery process, the DBMS recovery subsystem may need to main-
tain a number of lists related to the transactions being processed in the system.
These include a list for active transactions that have started but not committed as
yet, and it may also include lists of all committed and aborted transactions since
the last checkpoint (see the next section). Maintaining these lists makes the recovery
process more efficient.
1.4 Checkpoints in the System Log
and Fuzzy Checkpointing
Another type of entry in the log is called a checkpoint.3 A [checkpoint, list of active
transactions] record is written into the log periodically, at the point when the system
writes out to the database on disk all DBMS buffers that have been modified. As a
consequence of this, all transactions that have their [commit, T ] entries in the log
before a [checkpoint] entry do not need to have their WRITE operations redone in case
of a system crash, since all their updates will be recorded in the database on disk
during checkpointing. As part of checkpointing, the list of transaction ids for active
transactions at the time of the checkpoint is included in the checkpoint record, so
that these transactions can be easily identified during recovery.
The recovery manager of a DBMS must decide at what intervals to take a check-
point. The interval may be measured in time—say, every m minutes—or in the
number t of committed transactions since the last checkpoint, where the values of m
or t are system parameters. Taking a checkpoint consists of the following actions:
1. Suspend execution of transactions temporarily.
2. Force-write all main memory buffers that have been modified to disk.
3The term checkpoint has been used to describe more restrictive situations in some systems, such as
DB2. It has also been used in the literature to describe entirely different concepts.
3. Write a [checkpoint] record to the log, and force-write the log to disk.
4. Resume executing transactions.
As a consequence of step 2, a checkpoint record in the log may also include addi-
tional information, such as a list of active transaction ids, and the locations
(addresses) of the first and most recent (last) records in the log for each active trans-
action. This can facilitate undoing transaction operations in the event that a trans-
action must be rolled back.
The time needed to force-write all modified memory buffers may delay transaction
processing because of step 1. To reduce this delay, it is common to use a technique
called fuzzy checkpointing. In this technique, the system can resume transaction
processing after a [begin_checkpoint] record is written to the log without having to
wait for step 2 to finish. When step 2 is completed, an [end_checkpoint, …] record is
written in the log with the relevant information collected during checkpointing.
However, until step 2 is completed, the previous checkpoint record should remain
valid. To accomplish this, the system maintains a file on disk that contains a pointer
to the valid checkpoint, which continues to point to the previous checkpoint record
in the log. Once step 2 is concluded, that pointer is changed to point to the new
checkpoint in the log.
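The following sketch captures the essential bookkeeping of fuzzy checkpointing, with a dictionary standing in for the file that points to the valid checkpoint; the structure and names are assumptions for illustration.

log = []
valid_checkpoint_pointer = {'lsn': None}     # stands in for the pointer file on disk

def begin_fuzzy_checkpoint(active_txns):
    log.append(('begin_checkpoint', list(active_txns)))
    return len(log) - 1                      # position of the begin_checkpoint record

def finish_fuzzy_checkpoint(begin_lsn, flush_modified_buffers):
    flush_modified_buffers()                 # step 2 completes in the background
    log.append(('end_checkpoint', begin_lsn))
    valid_checkpoint_pointer['lsn'] = begin_lsn   # switch the pointer only now

lsn = begin_fuzzy_checkpoint(active_txns=['T2', 'T5'])
# ... transaction processing continues here while buffers are flushed ...
finish_fuzzy_checkpoint(lsn, flush_modified_buffers=lambda: None)
print(valid_checkpoint_pointer)   # {'lsn': 0}

Until finish_fuzzy_checkpoint runs, the pointer still refers to the previous checkpoint, so a crash during step 2 finds a valid earlier checkpoint.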
1.5 Transaction Rollback and Cascading Rollback
If a transaction fails for whatever reason after updating the database, but before the
transaction commits, it may be necessary to roll back the transaction. If any data
item values have been changed by the transaction and written to the database, they
must be restored to their previous values (BFIMs). The undo-type log entries are
used to restore the old values of data items that must be rolled back.
If a transaction T is rolled back, any transaction S that has, in the interim, read the
value of some data item X written by T must also be rolled back. Similarly, once S is
rolled back, any transaction R that has read the value of some data item Y written by
S must also be rolled back; and so on. This phenomenon is called cascading roll-
back, and can occur when the recovery protocol ensures recoverable schedules but
does not ensure strict or cascadeless schedules. Understandably, cascading rollback
can be quite complex and time-consuming. That is why almost all recovery mecha-
nisms are designed so that cascading rollback is never required.
Figure 1 shows an example where cascading rollback is required. The read and write
operations of three individual transactions are shown in Figure 1(a). Figure 1(b)
shows the system log at the point of a system crash for a particular execution sched-
ule of these transactions. The values of data items A, B, C, and D, which are used by
the transactions, are shown to the right of the system log entries. We assume that the
original item values, shown in the first line, are A = 30, B = 15, C = 40, and D = 20. At
the point of system failure, transaction T3 has not reached its conclusion and must be
rolled back. The WRITE operations of T3, marked by a single * in Figure 1(b), are the
T3 operations that are undone during transaction rollback. Figure 1(c) graphically
shows the operations of the different transactions along the time axis.
Figure 1
Illustrating cascading rollback (a process that never occurs in strict or cascadeless schedules). (a) The read and write operations of three transactions. (b) System log at point of crash; the write_item operations of T3 are marked with * and those of T2 with **. (c) Operations before the crash.
* T3 is rolled back because it did not reach its commit point.
** T2 is rolled back because it reads the value of item B written by T3.
We must now check for cascading rollback. From Figure 1(c) we see that transaction
T2 reads the value of item B that was written by transaction T3; this can also be
determined by examining the log. Because T3 is rolled back, T2 must now be rolled
back, too. The WRITE operations of T2, marked by ** in the log, are the ones that are
undone. Note that only write_item operations need to be undone during transaction
rollback; read_item operations are recorded in the log only to determine whether
cascading rollback of additional transactions is necessary.
In practice, cascading rollback of transactions is never required because practical
recovery methods guarantee cascadeless or strict schedules. Hence, there is also no
need to record any read_item operations in the log because these are needed only for
determining cascading rollback.
1.6 Transaction Actions That Do Not Affect
the Database
In general, a transaction will have actions that do not affect the database, such as
generating and printing messages or reports from information retrieved from the
database. If a transaction fails before completion, we may not want the user to get
these reports, since the transaction has failed to complete. If such erroneous reports
are produced, part of the recovery process would have to inform the user that these
reports are wrong, since the user may take an action based on these reports that
affects the database. Hence, such reports should be generated only after the transac-
tion reaches its commit point. A common method of dealing with such actions is to
issue the commands that generate the reports but keep them as batch jobs, which
are executed only after the transaction reaches its commit point. If the transaction
fails, the batch jobs are canceled.
2 NO-UNDO/REDO Recovery Based
on Deferred Update
The idea behind deferred update is to defer or postpone any actual updates to the
database on disk until the transaction completes its execution successfully and
reaches its commit point.4
During transaction execution, the updates are recorded only in the log and in the
cache buffers. After the transaction reaches its commit point and the log is force-
written to disk, the updates are recorded in the database. If a transaction fails before
reaching its commit point, there is no need to undo any operations because the
transaction has not affected the database on disk in any way. Therefore, only REDO-
type log entries are needed in the log, which include the new value (AFIM) of the
item written by a write operation. The UNDO-type log entries are not needed since
no undoing of operations will be required during recovery. Although this may sim-
plify the recovery process, it cannot be used in practice unless transactions are short
4Hence deferred update can generally be characterized as a no-steal approach.
and each transaction changes few items. For other types of transactions, there is the
potential for running out of buffer space because transaction changes must be held
in the cache buffers until the commit point.
We can state a typical deferred update protocol as follows:
1. A transaction cannot change the database on disk until it reaches its commit
point.
2. A transaction does not reach its commit point until all its REDO-type log
entries are recorded in the log and the log buffer is force-written to disk.
Notice that step 2 of this protocol is a restatement of the write-ahead logging (WAL)
protocol. Because the database is never updated on disk until after the transaction
commits, there is never a need to UNDO any operations. REDO is needed in case the
system fails after a transaction commits but before all its changes are recorded in the
database on disk. In this case, the transaction operations are redone from the log
entries during recovery.
For multiuser systems with concurrency control, the concurrency control and
recovery processes are interrelated. Consider a system in which concurrency control
uses strict two-phase locking, so the locks on items remain in effect until the trans-
action reaches its commit point. After that, the locks can be released. This ensures
strict and serializable schedules. Assuming that [checkpoint] entries are included in
the log, a possible recovery algorithm for this case, which we call RDU_M (Recovery
using Deferred Update in a Multiuser environment), is given next.
Procedure RDU_M (NO-UNDO/REDO with checkpoints). Use two lists of
transactions maintained by the system: the committed transactions T since the
last checkpoint (commit list), and the active transactions T′ (active list).
REDO all the WRITE operations of the committed transactions from the log, in
the order in which they were written into the log. The transactions that are active
and did not commit are effectively canceled and must be resubmitted.
The REDO procedure is defined as follows:
Procedure REDO (WRITE_OP). Redoing a write_item operation WRITE_OP con-
sists of examining its log entry [write_item, T, X, new_value] and setting the value
of item X in the database to new_value, which is the after image (AFIM).
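A minimal sketch of RDU_M and REDO follows, assuming log entries of the form ('write_item', T, X, new_value) plus 'commit' records, and a dictionary standing in for the database on disk; the format, names, and values are illustrative only.

def redo(db, write_entry):
    _, txn, item, new_value = write_entry
    db[item] = new_value                      # set the item to its after image (AFIM)

def rdu_m(db, log_entries, commit_list):
    """Redo, in log order, every write of a transaction on the commit list."""
    for entry in log_entries:
        if entry[0] == 'write_item' and entry[1] in commit_list:
            redo(db, entry)

db = {'A': 30, 'B': 15, 'D': 20}
log_entries = [('write_item', 'T1', 'D', 25), ('commit', 'T1'),
               ('write_item', 'T4', 'B', 12), ('write_item', 'T4', 'A', 20),
               ('commit', 'T4'),
               ('write_item', 'T2', 'B', 18)]          # T2 never committed
rdu_m(db, log_entries, commit_list={'T1', 'T4'})
print(db)    # {'A': 20, 'B': 12, 'D': 25}; the write of the active T2 is ignored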
Figure 2 illustrates a timeline for a possible schedule of executing transactions.
When the checkpoint was taken at time t1, transaction T1 had committed, whereas
transactions T3 and T4 had not. Before the system crash at time t2, T3 and T2 were
committed but not T4 and T5. According to the RDU_M method, there is no need to
redo the write_item operations of transaction T1—or any transactions committed
before the last checkpoint time t1. The write_item operations of T2 and T3 must be
redone, however, because both transactions reached their commit points after the
last checkpoint. Recall that the log is force-written before committing a transaction.
Transactions T4 and T5 are ignored: They are effectively canceled or rolled back
because none of their write_item operations were recorded in the database on disk
under the deferred update protocol.
Figure 2
An example of a recovery timeline to illustrate the effect of checkpointing. (Transactions T1 through T5 are shown relative to the checkpoint taken at time t1 and the system crash at time t2.)
We can make the NO-UNDO/REDO recovery algorithm more efficient by noting that,
if a data item X has been updated—as indicated in the log entries—more than once
by committed transactions since the last checkpoint, it is only necessary to REDO
the last update of X from the log during recovery because the other updates would be
overwritten by this last REDO. In this case, we start from the end of the log; then,
whenever an item is redone, it is added to a list of redone items. Before REDO is
applied to an item, the list is checked; if the item appears on the list, it is not redone
again, since its last value has already been recovered.
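Continuing the illustrative log format used in the earlier sketch, the backward-scan refinement can be sketched as follows, with the list of redone items kept in a set.

def rdu_m_backward(db, log_entries, commit_list):
    """Scan the log backward, redoing only the last committed update of each item."""
    redone = set()
    for entry in reversed(log_entries):
        if entry[0] != 'write_item' or entry[1] not in commit_list:
            continue
        item = entry[2]
        if item in redone:
            continue                  # a later value of this item was already recovered
        db[item] = entry[3]
        redone.add(item)

db = {'X': 5}
log_entries = [('write_item', 'T1', 'X', 8), ('commit', 'T1'),
               ('write_item', 'T2', 'X', 9), ('commit', 'T2')]
rdu_m_backward(db, log_entries, commit_list={'T1', 'T2'})
print(db)    # {'X': 9}; the earlier committed write of X is never applied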
If a transaction is aborted for any reason (say, by the deadlock detection method), it
is simply resubmitted, since it has not changed the database on disk. A drawback of
the method described here is that it limits the concurrent execution of transactions
because all write-locked items remain locked until the transaction reaches its commit
point. Additionally, it may require excessive buffer space to hold all updated items
until the transactions commit. The method’s main benefit is that transaction oper-
ations never need to be undone, for two reasons:
1. A transaction does not record any changes in the database on disk until after
it reaches its commit point—that is, until it completes its execution success-
fully. Hence, a transaction is never rolled back because of failure during
transaction execution.
2. A transaction will never read the value of an item that is written by an
uncommitted transaction, because items remain locked until a transaction
reaches its commit point. Hence, no cascading rollback will occur.
Figure 3 shows an example of recovery for a multiuser system that utilizes the recov-
ery and concurrency control method just described.
3 Recovery Techniques Based
on Immediate Update
In these techniques, when a transaction issues an update command, the database on
disk can be updated immediately, without any need to wait for the transaction to
reach its commit point. Notice that it is not a requirement that every update be
Figure 3
An example of recovery using deferred update with concurrent transactions. (a) The READ and WRITE operations of four transactions. (b) System log at the point of crash.
T2 and T3 are ignored because they did not reach their commit points.
T4 is redone because its commit point is after the last system checkpoint.
applied immediately to disk; it is just possible that some updates are applied to disk
before the transaction commits.
Provisions must be made for undoing the effect of update operations that have been
applied to the database by a failed transaction. This is accomplished by rolling back
the transaction and undoing the effect of the transaction’s write_item operations.
Therefore, the UNDO-type log entries, which include the old value (BFIM) of the
item, must be stored in the log. Because UNDO can be needed during recovery,
these methods follow a steal strategy for deciding when updated main memory
buffers can be written back to disk (see Section 1.3). Theoretically, we can distin-
guish two main categories of immediate update algorithms. If the recovery tech-
nique ensures that all updates of a transaction are recorded in the database on disk
before the transaction commits, there is never a need to REDO any operations of
committed transactions. This is called the UNDO/NO-REDO recovery algorithm.
In this method, all updates by a transaction must be recorded on disk before the
transaction commits, so that REDO is never needed. Hence, this method must utilize
the force strategy for deciding when updated main memory buffers are written
back to disk (see Section 1.3).
If the transaction is allowed to commit before all its changes are written to the data-
base, we have the most general case, known as the UNDO/REDO recovery algo-
rithm. In this case, the steal/no-force strategy is applied (see Section 1.3). This is
also the most complex technique. We will outline an UNDO/REDO recovery algo-
rithm and leave it as an exercise for the reader to develop the UNDO/NO-REDO vari-
ation. In Section 5, we describe a more practical approach known as the ARIES
recovery technique.
When concurrent execution is permitted, the recovery process again depends on the
protocols used for concurrency control. The procedure RIU_M (Recovery using
Immediate Updates for a Multiuser environment) outlines a recovery algorithm for
concurrent transactions with immediate update (UNDO/REDO recovery). Assume
that the log includes checkpoints and that the concurrency control protocol pro-
duces strict schedules—as, for example, the strict two-phase locking protocol does.
Recall that a strict schedule does not allow a transaction to read or write an item
unless the transaction that last wrote the item has committed (or aborted and rolled
back). However, deadlocks can occur in strict two-phase locking, thus requiring
abort and UNDO of transactions. For a strict schedule, UNDO of an operation
requires changing the item back to its old value (BFIM).
Procedure RIU_M (UNDO/REDO with checkpoints).
1. Use two lists of transactions maintained by the system: the committed trans-
actions since the last checkpoint and the active transactions.
2. Undo all the write_item operations of the active (uncommitted) transactions,
using the UNDO procedure. The operations should be undone in the reverse
of the order in which they were written into the log.
3. Redo all the write_item operations of the committed transactions from the log,
in the order in which they were written into the log, using the REDO proce-
dure defined earlier.
The UNDO procedure is defined as follows:
Procedure UNDO (WRITE_OP). Undoing a write_item operation write_op con-
sists of examining its log entry [write_item, T, X, old_value, new_value] and setting
the value of item X in the database to old_value, which is the before image
(BFIM). Undoing a number of write_item operations from one or more trans-
actions from the log must proceed in the reverse order from the order in which
the operations were written in the log.
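A minimal sketch of RIU_M follows, assuming log entries of the form ('write_item', T, X, old_value, new_value) and a dictionary standing in for the database on disk; the format, names, and values are illustrative only.

def riu_m(db, log_entries, commit_list, active_list):
    for entry in reversed(log_entries):                       # step 2: UNDO phase
        if entry[0] == 'write_item' and entry[1] in active_list:
            db[entry[2]] = entry[3]                           # restore the BFIM
    for entry in log_entries:                                 # step 3: REDO phase
        if entry[0] == 'write_item' and entry[1] in commit_list:
            db[entry[2]] = entry[4]                           # reapply the AFIM

db = {'B': 12, 'D': 26}       # state on disk at the time of the crash
log_entries = [('write_item', 'T1', 'D', 20, 25), ('commit', 'T1'),
               ('write_item', 'T2', 'B', 15, 12),
               ('write_item', 'T2', 'D', 25, 26)]             # T2 is still active
riu_m(db, log_entries, commit_list={'T1'}, active_list={'T2'})
print(db)    # {'B': 15, 'D': 25}: T2 is rolled back, T1's update is preserved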
As we discussed for the NO-UNDO/REDO procedure, step 3 is more efficiently done
by starting from the end of the log and redoing only the last update of each item X.
Whenever an item is redone, it is added to a list of redone items and is not redone
again. A similar procedure can be devised to improve the efficiency of step 2 so that
an item can be undone at most once during recovery. In this case, the earliest
UNDO is applied first by scanning the log in the forward direction (starting from the
Figure 4
An example of shadow paging. (The current directory, after updating pages 2 and 5, points to the new copies of those pages on disk; the shadow directory, which is not updated, continues to point to the old copies.)
beginning of the log). Whenever an item is undone, it is added to a list of undone
items and is not undone again.
4 Shadow Paging
This recovery scheme does not require the use of a log in a single-user environment.
In a multiuser environment, a log may be needed for the concurrency control
method. Shadow paging considers the database to be made up of a number of fixed-
size disk pages (or disk blocks)—say, n—for recovery purposes. A directory with n
entries5 is constructed, where the ith entry points to the ith database page on disk.
The directory is kept in main memory if it is not too large, and all references—reads
or writes—to database pages on disk go through it. When a transaction begins exe-
cuting, the current directory—whose entries point to the most recent or current
database pages on disk—is copied into a shadow directory. The shadow directory is
then saved on disk while the current directory is used by the transaction.
During transaction execution, the shadow directory is never modified. When a
write_item operation is performed, a new copy of the modified database page is cre-
ated, but the old copy of that page is not overwritten. Instead, the new page is writ-
ten elsewhere—on some previously unused disk block. The current directory entry
is modified to point to the new disk block, whereas the shadow directory is not
modified and continues to point to the old unmodified disk block. Figure 4 illus-
trates the concepts of shadow and current directories. For pages updated by the
transaction, two versions are kept. The old version is referenced by the shadow
directory and the new version by the current directory.
5The directory is similar to the page table maintained by the operating system for each process.
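The directory manipulation can be sketched as follows, with directories represented as lists of disk block numbers and the disk as a dictionary; all of these structures and names are assumptions for illustration.

disk = {1: 'page1 v0', 2: 'page2 v0', 3: 'page3 v0'}
next_free_block = [4]

def begin_transaction(current_directory):
    return list(current_directory)            # shadow directory = saved copy on disk

def write_page(current_directory, page_no, new_contents):
    new_block = next_free_block[0]
    next_free_block[0] += 1
    disk[new_block] = new_contents            # old block is left untouched
    current_directory[page_no - 1] = new_block

current = [1, 2, 3]
shadow = begin_transaction(current)
write_page(current, 2, 'page2 v1')

# Commit: discard the shadow directory.  Failure: reinstate the shadow directory.
print([disk[b] for b in current])   # ['page1 v0', 'page2 v1', 'page3 v0']
print([disk[b] for b in shadow])    # ['page1 v0', 'page2 v0', 'page3 v0']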
To recover from a failure during transaction execution, it is sufficient to free the
modified database pages and to discard the current directory. The state of the data-
base before transaction execution is available through the shadow directory, and
that state is recovered by reinstating the shadow directory. The database thus is
returned to its state prior to the transaction that was executing when the crash
occurred, and any modified pages are discarded. Committing a transaction corre-
sponds to discarding the previous shadow directory. Since recovery involves neither
undoing nor redoing data items, this technique can be categorized as a NO-
UNDO/NO-REDO technique for recovery.
In a multiuser environment with concurrent transactions, logs and checkpoints must
be incorporated into the shadow paging technique. One disadvantage of shadow
paging is that the updated database pages change location on disk. This makes it dif-
ficult to keep related database pages close together on disk without complex storage
management strategies. Furthermore, if the directory is large, the overhead of writ-
ing shadow directories to disk as transactions commit is significant. A further com-
plication is how to handle garbage collection when a transaction commits. The old
pages referenced by the shadow directory that have been updated must be released
and added to a list of free pages for future use. These pages are no longer needed after
the transaction commits. Another issue is that the operation to migrate between cur-
rent and shadow directories must be implemented as an atomic operation.
5 The ARIES Recovery Algorithm
We now describe the ARIES algorithm as an example of a recovery algorithm used
in database systems. It is used in many relational database-related products of IBM.
ARIES uses a steal/no-force approach for writing, and it is based on three concepts:
write-ahead logging, repeating history during redo, and logging changes during
undo. We discussed write-ahead logging in Section 1.3. The second concept,
repeating history, means that ARIES will retrace all actions of the database system
prior to the crash to reconstruct the database state when the crash occurred.
Transactions that were uncommitted at the time of the crash (active transactions)
are undone. The third concept, logging during undo, will prevent ARIES from
repeating the completed undo operations if a failure occurs during recovery, which
causes a restart of the recovery process.
The ARIES recovery procedure consists of three main steps: analysis, REDO, and
UNDO. The analysis step identifies the dirty (updated) pages in the buffer6 and the
set of transactions active at the time of the crash. The appropriate point in the log
where the REDO operation should start is also determined. The REDO phase actu-
ally reapplies updates from the log to the database. Generally, the REDO operation is
applied only to committed transactions. However, this is not the case in ARIES.
Certain information in the ARIES log will provide the start point for REDO, from
6The actual buffers may be lost during a crash, since they are in main memory. Additional tables stored in
the log during checkpointing (Dirty Page Table, Transaction Table) allow ARIES to identify this informa-
tion (as discussed later in this section).
which REDO operations are applied until the end of the log is reached. Additionally,
information stored by ARIES and in the data pages will allow ARIES to determine
whether the operation to be redone has actually been applied to the database and
therefore does not need to be reapplied. Thus, only the necessary REDO operations
are applied during recovery. Finally, during the UNDO phase, the log is scanned
backward and the operations of transactions that were active at the time of the crash
are undone in reverse order. The information needed for ARIES to accomplish its
recovery procedure includes the log, the Transaction Table, and the Dirty Page
Table. Additionally, checkpointing is used. These tables are maintained by the trans-
action manager and written to the log during checkpointing.
In ARIES, every log record has an associated log sequence number (LSN) that is
monotonically increasing and indicates the address of the log record on disk. Each
LSN corresponds to a specific change (action) of some transaction. Also, each data
page will store the LSN of the latest log record corresponding to a change for that page.
A log record is written for any of the following actions: updating a page (write),
committing a transaction (commit), aborting a transaction (abort), undoing an
update (undo), and ending a transaction (end). The need for including the first
three actions in the log has been discussed, but the last two need some explanation.
When an update is undone, a compensation log record is written in the log. When a
transaction ends, whether by committing or aborting, an end log record is written.
Common fields in all log records include the previous LSN for that transaction, the
transaction ID, and the type of log record. The previous LSN is important because it
links the log records (in reverse order) for each transaction. For an update (write)
action, additional fields in the log record include the page ID for the page that con-
tains the item, the length of the updated item, its offset from the beginning of the
page, the before image of the item, and its after image.
Besides the log, two tables are needed for efficient recovery: the Transaction Table
and the Dirty Page Table, which are maintained by the transaction manager. When
a crash occurs, these tables are rebuilt in the analysis phase of recovery. The
Transaction Table contains an entry for each active transaction, with information
such as the transaction ID, transaction status, and the LSN of the most recent log
record for the transaction. The Dirty Page Table contains an entry for each dirty
page in the buffer, which includes the page ID and the LSN corresponding to the
earliest update to that page.
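To make this bookkeeping concrete, the following Python sketch shows one possible in-memory representation of an ARIES-style log record and of the entries in the two recovery tables. The class and field names are illustrative assumptions for exposition, not actual ARIES or DBMS code.

from dataclasses import dataclass
from typing import Optional

@dataclass
class LogRecord:
    lsn: int                       # monotonically increasing log sequence number
    prev_lsn: Optional[int]        # previous LSN of the same transaction (backward chain)
    tran_id: str                   # transaction ID
    rec_type: str                  # 'update', 'commit', 'abort', 'undo' (compensation), or 'end'
    page_id: Optional[str] = None  # for updates: page that contains the item
    offset: int = 0                # offset of the item from the beginning of the page
    length: int = 0                # length of the updated item
    before_image: bytes = b""      # old value, used for UNDO
    after_image: bytes = b""       # new value, used for REDO

@dataclass
class TransactionEntry:            # one row of the Transaction Table
    tran_id: str
    status: str                    # for example 'in progress' or 'commit'
    last_lsn: int                  # LSN of the most recent log record for the transaction

@dataclass
class DirtyPageEntry:              # one row of the Dirty Page Table
    page_id: str
    lsn: int                       # LSN of the earliest update that dirtied the page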
Checkpointing in ARIES consists of the following: writing a begin_checkpoint record
to the log, writing an end_checkpoint record to the log, and writing the LSN of the
begin_checkpoint record to a special file. This special file is accessed during recovery
to locate the last checkpoint information. With the end_checkpoint record, the con-
tents of both the Transaction Table and Dirty Page Table are appended to the end of
the log. To reduce the cost, fuzzy checkpointing is used so that the DBMS can con-
tinue to execute transactions during checkpointing (see Section 1.4). Additionally,
the contents of the DBMS cache do not have to be flushed to disk during check-
point, since the Transaction Table and Dirty Page Table—which are appended to the
log on disk—contain the information needed for recovery. Note that if a crash
occurs during checkpointing, the special file will refer to the previous checkpoint,
which is used for recovery.
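A minimal sketch of this fuzzy checkpointing step is shown below; the log is modeled as a Python list, an LSN is taken to be a position in that list, and the names of the structures are assumptions made only for illustration.

def take_checkpoint(log, master_file, transaction_table, dirty_page_table):
    # Sketch of ARIES-style fuzzy checkpointing (illustrative, simplified).
    begin_lsn = len(log)                       # LSN of the begin_checkpoint record
    log.append({"type": "begin_checkpoint"})
    # Transactions keep running, and the DBMS cache is not flushed (fuzzy checkpoint).
    # The end_checkpoint record carries snapshots of the two recovery tables.
    log.append({"type": "end_checkpoint",
                "transaction_table": dict(transaction_table),
                "dirty_page_table": dict(dirty_page_table)})
    # The LSN of the begin_checkpoint record goes to a special file,
    # which is the first thing consulted during recovery.
    master_file["last_begin_checkpoint_lsn"] = begin_lsn
    return begin_lsn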
After a crash, the ARIES recovery manager takes over. Information from the last
checkpoint is first accessed through the special file. The analysis phase starts at the
begin_checkpoint record and proceeds to the end of the log. When the end_checkpoint
record is encountered, the Transaction Table and Dirty Page Table are accessed
(recall that these tables were written in the log during checkpointing). During
analysis, the log records being analyzed may cause modifications to these two tables.
For instance, if an end log record was encountered for a transaction T in the
Transaction Table, then the entry for T is deleted from that table. If some other type
of log record is encountered for a transaction T′, then an entry for T′ is inserted into
the Transaction Table, if not already present, and the last LSN field is modified. If
the log record corresponds to a change for page P, then an entry would be made for
page P (if not present in the table) and the associated LSN field would be modified.
When the analysis phase is complete, the necessary information for REDO and
UNDO has been compiled in the tables.
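The following Python sketch outlines the analysis pass just described. It assumes the simplified log-record and table representations used in the earlier sketches (dictionaries keyed by transaction ID and page ID), with an LSN taken to be the record's position in the log list.

def analysis_phase(log, begin_checkpoint_lsn):
    # Rebuild the Transaction Table and Dirty Page Table by scanning forward
    # from the begin_checkpoint record to the end of the log (simplified sketch).
    transaction_table, dirty_page_table = {}, {}
    for lsn in range(begin_checkpoint_lsn, len(log)):
        rec = log[lsn]
        kind = rec["type"]
        if kind == "begin_checkpoint":
            continue
        if kind == "end_checkpoint":
            # Start from the tables that were appended to the log at checkpoint time.
            transaction_table.update(rec["transaction_table"])
            dirty_page_table.update(rec["dirty_page_table"])
        elif kind == "end":
            transaction_table.pop(rec["tran_id"], None)      # transaction is finished
        else:
            # Any other record: insert an entry for the transaction if needed
            # and update its last LSN; a commit record also changes the status.
            entry = transaction_table.setdefault(
                rec["tran_id"], {"status": "in progress", "last_lsn": lsn})
            entry["last_lsn"] = lsn
            if kind == "commit":
                entry["status"] = "commit"
            if kind == "update" and rec["page_id"] not in dirty_page_table:
                dirty_page_table[rec["page_id"]] = lsn       # earliest update to the page
    return transaction_table, dirty_page_table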
The REDO phase follows next. To reduce the amount of unnecessary work, ARIES
starts redoing at a point in the log where it knows (for sure) that previous changes
to dirty pages have already been applied to the database on disk. It can determine this
by finding the smallest LSN, M, of all the dirty pages in the Dirty Page Table, which
indicates the log position where ARIES needs to start the REDO phase. Any changes
corresponding to an LSN < M, for redoable transactions, must have already been
propagated to disk or already been overwritten in the buffer; otherwise, those dirty
pages with that LSN would be in the buffer (and the Dirty Page Table). So, REDO
starts at the log record with LSN = M and scans forward to the end of the log. For
each change recorded in the log, the REDO algorithm would verify whether or not
the change has to be reapplied. For example, if a change recorded in the log pertains
to page P that is not in the Dirty Page Table, then this change is already on disk and
does not need to be reapplied. Or, if a change recorded in the log (with LSN = N,
say) pertains to page P and the Dirty Page Table contains an entry for P with LSN
greater than N, then the change is already present. If neither of these two conditions holds, page P is read from disk and the LSN stored on that page, LSN(P), is compared with N. If N ≤ LSN(P), the change has already been applied and the page does not need
to be rewritten to disk.
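The three checks described above can be summarized in the following sketch of the REDO pass. The callbacks read_page_lsn and apply_update stand in for the buffer manager and the page update code; they are illustrative assumptions, not part of any actual ARIES implementation.

def needs_redo(rec_lsn, page_id, dirty_page_table, read_page_lsn):
    # Decide whether an update with LSN rec_lsn on page_id must be reapplied.
    if page_id not in dirty_page_table:
        return False                          # change is already on disk
    if dirty_page_table[page_id] > rec_lsn:
        return False                          # page was dirtied only by later updates
    page_lsn = read_page_lsn(page_id)         # requires reading the page from disk
    return page_lsn < rec_lsn                 # redo only if the page does not reflect it yet

def redo_phase(log, dirty_page_table, read_page_lsn, apply_update):
    if not dirty_page_table:
        return
    start = min(dirty_page_table.values())    # smallest LSN M in the Dirty Page Table
    for lsn in range(start, len(log)):
        rec = log[lsn]
        if rec["type"] == "update" and needs_redo(lsn, rec["page_id"],
                                                  dirty_page_table, read_page_lsn):
            apply_update(rec)                 # reapply the after image to the page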
Once the REDO phase is finished, the database is in the exact state that it was in
when the crash occurred. The set of active transactions—called the undo_set—has
been identified in the Transaction Table during the analysis phase. Now, the UNDO
phase proceeds by scanning backward from the end of the log and undoing the
appropriate actions. A compensating log record is written for each action that is
undone. The UNDO reads backward in the log until every action of the set of trans-
actions in the undo_set has been undone. When this is completed, the recovery
process is finished and normal processing can begin again.
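A simplified sketch of the UNDO pass appears below. Real ARIES records an undoNextLSN in each compensation record; here the backward chain is followed directly through prev_lsn, and apply_undo is an assumed callback that restores the before image.

def undo_phase(log, transaction_table, apply_undo):
    # Undo, in reverse LSN order, every action of the transactions that were
    # still in progress at the time of the crash (the undo_set).
    undo_set = {tid: entry["last_lsn"] for tid, entry in transaction_table.items()
                if entry["status"] == "in progress"}
    to_undo = set(undo_set.values())
    while to_undo:
        lsn = max(to_undo)                    # most recent action not yet undone
        to_undo.discard(lsn)
        rec = log[lsn]
        if rec["type"] == "update":
            apply_undo(rec)                   # restore the before image
            log.append({"type": "undo",       # write a compensation log record
                        "tran_id": rec["tran_id"], "page_id": rec["page_id"]})
        if rec.get("prev_lsn") is not None:
            to_undo.add(rec["prev_lsn"])      # follow the transaction's backward chain
        else:
            log.append({"type": "end", "tran_id": rec["tran_id"]})  # fully undone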
Consider the recovery example shown in Figure 5. There are three transactions: T1,
T2, and T3. T1 updates page C, T2 updates pages B and C, and T3 updates page A.
Figure 5
An example of recovery in ARIES. (a) The log at point of crash. (b) The Transaction and Dirty Page Tables at time of checkpoint. (c) The Transaction and Dirty Page Tables after the analysis phase.
(a) Log records (Lsn, Tran_id, Type, Page_id): 1: T1, update, C; 2: T2, update, B; 3: T1, commit; 4: begin checkpoint; 5: end checkpoint; 6: T3, update, A; 7: T2, update, C; 8: T2, commit.
(b) At checkpoint. Transaction Table (Transaction_id, Last_lsn, Status): T1, 3, commit; T2, 2, in progress. Dirty Page Table (Page_id, Lsn): C, 1; B, 2.
(c) After analysis. Transaction Table: T1, 3, commit; T2, 8, commit; T3, 6, in progress. Dirty Page Table: C, 1; B, 2; A, 6.
Figure 5(a) shows the partial contents of the log, and Figure 5(b) shows the contents
of the Transaction Table and Dirty Page Table. Now, suppose that a crash occurs at
this point. Since a checkpoint has occurred, the address of the associated
begin_checkpoint record is retrieved, which is location 4. The analysis phase starts at location 4 and proceeds until it reaches the end of the log. The end_checkpoint record would contain
the Transaction Table and Dirty Page Table in Figure 5(b), and the analysis phase
will further reconstruct these tables. When the analysis phase encounters log record
6, a new entry for transaction T3 is made in the Transaction Table and a new entry
for page A is made in the Dirty Page Table. After log record 8 is analyzed, the status
of transaction T2 is changed to committed in the Transaction Table. Figure 5(c)
shows the two tables after the analysis phase.
For the REDO phase, the smallest LSN in the Dirty Page Table is 1. Hence the REDO
will start at log record 1 and proceed with the REDO of updates. The LSNs {1, 2, 6,
7} corresponding to the updates for pages C, B, A, and C, respectively, are not less
than the LSNs of those pages (as shown in the Dirty Page Table). So those data pages
will be read again and the updates reapplied from the log (assuming the actual LSNs
stored on those data pages are less than the corresponding log entry). At this point,
the REDO phase is finished and the UNDO phase starts. From the Transaction Table
(Figure 5(c)), UNDO is applied only to the active transaction T3. The UNDO phase
starts at log entry 6 (the last update for T3) and proceeds backward in the log. The
backward chain of updates for transaction T3 (only log record 6 in this example) is
followed and undone.
6 Recovery in Multidatabase Systems
So far, we have implicitly assumed that a transaction accesses a single database. In
some cases, a single transaction, called a multidatabase transaction, may require
access to multiple databases. These databases may even be stored on different types
of DBMSs; for example, some DBMSs may be relational, whereas others are object-
oriented, hierarchical, or network DBMSs. In such a case, each DBMS involved in
the multidatabase transaction may have its own recovery technique and transaction
manager separate from those of the other DBMSs. This situation is somewhat simi-
lar to the case of a distributed database management system, where parts of the
database reside at different sites that are connected by a communication network.
To maintain the atomicity of a multidatabase transaction, it is necessary to have a
two-level recovery mechanism. A global recovery manager, or coordinator, is
needed to maintain information needed for recovery, in addition to the local recov-
ery managers and the information they maintain (log, tables). The coordinator usu-
ally follows a protocol called the two-phase commit protocol, whose two phases
can be stated as follows:
■ Phase 1. When all participating databases signal the coordinator that the
part of the multidatabase transaction involving each has concluded, the
coordinator sends a message prepare for commit to each participant to get
ready for committing the transaction. Each participating database receiving
that message will force-write all log records and needed information for
local recovery to disk and then send a ready to commit or OK signal to the
coordinator. If the force-writing to disk fails or the local transaction cannot
commit for some reason, the participating database sends a cannot commit
or not OK signal to the coordinator. If the coordinator does not receive a
reply from the database within a certain timeout interval, it assumes a not
OK response.
■ Phase 2. If all participating databases reply OK, and the coordinator’s vote is
also OK, the transaction is successful, and the coordinator sends a commit
signal for the transaction to the participating databases. Because all the local
effects of the transaction and information needed for local recovery have
been recorded in the logs of the participating databases, recovery from fail-
ure is now possible. Each participating database completes transaction com-
mit by writing a [commit] entry for the transaction in the log and
permanently updating the database if needed. On the other hand, if one or
more of the participating databases or the coordinator have a not OK
response, the transaction has failed, and the coordinator sends a message to
roll back or UNDO the local effect of the transaction to each participating
database. This is done by undoing the transaction operations, using the log.
The net effect of the two-phase commit protocol is that either all participating data-
bases commit the effect of the transaction or none of them do. In case any of the
participants—or the coordinator—fails, it is always possible to recover to a state
where either the transaction is committed or it is rolled back. A failure during or
before Phase 1 usually requires the transaction to be rolled back, whereas a failure
during Phase 2 means that a successful transaction can recover and commit.
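The coordinator side of the protocol can be sketched as follows. The participant objects and their prepare, commit, and rollback methods are assumptions introduced only to illustrate the two phases; a real coordinator would also log its own decisions for recovery.

def two_phase_commit(coordinator_vote, participants, timeout=5.0):
    # Phase 1: collect votes after each participant force-writes its log to disk.
    votes = []
    for p in participants:
        try:
            votes.append(p.prepare(timeout=timeout))   # True = 'ready to commit' (OK)
        except TimeoutError:
            votes.append(False)                        # no reply counts as 'not OK'
    # Phase 2: commit everywhere or roll back everywhere.
    if coordinator_vote and all(votes):
        for p in participants:
            p.commit()        # each participant writes [commit] and applies the updates
        return "committed"
    for p in participants:
        p.rollback()          # each participant undoes its local effects using its log
    return "rolled back"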
7 Database Backup and Recovery
from Catastrophic Failures
So far, all the techniques we have discussed apply to noncatastrophic failures. A key
assumption has been that the system log is maintained on the disk and is not lost as
a result of the failure. Similarly, the shadow directory must be stored on disk to
allow recovery when shadow paging is used. The recovery techniques we have dis-
cussed use the entries in the system log or the shadow directory to recover from fail-
ure by bringing the database back to a consistent state.
The recovery manager of a DBMS must also be equipped to handle more cata-
strophic failures such as disk crashes. The main technique used to handle such
crashes is a database backup, in which the whole database and the log are periodi-
cally copied onto a cheap storage medium such as magnetic tapes or other large
capacity offline storage devices. In case of a catastrophic system failure, the latest
backup copy can be reloaded from the tape to the disk, and the system can be
restarted.
Data from critical applications such as banking, insurance, stock market, and other
databases is periodically backed up in its entirety and moved to physically separate
safe locations. Subterranean storage vaults have been used to protect such data from
flood, storm, earthquake, or fire damage. Events like the 9/11 terrorist attack in New
York (in 2001) and the Hurricane Katrina disaster in New Orleans (in 2005) have
created a greater awareness of disaster recovery of business-critical databases.
To avoid losing all the effects of transactions that have been executed since the last
backup, it is customary to back up the system log at more frequent intervals than
full database backup by periodically copying it to magnetic tape. The system log is
usually substantially smaller than the database itself and hence can be backed up
more frequently. Therefore, users do not lose all transactions they have performed
since the last database backup. All committed transactions recorded in the portion
of the system log that has been backed up to tape can have their effect on the data-
base redone. A new log is started after each database backup. Hence, to recover from
disk failure, the database is first recreated on disk from its latest backup copy on
tape. Following that, the effects of all the committed transactions whose operations
have been recorded in the backed-up copies of the system log are reconstructed.
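Put together, recovery from a disk crash follows the simple sequence sketched below; restore and redo_committed are placeholder callbacks, assumed here only to stand for reloading the backup and reapplying committed work from the backed-up logs.

def recover_from_disk_failure(latest_db_backup, log_backups, restore, redo_committed):
    # 1. Recreate the database on disk from the most recent full backup.
    restore(latest_db_backup)
    # 2. Reapply, in order, the effects of committed transactions recorded in the
    #    system-log backups taken since that database backup.
    for log_tape in log_backups:
        redo_committed(log_tape)
    # 3. A new log is started for subsequent transactions.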
8 Summary
In this chapter we discussed the techniques for recovery from transaction failures.
The main goal of recovery is to ensure the atomicity property of a transaction. If a
transaction fails before completing its execution, the recovery mechanism has to
make sure that the transaction has no lasting effects on the database. First we gave
an informal outline for a recovery process and then we discussed system concepts
for recovery. These included a discussion of caching, in-place updating versus shad-
owing, before and after images of a data item, UNDO versus REDO recovery opera-
tions, steal/no-steal and force/no-force policies, system checkpointing, and the
write-ahead logging protocol.
Next we discussed two different approaches to recovery: deferred update and imme-
diate update. Deferred update techniques postpone any actual updating of the data-
base on disk until a transaction reaches its commit point. The transaction
force-writes the log to disk before recording the updates in the database. This
approach, when used with certain concurrency control methods, is designed never
to require transaction rollback, and recovery simply consists of redoing the opera-
tions of transactions committed after the last checkpoint from the log. The disad-
vantage is that too much buffer space may be needed, since updates are kept in the
buffers and are not applied to disk until a transaction commits. Deferred update can
lead to a recovery algorithm known as NO-UNDO/REDO. Immediate update tech-
niques may apply changes to the database on disk before the transaction reaches a
successful conclusion. Any changes applied to the database must first be recorded in
the log and force-written to disk so that these operations can be undone if neces-
sary. We also gave an overview of a recovery algorithm for immediate update known
as UNDO/REDO. Another algorithm, known as UNDO/NO-REDO, can also be devel-
oped for immediate update if all transaction actions are recorded in the database
before commit.
We discussed the shadow paging technique for recovery, which keeps track of old
database pages by using a shadow directory. This technique, which is classified as
NO-UNDO/NO-REDO, does not require a log in single-user systems but still needs
the log for multiuser systems. We also presented ARIES, a specific recovery scheme
used in many of IBM’s relational database products. Then we discussed the two-
phase commit protocol, which is used for recovery from failures involving multi-
database transactions. Finally, we discussed recovery from catastrophic failures,
which is typically done by backing up the database and the log to tape. The log can
be backed up more frequently than the database, and the backup log can be used to
redo operations starting from the last database backup.
Review Questions
1. Discuss the different types of transaction failures. What is meant by cata-
strophic failure?
2. Discuss the actions taken by the read_item and write_item operations on a
database.
3. What is the system log used for? What are the typical kinds of entries in a
system log? What are checkpoints, and why are they important? What are
transaction commit points, and why are they important?
4. How are buffering and caching techniques used by the recovery subsystem?
5. What are the before image (BFIM) and after image (AFIM) of a data item?
What is the difference between in-place updating and shadowing, with
respect to their handling of BFIM and AFIM?
6. What are UNDO-type and REDO-type log entries?
7. Describe the write-ahead logging protocol.
8. Identify three typical lists of transactions that are maintained by the recovery
subsystem.
9. What is meant by transaction rollback? What is meant by cascading rollback?
Why do practical recovery methods use protocols that do not permit cascad-
ing rollback? Which recovery techniques do not require any rollback?
10. Discuss the UNDO and REDO operations and the recovery techniques that
use each.
11. Discuss the deferred update technique of recovery. What are the advantages
and disadvantages of this technique? Why is it called the NO-UNDO/REDO
method?
12. How can recovery handle transaction operations that do not affect the data-
base, such as the printing of reports by a transaction?
13. Discuss the immediate update recovery technique in both single-user and
multiuser environments. What are the advantages and disadvantages of
immediate update?
14. What is the difference between the UNDO/REDO and the UNDO/NO-REDO
algorithms for recovery with immediate update? Develop the outline for an
UNDO/NO-REDO algorithm.
15. Describe the shadow paging recovery technique. Under what circumstances
does it not require a log?
16. Describe the three phases of the ARIES recovery method.
17. What are log sequence numbers (LSNs) in ARIES? How are they used? What
information do the Dirty Page Table and Transaction Table contain?
Describe how fuzzy checkpointing is used in ARIES.
[checkpoint]
[start_transaction, T1]
[start_transaction, T2]
[start_transaction, T3]
[read_item, T1, A]
[read_item, T1, D]
[read_item, T4, D]
[read_item, T2, D]
[read_item, T2, B]
[write_item, T1, D, 20, 25]
[write_item, T2, B, 12, 18]
[read_item, T4, A]
[write_item, T4, D, 25, 15]
[write_item, T3, C, 30, 40]
[write_item, T2, D, 15, 25]
[write_item, T4, A, 30, 20]
[commit, T1]
[commit, T4]
[start_transaction, T4]
System crash
Figure 6
A sample schedule and its
corresponding log.
18. What do the terms steal/no-steal and force/no-force mean with regard to
buffer management for transaction processing?
19. Describe the two-phase commit protocol for multidatabase transactions.
20. Discuss how disaster recovery from catastrophic failures is handled.
Exercises
21. Suppose that the system crashes before the [read_item, T3, A] entry is written
to the log in Figure 1(b). Will that make any difference in the recovery
process?
22. Suppose that the system crashes before the [write_item, T2, D, 25, 26] entry is
written to the log in Figure 1(b). Will that make any difference in the recov-
ery process?
23. Figure 6 shows the log corresponding to a particular schedule at the point of
a system crash for four transactions T1, T2, T3, and T4. Suppose that we use
the immediate update protocol with checkpointing. Describe the recovery
process from the system crash. Specify which transactions are rolled back,
which operations in the log are redone and which (if any) are undone, and
whether any cascading rollback takes place.
24. Suppose that we use the deferred update protocol for the example in Figure
6. Show how the log would be different in the case of deferred update by
removing the unnecessary log entries; then describe the recovery process,
using your modified log. Assume that only REDO operations are applied,
and specify which operations in the log are redone and which are ignored.
25. How does checkpointing in ARIES differ from checkpointing as described in
Section 1.4?
26. How are log sequence numbers used by ARIES to reduce the amount of
REDO work needed for recovery? Illustrate with an example using the infor-
mation shown in Figure 5. You can make your own assumptions as to when
a page is written to disk.
27. What implications would a no-steal/force buffer management policy have
on checkpointing and recovery?
Choose the correct answer for each of the following multiple-choice questions:
28. Incremental logging with deferred updates implies that the recovery system
must necessarily
a. store the old value of the updated item in the log.
b. store the new value of the updated item in the log.
c. store both the old and new value of the updated item in the log.
d. store only the Begin Transaction and Commit Transaction records in the
log.
29. The write-ahead logging (WAL) protocol simply means that
a. writing of a data item should be done ahead of any logging operation.
b. the log record for an operation should be written before the actual data is
written.
c. all log records should be written before a new transaction begins execu-
tion.
d. the log never needs to be written to disk.
30. In case of transaction failure under a deferred update incremental logging
scheme, which of the following will be needed?
a. an undo operation
b. a redo operation
c. an undo and redo operation
d. none of the above
31. For incremental logging with immediate updates, a log record for a transac-
tion would contain
a. a transaction name, a data item name, and the old and new value of the
item.
b. a transaction name, a data item name, and the old value of the item.
c. a transaction name, a data item name, and the new value of the item.
d. a transaction name and a data item name.
32. For correct behavior during recovery, undo and redo operations must be
a. commutative.
b. associative.
c. idempotent.
d. distributive.
33. When a failure occurs, the log is consulted and each operation is either
undone or redone. This is a problem because
a. searching the entire log is time consuming.
b. many redos are unnecessary.
c. both (a) and (b).
d. none of the above.
34. When using a log-based recovery scheme, it might improve performance as
well as providing a recovery mechanism by
a. writing the log records to disk when each transaction commits.
b. writing the appropriate log records to disk during the transaction’s execu-
tion.
c. waiting to write the log records until multiple transactions commit and
writing them as a batch.
d. never writing the log records to disk.
35. There is a possibility of a cascading rollback when
a. a transaction writes items that have been written only by a committed
transaction.
b. a transaction writes an item that is previously written by an uncommitted
transaction.
c. a transaction reads an item that is previously written by an uncommitted
transaction.
d. both (b) and (c).
36. To cope with media (disk) failures, it is necessary
a. for the DBMS to only execute transactions in a single user environment.
b. to keep a redundant copy of the database.
c. to never abort a transaction.
d. all of the above.
37. If the shadowing approach is used for flushing a data item back to disk, then
a. the item is written to disk only after the transaction commits.
b. the item is written to a different location on disk.
c. the item is written to disk before the transaction commits.
d. the item is written to the same disk location from which it was read.
Selected Bibliography
The books by Bernstein et al. (1987) and Papadimitriou (1986) are devoted to the
theory and principles of concurrency control and recovery. The book by Gray and
Reuter (1993) is an encyclopedic work on concurrency control, recovery, and other
transaction-processing issues.
Verhofstad (1978) presents a tutorial and survey of recovery techniques in database
systems. Categorizing algorithms based on their UNDO/REDO characteristics is dis-
cussed in Haerder and Reuter (1983) and in Bernstein et al. (1983). Gray (1978) dis-
cusses recovery, along with other system aspects of implementing operating systems
for databases. The shadow paging technique is discussed in Lorie (1977), Verhofstad
(1978), and Reuter (1980). Gray et al. (1981) discuss the recovery mechanism in
SYSTEM R. Lockemann and Knutsen (1968), Davies (1973), and Bjork (1973) are
early papers that discuss recovery. Chandy et al. (1975) discuss transaction rollback.
Lilien and Bhargava (1985) discuss the concept of integrity block and its use to
improve the efficiency of recovery.
Recovery using write-ahead logging is analyzed in Jhingran and Khedkar (1992)
and is used in the ARIES system (Mohan et al. 1992). More recent work on recovery
includes compensating transactions (Korth et al. 1990) and main memory database
recovery (Kumar 1991). The ARIES recovery algorithms (Mohan et al. 1992) have
been quite successful in practice. Franklin et al. (1992) discusses recovery in the
EXODUS system. Two books by Kumar and Hsu (1998) and Kumar and Song
(1998) discuss recovery in detail and contain descriptions of recovery methods used
in a number of existing relational database products. Examples of page replacement
strategies that are specific for databases are discussed in Chou and DeWitt (1985)
and Pazos et al. (2006).
Database Security
This chapter discusses techniques for securing databases against a variety of threats. It also presents schemes for providing access privileges to authorized users. Some of the security
threats to databases—such as SQL Injection—will be presented. At the end of the
chapter we also summarize how a commercial RDBMS—specifically, the Oracle sys-
tem—provides different types of security. We start in Section 1 with an introduc-
tion to security issues and the threats to databases, and we give an overview of the
control measures that are covered in the rest of this chapter. We also comment on
the relationship between data security and privacy as it applies to personal informa-
tion. Section 2 discusses the mechanisms used to grant and revoke privileges in rela-
tional database systems and in SQL, mechanisms that are often referred to as
discretionary access control. In Section 3, we present an overview of the mecha-
nisms for enforcing multiple levels of security—a particular concern in database
system security that is known as mandatory access control. Section 3 also introduces the more recently developed strategies of role-based access control and of label-based and row-based security, and briefly discusses XML access control. Section 4 discusses a major threat to databases called SQL
Injection, and discusses some of the proposed preventive measures against it.
Section 5 briefly discusses the security problem in statistical databases. Section 6
introduces the topic of flow control and mentions problems associated with covert
channels. Section 7 provides a brief summary of encryption and symmetric key and
asymmetric (public) key infrastructure schemes. It also discusses digital certificates.
Section 8 introduces privacy-preserving techniques, and Section 9 presents the cur-
rent challenges to database security. In Section 10, we discuss Oracle label-based
security. Finally, Section 11 summarizes the chapter. Readers who are interested
only in basic database security mechanisms will find it sufficient to cover the mate-
rial in Sections 1 and 2.
From Chapter 24 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
1 Introduction to Database Security Issues1
1.1 Types of Security
Database security is a broad area that addresses many issues, including the following:
■ Various legal and ethical issues regarding the right to access certain informa-
tion—for example, some information may be deemed to be private and can-
not be accessed legally by unauthorized organizations or persons. In the
United States, there are numerous laws governing privacy of information.
■ Policy issues at the governmental, institutional, or corporate level as to what
kinds of information should not be made publicly available—for example,
credit ratings and personal medical records.
■ System-related issues such as the system levels at which various security func-
tions should be enforced—for example, whether a security function should
be handled at the physical hardware level, the operating system level, or the
DBMS level.
■ The need in some organizations to identify multiple security levels and to
categorize the data and users based on these classifications—for example,
top secret, secret, confidential, and unclassified. The security policy of the
organization with respect to permitting access to various classifications of
data must be enforced.
Threats to Databases. Threats to databases can result in the loss or degradation
of some or all of the following commonly accepted security goals: integrity, avail-
ability, and confidentiality.
■ Loss of integrity. Database integrity refers to the requirement that informa-
tion be protected from improper modification. Modification of data
includes creation, insertion, updating, changing the status of data, and dele-
tion. Integrity is lost if unauthorized changes are made to the data by either
intentional or accidental acts. If the loss of system or data integrity is not
corrected, continued use of the contaminated system or corrupted data
could result in inaccuracy, fraud, or erroneous decisions.
■ Loss of availability. Database availability refers to making objects available
to a human user or a program to which they have a legitimate right.
■ Loss of confidentiality. Database confidentiality refers to the protection of
data from unauthorized disclosure. The impact of unauthorized disclosure
of confidential information can range from violation of the Data Privacy Act
to the jeopardization of national security. Unauthorized, unanticipated, or
unintentional disclosure could result in loss of public confidence, embar-
rassment, or legal action against the organization.
1The substantial contribution of Fariborz Farahmand and Bharath Rengarajan to this and subsequent
sections in this chapter is much appreciated.
To protect databases against these types of threats, it is common to implement four
kinds of control measures: access control, inference control, flow control, and encryp-
tion. We discuss each of these in this chapter.
In a multiuser database system, the DBMS must provide techniques to enable cer-
tain users or user groups to access selected portions of a database without gaining
access to the rest of the database. This is particularly important when a large inte-
grated database is to be used by many different users within the same organization.
For example, sensitive information such as employee salaries or performance
reviews should be kept confidential from most of the database system’s users. A
DBMS typically includes a database security and authorization subsystem that is
responsible for ensuring the security of portions of a database against unauthorized
access. It is now customary to refer to two types of database security mechanisms:
■ Discretionary security mechanisms. These are used to grant privileges to
users, including the capability to access specific data files, records, or fields in
a specified mode (such as read, insert, delete, or update).
■ Mandatory security mechanisms. These are used to enforce multilevel
security by classifying the data and users into various security classes (or lev-
els) and then implementing the appropriate security policy of the organiza-
tion. For example, a typical security policy is to permit users at a certain
classification (or clearance) level to see only the data items classified at the
user’s own (or lower) classification level. An extension of this is role-based
security, which enforces policies and privileges based on the concept of orga-
nizational roles.
We discuss discretionary security in Section 2 and mandatory and role-based secu-
rity in Section 3.
1.2 Control Measures
Four main control measures are used to provide security of data in databases:
■ Access control
■ Inference control
■ Flow control
■ Data encryption
A security problem common to computer systems is that of preventing unautho-
rized persons from accessing the system itself, either to obtain information or to
make malicious changes in a portion of the database. The security mechanism of a
DBMS must include provisions for restricting access to the database system as a
whole. This function, called access control, is handled by creating user accounts and
passwords to control the login process by the DBMS. We discuss access control tech-
niques in Section 1.3.
Statistical databases are used to provide statistical information or summaries of
values based on various criteria. For example, a database for population statistics
may provide statistics based on age groups, income levels, household size, education
levels, and other criteria. Statistical database users such as government statisticians
or market research firms are allowed to access the database to retrieve statistical
information about a population but not to access the detailed confidential informa-
tion about specific individuals. Security for statistical databases must ensure that
information about individuals cannot be accessed. It is sometimes possible to
deduce or infer certain facts concerning individuals from queries that involve only
summary statistics on groups; consequently, this must not be permitted either. This
problem, called statistical database security, is discussed briefly in Section 5. The
corresponding control measures are called inference control measures.
Another security issue is that of flow control, which prevents information from
flowing in such a way that it reaches unauthorized users. It is discussed in Section 6.
Channels that are pathways for information to flow implicitly in ways that violate
the security policy of an organization are called covert channels. We briefly discuss
some issues related to covert channels in Section 6.1.
A final control measure is data encryption, which is used to protect sensitive data
(such as credit card numbers) that is transmitted via some type of communications
network. Encryption can be used to provide additional protection for sensitive por-
tions of a database as well. The data is encoded using some coding algorithm. An
unauthorized user who accesses encoded data will have difficulty deciphering it, but
authorized users are given decoding or decrypting algorithms (or keys) to decipher
the data. Encrypting techniques that are very difficult to decode without a key have
been developed for military applications. Section 7 briefly discusses encryption
techniques, including popular techniques such as public key encryption, which is
heavily used to support Web-based transactions against databases, and digital signa-
tures, which are used in personal communications.
A comprehensive discussion of security in computer systems and databases is out-
side the scope of this text. We give only a brief overview of database security tech-
niques here. The interested reader can refer to several of the references discussed in
the Selected Bibliography at the end of this chapter for a more comprehensive dis-
cussion.
1.3 Database Security and the DBA
The database administrator (DBA) is the central authority for managing a data-
base system. The DBA’s responsibilities include granting privileges to users who
need to use the system and classifying users and data in accordance with the pol-
icy of the organization. The DBA has a DBA account in the DBMS, sometimes
called a system or superuser account, which provides powerful capabilities that
are not made available to regular database accounts and users.2 DBA-privileged
commands include commands for granting and revoking privileges to individual
2This account is similar to the root or superuser accounts that are given to computer system administra-
tors, which allow access to restricted operating system commands.
accounts, users, or user groups and for performing the following types of
actions:
1. Account creation. This action creates a new account and password for a user
or a group of users to enable access to the DBMS.
2. Privilege granting. This action permits the DBA to grant certain privileges
to certain accounts.
3. Privilege revocation. This action permits the DBA to revoke (cancel) certain
privileges that were previously given to certain accounts.
4. Security level assignment. This action consists of assigning user accounts to
the appropriate security clearance level.
The DBA is responsible for the overall security of the database system. Action 1 in
the preceding list is used to control access to the DBMS as a whole, whereas actions
2 and 3 are used to control discretionary database authorization, and action 4 is used
to control mandatory authorization.
1.4 Access Control, User Accounts, and Database Audits
Whenever a person or a group of persons needs to access a database system, the
individual or group must first apply for a user account. The DBA will then create a
new account number and password for the user if there is a legitimate need to
access the database. The user must log in to the DBMS by entering the account
number and password whenever database access is needed. The DBMS checks that
the account number and password are valid; if they are, the user is permitted to use
the DBMS and to access the database. Application programs can also be considered
users and are required to log in to the database.
It is straightforward to keep track of database users and their accounts and pass-
words by creating an encrypted table or file with two fields: AccountNumber and
Password. This table can easily be maintained by the DBMS. Whenever a new
account is created, a new record is inserted into the table. When an account is can-
celed, the corresponding record must be deleted from the table.
The database system must also keep track of all operations on the database that are
applied by a certain user throughout each login session, which consists of the
sequence of database interactions that a user performs from the time of logging in
to the time of logging off. When a user logs in, the DBMS can record the user’s
account number and associate it with the computer or device from which the user
logged in. All operations applied from that computer or device are attributed to the
user’s account until the user logs off. It is particularly important to keep track of
update operations that are applied to the database so that, if the database is tam-
pered with, the DBA can determine which user did the tampering.
To keep a record of all updates applied to the database and of particular users who
applied each update, we can modify the system log. Recall that the system log
includes an entry for each operation applied to the database that may be required
for recovery from a transaction failure or system crash. We can expand the log
entries so that they also include the account number of the user and the online
computer or device ID that applied each operation recorded in the log. If any tam-
pering with the database is suspected, a database audit is performed, which consists
of reviewing the log to examine all accesses and operations applied to the database
during a certain time period. When an illegal or unauthorized operation is found,
the DBA can determine the account number used to perform the operation.
Database audits are particularly important for sensitive databases that are updated
by many transactions and users, such as a banking database that is updated by many
bank tellers. A database log that is used mainly for security purposes is sometimes
called an audit trail.
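As a small illustration, the expanded log entries and the audit itself might look like the following sketch; all field names here are hypothetical, since the chapter does not prescribe a particular log format.

from datetime import datetime

# One expanded log entry: the usual recovery information plus the account
# number and the device from which the operation was issued (illustrative fields).
entry = {"timestamp": datetime(2011, 3, 15, 14, 30),
         "account_number": "A-1042", "device_id": "teller-07",
         "operation": "update", "table": "ACCOUNTS", "tuple_id": 991}

def database_audit(log_entries, start, end, operations=("insert", "delete", "update")):
    # Review all operations applied to the database during a given time period.
    return [e for e in log_entries
            if start <= e["timestamp"] <= end and e["operation"] in operations]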
1.5 Sensitive Data and Types of Disclosures
Sensitivity of data is a measure of the importance assigned to the data by its owner,
for the purpose of denoting its need for protection. Some databases contain only
sensitive data while other databases may contain no sensitive data at all. Handling
databases that fall at these two extremes is relatively easy, because these can be cov-
ered by access control, which is explained in the next section. The situation becomes
tricky when some of the data is sensitive while other data is not.
Several factors can cause data to be classified as sensitive:
1. Inherently sensitive. The value of the data itself may be so revealing or con-
fidential that it becomes sensitive—for example, a person’s salary or that a
patient has HIV/AIDS.
2. From a sensitive source. The source of the data may indicate a need for
secrecy—for example, an informer whose identity must be kept secret.
3. Declared sensitive. The owner of the data may have explicitly declared it as
sensitive.
4. A sensitive attribute or sensitive record. The particular attribute or record
may have been declared sensitive—for example, the salary attribute of an
employee or the salary history record in a personnel database.
5. Sensitive in relation to previously disclosed data. Some data may not be
sensitive by itself but will become sensitive in the presence of some other
data—for example, the exact latitude and longitude information for a loca-
tion where some previously recorded event happened that was later deemed
sensitive.
It is the responsibility of the database administrator and security administrator to
collectively enforce the security policies of an organization. This dictates whether
access should be permitted to a certain database attribute (also known as a table col-
umn or a data element) or not for individual users or for categories of users. Several
factors need to be considered before deciding whether it is safe to reveal the data.
The three most important factors are data availability, access acceptability, and
authenticity assurance.
1. Data availability. If a user is updating a field, then this field becomes inac-
cessible and other users should not be able to view this data. This blocking is
only temporary and only to ensure that no user sees any inaccurate data.
This is typically handled by the concurrency control mechanism.
2. Access acceptability. Data should only be revealed to authorized users. A
database administrator may also deny access to a user request even if the
request does not directly access a sensitive data item, on the grounds that the
requested data may reveal information about the sensitive data that the user
is not authorized to have.
3. Authenticity assurance. Before granting access, certain external characteris-
tics about the user may also be considered. For example, a user may only be
permitted access during working hours. The system may track previous
queries to ensure that a combination of queries does not reveal sensitive
data. The latter is particularly relevant to statistical database queries (see
Section 5).
The term precision, when used in the security area, refers to allowing as much as
possible of the data to be available, subject to protecting exactly the subset of data
that is sensitive. The definitions of security versus precision are as follows:
■ Security: Means of ensuring that data is kept safe from corruption and that
access to it is suitably controlled. To provide security means to disclose only
nonsensitive data, and reject any query that references a sensitive field.
■ Precision: To protect all sensitive data while disclosing as much nonsensitive
data as possible.
The ideal combination is to maintain perfect security with maximum precision. If
we want to maintain security, some sacrifice has to be made with precision. Hence
there is typically a tradeoff between security and precision.
1.6 Relationship between Information Security
and Information Privacy
The rapid advancement of the use of information technology (IT) in industry, gov-
ernment, and academia raises challenging questions and problems regarding the
protection and use of personal information. Questions of who has what rights to
information about individuals for which purposes become more important as we
move toward a world in which it is technically possible to know just about anything
about anyone.
Deciding how to design privacy considerations in technology for the future includes
philosophical, legal, and practical dimensions. There is a considerable overlap
between issues related to access to resources (security) and issues related to appro-
priate use of information (privacy). We now define the difference between security and privacy.
Security in information technology refers to many aspects of protecting a system
from unauthorized use, including authentication of users, information encryption,
access control, firewall policies, and intrusion detection. For our purposes here, we
will limit our treatment of security to the concepts associated with how well a sys-
tem can protect access to information it contains. The concept of privacy goes
beyond security. Privacy examines how well the use of personal information that
the system acquires about a user conforms to the explicit or implicit assumptions
regarding that use. From an end user perspective, privacy can be considered from
two different perspectives: preventing storage of personal information versus
ensuring appropriate use of personal information.
For the purposes of this chapter, a simple but useful definition of privacy is the abil-
ity of individuals to control the terms under which their personal information is
acquired and used. In summary, security involves technology to ensure that informa-
tion is appropriately protected. Security is a required building block for privacy to
exist. Privacy involves mechanisms to support compliance with some basic principles
and other explicitly stated policies. One basic principle is that people should be
informed about information collection, told in advance what will be done with their
information, and given a reasonable opportunity to approve of such use of the infor-
mation. A related concept, trust, relates to both security and privacy, and is seen as
increasing when it is perceived that both security and privacy are provided for.
2 Discretionary Access Control Based
on Granting and Revoking Privileges
The typical method of enforcing discretionary access control in a database system
is based on the granting and revoking of privileges. Let us consider privileges in the
context of a relational DBMS. In particular, we will discuss a system of privileges
somewhat similar to the one originally developed for the SQL language. Many cur-
rent relational DBMSs use some variation of this technique. The main idea is to
include statements in the query language that allow the DBA and selected users to
grant and revoke privileges.
2.1 Types of Discretionary Privileges
In SQL2 and later versions,3 the concept of an authorization identifier is used to
refer, roughly speaking, to a user account (or group of user accounts). For simplic-
ity, we will use the words user or account interchangeably in place of authorization
identifier. The DBMS must provide selective access to each relation in the database
based on specific accounts. Operations may also be controlled; thus, having an
account does not necessarily entitle the account holder to all the functionality pro-
vided by the DBMS. Informally, there are two levels for assigning privileges to use
the database system:
■ The account level. At this level, the DBA specifies the particular privileges
that each account holds independently of the relations in the database.
■ The relation (or table) level. At this level, the DBA can control the privilege
to access each individual relation or view in the database.
3Discretionary privileges were incorporated into SQL2 and are applicable to later versions of SQL.
The privileges at the account level apply to the capabilities provided to the account
itself and can include the CREATE SCHEMA or CREATE TABLE privilege, to create a
schema or base relation; the CREATE VIEW privilege; the ALTER privilege, to apply
schema changes such as adding or removing attributes from relations; the DROP
privilege, to delete relations or views; the MODIFY privilege, to insert, delete, or
update tuples; and the SELECT privilege, to retrieve information from the database
by using a SELECT query. Notice that these account privileges apply to the account
in general. If a certain account does not have the CREATE TABLE privilege, no rela-
tions can be created from that account. Account-level privileges are not defined as
part of SQL2; they are left to the DBMS implementers to define. In earlier versions
of SQL, a CREATETAB privilege existed to give an account the privilege to create
tables (relations).
The second level of privileges applies to the relation level, whether they are base
relations or virtual (view) relations. These privileges are defined for SQL2. In the
following discussion, the term relation may refer either to a base relation or to a
view, unless we explicitly specify one or the other. Privileges at the relation level
specify for each user the individual relations on which each type of command can
be applied. Some privileges also refer to individual columns (attributes) of relations.
SQL2 commands provide privileges at the relation and attribute level only. Although
this is quite general, it makes it difficult to create accounts with limited privileges.
The granting and revoking of privileges generally follow an authorization model for
discretionary privileges known as the access matrix model, where the rows of a
matrix M represent subjects (users, accounts, programs) and the columns represent
objects (relations, records, columns, views, operations). Each position M(i, j) in the
matrix represents the types of privileges (read, write, update) that subject i holds on
object j.
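A toy version of the access matrix, using the accounts and relations of the example developed later in Section 2.5, could be represented as follows; the structure is an illustration of the model, not of any particular DBMS catalog.

# M[subject][object] = set of privileges the subject holds on the object.
access_matrix = {
    "A1": {"EMPLOYEE": {"SELECT", "INSERT", "DELETE", "UPDATE"},     # owner of both
           "DEPARTMENT": {"SELECT", "INSERT", "DELETE", "UPDATE"}},
    "A2": {"EMPLOYEE": {"INSERT", "DELETE"},
           "DEPARTMENT": {"INSERT", "DELETE"}},
    "A3": {"EMPLOYEE": {"SELECT"}, "DEPARTMENT": {"SELECT"}},
}

def is_authorized(subject, obj, privilege):
    return privilege in access_matrix.get(subject, {}).get(obj, set())

# Example check: is_authorized("A2", "EMPLOYEE", "SELECT") returns False.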
To control the granting and revoking of relation privileges, each relation R in a data-
base is assigned an owner account, which is typically the account that was used
when the relation was created in the first place. The owner of a relation is given all
privileges on that relation. In SQL2, the DBA can assign an owner to a whole
schema by creating the schema and associating the appropriate authorization iden-
tifier with that schema, using the CREATE SCHEMA command. The owner account
holder can pass privileges on any of the owned relations to other users by granting
privileges to their accounts. In SQL the following types of privileges can be granted
on each individual relation R:
■ SELECT (retrieval or read) privilege on R. Gives the account retrieval privi-
lege. In SQL this gives the account the privilege to use the SELECT statement
to retrieve tuples from R.
■ Modification privileges on R. This gives the account the capability to mod-
ify the tuples of R. In SQL this includes three privileges: UPDATE, DELETE,
and INSERT. These correspond to the three SQL commands (see Section 4.4)
for modifying a table R. Additionally, both the INSERT and UPDATE privi-
leges can specify that only certain attributes of R can be modified by the
account.
■ References privilege on R. This gives the account the capability to reference
(or refer to) a relation R when specifying integrity constraints. This privilege
can also be restricted to specific attributes of R.
Notice that to create a view, the account must have the SELECT privilege on all rela-
tions involved in the view definition in order to specify the query that corresponds
to the view.
2.2 Specifying Privileges through the Use of Views
The mechanism of views is an important discretionary authorization mechanism in
its own right. For example, if the owner A of a relation R wants another account B to
be able to retrieve only some fields of R, then A can create a view V of R that
includes only those attributes and then grant SELECT on V to B. The same applies
to limiting B to retrieving only certain tuples of R; a view V′ can be created by defin-
ing the view by means of a query that selects only those tuples from R that A wants
to allow B to access. We will illustrate this discussion with the example given in
Section 2.5.
2.3 Revoking of Privileges
In some cases it is desirable to grant a privilege to a user temporarily. For example,
the owner of a relation may want to grant the SELECT privilege to a user for a spe-
cific task and then revoke that privilege once the task is completed. Hence, a mech-
anism for revoking privileges is needed. In SQL a REVOKE command is included for
the purpose of canceling privileges. We will see how the REVOKE command is used
in the example in Section 2.5.
2.4 Propagation of Privileges Using the GRANT OPTION
Whenever the owner A of a relation R grants a privilege on R to another account B,
the privilege can be given to B with or without the GRANT OPTION. If the GRANT
OPTION is given, this means that B can also grant that privilege on R to other
accounts. Suppose that B is given the GRANT OPTION by A and that B then grants
the privilege on R to a third account C, also with the GRANT OPTION. In this way,
privileges on R can propagate to other accounts without the knowledge of the
owner of R. If the owner account A now revokes the privilege granted to B, all the
privileges that B propagated based on that privilege should automatically be revoked
by the system.
It is possible for a user to receive a certain privilege from two or more sources. For
example, A4 may receive a certain UPDATE R privilege from both A2 and A3. In such
a case, if A2 revokes this privilege from A4, A4 will still continue to have the privilege
by virtue of having been granted it from A3. If A3 later revokes the privilege from
A4, A4 totally loses the privilege. Hence, a DBMS that allows propagation of privi-
leges must keep track of how all the privileges were granted so that revoking of priv-
ileges can be done correctly and completely.
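One way to keep track of how privileges were granted, so that revocation cascades correctly, is to record each grant together with its grantor, as in the sketch below; the tuple format and the owner parameter are assumptions made only for illustration.

def revoke(grants, owner, grantor, grantee, privilege):
    # grants is a list of (grantor, grantee, privilege) records kept by the DBMS.
    grants[:] = [g for g in grants if g != (grantor, grantee, privilege)]

    def still_holds(user, priv, seen=frozenset()):
        # A user still holds a privilege if some chain of grants leads back to the owner.
        for frm, to, p in grants:
            if to == user and p == priv and frm not in seen:
                if frm == owner or still_holds(frm, priv, seen | {user}):
                    return True
        return False

    # Cascade: drop every grant whose grantor no longer holds the privilege it passed on.
    changed = True
    while changed:
        changed = False
        for g in list(grants):
            frm, to, p = g
            if frm != owner and not still_holds(frm, p):
                grants.remove(g)
                changed = True

# Example from the text: A4 received UPDATE from both A2 and A3 (who got it from owner A1).
grants = [("A1", "A2", "UPDATE"), ("A1", "A3", "UPDATE"),
          ("A2", "A4", "UPDATE"), ("A3", "A4", "UPDATE")]
revoke(grants, owner="A1", grantor="A2", grantee="A4", privilege="UPDATE")
# A4 still holds UPDATE through A3; revoking it from A3 as well would remove it completely.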
2.5 An Example to Illustrate Granting and Revoking
of Privileges
Suppose that the DBA creates four accounts—A1, A2, A3, and A4—and wants only
A1 to be able to create base relations. To do this, the DBA must issue the following
GRANT command in SQL:
GRANT CREATETAB TO A1;
The CREATETAB (create table) privilege gives account A1 the capability to create
new database tables (base relations) and is hence an account privilege. This privilege
was part of earlier versions of SQL but is now left to each individual system imple-
mentation to define.
In SQL2 the same effect can be accomplished by having the DBA issue a CREATE
SCHEMA command, as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
User account A1 can now create tables under the schema called EXAMPLE. To con-
tinue our example, suppose that A1 creates the two base relations EMPLOYEE and
DEPARTMENT shown in Figure 1; A1 is then the owner of these two relations and
hence has all the relation privileges on each of them.
Next, suppose that account A1 wants to grant to account A2 the privilege to insert
and delete tuples in both of these relations. However, A1 does not want A2 to be able
to propagate these privileges to additional accounts. A1 can issue the following com-
mand:
GRANT INSERT, DELETE ON EMPLOYEE, DEPARTMENT TO A2;
Notice that the owner account A1 of a relation automatically has the GRANT
OPTION, allowing it to grant privileges on the relation to other accounts. However,
account A2 cannot grant INSERT and DELETE privileges on the EMPLOYEE and
DEPARTMENT tables because A2 was not given the GRANT OPTION in the preceding
command.
Next, suppose that A1 wants to allow account A3 to retrieve information from either
of the two tables and also to be able to propagate the SELECT privilege to other
accounts. A1 can issue the following command:
GRANT SELECT ON EMPLOYEE, DEPARTMENT TO A3 WITH GRANT OPTION;
Figure 1
Schemas for the two relations EMPLOYEE and DEPARTMENT:
EMPLOYEE(Name, Ssn, Bdate, Address, Sex, Salary, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn)
The clause WITH GRANT OPTION means that A3 can now propagate the privilege to
other accounts by using GRANT. For example, A3 can grant the SELECT privilege on
the EMPLOYEE relation to A4 by issuing the following command:
GRANT SELECT ON EMPLOYEE TO A4;
Notice that A4 cannot propagate the SELECT privilege to other accounts because
the GRANT OPTION was not given to A4.
Now suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE
relation from A3; A1 then can issue this command:
REVOKE SELECT ON EMPLOYEE FROM A3;
The DBMS must now revoke the SELECT privilege on EMPLOYEE from A3, and it
must also automatically revoke the SELECT privilege on EMPLOYEE from A4. This is
because A3 granted that privilege to A4, but A3 does not have the privilege any
more.
Next, suppose that A1 wants to give back to A3 a limited capability to SELECT from
the EMPLOYEE relation and wants to allow A3 to be able to propagate the privilege.
The limitation is to retrieve only the Name, Bdate, and Address attributes and only
for the tuples with Dno = 5. A1 then can create the following view:
CREATE VIEW A3EMPLOYEE AS
SELECT Name, Bdate, Address
FROM EMPLOYEE
WHERE Dno = 5;
After the view is created, A1 can grant SELECT on the view A3EMPLOYEE to A3 as
follows:
GRANT SELECT ON A3EMPLOYEE TO A3 WITH GRANT OPTION;
Finally, suppose that A1 wants to allow A4 to update only the Salary attribute of
EMPLOYEE; A1 can then issue the following command:
GRANT UPDATE ON EMPLOYEE (Salary) TO A4;
The UPDATE and INSERT privileges can specify particular attributes that may be
updated or inserted in a relation. Other privileges (SELECT, DELETE) are not attrib-
ute specific, because this specificity can easily be controlled by creating the appro-
priate views that include only the desired attributes and granting the corresponding
privileges on the views. However, because updating views is not always possible, the
UPDATE and INSERT privileges are given the option to specify the particular attrib-
utes of a base relation that may be updated.
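For instance, A1 could confine A4 to supplying values for only a few attributes of EMPLOYEE when inserting new tuples; a small sketch (the attribute list here is chosen purely for illustration):
GRANT INSERT (Name, Ssn, Dno) ON EMPLOYEE TO A4;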
2.6 Specifying Limits on Propagation of Privileges
Techniques to limit the propagation of privileges have been developed, although
they have not yet been implemented in most DBMSs and are not a part of SQL.
Limiting horizontal propagation to an integer number i means that an account B
given the GRANT OPTION can grant the privilege to at most i other accounts.
Vertical propagation is more complicated; it limits the depth of the granting of
privileges. Granting a privilege with a vertical propagation of zero is equivalent to
granting the privilege with no GRANT OPTION. If account A grants a privilege to
account B with the vertical propagation set to an integer number j > 0, this means
that the account B has the GRANT OPTION on that privilege, but B can grant the
privilege to other accounts only with a vertical propagation less than j. In effect, ver-
tical propagation limits the sequence of GRANT OPTIONS that can be given from
one account to the next based on a single original grant of the privilege.
We briefly illustrate horizontal and vertical propagation limits—which are not
available currently in SQL or other relational systems—with an example. Suppose
that A1 grants SELECT to A2 on the EMPLOYEE relation with horizontal propaga-
tion equal to 1 and vertical propagation equal to 2. A2 can then grant SELECT to at
most one account because the horizontal propagation limitation is set to 1.
Additionally, A2 cannot grant the privilege to another account except with vertical
propagation set to 0 (no GRANT OPTION) or 1; this is because A2 must reduce the
vertical propagation by at least 1 when passing the privilege to others. In addition,
the horizontal propagation must be less than or equal to the originally granted hor-
izontal propagation. For example, if account A grants a privilege to account B with
the horizontal propagation set to an integer number j > 0, this means that B can
grant the privilege to other accounts only with a horizontal propagation less than or
equal to j. As this example shows, horizontal and vertical propagation techniques are
designed to limit the depth and breadth of propagation of privileges.
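Because no such clauses exist in SQL, the grant in this example can only be written in some hypothetical extended syntax, for instance:
GRANT SELECT ON EMPLOYEE TO A2
WITH GRANT OPTION (HORIZONTAL PROPAGATION 1, VERTICAL PROPAGATION 2);
This notation is purely illustrative; a system supporting these limits would have to record the remaining horizontal and vertical allowances with every grant it stores.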
3 Mandatory Access Control and Role-Based
Access Control for Multilevel Security
The discretionary access control technique of granting and revoking privileges on
relations has traditionally been the main security mechanism for relational database
systems. This is an all-or-nothing method: A user either has or does not have a cer-
tain privilege. In many applications, an additional security policy is needed that clas-
sifies data and users based on security classes. This approach, known as mandatory
access control (MAC), would typically be combined with the discretionary access
control mechanisms described in Section 2. It is important to note that most com-
mercial DBMSs currently provide mechanisms only for discretionary access con-
trol. However, the need for multilevel security exists in government, military, and
intelligence applications, as well as in many industrial and corporate applications.
Some DBMS vendors—for example, Oracle—have released special versions of their
RDBMSs that incorporate mandatory access control for government use.
Typical security classes are top secret (TS), secret (S), confidential (C), and unclas-
sified (U), where TS is the highest level and U the lowest. Other more complex secu-
rity classification schemes exist, in which the security classes are organized in a
lattice. For simplicity, we will use the system with four security classification levels,
where TS ≥ S ≥ C ≥ U, to illustrate our discussion. The commonly used model for
multilevel security, known as the Bell-LaPadula model, classifies each subject (user,
account, program) and object (relation, tuple, column, view, operation) into one of
the security classifications TS, S, C, or U. We will refer to the clearance (classifica-
tion) of a subject S as class(S) and to the classification of an object O as class(O).
Two restrictions are enforced on data access based on the subject/object classifica-
tions:
1. A subject S is not allowed read access to an object O unless class(S) ≥
class(O). This is known as the simple security property.
2. A subject S is not allowed to write an object O unless class(S) ≤ class(O). This
is known as the star property (or *-property).
The first restriction is intuitive and enforces the obvious rule that no subject can
read an object whose security classification is higher than the subject’s security
clearance. The second restriction is less intuitive. It prohibits a subject from writing
an object at a lower security classification than the subject’s security clearance.
Violation of this rule would allow information to flow from higher to lower classifi-
cations, which violates a basic tenet of multilevel security. For example, a user (sub-
ject) with TS clearance may make a copy of an object with classification TS and then
write it back as a new object with classification U, thus making it visible throughout
the system.
To incorporate multilevel security notions into the relational database model, it is
common to consider attribute values and tuples as data objects. Hence, each attrib-
ute A is associated with a classification attribute C in the schema, and each attrib-
ute value in a tuple is associated with a corresponding security classification. In
addition, in some models, a tuple classification attribute TC is added to the relation
attributes to provide a classification for each tuple as a whole. The model we
describe here is known as the multilevel model, because it allows classifications at
multiple security levels. A multilevel relation schema R with n attributes would be
represented as:
R(A1, C1, A2, C2, …, An, Cn, TC)
where each Ci represents the classification attribute associated with attribute Ai.
Each attribute classification Ci provides a finer security classification for the corresponding attribute value within the tuple, while the value of the tuple classification attribute TC in each tuple t, which is the highest of all the attribute classification values Ci within t, provides a general classification for the tuple as a whole.
The apparent key of a multilevel relation is the set of attributes that would have
formed the primary key in a regular (single-level) relation. A multilevel relation will
appear to contain different data to subjects (users) with different clearance levels. In
some cases, it is possible to store a single tuple in the relation at a higher classifica-
tion level and produce the corresponding tuples at a lower-level classification
through a process known as filtering. In other cases, it is necessary to store two or
more tuples at different classification levels with the same value for the apparent key.
This leads to the concept of polyinstantiation,4 where several tuples can have the
same apparent key value but have different attribute values for users at different
clearance levels.
We illustrate these concepts with the simple example of a multilevel relation shown
in Figure 2(a), where we display the classification attribute values next to each
attribute’s value. Assume that the Name attribute is the apparent key, and consider
the query SELECT * FROM EMPLOYEE. A user with security clearance S would see
the same relation shown in Figure 2(a), since all tuple classifications are less than or
equal to S. However, a user with security clearance C would not be allowed to see the
values for Salary of ‘Brown’ and Job_performance of ‘Smith’, since they have higher
classification. The tuples would be filtered to appear as shown in Figure 2(b), with
Salary and Job_performance appearing as null. For a user with security clearance U,
the filtering allows only the Name attribute of ‘Smith’ to appear, with all the other
Figure 2
A multilevel relation to illustrate multilevel security.

(a) The original EMPLOYEE tuples:
Name       Salary      Job_performance    TC
Smith U    40000 C     Fair S             S
Brown C    80000 S     Good C             S

(b) Appearance of EMPLOYEE after filtering for classification C users:
Name       Salary      Job_performance    TC
Smith U    40000 C     NULL C             C
Brown C    NULL C      Good C             C

(c) Appearance of EMPLOYEE after filtering for classification U users:
Name       Salary      Job_performance    TC
Smith U    NULL U      NULL U             U

(d) Polyinstantiation of the Smith tuple:
Name       Salary      Job_performance    TC
Smith U    40000 C     Fair S             S
Smith U    40000 C     Excellent C        C
Brown C    80000 S     Good C             S
4This is similar to the notion of having multiple versions in the database that represent the same real-
world object.
attributes appearing as null (Figure 2(c)). Thus, filtering introduces null values for
attribute values whose security classification is higher than the user’s security clear-
ance.
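As a rough sketch of how such filtering could be expressed, assume the classifications are stored as orderable codes (say U = 1, C = 2, S = 3, TS = 4) in columns named C_name, C_salary, and C_jobperf; these names and codes are not taken from any particular multilevel DBMS. The view seen by a clearance-C user in Figure 2(b) could then be produced by:
SELECT Name,
CASE WHEN C_salary <= 2 THEN Salary ELSE NULL END AS Salary,
CASE WHEN C_jobperf <= 2 THEN Job_performance ELSE NULL END AS Job_performance
FROM EMPLOYEE
WHERE C_name <= 2;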
In general, the entity integrity rule for multilevel relations states that all attributes
that are members of the apparent key must not be null and must have the same
security classification within each individual tuple. Additionally, all other attribute
values in the tuple must have a security classification greater than or equal to that of
the apparent key. This constraint ensures that a user can see the key if the user is
permitted to see any part of the tuple. Other integrity rules, called null integrity
and interinstance integrity, informally ensure that if a tuple value at some security
level can be filtered (derived) from a higher-classified tuple, then it is sufficient to
store the higher-classified tuple in the multilevel relation.
To illustrate polyinstantiation further, suppose that a user with security clearance C
tries to update the value of Job_performance of ‘Smith’ in Figure 2 to ‘Excellent’; this
corresponds to the following SQL update being submitted by that user:
UPDATE EMPLOYEE
SET Job_performance = ‘Excellent’
WHERE Name = ‘Smith’;
Since the view provided to users with security clearance C (see Figure 2(b)) permits
such an update, the system should not reject it; otherwise, the user could infer that
some nonnull value exists for the Job_performance attribute of ‘Smith’ rather than
the null value that appears. This is an example of inferring information through
what is known as a covert channel, which should not be permitted in highly secure
systems (see Section 6.1). However, the user should not be allowed to overwrite the
existing value of Job_performance at the higher classification level. The solution is to
create a polyinstantiation for the ‘Smith’ tuple at the lower classification level C, as
shown in Figure 2(d). This is necessary since the new tuple cannot be filtered from
the existing tuple at classification S.
The basic update operations of the relational model (INSERT, DELETE, UPDATE)
must be modified to handle this and similar situations, but this aspect of the prob-
lem is outside the scope of our presentation. We refer the interested reader to the
Selected Bibliography at the end of this chapter for further details.
3.1 Comparing Discretionary Access Control
and Mandatory Access Control
Discretionary access control (DAC) policies are characterized by a high degree of
flexibility, which makes them suitable for a large variety of application domains.
The main drawback of DAC models is their vulnerability to malicious attacks, such
as Trojan horses embedded in application programs. The reason is that discre-
tionary authorization models do not impose any control on how information is
propagated and used once it has been accessed by users authorized to do so. By con-
trast, mandatory policies ensure a high degree of protection—in a way, they prevent
any illegal flow of information. Therefore, they are suitable for military and high
security types of applications, which require a higher degree of protection.
However, mandatory policies have the drawback of being too rigid in that they
require a strict classification of subjects and objects into security levels, and there-
fore they are applicable to few environments. In many practical situations, discre-
tionary policies are preferred because they offer a better tradeoff between security
and applicability.
3.2 Role-Based Access Control
Role-based access control (RBAC) emerged rapidly in the 1990s as a proven tech-
nology for managing and enforcing security in large-scale enterprise-wide systems.
Its basic notion is that privileges and other permissions are associated with organi-
zational roles, rather than individual users. Individual users are then assigned to
appropriate roles. Roles can be created using the CREATE ROLE and DESTROY
ROLE commands. The GRANT and REVOKE commands discussed in Section 2 can
then be used to assign and revoke privileges from roles, as well as for individual
users when needed. For example, a company may have roles such as sales account
manager, purchasing agent, mailroom clerk, department manager, and so on.
Multiple individuals can be assigned to each role. Security privileges that are com-
mon to a role are granted to the role name, and any individual assigned to this role
would automatically have those privileges granted.
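As a small SQL sketch of this idea (the role and table names are invented for illustration):
CREATE ROLE sales_account_manager;
GRANT SELECT, UPDATE ON CUSTOMER TO sales_account_manager;
GRANT sales_account_manager TO A2;
Any privilege later granted to sales_account_manager automatically becomes available to every account assigned to that role.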
RBAC can be used with traditional discretionary and mandatory access controls; it
ensures that only authorized users in their specified roles are given access to certain
data or resources. Users create sessions during which they may activate a subset of
roles to which they belong. Each session can be assigned to several roles, but it maps
to one user or a single subject only. Many DBMSs support the concept of roles,
where privileges can be assigned to roles.
Separation of duties is another important requirement in various commercial
DBMSs. It is needed to prevent one user from doing work that requires the involve-
ment of two or more people, thus preventing collusion. One method in which sepa-
ration of duties can be successfully implemented is with mutual exclusion of roles.
Two roles are said to be mutually exclusive if both the roles cannot be used simul-
taneously by the user. Mutual exclusion of roles can be categorized into two types,
namely authorization time exclusion (static) and runtime exclusion (dynamic). In
authorization time exclusion, two roles that have been specified as mutually exclu-
sive cannot be part of a user’s authorization at the same time. In runtime exclusion,
both these roles can be authorized to one user but cannot be activated by the user at
the same time. Another variation in mutual exclusion of roles is that of complete
and partial exclusion.
The role hierarchy in RBAC is a natural way to organize roles to reflect the organi-
zation’s lines of authority and responsibility. By convention, junior roles at the
bottom are connected to progressively senior roles as one moves up the hierarchy.
The hierarchic diagrams are partial orders, so they are reflexive, transitive, and
antisymmetric. In other words, if a user has one role, the user automatically has
roles lower in the hierarchy. Defining a role hierarchy involves choosing the type of
hierarchy and the roles, and then implementing the hierarchy by granting roles to
other roles. Role hierarchy can be implemented in the following manner:
GRANT ROLE full_time TO employee_type1;
GRANT ROLE intern TO employee_type2;
The above are examples of granting the roles full_time and intern to two types of
employees.
Another issue related to security is identity management. Identity refers to a unique
name of an individual person. Since the legal names of persons are not necessarily
unique, the identity of a person must include sufficient additional information to
make the complete name unique. Authorizing this identity and managing the
schema of these identities is called Identity Management. Identity Management
addresses how organizations can effectively authenticate people and manage their
access to confidential information. It has become more visible as a business require-
ment across all industries affecting organizations of all sizes. Identity Management
administrators constantly need to satisfy application owners while keeping expendi-
tures under control and increasing IT efficiency.
Another important consideration in RBAC systems is the possible temporal con-
straints that may exist on roles, such as the time and duration of role activations,
and timed triggering of a role by an activation of another role. Using an RBAC
model is a highly desirable goal for addressing the key security requirements of
Web-based applications. Roles can be assigned to workflow tasks so that a user with
any of the roles related to a task may be authorized to execute it and may play a cer-
tain role only for a certain duration.
RBAC models have several desirable features, such as flexibility, policy neutrality,
better support for security management and administration, and other aspects that
make them attractive candidates for developing secure Web-based applications.
These features are lacking in DAC and MAC models. In addition, RBAC models
include the capabilities available in traditional DAC and MAC policies.
Furthermore, an RBAC model provides mechanisms for addressing the security
issues related to the execution of tasks and workflows, and for specifying user-
defined and organization-specific policies. Easier deployment over the Internet has
been another reason for the success of RBAC models.
3.3 Label-Based Security and Row-Level Access Control
Many commercial DBMSs currently use the concept of row-level access control,
where sophisticated access control rules can be implemented by considering the
data row by row. In row-level access control, each data row is given a label, which is
used to store information about data sensitivity. Row-level access control provides
finer granularity of data security by allowing the permissions to be set for each row
and not just for the table or column. Initially, the user is given a default session label by the database administrator. The levels form a hierarchy of data sensitivity with respect to exposure or corruption, with the goal of maintaining privacy or security.
Labels are used to prevent unauthorized users from viewing or altering certain data.
A user having a low authorization level, usually represented by a low number, is
denied access to data having a higher-level number. If no such label is given to a row,
a row label is automatically assigned to it depending upon the user’s session label.
A policy defined by an administrator is called a Label Security policy. Whenever
data affected by the policy is accessed or queried through an application, the policy
is automatically invoked. When a policy is implemented, a new column is added to
each row in the schema. The added column contains the label for each row that
reflects the sensitivity of the row as per the policy. Similar to MAC, where each user
has a security clearance, each user has an identity in label-based security. This user’s
identity is compared to the label assigned to each row to determine whether the user
has access to view the contents of that row. However, the user can write the label
value himself, within certain restrictions and guidelines for that specific row. This
label can be set to a value that is between the user’s current session label and the
user’s minimum level. The DBA has the privilege to set an initial default row label.
The Label Security requirements are applied on top of the DAC requirements for
each user. Hence, the user must satisfy the DAC requirements and then the label
security requirements to access a row. The DAC requirements make sure that the
user is legally authorized to carry out that operation on the schema. In most applica-
tions, only some of the tables need label-based security. For the majority of the
application tables, the protection provided by DAC is sufficient.
Security policies are generally created by managers and human resources personnel.
The policies are high-level, technology neutral, and relate to risks. Policies are a
result of management instructions to specify organizational procedures, guiding
principles, and courses of action that are considered to be expedient, prudent, or
advantageous. Policies are typically accompanied by a definition of penalties and
countermeasures if the policy is transgressed. These policies are then interpreted
and converted to a set of label-oriented policies by the Label Security administra-
tor, who defines the security labels for data and authorizations for users; these labels
and authorizations govern access to specified protected objects.
Suppose a user has SELECT privileges on a table. When the user executes a SELECT
statement on that table, Label Security will automatically evaluate each row
returned by the query to determine whether the user has rights to view the data. For
example, if the user has a sensitivity of 20, then the user can view all rows having a
security level of 20 or lower. The level determines the sensitivity of the information
contained in a row; the more sensitive the row, the higher its security label value.
Such Label Security can be configured to perform security checks on UPDATE,
DELETE, and INSERT statements as well.
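Although the exact syntax is product-specific, the effect of such a check can be pictured as a predicate implicitly appended to every query; a sketch, assuming each row carries a numeric label column row_label and the current session level is 20:
SELECT * FROM EMPLOYEE
WHERE row_label <= 20;
Rows labeled above the session level are simply filtered out of the result, without any error being reported to the user.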
3.4 XML Access Control
With the worldwide use of XML in commercial and scientific applications, efforts
are under way to develop security standards. Among these efforts are digital
signatures and encryption standards for XML. The XML Signature Syntax and
Processing specification describes an XML syntax for representing the associations
between cryptographic signatures and XML documents or other electronic
resources. The specification also includes procedures for computing and verifying
XML signatures. An XML digital signature differs from other protocols for message
signing, such as PGP (Pretty Good Privacy—a confidentiality and authentication
service that can be used for electronic mail and file storage application), in its sup-
port for signing only specific portions of the XML tree rather than the complete
document. Additionally, the XML signature specification defines mechanisms for
countersigning and for transformations (so-called canonicalization) that ensure that two instances of the same text produce the same digest for signing even if their representations differ slightly, for example, in typographic white space.
The XML Encryption Syntax and Processing specification defines XML vocabulary
and processing rules for protecting confidentiality of XML documents in whole or
in part and of non-XML data as well. The encrypted content and additional pro-
cessing information for the recipient are represented in well-formed XML so that
the result can be further processed using XML tools. In contrast to other commonly
used technologies for confidentiality such as SSL (Secure Sockets Layer—a leading
Internet security protocol), and virtual private networks, XML encryption also
applies to parts of documents and to documents in persistent storage.
3.5 Access Control Policies for E-Commerce and the Web
Electronic commerce (e-commerce) environments are characterized by any trans-
actions that are done electronically. They require elaborate access control policies
that go beyond traditional DBMSs. In conventional database environments, access
control is usually performed using a set of authorizations stated by security officers
or users according to some security policies. Such a simple paradigm is not
well suited for a dynamic environment like e-commerce. Furthermore, in an
e-commerce environment the resources to be protected are not only traditional data
but also knowledge and experience. Such peculiarities call for more flexibility in
specifying access control policies. The access control mechanism must be flexible
enough to support a wide spectrum of heterogeneous protection objects.
A second related requirement is the support for content-based access control.
Content-based access control allows one to express access control policies that take
the protection object content into account. In order to support content-based access
control, access control policies must allow inclusion of conditions based on the
object content.
A third requirement is related to the heterogeneity of subjects, which requires access
control policies based on user characteristics and qualifications rather than on spe-
cific and individual characteristics (for example, user IDs). A possible solution, to
better take into account user profiles in the formulation of access control policies, is
to support the notion of credentials. A credential is a set of properties concerning a
user that are relevant for security purposes (for example, age or position or role
within an organization). For instance, by using credentials, one can simply formu-
late policies such as Only permanent staff with five or more years of service can access
documents related to the internals of the system.
XML is expected to play a key role in access control for
e-commerce applications5 because XML is becoming the common representation
language for document interchange over the Web, and is also becoming the lan-
guage for e-commerce. Thus, on the one hand there is the need to make XML repre-
sentations secure, by providing access control mechanisms specifically tailored to
the protection of XML documents. On the other hand, access control information
(that is, access control policies and user credentials) can be expressed using XML
itself. The Directory Services Markup Language (DSML) is a representation of
directory service information in XML syntax. It provides a foundation for a stan-
dard for communicating with the directory services that will be responsible for pro-
viding and authenticating user credentials. The uniform presentation of both
protection objects and access control policies can be applied to policies and creden-
tials themselves. For instance, some credential properties (such as the user name)
may be accessible to everyone, whereas other properties may be visible only to a
restricted class of users. Additionally, the use of an XML-based language for specify-
ing credentials and access control policies facilitates secure credential submission
and export of access control policies.
4 SQL Injection
SQL Injection is one of the most common threats to a database system. We will dis-
cuss it in detail later in this section. Some of the other attacks on databases that are
quite frequent are:
■ Unauthorized privilege escalation. This attack is characterized by an indi-
vidual attempting to elevate his or her privilege by attacking vulnerable
points in the database systems.
■ Privilege abuse. While the previous attack is done by an unauthorized user,
this attack is performed by a privileged user. For example, an administrator
who is allowed to change student information can use this privilege to
update student grades without the instructor’s permission.
■ Denial of service. A Denial of Service (DOS) attack is an attempt to make
resources unavailable to its intended users. It is a general attack category in
which access to network applications or data is denied to intended users by
overflowing the buffer or consuming resources.
■ Weak Authentication. If the user authentication scheme is weak, an attacker
can impersonate the identity of a legitimate user by obtaining their login
credentials.
5See Thuraisingham et al. (2001).
4.1 SQL Injection Methods
Web programs and applications that access a database can send commands and data
to the database, as well as display data retrieved from the database through the Web
browser. In an SQL Injection attack, the attacker injects a string input through the
application, which changes or manipulates the SQL statement to the attacker’s
advantage. An SQL Injection attack can harm the database in various ways, such as
unauthorized manipulation of the database, or retrieval of sensitive data. It can also
be used to execute system level commands that may cause the system to deny serv-
ice to the application. This section describes types of injection attacks.
SQL Manipulation. A manipulation attack, which is the most common type of
injection attack, changes an SQL command in the application—for example, by
adding conditions to the WHERE-clause of a query, or by expanding a query with
additional query components using set operations such as UNION, INTERSECT, or
MINUS. Other types of manipulation attacks are also possible. A typical manipula-
tion attack occurs during database login. For example, suppose that a simplistic
authentication procedure issues the following query and checks to see if any rows
were returned:
SELECT * FROM users WHERE username = 'jake' and PASSWORD = 'jakespasswd';
The attacker can try to change (or manipulate) the SQL statement, by changing it as
follows:
SELECT * FROM users WHERE username = 'jake' and (PASSWORD = 'jakespasswd' or 'x' = 'x');
As a result, the attacker who knows that ‘jake’ is a valid login of some user is able to
log into the database system as ‘jake’ without knowing his password and is able to do
everything that ‘jake’ may be authorized to do to the database system.
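Such a manipulation typically requires no access to the application code; the attacker merely types a crafted value into the password field of the login form. For instance, if the application builds the query by string concatenation, entering the value jakespasswd' or 'x'='x produces a statement whose WHERE condition is always true:
SELECT * FROM users WHERE username = 'jake' and PASSWORD = 'jakespasswd' or 'x'='x';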
Code Injection. This type of attack attempts to add additional SQL statements or
commands to the existing SQL statement by exploiting a computer bug, which is
caused by processing invalid data. The attacker can inject or introduce code into a
computer program to change the course of execution. Code injection is a popular
technique for system hacking or cracking to gain information.
Function Call Injection. In this kind of attack, a database function or operating
system function call is inserted into a vulnerable SQL statement to manipulate the
data or make a privileged system call. For example, it is possible to exploit a function
that performs some aspect related to network communication. In addition, func-
tions that are contained in a customized database package, or any custom database
function, can be executed as part of an SQL query. In particular, dynamically cre-
ated SQL queries can be exploited since they are constructed at run time.
For example, the dual table is used in the FROM clause of SQL in Oracle when a user
needs to run SQL that does not logically have a table name. To get today’s date, we
can use:
SELECT SYSDATE FROM dual;
The following example demonstrates that even the simplest SQL statements can be
vulnerable.
SELECT TRANSLATE (‘user input’, ‘from_string’, ‘to_string’) FROM dual;
Here, TRANSLATE is used to replace a string of characters with another string of
characters. The TRANSLATE function above will replace the characters of the
‘from_string’ with the characters in the ‘to_string’ one by one. This means that the f
will be replaced with the t, the r with the o, the o with the _, and so on.
This type of SQL statement can be subjected to a function injection attack. Consider
the following example:
SELECT TRANSLATE ('' || UTL_HTTP.REQUEST ('http://129.107.2.1/') || '',
'98765432', '9876') FROM dual;
The user can input the string ('' || UTL_HTTP.REQUEST ('http://129.107.2.1/')
|| ''), where || is the concatenation operator, thus requesting a page from a Web server.
UTL_HTTP makes Hypertext Transfer Protocol (HTTP) callouts from SQL. The
REQUEST object takes a URL (‘http://129.107.2.1/’ in this example) as a parameter,
contacts that site, and returns the data (typically HTML) obtained from that site.
The attacker could manipulate the string he inputs, as well as the URL, to include
other functions and do other illegal operations. We just used a dummy example to
show conversion of ‘98765432’ to ‘9876’, but the user’s intent would be to access the
URL and get sensitive information. The attacker can then retrieve useful informa-
tion from the database server—located at the URL that is passed as a parameter—
and send it to the Web server (that calls the TRANSLATE function).
4.2 Risks Associated with SQL Injection
SQL injection is harmful and the risks associated with it provide motivation for
attackers. Some of the risks associated with SQL injection attacks are explained
below.
■ Database Fingerprinting. The attacker can determine the type of database
being used in the backend so that he can use database-specific attacks that
correspond to weaknesses in a particular DBMS.
■ Denial of Service. The attacker can flood the server with requests, thus
denying service to valid users, or they can delete some data.
■ Bypassing Authentication. This is one of the most common risks, in which
the attacker can gain access to the database as an authorized user and per-
form all the desired tasks.
■ Identifying Injectable Parameters. In this type of attack, the attacker gath-
ers important information about the type and structure of the back-end
database of a Web application. This attack is made possible by the fact
that the default error page returned by application servers is often overly
descriptive.
■ Executing Remote Commands. This provides attackers with a tool to exe-
cute arbitrary commands on the database. For example, a remote user can
execute stored database procedures and functions from a remote SQL inter-
active interface.
■ Performing Privilege Escalation. This type of attack takes advantage of log-
ical flaws within the database to upgrade the access level.
4.3 Protection Techniques against SQL Injection
Protection against SQL injection attacks can be achieved by applying certain pro-
gramming rules to all Web-accessible procedures and functions. This section
describes some of these techniques.
Bind Variables (Using Parameterized Statements). The use of bind variables
(also known as parameters) protects against injection attacks and also improves per-
formance.
Consider the following example using Java and JDBC:
PreparedStatement stmt = conn.prepareStatement( “SELECT * FROM
EMPLOYEE WHERE EMPLOYEE_ID=? AND PASSWORD=?”);
stmt.setString(1, employee_id);
stmt.setString(2, password);
Instead of embedding the user input into the SQL string, each input is bound to a parameter marker (the ? placeholders). In this example, the value of employee_id is bound to the first parameter and the value of password to the second, instead of concatenating the user-supplied strings directly into the statement.
Filtering Input (Input Validation). This technique can be used to remove escape
characters from input strings by using the SQL Replace function. For example, the
delimiter single quote (‘) can be replaced by two single quotes (‘’). Some SQL
Manipulation attacks can be prevented by using this technique, since escape charac-
ters can be used to inject manipulation attacks. However, because there can be a
large number of escape characters, this technique is not reliable.
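A minimal sketch of the escaping step, reusing the dual table from the earlier example; inside a SQL string literal a single quote is written as two quotes, so '''' below denotes a one-character string containing just a quote, and '''''' a two-character string of two quotes:
SELECT REPLACE ('O''Connor', '''', '''''') FROM dual;
The sample value O'Connor comes back as O''Connor, which can then be spliced into a quoted literal without terminating it prematurely.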
Function Security. Database functions, both standard and custom, should be
restricted, as they can be exploited in the SQL function injection attacks.
5 Introduction to Statistical Database Security
Statistical databases are used mainly to produce statistics about various popula-
tions. The database may contain confidential data about individuals, which should
be protected from user access. However, users are permitted to retrieve statistical
information about the populations, such as averages, sums, counts, maximums,
minimums, and standard deviations. The techniques that have been developed to
protect the privacy of individual information are beyond the scope of this text. We
will illustrate the problem with a very simple example, which refers to the relation
shown in Figure 3. This is a PERSON relation with the attributes Name, Ssn, Income,
Address, City, State, Zip, Sex, and Last_degree.
A population is a set of tuples of a relation (table) that satisfy some selection condi-
tion. Hence, each selection condition on the PERSON relation will specify a partic-
ular population of PERSON tuples. For example, the condition Sex = ‘M’ specifies
the male population; the condition ((Sex = ‘F’) AND (Last_degree = ‘M.S.’ OR
Last_degree = ‘Ph.D.’)) specifies the female population that has an M.S. or Ph.D.
degree as their highest degree; and the condition City = ‘Houston’ specifies the pop-
ulation that lives in Houston.
Statistical queries involve applying statistical functions to a population of tuples.
For example, we may want to retrieve the number of individuals in a population or
the average income in the population. However, statistical users are not allowed to
retrieve individual data, such as the income of a specific person. Statistical database
security techniques must prohibit the retrieval of individual data. This can be
achieved by prohibiting queries that retrieve attribute values and by allowing only
queries that involve statistical aggregate functions such as COUNT, SUM, MIN, MAX,
AVERAGE, and STANDARD DEVIATION. Such queries are sometimes called statistical
queries.
It is the responsibility of a database management system to ensure the confidential-
ity of information about individuals, while still providing useful statistical sum-
maries of data about those individuals to users. Provision of privacy protection of
users in a statistical database is paramount; its violation is illustrated in the follow-
ing example.
In some cases it is possible to infer the values of individual tuples from a sequence
of statistical queries. This is particularly true when the conditions result in a
Figure 3
The PERSON relation schema for illustrating statistical database security:
PERSON(Name, Ssn, Income, Address, City, State, Zip, Sex, Last_degree)
population consisting of a small number of tuples. As an illustration, consider the
following statistical queries:
Q1: SELECT COUNT (*) FROM PERSON
WHERE <condition>;
Q2: SELECT AVG (Income) FROM PERSON
WHERE <condition>;
Now suppose that we are interested in finding the Income of Jane Smith, and we know
that she has a Ph.D. degree and that she lives in the city of Bellaire, Texas. We issue
the statistical query Q1 with the following condition:
(Last_degree=‘Ph.D.’ AND Sex=‘F’ AND City=‘Bellaire’ AND State=‘Texas’)
If we get a result of 1 for this query, we can issue Q2 with the same condition and find the Income of Jane Smith. Even if the result of Q1 on the preceding condition is not 1 but is a small number, say 2 or 3, we can issue statistical queries using the functions MAX, MIN, and AVERAGE to identify the possible range of values for the Income of Jane Smith.
The possibility of inferring individual information from statistical queries is
reduced if no statistical queries are permitted whenever the number of tuples in the
population specified by the selection condition falls below some threshold. Another
technique for prohibiting retrieval of individual information is to prohibit
sequences of queries that refer repeatedly to the same population of tuples. It is also
possible to introduce slight inaccuracies or noise into the results of statistical queries
deliberately, to make it difficult to deduce individual information from the results.
Another technique is partitioning of the database. Partitioning implies that records
are stored in groups of some minimum size; queries can refer to any complete group
or set of groups, but never to subsets of records within a group. The interested
reader is referred to the bibliography at the end of this chapter for a discussion of
these techniques.
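The threshold technique can be sketched directly in SQL, with the statistical result returned only when the selected population is large enough; here a minimum population size of 10 is assumed:
SELECT AVG (Income) FROM PERSON
WHERE City = 'Bellaire' AND State = 'Texas'
HAVING COUNT (*) >= 10;
In a real statistical database the DBMS itself would have to enforce such a check, since a malicious user would simply omit it from the query.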
6 Introduction to Flow Control
Flow control regulates the distribution or flow of information among accessible
objects. A flow between object X and object Y occurs when a program reads values
from X and writes values into Y. Flow controls check that information contained in
some objects does not flow explicitly or implicitly into less protected objects. Thus, a
user cannot get indirectly in Y what he or she cannot get directly in X. Active flow
control began in the early 1970s. Most flow controls employ some concept of security
class; the transfer of information from a sender to a receiver is allowed only if the
receiver’s security class is at least as privileged as the sender’s. Examples of a flow con-
trol include preventing a service program from leaking a customer’s confidential
data, and blocking the transmission of secret military data to an unknown classified
user.
A flow policy specifies the channels along which information is allowed to move.
The simplest flow policy specifies just two classes of information—confidential (C)
and nonconfidential (N)—and allows all flows except those from class C to class N.
This policy can solve the confinement problem that arises when a service program
handles data such as customer information, some of which may be confidential.
For example, an income-tax computing service might be allowed to retain a cus-
tomer’s address and the bill for services rendered, but not a customer’s income or
deductions.
Access control mechanisms are responsible for checking users’ authorizations for
resource access: Only granted operations are executed. Flow controls can be
enforced by an extended access control mechanism, which involves assigning a secu-
rity class (usually called the clearance) to each running program. The program is
allowed to read a particular memory segment only if its security class is as high as
that of the segment. It is allowed to write in a segment only if its class is as low as
that of the segment. This automatically ensures that no information transmitted by
the person can move from a higher to a lower class. For example, a military program
with a secret clearance can read only from objects that are unclassified, confidential, or secret, and can write only into objects that are secret or top secret.
Two types of flow can be distinguished: explicit flows, occurring as a consequence of
assignment instructions, such as Y := f(X1, …, Xn), and implicit flows generated by conditional instructions, such as if f(Xm+1, …, Xn) then Y := f(X1, …, Xm).
Flow control mechanisms must verify that only authorized flows, both explicit and
implicit, are executed. A set of rules must be satisfied to ensure secure information
flows. Rules can be expressed using flow relations among classes and assigned to
information, stating the authorized flows within a system. (An information flow
from A to B occurs when information associated with A affects the value of infor-
mation associated with B. The flow results from operations that cause information
transfer from one object to another.) These relations can define, for a class, the set of
classes where information (classified in that class) can flow, or can state the specific
relations to be verified between two classes to allow information to flow from one to
the other. In general, flow control mechanisms implement the controls by assigning
a label to each object and by specifying the security class of the object. Labels are
then used to verify the flow relations defined in the model.
6.1 Covert Channels
A covert channel allows a transfer of information that violates the security or the
policy. Specifically, a covert channel allows information to pass from a higher clas-
sification level to a lower classification level through improper means. Covert chan-
nels can be classified into two broad categories: timing channels and storage channels. The
distinguishing feature between the two is that in a timing channel the information
is conveyed by the timing of events or processes, whereas storage channels do not
require any temporal synchronization, in that information is conveyed by accessing
system information or what is otherwise inaccessible to the user.
In a simple example of a covert channel, consider a distributed database system in
which two nodes have user security levels of secret (S) and unclassified (U). In order
for a transaction to commit, both nodes must agree to commit. They mutually can
only do operations that are consistent with the *-property, which states that in any
transaction, the S site cannot write or pass information to the U site. However, if
these two sites collude to set up a covert channel between them, a transaction
involving secret data may be committed unconditionally by the U site, but the S site
may do so in some predefined agreed-upon way so that certain information may be
passed from the S site to the U site, violating the *-property. This may be achieved
where the transaction runs repeatedly, but the actions taken by the S site implicitly
convey information to the U site. Measures such as locking prevent concurrent
writing of the information by users with different security levels into the same
objects, preventing the storage-type covert channels. Operating systems and distrib-
uted databases provide control over the multiprogramming of operations that
allows a sharing of resources without the possibility of encroachment of one pro-
gram or process into another’s memory or other resources in the system, thus pre-
venting timing-oriented covert channels. In general, covert channels are not a major
problem in well-implemented robust database implementations. However, certain
schemes may be contrived by clever users that implicitly transfer information.
Some security experts believe that one way to avoid covert channels is to disallow
programmers to actually gain access to sensitive data that a program will process
after the program has been put into operation. For example, a programmer for a
bank has no need to access the names or balances in depositors’ accounts.
Programmers for brokerage firms do not need to know what buy and sell orders
exist for clients. During program testing, access to a form of real data or some sam-
ple test data may be justifiable, but not after the program has been accepted for reg-
ular use.
7 Encryption and Public
Key Infrastructures
The previous methods of access and flow control, despite being strong control
measures, may not be able to protect databases from some threats. Suppose we com-
municate data, but our data falls into the hands of a nonlegitimate user. In this situ-
ation, by using encryption we can disguise the message so that even if the
transmission is diverted, the message will not be revealed. Encryption is the conver-
sion of data into a form, called a ciphertext, which cannot be easily understood by
unauthorized persons. It enhances security and privacy when access controls are
bypassed, because in cases of data loss or theft, encrypted data cannot be easily
understood by unauthorized persons.
With this background, we adhere to the following standard definitions:6
■ Ciphertext: Encrypted (enciphered) data.
6These definitions are from NIST (National Institute of Standards and Technology) from http://csrc.nist
.gov/publications/nistpubs/800-67/SP800-67.pdf.
■ Plaintext (or cleartext): Intelligible data that has meaning and can be read or
acted upon without the application of decryption.
■ Encryption: The process of transforming plaintext into ciphertext.
■ Decryption: The process of transforming ciphertext back into plaintext.
Encryption consists of applying an encryption algorithm to data using some pre-
specified encryption key. The resulting data has to be decrypted using a
decryption key to recover the original data.
7.1 The Data Encryption and Advanced
Encryption Standards
The Data Encryption Standard (DES) is a system developed by the U.S. govern-
ment for use by the general public. It has been widely accepted as a cryptographic
standard both in the United States and abroad. DES can provide end-to-end
encryption on the channel between sender A and receiver B. The DES algorithm is a
careful and complex combination of two of the fundamental building blocks of
encryption: substitution and permutation (transposition). The algorithm derives its
strength from repeated application of these two techniques for a total of 16 cycles.
Plaintext (the original form of the message) is encrypted as blocks of 64 bits.
Although the key is 64 bits long, in effect the key can be any 56-bit number. After
questioning the adequacy of DES, the NIST introduced the Advanced Encryption
Standard (AES). This algorithm has a block size of 128 bits, compared with DES’s
64-bit block size, and can use keys of 128, 192, or 256 bits, compared with DES's 56-bit
key. AES introduces more possible keys, compared with DES, and thus takes a much
longer time to crack.
7.2 Symmetric Key Algorithms
A symmetric key is one key that is used for both encryption and decryption. By
using a symmetric key, fast encryption and decryption is possible for routine use
with sensitive data in the database. A message encrypted with a secret key can be
decrypted only with the same secret key. Algorithms used for symmetric
key encryption are called secret-key algorithms. Since secret-key algorithms are
mostly used for encrypting the content of a message, they are also called content-
encryption algorithms.
The major liability associated with secret-key algorithms is the need for sharing the
secret key. A possible method is to derive the secret key from a user-supplied password
string by applying the same function to the string at both the sender and receiver; this
is known as a password-based encryption algorithm. The strength of the symmetric key
encryption depends on the size of the key used. For the same algorithm, encrypting
with a longer key is tougher to break than encrypting with a shorter key.
7.3 Public (Asymmetric) Key Encryption
In 1976, Diffie and Hellman proposed a new kind of cryptosystem, which they
called public key encryption. Public key algorithms are based on mathematical
functions rather than operations on bit patterns. They address one drawback of
symmetric key encryption, namely that both sender and recipient must exchange
the common key in a secure manner. In public key systems, two keys are used for
encryption/decryption. The public key can be transmitted in a non-secure way,
whereas the private key is not transmitted at all. These algorithms—which use two
related keys, a public key and a private key, to perform complementary operations
(encryption and decryption)—are known as asymmetric key encryption algo-
rithms. The use of two keys can have profound consequences in the areas of confi-
dentiality, key distribution, and authentication. The two keys used for public key
encryption are referred to as the public key and the private key. The private key is
kept secret, but it is referred to as a private key rather than a secret key (the key used
in conventional encryption) to avoid confusion with conventional encryption. The
two keys are mathematically related, since one of the keys is used to perform
encryption and the other to perform decryption. However, it is very difficult to
derive the private key from the public key.
A public key encryption scheme, or infrastructure, has six ingredients:
1. Plaintext. This is the data or readable message that is fed into the algorithm
as input.
2. Encryption algorithm. This algorithm performs various transformations
on the plaintext.
3. and 4. Public and private keys. These are a pair of keys that have been
selected so that if one is used for encryption, the other is used for decryp-
tion. The exact transformations performed by the encryption algorithm
depend on the public or private key that is provided as input. For example, if
a message is encrypted using the public key, it can only be decrypted using
the private key.
5. Ciphertext. This is the scrambled message produced as output. It depends
on the plaintext and the key. For a given message, two different keys will pro-
duce two different ciphertexts.
6. Decryption algorithm. This algorithm accepts the ciphertext and the
matching key and produces the original plaintext.
As the name suggests, the public key of the pair is made public for others to use,
whereas the private key is known only to its owner. A general-purpose public key
cryptographic algorithm relies on one key for encryption and a different but related
key for decryption. The essential steps are as follows:
1. Each user generates a pair of keys to be used for the encryption and decryp-
tion of messages.
2. Each user places one of the two keys in a public register or other accessible
file. This is the public key. The companion key is kept private.
3. If a sender wishes to send a private message to a receiver, the sender encrypts
the message using the receiver’s public key.
4. When the receiver receives the message, he or she decrypts it using the
receiver’s private key. No other recipient can decrypt the message because
only the receiver knows his or her private key.
The RSA Public Key Encryption Algorithm. One of the first public key schemes
was introduced in 1978 by Ron Rivest, Adi Shamir, and Len Adleman at MIT and is
named after them as the RSA scheme. The RSA scheme has since then reigned
supreme as the most widely accepted and implemented approach to public key
encryption. The RSA encryption algorithm incorporates results from number the-
ory, combined with the difficulty of determining the prime factors of a large target number. The
RSA algorithm also operates with modular arithmetic—mod n.
Two keys, d and e, are used for decryption and encryption. An important property is
that they can be interchanged. n is chosen as a large integer that is a product of two
large distinct prime numbers, a and b, n = a × b. The encryption key e is a randomly
chosen number between 1 and n that is relatively prime to (a – 1) × (b – 1). The
plaintext block P is encrypted as P^e mod n. Because the exponentiation is performed mod n, factoring P^e to uncover the encrypted plaintext is difficult. However, the decryption key d is carefully chosen so that (P^e)^d mod n = P. The decryption key d can be computed from the condition that d × e ≡ 1 mod ((a – 1) × (b – 1)). Thus, the legitimate receiver who knows d simply computes (P^e)^d mod n = P and recovers P without having to factor P^e.
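A toy numeric illustration may help; the numbers are far too small for real security and are chosen only to keep the arithmetic visible. Choose a = 5 and b = 11, so that n = 55 and (a – 1) × (b – 1) = 40. Pick e = 3, which is relatively prime to 40; then d = 27 works, since d × e = 81 ≡ 1 mod 40. Encrypting the plaintext block P = 2 gives P^e mod n = 2^3 mod 55 = 8, and decrypting gives 8^d mod n = 8^27 mod 55 = 2, which recovers P.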
7.4 Digital Signatures
A digital signature is an example of using encryption techniques to provide authen-
tication services in electronic commerce applications. Like a handwritten signature,
a digital signature is a means of associating a mark unique to an individual with a
body of text. The mark should be unforgeable, meaning that others should be able
to check that the signature comes from the originator.
A digital signature consists of a string of symbols. If a person’s digital signature were
always the same for each message, then one could easily counterfeit it by simply
copying the string of symbols. Thus, signatures must be different for each use. This
can be achieved by making each digital signature a function of the message that it is
signing, together with a timestamp. To be unique to each signer and counterfeit-
proof, each digital signature must also depend on some secret number that is
unique to the signer. Thus, in general, a counterfeitproof digital signature must
depend on the message and a unique secret number of the signer. The verifier of the
signature, however, should not need to know any secret number. Public key tech-
niques are the best means of creating digital signatures with these properties.
7.5 Digital Certificates
A digital certificate is used to combine the value of a public key with the identity of
the person or service that holds the corresponding private key into a digitally signed
statement. Certificates are issued and signed by a certification authority (CA). The
entity receiving this certificate from a CA is the subject of that certificate. Instead of
requiring each participant in an application to authenticate every user, third-party
authentication relies on the use of digital certificates.
The digital certificate itself contains various types of information. For example,
both the certification authority and the certificate owner information are included.
The following list describes all the information included in the certificate:
1. The certificate owner information, which is represented by a unique identi-
fier known as the distinguished name (DN) of the owner. This includes the
owner’s name, as well as the owner’s organization and other information
about the owner.
2. The certificate also includes the public key of the owner.
3. The date of issue of the certificate is also included.
4. The validity period is specified by ‘Valid From’ and ‘Valid To’ dates, which are
included in each certificate.
5. Issuer identifier information is included in the certificate.
6. Finally, the digital signature of the issuing CA for the certificate is included.
All the information listed is passed through a message-digest (hash) function,
and the resulting digest is signed by the CA to create the digital signature.
The digital signature basically certifies
that the association between the certificate owner and public key is valid.
8 Privacy Issues and Preservation
Preserving data privacy is a growing challenge for database security and privacy
experts. In some views, preserving data privacy may even require limiting large-scale
data mining and analysis. One commonly advocated measure is to avoid building
mammoth central warehouses that serve as a single repository of vital information.
Another possible measure is to intentionally modify or perturb the data.
If all data were available at a single warehouse, violating only a single repository’s
security could expose all data. Avoiding central warehouses and using distributed
data mining algorithms minimizes the exchange of data needed to develop globally
valid models. By modifying, perturbing, and anonymizing data, we can also miti-
gate privacy risks associated with data mining. This can be done by removing iden-
tity information from the released data and injecting noise into the data. When
using these techniques, however, we must pay attention to the quality of the resulting
data, which may be degraded by too many modifications, and we must be able to
estimate the errors that these modifications introduce.
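As a small illustrative sketch (not from the text), one simple way to approximate identity
removal and data perturbation in SQL is to release data only through a view that suppresses
identifying attributes and generalizes sensitive values. The EMPLOYEE table of Figure A.1 is
assumed, and the view name is hypothetical:

    CREATE VIEW EMPLOYEE_RELEASE AS
      SELECT Dno,
             Sex,
             FLOOR(Salary / 5000) * 5000 AS Salary_band  -- generalize exact salaries into $5,000 bands
      FROM   EMPLOYEE;                                   -- identifying attributes (Ssn, names, Address) are not exposed

Injecting random noise, as mentioned above, would additionally require a dialect-specific
random-number function, and the analyst must still estimate how much such modifications
distort aggregate results.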
Privacy is an important area of ongoing research in database management. It is
complicated due to its multidisciplinary nature and the issues related to the subjec-
tivity in the interpretation of privacy, trust, and so on. As an example, consider
medical and legal records and transactions, which must maintain certain privacy
requirements while they are being defined and enforced. Providing access control
and privacy for mobile devices is also receiving increased attention. DBMSs need
robust techniques for efficient storage of security-relevant information on small
devices, as well as trust negotiation techniques. Where to keep information related
to user identities, profiles, credentials, and permissions and how to use it for reliable
user identification remains an important problem. Because large-sized streams of
data are generated in such environments, efficient techniques for access control
must be devised and integrated with processing techniques for continuous queries.
Finally, the privacy of user location data, acquired from sensors and communica-
tion networks, must be ensured.
9 Challenges of Database Security
Considering the vast growth in volume and speed of threats to databases and infor-
mation assets, research efforts need to be devoted to the following issues: data qual-
ity, intellectual property rights, and database survivability. These are only some of
the main challenges that researchers in database security are trying to address.
9.1 Data Quality
The database community needs techniques and organizational solutions to assess
and attest the quality of data. These techniques may include simple mechanisms
such as quality stamps that are posted on Web sites. We also need techniques that
provide more effective integrity semantics verification and tools for the assessment
of data quality, based on techniques such as record linkage. Application-level recov-
ery techniques are also needed for automatically repairing incorrect data. The ETL
(extract, transform, load) tools widely used to load data in data warehouses are
presently grappling with these issues.
9.2 Intellectual Property Rights
With the widespread use of the Internet and intranets, legal and informational
aspects of data are becoming major concerns of organizations. To address these
concerns, watermarking techniques for relational data have been proposed. The
main purpose of digital watermarking is to protect content from unauthorized
duplication and distribution by enabling provable ownership of the content. It has
traditionally relied upon the availability of a large noise domain within which the
object can be altered while retaining its essential properties. However, research is
needed to assess the robustness of such techniques and to investigate different
approaches aimed at preventing intellectual property rights violations.
9.3 Database Survivability
Database systems need to operate and continue their functions, even with reduced
capabilities, despite disruptive events such as information warfare attacks. A DBMS,
in addition to making every effort to prevent an attack and detecting one in the
event of occurrence, should be able to do the following:
■ Confinement. Take immediate action to eliminate the attacker’s access to the
system and to isolate or contain the problem to prevent further spread.
■ Damage assessment. Determine the extent of the problem, including failed
functions and corrupted data.
■ Reconfiguration. Reconfigure to allow operation to continue in a degraded
mode while recovery proceeds.
■ Repair. Recover corrupted or lost data and repair or reinstall failed system
functions to reestablish a normal level of operation.
■ Fault treatment. To the extent possible, identify the weaknesses exploited in
the attack and take steps to prevent a recurrence.
The goal of the information warfare attacker is to damage the organization’s opera-
tion and fulfillment of its mission through disruption of its information systems.
The specific target of an attack may be the system itself or its data. While attacks that
bring the system down outright are severe and dramatic, they must also be well
timed to achieve the attacker’s goal, since attacks will receive immediate and con-
centrated attention in order to bring the system back to operational condition, diag-
nose how the attack took place, and install preventive measures.
To date, issues related to database survivability have not been sufficiently investi-
gated. Much more research needs to be devoted to techniques and methodologies
that ensure database system survivability.
10 Oracle Label-Based Security
Restricting access to entire tables or isolating sensitive data into separate databases is
a costly operation to administer. Oracle Label Security overcomes the need for such
measures by enabling row-level access control. It is available in Oracle Database 11g
Release 1 (11.1) Enterprise Edition at the time of writing. Each database table or
view has a security policy associated with it. This policy executes every time the
table or view is queried or altered. Developers can readily add label-based access
control to their Oracle Database applications. Label-based security provides an
adaptable way of controlling access to sensitive data. Both users and data have labels
associated with them. Oracle Label Security uses these labels to provide security.
10.1 Virtual Private Database (VPD) Technology
Virtual Private Database (VPD) is a feature of the Oracle Enterprise Edition that
adds predicates to user statements to limit their access in a manner that is transparent
to the user and the application. The VPD concept allows server-enforced, fine-grained
access control for a secure application.
VPD provides access control based on policies. These VPD policies enforce object-
level access control or row-level security. VPD provides an application programming
interface (API) that allows security policies to be attached to database tables or
views. Using PL/SQL, a host programming language used in Oracle applications,
developers and security administrators can implement security policies with the
help of stored procedures. VPD policies allow developers to remove access security
mechanisms from applications and centralize them within the Oracle Database.
VPD is enabled by associating a security “policy” with a table, view, or synonym. An
administrator uses the supplied PL/SQL package, DBMS_RLS, to bind a policy
function with a database object. When an object having a security policy associated
with it is accessed, the function implementing this policy is consulted. The policy
function returns a predicate (a WHERE clause) which is then appended to the user’s
SQL statement, thus transparently and dynamically modifying the user’s data access.
Oracle Label Security is a technique of enforcing row-level security in the form of a
security policy.
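As a minimal sketch of the mechanism just described: DBMS_RLS is the supplied package
mentioned above, while the APP schema, ORDERS table, SALES_REP column, and all policy
names are hypothetical and not from the text. A policy function and its binding to a table
might look as follows:

    CREATE OR REPLACE FUNCTION orders_policy (
        p_schema IN VARCHAR2,
        p_object IN VARCHAR2
    ) RETURN VARCHAR2
    IS
    BEGIN
      -- Return the predicate (WHERE clause) that will be appended to the
      -- user's SQL statement: each user sees only his or her own rows.
      RETURN 'SALES_REP = SYS_CONTEXT(''USERENV'', ''SESSION_USER'')';
    END;
    /

    BEGIN
      DBMS_RLS.ADD_POLICY(
          object_schema   => 'APP',
          object_name     => 'ORDERS',
          policy_name     => 'ORDERS_BY_REP',
          function_schema => 'APP',
          policy_function => 'ORDERS_POLICY',
          statement_types => 'SELECT, UPDATE, DELETE');
    END;
    /

With this policy in place, a statement such as SELECT * FROM ORDERS issued by database
user SMITH is, in effect, executed as SELECT * FROM ORDERS WHERE SALES_REP = 'SMITH'.
Oracle Label Security builds on this same mechanism, generating the predicate from the
label comparison rules described in Section 10.3.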
10.2 Label Security Architecture
Oracle Label Security is built on the VPD technology delivered in the Oracle
Database 11.1 Enterprise Edition. Figure 4 illustrates how data is accessed under
Oracle Label Security, showing the sequence of discretionary access control (DAC)
and label security checks. The left part of the figure shows an application user in an Oracle
Database 11g Release 1 (11.1) session sending out an SQL request. The Oracle
DBMS checks the DAC privileges of the user, making sure that he or she has
SELECT privileges on the table. Then it checks whether the table has a Virtual
Private Database (VPD) policy associated with it to determine if the table is pro-
tected using Oracle Label Security. If it is, the VPD SQL modification (WHERE
clause) is added to the original SQL statement to find the set of accessible rows for
the user to view. Then Oracle Label Security checks the labels on each row, to deter-
mine the subset of rows to which the user has access (as explained in the next sec-
tion). This modified query gets processed, optimized, and executed.
10.3 How Data Labels and User Labels Work Together
A user’s label indicates the information the user is permitted to access. It also deter-
mines the type of access (read or write) that the user has on that information. A
row’s label shows the sensitivity of the information that the row contains as well as
the ownership of the information. When a table in the database has a label-based
access control policy associated with it, a row can be accessed only if the user's label
meets the criteria defined in the policy definition. Access is granted or denied based on the
result of comparing the data label and the session label of the user.
Compartments allow a finer classification of sensitivity of the labeled data. All data
related to the same project can be labeled with the same compartment.
Compartments are optional; a label can contain zero or more compartments.
[Figure 4: Oracle Label Security architecture (Source: Oracle 2007). An Oracle user's SQL request for data flows through the Oracle Data Server: discretionary access control (DAC) is checked against table-level privileges, the Virtual Private Database (VPD) policy on the table is checked (VPD-based control and user-defined VPD policies), label-based security is then enforced against the label security policies for row-level access control, and finally the data request is processed and executed.]
Groups are used to identify organizations as owners of the data with corresponding
group labels. Groups are hierarchical; for example, a group can be associated with a
parent group.
If a user has a maximum level of SENSITIVE, then the user potentially has access to
all data having levels SENSITIVE, CONFIDENTIAL, and UNCLASSIFIED. This user has
no access to HIGHLY_SENSITIVE data. Figure 5 shows how data labels and user labels
work together to provide access control in Oracle Label Security.
As shown in Figure 5, User 1 can access the rows 2, 3, and 4 because his maximum
level is HS (Highly_Sensitive). He has access to the FIN (Finance) compartment,
and his access to group WR (Western Region) hierarchically includes group
WR_SAL (WR Sales). He cannot access row 1 because he does not have the CHEM
(Chemical) compartment. To access a row, a user must have authorization for all of
the compartments in the row's data label. In this example, user 2 can access only
rows 3 and 4: his maximum level of S is lower than the HS level of row 2, and
although he has access to the FIN compartment, he belongs only to group WR_SAL
(not to its parent group WR) and lacks the CHEM compartment, so he cannot
access row 1.
11 Summary
In this chapter we discussed several techniques for enforcing database system secu-
rity. We presented different threats to databases in terms of loss of integrity, avail-
ability, and confidentiality. We discussed the types of control measures to deal with
these problems: access control, inference control, flow control, and encryption.

[Figure 5: Data labels and user labels in Oracle (Source: Oracle 2007). A user label pairs the user's maximum access level with all compartments and groups to which the user has access (User 1: HS FIN : WR; User 2: S FIN : WR_SAL). A data label pairs the minimum access level required with all compartments to which the user must have access (Row 1: S CHEM, FIN : WR; Row 2: HS FIN : WR_SAL; Row 3: U FIN; Row 4: C FIN : WR_SAL). Legend: HS = Highly sensitive, S = Sensitive, C = Confidential, U = Unclassified.]

In the introduction we covered various issues related to security including data sensi-
tivity and type of disclosures, providing security vs. precision in the result when a
user requests information, and the relationship between information security and
privacy.
Security enforcement deals with controlling access to the database system as a whole
and controlling authorization to access specific portions of a database. The former
is usually done by assigning accounts with passwords to users. The latter can be
accomplished by using a system of granting and revoking privileges to individual
accounts for accessing specific parts of the database. This approach is generally
referred to as discretionary access control (DAC). We presented some SQL com-
mands for granting and revoking privileges, and we illustrated their use with exam-
ples. Then we gave an overview of mandatory access control (MAC) mechanisms
that enforce multilevel security. These require the classifications of users and data
values into security classes and enforce the rules that prohibit flow of information
from higher to lower security levels. Some of the key concepts underlying the mul-
tilevel relational model, including filtering and polyinstantiation, were presented.
Role-based access control (RBAC) was introduced, which assigns privileges based
on roles that users play. We introduced the notion of role hierarchies, mutual exclu-
sion of roles, and row- and label-based security. We briefly discussed the problem of
controlling access to statistical databases to protect the privacy of individual infor-
mation while concurrently providing statistical access to populations of records. We
explained the main ideas behind the threat of SQL Injection, the methods in which
it can be induced, and the various types of risks associated with it. Then we gave an
idea of the various ways SQL injection can be prevented. The issues related to flow
control and the problems associated with covert channels were discussed next, as
well as encryption and public-private key-based infrastructures. The idea of sym-
metric key algorithms and the use of the popular asymmetric key-based public key
infrastructure (PKI) scheme were explained. We also covered the concepts of digital
signatures and digital certificates. We highlighted the importance of privacy issues
and hinted at some privacy preservation techniques. We discussed a variety of chal-
lenges to security including data quality, intellectual property rights, and database sur-
vivability. We ended the chapter by introducing the implementation of security
policies by using a combination of label-based security and virtual private databases
in Oracle 11g.
Review Questions
1. Discuss what is meant by each of the following terms: database authoriza-
tion, access control, data encryption, privileged (system) account, database
audit, audit trail.
2. Which account is designated as the owner of a relation? What privileges does
the owner of a relation have?
3. How is the view mechanism used as an authorization mechanism?
4. Discuss the types of privileges at the account level and those at the relation
level.
5. What is meant by granting a privilege? What is meant by revoking a
privilege?
6. Discuss the system of propagation of privileges and the restraints imposed
by horizontal and vertical propagation limits.
7. List the types of privileges available in SQL.
8. What is the difference between discretionary and mandatory access control?
9. What are the typical security classifications? Discuss the simple security
property and the *-property, and explain the justification behind these rules
for enforcing multilevel security.
10. Describe the multilevel relational data model. Define the following terms:
apparent key, polyinstantiation, filtering.
11. What are the relative merits of using DAC or MAC?
12. What is role-based access control? In what ways is it superior to DAC and
MAC?
13. What are the two types of mutual exclusion in role-based access control?
14. What is meant by row-level access control?
15. What is label security? How does an administrator enforce it?
16. What are the different types of SQL injection attacks?
17. What risks are associated with SQL injection attacks?
18. What preventive measures are possible against SQL injection attacks?
19. What is a statistical database? Discuss the problem of statistical database
security.
20. How is privacy related to statistical database security? What measures can be
taken to ensure some degree of privacy in statistical databases?
21. What is flow control as a security measure? What types of flow control exist?
22. What are covert channels? Give an example of a covert channel.
23. What is the goal of encryption? What process is involved in encrypting data
and then recovering it at the other end?
24. Give an example of an encryption algorithm and explain how it works.
25. Repeat the previous question for the popular RSA algorithm.
26. What is a symmetric key algorithm for key-based security?
27. What is the public key infrastructure scheme? How does it provide security?
28. What are digital signatures? How do they work?
29. What type of information does a digital certificate include?
Exercises
30. How can privacy of data be preserved in a database?
31. What are some of the current outstanding challenges for database security?
32. Consider the relational database schema in Figure A.1 (at the end of this
chapter). Suppose that all the relations were created by (and hence are
owned by) user X, who wants to grant the following privileges to user
accounts A, B, C, D, and E:
a. Account A can retrieve or modify any relation except DEPENDENT and
can grant any of these privileges to other users.
b. Account B can retrieve all the attributes of EMPLOYEE and DEPARTMENT
except for Salary, Mgr_ssn, and Mgr_start_date.
c. Account C can retrieve or modify WORKS_ON but can only retrieve the
Fname, Minit, Lname, and Ssn attributes of EMPLOYEE and the Pname and
Pnumber attributes of PROJECT.
d. Account D can retrieve any attribute of EMPLOYEE or DEPENDENT and
can modify DEPENDENT.
e. Account E can retrieve any attribute of EMPLOYEE but only for
EMPLOYEE tuples that have Dno = 3.
f. Write SQL statements to grant these privileges. Use views where
appropriate.
33. Suppose that privilege (a) of Exercise 32 is to be given with GRANT OPTION
but only so that account A can grant it to at most five accounts, and each of
these accounts can propagate the privilege to other accounts but without the
GRANT OPTION privilege. What would the horizontal and vertical propaga-
tion limits be in this case?
34. Consider the relation shown in Figure 2(d). How would it appear to a user
with classification U? Suppose that a classification U user tries to update the
salary of ‘Smith’ to $50,000; what would be the result of this action?
Selected Bibliography
Authorization based on granting and revoking privileges was proposed for the
SYSTEM R experimental DBMS and is presented in Griffiths and Wade (1976).
Several books discuss security in databases and computer systems in general,
including the books by Leiss (1982a) and Fernandez et al. (1981), and Fugini et al.
(1995). Natan (2005) is a practical book on security and auditing implementation
issues in all major RDBMSs.
Many papers discuss different techniques for the design and protection of statistical
databases. They include McLeish (1989), Chin and Ozsoyoglu (1981), Leiss (1982),
Wong (1984), and Denning (1980). Ghosh (1984) discusses the use of statistical
databases for quality control. There are also many papers discussing cryptography
and data encryption, including Diffie and Hellman (1979), Rivest et al. (1978), Akl
(1983), Pfleeger and Pfleeger (2007), Omura et al. (1990), Stallings (2000), and Iyer
et al. (2004).
Halfond et al. (2006) helps in understanding the concepts of SQL injection attacks
and the various threats they pose. The white paper Oracle (2007a) explains how
Oracle is less prone to SQL injection attacks than SQL Server, and it also gives
a brief explanation of how these attacks can be prevented from occurring.
Further proposed frameworks are discussed in Boyd and Keromytis (2004), Halfond
and Orso (2005), and McClure and Krüger (2005).
Multilevel security is discussed in Jajodia and Sandhu (1991), Denning et al. (1987),
Smith and Winslett (1992), Stachour and Thuraisingham (1990), Lunt et al. (1990),
and Bertino et al. (2001). Overviews of research issues in database security are given
by Lunt and Fernandez (1990), Jajodia and Sandhu (1991), Bertino (1998), Castano
et al. (1995), and Thuraisingham et al. (2001). The effects of multilevel security on
concurrency control are discussed in Atluri et al. (1997). Security in next-generation,
semantic, and object-oriented databases is discussed in Rabbiti et al. (1991), Jajodia
and Kogan (1990), and Smith (1990). Oh (1999) presents a model for both discre-
tionary and mandatory security. Security models for Web-based applications and
role-based access control are discussed in Joshi et al. (2001). Security issues for man-
agers in the context of e-commerce applications and the need for risk assessment
models for selection of appropriate security control measures are discussed in
Farahmand et al. (2005). Row-level access control is explained in detail in Oracle
(2007b) and Sybase (2005). The latter also provides details on role hierarchy and
mutual exclusion. Oracle (2009) explains how Oracle uses the concept of identity
management.
Recent advances as well as future challenges for security and privacy of databases are
discussed in Bertino and Sandhu (2005). U.S. Govt. (1978), OECD (1980), and NRC
(2003) are good references on the view of privacy by important government bodies.
Karat et al. (2009) discusses a policy framework for security and privacy. XML and
access control are discussed in Naedele (2003). More details can be found on privacy
preserving techniques in Vaidya and Clifton (2004), intellectual property rights in
Sion et al. (2004), and database survivability in Jajodia et al. (1999). Oracle’s VPD
technology and label-based security is discussed in more detail in Oracle (2007b).
[Figure A.1: Schema diagram for the COMPANY relational database schema.
EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS(Dnumber, Dlocation)
PROJECT(Pname, Pnumber, Plocation, Dnum)
WORKS_ON(Essn, Pno, Hours)
DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)]
Distributed Databases
In this chapter we direct our attention to distributed databases (DDBs), distributed database management
systems (DDBMSs), and how the client-server architecture is used as a platform for
database application development. Distributed databases bring the advantages of
distributed computing to the database management domain. A distributed com-
puting system consists of a number of processing elements, not necessarily homo-
geneous, that are interconnected by a computer network, and that cooperate in
performing certain assigned tasks. As a general goal, distributed computing systems
partition a big, unmanageable problem into smaller pieces that are solved efficiently in
a coordinated manner. The economic viability of this approach stems from two rea-
sons: more computing power is harnessed to solve a complex task, and each
autonomous processing element can be managed independently to develop its own
applications.
DDB technology resulted from a merger of two technologies: database technology,
and network and data communication technology. Computer networks allow dis-
tributed processing of data. Traditional databases, on the other hand, focus on pro-
viding centralized, controlled access to data. Distributed databases allow an
integration of information and its processing by applications that may themselves
be centralized or distributed.
Several distributed database prototype systems were developed in the 1980s to
address the issues of data distribution, distributed query and transaction process-
ing, distributed database metadata management, and other topics. However, a full-
scale comprehensive DDBMS that implements the functionality and techniques
proposed in DDB research never emerged as a commercially viable product. Most
major vendors redirected their efforts from developing a pure DDBMS product into
developing systems based on client-server concepts, or toward developing technolo-
gies for accessing distributed heterogeneous data sources.
From Chapter 25 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
Organizations continue to be interested in the decentralization of processing (at the
system level) while achieving an integration of the information resources (at the log-
ical level) within their geographically distributed systems of databases, applications,
and users. There is now a general endorsement of the client-server approach to
application development, and the three-tier approach to Web applications develop-
ment.
In this chapter we discuss distributed databases, their architectural variations, and
concepts central to data distribution and the management of distributed data.
Details of the advances in communication technologies facilitating the develop-
ment of DDBs are outside the scope of this text; see the texts on data communica-
tions and networking listed in the Selected Bibliography at the end of this chapter.
Section 1 introduces distributed database management and related concepts.
Sections 2 and 3 introduce different types of distributed database systems and their
architectures, including federated and multidatabase systems. The problems of het-
erogeneity and the needs of autonomy in federated database systems are also high-
lighted. Detailed issues of distributed database design, involving fragmenting of
data and distributing it over multiple sites with possible replication, are discussed in
Section 4. Sections 5 and 6 introduce distributed database query and transaction
processing techniques, respectively. Section 7 gives an overview of the concurrency
control and recovery in distributed databases. Section 8 discusses catalog manage-
ment schemes in distributed databases. In Section 9, we briefly discuss current
trends in distributed databases such as cloud computing and peer-to-peer data-
bases. Section 10 discusses distributed database features of the Oracle RDBMS.
Section 11 summarizes the chapter.
For a short introduction to the topic of distributed databases, Sections 1, 2, and 3
may be covered.
1 Distributed Database Concepts1
We can define a distributed database (DDB) as a collection of multiple logically
interrelated databases distributed over a computer network, and a distributed data-
base management system (DDBMS) as a software system that manages a distrib-
uted database while making the distribution transparent to the user.2
Distributed databases are different from Internet Web files. Web pages are basically
a very large collection of files stored on different nodes in a network—the
Internet—with interrelationships among the files represented via hyperlinks. The
common functions of database management, including uniform query processing
and transaction processing, do not apply to this scenario yet. The technology is,
however, moving in a direction such that distributed World Wide Web (WWW)
databases will become a reality in the future. The proliferation of data at millions of
1The substantial contribution of Narasimhan Srinivasan to this and several other sections in this chapter
is appreciated.
2This definition and discussions in this section are based largely on Ozsu and Valduriez (1999).
Websites in various forms does not qualify as a DDB by the definition given earlier.
1.1 Differences between DDB and Multiprocessor Systems
We need to distinguish distributed databases from multiprocessor systems that use
shared storage (primary memory or disk). For a database to be called distributed,
the following minimum conditions should be satisfied:
■ Connection of database nodes over a computer network. There are multi-
ple computers, called sites or nodes. These sites must be connected by an
underlying communication network to transmit data and commands
among sites, as shown later in Figure 3(c).
■ Logical interrelation of the connected databases. It is essential that the
information in the databases be logically related.
■ Absence of homogeneity constraint among connected nodes. It is not nec-
essary that all nodes be identical in terms of data, hardware, and software.
The sites may all be located in physical proximity—say, within the same building or
a group of adjacent buildings—and connected via a local area network, or they may
be geographically distributed over large distances and connected via a long-haul or
wide area network. Local area networks typically use wireless hubs or cables,
whereas long-haul networks use telephone lines or satellites. It is also possible to use
a combination of networks.
Networks may have different topologies that define the direct communication
paths among sites. The type and topology of the network used may have a signifi-
cant impact on the performance and hence on the strategies for distributed query
processing and distributed database design. For high-level architectural issues, how-
ever, it does not matter what type of network is used; what matters is that each site
be able to communicate, directly or indirectly, with every other site. For the remain-
der of this chapter, we assume that some type of communication network exists
among sites, regardless of any particular topology. We will not address any network-
specific issues, although it is important to understand that for an efficient operation
of a distributed database system (DDBS), network design and performance issues
are critical and are an integral part of the overall solution. The details of the under-
lying communication network are invisible to the end user.
1.2 Transparency
The concept of transparency extends the general idea of hiding implementation
details from end users. A highly transparent system offers a lot of flexibility to the
end user/application developer since it requires little or no awareness of underlying
details on their part. In the case of a traditional centralized database, transparency
simply pertains to logical and physical data independence for application develop-
ers. However, in a DDB scenario, the data and software are distributed over multiple
sites connected by a computer network, so additional types of transparencies are
introduced.
Consider the company database in Figure A.1 in Appendix: Figures at the end of this
chapter. The EMPLOYEE, PROJECT, and WORKS_ON tables may be fragmented
horizontally (that is, into sets of rows, as we will discuss in Section 4) and stored
with possible replication as shown in Figure 1. The following types of transparen-
cies are possible:
■ Data organization transparency (also known as distribution or network
transparency). This refers to freedom for the user from the operational
details of the network and the placement of the data in the distributed sys-
tem. It may be divided into location transparency and naming transparency.
Location transparency refers to the fact that the command used to perform
a task is independent of the location of the data and the location of the node
where the command was issued. Naming transparency implies that once a
name is associated with an object, the named objects can be accessed unam-
biguously without additional specification as to where the data is located.
■ Replication transparency. As we show in Figure 1, copies of the same data
objects may be stored at multiple sites for better availability, performance,
and reliability. Replication transparency makes the user unaware of the exis-
tence of these copies.
■ Fragmentation transparency. Two types of fragmentation are possible.
Horizontal fragmentation distributes a relation (table) into subrelations
that are subsets of the tuples (rows) in the original relation. Vertical
fragmentation distributes a relation into subrelations where each subrelation is
defined by a subset of the columns of the original relation. A global query by
the user must be transformed into several fragment queries. Fragmentation
transparency makes the user unaware of the existence of fragments. (A brief
SQL sketch of these two types of fragments appears after this list.)

[Figure 1: Data distribution and replication among distributed databases. A communications network connects the Chicago headquarters, which stores all EMPLOYEES, PROJECTS, and WORKS_ON data, with sites in New York, Los Angeles, San Francisco, and Atlanta; each of these sites stores the EMPLOYEES and WORKS_ON fragments for its own employees together with the relevant PROJECTS fragments, with some fragments replicated at more than one site (for example, the San Francisco and Los Angeles project data).]
■ Other transparencies include design transparency and execution trans-
parency—referring to freedom from knowing how the distributed database
is designed and where a transaction executes.
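As a brief sketch (not from the text), using the COMPANY schema of Figure A.1, a horizontal
fragment and a vertical fragment of EMPLOYEE could conceptually be defined as follows.
The fragment names are illustrative only, and fragmentation is treated in detail in Section 4:

    -- Horizontal fragment: the EMPLOYEE tuples (rows) for department 5 only.
    CREATE VIEW EMPLOYEE_DEPT5 AS
      SELECT * FROM EMPLOYEE WHERE Dno = 5;

    -- Vertical fragment: a subset of the EMPLOYEE columns; the key Ssn is kept
    -- so the original relation can be reconstructed by joining the fragments.
    CREATE VIEW EMPLOYEE_PAY AS
      SELECT Ssn, Salary, Dno FROM EMPLOYEE;

In an actual distributed design, each such fragment would be physically stored (and possibly
replicated) at the site where it is used most, and fragmentation transparency would hide these
definitions from the user.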
1.3 Autonomy
Autonomy determines the extent to which individual nodes or DBs in a connected
DDB can operate independently. A high degree of autonomy is desirable for
increased flexibility and customized maintenance of an individual node. Autonomy
can be applied to design, communication, and execution. Design autonomy refers
to independence of data model usage and transaction management techniques
among nodes. Communication autonomy determines the extent to which each
node can decide on sharing of information with other nodes. Execution autonomy
refers to the independence of users and local nodes to execute operations as they choose.
1.4 Reliability and Availability
Reliability and availability are two of the most common potential advantages cited
for distributed databases. Reliability is broadly defined as the probability that a sys-
tem is running (not down) at a certain time point, whereas availability is the prob-
ability that the system is continuously available during a time interval. We can
directly relate reliability and availability of the database to the faults, errors, and fail-
ures associated with it. A failure can be described as a deviation of a system’s behav-
ior from that which is specified in order to ensure correct execution of operations.
Errors constitute the subset of system states that causes the failure. A fault is the
cause of an error.
To construct a system that is reliable, we can adopt several approaches. One com-
mon approach stresses fault tolerance; it recognizes that faults will occur, and
designs mechanisms that can detect and remove faults before they can result in a
system failure. Another more stringent approach attempts to ensure that the final
system does not contain any faults. This is done through an exhaustive design
process followed by extensive quality control and testing. A reliable DDBMS toler-
ates failures of underlying components and processes user requests so long as data-
base consistency is not violated. A DDBMS recovery manager has to deal with
failures arising from transactions, hardware, and communication networks.
Hardware failures can either be those that result in loss of main memory contents or
loss of secondary storage contents. Communication failures occur due to errors
associated with messages and line failures. Message errors can include their loss,
corruption, or out-of-order arrival at destination.
1.5 Advantages of Distributed Databases
Organizations resort to distributed database management for various reasons.
Some important advantages are listed below.
1. Improved ease and flexibility of application development. Developing and
maintaining applications at geographically distributed sites of an organiza-
tion is facilitated owing to transparency of data distribution and control.
2. Increased reliability and availability. This is achieved by the isolation of
faults to their site of origin without affecting the other databases connected
to the network. When the data and DDBMS software are distributed over
several sites, one site may fail while other sites continue to operate. Only the
data and software that exist at the failed site cannot be accessed. This
improves both reliability and availability. Further improvement is achieved
by judiciously replicating data and software at more than one site. In a cen-
tralized system, failure at a single site makes the whole system unavailable to
all users. In a distributed database, some of the data may be unreachable, but
users may still be able to access other parts of the database. If the data in the
failed site had been replicated at another site prior to the failure, then the
user will not be affected at all.
3. Improved performance. A distributed DBMS fragments the database by
keeping the data closer to where it is needed most. Data localization reduces
the contention for CPU and I/O services and simultaneously reduces access
delays involved in wide area networks. When a large database is distributed
over multiple sites, smaller databases exist at each site. As a result, local
queries and transactions accessing data at a single site have better perfor-
mance because of the smaller local databases. In addition, each site has a
smaller number of transactions executing than if all transactions are submit-
ted to a single centralized database. Moreover, interquery and intraquery
parallelism can be achieved by executing multiple queries at different sites,
or by breaking up a query into a number of subqueries that execute in paral-
lel. This contributes to improved performance.
4. Easier expansion. In a distributed environment, expansion of the system in
terms of adding more data, increasing database sizes, or adding more proces-
sors is much easier.
The transparencies we discussed in Section 1.2 lead to a compromise between ease
of use and the overhead cost of providing transparency. Total transparency provides
the global user with a view of the entire DDBS as if it is a single centralized system.
Transparency is provided as a complement to autonomy, which gives the users
tighter control over local databases. Transparency features may be implemented as a
part of the user language, which may translate the required services into appropriate
operations. Additionally, transparency impacts the features that must be provided
by the operating system and the DBMS.
1.6 Additional Functions of Distributed Databases
Distribution leads to increased complexity in the system design and implementa-
tion. To achieve the potential advantages listed previously, the DDBMS software
must be able to provide the following functions in addition to those of a centralized
DBMS:
■ Keeping track of data distribution. The ability to keep track of the data dis-
tribution, fragmentation, and replication by expanding the DDBMS catalog.
■ Distributed query processing. The ability to access remote sites and trans-
mit queries and data among the various sites via a communication network.
■ Distributed transaction management. The ability to devise execution
strategies for queries and transactions that access data from more than one
site and to synchronize the access to distributed data and maintain the
integrity of the overall database.
■ Replicated data management. The ability to decide which copy of a repli-
cated data item to access and to maintain the consistency of copies of a repli-
cated data item.
■ Distributed database recovery. The ability to recover from individual site
crashes and from new types of failures, such as the failure of communication
links.
■ Security. Distributed transactions must be executed with the proper man-
agement of the security of the data and the authorization/access privileges of
users.
■ Distributed directory (catalog) management. A directory contains infor-
mation (metadata) about data in the database. The directory may be global
for the entire DDB, or local for each site. The placement and distribution of
the directory are design and policy issues.
These functions themselves increase the complexity of a DDBMS over a centralized
DBMS. Before we can realize the full potential advantages of distribution, we must
find satisfactory solutions to these design issues and problems. Including all this
additional functionality is hard to accomplish, and finding optimal solutions is a
step beyond that.
2 Types of Distributed Database Systems
The term distributed database management system can describe various systems that
differ from one another in many respects. The main thing that all such systems have
in common is the fact that data and software are distributed over multiple sites con-
nected by some form of communication network. In this section we discuss a num-
ber of types of DDBMSs and the criteria and factors that make some of these
systems different.
The first factor we consider is the degree of homogeneity of the DDBMS software.
If all servers (or individual local DBMSs) use identical software and all users
(clients) use identical software, the DDBMS is called homogeneous; otherwise, it is
called heterogeneous. Another factor related to the degree of homogeneity is the
degree of local autonomy. If there is no provision for the local site to function as a
standalone DBMS, then the system has no local autonomy. On the other hand, if
direct access by local transactions to a server is permitted, the system has some
degree of local autonomy.
Figure 2 shows classification of DDBMS alternatives along orthogonal axes of dis-
tribution, autonomy, and heterogeneity. For a centralized database, there is com-
plete autonomy, but a total lack of distribution and heterogeneity (Point A in the
figure). We see that the degree of local autonomy provides further ground for classi-
fication into federated and multidatabase systems. At one extreme of the autonomy
spectrum, we have a DDBMS that looks like a centralized DBMS to the user, with
zero autonomy (Point B). A single conceptual schema exists, and all access to the
system is obtained through a site that is part of the DDBMS—which means that no
local autonomy exists. Along the autonomy axis we encounter two types of
DDBMSs called federated database system (Point C) and multidatabase system
(Point D). In such systems, each server is an independent and autonomous central-
ized DBMS that has its own local users, local transactions, and DBA, and hence has
a very high degree of local autonomy.

[Figure 2: Classification of distributed databases along the orthogonal dimensions of distribution, autonomy, and heterogeneity. Legend: A = traditional centralized database systems; B = pure distributed database systems; C = federated database systems; D = multidatabase or peer-to-peer database systems.]

The term federated database system (FDBS)
is used when there is some global view or schema of the federation of databases that
is shared by the applications (Point C). On the other hand, a multidatabase system
has full local autonomy in that it does not have a global schema but interactively
constructs one as needed by the application (Point D).3 Both systems are hybrids
between distributed and centralized systems, and the distinction we made between
them is not strictly followed. We will refer to them as FDBSs in a generic sense. Point
D in the diagram may also stand for a system with full local autonomy and full het-
erogeneity—this could be a peer-to-peer database system (see Section 9.2). In a het-
erogeneous FDBS, one server may be a relational DBMS, another a network DBMS
(such as Computer Associates' IDMS or HP's IMAGE/3000), and a third an object
DBMS (such as Object Design’s ObjectStore) or hierarchical DBMS (such as IBM’s
IMS); in such a case, it is necessary to have a canonical system language and to
include language translators to translate subqueries from the canonical language to
the language of each server.
We briefly discuss the issues affecting the design of FDBSs next.
2.1 Federated Database Management Systems Issues
The type of heterogeneity present in FDBSs may arise from several sources. We dis-
cuss these sources first and then point out how the different types of autonomies
contribute to a semantic heterogeneity that must be resolved in a heterogeneous
FDBS.
■ Differences in data models. Databases in an organization come from a vari-
ety of data models, including the so-called legacy models (hierarchical and
network), the relational data model, the object data model, and even files.
The modeling capabilities of the models vary. Hence, to deal with them uni-
formly via a single global schema or to process them in a single language is
challenging. Even if two databases are both from the RDBMS environment,
the same information may be represented as an attribute name, as a relation
name, or as a value in different databases. For example, an employee's gender might
be stored as the value of a Sex attribute in one database but represented by two
separate relations, MALE_EMPLOYEE and FEMALE_EMPLOYEE, in another. This calls
for an intelligent query-processing mechanism that can relate information based on
metadata.
■ Differences in constraints. Constraint facilities for specification and imple-
mentation vary from system to system. There are comparable features that
must be reconciled in the construction of a global schema. For example, the
relationships from ER models are represented as referential integrity con-
straints in the relational model. Triggers may have to be used to implement
certain constraints in the relational model. The global schema must also deal
with potential conflicts among constraints.
3The term multidatabase system is not easily applicable to most enterprise IT environments. The notion of
constructing a global schema as and when the need arises is not very feasible in practice for enterprise
databases.
■ Differences in query languages. Even with the same data model, the lan-
guages and their versions vary. For example, SQL has multiple versions like
SQL-89, SQL-92, SQL-99, and SQL:2008, and each system has its own set of
data types, comparison operators, string manipulation features, and so on.
Semantic Heterogeneity. Semantic heterogeneity occurs when there are differ-
ences in the meaning, interpretation, and intended use of the same or related data.
Semantic heterogeneity among component database systems (DBSs) creates the
biggest hurdle in designing global schemas of heterogeneous databases. The design
autonomy of component DBSs refers to their freedom of choosing the following
design parameters, which in turn affect the eventual complexity of the FDBS:
■ The universe of discourse from which the data is drawn. For example, for
two customer accounts, databases in the federation may be from the United
States and Japan and have entirely different sets of attributes about customer
accounts required by the accounting practices. Currency rate fluctuations
would also present a problem. Hence, relations in these two databases that
have identical names—CUSTOMER or ACCOUNT—may have some com-
mon and some entirely distinct information.
■ Representation and naming. The representation and naming of data ele-
ments and the structure of the data model may be prespecified for each local
database.
■ The understanding, meaning, and subjective interpretation of data. This is
a chief contributor to semantic heterogeneity.
■ Transaction and policy constraints. These deal with serializability criteria,
compensating transactions, and other transaction policies.
■ Derivation of summaries. Aggregation, summarization, and other data-
processing features and operations supported by the system.
The above problems related to semantic heterogeneity are being faced by all major
multinational and governmental organizations in all application areas. In today’s
commercial environment, most enterprises are resorting to heterogeneous FDBSs,
having heavily invested in the development of individual database systems using
diverse data models on different platforms over the last 20 to 30 years. Enterprises
are using various forms of software—typically called the middleware, or Web-
based packages called application servers (for example, WebLogic or WebSphere)
and even generic systems, called Enterprise Resource Planning (ERP) systems (for
example, SAP, J. D. Edwards ERP)—to manage the transport of queries and transac-
tions from the global application to individual databases (with possible additional
processing for business rules) and the data from the heterogeneous database servers
to the global application. Detailed discussion of these types of software systems is
outside the scope of this text.
Just as providing the ultimate transparency is the goal of any distributed database
architecture, local component databases strive to preserve autonomy.
Communication autonomy of a component DBS refers to its ability to decide
whether to communicate with another component DBS. Execution autonomy
refers to the ability of a component DBS to execute local operations without inter-
ference from external operations by other component DBSs and its ability to decide
the order in which to execute them. The association autonomy of a component
DBS implies that it has the ability to decide whether and how much to share its
functionality (operations it supports) and resources (data it manages) with other
component DBSs. The major challenge of designing FDBSs is to let component
DBSs interoperate while still providing the above types of autonomies to them.
3 Distributed Database Architectures
In this section, we first briefly point out the distinction between parallel and distrib-
uted database architectures. While both are prevalent in industry today, there are
various manifestations of the distributed architectures that are continuously evolv-
ing among large enterprises. The parallel architecture is more common in high-
performance computing, where there is a need for multiprocessor architectures to
cope with the volume of data undergoing transaction processing and warehousing
applications. We then introduce a generic architecture of a distributed database.
This is followed by discussions on the architecture of three-tier client-server and
federated database systems.
3.1 Parallel versus Distributed Architectures
There are two main types of multiprocessor system architectures that are common-
place:
■ Shared memory (tightly coupled) architecture. Multiple processors share
secondary (disk) storage and also share primary memory.
■ Shared disk (loosely coupled) architecture. Multiple processors share sec-
ondary (disk) storage but each has its own primary memory.
These architectures enable processors to communicate without the overhead of
exchanging messages over a network.4 Database management systems developed
using the above types of architectures are termed parallel database management
systems rather than DDBMSs, since they utilize parallel processor technology.
Another type of multiprocessor architecture is called shared nothing architecture.
In this architecture, every processor has its own primary and secondary (disk)
memory, no common memory exists, and the processors communicate over a high-
speed interconnection network (bus or switch). Although the shared nothing archi-
tecture resembles a distributed database computing environment, major differences
exist in the mode of operation. In shared nothing multiprocessor systems, there is
symmetry and homogeneity of nodes; this is not true of the distributed database
environment where heterogeneity of hardware and operating system at each node is
very common. Shared nothing architecture is also considered as an environment for
4If both primary and secondary memories are shared, the architecture is also known as shared everything
architecture.
parallel databases. Figure 3a illustrates a parallel database (shared nothing), whereas
Figure 3b illustrates a centralized database with distributed access and Figure 3c
shows a pure distributed database. We will not expand on parallel architectures and
related data management issues here.
[Figure 3: Some different database system architectures. (a) Shared nothing architecture: computer systems 1 through n, each with its own CPU, memory, and database, connected by a switch. (b) A networked architecture with a centralized database at one of the sites: a central site (Chicago) holding the databases, accessed over a communications network by sites in New York, Los Angeles, Atlanta, and San Francisco. (c) A truly distributed database architecture: sites 1 through 5, each with local data, connected by a communications network.]
[Figure 4: Schema architecture of distributed databases. User external views are defined over a global conceptual schema (GCS), which maps to a local conceptual schema (LCS) and a local internal schema (LIS), with stored data, at each of the sites 1 through n.]
3.2 General Architecture of Pure Distributed Databases
In this section we discuss both the logical and component architectural models of a
DDB. In Figure 4, which describes the generic schema architecture of a DDB, the
enterprise is presented with a consistent, unified view showing the logical structure
of underlying data across all nodes. This view is represented by the global concep-
tual schema (GCS), which provides network transparency (see Section 1.2). To
accommodate potential heterogeneity in the DDB, each node is shown as having its
own local internal schema (LIS) based on physical organization details at that par-
ticular site. The logical organization of data at each site is specified by the local con-
ceptual schema (LCS). The GCS, LCS, and their underlying mappings provide the
fragmentation and replication transparency discussed in Section 1.2. Figure 5 shows
the component architecture of a DDB. The global query compiler references the
global conceptual schema from the global system catalog to verify and impose
defined constraints. The global query optimizer references both global and local
conceptual schemas and generates optimized local queries from global queries. It
evaluates all candidate strategies using a cost function that estimates cost based on
response time (CPU, I/O, and network latencies) and estimated sizes of intermediate results.

[Figure 5: Component architecture of distributed databases. A user's interactive global query passes through the global query compiler, the global query optimizer, and the global transaction manager, which coordinates the local transaction managers and the local query translation and execution components at each site; each site also maintains its own local system catalog and stored data.]

The latter is particularly important in queries involving joins. Having
computed the cost for each candidate, the optimizer selects the candidate with the
minimum cost for execution. Each local DBMS has its own local query optimizer,
transaction manager, and execution engine, as well as a local system catalog, which
houses the local schemas. The global transaction manager is responsible
for coordinating the execution across multiple sites in conjunction with the local
transaction manager at those sites.
3.3 Federated Database Schema Architecture
Figure 6
The five-level schema architecture in a federated database system (FDBS). Source:
Adapted from Sheth and Larson, “Federated Database Systems for Managing Distributed,
Heterogeneous, and Autonomous Databases.” ACM Computing Surveys (Vol. 22, No. 3,
September 1990).
A typical five-level schema architecture to support global applications in the FDBS
environment is shown in Figure 6. In this architecture, the local schema is the
conceptual schema (full database definition) of a component database, and the
component schema is derived by translating the local schema into a canonical data
model or common data model (CDM) for the FDBS. Schema translation from the
local schema to the component schema is accompanied by generating mappings to
transform commands on a component schema into commands on the corres-
ponding local schema. The export schema represents the subset of a component
schema that is available to the FDBS. The federated schema is the global schema or
view, which is the result of integrating all the shareable export schemas. The
external schemas define the schema for a user group or an application, as in the
three-level schema architecture.5
All the problems related to query processing, transaction processing, directory and
metadata management, and recovery apply to FDBSs, with additional considerations.
It is not within our scope to discuss them in detail here.
5For a detailed discussion of the autonomies and the five-level architecture of FDBMSs, see Sheth and
Larson (1990).
Figure 7
The three-tier client-server architecture.
Client: user interface or presentation tier (Web browser, HTML, JavaScript, Visual Basic, ...),
    connected to the application server via the HTTP protocol.
Application server: application (business) logic tier (application program, Java, C/C++, C#, ...),
    connected to the database server via ODBC, JDBC, SQL/CLI, or SQLJ.
Database server: query and transaction processing tier (database access, SQL, PSM, XML, ...).
3.4 An Overview of Three-Tier Client-Server Architecture
As we pointed out in the chapter introduction, full-scale DDBMSs that support all the
types of functionality discussed earlier have not been developed. Instead, distributed
database applications are being developed in the context of client-server architectures.
A two-tier client-server architecture can be used, but it is now more common to use a
three-tier architecture, particularly in Web applications. This architecture is illus-
trated in Figure 7.
In the three-tier client-server architecture, the following three layers exist:
1. Presentation layer (client). This provides the user interface and interacts
with the user. The programs at this layer present Web interfaces or forms to
the client in order to interface with the application. Web browsers are often
utilized, and the languages and specifications used include HTML, XHTML,
CSS, Flash, MathML, Scalable Vector Graphics (SVG), Java, JavaScript,
Adobe Flex, and others. This layer handles user input, output, and naviga-
tion by accepting user commands and displaying the needed information,
usually in the form of static or dynamic Web pages. The latter are employed
when the interaction involves database access. When a Web interface is used,
this layer typically communicates with the application layer via the HTTP
protocol.
2. Application layer (business logic). This layer implements the application
logic. For example, queries can be formulated based on user input from the
client, or query results can be formatted and sent to the client for presenta-
tion. Additional application functionality can be handled at this layer, such
as security checks, identity verification, and other functions. The application
layer can interact with one or more databases or data sources as needed by
connecting to the database using ODBC, JDBC, SQL/CLI, or other database
access techniques.
3. Database server. This layer handles query and update requests from the
application layer, processes the requests, and sends the results. Usually SQL is
used to access the database if it is relational or object-relational, and stored
database procedures may also be invoked. Query results (and queries) may
be formatted into XML when transmitted between the application server
and the database server.
Exactly how to divide the DBMS functionality between the client, application
server, and database server may vary. The common approach is to include the func-
tionality of a centralized DBMS at the database server level. A number of relational
DBMS products have taken this approach, where an SQL server is provided. The
application server must then formulate the appropriate SQL queries and connect to
the database server when needed. The client provides the processing for user inter-
face interactions. Since SQL is a relational standard, various SQL servers, possibly
provided by different vendors, can accept SQL commands through standards such
as ODBC, JDBC, and SQL/CLI.
In this architecture, the application server may also refer to a data dictionary that
includes information on the distribution of data among the various SQL servers, as
well as modules for decomposing a global query into a number of local queries that
can be executed at the various sites. Interaction between an application server and
database server might proceed as follows during the processing of an SQL query:
1. The application server formulates a user query based on input from the
client layer and decomposes it into a number of independent site queries.
Each site query is sent to the appropriate database server site.
2. Each database server processes the local query and sends the results to the
application server site. Increasingly, XML is being touted as the standard for
data exchange, so the database server may format the query result into XML
before sending it to the application server.
3. The application server combines the results of the subqueries to produce the
result of the originally required query, formats it into HTML or some other
form accepted by the client, and sends it to the client site for display.
The application server is responsible for generating a distributed execution plan for
a multisite query or transaction and for supervising distributed execution by send-
ing commands to servers. These commands include local queries and transactions
to be executed, as well as commands to transmit data to other clients or servers.
Another function controlled by the application server (or coordinator) is that of
ensuring consistency of replicated copies of a data item by employing distributed
(or global) concurrency control techniques. The application server must also ensure
the atomicity of global transactions by performing global recovery when certain
sites fail.
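To make the application server's coordination role concrete, the following minimal Python sketch (our illustration, not part of the text) mimics steps 1 through 3 above; the site list and the send_to_site helper are hypothetical placeholders for real ODBC/JDBC/SQL/CLI calls.

# Illustrative sketch of an application server coordinating a multisite query.
# send_to_site is a hypothetical placeholder for a real database-access call.
def send_to_site(site, local_sql):
    """Ship one local query to a database server and return its rows (stubbed here)."""
    return []   # in practice: open an ODBC/JDBC/DB-API connection to 'site' and execute local_sql

def run_global_query(global_sql, sites):
    # Step 1: decompose the global query into independent site queries (naively, one per site).
    site_queries = {site: global_sql for site in sites}
    # Step 2: each database server executes its local query and returns its partial result.
    partial_results = [row for site, sql in site_queries.items() for row in send_to_site(site, sql)]
    # Step 3: combine the partial results (a simple union here) and return them for presentation.
    return partial_results

rows = run_global_query("SELECT Fname, Lname FROM EMPLOYEE", sites=["site1", "site2"])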
If the DDBMS has the capability to hide the details of data distribution from the
application server, then it enables the application server to execute global queries
and transactions as though the database were centralized, without having to specify
the sites at which the data referenced in the query or transaction resides. This prop-
erty is called distribution transparency. Some DDBMSs do not provide distribution
transparency, instead requiring that applications be aware of the details of data
distribution.
4 Data Fragmentation, Replication,
and Allocation Techniques for Distributed
Database Design
In this section we discuss techniques that are used to break up the database into log-
ical units, called fragments, which may be assigned for storage at the various sites.
We also discuss the use of data replication, which permits certain data to be stored
in more than one site, and the process of allocating fragments—or replicas of frag-
ments—for storage at the various sites. These techniques are used during the
process of distributed database design. The information concerning data fragmen-
tation, allocation, and replication is stored in a global directory that is accessed by
the DDBS applications as needed.
4.1 Data Fragmentation
In a DDB, decisions must be made regarding which site should be used to store
which portions of the database. For now, we will assume that there is no replication;
that is, each relation—or portion of a relation—is stored at one site only. We discuss
replication and its effects later in this section. We also use the terminology of rela-
tional databases, but similar concepts apply to other data models. We assume that
we are starting with a relational database schema and must decide on how to dis-
tribute the relations over the various sites. To illustrate our discussion, we use the
relational database schema in Figure A.1.
Before we decide on how to distribute the data, we must determine the logical units
of the database that are to be distributed. The simplest logical units are the relations
themselves; that is, each whole relation is to be stored at a particular site. In our
example, we must decide on a site to store each of the relations EMPLOYEE,
DEPARTMENT, PROJECT, WORKS_ON, and DEPENDENT in Figure A.1. In many
cases, however, a relation can be divided into smaller logical units for distribution.
For example, consider the company database shown in Figure A.2, and assume there
are three computer sites—one for each department in the company.6
We may want to store the database information relating to each department at the
computer site for that department. A technique called horizontal fragmentation can
be used to partition each relation by department.
6Of course, in an actual situation, there will be many more tuples in the relation than those shown in
Figure A.2.
Horizontal Fragmentation. A horizontal fragment of a relation is a subset of
the tuples in that relation. The tuples that belong to the horizontal fragment are
specified by a condition on one or more attributes of the relation. Often, only a sin-
gle attribute is involved. For example, we may define three horizontal fragments on
the EMPLOYEE relation in Figure A.2 with the following conditions: (Dno = 5),
(Dno = 4), and (Dno = 1)—each fragment contains the EMPLOYEE tuples working
for a particular department. Similarly, we may define three horizontal fragments
for the PROJECT relation, with the conditions (Dnum = 5), (Dnum = 4), and
(Dnum = 1)—each fragment contains the PROJECT tuples controlled by a particu-
lar department. Horizontal fragmentation divides a relation horizontally by
grouping rows to create subsets of tuples, where each subset has a certain logical
meaning. These fragments can then be assigned to different sites in the distributed
system. Derived horizontal fragmentation applies the partitioning of a primary
relation (DEPARTMENT in our example) to other secondary relations (EMPLOYEE
and PROJECT in our example), which are related to the primary via a foreign key.
This way, related data between the primary and the secondary relations gets frag-
mented in the same way.
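To make this concrete, the following Python sketch (our illustration on abbreviated sample tuples, not the full relations of Figure A.2) builds the EMPLOYEE fragments by a selection on Dno and derives the PROJECT fragments from the same department numbers via the foreign key Dnum.

# Horizontal and derived horizontal fragmentation on toy data (abbreviated tuples).
EMPLOYEE = [
    {"Ssn": "123456789", "Lname": "Smith",  "Dno": 5},
    {"Ssn": "999887777", "Lname": "Zelaya", "Dno": 4},
    {"Ssn": "888665555", "Lname": "Borg",   "Dno": 1},
]
PROJECT = [
    {"Pnumber": 1,  "Pname": "ProductX",        "Dnum": 5},
    {"Pnumber": 10, "Pname": "Computerization", "Dnum": 4},
    {"Pnumber": 20, "Pname": "Reorganization",  "Dnum": 1},
]

def select_fragment(relation, condition):
    """sigma_C(R): the tuples of R that satisfy condition C."""
    return [t for t in relation if condition(t)]

departments = [5, 4, 1]
emp_fragments  = {d: select_fragment(EMPLOYEE, lambda t, d=d: t["Dno"] == d)  for d in departments}
proj_fragments = {d: select_fragment(PROJECT,  lambda t, d=d: t["Dnum"] == d) for d in departments}

print(emp_fragments[5])   # the EMPLOYEE tuples working for department 5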
Vertical Fragmentation. Each site may not need all the attributes of a relation,
which would indicate the need for a different type of fragmentation. Vertical frag-
mentation divides a relation “vertically” by columns. A vertical fragment of a rela-
tion keeps only certain attributes of the relation. For example, we may want to
fragment the EMPLOYEE relation into two vertical fragments. The first fragment
includes personal information—Name, Bdate, Address, and Sex—and the second
includes work-related information—Ssn, Salary, Super_ssn, and Dno. This vertical
fragmentation is not quite proper, because if the two fragments are stored sepa-
rately, we cannot put the original employee tuples back together, since there is no
common attribute between the two fragments. It is necessary to include the primary
key or some candidate key attribute in every vertical fragment so that the full rela-
tion can be reconstructed from the fragments. Hence, we must add the Ssn attribute
to the personal information fragment.
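A matching sketch for vertical fragmentation, again on an abbreviated sample tuple; Ssn is included in both projection lists so that the original tuples can later be reconstructed.

# Vertical fragmentation: project each tuple onto an attribute list that includes the key.
EMPLOYEE = [{"Ssn": "123456789", "Name": "John B Smith", "Bdate": "1965-01-09",
             "Address": "731 Fondren, Houston, TX", "Sex": "M",
             "Salary": 30000, "Super_ssn": "333445555", "Dno": 5}]

def project_fragment(relation, attrs):
    """pi_L(R): keep only the attributes in L for every tuple."""
    return [{a: t[a] for a in attrs} for t in relation]

L1 = ["Ssn", "Name", "Bdate", "Address", "Sex"]       # personal information
L2 = ["Ssn", "Salary", "Super_ssn", "Dno"]            # work-related information
EMP_PERSONAL = project_fragment(EMPLOYEE, L1)
EMP_WORK     = project_fragment(EMPLOYEE, L2)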
Notice that each horizontal fragment on a relation R can be specified in the rela-
tional algebra by a σCi(R) operation. A set of horizontal fragments whose conditions
C1, C2, …, Cn include all the tuples in R—that is, every tuple in R satisfies (C1 OR C2
OR … OR Cn)—is called a complete horizontal fragmentation of R. In many cases
a complete horizontal fragmentation is also disjoint; that is, no tuple in R satisfies
(Ci AND Cj) for any i ≠ j. Our two earlier examples of horizontal fragmentation for
the EMPLOYEE and PROJECT relations were both complete and disjoint. To recon-
struct the relation R from a complete horizontal fragmentation, we need to apply the
UNION operation to the fragments.
A vertical fragment on a relation R can be specified by a πLi(R) operation in the rela-
tional algebra. A set of vertical fragments whose projection lists L1, L2, …, Ln include
all the attributes in R but share only the primary key attribute of R is called a
complete vertical fragmentation of R. In this case the projection lists satisfy the fol-
lowing two conditions:
■ L1 ∪ L2 ∪ … ∪ Ln = ATTRS(R).
■ Li ∩ Lj = PK(R) for any i ≠ j, where ATTRS(R) is the set of attributes of R and
PK(R) is the primary key of R.
To reconstruct the relation R from a complete vertical fragmentation, we apply the
OUTER UNION operation to the vertical fragments (assuming no horizontal frag-
mentation is used). Notice that we could also apply a FULL OUTER JOIN operation
and get the same result for a complete vertical fragmentation, even when some hor-
izontal fragmentation may also have been applied. The two vertical fragments of the
EMPLOYEE relation with projection lists L1 = {Ssn, Name, Bdate, Address, Sex} and
L2 = {Ssn, Salary, Super_ssn, Dno} constitute a complete vertical fragmentation of
EMPLOYEE.
Two horizontal fragments that are neither complete nor disjoint are those defined
on the EMPLOYEE relation in Figure A.1 by the conditions (Salary > 50000) and
(Dno = 4); they may not include all EMPLOYEE tuples, and they may include com-
mon tuples. Two vertical fragments that are not complete are those defined by the
attribute lists L1 = {Name, Address} and L2 = {Ssn, Name, Salary}; these lists violate
both conditions of a complete vertical fragmentation.
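The reconstruction rules can be stated directly as code. The following self-contained Python sketch (our illustration) rebuilds a relation from a complete, disjoint horizontal fragmentation by UNION and from a complete vertical fragmentation by matching fragments on the shared primary key.

# Reconstructing a relation from complete fragmentations (toy data, abbreviated tuples).
def union_fragments(fragments):
    """UNION of complete, disjoint horizontal fragments."""
    return [t for frag in fragments for t in frag]

def join_on_key(frag1, frag2, key):
    """Combine two vertical fragments that share only the primary key attribute."""
    merged = {t[key]: dict(t) for t in frag1}
    for t in frag2:
        merged.setdefault(t[key], {}).update(t)
    return list(merged.values())

emp_d5 = [{"Ssn": "123456789", "Lname": "Smith",  "Dno": 5}]
emp_d4 = [{"Ssn": "999887777", "Lname": "Zelaya", "Dno": 4}]
personal = [{"Ssn": "123456789", "Name": "John B Smith", "Sex": "M"}]
work     = [{"Ssn": "123456789", "Salary": 30000, "Dno": 5}]

employees = union_fragments([emp_d5, emp_d4])    # horizontal reconstruction by UNION
employee  = join_on_key(personal, work, "Ssn")   # vertical reconstruction on the key Ssn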
Mixed (Hybrid) Fragmentation. We can intermix the two types of fragmenta-
tion, yielding a mixed fragmentation. For example, we may combine the horizon-
tal and vertical fragmentations of the EMPLOYEE relation given earlier into a
mixed fragmentation that includes six fragments. In this case, the original relation
can be reconstructed by applying UNION and OUTER UNION (or OUTER JOIN)
operations in the appropriate order. In general, a fragment of a relation R can be
specified by a SELECT-PROJECT combination of operations πL(σC(R)). If
C = TRUE (that is, all tuples are selected) and L ≠ ATTRS(R), we get a vertical frag-
ment, and if C ≠ TRUE and L = ATTRS(R), we get a horizontal fragment. Finally, if
C ≠ TRUE and L ≠ ATTRS(R), we get a mixed fragment. Notice that a relation can
itself be considered a fragment with C = TRUE and L = ATTRS(R). In the following
discussion, the term fragment is used to refer to a relation or to any of the preced-
ing types of fragments.
A fragmentation schema of a database is a definition of a set of fragments that
includes all attributes and tuples in the database and satisfies the condition that the
whole database can be reconstructed from the fragments by applying some
sequence of OUTER UNION (or OUTER JOIN) and UNION operations. It is also
sometimes useful—although not necessary—to have all the fragments be disjoint
except for the repetition of primary keys among vertical (or mixed) fragments. In
the latter case, all replication and distribution of fragments is clearly specified at a
subsequent stage, separately from fragmentation.
An allocation schema describes the allocation of fragments to sites of the DDBS;
hence, it is a mapping that specifies for each fragment the site(s) at which it is
stored. If a fragment is stored at more than one site, it is said to be replicated. We
discuss data replication and allocation next.
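As a small illustration (the fragment names and site numbers follow the example of Section 4.3, but the representation itself is our own assumption, not a prescribed catalog format), a fragmentation schema and an allocation schema can be thought of as simple mappings:

# A toy representation of a fragmentation schema (fragment -> informal definition)
# and an allocation schema (fragment -> set of sites); a fragment stored at more
# than one site is replicated.
fragmentation_schema = {
    "EMPD_5":  "σ Dno=5, π {Name, Ssn, Salary, Super_ssn, Dno} of EMPLOYEE",
    "EMPD_4":  "σ Dno=4, π {Name, Ssn, Salary, Super_ssn, Dno} of EMPLOYEE",
    "PROJS_5": "σ Dnum=5 of PROJECT",
}
allocation_schema = {
    "EMPD_5":  {1, 2},     # headquarters (site 1) and site 2, hence replicated
    "EMPD_4":  {1, 3},
    "PROJS_5": {1, 2},
}

def is_replicated(fragment):
    return len(allocation_schema[fragment]) > 1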
4.2 Data Replication and Allocation
Replication is useful in improving the availability of data. The most extreme case is
replication of the whole database at every site in the distributed system, thus creating a
fully replicated distributed database. This can improve availability remarkably
because the system can continue to operate as long as at least one site is up. It also
improves performance of retrieval for global queries because the results of such
queries can be obtained locally from any one site; hence, a retrieval query can be
processed at the local site where it is submitted, if that site includes a server module.
The disadvantage of full replication is that it can slow down update operations drasti-
cally, since a single logical update must be performed on every copy of the database to
keep the copies consistent. This is especially true if many copies of the database exist.
Full replication makes the concurrency control and recovery techniques more expen-
sive than they would be if there was no replication, as we will see in Section 7.
The other extreme from full replication involves having no replication—that is,
each fragment is stored at exactly one site. In this case, all fragments must be dis-
joint, except for the repetition of primary keys among vertical (or mixed) frag-
ments. This is also called nonredundant allocation.
Between these two extremes, we have a wide spectrum of partial replication of the
data—that is, some fragments of the database may be replicated whereas others may
not. The number of copies of each fragment can range from one up to the total num-
ber of sites in the distributed system. A special case of partial replication is common
in applications where mobile workers—such as sales forces, financial planners, and
claims adjustors—carry partially replicated databases with them on laptops and
PDAs and synchronize them periodically with the server database.7 A descrip-
tion of the replication of fragments is sometimes called a replication schema.
Each fragment—or each copy of a fragment—must be assigned to a particular site
in the distributed system. This process is called data distribution (or data alloca-
tion). The choice of sites and the degree of replication depend on the performance
and availability goals of the system and on the types and frequencies of transactions
submitted at each site. For example, if high availability is required, transactions can
be submitted at any site, and most transactions are retrieval only, a fully replicated
database is a good choice. However, if certain transactions that access particular
parts of the database are mostly submitted at a particular site, the corresponding set
of fragments can be allocated at that site only. Data that is accessed at multiple sites
can be replicated at those sites. If many updates are performed, it may be useful to
limit replication. Finding an optimal or even a good solution to distributed data
allocation is a complex optimization problem.
7For a proposed scalable approach to synchronize partially replicated databases, see Mahajan et al.
(1998).
4.3 Example of Fragmentation, Allocation, and Replication
We now consider an example of fragmenting and distributing the company data-
base in Figures A.1 and A.2. Suppose that the company has three computer sites—
one for each current department. Sites 2 and 3 are for departments 5 and 4,
respectively. At each of these sites, we expect frequent access to the EMPLOYEE and
PROJECT information for the employees who work in that department and the proj-
ects controlled by that department. Further, we assume that these sites mainly access
the Name, Ssn, Salary, and Super_ssn attributes of EMPLOYEE. Site 1 is used by com-
pany headquarters and accesses all employee and project information regularly, in
addition to keeping track of DEPENDENT information for insurance purposes.
According to these requirements, the whole database in Figure A.2 can be stored at
site 1. To determine the fragments to be replicated at sites 2 and 3, first we can hori-
zontally fragment DEPARTMENT by its key Dnumber. Then we apply derived frag-
mentation to the EMPLOYEE, PROJECT, and DEPT_LOCATIONS relations based on
their foreign keys for department number—called Dno, Dnum, and Dnumber, respec-
tively, in Figure A.1. We can vertically fragment the resulting EMPLOYEE fragments
to include only the attributes {Name, Ssn, Salary, Super_ssn, Dno}. Figure 8 shows the
mixed fragments EMPD_5 and EMPD_4, which include the EMPLOYEE tuples satis-
fying the conditions Dno = 5 and Dno = 4, respectively. The PROJECT, DEPARTMENT,
and DEPT_LOCATIONS relations are similarly fragmented horizontally by
department number. All these fragments—stored at sites 2 and 3—are replicated
because they are also stored at headquarters—site 1.
We must now fragment the WORKS_ON relation and decide which fragments of
WORKS_ON to store at sites 2 and 3. We are confronted with the problem that no
attribute of WORKS_ON directly indicates the department to which each tuple
belongs. In fact, each tuple in WORKS_ON relates an employee e to a project P. We
could fragment WORKS_ON based on the department D in which e works or based
on the department D′ that controls P. Fragmentation becomes easy if we have a con-
straint stating that D = D′ for all WORKS_ON tuples—that is, if employees can work
only on projects controlled by the department they work for. However, there is no
such constraint in our database in Figure A.2. For example, the WORKS_ON tuple
<333445555, 10, 10.0> relates an employee who works for department 5 with a
project controlled by department 4. In this case, we could fragment WORKS_ON
based on the department in which the employee works (which is expressed by the
condition C) and then fragment further based on the department that controls the
projects that employee is working on, as shown in Figure 9.
In Figure 9, the union of fragments G1, G2, and G3 gives all WORKS_ON tuples for
employees who work for department 5. Similarly, the union of fragments G4, G5,
and G6 gives all WORKS_ON tuples for employees who work for department 4. On
the other hand, the union of fragments G1, G4, and G7 gives all WORKS_ON tuples
for projects controlled by department 5. The condition for each of the fragments G1
through G9 is shown in Figure 9. The relations that represent M:N relationships,
such as WORKS_ON, often have several possible logical fragmentations. In our dis-
tribution in Figure 8, we choose to include all fragments that can be joined to either
an EMPLOYEE tuple or a PROJECT tuple at sites 2 and 3.
Figure 8
Allocation of fragments to sites. (a) Relation fragments at site 2 corresponding to
department 5: EMPD_5, DEP_5, DEP_5_LOCS, PROJS_5, and WORKS_ON_5. (b) Relation
fragments at site 3 corresponding to department 4: EMPD_4, DEP_4, DEP_4_LOCS,
PROJS_4, and WORKS_ON_4.
Figure 9
Complete and disjoint fragments of the WORKS_ON relation. (a) Fragments G1, G2, and G3
of WORKS_ON for employees working in department 5 (C = [Essn IN (SELECT Ssn FROM
EMPLOYEE WHERE Dno=5)]), where C1 = C AND (Pno IN (SELECT Pnumber FROM PROJECT
WHERE Dnum=5)), C2 = C AND (Pno IN (SELECT Pnumber FROM PROJECT WHERE Dnum=4)),
and C3 = C AND (Pno IN (SELECT Pnumber FROM PROJECT WHERE Dnum=1)). (b) Fragments
G4, G5, and G6 for employees working in department 4 (C = [Essn IN (SELECT Ssn FROM
EMPLOYEE WHERE Dno=4)]), with conditions C4, C5, and C6 defined analogously.
(c) Fragments G7, G8, and G9 for employees working in department 1 (C = [Essn IN
(SELECT Ssn FROM EMPLOYEE WHERE Dno=1)]), with conditions C7, C8, and C9 defined
analogously.
Hence, we place the union
of fragments G1, G2, G3, G4, and G7 at site 2 and the union of fragments G4, G5, G6,
G2, and G8 at site 3. Notice that fragments G2 and G4 are replicated at both sites.
This allocation strategy permits the join between the local EMPLOYEE or PROJECT
fragments at site 2 or site 3 and the local WORKS_ON fragment to be performed
completely locally. This clearly demonstrates how complex the problem of database
fragmentation and allocation is for large databases. The Selected Bibliography at the
end of this chapter discusses some of the work done in this area.
5 Query Processing and Optimization in
Distributed Databases
Now we give an overview of how a DDBMS processes and optimizes a query. First
we discuss the steps involved in query processing and then elaborate on the commu-
nication costs of processing a distributed query. Finally we discuss a special opera-
tion, called a semijoin, which is used to optimize some types of queries in a DDBMS.
A detailed discussion about optimization algorithms is beyond the scope of this
text. We attempt to illustrate optimization principles using suitable examples.8
5.1 Distributed Query Processing
A distributed database query is processed in stages as follows:
1. Query Mapping. The input query on distributed data is specified formally
using a query language. It is then translated into an algebraic query on global
relations. This translation is done by referring to the global conceptual
schema and does not take into account the actual distribution and replica-
tion of data. Hence, this translation is largely identical to the one performed
in a centralized DBMS. It is first normalized, analyzed for semantic errors,
simplified, and finally restructured into an algebraic query.
2. Localization. In a distributed database, fragmentation results in relations
being stored in separate sites, with some fragments possibly being replicated.
This stage maps the distributed query on the global schema to separate
queries on individual fragments using data distribution and replication
information.
3. Global Query Optimization. Optimization consists of selecting a strategy
from a list of candidates that is closest to optimal. A list of candidate queries
can be obtained by permuting the ordering of operations within a fragment
query generated by the previous stage. Time is the preferred unit for measur-
ing cost. The total cost is a weighted combination of costs such as CPU cost,
I/O costs, and communication costs. Since DDBs are connected by a net-
work, often the communication costs over the network are the most signifi-
cant. This is especially true when the sites are connected through a wide area
network (WAN).
8For a detailed discussion of optimization algorithms, see Ozsu and Valduriez (1999).
4. Local Query Optimization. This stage is common to all sites in the DDB.
The techniques are similar to those used in centralized systems.
The first three stages discussed above are performed at a central control site, while
the last stage is performed locally.
5.2 Data Transfer Costs of Distributed Query Processing
Besides the issues involved in processing and optimizing a query in a centralized
DBMS, in a distributed system, several additional factors further complicate query
processing. The first is the cost of transferring data over the network. This data
includes intermediate files that are transferred to other sites for further processing,
as well as the final result files that may have to be transferred to the site where the
query result is needed. Although these costs may not be very high if the sites are
connected via a high-performance local area network, they become quite significant
in other types of networks. Hence, DDBMS query optimization algorithms consider
the goal of reducing the amount of data transfer as an optimization criterion in
choosing a distributed query execution strategy.
We illustrate this with two simple sample queries. Suppose that the EMPLOYEE and
DEPARTMENT relations in Figure A.1 are distributed at two sites as shown in Figure
10. We will assume in this example that neither relation is fragmented. According to
Figure 10, the size of the EMPLOYEE relation is 100 * 10,000 = 10^6 bytes, and the size
of the DEPARTMENT relation is 35 * 100 = 3500 bytes. Consider the query Q: For
each employee, retrieve the employee name and the name of the department for which
the employee works. This can be stated as follows in the relational algebra:
Q: πFname, Lname, Dname(EMPLOYEE ⋈Dno=Dnumber DEPARTMENT)
The result of this query will include 10,000 records, assuming that every employee is
related to a department. Suppose that each record in the query result is 40 bytes long.
Figure 10
Example to illustrate volume of data transferred.
Site 1: EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
    10,000 records; each record is 100 bytes long
    Ssn field is 9 bytes long; Dno field is 4 bytes long
    Fname field is 15 bytes long; Lname field is 15 bytes long
Site 2: DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
    100 records; each record is 35 bytes long
    Dnumber field is 4 bytes long; Mgr_ssn field is 9 bytes long; Dname field is 10 bytes long
The query is submitted at a distinct site 3, which is called the result site because the
query result is needed there. Neither the EMPLOYEE nor the DEPARTMENT relation
resides at site 3. There are three simple strategies for executing this distributed query:
1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result
site, and perform the join at site 3. In this case, a total of 1,000,000 + 3,500 =
1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send
the result to site 3. The size of the query result is 40 * 10,000 = 400,000 bytes,
so 400,000 + 1,000,000 = 1,400,000 bytes must be transferred.
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and
send the result to site 3. In this case, 400,000 + 3,500 = 403,500 bytes must be
transferred.
If minimizing the amount of data transfer is our optimization criterion, we should
choose strategy 3. Now consider another query Q′: For each department, retrieve the
department name and the name of the department manager. This can be stated as fol-
lows in the relational algebra:
Q′: πFname, Lname, Dname(DEPARTMENT ⋈Mgr_ssn=Ssn EMPLOYEE)
Again, suppose that the query is submitted at site 3. The same three strategies for
executing query Q apply to Q′, except that the result of Q′ includes only 100 records,
assuming that each department has a manager:
1. Transfer both the EMPLOYEE and the DEPARTMENT relations to the result
site, and perform the join at site 3. In this case, a total of 1,000,000 + 3,500 =
1,003,500 bytes must be transferred.
2. Transfer the EMPLOYEE relation to site 2, execute the join at site 2, and send
the result to site 3. The size of the query result is 40 * 100 = 4,000 bytes, so
4,000 + 1,000,000 = 1,004,000 bytes must be transferred.
3. Transfer the DEPARTMENT relation to site 1, execute the join at site 1, and
send the result to site 3. In this case, 4,000 + 3,500 = 7,500 bytes must be
transferred.
Again, we would choose strategy 3—this time by an overwhelming margin over
strategies 1 and 2. The preceding three strategies are the most obvious ones for the
case where the result site (site 3) is different from all the sites that contain files
involved in the query (sites 1 and 2). However, suppose that the result site is site 2;
then we have two simple strategies:
1. Transfer the EMPLOYEE relation to site 2, execute the query, and present the
result to the user at site 2. Here, the same number of bytes—1,000,000—
must be transferred for both Q and Q′.
2. Transfer the DEPARTMENT relation to site 1, execute the query at site 1, and
send the result back to site 2. In this case 400,000 + 3,500 = 403,500 bytes
must be transferred for Q and 4,000 + 3,500 = 7,500 bytes for Q′.
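The arithmetic behind these comparisons is mechanical; the short Python sketch below (our illustration) recomputes the transfer volumes for Q and Q′ from the sizes in Figure 10, with the result shipped to site 3. The strategy labels are our own.

# Transfer volumes (in bytes) for the simple strategies, result needed at site 3.
EMPLOYEE_BYTES   = 100 * 10_000    # 1,000,000 bytes
DEPARTMENT_BYTES = 35 * 100        # 3,500 bytes
RESULT_RECORD    = 40              # bytes per result record

def strategy_costs(result_records):
    result_bytes = RESULT_RECORD * result_records
    return {
        "1: ship both relations to site 3":          EMPLOYEE_BYTES + DEPARTMENT_BYTES,
        "2: ship EMPLOYEE to site 2, result to 3":   EMPLOYEE_BYTES + result_bytes,
        "3: ship DEPARTMENT to site 1, result to 3": DEPARTMENT_BYTES + result_bytes,
    }

print(strategy_costs(10_000))   # query Q : strategy 3 costs 403,500 bytes
print(strategy_costs(100))      # query Q': strategy 3 costs   7,500 bytes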
A more complex strategy, which sometimes works better than these simple strate-
gies, uses an operation called semijoin. We introduce this operation and discuss dis-
tributed execution using semijoins next.
5.3 Distributed Query Processing Using Semijoin
The idea behind distributed query processing using the semijoin operation is to
reduce the number of tuples in a relation before transferring it to another site.
Intuitively, the idea is to send the joining column of one relation R to the site where
the other relation S is located; this column is then joined with S. Following that, the
join attributes, along with the attributes required in the result, are projected out and
shipped back to the original site and joined with R. Hence, only the joining column
of R is transferred in one direction, and a subset of S with no extraneous tuples or
attributes is transferred in the other direction. If only a small fraction of the tuples
in S participate in the join, this can be quite an efficient solution to minimizing data
transfer.
To illustrate this, consider the following strategy for executing Q or Q′:
1. Project the join attributes of DEPARTMENT at site 2, and transfer them to site
1. For Q, we transfer F = πDnumber(DEPARTMENT), whose size is 4 * 100 = 400
bytes, whereas, for Q′, we transfer F′ = πMgr_ssn(DEPARTMENT), whose size is
9 * 100 = 900 bytes.
2. Join the transferred file with the EMPLOYEE relation at site 1, and transfer
the required attributes from the resulting file to site 2. For Q, we transfer
R = πDno, Fname, Lname(F ⋈Dnumber=Dno EMPLOYEE), whose size is 34 * 10,000 =
340,000 bytes, whereas, for Q′, we transfer R′ = πMgr_ssn, Fname, Lname(F′ ⋈Mgr_ssn=Ssn
EMPLOYEE), whose size is 39 * 100 = 3,900 bytes.
3. Execute the query by joining the transferred file R or R′ with DEPARTMENT,
and present the result to the user at site 2.
Using this strategy, we transfer 340,400 bytes for Q and 4,800 bytes for Q′. We lim-
ited the EMPLOYEE attributes and tuples transmitted to site 2 in step 2 to only those
that will actually be joined with a DEPARTMENT tuple in step 3. For query Q, this
turned out to include all EMPLOYEE tuples, so little improvement was achieved.
However, for Q′ only 100 out of the 10,000 EMPLOYEE tuples were needed.
The semijoin operation was devised to formalize this strategy. A semijoin operation
R ⋉A=B S, where A and B are domain-compatible attributes of R and S, respectively,
produces the same result as the relational algebra expression πR(R ⋈A=B S). In a
distributed environment where R and S reside at different sites, the semijoin is
typically implemented by first transferring F = πB(S) to the site where R resides and
then joining F with R, thus leading to the strategy discussed here.
Notice that the semijoin operation is not commutative; that is,
R ⋉ S ≠ S ⋉ R
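The following Python sketch (our illustration, with placeholder tuples) carries out the semijoin-style strategy in memory: only the joining column of DEPARTMENT travels to the EMPLOYEE site, and only the reduced, projected tuples travel back.

# Semijoin-style reduction: EMPLOYEE resides at site 1, DEPARTMENT at site 2; result wanted at site 2.
def project(rel, attrs):
    return [{a: t[a] for a in attrs} for t in rel]

def join(r, s, r_attr, s_attr):
    index = {}
    for t in s:
        index.setdefault(t[s_attr], []).append(t)
    return [{**rt, **st} for rt in r for st in index.get(rt[r_attr], [])]

EMPLOYEE   = [{"Ssn": "123456789", "Lname": "Smith", "Dno": 5}]                 # at site 1
DEPARTMENT = [{"Dnumber": 5, "Dname": "Research", "Mgr_ssn": "333445555"}]      # at site 2

F = project(DEPARTMENT, ["Dnumber"])                 # step 1: ship only the join column to site 1
R = project(join(F, EMPLOYEE, "Dnumber", "Dno"),     # step 2: join at site 1, project what is needed
            ["Dno", "Lname"])
result = join(R, DEPARTMENT, "Dno", "Dnumber")       # step 3: final join back at site 2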
5.4 Query and Update Decomposition
In a DDBMS with no distribution transparency, the user phrases a query directly in
terms of specific fragments. For example, consider another query Q: Retrieve the
names and hours per week for each employee who works on some project controlled by
department 5, which is specified on the distributed database where the relations at
sites 2 and 3 are shown in Figure 8, and those at site 1 are shown in Figure A.2, as in
our earlier example. A user who submits such a query must specify whether it refer-
ences the PROJS_5 and WORKS_ON_5 relations at site 2 (Figure 8) or the PROJECT
and WORKS_ON relations at site 1 (Figure A.2). The user must also maintain con-
sistency of replicated data items when updating a DDBMS with no replication trans-
parency.
On the other hand, a DDBMS that supports full distribution, fragmentation, and
replication transparency allows the user to specify a query or update request on the
schema in Figure A.1 just as though the DBMS were centralized. For updates, the
DDBMS is responsible for maintaining consistency among replicated items by using
one of the distributed concurrency control algorithms to be discussed in Section 7.
For queries, a query decomposition module must break up or decompose a query
into subqueries that can be executed at the individual sites. Additionally, a strategy
for combining the results of the subqueries to form the query result must be gener-
ated. Whenever the DDBMS determines that an item referenced in the query is repli-
cated, it must choose or materialize a particular replica during query execution.
To determine which replicas include the data items referenced in a query, the
DDBMS refers to the fragmentation, replication, and distribution information
stored in the DDBMS catalog. For vertical fragmentation, the attribute list for each
fragment is kept in the catalog. For horizontal fragmentation, a condition, some-
times called a guard, is kept for each fragment. This is basically a selection condition
that specifies which tuples exist in the fragment; it is called a guard because only
tuples that satisfy this condition are permitted to be stored in the fragment. For mixed
fragments, both the attribute list and the guard condition are kept in the catalog.
In our earlier example, the guard conditions for fragments at site 1 (Figure A.2) are
TRUE (all tuples), and the attribute lists are * (all attributes). For the fragments
shown in Figure 8, we have the guard conditions and attribute lists shown in Figure
11. When the DDBMS decomposes an update request, it can determine which frag-
ments must be updated by examining their guard conditions. For example, a user
request to insert a new EMPLOYEE tuple <‘Alex’, ‘B’, ‘Coleman’, ‘345671239’, ‘22-
APR-64’, ‘3306 Sandstone, Houston, TX’, M, 33000, ‘987654321’, 4> would be
decomposed by the DDBMS into two insert requests: the first inserts the preceding
tuple in the EMPLOYEE fragment at site 1, and the second inserts the projected tuple
<‘Alex’, ‘B’, ‘Coleman’, ‘345671239’, 33000, ‘987654321’, 4> in the EMPD4 fragment at
site 3.
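A sketch of guard-based routing of an insert request follows; the guard predicates and attribute lists paraphrase Figure 11, while the fragment registry and helper names are our own assumptions.

# Route an EMPLOYEE insert to every fragment whose guard condition it satisfies.
FRAGMENTS = {
    "EMPLOYEE@site1": {"guard": lambda t: True,          "attrs": None},   # whole relation, all attributes
    "EMPD_5@site2":   {"guard": lambda t: t["Dno"] == 5, "attrs": ["Fname", "Minit", "Lname",
                                                                    "Ssn", "Salary", "Super_ssn", "Dno"]},
    "EMPD_4@site3":   {"guard": lambda t: t["Dno"] == 4, "attrs": ["Fname", "Minit", "Lname",
                                                                    "Ssn", "Salary", "Super_ssn", "Dno"]},
}

def decompose_insert(new_tuple):
    inserts = {}
    for name, frag in FRAGMENTS.items():
        if frag["guard"](new_tuple):
            attrs = frag["attrs"]
            inserts[name] = dict(new_tuple) if attrs is None else {a: new_tuple[a] for a in attrs}
    return inserts

new_emp = {"Fname": "Alex", "Minit": "B", "Lname": "Coleman", "Ssn": "345671239",
           "Bdate": "1964-04-22", "Address": "3306 Sandstone, Houston, TX", "Sex": "M",
           "Salary": 33000, "Super_ssn": "987654321", "Dno": 4}
print(decompose_insert(new_emp))   # one insert at site 1 and one projected insert at site 3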
For query decomposition, the DDBMS can determine which fragments may
contain the required tuples by comparing the query condition with the guard
(a) EMPD5
attribute list: Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno
guard condition: Dno=5
DEP5
attribute list: * (all attributes Dname, Dnumber, Mgr_ssn, Mgr_start_date)
guard condition: Dnumber=5
DEP5_LOCS
attribute list: * (all attributes Dnumber, Location)
guard condition: Dnumber=5
PROJS5
attribute list: * (all attributes Pname, Pnumber, Plocation, Dnum)
guard condition: Dnum=5
WORKS_ON5
attribute list: * (all attributes Essn, Pno, Hours)
guard condition: Essn IN (πSsn (EMPD5)) OR Pno IN (πPnumber (PROJS5))
(b) EMPD4
attribute list: Fname, Minit, Lname, Ssn, Salary, Super_ssn, Dno
guard condition: Dno=4
DEP4
attribute list: * (all attributes Dname, Dnumber, Mgr_ssn, Mgr_start_date)
guard condition: Dnumber=4
DEP4_LOCS
attribute list: * (all attributes Dnumber, Location)
guard condition: Dnumber=4
PROJS4
attribute list: * (all attributes Pname, Pnumber, Plocation, Dnum)
guard condition: Dnum=4
WORKS_ON4
attribute list: * (all attributes Essn, Pno, Hours)
guard condition: Essn IN (πSsn (EMPD4))
OR Pno IN (πPnumber (PROJS4))
Figure 11
Guard conditions and attribute lists for fragments.
(a) Site 2 fragments. (b) Site 3 fragments.
conditions. For example, consider the query Q: Retrieve the names and hours per
week for each employee who works on some project controlled by department 5. This
can be specified in SQL on the schema in Figure A.1 as follows:
Q: SELECT Fname, Lname, Hours
FROM EMPLOYEE, PROJECT, WORKS_ON
WHERE Dnum=5 AND Pnumber=Pno AND Essn=Ssn;
Suppose that the query is submitted at site 2, which is where the query result will be
needed. The DDBMS can determine from the guard condition on PROJS5 and
WORKS_ON5 that all tuples satisfying the conditions (Dnum = 5 AND Pnumber =
Pno) reside at site 2. Hence, it may decompose the query into the following rela-
tional algebra subqueries:
T1 ← πEssn(PROJS5 ⋈Pnumber=Pno WORKS_ON5)
T2 ← πEssn, Fname, Lname(T1 ⋈Essn=Ssn EMPLOYEE)
RESULT ← πFname, Lname, Hours(T2 * WORKS_ON5)
This decomposition can be used to execute the query by using a semijoin strategy.
The DDBMS knows from the guard conditions that PROJS5 contains exactly those
tuples satisfying (Dnum = 5) and that WORKS_ON5 contains all tuples to be joined
with PROJS5; hence, subquery T1 can be executed at site 2, and the projected column
Essn can be sent to site 1. Subquery T2 can then be executed at site 1, and the result
can be sent back to site 2, where the final query result is calculated and displayed to
the user. An alternative strategy would be to send the query Q itself to site 1, which
includes all the database tuples, where it would be executed locally and from which
the result would be sent back to site 2. The query optimizer would estimate the costs
of both strategies and would choose the one with the lower cost estimate.
6 Overview of Transaction Management
in Distributed Databases
The global and local transaction management software modules, along with the
concurrency control and recovery manager of a DDBMS, collectively guarantee the
ACID properties of transactions. We discuss distributed transaction management in
this section and explore concurrency control in Section 7.
As can be seen in Figure 5, an additional component called the global transaction
manager is introduced for supporting distributed transactions. The site where the
transaction originated can temporarily assume the role of global transaction man-
ager and coordinate the execution of database operations with transaction man-
agers across multiple sites. Transaction managers export their functionality as an
interface to the application programs. The manager stores bookkeeping informa-
tion related to each transaction, such as a unique identifier, originating site, name,
and so on. For READ operations, it returns a local copy if valid and available. For
WRITE operations, it ensures that updates are visible across all sites containing
copies (replicas) of the data item. For ABORT operations, the manager ensures that
no effects of the transaction are reflected in any site of the distributed database. For
COMMIT operations, it ensures that the effects of a write are persistently recorded on
all databases containing copies of the data item. Atomic termination (COMMIT/
ABORT) of distributed transactions is commonly implemented using the two-phase
commit protocol. We give more details of this protocol in the following section.
The transaction manager passes to the concurrency controller the database opera-
tion and associated information. The controller is responsible for acquisition and
release of associated locks. If the transaction requires access to a locked resource, it
is delayed until the lock is acquired. Once the lock is acquired, the operation is sent
to the runtime processor, which handles the actual execution of the database opera-
tion. Once the operation is completed, locks are released and the transaction man-
ager is updated with the result of the operation. We discuss commonly used
distributed concurrency methods in Section 7.
6.1 Two-Phase Commit Protocol
The two-phase commit protocol (2PC) requires a global recovery manager, or
coordinator, to maintain information needed for recovery, in addition to the local
recovery managers and the information they maintain (log, tables). In its first
(voting) phase, the coordinator asks all participating sites to prepare and to vote on
whether they can commit; in its second (decision) phase, it sends a global commit
message only if every participant voted yes, and a global abort otherwise. The
two-phase commit protocol has certain drawbacks that led to the development of the
three-phase commit protocol, which we discuss next.
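As a rough sketch of the protocol's shape (our illustration: the participant interface is assumed, and logging, timeouts, and failure handling are omitted), the coordinator's decision rule looks like this:

# Coordinator side of two-phase commit (sketch only; no logging or failure handling).
def two_phase_commit(participants):
    # Phase 1 (voting): ask every participant to prepare; each votes "yes" or "no".
    votes = [p.prepare() for p in participants]

    # Phase 2 (decision): commit only if every participant voted yes; otherwise abort.
    if all(v == "yes" for v in votes):
        for p in participants:
            p.commit()
        return "committed"
    for p in participants:
        p.abort()
    return "aborted"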
6.2 Three-Phase Commit Protocol
The biggest drawback of 2PC is that it is a blocking protocol. Failure of the coordi-
nator blocks all participating sites, causing them to wait until the coordinator recov-
ers. This can cause performance degradation, especially if participants are holding
locks to shared resources. Another problematic scenario is when both the coordina-
tor and a participant that has committed crash together. In the two-phase commit
protocol, a participant has no way to ensure that all participants got the commit
message in the second phase. Hence, once a decision to commit has been made by
the coordinator in the first phase, participants will commit their transactions in the
second phase independent of receipt of a global commit message by other partici-
pants. Thus, in the situation that both the coordinator and a committed participant
crash together, the result of the transaction becomes uncertain or nondeterministic.
Since the transaction has already been committed by one participant, it cannot be
aborted on recovery by the coordinator. Also, the transaction cannot be optimisti-
cally committed on recovery since the original vote of the coordinator may have
been to abort.
These problems are solved by the three-phase commit (3PC) protocol, which essen-
tially divides the second commit phase into two subphases called prepare-to-
commit and commit. The prepare-to-commit phase is used to communicate the
result of the vote phase to all participants. If all participants vote yes, then the coordi-
nator instructs them to move into the prepare-to-commit state. The commit subphase
is identical to its two-phase counterpart. Now, if the coordinator crashes during this
subphase, another participant can see the transaction through to completion. It can
simply ask a crashed participant if it received a prepare-to-commit message. If it did
not, then it safely assumes to abort. Thus the state of the protocol can be recovered
irrespective of which participant crashes. Also, by limiting the time required for a
transaction to commit or abort to a maximum time-out period, the protocol ensures
that a transaction attempting to commit via 3PC releases locks on time-out.
The main idea is to limit the wait time for participants who have committed and are
waiting for a global commit or abort from the coordinator. When a participant
receives a precommit message, it knows that the rest of the participants have voted
to commit. If a precommit message has not been received, then the participant will
abort and release all locks.
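The participant-side termination rule just described can be summarized in a few lines (a sketch only, under the stated assumptions; message exchange and state logging are elided):

# Participant-side termination rule for three-phase commit (sketch only).
def terminate_after_coordinator_failure(received_prepare_to_commit, timed_out):
    """Decide locally what to do when the coordinator is unreachable."""
    if not timed_out:
        return "keep waiting"
    if received_prepare_to_commit:
        # Every participant voted yes and the decision to commit was already announced.
        return "commit"
    # No precommit message seen: it is safe to abort and release all locks.
    return "abort"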
6.3 Operating System Support
for Transaction Management
The following are the main benefits of operating system (OS)-supported transac-
tion management:
■ Typically, DBMSs use their own semaphores9 to guarantee mutually exclu-
sive access to shared resources. Since these semaphores are implemented in
userspace at the level of the DBMS application software, the OS has no
knowledge about them. Hence if the OS deactivates a DBMS process holding
a lock, other DBMS processes wanting this lock resource get queued. Such a
situation can cause serious performance degradation. OS-level knowledge of
semaphores can help eliminate such situations.
■ Specialized hardware support for locking can be exploited to reduce associ-
ated costs. This can be of great importance, since locking is one of the most
common DBMS operations.
■ Providing a set of common transaction support operations through the ker-
nel allows application developers to focus on adding new features to their
products as opposed to reimplementing the common functionality for each
application. For example, if different DDBMSs are to coexist on the same
machine and they all use the two-phase commit protocol, then it is more
beneficial to have this protocol implemented as part of the kernel so that
the DDBMS developers can focus more on adding new features to their
products.
7 Overview of Concurrency Control
and Recovery in Distributed Databases
For concurrency control and recovery purposes, numerous problems arise in a dis-
tributed DBMS environment that are not encountered in a centralized DBMS envi-
ronment. These include the following:
■ Dealing with multiple copies of the data items. The concurrency control
method is responsible for maintaining consistency among these copies. The
recovery method is responsible for making a copy consistent with other
copies if the site on which the copy is stored fails and recovers later.
9Semaphores are data structures used for synchronized and exclusive access to shared resources for
preventing race conditions in a parallel computing system.
■ Failure of individual sites. The DDBMS should continue to operate with its
running sites, if possible, when one or more individual sites fail. When a site
recovers, its local database must be brought up-to-date with the rest of the
sites before it rejoins the system.
■ Failure of communication links. The system must be able to deal with the
failure of one or more of the communication links that connect the sites. An
extreme case of this problem is that network partitioning may occur. This
breaks up the sites into two or more partitions, where the sites within each
partition can communicate only with one another and not with sites in other
partitions.
■ Distributed commit. Problems can arise with committing a transaction that
is accessing databases stored on multiple sites if some sites fail during the
commit process. The two-phase commit protocol is often used to deal with
this problem.
■ Distributed deadlock. Deadlock may occur among several sites, so tech-
niques for dealing with deadlocks must be extended to take this into
account.
Distributed concurrency control and recovery techniques must deal with these and
other problems. In the following subsections, we review some of the techniques that
have been suggested to deal with recovery and concurrency control in DDBMSs.
7.1 Distributed Concurrency Control Based
on a Distinguished Copy of a Data Item
To deal with replicated data items in a distributed database, a number of concur-
rency control methods have been proposed that extend the concurrency control
techniques for centralized databases. We discuss these techniques in the context of
extending centralized locking. Similar extensions apply to other concurrency control
techniques. The idea is to designate a particular copy of each data item as a
distinguished copy. The locks for this data item are associated with the distin-
guished copy, and all locking and unlocking requests are sent to the site that contains
that copy.
A number of different methods are based on this idea, but they differ in their
method of choosing the distinguished copies. In the primary site technique, all dis-
tinguished copies are kept at the same site. A modification of this approach is the
primary site with a backup site. Another approach is the primary copy method,
where the distinguished copies of the various data items can be stored in different
sites. A site that includes a distinguished copy of a data item basically acts as the
coordinator site for concurrency control on that item. We discuss these techniques
next.
Primary Site Technique. In this method a single primary site is designated to be
the coordinator site for all database items. Hence, all locks are kept at that site, and
all requests for locking or unlocking are sent there. This method is thus an extension
of the centralized locking approach. For example, if all transactions follow the two-
phase locking protocol, serializability is guaranteed. The advantage of this approach
is that it is a simple extension of the centralized approach and thus is not overly
complex. However, it has certain inherent disadvantages. One is that all locking
requests are sent to a single site, possibly overloading that site and causing a system
bottleneck. A second disadvantage is that failure of the primary site paralyzes the
system, since all locking information is kept at that site. This can limit system relia-
bility and availability.
Although all locks are accessed at the primary site, the items themselves can be
accessed at any site at which they reside. For example, once a transaction obtains a
Read_lock on a data item from the primary site, it can access any copy of that data
item. However, once a transaction obtains a Write_lock and updates a data item, the
DDBMS is responsible for updating all copies of the data item before releasing the
lock.
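A minimal sketch of the primary site idea follows (our illustration; a real lock manager would also queue waiting requests and handle lock upgrades, timeouts, and deadlocks):

# Primary-site locking: every lock request in the system goes to one coordinator's lock table.
class PrimarySiteLockManager:
    def __init__(self):
        self.locks = {}              # item -> ("read", set of transactions) or ("write", transaction)

    def read_lock(self, item, txn):
        entry = self.locks.get(item)
        if entry is None:
            self.locks[item] = ("read", {txn})
            return True              # the transaction may now read any copy of the item
        kind, holders = entry
        if kind == "read":
            holders.add(txn)
            return True
        return False                 # write-locked by another transaction: the request must wait

    def write_lock(self, item, txn):
        if item not in self.locks:
            self.locks[item] = ("write", txn)
            return True              # after updating, the DDBMS must refresh all copies before unlock
        return False

    def unlock(self, item, txn):
        self.locks.pop(item, None)   # simplified: drop the whole entry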
Primary Site with Backup Site. This approach addresses the second disadvantage
of the primary site method by designating a second site to be a backup site. All lock-
ing information is maintained at both the primary and the backup sites. In case of
primary site failure, the backup site takes over as the primary site, and a new backup
site is chosen. This simplifies the process of recovery from failure of the primary site,
since the backup site takes over and processing can resume after a new backup site is
chosen and the lock status information is copied to that site. It slows down the
process of acquiring locks, however, because all lock requests and granting of locks
must be recorded at both the primary and the backup sites before a response is sent to
the requesting transaction. The problem of the primary and backup sites becoming
overloaded with requests and slowing down the system remains undiminished.
Primary Copy Technique. This method attempts to distribute the load of lock
coordination among various sites by having the distinguished copies of different
data items stored at different sites. Failure of one site affects any transactions that are
accessing locks on items whose primary copies reside at that site, but other transac-
tions are not affected. This method can also use backup sites to enhance reliability
and availability.
Choosing a New Coordinator Site in Case of Failure. Whenever a coordina-
tor site fails in any of the preceding techniques, the sites that are still running must
choose a new coordinator. In the case of the primary site approach with no backup
site, all executing transactions must be aborted and restarted in a tedious recovery
process. Part of the recovery process involves choosing a new primary site and creat-
ing a lock manager process and a record of all lock information at that site. For
methods that use backup sites, transaction processing is suspended while the
backup site is designated as the new primary site and a new backup site is chosen
and is sent copies of all the locking information from the new primary site.
If a backup site X is about to become the new primary site, X can choose the new
backup site from among the system’s running sites. However, if no backup site
existed, or if both the primary and the backup sites are down, a process called
election can be used to choose the new coordinator site. In this process, any site Y
that attempts to communicate with the coordinator site repeatedly and fails to do so
can assume that the coordinator is down and can start the election process by send-
ing a message to all running sites proposing that Y become the new coordinator. As
soon as Y receives a majority of yes votes, Y can declare that it is the new coordina-
tor. The election algorithm itself is quite complex, but this is the main idea behind
the election method. The algorithm also resolves any attempt by two or more sites
to become coordinator at the same time. The references in the Selected Bibliography
at the end of this chapter discuss the process in detail.
7.2 Distributed Concurrency Control Based on Voting
The concurrency control methods for replicated items discussed earlier all use the
idea of a distinguished copy that maintains the locks for that item. In the voting
method, there is no distinguished copy; rather, a lock request is sent to all sites that include a copy of the data item. Each copy maintains its own lock and can grant or
deny the request for it. If a transaction that requests a lock is granted that lock by a
majority of the copies, it holds the lock and informs all copies that it has been
granted the lock. If a transaction does not receive a majority of votes granting it a
lock within a certain time-out period, it cancels its request and informs all sites of
the cancellation.
The voting method is considered a truly distributed concurrency control method,
since the responsibility for a decision resides with all the sites involved. Simulation
studies have shown that voting has higher message traffic among sites than do the
distinguished copy methods. If the algorithm takes into account possible site fail-
ures during the voting process, it becomes extremely complex.
7.3 Distributed Recovery
The recovery process in distributed databases is quite involved. We give only a very
brief idea of some of the issues here. In some cases it is quite difficult even to deter-
mine whether a site is down without exchanging numerous messages with other
sites. For example, suppose that site X sends a message to site Y and expects a
response from Y but does not receive it. There are several possible explanations:
■ The message was not delivered to Y because of communication failure.
■ Site Y is down and could not respond.
■ Site Y is running and sent a response, but the response was not delivered.
Without additional information or the sending of additional messages, it is difficult
to determine what actually happened.
Another problem with distributed recovery is distributed commit. When a transac-
tion is updating data at several sites, it cannot commit until it is sure that the effect
of the transaction on every site cannot be lost. This means that every site must first
have recorded the local effects of the transaction permanently in the local site log
on disk. The two-phase commit protocol is often used to ensure the correctness of
distributed commit.
8 Distributed Catalog Management
Efficient catalog management in distributed databases is critical to ensure satisfac-
tory performance related to site autonomy, view management, and data distribution
and replication. Catalogs are databases themselves containing metadata about the
distributed database system.
Three popular management schemes for distributed catalogs are centralized catalogs, fully replicated catalogs, and partially replicated catalogs. The choice of the scheme
depends on the database itself as well as the access patterns of the applications to the
underlying data.
Centralized Catalogs. In this scheme, the entire catalog is stored at a single site. Owing to its central nature, it is easy to implement. On the other hand, reliability, availability, site autonomy, and distribution of processing load are adversely affected. For read operations from noncentral sites, the requested
catalog data is locked at the central site and is then sent to the requesting site. On
completion of the read operation, an acknowledgement is sent to the central site,
which in turn unlocks this data. All update operations must be processed through
the central site. This can quickly become a performance bottleneck for write-
intensive applications.
Fully Replicated Catalogs. In this scheme, identical copies of the complete cata-
log are present at each site. This scheme facilitates faster reads by allowing them to
be answered locally. However, all updates must be broadcast to all sites. Updates are
treated as transactions and a centralized two-phase commit scheme is employed to
ensure catalog consistency. As with the centralized scheme, write-intensive applica-
tions may cause increased network traffic due to the broadcast associated with the
writes.
Partially Replicated Catalogs. The centralized and fully replicated schemes
restrict site autonomy since they must ensure a consistent global view of the catalog.
Under the partially replicated scheme, each site maintains complete catalog infor-
mation on data stored locally at that site. Each site is also permitted to cache entries
retrieved from remote sites. However, there are no guarantees that these cached
copies will be the most recent and updated. The system tracks catalog entries for
sites where the object was created and for sites that contain copies of this object. Any
changes to copies are propagated immediately to the original (birth) site. Retrieving
updated copies to replace stale data may be delayed until an access to this data
occurs. In general, fragments of relations across sites should be uniquely accessible.
Also, to ensure data distribution transparency, users should be allowed to create
synonyms for remote objects and use these synonyms for subsequent referrals.
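As a hedged illustration in Oracle-style SQL (the table name PARTS and the database link name remote_site are assumed for this example, not taken from the text):

-- Create a local synonym that hides the remote object behind a local name.
CREATE SYNONYM parts FOR parts@remote_site;

-- Later references use purely local syntax; the catalog resolves the synonym
-- to the object at its remote (birth) site.
SELECT * FROM parts;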
9 Current Trends in Distributed Databases
Current trends in distributed data management are centered on the Internet, in
which petabytes of data can be managed in a scalable, dynamic, and reliable fashion.
Two important areas in this direction are cloud computing and peer-to-peer data-
bases.
9.1 Cloud Computing
Cloud computing is the paradigm of offering computer infrastructure, platforms,
and software as services over the Internet. It offers significant economic advantages
by limiting both up-front capital investments toward computer infrastructure as
well as total cost of ownership. It has introduced a new challenge of managing
petabytes of data in a scalable fashion. Traditional database systems for managing
enterprise data proved to be inadequate in handling this challenge, which has
resulted in a major architectural revision. The Claremont report10 by a group of
senior database researchers envisions that future research in cloud computing will
result in the emergence of new data management architectures and the interplay of
structured and unstructured data as well as other developments.
Partial failures and the need for global synchronization were key performance bottlenecks of traditional database solutions. The key insight is that the hash-value nature of the underlying datasets used by such Internet-scale applications lends itself naturally to partitioning. For instance, search queries essentially involve
a recursive process of mapping keywords to a set of related documents, which can
benefit from such a partitioning. Also, the partitions can be treated independently,
thereby eliminating the need for a coordinated commit. Another problem with tra-
ditional DDBMSs is the lack of support for efficient dynamic partitioning of data,
which limited scalability and resource utilization. Traditional systems treated sys-
tem metadata and application data alike, with the system data requiring strict con-
sistency and availability guarantees. But application data has variable requirements
on these characteristics, depending on its nature. For example, while a search engine
can afford weaker consistency guarantees, an online text editor like Google Docs,
which allows concurrent users, has strict consistency requirements.
The metadata of a distributed database system should be decoupled from its actual
data in order to ensure scalability. This decoupling can be used to develop innova-
tive solutions to manage the actual data by exploiting its inherent suitability for
partitioning and using traditional database solutions to manage critical system
metadata. Since metadata is only a fraction of the total data set, it does not prove to
be a performance bottleneck. Single object semantics of these implementations
enables higher tolerance to nonavailability of certain sections of data. Access to data
is typically to a single object in an atomic fashion. Hence, transaction support for such data need not be as stringent as for traditional databases.11 There is a varied set of
10“The Claremont Report on Database Research” is available at http://db.cs.berkeley.edu/claremont/
claremontreport08.pdf.
11Readers may refer to the work done by Das et al. (2008) for further details.
cloud services available today, including application services (salesforce.com), stor-
age services (Amazon Simple Storage Service, or Amazon S3), compute services
(Google App Engine, Amazon Elastic Compute Cloud—Amazon EC2), and data
services (Amazon SimpleDB, Microsoft SQL Server Data Services, Google’s
Datastore). More and more data-centric applications are expected to leverage data
services in the cloud. While most current cloud services are data-analysis intensive,
it is expected that business logic will eventually be migrated to the cloud. The key
challenge in this migration would be to ensure the scalability advantages for multi-
ple object semantics inherent to business logic. For a detailed treatment of cloud
computing, refer to the relevant bibliographic references in this chapter’s Selected
Bibliography.
9.2 Peer-to-Peer Database Systems
A peer-to-peer database system (PDBS) aims to integrate advantages of P2P (peer-
to-peer) computing, such as scalability, attack resilience, and self-organization, with
the features of decentralized data management. Nodes are autonomous and are
linked only to a small number of peers individually. It is permissible for a node to
behave purely as a collection of files without offering a complete set of traditional
DBMS functionality. While FDBS and MDBS mandate the existence of mappings
between local and global federated schemas, PDBSs attempt to avoid a global
schema by providing mappings between pairs of information sources. In PDBS,
each peer potentially models semantically related data in a manner different from
other peers, and hence the task of constructing a central mediated schema can be
very challenging. PDBSs aim to decentralize data sharing. Each peer has a schema
associated with its domain-specific stored data. The PDBS constructs a semantic
path12 of mappings between peer schemas. Using this path, a peer to which a query
has been submitted can obtain information from any relevant peer connected
through this path. In multidatabase systems, a separate global query processor is
used, whereas in a P2P system a query is shipped from one peer to another until it is
processed completely. A query submitted to a node may be forwarded to others
based on the mapping graph of semantic paths. Edutella and Piazza are examples of
PDBSs. Details of these systems can be found from the sources mentioned in this
chapter’s Selected Bibliography.
10 Distributed Databases in Oracle13
Oracle provides support for homogeneous, heterogeneous, and client-server architectures of distributed databases. In a homogeneous architecture, at least two Oracle databases reside on one or more machines. Although the location and platform of the databases are transparent to client applications, the applications would need to
12A semantic path describes the higher-level relationship between two domains that are dissimilar but
not unrelated.
13The discussion is based on available documentation at http://docs.oracle.com.
distinguish between local and remote objects semantically. This need can be overcome by using synonyms, which allow users to access remote objects with the same syntax as local objects. Different versions of DBMSs can be used, although it must
be noted that Oracle offers backward compatibility but not forward compatibility
between its versions. For example, it is possible that some of the SQL extensions that
were incorporated into Oracle 11i may not be understood by Oracle 9.
In a heterogeneous architecture, at least one of the databases in the network is a
non-Oracle system. The Oracle database local to the application hides the underlying heterogeneity and presents the view of a single local Oracle database.
Connectivity is handled by use of an ODBC- or OLE-DB-compliant protocol or by
Oracle’s Heterogeneous Services and Transparent Gateway agent components. A
discussion of the Heterogeneous Services and Transparent Gateway agents is
beyond the scope of this text, and the reader is advised to consult the online Oracle
documentation.
In the client-server architecture, the Oracle database system is divided into two
parts: a front end as the client portion, and a back end as the server portion. The
client portion is the front-end database application that interacts with the user. The
client has no data access responsibility and merely handles the requesting, process-
ing, and presentation of data managed by the server. The server portion runs Oracle
and handles the functions related to concurrent shared access. It accepts SQL and
PL/SQL statements originating from client applications, processes them, and sends
the results back to the client. Oracle client-server applications provide location
transparency by making the location of data transparent to users; several features
like views, synonyms, and procedures contribute to this. Global naming is achieved
by using global database names to refer to tables uniquely.
Oracle uses a two-phase commit protocol to ensure the atomic commitment of distributed transactions. The COMMIT statement triggers the two-phase commit mechanism. The
RECO (recoverer) background process automatically resolves the outcome of those
distributed transactions in which the commit was interrupted. The RECO of each
local Oracle server automatically commits or rolls back any in-doubt distributed
transactions consistently on all involved nodes. For long-term failures, Oracle
allows each local DBA to manually commit or roll back any in-doubt transactions
and free up resources. Global consistency can be maintained by restoring the data-
base at each site to a predetermined fixed point in the past.
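For instance, a DBA could inspect the pending-transactions view and force an outcome manually; the following is only a sketch, and the transaction identifier shown is hypothetical:

-- DBA_2PC_PENDING lists in-doubt distributed transactions at the local site.
SELECT Local_tran_id, State FROM DBA_2PC_PENDING;

-- Force the in-doubt transaction to commit (or use ROLLBACK FORCE to undo it);
-- '1.21.17' is a made-up local transaction identifier.
COMMIT FORCE '1.21.17';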
Oracle’s distributed database architecture is shown in Figure 12. A node in a distrib-
uted database system can act as a client, as a server, or both, depending on the situa-
tion. The figure shows two sites where databases called HQ (headquarters) and Sales
are kept. For example, in the application shown running at the headquarters, for an
SQL statement issued against local data (for example, DELETE FROM DEPT …), the
HQ computer acts as a server, whereas for a statement against remote data (for
example, INSERT INTO EMP@SALES), the HQ computer acts as a client.
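A sketch of such a distributed transaction issued at HQ is shown below; the column names and values are assumptions for illustration, since Figure 12 does not show the table definitions:

DELETE FROM DEPT WHERE Dnumber = 4;     -- local data: HQ acts as the server
INSERT INTO EMP@SALES (Ename, Salary)
VALUES ('Smith', 30000);                -- remote data: HQ acts as a client
COMMIT;                                 -- Oracle runs two-phase commit across both sites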
Communication in such a distributed heterogeneous environment is facilitated
through Oracle Net Services, which supports standard network protocols and APIs.
Under Oracle’s client-server implementation of distributed databases, Net Services
Figure 12
Oracle distributed database system. The HQ database (containing the DEPT table) and the Sales database (containing the EMP table) are connected over the network through Oracle Net and a database link; the application at HQ issues a transaction containing statements such as INSERT INTO EMP@SALES, DELETE FROM DEPT, SELECT ... FROM EMP@SALES, and COMMIT.
Source: From Oracle (2008). Copyright © Oracle Corporation 2008. All rights reserved.
is responsible for establishing and managing connections between a client applica-
tion and database server. It is present in each node on the network running an
Oracle client application, database server, or both. It packages SQL statements into
one of the many communication protocols to facilitate client-to-server communi-
cation and then packages the results back similarly to the client. The support offered
by Net Services to heterogeneity refers to platform specifications only and not the
database software. Support for DBMSs other than Oracle is through Oracle’s
Heterogeneous Services and Transparent Gateway. Each database has a unique
global name formed by appending a hierarchical arrangement of network domain names to the database name.
Oracle supports database links that define a one-way communication path from
one Oracle database to another. For example,
CREATE DATABASE LINK sales.us.americas;
establishes a connection to the sales database in Figure 12 under the network
domain us that comes under domain americas. Using links, a user can access a
remote object on another database subject to ownership rights without the need for
being a user on the remote database.
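For example, with the link defined above, a table stored at the sales database can be queried by appending the link name to the object name (the table ORDERS and its columns are assumed names used only for illustration):

SELECT Order_no, Total
FROM ORDERS@sales.us.americas;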
Data in an Oracle DDBS can be replicated using snapshots or replicated master
tables. Replication is provided at the following levels:
■ Basic replication. Replicas of tables are managed for read-only access. For
updates, data must be accessed at a single primary site.
■ Advanced (symmetric) replication. This extends beyond basic replication
by allowing applications to update table replicas throughout a replicated
DDBS. Data can be read and updated at any site. This requires additional
software called Oracle’s advanced replication option. A snapshot generates a
copy of a part of the table by means of a query called the snapshot defining
query. A simple snapshot definition looks like this:
CREATE SNAPSHOT SALES_ORDERS AS
SELECT * FROM SALES_ORDERS@sales.us.americas;
Oracle groups snapshots into refresh groups. If a refresh interval is specified, the snapshot is automatically refreshed at that interval by up to ten
Snapshot Refresh Processes (SNPs). If the defining query of a snapshot contains a
distinct or aggregate function, a GROUP BY or CONNECT BY clause, or join or set
operations, the snapshot is termed a complex snapshot and requires additional
processing. Oracle (up to version 7.3) also supports ROWID snapshots that are
based on physical row identifiers of rows in the master table.
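As a hedged sketch (the remote table ORDERS, its columns, and the daily refresh interval are assumptions for illustration), a periodically refreshed snapshot and a complex snapshot might be defined as follows:

-- Simple snapshot, automatically refreshed once per day.
CREATE SNAPSHOT LOCAL_ORDERS
  REFRESH COMPLETE START WITH SYSDATE NEXT SYSDATE + 1
  AS SELECT * FROM ORDERS@sales.us.americas;

-- Complex snapshot: the defining query contains an aggregate and GROUP BY,
-- so it requires the additional processing described above.
CREATE SNAPSHOT ORDER_TOTALS AS
  SELECT Cust_no, SUM(Total) AS Total_amount
  FROM ORDERS@sales.us.americas
  GROUP BY Cust_no;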
Heterogeneous Databases in Oracle. In a heterogeneous DDBS, at least one
database is a non-Oracle system. Oracle Open Gateways provides access to a non-
Oracle database from an Oracle server, which uses a database link to access data or
to execute remote procedures in the non-Oracle system. The Open Gateways feature
includes the following:
■ Distributed transactions. Under the two-phase commit mechanism, trans-
actions may span Oracle and non-Oracle systems.
■ Transparent SQL access. SQL statements issued by an application are trans-
parently transformed into SQL statements understood by the non-Oracle
system.
■ Pass-through SQL and stored procedures. An application can directly
access a non-Oracle system using that system’s version of SQL. Stored proce-
dures in a non-Oracle SQL-based system are treated as if they were PL/SQL
remote procedures.
■ Global query optimization. Cardinality information, indexes, and so on at
the non-Oracle system are accounted for by the Oracle server query opti-
mizer to perform global query optimization.
■ Procedural access. Procedural systems like messaging or queuing systems
are accessed by the Oracle server using PL/SQL remote procedure calls.
In addition to the above, data dictionary references are translated to make the non-
Oracle data dictionary appear as a part of the Oracle server’s dictionary. Character
set translations are done between national language character sets to connect multi-
lingual databases.
From a security perspective, Oracle recommends that if a query originates at site A
and accesses sites B, C, and D, then the auditing of links should be done in the data-
base at site A only. This is because the remote databases cannot distinguish whether
a successful connection request and following SQL statements are coming from
another server or a locally connected client.
10.1 Directory Services
A concept closely related to distributed enterprise systems is that of online directories. Online directories are essentially a structured organization of metadata needed for management functions. They can represent information from a variety of sources, ranging from security credentials and shared network resources to database catalogs.
Lightweight Directory Access Protocol (LDAP) is an industry standard protocol
for directory services. LDAP enables the use of a partitioned Directory
Information Tree (DIT) across multiple LDAP servers, which in turn can return
references to other servers as a result of a directory query. Online directories and
LDAP are particularly important in distributed databases, wherein access of meta-
data related to transparencies discussed in Section 1 must be scalable, secure, and
highly available.
Oracle supports LDAP Version 3 and online directories through Oracle Internet
Directory, a general-purpose directory service for fast access and centralized man-
agement of metadata pertaining to distributed network resources and users. It runs
as an application on an Oracle database and communicates with the database
through Oracle Net Services. It also provides password-based, anonymous, and
certificate-based user authentication using SSL Version 3.
Figure 13 illustrates the architecture of the Oracle Internet Directory. The main
components are:
■ Oracle directory server. Handles client requests and updates for informa-
tion pertaining to people and resources.
■ Oracle directory replication server. Stores a copy of the LDAP data from
Oracle directory servers as a backup.
■ Directory administrator. Supports both GUI-based and command-line-based interfaces for directory administration.
11 Summary
In this chapter we provided an introduction to distributed databases. This is a very
broad topic, and we discussed only some of the basic techniques used with distrib-
uted databases. First we discussed the reasons for distribution and the potential
advantages of distributed databases over centralized systems. Then the concept of
Figure 13
Oracle Internet Directory overview. LDAP clients communicate (LDAP over SSL) with the Oracle directory server, which is backed by an Oracle database through Oracle Net connections; an Oracle directory replication server and directory administration tools complete the architecture.
Source: From Oracle (2005). Copyright © Oracle Corporation 2005. All rights reserved.
distribution transparency and the related concepts of fragmentation transparency
and replication transparency were defined. We categorized DDBMSs by using crite-
ria such as the degree of homogeneity of software modules and the degree of local
autonomy. We distinguished between parallel and distributed system architectures
and then introduced the generic architecture of distributed databases from both a
component as well as a schematic architectural perspective. The issues of federated
database management were then discussed in some detail, focusing on the needs of
supporting various types of autonomies and dealing with semantic heterogeneity.
We also reviewed the client-server architecture concepts and related them to distrib-
uted databases. We discussed the design issues related to data fragmentation, repli-
cation, and distribution, and we distinguished between horizontal and vertical
fragments of relations. The use of data replication to improve system reliability and
availability was then discussed. We illustrated some of the techniques used in dis-
tributed query processing and discussed the cost of communication among sites,
which is considered a major factor in distributed query optimization. The different
techniques for executing joins were compared and we then presented the semijoin
technique for joining relations that reside on different sites. Then we discussed
transaction management, including different commit protocols and operating sys-
tem support for transaction management. We briefly discussed the concurrency
control and recovery techniques used in DDBMSs, and then reviewed some of the
additional problems that must be dealt with in a distributed environment that do
not appear in a centralized environment. We reviewed catalog management in dis-
tributed databases and summarized the relative advantages and disadvantages of the different schemes. We then introduced cloud computing and peer-to-peer database systems as new focus areas in distributed databases, in response to the need to manage petabytes of information accessible over the Internet today.
We described some of the facilities in Oracle to support distributed databases. We
also discussed online directories and the LDAP protocol in brief.
Review Questions
1. What are the main reasons for and potential advantages of distributed data-
bases?
2. What additional functions does a DDBMS have over a centralized DBMS?
3. Discuss what is meant by the following terms: degree of homogeneity of a
DDBMS, degree of local autonomy of a DDBMS, federated DBMS, distribution
transparency, fragmentation transparency, replication transparency,
multidatabase system.
4. Discuss the architecture of a DDBMS. Within the context of a centralized
DBMS, briefly explain new components introduced by the distribution of
data.
5. What are the main software modules of a DDBMS? Discuss the main func-
tions of each of these modules in the context of the client-server architec-
ture.
6. Compare the two-tier and three-tier client-server architectures.
7. What is a fragment of a relation? What are the main types of fragments? Why
is fragmentation a useful concept in distributed database design?
8. Why is data replication useful in DDBMSs? What typical units of data are
replicated?
9. What is meant by data allocation in distributed database design? What typi-
cal units of data are distributed over sites?
10. How is a horizontal partitioning of a relation specified? How can a relation
be put back together from a complete horizontal partitioning?
11. How is a vertical partitioning of a relation specified? How can a relation be
put back together from a complete vertical partitioning?
12. Discuss the naming problem in distributed databases.
13. What are the different stages of processing a query in a DDBMS?
14. Discuss the different techniques for executing an equijoin of two files located
at different sites. What main factors affect the cost of data transfer?
15. Discuss the semijoin method for executing an equijoin of two files located at
different sites. Under what conditions is an equijoin strategy efficient?
16. Discuss the factors that affect query decomposition. How are guard condi-
tions and attribute lists of fragments used during the query decomposition
process?
17. How is the decomposition of an update request different from the decompo-
sition of a query? How are guard conditions and attribute lists of fragments
used during the decomposition of an update request?
18. List the support offered by operating systems to a DDBMS and also their
benefits.
19. Discuss the factors that do not appear in centralized systems that affect con-
currency control and recovery in distributed systems.
20. Discuss the two-phase commit protocol used for transaction management in
a DDBMS. List its limitations and explain how they are overcome using the
three-phase commit protocol.
21. Compare the primary site method with the primary copy method for dis-
tributed concurrency control. How does the use of backup sites affect each?
22. When are voting and elections used in distributed databases?
23. Discuss catalog management in distributed databases.
24. What are the main challenges facing a traditional DDBMS in the context of
today’s Internet applications? How does cloud computing attempt to address
them?
25. Discuss briefly the support offered by Oracle for homogeneous, heteroge-
neous, and client-server based distributed database architectures.
26. Discuss briefly online directories, their management, and their role in dis-
tributed databases.
Exercises
27. Consider the data distribution of the COMPANY database, where the frag-
ments at sites 2 and 3 are as shown in Figure 9 and the fragments at site 1 are
as shown in Figure A.2. For each of the following queries, show at least two
strategies of decomposing and executing the query. Under what conditions
would each of your strategies work well?
a. For each employee in department 5, retrieve the employee name and the
names of the employee’s dependents.
b. Print the names of all employees who work in department 5 but who
work on some project not controlled by department 5.
28. Consider the following relations:
BOOKS(Book#, Primary_author, Topic, Total_stock, $price)
BOOKSTORE(Store#, City, State, Zip, Inventory_value)
STOCK(Store#, Book#, Qty)
Total_stock is the total number of books in stock and Inventory_value is the
total inventory value for the store in dollars.
a. Give an example of two simple predicates that would be meaningful for
the BOOKSTORE relation for horizontal partitioning.
b. How would a derived horizontal partitioning of STOCK be defined based
on the partitioning of BOOKSTORE?
c. Show predicates by which BOOKS may be horizontally partitioned by
topic.
d. Show how the STOCK may be further partitioned from the partitions in
(b) by adding the predicates in (c).
29. Consider a distributed database for a bookstore chain called National Books
with three sites called EAST, MIDDLE, and WEST. The relation schemas are
given in Exercise 28. Consider that BOOKS are fragmented by $price
amounts into:
B1: BOOK1: $price up to $20
B2: BOOK2: $price from $20.01 to $50
B3: BOOK3: $price from $50.01 to $100
B4: BOOK4: $price $100.01 and above
Similarly, BOOKSTORE is divided by Zip Codes into:
S1: EAST: Zip up to 35000
S2: MIDDLE: Zip 35001 to 70000
S3: WEST: Zip 70001 to 99999
Assume that STOCK is a derived fragment based on BOOKSTORE only.
a. Consider the query:
SELECT Book#, Total_stock
FROM Books
WHERE $price > 15 AND $price < 55;
Assume that fragments of BOOKSTORE are nonreplicated and assigned
based on region. Assume further that BOOKS are allocated as:
EAST: B1, B4
MIDDLE: B1, B2
WEST: B1, B2, B3, B4
Assuming the query was submitted in EAST, what remote subqueries does
it generate? (Write in SQL.)
b. If the price of Book#= 1234 is updated from $45 to $55 at site MIDDLE,
what updates does that generate? Write in English and then in SQL.
c. Give a sample query issued at WEST that will generate a subquery for
MIDDLE.
d. Write a query involving selection and projection on the above relations
and show two possible query trees that denote different ways of execu-
tion.
30. Consider that you have been asked to propose a database architecture in a
large organization (General Motors, for example) to consolidate all data
including legacy databases (from hierarchical and network models; no spe-
cific knowledge of these models is needed) as well as relational databases,
which are geographically distributed so that global applications can be sup-
ported. Assume that alternative one is to keep all databases as they are, while
alternative two is to first convert them to relational and then support the
applications over a distributed integrated database.
a. Draw two schematic diagrams for the above alternatives showing the link-
ages among appropriate schemas. For alternative one, choose the
approach of providing export schemas for each database and construct-
ing unified schemas for each application.
b. List the steps that you would have to go through under each alternative
from the present situation until global applications are viable.
c. Compare these from the issues of:
i. design time considerations
ii. runtime considerations
Selected Bibliography
The textbooks by Ceri and Pelagatti (1984a) and Ozsu and Valduriez (1999) are
devoted to distributed databases. Peterson and Davie (2008), Tannenbaum (2003),
and Stallings (2007) cover data communications and computer networks. Comer
(2008) discusses networks and internets. Ozsu et al. (1994) has a collection of
papers on distributed object management.
Most of the research on distributed database design, query processing, and opti-
mization occurred in the 1980s and 1990s; we quickly review the important refer-
ences here. Distributed database design has been addressed in terms of horizontal
and vertical fragmentation, allocation, and replication. Ceri et al. (1982) defined the
concept of minterm horizontal fragments. Ceri et al. (1983) developed an integer
programming-based optimization model for horizontal fragmentation and alloca-
tion. Navathe et al. (1984) developed algorithms for vertical fragmentation based on
attribute affinity and showed a variety of contexts for vertical fragment allocation.
Wilson and Navathe (1986) present an analytical model for optimal allocation of
fragments. Elmasri et al. (1987) discuss fragmentation for the ECR model;
Karlapalem et al. (1996) discuss issues for distributed design of object databases.
Navathe et al. (1996) discuss mixed fragmentation by combining horizontal and
vertical fragmentation; Karlapalem et al. (1996) present a model for redesign of dis-
tributed databases.
Distributed query processing, optimization, and decomposition are discussed in
Hevner and Yao (1979), Kerschberg et al. (1982), Apers et al. (1983), Ceri and
Pelagatti (1984), and Bodorick et al. (1992). Bernstein and Goodman (1981) discuss
the theory behind semijoin processing. Wong (1983) discusses the use of relation-
ships in relation fragmentation. Concurrency control and recovery schemes are dis-
cussed in Bernstein and Goodman (1981a). Kumar and Hsu (1998) compiles some
articles related to recovery in distributed databases. Elections in distributed systems
are discussed in Garcia-Molina (1982). Lamport (1978) discusses problems with
generating unique timestamps in a distributed system. Rahimi and Haug (2007)
discuss a more flexible way to construct query critical metadata for P2P databases.
Ouzzani and Bouguettaya (2004) outline fundamental problems in distributed
query processing over Web-based data sources.
A concurrency control technique for replicated data that is based on voting is pre-
sented by Thomas (1979). Gifford (1979) proposes the use of weighted voting, and
Paris (1986) describes a method called voting with witnesses. Jajodia and Mutchler
(1990) discuss dynamic voting. A technique called available copy is proposed by
Bernstein and Goodman (1984), and one that uses the idea of a group is presented
in ElAbbadi and Toueg (1988). Other work that discusses replicated data includes
Gladney (1989), Agrawal and ElAbbadi (1990), ElAbbadi and Toueg (1989), Kumar
and Segev (1993), Mukkamala (1989), and Wolfson and Milo (1991). Bassiouni
(1988) discusses optimistic protocols for DDB concurrency control. Garcia-Molina
(1983) and Kumar and Stonebraker (1987) discuss techniques that use the seman-
tics of the transactions. Distributed concurrency control techniques based on lock-
ing and distinguished copies are presented by Menasce et al. (1980) and Minoura
and Wiederhold (1982). Obermark (1982) presents algorithms for distributed
deadlock detection. In more recent work, Vadivelu et al. (2008) propose using
a backup mechanism and multilevel security to develop algorithms for improving
concurrency. Madria et al. (2007) propose a mechanism based on a multiversion
two-phase locking scheme and timestamping to address concurrency issues specific
to mobile database systems. Boukerche and Tuck (2001) propose a technique that
allows transactions to be out of order to a limited extent. They attempt to ease the
load on the application developer by exploiting the network environment and pro-
ducing a schedule equivalent to a temporally ordered serial schedule. Han et al.
(2004) propose a deadlock-free and serializable extended Petri net model for Web-
based distributed real-time databases.
A survey of recovery techniques in distributed systems is given by Kohler (1981).
Reed (1983) discusses atomic actions on distributed data. Bhargava (1987) presents
an edited compilation of various approaches and techniques for concurrency and
reliability in distributed systems.
Federated database systems were first defined in McLeod and Heimbigner (1985).
Techniques for schema integration in federated databases are presented by Elmasri
et al. (1986), Batini et al. (1987), Hayne and Ram (1990), and Motro (1987).
Elmagarmid and Helal (1988) and Gamal-Eldin et al. (1988) discuss the update
problem in heterogeneous DDBSs. Heterogeneous distributed database issues are
discussed in Hsiao and Kamel (1989). Sheth and Larson (1990) present an exhaus-
tive survey of federated database management.
Since the late 1980s, multidatabase systems and interoperability have become important
topics. Techniques for dealing with semantic incompatibilities among multiple
databases are examined in DeMichiel (1989), Siegel and Madnick (1991),
Krishnamurthy et al. (1991), and Wang and Madnick (1989). Castano et al. (1998)
present an excellent survey of techniques for analysis of schemas. Pitoura et al.
(1995) discuss object orientation in multidatabase systems. Xiao et al. (2003) pro-
pose an XML-based model for a common data model for multidatabase systems
and present a new approach for schema mapping based on this model. Lakshmanan
et al. (2001) propose extending SQL for interoperability and describe the architec-
ture and algorithms for achieving the same.
Transaction processing in multidatabases is discussed in Mehrotra et al. (1992),
Georgakopoulos et al. (1991), Elmagarmid et al. (1990), and Breitbart et al. (1990), among others. Elmagarmid (1992) discusses transaction processing for advanced
applications, including engineering applications discussed in Heiler et al. (1992).
Workflow systems, which are becoming popular for managing information in complex organizations, use multilevel and nested transactions in conjunction with dis-
tributed databases. Weikum (1991) discusses multilevel transaction management.
Alonso et al. (1997) discuss limitations of current workflow systems. Lopes et al.
(2009) propose that users define and execute their own workflows using a client-
side Web browser. They attempt to leverage Web 2.0 trends to simplify the user’s
work for workflow management. Jung and Yeom (2008) exploit data workflow to
develop an improved transaction management system that provides simultaneous,
transparent access to the heterogeneous storages that constitute the HVEM
DataGrid. Deelman and Chervanak (2008) list the challenges in data-intensive sci-
entific workflows. Specifically, they look at automated management of data, effi-
cient mapping techniques, and user feedback issues in workflow mapping. They
also argue for data reuse as an efficient means to manage data and present the chal-
lenges therein.
A number of experimental distributed DBMSs have been implemented. These
include distributed INGRES by Epstein et al. (1978), DDTS by Devor and Weeldreyer (1980), SDD-1 by Rothnie et al. (1980), System R* by Lindsay et al. (1984), SIRIUS-DELTA by Ferrier and Stangret (1982), and MULTIBASE by Smith et al. (1981). The OMNIBASE system by Rusinkiewicz et al. (1988) and the
Federated Information Base developed using the Candide data model by Navathe et
al. (1994) are examples of federated DDBMSs. Pitoura et al. (1995) present a com-
parative survey of the federated database system prototypes. Most commercial
DBMS vendors have products using the client-server approach and offer distributed
versions of their systems. Some system issues concerning client-server DBMS archi-
tectures are discussed in Carey et al. (1991), DeWitt et al. (1990), and Wang and
Rowe (1991). Khoshafian et al. (1992) discuss design issues for relational DBMSs in
the client-server environment. Client-server management issues are discussed in
many books, such as Zantinge and Adriaans (1996). Di Stefano (2005) discusses
data distribution issues specific to grid computing. A major part of this discussion
may also apply to cloud computing.
Figure A.1
Schema diagram for the COMPANY relational database schema:
EMPLOYEE(Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT(Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS(Dnumber, Dlocation)
PROJECT(Pname, Pnumber, Plocation, Dnum)
WORKS_ON(Essn, Pno, Hours)
DEPENDENT(Essn, Dependent_name, Sex, Bdate, Relationship)
Figure A.2
One possible database state for the COMPANY relational database schema (sample tuples for the EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, WORKS_ON, PROJECT, and DEPENDENT relations).
Enhanced Data Models
for Advanced Applications
As the use of database systems has grown, users have demanded additional functionality from these
software packages, with the purpose of making it easier to implement more
advanced and complex user applications. Object-oriented databases and object-
relational systems do provide features that allow users to extend their systems by
specifying additional abstract data types for each application. However, it is quite
useful to identify certain common features for some of these advanced applications
and to create models that can represent them. Additionally, specialized storage
structures and indexing methods can be implemented to improve the performance
of these common features. Then the features can be implemented as abstract data
types or class libraries and purchased separately from the basic DBMS software
package. The term data blade has been used in Informix and cartridge in Oracle to
refer to such optional submodules that can be included in a DBMS (database man-
agement system) package. Users can utilize these features directly if they are suitable
for their applications, without having to reinvent, reimplement, and reprogram
such common features.
This chapter introduces database concepts for some of the common features that
are needed by advanced applications and are being used widely. We will cover active
rules that are used in active database applications, temporal concepts that are used in
temporal database applications, and, briefly, some of the issues involving spatial
databases and multimedia databases. We will also discuss deductive databases. It is
important to note that each of these topics is very broad, and we give only a brief
introduction to each. In fact, each of these areas can serve as the sole topic of a com-
plete book.
In Section 1 we introduce the topic of active databases, which provide additional
functionality for specifying active rules. These rules can be automatically triggered
From Chapter 26 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
by events that occur, such as database updates or certain times being reached, and
can initiate certain actions that have been specified in the rule declaration to occur
if certain conditions are met. Many commercial packages include some of the func-
tionality provided by active databases in the form of triggers. Triggers are now part
of the SQL-99 and later standards.
In Section 2 we introduce the concepts of temporal databases, which permit the
database system to store a history of changes, and allow users to query both current
and past states of the database. Some temporal database models also allow users to
store future expected information, such as planned schedules. It is important to
note that many database applications are temporal, but they are often implemented
without having much temporal support from the DBMS package—that is, the tem-
poral concepts are implemented in the application programs that access the data-
base.
Section 3 gives a brief overview of spatial database concepts. We discuss types of
spatial data, different kinds of spatial analyses, operations on spatial data, types of
spatial queries, spatial data indexing, spatial data mining, and applications of spatial
databases.
Section 4 is devoted to multimedia database concepts. Multimedia databases pro-
vide features that allow users to store and query different types of multimedia infor-
mation, which includes images (such as pictures and drawings), video clips (such
as movies, newsreels, and home videos), audio clips (such as songs, phone mes-
sages, and speeches), and documents (such as books and articles). We discuss auto-
matic analysis of images, object recognition in images, and semantic tagging of
images.
In Section 5 we discuss deductive databases,1 an area that is at the intersection of
databases, logic, and artificial intelligence or knowledge bases. A deductive data-
base system includes capabilities to define (deductive) rules, which can deduce or
infer additional information from the facts that are stored in a database. Because
part of the theoretical foundation for some deductive database systems is mathe-
matical logic, such systems are often referred to as logic databases. Other types of sys-
tems, referred to as expert database systems or knowledge-based systems, also
incorporate reasoning and inferencing capabilities; such systems use techniques
that were developed in the field of artificial intelligence, including semantic
networks, frames, production systems, or rules for capturing domain-specific
knowledge. Section 6 summarizes the chapter.
Readers may choose to peruse the particular topics they are interested in, as the sec-
tions in this chapter are practically independent of one another.
1Section 5 is only a summary of Deductive Databases; a chapter by this author is in a prior edition.
1 Active Database Concepts and Triggers
Rules that specify actions that are automatically triggered by certain events have
been considered important enhancements to database systems for quite some time.
In fact, the concept of triggers—a technique for specifying certain types of active
rules—has existed in early versions of the SQL specification for relational databases
and triggers are now part of the SQL-99 and later standards. Commercial relational
DBMSs—such as Oracle, DB2, and Microsoft SQL Server—have various versions of
triggers available. However, much research into what a general model for active
databases should look like has been done since the early models of triggers were
proposed. In Section 1.1 we will present the general concepts that have been pro-
posed for specifying rules for active databases. We will use the syntax of the Oracle
commercial relational DBMS to illustrate these concepts with specific examples,
since Oracle triggers are close to the way rules are specified in the SQL standard.
Section 1.2 will discuss some general design and implementation issues for active
databases. We give examples of how active databases are implemented in the STAR-
BURST experimental DBMS in Section 1.3, since STARBURST provides for many
of the concepts of generalized active databases within its framework. Section 1.4
discusses possible applications of active databases. Finally, Section 1.5 describes how
triggers are declared in the SQL-99 standard.
1.1 Generalized Model for Active Databases
and Oracle Triggers
The model that has been used to specify active database rules is referred to as the
Event-Condition-Action (ECA) model. A rule in the ECA model has three compo-
nents:
1. The event(s) that triggers the rule: These events are usually database update
operations that are explicitly applied to the database. However, in the general
model, they could also be temporal events2 or other kinds of external events.
2. The condition that determines whether the rule action should be executed:
Once the triggering event has occurred, an optional condition may be evalu-
ated. If no condition is specified, the action will be executed once the event
occurs. If a condition is specified, it is first evaluated, and only if it evaluates
to true will the rule action be executed.
3. The action to be taken: The action is usually a sequence of SQL statements,
but it could also be a database transaction or an external program that will
be automatically executed.
Let us consider some examples to illustrate these concepts. The examples are based
on a much simplified variation of a COMPANY database application shown in
Figure 1, with each employee having a name (Name), Social Security number (Ssn),
2An example would be a temporal event specified as a periodic time, such as: Trigger this rule every day
at 5:30 A.M.
Figure 1
A simplified COMPANY database used for active rule examples:
EMPLOYEE(Name, Ssn, Salary, Dno, Supervisor_ssn)
DEPARTMENT(Dname, Dno, Total_sal, Manager_ssn)
salary (Salary), department to which they are currently assigned (Dno, a foreign key
to DEPARTMENT), and a direct supervisor (Supervisor_ssn, a (recursive) foreign key
to EMPLOYEE). For this example, we assume that NULL is allowed for Dno, indicat-
ing that an employee may be temporarily unassigned to any department. Each
department has a name (Dname), number (Dno), the total salary of all employees
assigned to the department (Total_sal), and a manager (Manager_ssn, which is a for-
eign key to EMPLOYEE).
Notice that the Total_sal attribute is really a derived attribute, whose value should be
the sum of the salaries of all employees who are assigned to the particular depart-
ment. Maintaining the correct value of such a derived attribute can be done via an
active rule. First we have to determine the events that may cause a change in the
value of Total_sal, which are as follows:
1. Inserting (one or more) new employee tuples
2. Changing the salary of (one or more) existing employees
3. Changing the assignment of existing employees from one department to
another
4. Deleting (one or more) employee tuples
In the case of event 1, we only need to recompute Total_sal if the new employee is
immediately assigned to a department—that is, if the value of the Dno attribute for
the new employee tuple is not NULL (assuming NULL is allowed for Dno). Hence, this
would be the condition to be checked. A similar condition could be checked for
event 2 (and 4) to determine whether the employee whose salary is changed (or who
is being deleted) is currently assigned to a department. For event 3, we will always
execute an action to maintain the value of Total_sal correctly, so no condition is
needed (the action is always executed).
The action for events 1, 2, and 4 is to automatically update the value of Total_sal for
the employee’s department to reflect the newly inserted, updated, or deleted
employee’s salary. In the case of event 3, a twofold action is needed: one to update
the Total_sal of the employee’s old department and the other to update the Total_sal
of the employee’s new department.
The four active rules (or triggers) R1, R2, R3, and R4—corresponding to the above
situation—can be specified in the notation of the Oracle DBMS as shown in Figure
2(a). Let us consider rule R1 to illustrate the syntax of creating triggers in Oracle.
(a) R1: CREATE TRIGGER Total_sal1
AFTER INSERT ON EMPLOYEE
FOR EACH ROW
WHEN ( NEW.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary
WHERE Dno = NEW.Dno;
R2: CREATE TRIGGER Total_sal2
AFTER UPDATE OF Salary ON EMPLOYEE
FOR EACH ROW
WHEN ( NEW.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary - OLD.Salary
WHERE Dno = NEW.Dno;
R3: CREATE TRIGGER Total_sal3
AFTER UPDATE OF Dno ON EMPLOYEE
FOR EACH ROW
BEGIN
UPDATE DEPARTMENT
SET Total_sal = Total_sal + NEW.Salary
WHERE Dno = NEW.Dno;
UPDATE DEPARTMENT
SET Total_sal = Total_sal - OLD.Salary
WHERE Dno = OLD.Dno;
END;
R4: CREATE TRIGGER Total_sal4
AFTER DELETE ON EMPLOYEE
FOR EACH ROW
WHEN ( OLD.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal - OLD.Salary
WHERE Dno = OLD.Dno;
(b) R5: CREATE TRIGGER Inform_supervisor1
BEFORE INSERT OR UPDATE OF Salary, Supervisor_ssn
ON EMPLOYEE
FOR EACH ROW
WHEN ( NEW.Salary > ( SELECT Salary FROM EMPLOYEE
WHERE Ssn = NEW.Supervisor_ssn ) )
inform_supervisor(NEW.Supervisor_ssn, NEW.Ssn );
Figure 2
Specifying active rules
as triggers in Oracle
notation. (a) Triggers
for automatically main-
taining the consistency
of Total_sal of
DEPARTMENT. (b)
Trigger for comparing
an employee’s salary
with that of his or her
supervisor.
The CREATE TRIGGER statement specifies a trigger (or active rule) name—
Total_sal1 for R1. The AFTER clause specifies that the rule will be triggered after the
events that trigger the rule occur. The triggering events—an insert of a new
employee in this example—are specified following the AFTER keyword.3
The ON clause specifies the relation on which the rule is specified—EMPLOYEE for
R1. The optional keywords FOR EACH ROW specify that the rule will be triggered
once for each row that is affected by the triggering event.4
The optional WHEN clause is used to specify any conditions that need to be checked
after the rule is triggered, but before the action is executed. Finally, the action(s) to
be taken is (are) specified as a PL/SQL block, which typically contains one or more
SQL statements or calls to execute external procedures.
The four triggers (active rules) R1, R2, R3, and R4 illustrate a number of features of
active rules. First, the basic events that can be specified for triggering the rules are
the standard SQL update commands: INSERT, DELETE, and UPDATE. They are spec-
ified by the keywords INSERT, DELETE, and UPDATE in Oracle notation. In the case
of UPDATE, one may specify the attributes to be updated—for example, by writing
UPDATE OF Salary, Dno. Second, the rule designer needs to have a way to refer to the
tuples that have been inserted, deleted, or modified by the triggering event. The key-
words NEW and OLD are used in Oracle notation; NEW is used to refer to a newly
inserted or newly updated tuple, whereas OLD is used to refer to a deleted tuple or to
a tuple before it was updated.
Thus, rule R1 is triggered after an INSERT operation is applied to the EMPLOYEE
relation. In R1, the condition (NEW.Dno IS NOT NULL) is checked, and if it evaluates
to true, meaning that the newly inserted employee tuple is related to a department,
then the action is executed. The action updates the DEPARTMENT tuple(s) related to
the newly inserted employee by adding their salary (NEW.Salary) to the Total_sal
attribute of their related department.
Rule R2 is similar to R1, but it is triggered by an UPDATE operation that updates the
SALARY of an employee rather than by an INSERT. Rule R3 is triggered by an update
to the Dno attribute of EMPLOYEE, which signifies changing an employee’s assign-
ment from one department to another. There is no condition to check in R3, so the
action is executed whenever the triggering event occurs. The action updates both
the old department and new department of the reassigned employees by adding
their salary to Total_sal of their new department and subtracting their salary from
Total_sal of their old department. Note that this should work even if the value of Dno
is NULL, because in this case no department will be selected for the rule action.5
3As we will see, it is also possible to specify BEFORE instead of AFTER, which indicates that the rule is
triggered before the triggering event is executed.
4Again, we will see that an alternative is to trigger the rule only once even if multiple rows (tuples) are
affected by the triggering event.
5R1, R2, and R4 can also be written without a condition. However, it may be more efficient to execute
them with the condition since the action is not invoked unless it is required.
<trigger> ::= CREATE TRIGGER <trigger name>
    ( AFTER | BEFORE ) <triggering events> ON <table name>
    [ FOR EACH ROW ]
    [ WHEN <condition> ]
    <trigger actions> ;
<triggering events> ::= <trigger event> { OR <trigger event> }
<trigger event> ::= INSERT | DELETE | UPDATE [ OF <column name> { , <column name> } ]
<trigger actions> ::= <PL/SQL block>
Figure 3
A syntax summary for specifying triggers in the Oracle system (main options only).
It is important to note the effect of the optional FOR EACH ROW clause, which sig-
nifies that the rule is triggered separately for each tuple. This is known as a row-level
trigger. If this clause were left out, the trigger would be known as a statement-level
trigger and would be triggered once for each triggering statement. To see the differ-
ence, consider the following update operation, which gives a 10 percent raise to all
employees assigned to department 5. This operation would be an event that triggers
rule R2:
UPDATE EMPLOYEE
SET Salary = 1.1 * Salary
WHERE Dno = 5;
Because the above statement could update multiple records, a rule using row-level
semantics, such as R2 in Figure 2, would be triggered once for each row, whereas a
rule using statement-level semantics is triggered only once. The Oracle system allows
the user to choose which of the above options is to be used for each rule. Including
the optional FOR EACH ROW clause creates a row-level trigger, and leaving it out
creates a statement-level trigger. Note that the keywords NEW and OLD can only be
used with row-level triggers.
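To make the contrast concrete, the following sketch (not taken from Figure 2) shows what a statement-level trigger might look like in Oracle notation; the log table SALARY_CHANGE_LOG and its single column are assumed only for illustration. Because FOR EACH ROW is omitted, the trigger fires once per UPDATE statement, however many rows that statement modifies, and it cannot reference NEW or OLD values:
CREATE TRIGGER Log_salary_change
AFTER UPDATE OF Salary ON EMPLOYEE
BEGIN
    -- executed once per triggering statement, regardless of the number of affected rows
    INSERT INTO SALARY_CHANGE_LOG ( Change_time ) VALUES ( SYSDATE );
END;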
As a second example, suppose we want to check whenever an employee’s salary is
greater than the salary of his or her direct supervisor. Several events can trigger this
rule: inserting a new employee, changing an employee’s salary, or changing an
employee’s supervisor. Suppose that the action to take would be to call an external
procedure inform_supervisor,6 which will notify the supervisor. The rule could then
be written as in R5 (see Figure 2(b)).
Figure 3 shows the syntax for specifying some of the main options available in
Oracle triggers. We will describe the syntax for triggers in the SQL-99 standard in
Section 1.5.
6Assuming that an appropriate external procedure has been declared. This is a feature that is available in
SQL-99 and later standards.
1.2 Design and Implementation Issues
for Active Databases
The previous section gave an overview of some of the main concepts for specifying
active rules. In this section, we discuss some additional issues concerning how rules
are designed and implemented. The first issue concerns activation, deactivation,
and grouping of rules. In addition to creating rules, an active database system
should allow users to activate, deactivate, and drop rules by referring to their rule
names. A deactivated rule will not be triggered by the triggering event. This feature
allows users to selectively deactivate rules for certain periods of time when they are
not needed. The activate command will make the rule active again. The drop com-
mand deletes the rule from the system. Another option is to group rules into named
rule sets, so the whole set of rules can be activated, deactivated, or dropped. It is also
useful to have a command that can trigger a rule or rule set via an explicit PROCESS
RULES command issued by the user.
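In Oracle, for example, individual triggers can be deactivated, reactivated, and deleted with the ALTER TRIGGER and DROP TRIGGER commands; the sketch below reuses the trigger name from Figure 2 purely as an illustration. Oracle itself does not provide named rule sets or an explicit PROCESS RULES command, although all triggers of a table can be disabled together:
ALTER TRIGGER Total_sal1 DISABLE;            -- deactivate the rule
ALTER TRIGGER Total_sal1 ENABLE;             -- make the rule active again
DROP TRIGGER Total_sal1;                     -- delete the rule from the system
ALTER TABLE EMPLOYEE DISABLE ALL TRIGGERS;   -- deactivate all triggers defined on a table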
The second issue concerns whether the triggered action should be executed before,
after, instead of, or concurrently with the triggering event. A before trigger executes
the trigger before executing the event that caused the trigger. It can be used in appli-
cations such as checking for constraint violations. An after trigger executes the trig-
ger after executing the event, and it can be used in applications such as maintaining
derived data and monitoring for specific events and conditions. An instead of trig-
ger executes the trigger instead of executing the event, and it can be used in applica-
tions such as executing corresponding updates on base relations in response to an
event that is an update of a view.
A related issue is whether the action being executed should be considered as a separate
transaction or whether it should be part of the same transaction that triggered the
rule. We will try to categorize the various options. It is important to note that not all
options may be available for a particular active database system. In fact, most com-
mercial systems are limited to one or two of the options that we will now discuss.
Let us assume that the triggering event occurs as part of a transaction execution. We
should first consider the various options for how the triggering event is related to
the evaluation of the rule’s condition. The rule condition evaluation is also known as
rule consideration, since the action is to be executed only after considering whether
the condition evaluates to true or false. There are three main possibilities for rule
consideration:
1. Immediate consideration. The condition is evaluated as part of the same
transaction as the triggering event, and is evaluated immediately. This case
can be further categorized into three options:
■ Evaluate the condition before executing the triggering event.
■ Evaluate the condition after executing the triggering event.
■ Evaluate the condition instead of executing the triggering event.
2. Deferred consideration. The condition is evaluated at the end of the trans-
action that included the triggering event. In this case, there could be many
triggered rules waiting to have their conditions evaluated.
3. Detached consideration. The condition is evaluated as a separate transac-
tion, spawned from the triggering transaction.
The next set of options concerns the relationship between evaluating the rule condi-
tion and executing the rule action. Here, again, three options are possible:
immediate, deferred, or detached execution. Most active systems use the first
option. That is, as soon as the condition is evaluated, if it returns true, the action is
immediately executed.
The Oracle system (see Section 1.1) uses the immediate consideration model, but it
allows the user to specify for each rule whether the before or after option is to be
used with immediate condition evaluation. It also uses the immediate execution
model. The STARBURST system (see Section 1.3) uses the deferred consideration
option, meaning that all rules triggered by a transaction wait until the triggering
transaction reaches its end and issues its COMMIT WORK command before the rule
conditions are evaluated.7
Another issue concerning active database rules is the distinction between row-level
rules and statement-level rules. Because SQL update statements (which act as trig-
gering events) can specify a set of tuples, one has to distinguish between whether the
rule should be considered once for the whole statement or whether it should be con-
sidered separately for each row (that is, tuple) affected by the statement. The SQL-99
standard (see Section 1.5) and the Oracle system (see Section 1.1) allow the user to
choose which of the options is to be used for each rule, whereas STARBURST uses
statement-level semantics only. We will give examples of how statement-level trig-
gers can be specified in Section 1.3.
One of the difficulties that may have limited the widespread use of active rules, in
spite of their potential to simplify database and software development, is that there
are no easy-to-use techniques for designing, writing, and verifying rules. For exam-
ple, it is quite difficult to verify that a set of rules is consistent, meaning that two or
more rules in the set do not contradict one another. It is also difficult to guarantee
termination of a set of rules under all circumstances. To illustrate the termination
R1: CREATE TRIGGER T1
AFTER INSERT ON TABLE1
FOR EACH ROW
UPDATE TABLE2
SET Attribute1 = … ;
R2: CREATE TRIGGER T2
AFTER UPDATE OF Attribute1 ON TABLE2
FOR EACH ROW
INSERT INTO TABLE1 VALUES ( … );
Figure 4
An example to illus-
trate the termination
problem for active
rules.
7STARBURST also allows the user to start rule consideration explicitly via a PROCESS RULES com-
mand.
problem briefly, consider the rules in Figure 4. Here, rule R1 is triggered by an
INSERT event on TABLE1 and its action includes an update event on Attribute1 of
TABLE2. However, rule R2’s triggering event is an UPDATE event on Attribute1 of
TABLE2, and its action includes an INSERT event on TABLE1. In this example, it is
easy to see that these two rules can trigger one another indefinitely, leading to non-
termination. However, if dozens of rules are written, it is very difficult to determine
whether termination is guaranteed or not.
If active rules are to reach their potential, it is necessary to develop tools for the
design, debugging, and monitoring of active rules that can help users design and
debug their rules.
1.3 Examples of Statement-Level Active Rules
in STARBURST
We now give some examples to illustrate how rules can be specified in the STAR-
BURST experimental DBMS. This will allow us to demonstrate how statement-level
rules can be written, since these are the only types of rules allowed in STARBURST.
The three active rules R1S, R2S, and R3S in Figure 5 correspond to the first three
rules in Figure 2, but they use STARBURST notation and statement-level semantics.
We can explain the rule structure using rule R1S. The CREATE RULE statement
specifies a rule name—Total_sal1 for R1S. The ON clause specifies the relation on
which the rule is specified—EMPLOYEE for R1S. The WHEN clause is used to spec-
ify the events that trigger the rule.8 The optional IF clause is used to specify any
conditions that need to be checked. Finally, the THEN clause is used to specify the
actions to be taken, which are typically one or more SQL statements.
In STARBURST, the basic events that can be specified for triggering the rules are the
standard SQL update commands: INSERT, DELETE, and UPDATE. These are speci-
fied by the keywords INSERTED, DELETED, and UPDATED in STARBURST nota-
tion. Second, the rule designer needs to have a way to refer to the tuples that have
been modified. The keywords INSERTED, DELETED, NEW-UPDATED, and OLD-
UPDATED are used in STARBURST notation to refer to four transition tables (rela-
tions) that include the newly inserted tuples, the deleted tuples, the updated tuples
before they were updated, and the updated tuples after they were updated, respec-
tively. Obviously, depending on the triggering events, only some of these transition
tables may be available. The rule writer can refer to these tables when writing the
condition and action parts of the rule. Transition tables contain tuples of the same
type as those in the relation specified in the ON clause of the rule—for R1S, R2S,
and R3S, this is the EMPLOYEE relation.
In statement-level semantics, the rule designer can only refer to the transition tables
as a whole and the rule is triggered only once, so the rules must be written differ-
ently than for row-level semantics. Because multiple employee tuples may be
8Note that the WHEN keyword specifies events in STARBURST but is used to specify the rule condition
in SQL and Oracle triggers.
R1S: CREATE RULE Total_sal1 ON EMPLOYEE
WHEN INSERTED
IF EXISTS ( SELECT * FROM INSERTED WHERE Dno IS NOT NULL )
THEN UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal +
( SELECT SUM (I.Salary) FROM INSERTED AS I WHERE D.Dno = I.Dno )
WHERE D.Dno IN ( SELECT Dno FROM INSERTED );
R2S: CREATE RULE Total_sal2 ON EMPLOYEE
WHEN UPDATED ( Salary )
IF EXISTS ( SELECT * FROM NEW-UPDATED WHERE Dno IS NOT NULL )
OR EXISTS ( SELECT * FROM OLD-UPDATED WHERE Dno IS NOT NULL )
THEN UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal +
( SELECT SUM (N.Salary) FROM NEW-UPDATED AS N
WHERE D.Dno = N.Dno ) -
( SELECT SUM (O.Salary) FROM OLD-UPDATED AS O
WHERE D.Dno = O.Dno )
WHERE D.Dno IN ( SELECT Dno FROM NEW-UPDATED ) OR
D.Dno IN ( SELECT Dno FROM OLD-UPDATED );
R3S: CREATE RULE Total_sal3 ON EMPLOYEE
WHEN UPDATED ( Dno )
THEN UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal +
( SELECT SUM (N.Salary) FROM NEW-UPDATED AS N
WHERE D.Dno = N.Dno )
WHERE D.Dno IN ( SELECT Dno FROM NEW-UPDATED );
UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal -
( SELECT SUM (O.Salary) FROM OLD-UPDATED AS O
WHERE D.Dno = O.Dno )
WHERE D.Dno IN ( SELECT Dno FROM OLD-UPDATED );
Figure 5
Active rules using statement-level semantics in STARBURST notation.
inserted in a single insert statement, we have to check if at least one of the newly
inserted employee tuples is related to a department. In R1S, the condition
EXISTS (SELECT * FROM INSERTED WHERE Dno IS NOT NULL )
is checked, and if it evaluates to true, then the action is executed. The action updates
in a single statement the DEPARTMENT tuple(s) related to the newly inserted
employee(s) by adding their salaries to the Total_sal attribute of each related depart-
ment. Because more than one newly inserted employee may belong to the same
department, we use the SUM aggregate function to ensure that all their salaries are
added.
Rule R2S is similar to R1S, but is triggered by an UPDATE operation that updates the
salary of one or more employees rather than by an INSERT. Rule R3S is triggered by
an update to the Dno attribute of EMPLOYEE, which signifies changing one or more
employees’ assignment from one department to another. There is no condition in
R3S, so the action is executed whenever the triggering event occurs.9 The action
updates both the old department(s) and new department(s) of the reassigned
employees by adding their salary to Total_sal of each new department and subtract-
ing their salary from Total_sal of each old department.
In our example, it is more complex to write the statement-level rules than the row-
level rules, as can be illustrated by comparing Figures 2 and 5. However, this is not a
general rule, and other types of active rules may be easier to specify when using
statement-level notation than when using row-level notation.
The execution model for active rules in STARBURST uses deferred consideration.
That is, all the rules that are triggered within a transaction are placed in a set—
called the conflict set—which is not considered for evaluation of conditions and
execution until the transaction ends (by issuing its COMMIT WORK command).
STARBURST also allows the user to explicitly start rule consideration in the middle
of a transaction via an explicit PROCESS RULES command. Because multiple rules
must be evaluated, it is necessary to specify an order among the rules. The syntax for
rule declaration in STARBURST allows the specification of ordering among the
rules to instruct the system about the order in which a set of rules should be consid-
ered.10 Additionally, the transition tables—INSERTED, DELETED, NEW-UPDATED,
and OLD-UPDATED—contain the net effect of all the operations within the transac-
tion that affected each table, since multiple operations may have been applied to
each table during the transaction.
1.4 Potential Applications for Active Databases
We now briefly discuss some of the potential applications of active rules. Obviously,
one important application is to allow notification of certain conditions that occur.
For example, an active database may be used to monitor, say, the temperature of an
industrial furnace. The application can periodically insert in the database the tem-
perature reading records directly from temperature sensors, and active rules can be
written that are triggered whenever a temperature record is inserted, with a condi-
tion that checks if the temperature exceeds the danger level, and results in the action
to raise an alarm.
9As in the Oracle examples, rules R1S and R2S can be written without a condition. However, it may be
more efficient to execute them with the condition since the action is not invoked unless it is required.
10If no order is specified between a pair of rules, the system default order is based on placing the rule
declared first ahead of the other rule.
Active rules can also be used to enforce integrity constraints by specifying the types
of events that may cause the constraints to be violated and then evaluating appro-
priate conditions that check whether the constraints are actually violated by the
event or not. Hence, complex application constraints, often known as business
rules, may be enforced that way. For example, in a UNIVERSITY database applica-
tion, one rule may monitor the GPA of students whenever a new grade is entered,
and it may alert the advisor if the GPA of a student falls below a certain threshold;
another rule may check that course prerequisites are satisfied before allowing a stu-
dent to enroll in a course; and so on.
Other applications include the automatic maintenance of derived data, such as the
examples of rules R1 through R4 that maintain the derived attribute Total_sal when-
ever individual employee tuples are changed. A similar application is to use active
rules to maintain the consistency of materialized views whenever the base relations
are modified. Alternatively, an update operation specified on a view can be a trigger-
ing event, which can be converted to updates on the base relations by using an
instead of trigger. These applications are also relevant to the new data warehousing
technologies. A related application is to keep replicated tables consistent
by specifying rules that modify the replicas whenever the master table is modified.
1.5 Triggers in SQL-99
Triggers in the SQL-99 and later standards are quite similar to the examples we dis-
cussed in Section 1.1, with some minor syntactic differences. The basic events that
can be specified for triggering the rules are the standard SQL update commands:
INSERT, DELETE, and UPDATE. In the case of UPDATE, one may specify the attributes
to be updated. Both row-level and statement-level triggers are allowed, indicated in
the trigger by the clauses FOR EACH ROW and FOR EACH STATEMENT, respectively.
One syntactic difference is that the trigger may specify particular tuple variable
names for the old and new tuples instead of using the keywords NEW and OLD, as
shown in Figure 2. Trigger T1 in Figure 6 shows how the row-level trigger R2 from
Figure 2(a) may be specified in SQL-99. Inside the REFERENCING clause, we
named tuple variables (aliases) O and N to refer to the OLD tuple (before modifica-
tion) and NEW tuple (after modification), respectively. Trigger T2 in Figure 6 shows
how the statement-level trigger R2S from Figure 5 may be specified in SQL-99. For
a statement-level trigger, the REFERENCING clause is used to refer to the table of all
new tuples (newly inserted or newly updated) as N, whereas the table of all old
tuples (deleted tuples or tuples before they were updated) is referred to as O.
2 Temporal Database Concepts
Temporal databases, in the broadest sense, encompass all database applications that
require some aspect of time when organizing their information. Hence, they
provide a good example to illustrate the need for developing a set of unifying con-
cepts for application developers to use. Temporal database applications have been
developed since the early days of database usage. However, in creating these applica-
tions, it is mainly left to the application designers and developers to discover, design,
program, and implement the temporal concepts they need. There are many exam-
ples of applications where some aspect of time is needed to maintain the informa-
tion in a database. These include healthcare, where patient histories need to be
maintained; insurance, where claims and accident histories are required as well as
information about the times when insurance policies are in effect; reservation sys-
tems in general (hotel, airline, car rental, train, and so on), where information on the
dates and times when reservations are in effect is required; scientific databases,
where data collected from experiments includes the time when each data value is meas-
ured; and so on. Even the two examples used in this text may be easily expanded into
temporal applications. In the COMPANY database, we may wish to keep SALARY,
JOB, and PROJECT histories on each employee. In the UNIVERSITY database, time is
already included in the SEMESTER and YEAR of each SECTION of a COURSE, the
grade history of a STUDENT, and the information on research grants. In fact, it is
realistic to conclude that the majority of database applications have some temporal
information. However, users often attempt to simplify or ignore temporal aspects
because of the complexity that they add to their applications.
In this section, we will introduce some of the concepts that have been developed to
deal with the complexity of temporal database applications. Section 2.1 gives an
overview of how time is represented in databases, the different types of temporal
T1: CREATE TRIGGER Total_sal1
AFTER UPDATE OF Salary ON EMPLOYEE
REFERENCING OLD ROW AS O, NEW ROW AS N
FOR EACH ROW
WHEN ( N.Dno IS NOT NULL )
UPDATE DEPARTMENT
SET Total_sal = Total_sal + N.Salary - O.Salary
WHERE Dno = N.Dno;
T2: CREATE TRIGGER Total_sal2
AFTER UPDATE OF Salary ON EMPLOYEE
REFERENCING OLD TABLE AS O, NEW TABLE AS N
FOR EACH STATEMENT
WHEN EXISTS ( SELECT * FROM N WHERE N.Dno IS NOT NULL ) OR
EXISTS ( SELECT * FROM O WHERE O.Dno IS NOT NULL )
UPDATE DEPARTMENT AS D
SET D.Total_sal = D.Total_sal
+ ( SELECT SUM (N.Salary) FROM N WHERE D.Dno = N.Dno )
- ( SELECT SUM (O.Salary) FROM O WHERE D.Dno = O.Dno )
WHERE Dno IN ( ( SELECT Dno FROM N ) UNION ( SELECT Dno FROM O ) );
Figure 6
Triggers T1 and T2 illustrating the syntax
for defining triggers in SQL-99.
information, and some of the different dimensions of time that may be needed.
Section 2.2 discusses how time can be incorporated into relational databases.
Section 2.3 gives some additional options for representing time that are possible in
database models that allow complex-structured objects, such as object databases.
Section 2.4 introduces operations for querying temporal databases, and gives a brief
overview of the TSQL2 language, which extends SQL with temporal concepts.
Section 2.5 focuses on time series data, which is a type of temporal data that is very
important in practice.
2.1 Time Representation, Calendars,
and Time Dimensions
For temporal databases, time is considered to be an ordered sequence of points in
some granularity that is determined by the application. For example, suppose that
some temporal application never requires time units that are less than one second.
Then, each time point represents one second using this granularity. In reality, each
second is a (short) time duration, not a point, since it may be further divided into
milliseconds, microseconds, and so on. Temporal database researchers have used the
term chronon instead of point to describe this minimal granularity for a particular
application. The main consequence of choosing a minimum granularity—say, one
second—is that events occurring within the same second will be considered to be
simultaneous events, even though in reality they may not be.
Because there is no known beginning or ending of time, one needs a reference point
from which to measure specific time points. Various calendars are used by various
cultures (such as Gregorian (western), Chinese, Islamic, Hindu, Jewish, Coptic, and
so on) with different reference points. A calendar organizes time into different time
units for convenience. Most calendars group 60 seconds into a minute, 60 minutes
into an hour, 24 hours into a day (based on the physical time of earth’s rotation
around its axis), and 7 days into a week. Further grouping of days into months and
months into years follows either solar or lunar natural phenomena and is generally
irregular. In the Gregorian calendar, which is used in most western countries, days
are grouped into months that are 28, 29, 30, or 31 days, and 12 months are grouped
into a year. Complex formulas are used to map the different time units to one
another.
In SQL2, the temporal data types include DATE (specifying Year, Month, and Day as
YYYY-MM-DD), TIME (specifying Hour, Minute, and Second as HH:MM:SS),
TIMESTAMP (specifying a Date/Time combination, with options for including sub-
second divisions if they are needed), INTERVAL (a relative time duration, such as
10 days or 250 minutes), and PERIOD (an anchored time duration with a fixed start-
ing point, such as the 10-day period from January 1, 2009, to January 10, 2009,
inclusive).11
11Unfortunately, the terminology has not been used consistently. For example, the term interval is often
used to denote an anchored duration. For consistency, we will use the SQL terminology.
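As a small illustration of these types, a table declaration might look as sketched below; the table and column names are ours, and PERIOD is shown only in a comment because most SQL implementations do not provide it as a column data type:
CREATE TABLE RESERVATION (
    Reservation_id  INT,
    Res_date        DATE,                      -- e.g., 2009-01-01
    Check_in_time   TIME,                      -- e.g., 14:30:00
    Created_at      TIMESTAMP(6),              -- date/time combination with fractional seconds
    Stay_length     INTERVAL DAY TO SECOND     -- a relative duration
    -- an anchored PERIOD such as [2009-01-01, 2009-01-10] would typically be
    -- represented by a pair of DATE columns (Start_date, End_date)
);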
Event Information versus Duration (or State) Information. A temporal data-
base will store information concerning when certain events occur, or when certain
facts are considered to be true. There are several different types of temporal infor-
mation. Point events or facts are typically associated in the database with a single
time point in some granularity. For example, a bank deposit event may be associ-
ated with the timestamp when the deposit was made, or the total monthly sales of a
product (fact) may be associated with a particular month (say, February 2010). Note
that even though such events or facts may have different granularities, each is still
associated with a single time value in the database. This type of information is often
represented as time series data as we will discuss in Section 2.5. Duration events or
facts, on the other hand, are associated with a specific time period in the database.12
For example, an employee may have worked in a company from August 15, 2003
until November 20, 2008.
A time period is represented by its start and end time points [START-TIME, END-
TIME]. For example, the above period is represented as [2003-08-15, 2008-11-20].
Such a time period is often interpreted to mean the set of all time points from start-
time to end-time, inclusive, in the specified granularity. Hence, assuming day gran-
ularity, the period [2003-08-15, 2008-11-20] represents the set of all days from
August 15, 2003, until November 20, 2008, inclusive.13
Valid Time and Transaction Time Dimensions. Given a particular event or fact
that is associated with a particular time point or time period in the database, the
association may be interpreted to mean different things. The most natural interpre-
tation is that the associated time is the time that the event occurred, or the period
during which the fact was considered to be true in the real world. If this interpreta-
tion is used, the associated time is often referred to as the valid time. A temporal
database using this interpretation is called a valid time database.
However, a different interpretation can be used, where the associated time refers to
the time when the information was actually stored in the database; that is, it is the
value of the system time clock when the information is valid in the system.14 In this
case, the associated time is called the transaction time. A temporal database using
this interpretation is called a transaction time database.
Other interpretations can also be intended, but these are considered to be the most
common ones, and they are referred to as time dimensions. In some applications,
only one of the dimensions is needed and in other cases both time dimensions are
required, in which case the temporal database is called a bitemporal database. If
12This is the same as an anchored duration. It has also been frequently called a time interval, but to avoid
confusion we will use period to be consistent with SQL terminology.
13The representation [2003-08-15, 2008-11-20] is called a closed interval representation. One can also
use an open interval, denoted [2003-08-15, 2008-11-21), where the set of points does not include the
end point. Although the latter representation is sometimes more convenient, we shall use closed intervals
except where indicated.
14The explanation is more involved, as we will see in Section 2.3.
Figure 7
Different types of temporal relational databases. (a) Valid time database schema.
(b) Transaction time database schema. (c) Bitemporal database schema.
(a) EMP_VT ( Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet )
    DEPT_VT ( Dname, Dno, Total_sal, Manager_ssn, Vst, Vet )
(b) EMP_TT ( Name, Ssn, Salary, Dno, Supervisor_ssn, Tst, Tet )
    DEPT_TT ( Dname, Dno, Total_sal, Manager_ssn, Tst, Tet )
(c) EMP_BT ( Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet, Tst, Tet )
    DEPT_BT ( Dname, Dno, Total_sal, Manager_ssn, Vst, Vet, Tst, Tet )
other interpretations are intended for time, the user can define the semantics and
program the applications appropriately, and it is called a user-defined time.
The next section shows how these concepts can be incorporated into relational
databases, and Section 2.3 shows an approach to incorporate temporal concepts
into object databases.
2.2 Incorporating Time in Relational Databases
Using Tuple Versioning
Valid Time Relations. Let us now see how the different types of temporal data-
bases may be represented in the relational model. First, suppose that we would like
to include the history of changes as they occur in the real world. Consider again the
database in Figure 1, and let us assume that, for this application, the granularity is
day. Then, we could convert the two relations EMPLOYEE and DEPARTMENT into
valid time relations by adding the attributes Vst (Valid Start Time) and Vet (Valid
End Time), whose data type is DATE in order to provide day granularity. This is
shown in Figure 7(a), where the relations have been renamed EMP_VT and
DEPT_VT, respectively.
Consider how the EMP_VT relation differs from the nontemporal EMPLOYEE rela-
tion (Figure 1).15 In EMP_VT, each tuple V represents a version of an employee’s
15A nontemporal relation is also called a snapshot relation because it shows only the current snapshot
or current state of the database.
EMP_VT
Name     Ssn        Salary  Dno  Supervisor_ssn  Vst         Vet
Smith    123456789  25000   5    333445555       2002-06-15  2003-05-31
Smith    123456789  30000   5    333445555       2003-06-01  Now
Wong     333445555  25000   4    999887777       1999-08-20  2001-01-31
Wong     333445555  30000   5    999887777       2001-02-01  2002-03-31
Wong     333445555  40000   5    888665555       2002-04-01  Now
Brown    222447777  28000   4    999887777       2001-05-01  2002-08-10
Narayan  666884444  38000   5    333445555       2003-08-01  Now
. . .

DEPT_VT
Dname     Dno  Manager_ssn  Vst         Vet
Research  5    888665555    2001-09-20  2002-03-31
Research  5    333445555    2002-04-01  Now
Figure 8
Some tuple versions in the valid time relations EMP_VT and DEPT_VT.
information that is valid (in the real world) only during the time period [V.Vst, V.Vet],
whereas in EMPLOYEE each tuple represents only the current state or current ver-
sion of each employee. In EMP_VT, the current version of each employee typically
has a special value, now, as its valid end time. This special value, now, is a temporal
variable that implicitly represents the current time as time progresses. The nontem-
poral EMPLOYEE relation would only include those tuples from the EMP_VT rela-
tion whose Vet is now.
Figure 8 shows a few tuple versions in the valid-time relations EMP_VT and
DEPT_VT. There are two versions of Smith, three versions of Wong, one version of
Brown, and one version of Narayan. We can now see how a valid time relation
should behave when information is changed. Whenever one or more attributes of
an employee are updated, rather than actually overwriting the old values, as would
happen in a nontemporal relation, the system should create a new version and close
the current version by changing its Vet to the end time. Hence, when the user issued
the command to update the salary of Smith effective on June 1, 2003, to $30000,
the second version of Smith was created (see Figure 8). At the time of this update,
the first version of Smith was the current version, with now as its Vet, but after the
update now was changed to May 31, 2003 (one less than June 1, 2003, in day granu-
larity), to indicate that the version has become a closed or history version and that
the new (second) version of Smith is now the current one.
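In plain SQL over the EMP_VT table sketched earlier, this update roughly corresponds to the two statements below; here we assume that the temporal variable now is encoded by a maximum date such as '9999-12-31', which is only one possible convention:
UPDATE EMP_VT
SET    Vet = DATE '2003-05-31'                  -- close the current version
WHERE  Ssn = '123456789'
AND    Vet = DATE '9999-12-31';                 -- the version whose Vet is "now"

INSERT INTO EMP_VT (Name, Ssn, Salary, Dno, Supervisor_ssn, Vst, Vet)
VALUES ('Smith', '123456789', 30000, 5, '333445555',
        DATE '2003-06-01', DATE '9999-12-31');  -- the new current version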
It is important to note that in a valid time relation, the user must generally provide
the valid time of an update. For example, the salary update of Smith may have been
entered in the database on May 15, 2003, at 8:52:12 A.M., say, even though the salary
change in the real world is effective on June 1, 2003. This is called a proactive
update, since it is applied to the database before it becomes effective in the real
world. If the update is applied to the database after it becomes effective in the real
world, it is called a retroactive update. An update that is applied at the same time as
it becomes effective is called a simultaneous update.
The action that corresponds to deleting an employee in a nontemporal database
would typically be applied to a valid time database by closing the current version of
the employee being deleted. For example, if Smith leaves the company effective
January 19, 2004, then this would be applied by changing Vet of the current version
of Smith from now to 2004-01-19. In Figure 8, there is no current version for
Brown, because he presumably left the company on 2002-08-10 and was logically
deleted. However, because the database is temporal, the old information on Brown is
still there.
The operation to insert a new employee would correspond to creating the first tuple
version for that employee, and making it the current version, with the Vst being the
effective (real world) time when the employee starts work. In Figure 8, the tuple on
Narayan illustrates this, since the first version has not been updated yet.
Notice that in a valid time relation, the nontemporal key, such as Ssn in EMPLOYEE,
is no longer unique in each tuple (version). The new relation key for EMP_VT is a
combination of the nontemporal key and the valid start time attribute Vst,16 so we
use (Ssn, Vst) as primary key. This is because, at any point in time, there should be at
most one valid version of each entity. Hence, the constraint that any two tuple ver-
sions representing the same entity should have nonintersecting valid time periods
should hold on valid time relations. Notice that if the nontemporal primary key
value may change over time, it is important to have a unique surrogate key attrib-
ute, whose value never changes for each real-world entity, in order to relate all ver-
sions of the same real-world entity.
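This nonintersection constraint cannot be expressed as a simple key constraint. One way to state it declaratively is with a CREATE ASSERTION, as in the sketch below; most systems do not support assertions and would enforce the rule with triggers instead. The sketch assumes the EMP_VT table declared earlier, with Vet of the current version stored as a maximum date:
CREATE ASSERTION EMP_VT_NO_OVERLAP
CHECK ( NOT EXISTS (
    SELECT *
    FROM   EMP_VT AS E1, EMP_VT AS E2
    WHERE  E1.Ssn = E2.Ssn
    AND    E1.Vst < E2.Vst          -- E1 starts before E2 (equal starts are ruled out by the key)
    AND    E2.Vst <= E1.Vet ) );    -- but E1 has not yet ended when E2 starts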
Valid time relations basically keep track of the history of changes as they become
effective in the real world. Hence, if all real-world changes are applied, the database
keeps a history of the real-world states that are represented. However, because
updates, insertions, and deletions may be applied retroactively or proactively, there is
no record of the actual database state at any point in time. If the actual database states
are important to an application, then one should use transaction time relations.
Transaction Time Relations. In a transaction time database, whenever a change
is applied to the database, the actual timestamp of the transaction that applied the
change (insert, delete, or update) is recorded. Such a database is most useful when
changes are applied simultaneously in the majority of cases—for example, real-time
stock trading or banking transactions. If we convert the nontemporal database in
16A combination of the nontemporal key and the valid end time attribute Vet could also be used.
Figure 1 into a transaction time database, then the two relations EMPLOYEE and
DEPARTMENT are converted into transaction time relations by adding the attrib-
utes Tst (Transaction Start Time) and Tet (Transaction End Time), whose data type
is typically TIMESTAMP. This is shown in Figure 7(b), where the relations have been
renamed EMP_TT and DEPT_TT, respectively.
In EMP_TT, each tuple V represents a version of an employee’s information that was
created at actual time V.Tst and was (logically) removed at actual time V.Tet
(because the information was no longer correct). In EMP_TT, the current version of
each employee typically has a special value, uc (Until Changed), as its transaction
end time, which indicates that the tuple represents correct information until it is
changed by some other transaction.17 A transaction time database has also been
called a rollback database,18 because a user can logically roll back to the actual
database state at any past point in time T by retrieving all tuple versions V whose
transaction time period [V.Tst, V.Tet] includes time point T.
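Assuming that uc is encoded by a maximum timestamp (again, only one possible convention), such a rollback to a past time point T could be sketched as follows:
SELECT *
FROM   EMP_TT
WHERE  Tst <= TIMESTAMP '2002-01-01 00:00:00'   -- the past time point T
AND    Tet >= TIMESTAMP '2002-01-01 00:00:00';  -- version still current at T (uc = maximum timestamp)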
Bitemporal Relations. Some applications require both valid time and transac-
tion time, leading to bitemporal relations. In our example, Figure 7(c) shows how
the EMPLOYEE and DEPARTMENT nontemporal relations in Figure 1 would appear
as bitemporal relations EMP_BT and DEPT_BT, respectively. Figure 9 shows a few
tuples in these relations. In these tables, tuples whose transaction end time Tet is uc
are the ones representing currently valid information, whereas tuples whose Tet is an
absolute timestamp are tuples that were valid until (just before) that timestamp.
Hence, the tuples with uc in Figure 9 correspond to the valid time tuples in Figure 8.
The transaction start time attribute Tst in each tuple is the timestamp of the trans-
action that created that tuple.
Now consider how an update operation would be implemented on a bitemporal
relation. In this model of bitemporal databases,19 no attributes are physically
changed in any tuple except for the transaction end time attribute Tet with a value of
uc.20 To illustrate how tuples are created, consider the EMP_BT relation. The current
version V of an employee has uc in its Tet attribute and now in its Vet attribute. If
some attribute—say, Salary—is updated, then the transaction T that performs the
update should have two parameters: the new value of Salary and the valid time VT
when the new salary becomes effective (in the real world). Assume that VT− is the
17The uc variable in transaction time relations corresponds to the now variable in valid time relations. The
semantics are slightly different though.
18Here, the term rollback does not have the same meaning as transaction rollback during recovery, where
the transaction updates are physically undone. Rather, here the updates can be logically undone, allowing
the user to examine the database as it appeared at a previous time point.
19There have been many proposed temporal database models. We describe specific models here as
examples to illustrate the concepts.
20Some bitemporal models allow the Vet attribute to be changed also, but the interpretations of the
tuples are different in those models.
EMP_BT
Name     Ssn        Salary  Dno  Supervisor_ssn  Vst         Vet         Tst                  Tet
Smith    123456789  25000   5    333445555       2002-06-15  Now         2002-06-08,13:05:58  2003-06-04,08:56:12
Smith    123456789  25000   5    333445555       2002-06-15  2003-05-31  2003-06-04,08:56:12  uc
Smith    123456789  30000   5    333445555       2003-06-01  Now         2003-06-04,08:56:12  uc
Wong     333445555  25000   4    999887777       1999-08-20  Now         1999-08-20,11:18:23  2001-01-07,14:33:02
Wong     333445555  25000   4    999887777       1999-08-20  2001-01-31  2001-01-07,14:33:02  uc
Wong     333445555  30000   5    999887777       2001-02-01  Now         2001-01-07,14:33:02  2002-03-28,09:23:57
Wong     333445555  30000   5    999887777       2001-02-01  2002-03-31  2002-03-28,09:23:57  uc
Wong     333445555  40000   5    888665555       2002-04-01  Now         2002-03-28,09:23:57  uc
Brown    222447777  28000   4    999887777       2001-05-01  Now         2001-04-27,16:22:05  2002-08-12,10:11:07
Brown    222447777  28000   4    999887777       2001-05-01  2002-08-10  2002-08-12,10:11:07  uc
Narayan  666884444  38000   5    333445555       2003-08-01  Now         2003-07-28,09:25:37  uc
. . .

DEPT_BT
Dname     Dno  Manager_ssn  Vst         Vet         Tst                  Tet
Research  5    888665555    2001-09-20  Now         2001-09-15,14:52:12  2002-03-28,09:23:57
Research  5    888665555    2001-09-20  2002-03-31  2002-03-28,09:23:57  uc
Research  5    333445555    2002-04-01  Now         2002-03-28,09:23:57  uc
Figure 9
Some tuple versions in the bitemporal relations EMP_BT and DEPT_BT.
time point before VT in the given valid time granularity and that transaction T has a
timestamp TS(T). Then, the following physical changes would be applied to the
EMP_BT table:
1. Make a copy V2 of the current version V; set V2.Vet to VT−, V2.Tst to TS(T),
V2.Tet to uc, and insert V2 in EMP_BT; V2 is a copy of the previous current
version V after it is closed at valid time VT−.
2. Make a copy V3 of the current version V; set V3.Vst to VT, V3.Vet to now,
V3.Salary to the new salary value, V3.Tst to TS(T), V3.Tet to uc, and insert V3 in
EMP_BT; V3 represents the new current version.
3. Set V.Tet to TS(T) since the current version is no longer representing correct
information.
As an illustration, consider the first three tuples V1, V2, and V3 in EMP_BT in Figure
9. Before the update of Smith’s salary from 25000 to 30000, only V1 was in EMP_BT
and it was the current version and its Tet was uc. Then, a transaction T whose time-
stamp TS(T) is ‘2003-06-04,08:56:12’ updates the salary to 30000 with the effective
valid time of ‘2003-06-01’. The tuple V2 is created, which is a copy of V1 except that
its Vet is set to ‘2003-05-31’, one day less than the new valid time and
its Tst is the timestamp of the updating transaction. The tuple V3 is also created,
which has the new salary, its Vst is set to ‘2003-06-01’, and its Tst is also the time-
stamp of the updating transaction. Finally, the Tet of V1 is set to the timestamp of
the updating transaction, ‘2003-06-04,08:56:12’. Note that this is a retroactive
update, since the updating transaction ran on June 4, 2003, but the salary change is
effective on June 1, 2003.
Similarly, when Wong’s salary and department are updated (at the same time) to
30000 and 5, the updating transaction’s timestamp is ‘2001-01-07,14:33:02’ and the
effective valid time for the update is ‘2001-02-01’. Hence, this is a proactive update
because the transaction ran on January 7, 2001, but the effective date was February
1, 2001. In this case, tuple V4 is logically replaced by V5 and V6.
Next, let us illustrate how a delete operation would be implemented on a bitempo-
ral relation by considering the tuples V9 and V10 in the EMP_BT relation of Figure 9.
Here, employee Brown left the company effective August 10, 2002, and the logical
delete is carried out by a transaction T with TS(T) = 2002-08-12,10:11:07. Before
this, V9 was the current version of Brown, and its Tet was uc. The logical delete is
implemented by setting V9.Tet to 2002-08-12,10:11:07 to invalidate it, and creating
the final version V10 for Brown, with its Vet = 2002-08-10 (see Figure 9). Finally, an
insert operation is implemented by creating the first version as illustrated by V11 in
the EMP_BT table.
Implementation Considerations. There are various options for storing the
tuples in a temporal relation. One is to store all the tuples in the same table, as
shown in Figures 8 and 9. Another option is to create two tables: one for the cur-
rently valid information and the other for the rest of the tuples. For example, in the
bitemporal EMP_BT relation, tuples with uc for their Tet and now for their Vet would
be in one relation, the current table, since they are the ones currently valid (that is,
represent the current snapshot), and all other tuples would be in another relation.
This allows the database administrator to have different access paths, such as
indexes for each relation, and keeps the size of the current table reasonable. Another
possibility is to create a third table for corrected tuples whose Tet is not uc.
Another option is to vertically partition the attributes of the temporal relation into
separate relations. The motivation is that when a relation has many attributes, a whole
new tuple version is created whenever any one of the attributes is updated. If the
attributes are updated asynchronously, each new version may differ in only one of
the attributes, thus needlessly repeating the other attribute values. If a separate rela-
tion is created to contain only the attributes that always change synchronously, with
the primary key replicated in each relation, the database is said to be in temporal
normal form. However, to combine the information, a variation of join known
as temporal intersection join would be needed, which is generally expensive to
implement.
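As a rough sketch of what such a join involves, suppose the time-varying attributes of EMP_VT were split into two hypothetical relations EMP_SAL(Ssn, Salary, Vst, Vet) and EMP_DEPT(Ssn, Dno, Vst, Vet). Combining them requires matching versions whose valid time periods overlap and intersecting those periods:
SELECT S.Ssn, S.Salary, D.Dno,
       CASE WHEN S.Vst > D.Vst THEN S.Vst ELSE D.Vst END AS Vst,  -- later of the two start times
       CASE WHEN S.Vet < D.Vet THEN S.Vet ELSE D.Vet END AS Vet   -- earlier of the two end times
FROM   EMP_SAL AS S, EMP_DEPT AS D
WHERE  S.Ssn = D.Ssn
AND    S.Vst <= D.Vet        -- the two valid time periods
AND    D.Vst <= S.Vet;       -- overlap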
It is important to note that bitemporal databases allow a complete record of
changes. Even a record of corrections is possible. For example, it is possible that two
tuple versions of the same employee may have the same valid time but different
attribute values as long as their transaction times are disjoint. In this case, the tuple
with the later transaction time is a correction of the other tuple version. Even incor-
rectly entered valid times may be corrected this way. The incorrect state of the data-
base will still be available as a previous database state for querying purposes. A data-
base that keeps such a complete record of changes and corrections is sometimes
called an append-only database.
2.3 Incorporating Time in Object-Oriented Databases
Using Attribute Versioning
The previous section discussed the tuple versioning approach to implementing
temporal databases. In this approach, whenever one attribute value is changed, a
whole new tuple version is created, even though all the other attribute values will
be identical to the previous tuple version. An alternative approach can be used in
database systems that support complex structured objects, such as object data-
bases or object-relational systems. This approach is called attribute versioning.
In attribute versioning, a single complex object is used to store all the temporal
changes of the object. Each attribute that changes over time is called a time-varying
attribute, and it has its values versioned over time by adding temporal periods to
the attribute. The temporal periods may represent valid time, transaction time, or
bitemporal, depending on the application requirements. Attributes that do not
change over time are called nontime-varying and are not associated with the tem-
poral periods. To illustrate this, consider the example in Figure 10, which is an
attribute-versioned valid time representation of EMPLOYEE using the object defini-
tion language (ODL) notation for object databases. Here, we assumed that name
and Social Security number are nontime-varying attributes, whereas salary, depart-
ment, and supervisor are time-varying attributes (they may change over time). Each
time-varying attribute is represented as a list of tuples <Valid_start_time, Valid_end_time, Value>, ordered by valid start time.
Whenever an attribute is changed in this model, the current attribute version is
closed and a new attribute version for this attribute only is appended to the list.
This allows attributes to change asynchronously. The current value for each attrib-
ute has now for its Valid_end_time. When using attribute versioning, it is useful to
include a lifespan temporal attribute associated with the whole object whose value
is one or more valid time periods that indicate the valid time of existence for the
whole object. Logical deletion of the object is implemented by closing the lifespan.
The constraint that any time period of an attribute within an object should be a
subset of the object’s lifespan should be enforced.
For bitemporal databases, each attribute version would have a tuple with five components:
<Valid_start_time, Valid_end_time, Trans_start_time, Trans_end_time, Value>.
The object lifespan would also include both valid and transaction time dimensions.
Therefore, the full capabilities of bitemporal databases can be available with attrib-
ute versioning. Mechanisms similar to those discussed earlier for updating tuple
versions can be applied to updating attribute versions.
class TEMPORAL_SALARY
{ attribute Date Valid_start_time;
attribute Date Valid_end_time;
attribute float Salary;
};
class TEMPORAL_DEPT
{ attribute Date Valid_start_time;
attribute Date Valid_end_time;
attribute DEPARTMENT_VT Dept;
};
class TEMPORAL_SUPERVISOR
{ attribute Date Valid_start_time;
attribute Date Valid_end_time;
attribute EMPLOYEE_VT Supervisor;
};
class TEMPORAL_LIFESPAN
{ attribute Date Valid_start_time;
attribute Date Valid_end_time;
};
class EMPLOYEE_VT
( extent EMPLOYEES )
{ attribute list<TEMPORAL_LIFESPAN> lifespan;
attribute string Name;
attribute string Ssn;
attribute list<TEMPORAL_SALARY> Sal_history;
attribute list<TEMPORAL_DEPT> Dept_history;
attribute list<TEMPORAL_SUPERVISOR> Supervisor_history;
};
Figure 10
Possible ODL schema for a temporal valid time EMPLOYEE_VT
object class using attribute versioning.
2.4 Temporal Querying Constructs
and the TSQL2 Language
So far, we have discussed how data models may be extended with temporal con-
structs. Now we give a brief overview of how query operations need to be extended
for temporal querying. We will briefly discuss the TSQL2 language, which extends
SQL for querying valid time, transaction time, and bitemporal relational databases.
In nontemporal relational databases, the typical selection conditions involve attrib-
ute conditions, and tuples that satisfy these conditions are selected from the set of
current tuples. Following that, the attributes of interest to the query are specified by
a projection operation. For example, in the query to retrieve the names of all employ-
ees working in department 5 whose salary is greater than 30000, the selection condi-
tion would be as follows:
((Salary > 30000) AND (Dno = 5))
The projected attribute would be Name. In a temporal database, the conditions may
involve time in addition to attributes. A pure time condition involves only time—
for example, to select all employee tuple versions that were valid on a certain time
point T or that were valid during a certain time period [T1, T2]. In this case, the spec-
ified time period is compared with the valid time period of each tuple version [T.Vst,
T.Vet], and only those tuples that satisfy the condition are selected. In these opera-
tions, a period is considered to be equivalent to the set of time points from T1 to T2
inclusive, so the standard set comparison operations can be used. Additional opera-
tions, such as whether one time period ends before another starts are also needed.21
Some of the more common operations used in queries are as follows:
[T.Vst, T.Vet] INCLUDES [T1, T2]       Equivalent to T1 ≥ T.Vst AND T2 ≤ T.Vet
[T.Vst, T.Vet] INCLUDED_IN [T1, T2]    Equivalent to T1 ≤ T.Vst AND T2 ≥ T.Vet
[T.Vst, T.Vet] OVERLAPS22 [T1, T2]     Equivalent to (T1 ≤ T.Vet AND T2 ≥ T.Vst)
[T.Vst, T.Vet] BEFORE [T1, T2]         Equivalent to T1 ≥ T.Vet
[T.Vst, T.Vet] AFTER [T1, T2]          Equivalent to T2 ≤ T.Vst
[T.Vst, T.Vet] MEETS_BEFORE23 [T1, T2] Equivalent to T1 = T.Vet + 1
[T.Vst, T.Vet] MEETS_AFTER [T1, T2]    Equivalent to T2 + 1 = T.Vst
Additionally, operations are needed to manipulate time periods, such as computing
the union or intersection of two time periods. The results of these operations may
not themselves be periods, but rather temporal elements—a collection of one or
more disjoint time periods such that no two time periods in a temporal element are
directly adjacent. That is, for any two time periods [T1, T2] and [T3, T4] in a temporal
element, the following three conditions must hold:
■ [T1, T2] intersection [T3, T4] is empty.
■ T3 is not the time point following T2 in the given granularity.
■ T1 is not the time point following T4 in the given granularity.
The latter conditions are necessary to ensure unique representations of temporal
elements. If two time periods [T1, T2] and [T3, T4] are adjacent, they are combined
21A complete set of operations, known as Allen’s algebra (Allen, 1983), has been defined for compar-
ing time periods.
22This operation returns true if the intersection of the two periods is not empty; it has also been called
INTERSECTS_WITH.
23Here, 1 refers to one time point in the specified granularity. The MEETS operations basically specify if
one period starts immediately after another period ends.
into a single time period [T1, T4]. This is called coalescing of time periods.
Coalescing also combines intersecting time periods.
To illustrate how pure time conditions can be used, suppose a user wants to select all
employee versions that were valid at any point during 2002. The appropriate selec-
tion condition applied to the relation in Figure 8 would be
[T.Vst, T.Vet] OVERLAPS [2002-01-01, 2002-12-31]
Typically, most temporal selections are applied to the valid time dimension. For a
bitemporal database, one usually applies the conditions to the currently correct
tuples with uc as their transaction end times. However, if the query needs to be
applied to a previous database state, an AS_OF T clause is appended to the query,
which means that the query is applied to the valid time tuples that were correct in
the database at time T.
In addition to pure time conditions, other selections involve attribute and time
conditions. For example, suppose we wish to retrieve all EMP_VT tuple versions T
for employees who worked in department 5 at any time during 2002. In this case,
the condition is
[T.Vst, T.Vet] OVERLAPS [2002-01-01, 2002-12-31] AND (T.Dno = 5)
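Using the equivalence given earlier for OVERLAPS, this condition can be expressed directly in SQL over the EMP_VT relation sketched before (again assuming that now is encoded as a maximum date so that date comparisons remain valid):
SELECT Name
FROM   EMP_VT
WHERE  Dno = 5
AND    Vst <= DATE '2002-12-31'   -- the version started before the period of interest ended
AND    Vet >= DATE '2002-01-01';  -- and ended after the period of interest started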
Finally, we give a brief overview of the TSQL2 query language, which extends SQL
with constructs for temporal databases. The main idea behind TSQL2 is to allow
users to specify whether a relation is nontemporal (that is, a standard SQL relation)
or temporal. The CREATE TABLE statement is extended with an optional AS clause to
allow users to declare different temporal options. The following options are avail-
able:
■ AS VALID STATE <GRANULARITY> (valid time relation with valid time period)
■ AS VALID EVENT <GRANULARITY> (valid time relation with valid time point)
■ AS TRANSACTION (transaction time relation with transaction time period)
■ AS VALID STATE <GRANULARITY> AND TRANSACTION (bitemporal relation, valid time period)
■ AS VALID EVENT <GRANULARITY> AND TRANSACTION (bitemporal relation, valid time point)
The keywords STATE and EVENT are used to specify whether a time period or time
point is associated with the valid time dimension. In TSQL2, rather than have the
user actually see how the temporal tables are implemented (as we discussed in the
previous sections), the TSQL2 language adds query language constructs to specify
various types of temporal selections, temporal projections, temporal aggregations,
transformation among granularities, and many other concepts. The book by
Snodgrass et al. (1995) describes the language.
2.5 Time Series Data
Time series data is used very often in financial, sales, and economics applications.
They involve data values that are recorded according to a specific predefined
sequence of time points. Therefore, they are a special type of valid event data, where
the event time points are predetermined according to a fixed calendar. Consider the
example of closing daily stock prices of a particular company on the New York Stock
Exchange. The granularity here is day, but the days that the stock market is open are
known (nonholiday weekdays). Hence, it has been common to specify a computa-
tional procedure that calculates the particular calendar associated with a time
series. Typical queries on time series involve temporal aggregation over higher
granularity intervals—for example, finding the average or maximum weekly closing
stock price or the maximum and minimum monthly closing stock price from the
daily information.
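Once the daily values are stored in a table, such aggregations can be expressed with
ordinary SQL grouping. The following sketch assumes a hypothetical table
DAILY_CLOSE(Symbol, Trading_date, Closing_price) and uses the EXTRACT function,
which most SQL dialects provide, to derive the monthly grouping granularity.

-- Monthly maximum and minimum closing price per stock symbol
SELECT Symbol,
       EXTRACT(YEAR FROM Trading_date) AS Yr,
       EXTRACT(MONTH FROM Trading_date) AS Mon,
       MAX(Closing_price) AS Max_close,
       MIN(Closing_price) AS Min_close
FROM   DAILY_CLOSE
GROUP BY Symbol, EXTRACT(YEAR FROM Trading_date), EXTRACT(MONTH FROM Trading_date);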
As another example, consider the daily sales dollar amount at each store of a chain
of stores owned by a particular company. Again, typical temporal aggregates would
be retrieving the weekly, monthly, or yearly sales from the daily sales information
(using the SUM aggregate function), or comparing same-store monthly sales with
sales in previous months, and so on.
Because of the specialized nature of time series data and the lack of support for it in
older DBMSs, it has been common to use specialized time series management sys-
tems rather than general-purpose DBMSs for managing such information. In such
systems, it has been common to store time series values in sequential order in a file,
and apply specialized time series procedures to analyze the information. The prob-
lem with this approach is that the full power of high-level querying in languages
such as SQL will not be available in such systems.
More recently, some commercial DBMS packages are offering time series exten-
sions, such as the Oracle time cartridge and the time series data blade of Informix
Universal Server. In addition, the TSQL2 language provides some support for time
series in the form of event tables.
3 Spatial Database Concepts24
3.1 Introduction to Spatial Databases
Spatial databases incorporate functionality that provides support for databases that
keep track of objects in a multidimensional space. For example, cartographic data-
bases that store maps include two-dimensional spatial descriptions of their
objects—from countries and states to rivers, cities, roads, seas, and so on. The sys-
tems that manage geographic data and related applications are known as
Geographical Information Systems (GIS), and they are used in areas such as envi-
ronmental applications, transportation systems, emergency response systems, and
battle management. Other databases, such as meteorological databases for weather
information, are three-dimensional, since temperatures and other meteorological
information are related to three-dimensional spatial points. In general, a spatial
database stores objects that have spatial characteristics that describe them and that
have spatial relationships among them. The spatial relationships among the objects
are important, and they are often needed when querying the database. Although a
spatial database can in general refer to an n-dimensional space for any n, we will
limit our discussion to two dimensions as an illustration.
24The contribution of Pranesh Parimala Ranganathan to this section is appreciated.
Table 1 Common Types of Analysis for Spatial Data
Analysis Type                   Type of Operations and Measurements
Measurements                    Distance, perimeter, shape, adjacency, and direction
Spatial analysis/statistics     Pattern, autocorrelation, and indexes of similarity and topology
                                using spatial and nonspatial data
Flow analysis                   Connectivity and shortest path
Location analysis               Analysis of points and lines within a polygon
Terrain analysis                Slope/aspect, catchment area, drainage network
Search                          Thematic search, search by region
A spatial database is optimized to store and query data related to objects in space,
including points, lines and polygons. Satellite images are a prominent example of
spatial data. Queries posed on these spatial data, where predicates for selection deal
with spatial parameters, are called spatial queries. For example, “What are the
names of all bookstores within five miles of the College of Computing building at
Georgia Tech?” is a spatial query. Whereas typical databases process numeric and
character data, additional functionality needs to be added for databases to process
spatial data types. A query such as “List all the customers located within twenty
miles of company headquarters” will require the processing of spatial data types
typically outside the scope of standard relational algebra and may involve consult-
ing an external geographic database that maps the company headquarters and each
customer to a 2-D map based on their address. Effectively, each customer will be
associated with a (latitude, longitude) position. A traditional B+-tree index based on
customers’ zip codes or other nonspatial attributes cannot be used to process this
query since traditional indexes are not capable of ordering multidimensional coor-
dinate data. Therefore, there is a special need for databases tailored for handling
spatial data and spatial queries.
Table 1 shows the common analytical operations involved in processing geographic
or spatial data.25
25List of GIS analysis operations as proposed in Albrecht (1996).
Measurement operations are used to measure some global properties of single
objects (such as the area, the relative size of an object’s parts, com-
pactness, or symmetry), and to measure the relative position of different objects in
terms of distance and direction. Spatial analysis operations, which often use statis-
tical techniques, are used to uncover spatial relationships within and among mapped
data layers. An example would be to create a map—known as a prediction map—
that identifies the locations of likely customers for particular products based on the
historical sales and demographic information. Flow analysis operations help in
determining the shortest path between two points and also the connectivity among
nodes or regions in a graph. Location analysis aims to find if the given set of points
and lines lie within a given polygon (location). The process involves generating a
buffer around existing geographic features and then identifying or selecting features
based on whether they fall inside or outside the boundary of the buffer. Digital ter-
rain analysis is used to build three-dimensional models, where the topography of a
geographical location can be represented with an x, y, z data model known as Digital
Terrain (or Elevation) Model (DTM/DEM). The x and y dimensions of a DTM rep-
resent the horizontal plane, and z represents spot heights for the respective x, y coor-
dinates. Such models can be used for analysis of environmental data or during the
design of engineering projects that require terrain information. Spatial search
allows a user to search for objects within a particular spatial region. For example,
thematic search allows us to search for objects related to a particular theme or class,
such as “Find all water bodies within 25 miles of Atlanta” where the class is water.
There are also topological relationships among spatial objects. These are often used
in Boolean predicates to select objects based on their spatial relationships. For
example, if a city boundary is represented as a polygon and freeways are represented
as multilines, a condition such as “Find all freeways that go through Arlington,
Texas” would involve an intersects operation, to determine which freeways (lines)
intersect the city boundary (polygon).
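In a spatially extended SQL system, such a topological condition typically appears as
a Boolean spatial function in the WHERE clause. The following sketch uses PostGIS-style
syntax purely as an illustration; the FREEWAY and CITY tables and their geom columns
are hypothetical.

-- Freeways (multilines) that intersect the city boundary (polygon)
SELECT F.Name
FROM   FREEWAY AS F, CITY AS C
WHERE  C.Name = 'Arlington' AND C.State = 'TX'
  AND  ST_Intersects(F.geom, C.geom);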
3.2 Spatial Data Types and Models
This section briefly describes the common data types and models for storing spatial
data. Spatial data comes in three basic forms. These forms have become a de facto
standard due to their wide use in commercial systems.
■ Map Data26 includes various geographic or spatial features of objects in a
map, such as an object’s shape and the location of the object within the map.
The three basic types of features are points, lines, and polygons (or areas).
Points are used to represent spatial characteristics of objects whose locations
correspond to a single 2-d coordinate (x, y, or longitude/latitude) in the scale
of a particular application. Depending on the scale, some examples of point
objects could be buildings, cellular towers, or stationary vehicles. Moving
vehicles and other moving objects can be represented by a sequence of point
locations that change over time. Lines represent objects having length, such
as roads or rivers, whose spatial characteristics can be approximated by a
sequence of connected lines. Polygons are used to represent spatial charac-
teristics of objects that have a boundary, such as countries, states, lakes, or
cities. Notice that some objects, such as buildings or cities, can be repre-
sented as either points or polygons, depending on the scale of detail.
26These types of geographic data are based on ESRI’s guide to GIS. See
www.gis.com/implementing_gis/data/data_types.html
■ Attribute data is the descriptive data that GIS systems associate with map
features. For example, suppose that a map contains features that represent
counties within a US state (such as Texas or Oregon). Attributes for each
county feature (object) could include population, largest city/town, area in
square miles, and so on. Other attribute data could be included for other fea-
tures in the map, such as states, cities, congressional districts, census tracts,
and so on.
■ Image data includes data such as satellite images and aerial photographs,
which are typically created by cameras. Objects of interest, such as buildings
and roads, can be identified and overlaid on these images. Images can also be
attributes of map features. One can add images to other map features so that
clicking on the feature would display the image. Aerial and satellite images
are typical examples of raster data.
Models of spatial information are sometimes grouped into two broad categories:
field and object. A spatial application (such as remote sensing or highway traffic con-
trol) is modeled using either a field- or an object-based model, depending on the
requirements and the traditional choice of model for the application. Field models
are often used to model spatial data that is continuous in nature, such as terrain ele-
vation, temperature data, and soil variation characteristics, whereas object models
have traditionally been used for applications such as transportation networks, land
parcels, buildings, and other objects that possess both spatial and non-spatial attrib-
utes.
3.3 Spatial Operators
Spatial operators are used to capture all the relevant geometric properties of objects
embedded in the physical space and the relations between them, as well as to
perform spatial analysis. Operators are classified into three broad categories.
■ Topological operators. Topological properties are invariant when topologi-
cal transformations are applied. These properties do not change after trans-
formations like rotation, translation, or scaling. Topological operators are
hierarchically structured in several levels, where the base level offers opera-
tors the ability to check for detailed topological relations between regions
with a broad boundary, and the higher levels offer more abstract operators
that allow users to query uncertain spatial data independent of the underly-
ing geometric data model. Examples include open (region), close (region),
and inside (point, loop).
■ Projective operators. Projective operators, such as convex hull, are used to
express predicates about the concavity/convexity of objects as well as other
spatial relations (for example, being inside the concavity of a given object).
■ Metric operators. Metric operators provide a more specific description of
the object’s geometry. They are used to measure some global properties of
single objects (such as the area, relative size of an object’s parts, compactness,
and symmetry), and to measure the relative position of different objects in
terms of distance and direction. Examples include length (arc) and distance
(point, point).
Dynamic Spatial Operators. The operations performed by the operators men-
tioned above are static, in the sense that the operands are not affected by the appli-
cation of the operation. For example, calculating the length of the curve has no
effect on the curve itself. Dynamic operations alter the objects upon which the
operations act. The three fundamental dynamic operations are create, destroy, and
update. A representative example of dynamic operations would be updating a spa-
tial object that can be subdivided into translate (shift position), rotate (change ori-
entation), scale up or down, reflect (produce a mirror image), and shear (deform).
Spatial Queries. Spatial queries are requests for spatial data that require the use
of spatial operations. The following categories illustrate three typical types of spatial
queries; the first two are sketched in SQL after the list:
■ Range query. Finds the objects of a particular type that are within a given
spatial area or within a particular distance from a given location. (For exam-
ple, find all hospitals within the Metropolitan Atlanta city area, or find all
ambulances within five miles of an accident location.)
■ Nearest neighbor query. Finds an object of a particular type that is closest to
a given location. (For example, find the police car that is closest to the loca-
tion of crime.)
■ Spatial joins or overlays. Typically joins the objects of two types based on
some spatial condition, such as the objects intersecting or overlapping spa-
tially or being within a certain distance of one another. (For example, find all
townships located on a major highway between two cities or find all homes
that are within two miles of a lake.)
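The first two query types might be written as follows in a PostGIS-style spatially
extended SQL. This is only a sketch: the HOSPITAL, POLICE_CAR, and ACCIDENT tables,
their geom columns, and the accident identifier are hypothetical, and the geometries
are assumed to be stored in a meter-based projection so that distances can be given
in meters.

-- Range query: hospitals within five miles (about 8,047 meters) of accident 101
SELECT H.Name
FROM   HOSPITAL AS H, ACCIDENT AS A
WHERE  A.Accident_id = 101
  AND  ST_DWithin(H.geom, A.geom, 8047);

-- Nearest neighbor query: the single police car closest to the same accident
SELECT P.Car_id
FROM   POLICE_CAR AS P, ACCIDENT AS A
WHERE  A.Accident_id = 101
ORDER BY ST_Distance(P.geom, A.geom)
LIMIT 1;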
3.4 Spatial Data Indexing
A spatial index is used to organize objects into a set of buckets (which correspond
to pages of secondary memory), so that objects in a particular spatial region can be
easily located. Each bucket has a bucket region, a part of space containing all objects
stored in the bucket. The bucket regions are usually rectangles; for point data struc-
tures, these regions are disjoint and they partition the space so that each point
belongs to precisely one bucket. There are essentially two ways of providing a spatial
index.
1. Specialized indexing structures that allow efficient search for data objects
based on spatial search operations are included in the database system. These
indexing structures would play a similar role to that performed by B+-tree
indexes in traditional database systems. Examples of these indexing struc-
tures are grid files and R-trees. Special types of spatial indexes, known as
spatial join indexes, can be used to speed up spatial join operations.
2. Instead of creating brand new indexing structures, the two-dimensional
(2-d) spatial data is converted to single-dimensional (1-d) data, so that tra-
ditional indexing techniques (B+-tree) can be used. The algorithms
for converting from 2-d to 1-d are known as space filling curves. We will
not discuss these methods in detail (see the Selected Bibliography for further
references).
We give an overview of some of the spatial indexing techniques next.
Grid Files. Grid files are used for indexing of data on multiple attributes. They can
also be used for indexing 2-dimensional and higher n-dimensional spatial data. The
fixed-grid method divides an n-dimensional hyperspace into equal size buckets.
The data structure that implements the fixed grid is an n-dimensional array. The
objects whose spatial locations lie within a cell (totally or partially) can be stored in
a dynamic structure to handle overflows. This structure is useful for uniformly dis-
tributed data like satellite imagery. However, the fixed-grid structure is rigid, and its
directory can be sparse and large.
R-Trees. The R-tree is a height-balanced tree, which is an extension of the B+-tree
for k-dimensions, where k > 1. For two dimensions (2-d), spatial objects are approx-
imated in the R-tree by their minimum bounding rectangle (MBR), which is the
smallest rectangle, with sides parallel to the coordinate system (x and y) axis, that
contains the object. R-trees are characterized by the following properties, which are
similar to the properties for B+-trees but are adapted to 2-d spatial objects. We use
M to indicate the maximum number of entries that can fit in an R-tree node.
1. The structure of each index entry (or index record) in a leaf node is (I,
object-identifier), where I is the MBR for the spatial object whose identifier is
object-identifier.
2. Every node except the root node must be at least half full. Thus, a leaf node
that is not the root should contain m entries (I, object-identifier) where M/2
<= m <= M. Similarly, a non-leaf node that is not the root should contain m
entries (I, child-pointer) where M/2 <= m <= M, and I is the MBR that con-
tains the union of all the rectangles in the node pointed at by child-pointer.
3. All leaf nodes are at the same level, and the root node should have at least
two pointers unless it is a leaf node.
4. All MBRs have their sides parallel to the axes of the global coordinate system.
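In practice, a DBMS with spatial support usually lets the user request such an index
declaratively and builds the R-tree (or a variant) over the MBRs internally. For
example, in PostgreSQL with the PostGIS extension (shown only as an illustration;
the table and column are hypothetical), a spatial index is declared as a GiST index:

CREATE INDEX freeway_geom_idx ON FREEWAY USING GIST (geom);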
Other spatial storage structures include quadtrees and their variations. Quadtrees
generally divide each space or subspace into equally sized areas, and proceed with
the subdivisions of each subspace to identify the positions of various objects.
Recently, many newer spatial access structures have been proposed, and this area
remains an active research area.
Spatial Join Index. A spatial join index precomputes a spatial join operation and
stores the pointers to the related object in an index structure. Join indexes improve
the performance of recurring join queries over tables that have low update rates.
Spatial join conditions are used to answer queries such as “Create a list of highway-
river combinations that cross.” The spatial join is used to identify and retrieve these
pairs of objects that satisfy the cross spatial relationship. Because computing the
results of spatial relationships is generally time consuming, the result can be com-
puted once and stored in a table that has the pairs of object identifiers (or tuple ids)
that satisfy the spatial relationship, which is essentially the join index.
A join index can be described by a bipartite graph G = (V1, V2, E), where V1 con-
tains the tuple ids of relation R, and V2 contains the tuple ids of relation S. The edge
set E contains an edge (vr, vs), for vr in V1 and vs in V2, if there is a tuple corresponding to
(vr, vs) in the join index. The bipartite graph models all of the related tuples as con-
nected vertices in the graph. Spatial join indexes are used in operations (see Section
3.3) that involve computation of relationships among spatial objects.
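A simple way to picture a spatial join index is as a materialized table of qualifying
identifier pairs. The following sketch precomputes the highway-river pairs that cross,
using hypothetical HIGHWAY and RIVER tables with geom columns and a PostGIS-style
ST_Crosses predicate; recurring queries can then join against this table instead of
recomputing the spatial relationship.

-- Materialize the pairs of highways and rivers that cross
CREATE TABLE HIGHWAY_RIVER_JIX AS
SELECT H.Highway_id, R.River_id
FROM   HIGHWAY AS H, RIVER AS R
WHERE  ST_Crosses(H.geom, R.geom);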
3.5 Spatial Data Mining
Spatial data tends to be highly correlated. For example, people with similar charac-
teristics, occupations, and backgrounds tend to cluster together in the same neigh-
borhoods.
The three major spatial data mining techniques are spatial classification, spatial
association, and spatial clustering.
■ Spatial classification. The goal of classification is to estimate the value of an
attribute of a relation based on the value of the relation’s other attributes. An
example of the spatial classification problem is determining the locations of
nests in a wetland based on the value of other attributes (for example, vege-
tation durability and water depth); it is also called the location prediction
problem. Similarly, where to expect hotspots in crime activity is also a loca-
tion prediction problem.
■ Spatial association. Spatial association rules are defined in terms of spatial
predicates rather than items. A spatial association rule is of the form
P1 ∧ P2 ∧ ... ∧ Pn ⇒ Q1 ∧ Q2 ∧ ... ∧ Qm,
where at least one of the Pi's or Qj's is a spatial predicate. For example, the
rule
is_a(x, country) ∧ touches(x, Mediterranean) ⇒ is_a(x, wine-exporter)
(that is, a country that is adjacent to the Mediterranean Sea is typically a
wine exporter) is an example of an association rule, which will have a certain
support s and confidence c.27
Spatial colocation rules attempt to generalize association rules to point collec-
tion data sets that are indexed by space. There are several crucial differences between
spatial and nonspatial associations including:
1. The notion of a transaction is absent in spatial situations, since data is
embedded in continuous space. Partitioning space into transactions would
lead to an overestimate or an underestimate of interest measures, for exam-
ple, support or confidence.
2. Size of item sets in spatial databases is small, that is, there are many fewer
items in the item set in a spatial situation than in a nonspatial situation.
In most instances, spatial items are a discrete version of continuous variables. For
example, in the United States income regions may be defined as regions where the
mean yearly income is within certain ranges, such as, below $40,000, from $40,000
to $100,000, and above $100,000.
■ Spatial Clustering attempts to group database objects so that the most sim-
ilar objects are in the same cluster, and objects in different clusters are as dis-
similar as possible. One application of spatial clustering is to group together
seismic events in order to determine earthquake faults. An example of a spa-
tial clustering algorithm is density-based clustering, which tries to find
clusters based on the density of data points in a region. These algorithms
treat clusters as dense regions of objects in the data space. Two variations of
these algorithms are density-based spatial clustering of applications with
noise (DBSCAN)28 and density-based clustering (DENCLUE).29 DBSCAN
is a density-based clustering algorithm because it finds a number of clusters
starting from the estimated density distribution of corresponding nodes.
3.6 Applications of Spatial Data
Spatial data management is useful in many disciplines, including geography, remote
sensing, urban planning, and natural resource management. Spatial database man-
agement is playing an important role in the solution of challenging scientific prob-
lems such as global climate change and genomics. Due to the spatial nature of
genome data, GIS and spatial database management systems have a large role to play
in the area of bioinformatics. Some of the typical applications include pattern
recognition (for example, to check if the topology of a particular gene in the
genome is found in any other sequence feature map in the database), genome
27Concepts of support and confidence for association rules are often discussed as part of data mining.
28DBSCAN was proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu (1996).
29DENCLUE was proposed by Hinnenberg and Gabriel (2007).
browser development, and visualization maps. Another important application area
of spatial data mining is spatial outlier detection. A spatial outlier is a spatially
referenced object whose nonspatial attribute values are significantly different from
those of other spatially referenced objects in its spatial neighborhood. For example,
if a neighborhood of older houses has just one brand-new house, that house would
be an outlier based on the nonspatial attribute ‘house_age’. Detecting spatial outliers
is useful in many applications of geographic information systems and spatial data-
bases. These application domains include transportation, ecology, public safety,
public health, climatology, and location-based services.
4 Multimedia Database Concepts
Multimedia databases provide features that allow users to store and query different
types of multimedia information, which includes images (such as photos or draw-
ings), video clips (such as movies, newsreels, or home videos), audio clips (such as
songs, phone messages, or speeches), and documents (such as books or articles). The
main types of database queries that are needed involve locating multimedia sources
that contain certain objects of interest. For example, one may want to locate all
video clips in a video database that include a certain person, say Michael Jackson.
One may also want to retrieve video clips based on certain activities included in
them, such as video clips where a soccer goal is scored by a certain player or team.
The above types of queries are referred to as content-based retrieval, because the
multimedia source is being retrieved based on its containing certain objects or
activities. Hence, a multimedia database must use some model to organize and
index the multimedia sources based on their contents. Identifying the contents of
multimedia sources is a difficult and time-consuming task. There are two main
approaches. The first is based on automatic analysis of the multimedia sources to
identify certain mathematical characteristics of their contents. This approach uses
different techniques depending on the type of multimedia source (image, video,
audio, or text). The second approach depends on manual identification of the
objects and activities of interest in each multimedia source and on using this infor-
mation to index the sources. This approach can be applied to all multimedia
sources, but it requires a manual preprocessing phase where a person has to scan
each multimedia source to identify and catalog the objects and activities it contains
so that they can be used to index the sources.
In the first part of this section, we will briefly discuss some of the characteristics of
each type of multimedia source—images, video, audio, and text/documents. Then
we will discuss approaches for automatic analysis of images followed by the prob-
lem of object recognition in images. We end this section with some remarks on ana-
lyzing audio sources.
An image is typically stored either in raw form as a set of pixel or cell values, or in
compressed form to save space. The image shape descriptor describes the geometric
shape of the raw image, which is typically a rectangle of cells of a certain width and
height. Hence, each image can be represented by an m by n grid of cells. Each cell
contains a pixel value that describes the cell content. In black-and-white images,
pixels can be one bit. In gray scale or color images, a pixel is multiple bits. Because
images may require large amounts of space, they are often stored in compressed
form. Compression standards, such as GIF, JPEG, or MPEG, use various mathemat-
ical transformations to reduce the number of cells stored but still maintain the main
image characteristics. Applicable mathematical transforms include Discrete Fourier
Transform (DFT), Discrete Cosine Transform (DCT), and wavelet transforms.
To identify objects of interest in an image, the image is typically divided into homo-
geneous segments using a homogeneity predicate. For example, in a color image, adja-
cent cells that have similar pixel values are grouped into a segment. The homogeneity
predicate defines conditions for automatically grouping those cells. Segmentation
and compression can hence identify the main characteristics of an image.
A typical image database query would be to find images in the database that are
similar to a given image. The given image could be an isolated segment that con-
tains, say, a pattern of interest, and the query is to locate other images that contain
that same pattern. There are two main techniques for this type of search. The first
approach uses a distance function to compare the given image with the stored
images and their segments. If the distance value returned is small, the probability of
a match is high. Indexes can be created to group stored images that are close in the
distance metric so as to limit the search space. The second approach, called the
transformation approach, measures image similarity by having a small number of
transformations that can change one image’s cells to match the other image.
Transformations include rotations, translations, and scaling. Although the transfor-
mation approach is more general, it is also more time-consuming and difficult.
A video source is typically represented as a sequence of frames, where each frame is
a still image. However, rather than identifying the objects and activities in every
individual frame, the video is divided into video segments, where each segment
comprises a sequence of contiguous frames that includes the same objects/activities.
Each segment is identified by its starting and ending frames. The objects and activi-
ties identified in each video segment can be used to index the segments. An index-
ing technique called frame segment trees has been proposed for video indexing. The
index includes both objects, such as persons, houses, and cars, and activities,
such as a person delivering a speech or two people talking. Videos are also often
compressed using standards such as MPEG.
Audio sources include stored recorded messages, such as speeches, class presenta-
tions, or even surveillance recordings of phone messages or conversations by law
enforcement. Here, discrete transforms can be used to identify the main character-
istics of a certain person’s voice in order to have similarity-based indexing and
retrieval. We will briefly comment on their analysis in Section 4.4.
A text/document source is basically the full text of some article, book, or magazine.
These sources are typically indexed by identifying the keywords that appear in the
text and their relative frequencies. However, filler words or common words called
stopwords are eliminated from the process. Because there can be many keywords
when attempting to index a collection of documents, techniques have been devel-
oped to reduce the number of keywords to those that are most relevant to the col-
lection. A dimensionality reduction technique called singular value decompositions
(SVD), which is based on matrix transformations, can be used for this purpose. An
indexing technique called telescoping vector trees (TV-trees), can then be used to
group similar documents.
4.1 Automatic Analysis of Images
Analysis of multimedia sources is critical to support any type of query or search
interface. We need to represent multimedia source data such as images in terms of
features that would enable us to define similarity. The work done so far in this area
uses low-level visual features such as color, texture, and shape, which are directly
related to the perceptual aspects of image content. These features are easy to extract
and represent, and it is convenient to design similarity measures based on their sta-
tistical properties.
Color is one of the most widely used visual features in content-based image
retrieval since it does not depend upon image size or orientation. Retrieval based on
color similarity is mainly done by computing a color histogram for each image that
identifies the proportion of pixels within an image for the three color channels (red,
green, blue—RGB). However, RGB representation is affected by the orientation of
the object with respect to illumination and camera direction. Therefore, current
image retrieval techniques compute color histograms using competing invariant
representations such as HSV (hue, saturation, value). HSV describes colors as
points in a cylinder whose central axis ranges from black at the bottom to white at
the top with neutral colors between them. The angle around the axis corresponds to
the hue, the distance from the axis corresponds to the saturation, and the distance
along the axis corresponds to the value (brightness).
Texture refers to the patterns in an image that present the properties of homogene-
ity that do not result from the presence of a single color or intensity value.
Examples of texture classes are rough and silky. Examples of textures that can be
identified include pressed calf leather, straw matting, cotton canvas, and so on. Just
as pictures are represented by arrays of pixels (picture elements), textures are repre-
sented by arrays of texels (texture elements). These textures are then placed into a
number of sets, depending on how many textures are identified in the image. These
sets not only contain the texture definition but also indicate where in the image the
texture is located. Texture identification is primarily done by modeling it as a two-
dimensional, gray-level variation. The relative brightness of pairs of pixels is com-
puted to estimate the degree of contrast, regularity, coarseness, and directionality.
Shape refers to the shape of a region within an image. It is generally determined by
applying segmentation or edge detection to an image. Segmentation is a region-
based approach that uses an entire region (sets of pixels), whereas edge detection is
a boundary-based approach that uses only the outer boundary characteristics of
entities. Shape representation is typically required to be invariant to translation,
rotation, and scaling. Some well-known methods for shape representation include
Fourier descriptors and moment invariants.
4.2 Object Recognition in Images
Object recognition is the task of identifying real-world objects in an image or a
video sequence. The system must be able to identify the object even when the
images of the object vary in viewpoints, size, scale, or even when they are rotated or
translated. Some approaches have been developed to divide the original image into
regions based on similarity of contiguous pixels. Thus, in a given image showing a
tiger in the jungle, a tiger subimage may be detected against the background of the
jungle, and when compared with a set of training images, it may be tagged as a tiger.
The representation of the multimedia object in an object model is extremely impor-
tant. One approach is to divide the image into homogeneous segments using a
homogeneity predicate. For example, in a color image, adjacent cells that have
similar pixel values are grouped into a segment. The homogeneity predicate defines
conditions for automatically grouping those cells. Segmentation and compression
can hence identify the main characteristics of an image. Another approach finds
measurements of the object that are invariant to transformations. It is impossible to
keep a database of examples of all the different transformations of an image. To deal
with this, object recognition approaches find interesting points (or features) in an
image that are invariant to transformations.
An important contribution to this field was made by Lowe,30 who used scale-
invariant features from images to perform reliable object recognition. This
approach is called scale-invariant feature transform (SIFT). The SIFT features are
invariant to image scaling and rotation, and partially invariant to change in illumi-
nation and 3D camera viewpoint. They are well localized in both the spatial and
frequency domains, reducing the probability of disruption by occlusion, clutter, or
noise. In addition, the features are highly distinctive, which allows a single feature
to be correctly matched with high probability against a large database of features,
providing a basis for object and scene recognition.
For image matching and recognition, SIFT features (also known as keypoint
features) are first extracted from a set of reference images and stored in a database.
Object recognition is then performed by comparing each feature from the new
image with the features stored in the database and finding candidate matching fea-
tures based on the Euclidean distance of their feature vectors. Since the keypoint
features are highly distinctive, a single feature can be correctly matched with good
probability in a large database of features.
In addition to SIFT, there are a number of competing methods available for object
recognition under clutter or partial occlusion. For example, RIFT, a rotation invari-
ant generalization of SIFT, identifies groups of local affine regions (image features
having a characteristic appearance and elliptical shape) that remain approximately
affinely rigid across a range of views of an object, and across multiple instances of
the same object class.
30See Lowe (2004), "Distinctive Image Features from Scale-Invariant Keypoints."
4.3 Semantic Tagging of Images
The notion of implicit tagging is an important one for image recognition and com-
parison. Multiple tags may attach to an image or a subimage: for instance, in the
example we referred to above, tags such as “tiger,” “jungle,” “green,” and “stripes”
may be associated with that image. Most image search techniques retrieve images
based on user-supplied tags that are often not very accurate or comprehensive. To
improve search quality, a number of recent systems aim at automated generation of
these image tags. In the case of multimedia data, most of the semantics is present in the
content itself. These systems use image-processing and statistical-modeling techniques to
analyze image content to generate accurate annotation tags that can then be used to
retrieve images by content. However, if different annotation schemes use different
vocabularies to annotate images, the quality of image retrieval will be poor. To solve
this problem, recent research techniques have proposed the use of concept hierar-
chies, taxonomies, or ontologies using OWL (Web Ontology Language), in which
terms and their relationships are clearly defined. These can be used to infer higher-
level concepts based on tags. Concepts like “sky” and “grass” may be further divided
into “clear sky” and “cloudy sky” or “dry grass” and “green grass” in such a taxon-
omy. These approaches generally come under semantic tagging and can be used in
conjunction with the above feature-analysis and object-identification strategies.
4.4 Analysis of Audio Data Sources
Audio sources are broadly classified into speech, music, and other audio data. Each
of these is significantly different from the others; hence, different types of audio data
are treated differently. Audio data must be digitized before it can be processed and
stored. Indexing and retrieval of audio data is arguably the toughest among all types
of media, because like video, it is continuous in time and, unlike text, does not have
easily measurable characteristics. Clarity of sound recordings is easy for humans to
perceive but is hard to quantify for machine learning. Interestingly, speech data is
often transcribed using speech recognition techniques, since the resulting text can
make indexing this data a lot easier and more accurate. This is sometimes referred to
as text-based indexing of audio data. The speech metadata is typically content
dependent, in that the metadata is generated from the audio content, for example,
the number of speakers, and so on. However, some of the
metadata might be independent of the actual content, such as the length of the
speech and the format in which the data is stored. Music indexing, on the other
hand, is done based on the statistical analysis of the audio signal, also known as
content-based indexing. Content-based indexing often makes use of the key features
of sound: intensity, pitch, timbre, and rhythm. It is possible to compare different
pieces of audio data and retrieve information from them based on the calculation of
certain features, as well as application of certain transforms.
5 Introduction to Deductive Databases
5.1 Overview of Deductive Databases
In a deductive database system we typically specify rules through a declarative lan-
guage—a language in which we specify what to achieve rather than how to achieve
it. An inference engine (or deduction mechanism) within the system can deduce
new facts from the database by interpreting these rules. The model used for deduc-
tive databases is closely related to the relational data model, and particularly to the
domain relational calculus formalism. It is also related to the field of logic pro-
gramming and the Prolog language. The deductive database work based on logic
has used Prolog as a starting point. A variation of Prolog called Datalog is used to
define rules declaratively in conjunction with an existing set of relations, which are
themselves treated as literals in the language. Although the language structure of
Datalog resembles that of Prolog, its operational semantics—that is, how a Datalog
program is executed—is still different.
A deductive database uses two main types of specifications: facts and rules. Facts are
specified in a manner similar to the way relations are specified, except that it is not
necessary to include the attribute names. Recall that a tuple in a relation describes
some real-world fact whose meaning is partly determined by the attribute names. In
a deductive database, the meaning of an attribute value in a tuple is determined
solely by its position within the tuple. Rules are somewhat similar to relational
views. They specify virtual relations that are not actually stored but that can be
formed from the facts by applying inference mechanisms based on the rule specifi-
cations. The main difference between rules and views is that rules may involve
recursion and hence may yield virtual relations that cannot be defined in terms of
basic relational views.
The evaluation of Prolog programs is based on a technique called backward chain-
ing, which involves a top-down evaluation of goals. In the deductive databases that
use Datalog, attention has been devoted to handling large volumes of data stored in
a relational database. Hence, evaluation techniques have been devised that resemble
those for a bottom-up evaluation. Prolog suffers from the limitation that the order
of specification of facts and rules is significant in evaluation; moreover, the order of
literals (defined in Section 5.3) within a rule is significant. The execution techniques
for Datalog programs attempt to circumvent these problems.
5.2 Prolog/Datalog Notation
The notation used in Prolog/Datalog is based on providing predicates with unique
names. A predicate has an implicit meaning, which is suggested by the predicate
name, and a fixed number of arguments. If the arguments are all constant values,
the predicate simply states that a certain fact is true. If, on the other hand, the pred-
icate has variables as arguments, it is either considered as a query or as part of a rule
or constraint. In our discussion, we adopt the Prolog convention that all constant
values in a predicate are either numeric or character strings; they are represented as
identifiers (or names) that start with a lowercase letter, whereas variable names
always start with an uppercase letter.
Figure 11
(a) Prolog notation. (b) The supervisory tree.
Facts
SUPERVISE(franklin, john).
SUPERVISE(franklin, ramesh).
SUPERVISE(franklin, joyce).
SUPERVISE(jennifer, alicia).
SUPERVISE(jennifer, ahmad).
SUPERVISE(james, franklin).
SUPERVISE(james, jennifer).
. . .
Rules
SUPERIOR(X, Y ) :– SUPERVISE(X, Y ).
SUPERIOR(X, Y ) :– SUPERVISE(X, Z ), SUPERIOR(Z, Y ).
SUBORDINATE(X, Y ) :– SUPERIOR(Y, X ).
Queries
SUPERIOR(james, Y )?
SUPERIOR(james, joyce)?
Consider the example shown in Figure 11, which is based on the relational database
in Figure A.1 (in Appendix: Figures at the end of this chapter), but in a much sim-
plified form. There are three predicate names: supervise, superior, and subordinate.
The SUPERVISE predicate is defined via a set of facts, each of which has two argu-
ments: a supervisor name, followed by the name of a direct supervisee (subordinate)
of that supervisor. These facts correspond to the actual data that is stored in the
database, and they can be considered as constituting a set of tuples in a relation
SUPERVISE with two attributes whose schema is
SUPERVISE(Supervisor, Supervisee)
Thus, SUPERVISE(X, Y ) states the fact that X supervises Y. Notice the omission of
the attribute names in the Prolog notation. Attribute names are only represented by
virtue of the position of each argument in a predicate: the first argument represents
the supervisor, and the second argument represents a direct subordinate.
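Viewed relationally, the facts of Figure 11 are simply the tuples of a stored table.
A sketch in SQL (with arbitrarily chosen column lengths) would be:

CREATE TABLE SUPERVISE
( Supervisor  VARCHAR(15),
  Supervisee  VARCHAR(15) );

INSERT INTO SUPERVISE VALUES
  ('franklin', 'john'), ('franklin', 'ramesh'), ('franklin', 'joyce'),
  ('jennifer', 'alicia'), ('jennifer', 'ahmad'),
  ('james', 'franklin'), ('james', 'jennifer');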
The other two predicate names are defined by rules. The main contributions of
deductive databases are the ability to specify recursive rules and to provide a frame-
work for inferring new information based on the specified rules. A rule is of the
form head :– body, where :– is read as if and only if. A rule usually has a single pred-
icate to the left of the :– symbol—called the head or left-hand side (LHS) or
conclusion of the rule—and one or more predicates to the right of the :– symbol—
called the body or right-hand side (RHS) or premise(s) of the rule. A predicate
with constants as arguments is said to be ground; we also refer to it as an
instantiated predicate. The arguments of the predicates that appear in a rule typi-
cally include a number of variable symbols, although predicates can also contain
constants as arguments. A rule specifies that, if a particular assignment or binding
of constant values to the variables in the body (RHS predicates) makes all the RHS
predicates true, it also makes the head (LHS predicate) true by using the same
assignment of constant values to variables. Hence, a rule provides us with a way of
generating new facts that are instantiations of the head of the rule. These new facts
are based on facts that already exist, corresponding to the instantiations (or bind-
ings) of predicates in the body of the rule. Notice that by listing multiple predicates
in the body of a rule we implicitly apply the logical AND operator to these predi-
cates. Hence, the commas between the RHS predicates may be read as meaning and.
Consider the definition of the predicate SUPERIOR in Figure 11, whose first argu-
ment is an employee name and whose second argument is an employee who is
either a direct or an indirect subordinate of the first employee. By indirect subordi-
nate, we mean the subordinate of some subordinate down to any number of levels.
Thus SUPERIOR(X, Y) stands for the fact that X is a superior of Y through direct or
indirect supervision. We can write two rules that together specify the meaning of the
new predicate. The first rule under Rules in the figure states that for every value of X
and Y, if SUPERVISE(X, Y)—the rule body—is true, then SUPERIOR(X, Y)—the
rule head—is also true, since Y would be a direct subordinate of X (at one level
down). This rule can be used to generate all direct superior/subordinate relation-
ships from the facts that define the SUPERVISE predicate. The second recursive rule
states that if SUPERVISE(X, Z) and SUPERIOR(Z, Y ) are both true, then
SUPERIOR(X, Y) is also true. This is an example of a recursive rule, where one of
the rule body predicates in the RHS is the same as the rule head predicate in the
LHS. In general, the rule body defines a number of premises such that if they are all
true, we can deduce that the conclusion in the rule head is also true. Notice that if
we have two (or more) rules with the same head (LHS predicate), it is equivalent to
saying that the predicate is true (that is, that it can be instantiated) if either one of
the bodies is true; hence, it is equivalent to a logical OR operation. For example, if
we have two rules X :– Y and X :– Z, they are equivalent to a rule X :– Y OR Z. The
latter form is not used in deductive systems, however, because it is not in the stan-
dard form of rule, called a Horn clause, as we discuss in Section 5.4.
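Although basic (nonrecursive) relational views cannot capture such a rule, the
recursive common table expressions of SQL:1999 can. The following sketch restates
the two SUPERIOR rules over a stored SUPERVISE(Supervisor, Supervisee) table and
then poses the first query of Figure 11; it is given only to connect the Datalog rules
to familiar SQL, not as part of the Datalog framework itself.

WITH RECURSIVE SUPERIOR (Sup, Sub) AS
( SELECT Supervisor, Supervisee              -- rule 1: direct supervision
  FROM   SUPERVISE
  UNION
  SELECT S.Supervisor, R.Sub                 -- rule 2: recursive step
  FROM   SUPERVISE AS S, SUPERIOR AS R
  WHERE  S.Supervisee = R.Sup )
SELECT Sub FROM SUPERIOR WHERE Sup = 'james';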
A Prolog system contains a number of built-in predicates that the system can inter-
pret directly. These typically include the equality comparison operator =(X, Y),
which returns true if X and Y are identical and can also be written as X=Y by using
the standard infix notation.31 Other comparison operators for numbers, such as <,
<=, >, and >=, can be treated as binary predicates. Arithmetic functions such as +,
–, *, and / can be used as arguments in predicates in Prolog. In contrast, Datalog (in
its basic form) does not allow functions such as arithmetic operations as arguments;
indeed, this is one of the main differences between Prolog and Datalog. However,
extensions to Datalog have been proposed that do include functions.
31A Prolog system typically has a number of different equality predicates that have different interpreta-
tions.
A query typically involves a predicate symbol with some variable arguments, and its
meaning (or answer) is to deduce all the different constant combinations that, when
bound (assigned) to the variables, can make the predicate true. For example, the
first query in Figure 11 requests the names of all subordinates of james at any level.
A different type of query, which has only constant symbols as arguments, returns
either a true or a false result, depending on whether the arguments provided can be
deduced from the facts and rules. For example, the second query in Figure 11
returns true, since SUPERIOR(james, joyce) can be deduced.
5.3 Datalog Notation
In Datalog, as in other logic-based languages, a program is built from basic objects
called atomic formulas. It is customary to define the syntax of logic-based lan-
guages by describing the syntax of atomic formulas and identifying how they can be
combined to form a program. In Datalog, atomic formulas are literals of the form
p(a1, a2, …, an), where p is the predicate name and n is the number of arguments for
predicate p. Different predicate symbols can have different numbers of arguments,
and the number of arguments n of predicate p is sometimes called the arity or
degree of p. The arguments can be either constant values or variable names. As
mentioned earlier, we use the convention that constant values either are numeric or
start with a lowercase character, whereas variable names always start with an
uppercase character.
A number of built-in predicates are included in Datalog, which can also be used to
construct atomic formulas. The built-in predicates are of two main types: the binary
comparison predicates < (less), <= (less_or_equal), > (greater), and >=
(greater_or_equal) over ordered domains; and the comparison predicates = (equal)
and /= (not_equal) over ordered or unordered domains. These can be used as binary
predicates with the same functional syntax as other predicates—for example, by
writing less(X, 3)—or they can be specified by using the customary infix notation
X<3. Note that because the domains of these predicates are potentially infinite, they
should be used with care in rule definitions. For example, the predicate greater(X,
3), if used alone, generates an infinite set of values for X that satisfy the predicate (all
integer numbers greater than 3).
A literal is either an atomic formula as defined earlier—called a positive literal—or
an atomic formula preceded by not. The latter is a negated atomic formula, called a
negative literal. Datalog programs can be considered to be a subset of the predicate
calculus formulas, which are somewhat similar to the formulas of the domain rela-
tional calculus. In Datalog, however, these formulas are first converted into what is
known as clausal form before they are expressed in Datalog, and only formulas
given in a restricted clausal form, called Horn clauses,32 can be used in Datalog.
32Named after the mathematician Alfred Horn.
5.4 Clausal Form and Horn Clauses
Recall that a formula in the relational calculus is a condition that includes predicates
called atoms (based on relation names). Additionally, a formula can have quanti-
fiers—namely, the universal quantifier (for all) and the existential quantifier (there
exists). In clausal form, a formula must be transformed into another formula with
the following characteristics:
■ All variables in the formula are universally quantified. Hence, it is not neces-
sary to include the universal quantifiers (for all) explicitly; the quantifiers are
removed, and all variables in the formula are implicitly quantified by the uni-
versal quantifier.
■ In clausal form, the formula is made up of a number of clauses, where each
clause is composed of a number of literals connected by OR logical connec-
tives only. Hence, each clause is a disjunction of literals.
■ The clauses themselves are connected by AND logical connectives only, to
form a formula. Hence, the clausal form of a formula is a conjunction of
clauses.
It can be shown that any formula can be converted into clausal form. For our pur-
poses, we are mainly interested in the form of the individual clauses, each of which
is a disjunction of literals. Recall that literals can be positive literals or negative liter-
als. Consider a clause of the form:
NOT(P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q1 OR Q2 OR ... OR Qm (1)
This clause has n negative literals and m positive literals. Such a clause can be trans-
formed into the following equivalent logical formula:
P1 AND P2 AND ... AND Pn ⇒ Q1 OR Q2 OR ... OR Qm (2)
where ⇒ is the implies symbol. The formulas (1) and (2) are equivalent, meaning
that their truth values are always the same. This is the case because if all the Pi liter-
als (i = 1, 2, ..., n) are true, the formula (2) is true only if at least one of the Qi’s is
true, which is the meaning of the ⇒ (implies) symbol. For formula (1), if all the Pi
literals (i = 1, 2, ..., n) are true, their negations are all false; so in this case formula
(1) is true only if at least one of the Qi’s is true. In Datalog, rules are expressed as a
restricted form of clauses called Horn clauses, in which a clause can contain at most
one positive literal. Hence, a Horn clause is either of the form
NOT (P1) OR NOT(P2) OR ... OR NOT(Pn) OR Q (3)
or of the form
NOT (P1) OR NOT(P2) OR ... OR NOT(Pn) (4)
The Horn clause in (3) can be transformed into the clause
P1 AND P2 AND ... AND Pn ⇒ Q (5)
which is written in Datalog as the following rule:
Q :– P1, P2, ..., Pn. (6)
1. SUPERIOR(X, Y ) :– SUPERVISE(X, Y ). (rule 1)
2. SUPERIOR(X, Y ) :– SUPERVISE(X, Z ), SUPERIOR(Z, Y ). (rule 2)
3. SUPERVISE(jennifer, ahmad). (ground axiom, given)
4. SUPERVISE(james, jennifer). (ground axiom, given)
5. SUPERIOR(jennifer, ahmad). (apply rule 1 on 3)
6. SUPERIOR(james, ahmad). (apply rule 2 on 4 and 5)
Figure 12
Proving a new fact.
The Horn clause in (4) can be transformed into
P1 AND P2 AND ... AND Pn ⇒ (7)
which is written in Datalog as follows:
P1, P2, ..., Pn. (8)
A Datalog rule, as in (6), is hence a Horn clause, and its meaning, based on formula
(5), is that if the predicates P1 AND P2 AND ... AND Pn are all true for a particular
binding to their variable arguments, then Q is also true and can hence be inferred.
The Datalog expression (8) can be considered as an integrity constraint, where all
the predicates must be true to satisfy the query.
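As a concrete instance, the recursive rule for SUPERIOR in Figure 11 corresponds to
the Horn clause
NOT(SUPERVISE(X, Z)) OR NOT(SUPERIOR(Z, Y)) OR SUPERIOR(X, Y)
which, by the transformation from (3) to (5), reads
SUPERVISE(X, Z) AND SUPERIOR(Z, Y) ⇒ SUPERIOR(X, Y)
and is written in Datalog exactly as the rule given earlier.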
In general, a query in Datalog consists of two components:
■ A Datalog program, which is a finite set of rules
■ A literal P(X1, X2, ..., Xn), where each Xi is a variable or a constant
A Prolog or Datalog system has an internal inference engine that can be used to
process and compute the results of such queries. Prolog inference engines typically
return one result to the query (that is, one set of values for the variables in the
query) at a time and must be prompted to return additional results. In contrast,
Datalog returns results set-at-a-time.
5.5 Interpretations of Rules
There are two main alternatives for interpreting the theoretical meaning of rules:
proof-theoretic and model-theoretic. In practical systems, the inference mechanism
within a system defines the exact interpretation, which may not coincide with either
of the two theoretical interpretations. The inference mechanism is a computational
procedure and hence provides a computational interpretation of the meaning of
rules. In this section, first we discuss the two theoretical interpretations. Then we
briefly discuss inference mechanisms as a way of defining the meaning of rules.
In the proof-theoretic interpretation of rules, we consider the facts and rules to be
true statements, or axioms. Ground axioms contain no variables. The facts are
ground axioms that are given to be true. Rules are called deductive axioms, since
they can be used to deduce new facts. The deductive axioms can be used to con-
struct proofs that derive new facts from existing facts. For example, Figure 12 shows
how to prove the fact SUPERIOR(james, ahmad) from the rules and facts given in
Figure 11. The proof-theoretic interpretation gives us a procedural or computa-
tional approach for computing an answer to the Datalog query. The process of
proving whether a certain fact (theorem) holds is known as theorem proving.
The second type of interpretation is called the model-theoretic interpretation.
Here, given a finite or an infinite domain of constant values,33 we consider every possible combination of values from the domain as arguments to each predicate and then determine whether each resulting predicate instance is true or false. In general, it is sufficient to specify the combi-
nations of arguments that make the predicate true, and to state that all other combi-
nations make the predicate false. If this is done for every predicate, it is called an
interpretation of the set of predicates. For example, consider the interpretation
shown in Figure 13 for the predicates SUPERVISE and SUPERIOR. This interpreta-
tion assigns a truth value (true or false) to every possible combination of argument
values (from a finite domain) for the two predicates.
An interpretation is called a model for a specific set of rules if those rules are always true under that interpretation; that is, whenever a particular substitution (binding) of constants to the variables in a rule makes all the predicates in the rule body true under the interpretation, the predicate in the rule head must also be true. The interpretation shown in Figure 13 is a model
for the two rules shown, since it can never cause the rules to be violated. Notice that
a rule is violated if a particular binding of constants to the variables makes all the
predicates in the rule body true but makes the predicate in the rule head false. For
example, if SUPERVISE(a, b) and SUPERIOR(b, c) are both true under some inter-
pretation, but SUPERIOR(a, c) is not true, the interpretation cannot be a model for
the recursive rule:
SUPERIOR(X, Y) :– SUPERVISE(X, Z), SUPERIOR(Z, Y)
In the model-theoretic approach, the meaning of the rules is established by provid-
ing a model for these rules. A model is called a minimal model for a set of rules if
we cannot change any fact from true to false and still get a model for these rules. For
example, consider the interpretation in Figure 13, and assume that the SUPERVISE
predicate is defined by a set of known facts, whereas the SUPERIOR predicate is
defined as an interpretation (model) for the rules. Suppose that we add the predi-
cate SUPERIOR(james, bob) to the true predicates. This remains a model for the
rules shown, but it is not a minimal model, since changing the truth value of
SUPERIOR(james,bob) from true to false still provides us with a model for the rules.
The model shown in Figure 13 is the minimal model for the set of facts that are
defined by the SUPERVISE predicate.
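The minimal model of Figure 13 can also be computed mechanically by a naive bottom-up (fixpoint) evaluation: start from the given SUPERVISE facts and apply the two rules repeatedly until no new SUPERIOR facts are produced. The following sketch is only an illustration of that idea, written in Python with facts represented as sets of tuples; this representation is our assumption and is not part of the text.

# A minimal sketch (representation assumed): naive bottom-up evaluation of the
# two SUPERIOR rules over the SUPERVISE facts of Figure 13.
supervise = {
    ("franklin", "john"), ("franklin", "ramesh"), ("franklin", "joyce"),
    ("jennifer", "alicia"), ("jennifer", "ahmad"),
    ("james", "franklin"), ("james", "jennifer"),
}
superior = set()
while True:
    new_facts = set(supervise)                         # rule 1: SUPERIOR(X, Y) :- SUPERVISE(X, Y)
    new_facts |= {(x, y)                               # rule 2: SUPERIOR(X, Y) :-
                  for (x, z) in supervise              #         SUPERVISE(X, Z), SUPERIOR(Z, Y)
                  for (z2, y) in superior if z == z2}
    if new_facts <= superior:                          # fixpoint: no new facts were derived
        break
    superior |= new_facts
print(sorted(superior))                                # the 12 SUPERIOR facts listed in Figure 13

Each pass corresponds to one round of applying the deductive axioms; after two passes no new facts appear, and the resulting set is exactly the set of SUPERIOR facts listed as true in Figure 13.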
In general, the minimal model that corresponds to a given set of facts in the model-
theoretic interpretation should be the same as the facts generated by the proof-
33The most commonly chosen domain is finite and is called the Herbrand Universe.
Rules
SUPERIOR(X, Y ) :– SUPERVISE(X, Y ).
SUPERIOR(X, Y ) :– SUPERVISE(X, Z ), SUPERIOR(Z, Y ).
Interpretation
Known Facts:
SUPERVISE(franklin, john) is true.
SUPERVISE(franklin, ramesh) is true.
SUPERVISE(franklin, joyce) is true.
SUPERVISE(jennifer, alicia) is true.
SUPERVISE(jennifer, ahmad) is true.
SUPERVISE(james, franklin) is true.
SUPERVISE(james, jennifer) is true.
SUPERVISE(X, Y ) is false for all other possible (X, Y ) combinations
Derived Facts:
SUPERIOR(franklin, john) is true.
SUPERIOR(franklin, ramesh) is true.
SUPERIOR(franklin, joyce) is true.
SUPERIOR(jennifer, alicia) is true.
SUPERIOR(jennifer, ahmad) is true.
SUPERIOR(james, franklin) is true.
SUPERIOR(james, jennifer) is true.
SUPERIOR(james, john) is true.
SUPERIOR(james, ramesh) is true.
SUPERIOR(james, joyce) is true.
SUPERIOR(james, alicia) is true.
SUPERIOR(james, ahmad) is true.
SUPERIOR(X, Y ) is false for all other possible (X, Y ) combinations
Figure 13
An interpretation that
is a minimal model.
theoretic interpretation for the same original set of ground and deductive axioms.
However, this is generally true only for rules with a simple structure. Once we allow negation in the specification of rules, the correspondence between the two interpretations no longer holds in general. In fact, with negation, numerous minimal models are possible for a given set of facts.
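For example, under negation the single rule P :– NOT(Q) has two minimal models: one in which P is true and Q is false, and one in which Q is true and P is false; hence no single minimal model can be taken as the meaning of the rule without additional conventions.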
A third approach to interpreting the meaning of rules involves defining an inference
mechanism that is used by the system to deduce facts from the rules. This inference
mechanism would define a computational interpretation to the meaning of the
rules. The Prolog logic programming language uses its inference mechanism to
define the meaning of the rules and facts in a Prolog program. Not all Prolog pro-
grams correspond to the proof-theoretic or model-theoretic interpretations; it
depends on the type of rules in the program. However, for many simple Prolog pro-
grams, the Prolog inference mechanism infers the facts that correspond either to the
proof-theoretic interpretation or to a minimal model under the model-theoretic
interpretation.
EMPLOYEE(john). MALE(john).
EMPLOYEE(franklin). MALE(franklin).
EMPLOYEE(alicia). MALE(ramesh).
EMPLOYEE(jennifer). MALE(ahmad).
EMPLOYEE(ramesh). MALE(james).
EMPLOYEE(joyce).
EMPLOYEE(ahmad). FEMALE(alicia).
EMPLOYEE(james). FEMALE(jennifer).
FEMALE(joyce).
SALARY(john, 30000).
SALARY(franklin, 40000). PROJECT(productx).
SALARY(alicia, 25000). PROJECT(producty).
SALARY(jennifer, 43000). PROJECT(productz).
SALARY(ramesh, 38000). PROJECT(computerization).
SALARY(joyce, 25000). PROJECT(reorganization).
SALARY(ahmad, 25000). PROJECT(newbenefits).
SALARY(james, 55000).
WORKS_ON(john, productx, 32).
DEPARTMENT(john, research). WORKS_ON(john, producty, 8).
DEPARTMENT(franklin, research). WORKS_ON(ramesh, productz, 40).
DEPARTMENT(alicia, administration). WORKS_ON(joyce, productx, 20).
DEPARTMENT(jennifer, administration). WORKS_ON(joyce, producty, 20).
DEPARTMENT(ramesh, research). WORKS_ON(franklin, producty, 10).
DEPARTMENT(joyce, research). WORKS_ON(franklin, productz, 10).
DEPARTMENT(ahmad, administration). WORKS_ON(franklin, computerization, 10).
DEPARTMENT(james, headquarters). WORKS_ON(franklin, reorganization, 10).
WORKS_ON(alicia, newbenefits, 30).
SUPERVISE(franklin, john). WORKS_ON(alicia, computerization, 10).
SUPERVISE(franklin, ramesh). WORKS_ON(ahmad, computerization, 35).
SUPERVISE(franklin, joyce). WORKS_ON(ahmad, newbenefits, 5).
SUPERVISE(jennifer, alicia). WORKS_ON(jennifer, newbenefits, 20).
SUPERVISE(jennifer, ahmad). WORKS_ON(jennifer, reorganization, 15).
SUPERVISE(james, franklin). WORKS_ON(james, reorganization, 10).
SUPERVISE(james, jennifer).
Figure 14
Fact predicates for
part of the database
from Figure A.1.
5.6 Datalog Programs and Their Safety
There are two main methods of defining the truth values of predicates in actual
Datalog programs. Fact-defined predicates (or relations) are defined by listing all
the combinations of values (the tuples) that make the predicate true. These corre-
spond to base relations whose contents are stored in a database system. Figure 14
shows the fact-defined predicates EMPLOYEE, MALE, FEMALE, DEPARTMENT,
SUPERVISE, PROJECT, and WORKS_ON, which correspond to part of the relational
database shown in Figure A.1. Rule-defined predicates (or views) are defined by
being the head (LHS) of one or more Datalog rules; they correspond to virtual rela-
SUPERIOR(X, Y ) :– SUPERVISE(X, Y ).
SUPERIOR(X, Y ) :– SUPERVISE(X, Z ), SUPERIOR(Z, Y ).
SUBORDINATE(X, Y ) :– SUPERIOR(Y, X ).
SUPERVISOR(X ) :– EMPLOYEE(X ), SUPERVISE(X, Y ).
OVER_40K_EMP(X ) :– EMPLOYEE(X ), SALARY(X, Y ), Y >= 40000.
UNDER_40K_SUPERVISOR(X ) :– SUPERVISOR(X ), NOT(OVER_40K_EMP(X )).
MAIN_PRODUCTX_EMP(X ) :– EMPLOYEE(X ), WORKS_ON(X, productx, Y ), Y >= 20.
PRESIDENT(X ) :– EMPLOYEE(X), NOT(SUPERVISE(Y, X ) ).
Figure 15
Rule-defined predicates.
tions whose contents can be inferred by the inference engine. Figure 15 shows a
number of rule-defined predicates.
A program or a rule is said to be safe if it generates a finite set of facts. The general
theoretical problem of determining whether a set of rules is safe is undecidable.
However, one can determine the safety of restricted forms of rules. For example, the
rules shown in Figure 16 are safe. One situation where we get unsafe rules that can
generate an infinite number of facts arises when one of the variables in the rule can
range over an infinite domain of values, and that variable is not limited to ranging
over a finite relation. For example, consider the following rule:
BIG_SALARY(Y ) :– Y>60000
Here, we can get an infinite result if Y ranges over all possible integers. But suppose
that we change the rule as follows:
BIG_SALARY(Y ) :– EMPLOYEE(X ), SALARY(X, Y ), Y>60000
In the second rule, the result is not infinite, since the values that Y can be bound to
are now restricted to values that are the salary of some employee in the database—
presumably, a finite set of values. We can also rewrite the rule as follows:
BIG_SALARY(Y ) :– Y>60000, EMPLOYEE(X ), SALARY(X, Y )
In this case, the rule is still theoretically safe. However, in Prolog or any other system
that uses a top-down, depth-first inference mechanism, the rule creates an infinite
loop, since we first search for a value for Y and then check whether it is a salary of an
employee. The result is generation of an infinite number of Y values, even though
these, after a certain point, cannot lead to a set of true RHS predicates. One defini-
tion of Datalog considers both rules to be safe, since it does not depend on a partic-
ular inference mechanism. Nonetheless, it is generally advisable to write such a rule
in the safest form, with the predicates that restrict possible bindings of variables
placed first. As another example of an unsafe rule, consider the following rule:
HAS_SOMETHING(X, Y ) :– EMPLOYEE(X )
REL_ONE(A, B, C).
REL_TWO(D, E, F).
REL_THREE(G, H, I, J).
SELECT_ONE_A_EQ_C(X, Y, Z) :– REL_ONE(X, Y, Z), X=c.
SELECT_ONE_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), Y<5.
SELECT_ONE_A_EQ_C_AND_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), X=c, Y<5.
SELECT_ONE_A_EQ_C_OR_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), X=c.
SELECT_ONE_A_EQ_C_OR_B_LESS_5(X, Y, Z) :– REL_ONE(X, Y, Z), Y<5.
PROJECT_THREE_ON_G_H(W, X) :– REL_THREE(W, X, Y, Z).
UNION_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z).
UNION_ONE_TWO(X, Y, Z) :– REL_TWO(X, Y, Z).
INTERSECT_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z), REL_TWO(X, Y, Z).
DIFFERENCE_TWO_ONE(X, Y, Z) :– REL_TWO(X, Y, Z), NOT(REL_ONE(X, Y, Z)).
CART_PROD_ONE_THREE(T, U, V, W, X, Y, Z) :–
REL_ONE(T, U, V), REL_THREE(W, X, Y, Z).
NATURAL_JOIN_ONE_THREE_C_EQ_G(U, V, W, X, Y, Z) :–
REL_ONE(U, V, W), REL_THREE(W, X, Y, Z).
(In the selection rules, c denotes a constant value.)
Figure 16
Predicates for illustrating relational operations.
Here, an infinite number of Y values can again be generated, since the variable Y
appears only in the head of the rule and hence is not limited to a finite set of values.
To define safe rules more formally, we use the concept of a limited variable. A vari-
able X is limited in a rule if (1) it appears in a regular (not built-in) predicate in the
body of the rule; (2) it appears in a predicate of the form X=c or c=X or (c1<=X
and X<=c2) in the rule body, where c, c1, and c2 are constant values; or (3) it appears
in a predicate of the form X=Y or Y=X in the rule body, where Y is a limited vari-
able. A rule is said to be safe if all its variables are limited.
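As an illustration only, the limited-variable conditions can be checked mechanically. The sketch below is written in Python under an assumed rule representation (a head argument tuple plus a body list whose items are regular predicates, equality predicates, or other built-in comparisons); it is not an algorithm from the text, and for brevity it omits the range form (c1<=X and X<=c2) of condition (2).

# A minimal sketch (rule representation assumed): compute the limited variables
# of a rule and report whether the rule is safe.
def is_safe(head_args, body):
    # body items: ("pred", args) for regular predicates, ("eq", (l, r)) for
    # equalities, ("cmp", (l, r)) for other built-ins such as > or <.
    def is_var(t):
        return isinstance(t, str) and t[:1].isupper()   # convention: variables start with uppercase
    limited, changed = set(), True
    while changed:                                      # repeat until no new variable becomes limited
        changed = False
        for kind, data in body:
            new = set()
            if kind == "pred":                          # condition (1): regular body predicate
                new = {a for a in data if is_var(a)}
            elif kind == "eq":                          # conditions (2) and (3): X=c or X=Y forms
                l, r = data
                if is_var(l) and (not is_var(r) or r in limited):
                    new.add(l)
                if is_var(r) and (not is_var(l) or l in limited):
                    new.add(r)
            new -= limited
            if new:
                limited |= new
                changed = True
    variables = {a for a in head_args if is_var(a)}
    for kind, data in body:
        variables |= {a for a in data if is_var(a)}
    return variables <= limited                         # safe if every variable is limited

# BIG_SALARY(Y) :- Y > 60000 is unsafe; adding EMPLOYEE(X), SALARY(X, Y) makes it safe.
print(is_safe(("Y",), [("cmp", ("Y", 60000))]))                                          # False
print(is_safe(("Y",), [("pred", ("X",)), ("pred", ("X", "Y")), ("cmp", ("Y", 60000))]))  # True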
5.7 Use of Relational Operations
It is straightforward to specify many operations of the relational algebra in the form
of Datalog rules that define the result of applying these operations on the database
relations (fact predicates). This means that relational queries and views can easily be
specified in Datalog. The additional power that Datalog provides is in the specifica-
tion of recursive queries, and views based on recursive queries. In this section, we
show how some of the standard relational operations can be specified as Datalog
rules. Our examples will use the base relations (fact-defined predicates) REL_ONE,
REL_TWO, and REL_THREE, whose schemas are shown in Figure 16. In Datalog, we
do not need to specify the attribute names as in Figure 16; rather, the arity (degree)
of each predicate is the important aspect. In a practical system, the domain (data
type) of each attribute is also important for operations such as UNION,
INTERSECTION, and JOIN, and we assume that the attribute types are compatible
for the various operations.
Figure 16 illustrates a number of basic relational operations. Notice that if the
Datalog model is based on the relational model and hence assumes that predicates
(fact relations and query results) specify sets of tuples, duplicate tuples in the same
predicate are automatically eliminated. This may or may not be true, depending on
the Datalog inference engine. However, it is definitely not the case in Prolog, so any
of the rules in Figure 16 that involve duplicate elimination are not correct for
Prolog. For example, if we want to specify Prolog rules for the UNION operation
with duplicate elimination, we must rewrite them as follows:
UNION_ONE_TWO(X, Y, Z) :– REL_ONE(X, Y, Z).
UNION_ONE_TWO(X, Y, Z) :– REL_TWO(X, Y, Z), NOT(REL_ONE(X, Y, Z)).
However, the rules shown in Figure 16 should work for Datalog, if duplicates are
automatically eliminated. Similarly, the rules for the PROJECT operation shown in
Figure 16 should work for Datalog in this case, but they are not correct for Prolog,
since duplicates would appear in the latter case.
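To make the correspondence concrete, the following sketch (an illustration in Python over small made-up instances of REL_ONE, REL_TWO, and REL_THREE; the data and variable names are our assumptions) evaluates several of the Figure 16 rules with predicates treated as sets of tuples, so that duplicates are eliminated automatically, as assumed above for Datalog.

# Example instances (assumed data) for the fact-defined predicates of Figure 16.
rel_one = {(1, 2, 3), (4, 5, 6)}
rel_two = {(4, 5, 6), (7, 8, 9)}
rel_three = {(3, 30, 31, 32), (6, 60, 61, 62)}

union_one_two      = rel_one | rel_two                                # UNION_ONE_TWO (both rules)
intersect_one_two  = rel_one & rel_two                                # INTERSECT_ONE_TWO
difference_two_one = rel_two - rel_one                                # DIFFERENCE_TWO_ONE
project_three_on_g_h = {(g, h) for (g, h, i, j) in rel_three}         # PROJECT_THREE_ON_G_H
select_one_b_less_5  = {(x, y, z) for (x, y, z) in rel_one if y < 5}  # SELECT_ONE_B_LESS_5
natural_join_one_three_c_eq_g = {                                     # NATURAL_JOIN_ONE_THREE_C_EQ_G
    (u, v, w, x, y, z)
    for (u, v, w) in rel_one
    for (g, x, y, z) in rel_three if w == g}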
5.8 Evaluation of Nonrecursive Datalog Queries
In order to use Datalog as a deductive database system, it is appropriate to define an
inference mechanism based on relational database query processing concepts. The
inherent strategy involves a bottom-up evaluation, starting with base relations; the
order of operations is kept flexible and subject to query optimization. In this section
we discuss an inference mechanism based on relational operations that can be
applied to nonrecursive Datalog queries. We use the fact and rule base shown in
Figures 14 and 15 to illustrate our discussion.
If a query involves only fact-defined predicates, the inference becomes one of
searching among the facts for the query result. For example, a query such as
DEPARTMENT(X, research)?
is a selection of all employee names X who work for the Research department (the constant research is written in lowercase, matching the facts in Figure 14). In
relational algebra, it is the query:
π$1 (σ$2 = “research” (DEPARTMENT))
which can be answered by searching through the fact-defined predicate
DEPARTMENT(X, Y ). The query involves relational SELECT and PROJECT operations
on a base relation, and it can be handled by algorithmic database query processing
and optimization techniques.
Figure 17
Predicate dependency graph for the fact and rule predicates of Figures 14 and 15. (Nodes: the fact-defined predicates WORKS_ON, EMPLOYEE, SALARY, SUPERVISE, DEPARTMENT, PROJECT, FEMALE, and MALE, and the rule-defined predicates SUPERVISOR, OVER_40K_EMP, UNDER_40K_SUPERVISOR, PRESIDENT, MAIN_PRODUCTX_EMP, SUBORDINATE, and SUPERIOR.)
When a query involves rule-defined predicates, the inference mechanism must
compute the result based on the rule definitions. If a query is nonrecursive and
involves a predicate p that appears as the head of a rule p :– p1, p2, ..., pn, the strategy
is first to compute the relations corresponding to p1, p2, ..., pn and then to compute
the relation corresponding to p. It is useful to keep track of the dependency among
the predicates of a deductive database in a predicate dependency graph. Figure 17
shows the graph for the fact and rule predicates shown in Figures 14 and 15. The
dependency graph contains a node for each predicate. Whenever a predicate A is
specified in the body (RHS) of a rule, and the head (LHS) of that rule is the predi-
cate B, we say that B depends on A, and we draw a directed edge from A to B. This
indicates that in order to compute the facts for the predicate B (the rule head), we
must first compute the facts for all the predicates A in the rule body. If the depend-
ency graph has no cycles, we call the rule set nonrecursive. If there is at least one
cycle, we call the rule set recursive. In Figure 17, there is one recursively defined
predicate—namely, SUPERIOR—which has a recursive edge pointing back to itself.
Additionally, because the predicate SUBORDINATE depends on SUPERIOR, it also
requires recursion to compute its result.
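The dependency graph itself is simple to construct from the rules: draw an edge from every body predicate to the head predicate of each rule, and then test for cycles. The sketch below (an illustration in Python; the rules of Figure 15 are transcribed as head/body predicate-name pairs, a representation we assume) identifies SUPERIOR as the only predicate that lies on a cycle.

# A minimal sketch (rule list transcribed from Figure 15; representation assumed):
# build the predicate dependency graph and detect recursively defined predicates.
rules = [
    ("SUPERIOR", ["SUPERVISE"]),
    ("SUPERIOR", ["SUPERVISE", "SUPERIOR"]),
    ("SUBORDINATE", ["SUPERIOR"]),
    ("SUPERVISOR", ["EMPLOYEE", "SUPERVISE"]),
    ("OVER_40K_EMP", ["EMPLOYEE", "SALARY"]),
    ("UNDER_40K_SUPERVISOR", ["SUPERVISOR", "OVER_40K_EMP"]),
    ("MAIN_PRODUCTX_EMP", ["EMPLOYEE", "WORKS_ON"]),
    ("PRESIDENT", ["EMPLOYEE", "SUPERVISE"]),
]
edges = {}                                     # edge A -> B means head B depends on body predicate A
for head, body in rules:
    for pred in body:
        edges.setdefault(pred, set()).add(head)

def reaches(start, target, seen=frozenset()):  # is there a directed path start -> ... -> target?
    return any(nxt == target or
               (nxt not in seen and reaches(nxt, target, seen | {nxt}))
               for nxt in edges.get(start, ()))

recursive = {head for head, _ in rules if reaches(head, head)}
print(recursive)                               # {'SUPERIOR'}: the only cycle in the graph

A query on SUBORDINATE is still recursive to evaluate, as noted above, because SUBORDINATE depends on the recursive predicate SUPERIOR even though it does not lie on a cycle itself.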
A query that includes only nonrecursive predicates is called a nonrecursive query.
In this section we discuss only inference mechanisms for nonrecursive queries. In
Figure 17, any query that does not involve the predicates SUBORDINATE or
SUPERIOR is nonrecursive. In the predicate dependency graph, the nodes corre-
sponding to fact-defined predicates do not have any incoming edges, since all fact-
defined predicates have their facts stored in a database relation. The contents of a
fact-defined predicate can be computed by directly retrieving the tuples in the cor-
responding database relation.
The main function of an inference mechanism is to compute the facts that corre-
spond to query predicates. This can be accomplished by generating a relational
expression involving relational operators such as SELECT, PROJECT, JOIN, UNION, and
SET DIFFERENCE (with appropriate provision for dealing with safety issues) that,
when executed, provides the query result. The query can then be executed by utiliz-
ing the internal query processing and optimization operations of a relational data-
base management system. Whenever the inference mechanism needs to compute
the fact set corresponding to a nonrecursive rule-defined predicate p, it first locates
all the rules that have p as their head. The idea is to compute the fact set for each
such rule and then to apply the UNION operation to the results, since UNION corre-
sponds to a logical OR operation. The dependency graph indicates all predicates q
on which each p depends, and since we assume that the predicate is nonrecursive,
we can always determine a partial order among such predicates q. Before computing
the fact set for p, first we compute the fact sets for all predicates q on which p
depends, based on their partial order. For example, if a query involves the predicate
UNDER_40K_SUPERVISOR, we must first compute both SUPERVISOR and
OVER_40K_EMP. Since the latter two depend only on the fact-defined predicates
EMPLOYEE, SALARY, and SUPERVISE, they can be computed directly from the
stored database relations.
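As a concrete illustration of this strategy (a simplified sketch, not the mechanism of any particular system; the facts are taken from Figure 14 and the set-of-tuples representation is our assumption), the following code evaluates UNDER_40K_SUPERVISOR bottom-up: it first materializes SUPERVISOR and OVER_40K_EMP from the stored facts and then applies the set difference corresponding to the NOT in the rule.

# A simplified sketch (facts from Figure 14; representation assumed): bottom-up
# evaluation of UNDER_40K_SUPERVISOR in dependency order using set operations.
employee = {"john", "franklin", "alicia", "jennifer", "ramesh", "joyce", "ahmad", "james"}
salary = {"john": 30000, "franklin": 40000, "alicia": 25000, "jennifer": 43000,
          "ramesh": 38000, "joyce": 25000, "ahmad": 25000, "james": 55000}
supervise = {("franklin", "john"), ("franklin", "ramesh"), ("franklin", "joyce"),
             ("jennifer", "alicia"), ("jennifer", "ahmad"),
             ("james", "franklin"), ("james", "jennifer")}

# SUPERVISOR(X) :- EMPLOYEE(X), SUPERVISE(X, Y)        (join, then project on X)
supervisor = {x for (x, y) in supervise if x in employee}

# OVER_40K_EMP(X) :- EMPLOYEE(X), SALARY(X, Y), Y >= 40000
over_40k_emp = {x for x in employee if salary[x] >= 40000}

# UNDER_40K_SUPERVISOR(X) :- SUPERVISOR(X), NOT(OVER_40K_EMP(X))   (set difference)
under_40k_supervisor = supervisor - over_40k_emp
print(under_40k_supervisor)   # set(): with the Figure 14 data every supervisor earns >= 40000

For this sample data the result happens to be empty, since every supervisor in Figure 14 earns at least $40,000; the point of the sketch is the order of evaluation, which follows the predicate dependency graph.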
This concludes our introduction to deductive databases. We have included an exten-
sive bibliography of work in deductive databases, recursive query processing, magic
sets, the combination of relational databases with deductive rules, and the GLUE-NAIL! system at the end of this chapter.
6 Summary
In this chapter we introduced database concepts for some of the common features
that are needed by advanced applications: active databases, temporal databases, spa-
tial databases, multimedia databases, and deductive databases. It is important to
note that each of these is a broad topic and warrants a complete textbook.
First we introduced the topic of active databases, which provide additional func-
tionality for specifying active rules. We introduced the Event-Condition-Action
(ECA) model for active databases. The rules can be automatically triggered by
events that occur—such as a database update—and they can initiate certain actions
that have been specified in the rule declaration if certain conditions are true. Many
commercial packages have some of the functionality provided by active databases in
the form of triggers. We discussed the different options for specifying rules, such as
row-level versus statement-level, before versus after, and immediate versus deferred.
We gave examples of row-level triggers in the Oracle commercial system, and
statement-level rules in the STARBURST experimental system. The syntax for trig-
gers in the SQL-99 standard was also discussed. We briefly discussed some design
issues and some possible applications for active databases.
Next we introduced some of the concepts of temporal databases, which permit the
database system to store a history of changes and allow users to query both current
and past states of the database. We discussed how time is represented and distin-
guished between the valid time and transaction time dimensions. We discussed how
valid time, transaction time, and bitemporal relations can be implemented using
tuple versioning in the relational model, with examples to illustrate how updates,
inserts, and deletes are implemented. We also showed how complex objects can be
used to implement temporal databases using attribute versioning. We looked at
some of the querying operations for temporal relational databases and gave a brief
introduction to the TSQL2 language.
Then we turned to spatial databases. Spatial databases provide concepts for data-
bases that keep track of objects that have spatial characteristics. We discussed the
types of spatial data, types of operators for processing spatial data, types of spatial
queries, and spatial indexing techniques, including the popular R-trees. Then we
discussed some spatial data mining techniques and applications of spatial data.
We discussed some basic types of multimedia databases and their important char-
acteristics. Multimedia databases provide features that allow users to store and
query different types of multimedia information, which includes images (such as
pictures and drawings), video clips (such as movies, newsreels, and home videos),
audio clips (such as songs, phone messages, and speeches), and documents (such as
books and articles). We provided a brief overview of the various types of media
sources and how multimedia sources may be indexed. Images are an extremely common type of data in today's databases and are likely to occupy a large proportion of all stored data. We therefore provided a more detailed treatment of
images: their automatic analysis, recognition of objects within images, and their
semantic tagging—all of which contribute to developing better systems to retrieve
images by content, which still remains a challenging problem. We also commented
on the analysis of audio data sources.
We concluded the chapter with an introduction to deductive databases. We gave an
overview of Prolog and Datalog notation. We discussed the clausal form of formu-
las. Datalog rules are restricted to Horn clauses, which contain at most one positive
literal. We discussed the proof-theoretic and model-theoretic interpretation of
rules. We briefly discussed Datalog rules and their safety and the ways of expressing
relational operators using Datalog rules. Finally, we discussed an inference mecha-
nism based on relational operations that can be used to evaluate nonrecursive
Datalog queries using relational query optimization techniques. While Datalog has
been a popular language with many applications, implementations of deductive database systems such as LDL or VALIDITY have unfortunately not become widely available commercially.
Review Questions
1. What are the differences between row-level and statement-level active rules?
2. What are the differences among immediate, deferred, and detached
consideration of active rule conditions?
3. What are the differences among immediate, deferred, and detached
execution of active rule actions?
4. Briefly discuss the consistency and termination problems when designing a
set of active rules.
5. Discuss some applications of active databases.
6. Discuss how time is represented in temporal databases and compare the dif-
ferent time dimensions.
7. What are the differences between valid time, transaction time, and bitempo-
ral relations?
8. Describe how the insert, delete, and update commands should be imple-
mented on a valid time relation.
9. Describe how the insert, delete, and update commands should be imple-
mented on a bitemporal relation.
10. Describe how the insert, delete, and update commands should be imple-
mented on a transaction time relation.
11. What are the main differences between tuple versioning and attribute ver-
sioning?
12. How do spatial databases differ from regular databases?
13. What are the different types of spatial data?
14. Name the main types of spatial operators and different classes of spatial
queries.
15. What are the properties of R-trees that act as an index for spatial data?
16. Describe how a spatial join index between spatial objects can be constructed.
17. What are the different types of spatial data mining?
18. State the general form of a spatial association rule. Give an example of a spa-
tial association rule.
19. What are the different types of multimedia sources?
20. How are multimedia sources indexed for content-based retrieval?
21. What important features of images are used to compare them?
22. What are the different approaches to recognizing objects in images?
23. How is semantic tagging of images used?
24. What are the difficulties in analyzing audio sources?
25. What are deductive databases?
26. Write sample rules in Prolog to define that courses with course number
above CS5000 are graduate courses and that DBgrads are those graduate stu-
dents who enroll in CS6400 and CS8803.
27. Define clausal form of formulas and Horn clauses.
28. What is theorem proving and what is proof-theoretic interpretation of rules?
29. What is model-theoretic interpretation and how does it differ from proof-
theoretic interpretation?
30. What are fact-defined predicates and rule-defined predicates?
31. What is a safe rule?
32. Give examples of rules that can define relational operations SELECT,
PROJECT, JOIN, and SET operations.
33. Discuss the inference mechanism based on relational operations that can be
applied to evaluate nonrecursive Datalog queries.
Exercises
34. Consider the COMPANY database described in Figure A.1. Using the syntax
of Oracle triggers, write active rules to do the following:
a. Whenever an employee’s project assignments are changed, check if the
total hours per week spent on the employee’s projects are less than 30 or
greater than 40; if so, notify the employee’s direct supervisor.
b. Whenever an employee is deleted, delete the PROJECT tuples and
DEPENDENT tuples related to that employee, and if the employee man-
ages a department or supervises employees, set the Mgr_ssn for that
department to NULL and set the Super_ssn for those employees to NULL.
35. Repeat Exercise 34, but use the syntax of STARBURST active rules.
36. Consider the relational schema shown in Figure 18. Write active rules for
keeping the Sum_commissions attribute of SALES_PERSON equal to the sum
of the Commission attribute in SALES for each sales person. Your rules should
also check if the Sum_commissions exceeds 100000; if it does, call a procedure
Notify_manager(S_id). Write both statement-level rules in STARBURST nota-
tion and row-level rules in Oracle.
SALES(S_id, V_id, Commission)
SALES_PERSON(Salesperson_id, Name, Title, Phone, Sum_commissions)
Figure 18
Database schema for sales and salesperson commissions in Exercise 36.
37. Consider the UNIVERSITY EER schema in Figure A.2. Write some rules (in
English) that could be implemented via active rules to enforce some com-
mon integrity constraints that you think are relevant to this application.
38. Discuss which of the updates that created each of the tuples shown in Figure
9 were applied retroactively and which were applied proactively.
39. Show how the following updates, if applied in sequence, would change the
contents of the bitemporal EMP_BT relation in Figure 9. For each update,
state whether it is a retroactive or proactive update.
a. On 2004-03-10,17:30:00, the salary of Narayan is updated to 40000, effec-
tive on 2004-03-01.
b. On 2003-07-30,08:31:00, the salary of Smith was corrected to show that it
should have been entered as 31000 (instead of 30000 as shown), effective
on 2003-06-01.
c. On 2004-03-18,08:31:00, the database was changed to indicate that
Narayan was leaving the company (that is, logically deleted) effective on
2004-03-31.
d. On 2004-04-20,14:07:33, the database was changed to indicate the hiring
of a new employee called Johnson, with the tuple <‘Johnson’, ‘334455667’,
1, NULL > effective on 2004-04-20.
e. On 2004-04-28,12:54:02, the database was changed to indicate that Wong
was leaving the company (that is, logically deleted) effective on 2004-06-
01.
f. On 2004-05-05,13:07:33, the database was changed to indicate the rehir-
ing of Brown, with the same department and supervisor but with salary
35000 effective on 2004-05-01.
40. Show how the updates given in Exercise 39, if applied in sequence, would
change the contents of the valid time EMP_VT relation in Figure 8.
41. Add the following facts to the sample database in Figure 11:
SUPERVISE(ahmad, bob), SUPERVISE(franklin, gwen).
First modify the supervisory tree in Figure 11(b) to reflect this change. Then
construct a diagram showing the top-down evaluation of the query
SUPERIOR(james, Y) using rules 1 and 2 from Figure 12.
42. Consider the following set of facts for the relation PARENT(X, Y), where Y is
the parent of X:
PARENT(a, aa), PARENT(a, ab), PARENT(aa, aaa), PARENT(aa, aab),
PARENT(aaa, aaaa), PARENT(aaa, aaab).
Consider the rules
r1: ANCESTOR(X, Y) :– PARENT(X, Y)
r2: ANCESTOR(X, Y) :– PARENT(X, Z), ANCESTOR(Z, Y)
which define ANCESTOR(X, Y), meaning that Y is an ancestor of X.
a. Show how to solve the Datalog query
ANCESTOR(aa, X)?
and show your work at each step.
b. Show the same query by computing only the changes in the ancestor rela-
tion and using that in rule 2 each time.
[This question is derived from Bancilhon and Ramakrishnan (1986).]
43. Consider a deductive database with the following rules:
ANCESTOR(X, Y) :– FATHER(X, Y)
ANCESTOR(X, Y) :– FATHER(X, Z), ANCESTOR(Z, Y)
Notice that FATHER(X, Y) means that Y is the father of X; ANCESTOR(X, Y)
means that Y is the ancestor of X.
Consider the following fact base:
FATHER(Harry, Issac), FATHER(Issac, John), FATHER(John, Kurt).
a. Construct a model-theoretic interpretation of the above rules using the
given facts.
b. Consider that a database contains the above relations FATHER(X, Y ),
another relation BROTHER(X, Y ), and a third relation BIRTH(X, B ),
where B is the birth date of person X. State a rule that computes the first
cousins of the following variety: their fathers must be brothers.
c. Show a complete Datalog program with fact-based and rule-based literals
that computes the following relation: list of pairs of cousins, where the
first person is born after 1960 and the second after 1970. You may use
greater than as a built-in predicate. (Note: Sample facts for brother, birth,
and person must also be shown.)
44. Consider the following rules:
REACHABLE(X, Y) :– FLIGHT(X, Y)
REACHABLE(X, Y) :– FLIGHT(X, Z), REACHABLE(Z, Y)
where REACHABLE(X, Y) means that city Y can be reached from city X, and
FLIGHT(X, Y) means that there is a flight to city Y from city X.
a. Construct fact predicates that describe the following:
i. Los Angeles, New York, Chicago, Atlanta, Frankfurt, Paris, Singapore,
Sydney are cities.
ii. The following flights exist: LA to NY, NY to Atlanta, Atlanta to
Frankfurt, Frankfurt to Atlanta, Frankfurt to Singapore, and
Singapore to Sydney. (Note: No flight in reverse direction can be auto-
matically assumed.)
b. Is the given data cyclic? If so, in what sense?
c. Construct a model-theoretic interpretation (that is, an interpretation
similar to the one shown in Figure 13) of the above facts and rules.
d. Consider the query
REACHABLE(Atlanta, Sydney)?
How will this query be executed? List the series of steps it will go through.
e. Consider the following rule-defined predicates:
ROUND-TRIP-REACHABLE(X, Y) :–
REACHABLE(X, Y), REACHABLE(Y, X)
DURATION(X, Y, Z)
Draw a predicate dependency graph for the above predicates. (Note:
DURATION(X, Y, Z) means that you can take a flight from X to Y in Z
hours.)
f. Consider the following query: What cities are reachable in 12 hours from
Atlanta? Show how to express it in Datalog. Assume built-in predicates
like greater-than(X, Y). Can this be converted into a relational algebra
statement in a straightforward way? Why or why not?
g. Consider the predicate population(X, Y), where Y is the population of
city X. Consider the following query: List all possible bindings of the
predicate pair (X, Y), where Y is a city that can be reached in two flights
from city X, which has over 1 million people. Show this query in Datalog.
Draw a corresponding query tree in relational algebraic terms.
Selected Bibliography
The book by Zaniolo et al. (1997) consists of several parts, each describing an
advanced database concept such as active, temporal, and spatial/text/multimedia
databases. Widom and Ceri (1996) and Ceri and Fraternali (1997) focus on active
database concepts and systems. Snodgrass (1995) describes the TSQL2 language
and data model. Khoshafian and Baker (1996), Faloutsos (1996), and
Subrahmanian (1998) describe multimedia database concepts. Tansel et al. (1993) is
a collection of chapters on temporal databases.
STARBURST rules are described in Widom and Finkelstein (1990). Early work on
active databases includes the HiPAC project, discussed in Chakravarthy et al. (1989)
and Chakravarthy (1990). A glossary for temporal databases is given in Jensen et al.
(1994). Snodgrass (1987) focuses on TQuel, an early temporal query language.
Temporal normalization is defined in Navathe and Ahmed (1989). Paton (1999)
and Paton and Diaz (1999) survey active databases. Chakravarthy et al. (1994)
describe SENTINEL and object-based active systems. Lee et al. (1998) discuss time
series management.
The book by Shekhar and Chawla (2003) covers all aspects of spatial databases,
including spatial data models, spatial storage and indexing, and spatial data mining.
Scholl et al. (2001) is another textbook on spatial data management. Albrecht
(1996) describes in detail the various GIS analysis operations. Clementini and Di
Felice (1993) give a detailed description of the spatial operators. Güting (1994)
describes the spatial data structures and querying languages for spatial database sys-
tems. Guttman (1984) proposed R-trees for spatial data indexing. Manolopoulos et
al. (2005) is a book on the theory and applications of R-trees. Papadias et al. (2003)
discuss query processing using R-trees for spatial networks. Ester et al. (2001) pro-
vide a comprehensive discussion on the algorithms and applications of spatial data
mining. Koperski and Han (1995) discuss association rule discovery from geo-
graphic databases. Brinkhoff et al. (1993) provide a comprehensive overview of the
usage of R-trees for efficient processing of spatial joins. Rotem (1991) describes spa-
tial join indexes comprehensively. Shekhar and Xiong (2008) is a compilation of
various sources that discuss different aspects of spatial database management sys-
tems and GIS. The density-based clustering algorithms DBSCAN and DENCLUE
are proposed by Ester et al. (1996) and Hinneburg and Gabriel (2007), respectively.
Multimedia database modeling has a vast amount of literature—it is difficult to
point to all important references here. IBM’s QBIC (Query By Image Content) sys-
tem described in Niblack et al. (1998) was one of the first comprehensive approaches
for querying images based on content. It is now available as a part of IBM’s DB2
database image extender. Zhao and Grosky (2002) discuss content-based image
retrieval. Carneiro and Vasconselos (2005) present a database-centric view of seman-
tic image annotation and retrieval. Content-based retrieval of subimages is discussed
by Luo and Nascimento (2004). Tuceryan and Jain (1998) discuss various aspects of
texture analysis. Object recognition using SIFT is discussed in Lowe (2004). Lazebnik
et al. (2004) describe the use of local affine regions to model 3D objects (RIFT).
Among other object recognition approaches, G-RIF is described in Kim et al. (2006),
Bay et al. (2006) discuss SURF, Ke and Sukthankar (2004) present PCA-SIFT, and
Mikolajczyk and Schmid (2005) describe GLOH. Fan et al. (2004) present a tech-
nique for automatic image annotation by using concept-sensitive objects. Fotouhi et
al. (2007) describe the first international workshop on the many faces of multimedia semantics, which continues annually. Thuraisingham (2001) classifies audio data into
different categories, and by treating each of these categories differently, elaborates on
the use of metadata for audio. Prabhakaran (1996) has also discussed how speech
processing techniques can add valuable metadata information to the audio piece.
The early developments of the logic and database approach are surveyed by Gallaire
et al. (1984). Reiter (1984) provides a reconstruction of relational database theory,
while Levesque (1984) provides a discussion of incomplete knowledge in light of
logic. Gallaire and Minker (1978) provide an early book on this topic. A detailed
treatment of logic and databases appears in Ullman (1989, Volume 2), and there is a
related chapter in Volume 1 (1988). Ceri, Gottlob, and Tanca (1990) present a com-
prehensive yet concise treatment of logic and databases. Das (1992) is a comprehen-
sive book on deductive databases and logic programming. The early history of
Datalog is covered in Maier and Warren (1988). Clocksin and Mellish (2003) is an
excellent reference on the Prolog language.
Aho and Ullman (1979) provide an early algorithm for dealing with recursive
queries, using the least fixed-point operator. Bancilhon and Ramakrishnan (1986)
give an excellent and detailed description of the approaches to recursive query pro-
cessing, with detailed examples of the naive and seminaive approaches. Excellent
survey articles on deductive databases and recursive query processing include
Warren (1992) and Ramakrishnan and Ullman (1995). A complete description of
the seminaive approach based on relational algebra is given in Bancilhon (1985).
Other approaches to recursive query processing include the recursive query/sub-
query strategy of Vieille (1986), which is a top-down interpreted strategy, and the
Henschen-Naqvi (1984) top-down compiled iterative strategy. Balbin and
Ramamohanrao (1987) discuss an extension of the seminaive differential approach
for multiple predicates.
The original paper on magic sets is by Bancilhon et al. (1986). Beeri and
Ramakrishnan (1987) extend it. Mumick et al. (1990a) show the applicability of
magic sets to nonrecursive nested SQL queries. Other approaches to optimizing rules
without rewriting them appear in Vieille (1986, 1987). Kifer and Lozinskii (1986)
propose a different technique. Bry (1990) discusses how the top-down and bottom-
up approaches can be reconciled. Whang and Navathe (1992) describe an extended
disjunctive normal form technique to deal with recursion in relational algebra
expressions for providing an expert system interface over a relational DBMS.
Chang (1981) describes an early system for combining deductive rules with rela-
tional databases. The LDL system prototype is described in Chimenti et al. (1990).
Krishnamurthy and Naqvi (1989) introduce the choice notion in LDL. Zaniolo
(1988) discusses the language issues for the LDL system. A language overview of
CORAL is provided in Ramakrishnan et al. (1992), and the implementation is
described in Ramakrishnan et al. (1993). An extension to support object-oriented
features, called CORAL++, is described in Srivastava et al. (1993). Ullman (1985)
provides the basis for the NAIL! system, which is described in Morris et al. (1987).
Phipps et al. (1991) describe the GLUE-NAIL! deductive database system.
Zaniolo (1990) reviews the theoretical background and the practical importance of
deductive databases. Nicolas (1997) gives an excellent history of the developments
leading up to Deductive Object-Oriented Database (DOOD) systems. Falcone et al.
(1997) survey the DOOD landscape. References on the VALIDITY system include
Friesen et al. (1995), Vieille (1998), and Dietrich et al. (1999).
Figure A.1
One possible database state for the COMPANY relational database schema (relations EMPLOYEE, DEPARTMENT, DEPT_LOCATIONS, WORKS_ON, PROJECT, and DEPENDENT).
Figure A.2
A UML class diagram (classes PERSON, EMPLOYEE, ALUMNUS, STUDENT, STAFF, FACULTY, STUDENT_ASSISTANT, RESEARCH_ASSISTANT, TEACHING_ASSISTANT, GRADUATE_STUDENT, UNDERGRADUATE_STUDENT, and DEGREE).
Introduction to Information Retrieval and Web Search1
Information retrieval deals mainly with unstructured data and with the techniques for indexing, searching, and
retrieving information from large collections of unstructured documents. In this
chapter we will provide an introduction to information retrieval. This is a very
broad topic, so we will focus on the similarities and differences between informa-
tion retrieval and database technologies, and on the indexing techniques that form
the basis of many information retrieval systems.
This chapter is organized as follows. In Section 1 we introduce information retrieval
(IR) concepts and discuss how IR differs from traditional databases. Section 2 is
devoted to a discussion of retrieval models, which form the basis for IR search.
Section 3 covers different types of queries in IR systems. Section 4 discusses text pre-
processing, and Section 5 provides an overview of IR indexing, which is at the heart
of any IR system. In Section 6 we describe the various evaluation metrics for IR sys-
tems performance. Section 7 details Web analysis and its relationship to informa-
tion retrieval, and Section 8 briefly introduces the current trends in IR. Section 9
summarizes the chapter. For a limited overview of IR, we suggest that students read
Sections 1 through 6.
1This chapter is coauthored with Saurav Sahay of the Georgia Institute of Technology.
From Chapter 27 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by Addison-
Wesley. All rights reserved.
1 Information Retrieval (IR) Concepts
Information retrieval is the process of retrieving documents from a collection in
response to a query (or a search request) by a user. This section provides an
overview of information retrieval (IR) concepts. In Section 1.1, we introduce infor-
mation retrieval in general and then discuss the different kinds and levels of search
that IR encompasses. In Section 1.2, we compare IR and database technologies.
Section 1.3 gives a brief history of IR. We then present the different modes of user
interaction with IR systems in Section 1.4. In Section 1.5, we describe the typical IR
process with a detailed set of tasks and then with a simplified process flow, and end
with a brief discussion of digital libraries and the Web.
1.1 Introduction to Information Retrieval
We first review the distinction between structured and unstructured data to see how
information retrieval differs from structured data management. Consider a relation
(or table) called HOUSES with the attributes:
HOUSES(Lot#, Address, Square_footage, Listed_price)
This is an example of structured data. We can compare this relation with home-
buying contract documents, which are examples of unstructured data. These types
of documents can vary from city to city, and even county to county, within a given
state in the United States. Typically, a contract document in a particular state will
have a standard list of clauses described in paragraphs within sections of the docu-
ment, with some predetermined (fixed) text and some variable areas whose content
is to be supplied by the specific buyer and seller. Other variable information would
include interest rate for financing, down-payment amount, closing dates, and so on.
The documents could also possibly include some pictures taken during a home
inspection. The information content in such documents can be considered
unstructured data that can be stored in a variety of possible arrangements and for-
mats. By unstructured information, we generally mean information that does not
have a well-defined formal model and corresponding formal language for represen-
tation and reasoning, but rather is based on understanding of natural language.
With the advent of the World Wide Web (or Web, for short), the volume of unstruc-
tured information stored in messages and documents that contain textual and mul-
timedia information has exploded. These documents are stored in a variety of
standard formats, including HTML, XML, and several audio and video formatting
standards. Information retrieval deals with the problems of storing, indexing, and
retrieving (searching) such information to satisfy the needs of users. The problems
that IR deals with are exacerbated by the fact that the number of Web pages and the
number of social interaction events is already in the billions, and is growing at a
phenomenal rate. All forms of unstructured data described above are being added at
the rates of millions per day, expanding the searchable space on the Web at rapidly
increasing rates.
Historically, information retrieval is “the discipline that deals with the structure,
analysis, organization, storage, searching, and retrieval of information” as defined
by Gerald Salton, an IR pioneer.2 We can enhance the definition slightly to say that
it applies in the context of unstructured documents to satisfy a user’s information
needs. This field has existed even longer than the database field, and was originally
concerned with retrieval of cataloged information in libraries based on titles,
authors, topics, and keywords. In academic programs, the field of IR has long been a
part of Library and Information Science programs. Information in the context of IR
does not require machine-understandable structures, such as in relational database
systems. Examples of such information include written texts, abstracts, documents,
books, Web pages, e-mails, instant messages, and collections from digital libraries.
Therefore, all loosely represented (unstructured) or semistructured information is
also part of the IR discipline.
RDBMS (relational database management system) vendors are providing modules
to support data types, including spatial, temporal, and multimedia data, as well as
XML data, in the newer versions of their products, sometimes referred to as
extended RDBMSs, or object-relational database management systems (ORDBMSs).
The challenge of dealing with unstructured data is largely an information retrieval
problem, although database researchers have been applying database indexing and
search techniques to some of these problems.
IR systems go beyond database systems in that they do not limit the user to a spe-
cific query language, nor do they expect the user to know the structure (schema) or
content of a particular database. IR systems use a user’s information need expressed
as a free-form search request (sometimes called a keyword search query, or just
query) for interpretation by the system. Whereas the IR field historically dealt with
cataloging, processing, and accessing text in the form of documents for decades, in
today’s world the use of Web search engines is becoming the dominant way to find
information. The traditional problems of text indexing and making collections of
documents searchable have been transformed by making the Web itself into a
quickly accessible repository of human knowledge.
An IR system can be characterized at different levels: by types of users, types of data,
and the types of the information need, along with the size and scale of the informa-
tion repository it addresses. Different IR systems are designed to address specific
problems that require a combination of different characteristics. These characteris-
tics can be briefly described as follows:
Types of Users. The user may be an expert user (for example, a curator or a
librarian), who is searching for specific information that is clear in his/her mind
and forms relevant queries for the task, or a layperson user with a generic infor-
mation need. The latter cannot create highly relevant queries for search (for
2See Salton’s 1968 book entitled Automatic Information Organization and Retrieval.
example, students trying to find information about a new topic, researchers try-
ing to assimilate different points of view about a historical issue, a scientist ver-
ifying a claim by another scientist, or a person trying to shop for clothing).
Types of Data. Search systems can be tailored to specific types of data. For
example, the problem of retrieving information about a specific topic may be
handled more efficiently by customized search systems that are built to collect
and retrieve only information related to that specific topic. The information
repository could be hierarchically organized based on a concept or topic hierar-
chy. These topical domain-specific or vertical IR systems are not as large as or as
diverse as the generic World Wide Web, which contains information on all
kinds of topics. Given that these domain-specific collections exist and may have
been acquired through a specific process, they can be exploited much more effi-
ciently by a specialized system.
Types of Information Need. In the context of Web search, users’ information
needs may be defined as navigational, informational, or transactional.3
Navigational search refers to finding a particular piece of information (such as
the Georgia Tech University Website) that a user needs quickly. The purpose of
informational search is to find current information about a topic (such as
research activities in the college of computing at Georgia Tech—this is the clas-
sic IR system task). The goal of transactional search is to reach a site where fur-
ther interaction happens (such as joining a social network, product shopping,
online reservations, accessing databases, and so on).
Levels of Scale. In the words of Nobel Laureate Herbert Simon,
What information consumes is rather obvious: it consumes the attention of its
recipients. Hence a wealth of information creates a poverty of attention, and a need
to allocate that attention efficiently among the overabundance of information
sources that might consume it. 4
This overabundance of information sources in effect creates a high noise-to-signal
ratio in IR systems. Especially on the Web, where billions of pages are indexed, IR
interfaces are built with efficient scalable algorithms for distributed searching,
indexing, caching, merging, and fault tolerance. IR search engines can be limited in
level to more specific collections of documents. Enterprise search systems offer IR
solutions for searching different entities in an enterprise’s intranet, which consists
of the network of computers within that enterprise. The searchable entities include
e-mails, corporate documents, manuals, charts, and presentations, as well as reports
related to people, meetings, and projects. They still typically deal with hundreds of
millions of entities in large global enterprises. On a smaller scale, there are personal
information systems such as those on desktops and laptops, called desktop search
engines (for example, Google Desktop), for retrieving files, folders, and different
kinds of entities stored on the computer. There are peer-to-peer systems, such as
3See Broder (2002) for details.
4From Simon (1971), “Designing Organizations for an Information-Rich World.”
BitTorrent, which allows sharing of music in the form of audio files, as well as spe-
cialized search engines for audio, such as Lycos and Yahoo! audio search.
1.2 Databases and IR Systems: A Comparison
Within the computer science discipline, databases and IR systems are closely related
fields. Databases deal with structured information retrieval through well-defined
formal languages for representation and manipulation based on the theoretically
founded data models. Efficient algorithms have been developed for operators that
allow rapid execution of complex queries. IR, on the other hand, deals with unstruc-
tured search with possibly vague query or search semantics and without a well-
defined logical schematic representation. Some of the key differences between
databases and IR systems are listed in Table 1.
Whereas databases have fixed schemas defined in some data model such as the rela-
tional model, an IR system has no fixed data model; it views data or documents
according to some scheme, such as the vector space model, to aid in query process-
ing (see Section 2). Databases using the relational model employ SQL for queries
and transactions. The queries are mapped into relational algebra operations and
search algorithms and return a new relation (table) as the query result, providing an
exact answer to the query for the current state of the database. In IR systems, there
is no fixed language for defining the structure (schema) of the document or for
operating on the document—queries tend to be a set of query terms (keywords) or
a free-form natural language phrase. An IR query result is a list of document ids, or
some pieces of text or multimedia objects (images, videos, and so on), or a list of
links to Web pages.
The result of a database query is an exact answer; if no matching records (tuples) are
found in the relation, the result is empty (null). On the other hand, the answer to a
user request in an IR query represents the IR system’s best attempt at retrieving the information most relevant to that query.

Table 1  A Comparison of Databases and IR Systems

Databases:
■ Structured data
■ Schema driven
■ Relational (or object, hierarchical, and network) model is predominant
■ Structured query model
■ Rich metadata operations
■ Query returns data
■ Results are based on exact matching (always correct)

IR Systems:
■ Unstructured data
■ No fixed schema; various data models (e.g., vector space model)
■ Free-form query models
■ Rich data operations
■ Search request returns list or pointers to documents
■ Results are based on approximate matching and measures of effectiveness (may be imprecise and ranked)

Whereas database systems maintain a large
amount of metadata and allow their use in query optimization, the operations in IR
systems rely on the data values themselves and their occurrence frequencies.
Complex statistical analysis is sometimes performed to determine the relevance of
each document or parts of a document to the user request.
1.3 A Brief History of IR
Information retrieval has been a common task since the times of ancient civiliza-
tions, which devised ways to organize, store, and catalog documents and records.
Media such as papyrus scrolls and stone tablets were used to record documented
information in ancient times. These efforts allowed knowledge to be retained and
transferred among generations. With the emergence of public libraries and the
printing press, large-scale methods for producing, collecting, archiving, and distrib-
uting documents and books evolved. As computers and automatic storage systems
emerged, the need to apply these methods to computerized systems arose. Several
techniques emerged in the 1950s, such as the seminal work of H. P. Luhn,5 who pro-
posed using words and their frequency counts as indexing units for documents, and
using measures of word overlap between queries and documents as the retrieval cri-
terion. It was soon realized that storing large amounts of text was not difficult. The
harder task was to search for and retrieve that information selectively for users with
specific information needs. Methods that explored word distribution statistics gave
rise to the choice of keywords based on their distribution properties6 and keyword-
based weighting schemes.
The earlier experiments with document retrieval systems such as SMART7 in the
1960s adopted the inverted file organization based on keywords and their weights as
the method of indexing (see Section 5). Serial (or sequential) organization proved
inadequate if queries required fast, near real-time response times. Proper organiza-
tion of these files became an important area of study; document classification and
clustering schemes ensued. The scale of retrieval experiments remained a challenge
due to lack of availability of large text collections. This soon changed with the World
Wide Web. Also, the Text Retrieval Conference (TREC) was launched by NIST
(National Institute of Standards and Technology) in 1992 as a part of the TIPSTER
program8 with the goal of providing a platform for evaluating information retrieval
methodologies and facilitating technology transfer to develop IR products.
A search engine is a practical application of information retrieval to large-scale
document collections. With significant advances in computers and communica-
tions technologies, people today have interactive access to enormous amounts of
user-generated distributed content on the Web. This has spurred the rapid growth
5See Luhn (1957) “A statistical approach to mechanized encoding and searching of literary information.”
6See Salton, Yang, and Yu (1975).
7For details, see Buckley et al. (1993).
8For details, see Harman (1992).
in search engine technology, where search engines are trying to discover different
kinds of real-time content found on the Web. The part of a search engine responsi-
ble for discovering, analyzing, and indexing these new documents is known as a
crawler. Other types of search engines exist for specific domains of knowledge. For
example, the biomedical literature search database was started in the 1970s and is
now supported by the PubMed search engine,9 which gives access to over 20 million
abstracts.
While continuous progress is being made to tailor search results to the needs of an
end user, the challenge remains in providing high-quality, pertinent, and timely
information that is precisely aligned to the information needs of individual users.
1.4 Modes of Interaction in IR Systems
In the beginning of Section 1, we defined information retrieval as the process of
retrieving documents from a collection in response to a query (or a search request)
by a user. Typically the collection is made up of documents containing unstructured
data. Other kinds of documents include images, audio recordings, video clips, and
maps. Data may be scattered nonuniformly in these documents with no definitive
structure. A query is a set of terms (also referred to as keywords) used by the
searcher to specify an information need (for example, the terms ‘databases’ and
‘operating systems’ may be regarded as a query to a computer science bibliographic
database). An informational request or a search query may also be a natural lan-
guage phrase or a question (for example, “What is the currency of China?” or “Find
Italian restaurants in Sarasota, Florida.”).
There are two main modes of interaction with IR systems—retrieval and brows-
ing—which, although similar in goal, are accomplished through different interac-
tion tasks. Retrieval is concerned with the extraction of relevant information from
a repository of documents through an IR query, while browsing signifies the activ-
ity of a user visiting or navigating through similar or related documents based on
the user’s assessment of relevance. During browsing, a user’s information need may
not be defined a priori and is flexible. Consider the following browsing scenario: A
user specifies ‘Atlanta’ as a keyword. The information retrieval system retrieves links
to relevant result documents containing various aspects of Atlanta for the user. The
user comes across the term ‘Georgia Tech’ in one of the returned documents, and
uses some access technique (such as clicking on the phrase ‘Georgia Tech’ in a docu-
ment, which has a built-in link) and visits documents about Georgia Tech in the
same or a different Website (repository). There the user finds an entry for ‘Athletics’
that leads the user to information about various athletic programs at Georgia Tech.
Eventually, the user ends his search at the Fall schedule for the Yellow Jackets foot-
ball team, which he finds to be of great interest. This user activity is known as
browsing. Hyperlinks are used to interconnect Web pages and are mainly used for
browsing. Anchor texts are text phrases within documents used to label hyperlinks
and are very relevant to browsing.
9See www.ncbi.nlm.nih.gov/pubmed/.
Web search combines both aspects—browsing and retrieval—and is one of the
main applications of information retrieval today. Web pages are analogous to docu-
ments. Web search engines maintain an indexed repository of Web pages, usually
using the technique of inverted indexing (see Section 5). They retrieve the most rel-
evant Web pages for the user in response to the user’s search request with a possible
ranking in descending order of relevance. The rank of a Webpage in a retrieved set
is the measure of its relevance to the query that generated the result set.
1.5 Generic IR Pipeline
As we mentioned earlier, documents are made up of unstructured natural language
text composed of character strings from English and other languages. Common
examples of documents include newswire services (such as AP or Reuters), corpo-
rate manuals and reports, government notices, Web page articles, blogs, tweets,
books, and journal papers. There are two main approaches to IR: statistical and
semantic.
In a statistical approach, documents are analyzed and broken down into chunks of text (words, phrases, or n-grams, that is, subsequences of n consecutive characters in a text or document) and each word or phrase is counted, weighted, and measured
for relevance or importance. These words and their properties are then compared
with the query terms for potential degree of match to produce a ranked list of
resulting documents that contain the words. Statistical approaches are further clas-
sified based on the method employed. The three main statistical approaches are
Boolean, vector space, and probabilistic (see Section 2).
Semantic approaches to IR use knowledge-based techniques of retrieval that
broadly rely on the syntactic, lexical, sentential, discourse-based, and pragmatic lev-
els of knowledge understanding. In practice, semantic approaches also apply some
form of statistical analysis to improve the retrieval process.
Figure 1 shows the various stages involved in an IR processing system. The steps
shown on the left in Figure 1 are typically offline processes, which prepare a set of
documents for efficient retrieval; these are document preprocessing, document
modeling, and indexing. The steps involved in query formation, query processing,
searching mechanism, document retrieval, and relevance feedback are shown on the
right in Figure 1. In each box, we highlight the important concepts and issues. The
rest of this chapter describes some of the concepts involved in the various tasks
within the IR process shown in Figure 1.
Figure 1
Generic IR framework. Offline steps (left): document preprocessing, modeling, and indexing. Online steps (right): query formation, query processing, searching mechanism, document retrieval, and relevance feedback, with dashed lines indicating the next iteration.

Figure 2 shows a simplified IR processing pipeline. In order to perform retrieval on documents, the documents are first represented in a form suitable for retrieval. The significant terms and their properties are extracted from the documents and are represented in a document index where the words/terms and their properties are stored in a matrix that contains these terms and the references to the documents that contain them. This index is then converted into an inverted index (see Figure 4) of a word/term vs. document matrix. Given the query words, the documents containing these words—and the document properties, such as date of creation, author,
and type of document—are fetched from the inverted index and compared with the
query. This comparison results in a ranked list shown to the user. The user can then
provide feedback on the results that triggers implicit or explicit query expansion to
fetch results that are more relevant for the user. Most IR systems allow for an inter-
active search where the query and the results are successively refined.
2 Retrieval Models
In this section we briefly describe the important models of IR. These are the three
main statistical models—Boolean, vector space, and probabilistic—and the seman-
tic model.
Figure 2
Simplified IR process pipeline: terms are extracted from the document corpus and processed into an inverted index; the user’s query is processed and compared against the index; matching documents are fetched and ranked as results; and user feedback flows back into the query.
2.1 Boolean Model
In this model, documents are represented as a set of terms. Queries are formulated
as a combination of terms using the standard Boolean logic set-theoretic operators
such as AND, OR and NOT. Retrieval and relevance are considered as binary concepts
in this model, so the retrieved elements are an “exact match” retrieval of relevant
documents. There is no notion of ranking of resulting documents. All retrieved
documents are considered equally important—a major simplification that does not
consider frequencies of document terms or their proximity to other terms com-
pared against the query terms.
Boolean retrieval models lack sophisticated ranking algorithms and are among the
earliest and simplest information retrieval models. These models make it easy to
associate metadata information and write queries that match the contents of the
documents as well as other properties of documents, such as date of creation,
author, and type of document.
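To make the set-theoretic view concrete, here is a minimal Python sketch (an illustration, not part of the original text); the toy postings map, the document ids, and the sample queries are all assumed.

# A minimal sketch of Boolean retrieval: each term maps to the set of
# documents that contain it, and queries are evaluated with set algebra.

postings = {                       # assumed toy term -> document-id index
    "database": {1, 2, 4},
    "index":    {1, 3, 4},
    "search":   {2, 3},
}
all_docs = {1, 2, 3, 4}

def term_docs(term):
    """Return the documents containing the term (empty set if unseen)."""
    return postings.get(term, set())

print(term_docs("database") & term_docs("index"))   # database AND index -> {1, 4}
print(term_docs("database") | term_docs("search"))  # database OR search -> {1, 2, 3, 4}
print(all_docs - term_docs("search"))               # NOT search         -> {1, 4}

Note that every retrieved document is an exact, unranked match, which is exactly the simplification described above.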
2.2 Vector Space Model
The vector space model provides a framework in which term weighting, ranking of
retrieved documents, and relevance feedback are possible. Documents are repre-
sented as features and weights of term features in an n-dimensional vector space of
terms. Features are a subset of the terms in a set of documents that are deemed most
relevant to an IR search for this particular set of documents. The process of select-
ing these important terms (features) and their properties as a sparse (limited) list
out of the very large number of available terms (the vocabulary can contain hun-
dreds of thousands of terms) is independent of the model specification. The query
is also specified as a terms vector (vector of features), and this is compared to the
document vectors for similarity/relevance assessment.
The similarity assessment function that compares two vectors is not inherent to the
model—different similarity functions can be used. However, the cosine of the angle
between the query and document vector is a commonly used function for similarity
assessment. As the angle between the vectors decreases, the cosine of the angle
approaches one, meaning that the similarity of the query with a document vector
increases. Terms (features) are weighted proportional to their frequency counts to
reflect the importance of terms in the calculation of relevance measure. This is dif-
ferent from the Boolean model, which does not take into account the frequency of
words in the document for relevance match.
In the vector model, the document term weight wij (for term i in document j) is repre-
sented based on some variation of the TF (term frequency) or TF-IDF (term
frequency-inverse document frequency) scheme (as we will describe below). TF-IDF
is a statistical weight measure that is used to evaluate the importance of a document
word in a collection of documents. The following formula is typically used:

\[
\text{cosine}(d_j, q) = \frac{d_j \cdot q}{\|d_j\| \times \|q\|} = \frac{\sum_{i=1}^{|V|} w_{ij} \times w_{iq}}{\sqrt{\sum_{i=1}^{|V|} w_{ij}^2} \times \sqrt{\sum_{i=1}^{|V|} w_{iq}^2}}
\]

In the formula given above, we use the following symbols:
■ dj is the document vector.
■ q is the query vector.
■ wij is the weight of term i in document j.
■ wiq is the weight of term i in query vector q.
■ |V| is the number of dimensions in the vector, that is, the total number of important keywords (or features).

TF-IDF uses the product of normalized frequency of a term i (TFij) in document Dj and the inverse document frequency of the term i (IDFi) to weight a term in a
document. The idea is that terms that capture the essence of a document occur fre-
quently in the document (that is, their TF is high), but if such a term were to be a
good term that discriminates the document from others, it must occur in only a few
documents in the general population (that is, its IDF should be high as well).
IDF values can be easily computed for a fixed collection of documents. In case of
Web search engines, taking a representative sample of documents approximates IDF
computation. The following formulas can be used:

\[
TF_{ij} = \frac{f_{ij}}{\sum_{i=1}^{|V|} f_{ij}}, \qquad IDF_i = \log\left(\frac{N}{n_i}\right)
\]

In these formulas, the meaning of the symbols is:
■ TFij is the normalized term frequency of term i in document Dj.
■ fij is the number of occurrences of term i in document Dj.
■ IDFi is the inverse document frequency weight for term i.
■ N is the number of documents in the collection.
■ ni is the number of documents in which term i occurs.

Note that if a term i occurs in all documents, then ni = N and hence IDFi = log(1) becomes zero, nullifying its importance and creating a situation where division by zero can occur. The weight of term i in document j, wij, is computed based on its TF-IDF value in some techniques. To prevent division by zero, it is common to add a 1 to the denominator in formulas such as the cosine formula above.

Sometimes, the relevance of the document with respect to a query (rel(Dj, Q)) is directly measured as the sum of the TF-IDF values of the terms in the query Q:

\[
rel(D_j, Q) = \sum_{i \in Q} TF_{ij} \times IDF_i
\]

The normalization factor (similar to the denominator of the cosine formula) is incorporated into the TF-IDF formula itself, thereby measuring relevance of a document to the query by the computation of the dot product of the query and document vectors.

The Rocchio10 algorithm is a well-known relevance feedback algorithm based on the vector space model that modifies the initial query vector and its weights in response to user-identified relevant documents. It expands the original query vector q to a new vector qe as follows:

\[
q_e = \alpha\, q + \frac{\beta}{|D_r|} \sum_{d_r \in D_r} d_r \; - \; \frac{\gamma}{|D_{ir}|} \sum_{d_{ir} \in D_{ir}} d_{ir}
\]
10See Rocchio (1971).
Here, Dr and Dir are relevant and nonrelevant document sets and α, β, and γ are
parameters of the equation. The values of these parameters determine how the feed-
back affects the original query, and these may be determined after a number of trial-
and-error experiments.
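The following Python sketch ties the TF-IDF, cosine, and Rocchio formulas above together on a toy collection. It is illustrative only: the three documents, the query, and the α, β, γ values are assumptions, and a real system would apply the preprocessing and indexing techniques of Sections 4 and 5 first.

import math
from collections import Counter

docs = {                                          # assumed toy collection
    1: "information retrieval with an inverted index".split(),
    2: "database systems use an index for fast retrieval".split(),
    3: "stock market index and financial information".split(),
}
N = len(docs)
vocab = sorted({t for words in docs.values() for t in words})
df = {t: sum(t in words for words in docs.values()) for t in vocab}

def tf_idf_vector(words):
    """Weight each vocabulary term by normalized TF times log(N / n_i)."""
    counts = Counter(w for w in words if w in vocab)
    total = sum(counts.values()) or 1
    return [(counts[t] / total) * math.log(N / df[t]) for t in vocab]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0            # guard against a zero denominator

doc_vecs = {d: tf_idf_vector(words) for d, words in docs.items()}
query = tf_idf_vector("financial index information".split())
print(sorted(doc_vecs, key=lambda d: cosine(query, doc_vecs[d]), reverse=True))
# -> [3, 1, 2]: document 3 shares the most discriminating query terms

def rocchio(q, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query vector toward relevant and away from nonrelevant documents."""
    expanded = []
    for i in range(len(q)):
        pos = sum(doc_vecs[d][i] for d in relevant) / len(relevant) if relevant else 0.0
        neg = sum(doc_vecs[d][i] for d in nonrelevant) / len(nonrelevant) if nonrelevant else 0.0
        expanded.append(alpha * q[i] + beta * pos - gamma * neg)
    return expanded

expanded_query = rocchio(query, relevant=[3], nonrelevant=[2])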
2.3 Probabilistic Model
The similarity measures in the vector space model are somewhat ad hoc. For example, the model assumes that documents closer to the query vector in cosine space are more relevant to the query. The probabilistic model takes a more concrete and definitive approach: it ranks documents by their estimated probability of relevance with respect to the query. This is the basis of the Probability Ranking Principle developed by Robertson.11
In the probabilistic framework, the IR system has to decide whether the documents
belong to the relevant set or the nonrelevant set for a query. To make this decision,
it is assumed that a predefined relevant set and nonrelevant set exist for the query,
and the task is to calculate the probability that the document belongs to the relevant
set and compare that with the probability that the document belongs to the nonrel-
evant set.
Given the document representation D of a document, estimating the relevance R
and nonrelevance NR of that document involves computation of conditional prob-
ability P(R|D) and P(NR|D). These conditional probabilities can be calculated using
Bayes’ Rule:12
P(R|D) = P(D|R) × P(R)/P(D)
P(NR|D) = P(D|NR) × P(NR)/P(D)
A document D is classified as relevant if P(R|D) > P(NR|D). Discarding the constant
P(D), this is equivalent to saying that a document is relevant if:
P(D|R) × P(R) > P(D|NR) × P(NR)
The likelihood ratio P(D|R)/P(D|NR) is used as a score to determine the likelihood
of the document with representation D belonging to the relevant set.
The term independence or Naïve Bayes assumption is used to estimate P(D|R) using
computation of P(ti|R) for term ti. The likelihood ratios P(D|R)/P(D|NR) of docu-
ments are used as a proxy for ranking based on the assumption that highly ranked
documents will have a high likelihood of belonging to the relevant set.13
11For a description of the Cheshire II system, see Robertson (1997).
12Bayes’ theorem is a standard technique for measuring likelihood; see Howson and Urbach (1993), for
example.
13Readers should refer to Croft et al. (2009) pages 246–247 for a detailed description.
With some reasonable assumptions and estimates, and with extensions for incorporating query term weights and document term weights, the probabilistic model yields a popular ranking algorithm called BM25 (Best Match 25). This weighting scheme has evolved from several versions of the Okapi14 system.
The Okapi weight for Document dj and query q is computed by the formula below.
Additional notations are as follows:
■ ti is a term.
■ fij is the raw frequency count of term ti in document dj.
■ fiq is the raw frequency count of term ti in query q.
■ N is the total number of documents in the collection.
■ dfi is the number of documents that contain the term ti.
■ dlj is the document length (in bytes) of dj.
■ avdl is the average document length of the collection.
The Okapi relevance score of a document dj for a query q is given by the equation below, where k1 (between 1.0 and 2.0), b (usually 0.75), and k2 (between 1 and 1000) are parameters:

\[
\text{okapi}(d_j, q) = \sum_{t_i \in q,\, d_j} \ln\!\left(\frac{N - df_i + 0.5}{df_i + 0.5}\right) \times \frac{(k_1 + 1)\, f_{ij}}{k_1\!\left((1 - b) + b\,\frac{dl_j}{avdl}\right) + f_{ij}} \times \frac{(k_2 + 1)\, f_{iq}}{k_2 + f_{iq}}
\]
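As a concrete reading of this equation, here is a small Python sketch (not an official BM25 implementation); the frequency tables, document length, and parameter values in the example call are assumed, with defaults chosen inside the ranges quoted above.

import math

def bm25_score(query_terms, doc_id, f, fq, df, N, dl, avdl,
               k1=1.2, b=0.75, k2=100.0):
    """Okapi BM25 score of one document for a query.

    f[(term, doc_id)]: raw frequency of the term in the document
    fq[term]:          raw frequency of the term in the query
    df[term]:          number of documents containing the term
    N:                 number of documents in the collection
    dl[doc_id]:        document length; avdl: average document length
    """
    score = 0.0
    for t in query_terms:
        if (t, doc_id) not in f or t not in df:
            continue                                   # term absent: contributes nothing
        idf_part = math.log((N - df[t] + 0.5) / (df[t] + 0.5))
        f_ij = f[(t, doc_id)]
        doc_part = ((k1 + 1) * f_ij) / (k1 * ((1 - b) + b * dl[doc_id] / avdl) + f_ij)
        query_part = ((k2 + 1) * fq[t]) / (k2 + fq[t])
        score += idf_part * doc_part * query_part
    return score

# Tiny assumed statistics, just to exercise the formula:
f = {("index", 1): 3, ("market", 1): 1}
fq = {"index": 1, "market": 1}
df = {"index": 40, "market": 10}
print(bm25_score(["index", "market"], 1, f, fq, df, N=100, dl={1: 90}, avdl=120.0))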
2.4 Semantic Model
However sophisticated the above statistical models become, they can miss many rel-
evant documents because those models do not capture the complete meaning or
information need conveyed by a user’s query. In semantic models, the process of
matching documents to a given query is based on concept level and semantic
matching instead of index term (keyword) matching. This allows retrieval of rele-
vant documents that share meaningful associations with other documents in the
query result, even when these associations are not inherently observed or statisti-
cally captured.
Semantic approaches include different levels of analysis, such as morphological,
syntactic, and semantic analysis, to retrieve documents more effectively. In
morphological analysis, roots and affixes are analyzed to determine the parts of
speech (nouns, verbs, adjectives, and so on) of the words. Following morphological
analysis, syntactic analysis follows to parse and analyze complete phrases in docu-
ments. Finally, the semantic methods have to resolve word ambiguities and/or gen-
erate relevant synonyms based on the semantic relationships between levels of
structural entities in documents (words, paragraphs, pages, or entire documents).
14City University of London Okapi System by Robertson, Walker, and Hancock-Beaulieu (1995).
The development of a sophisticated semantic system requires complex knowledge
bases of semantic information as well as retrieval heuristics. These systems often
require techniques from artificial intelligence and expert systems. Knowledge bases
like Cyc15 and WordNet16 have been developed for use in knowledge-based IR sys-
tems based on semantic models. The Cyc knowledge base, for example, is a representation of a vast quantity of commonsense knowledge, with assertions (over 2.5 million facts and rules) interrelating more than 155,000 concepts, for reasoning about the objects and events of everyday life. WordNet is an extensive thesaurus (over 115,000 concepts) that is very popular, is used by many systems, and is under continuous development (see Section 4.3).
3 Types of Queries in IR Systems
Different keywords are associated with the document set during the process of
indexing. These keywords generally consist of words, phrases, and other characteri-
zations of documents such as date created, author names, and type of document.
They are used by an IR system to build an inverted index (see Section 5), which is
then consulted during the search. The queries formulated by users are compared to
the set of index keywords. Most IR systems also allow the use of Boolean and other
operators to build a complex query. The query language with these operators
enriches the expressiveness of a user’s information need.
3.1 Keyword Queries
Keyword-based queries are the simplest and most commonly used forms of IR
queries: the user just enters keyword combinations to retrieve documents. The
query keyword terms are implicitly connected by a logical AND operator. A query
such as ‘database concepts’ retrieves documents that contain both the words ‘database’ and ‘concepts’ and ranks them at the top of the retrieved results. In addition, most systems also
retrieve documents that contain only ‘database’ or only ‘concepts’ in their text. Some
systems remove most commonly occurring words (such as a, the, of, and so on,
called stopwords) as a preprocessing step before sending the filtered query key-
words to the IR engine. Most IR systems do not pay attention to the ordering of
these words in the query. All retrieval models provide support for keyword queries.
3.2 Boolean Queries
Some IR systems allow using the AND, OR, NOT, ( ), + , and – Boolean operators in
combinations of keyword formulations. AND requires that both terms be found.
OR lets either term be found. NOT means any record containing the second term
will be excluded. ‘( )’ means the Boolean operators can be nested using parentheses.
‘+’ is equivalent to AND, requiring the term; the ‘+’ should be placed directly in front
15See Lenat (1995).
16See Miller (1990) for a detailed description of WordNet.
of the search term. ‘–’ is equivalent to AND NOT and means to exclude the term; the
‘–’ should be placed directly in front of the search term not wanted. Complex
Boolean queries can be built out of these operators and their combinations, and
they are evaluated according to the classical rules of Boolean algebra. No ranking is
possible, because a document either satisfies such a query (is “relevant”) or does not
satisfy it (is “nonrelevant”). A document is retrieved for a Boolean query if the
query is logically true as an exact match in the document. Users generally do not use
combinations of these complex Boolean operators, and IR systems support a
restricted version of these set operators. Boolean retrieval models can directly sup-
port different Boolean operator implementations for these kinds of queries.
3.3 Phrase Queries
When documents are represented using an inverted keyword index for searching,
the relative order of the terms in the document is lost. In order to perform exact
phrase retrieval, these phrases should be encoded in the inverted index or imple-
mented differently (with relative positions of word occurrences in documents). A
phrase query consists of a sequence of words that makes up a phrase. The phrase is
generally enclosed within double quotes. Each retrieved document must contain at
least one instance of the exact phrase. Phrase searching is a more restricted and spe-
cific version of proximity searching that we mention below. For example, a phrase
searching query could be ‘conceptual database design’. If phrases are indexed by the
retrieval model, any retrieval model can be used for these query types. A phrase the-
saurus may also be used in semantic models for fast dictionary searching for
phrases.
3.4 Proximity Queries
Proximity search refers to a search that accounts for how close within a record mul-
tiple terms should be to each other. The most commonly used proximity search
option is a phrase search that requires terms to be in the exact order. Other proxim-
ity operators can specify how close terms should be to each other. Some will also
specify the order of the search terms. Each search engine can define proximity oper-
ators differently, and the search engines use various operator names such as NEAR,
ADJ(adjacent), or AFTER. In some cases, a sequence of single words is given,
together with a maximum allowed distance between them. Vector space models that
also maintain information about positions and offsets of tokens (words) have
robust implementations for this query type. However, providing support for com-
plex proximity operators becomes computationally expensive because it requires
the time-consuming preprocessing of documents, and is thus suitable for smaller
document collections rather than for the Web.
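The sketch below (illustrative Python over an assumed toy positional index) shows how stored word positions support the exact-phrase and proximity checks described in Sections 3.3 and 3.4.

# Positional postings: term -> {doc_id: sorted word positions}  (assumed toy data)
positions = {
    "conceptual": {1: [4], 2: [10]},
    "database":   {1: [5, 12], 2: [2]},
    "design":     {1: [6], 2: [15]},
}

def phrase_match(terms):
    """Documents containing the terms as an exact consecutive phrase."""
    docs = set.intersection(*(set(positions[t]) for t in terms))
    hits = set()
    for d in docs:
        for start in positions[terms[0]][d]:
            if all(start + i in positions[t][d] for i, t in enumerate(terms)):
                hits.add(d)
                break
    return hits

def near_match(t1, t2, max_dist):
    """Documents where t1 and t2 occur within max_dist words of each other (any order)."""
    docs = set(positions[t1]) & set(positions[t2])
    return {d for d in docs
            if any(abs(p1 - p2) <= max_dist
                   for p1 in positions[t1][d] for p2 in positions[t2][d])}

print(phrase_match(["conceptual", "database", "design"]))   # {1}
print(near_match("database", "design", max_dist=3))         # {1}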
3.5 Wildcard Queries
Wildcard searching is generally meant to support regular expressions and pattern
matching-based searching in text. In IR systems, certain kinds of wildcard search
support may be implemented—usually words with any trailing characters (for
example, ‘data*’ would retrieve data, database, datapoint, dataset, and so on).
Providing support for wildcard searches in IR systems involves preprocessing over-
head and is not considered worth the cost by many Web search engines today.
Retrieval models do not directly provide support for this query type.
3.6 Natural Language Queries
There are a few natural language search engines that aim to understand the struc-
ture and meaning of queries written in natural language text, generally as a question
or narrative. This is an active area of research that employs techniques like shallow
semantic parsing of text, or query reformulations based on natural language under-
standing. The system tries to formulate answers for such queries from retrieved
results. Some search systems are starting to provide natural language interfaces to
provide answers to specific types of questions, such as definition and factoid ques-
tions, which ask for definitions of technical terms or common facts that can be
retrieved from specialized databases. Such questions are usually easier to answer
because there are strong linguistic patterns giving clues to specific types of sen-
tences—for example, ‘defined as’ or ‘refers to’. Semantic models can provide support
for this query type.
4 Text Preprocessing
In this section we review the commonly used text preprocessing techniques that are
part of the text processing task in Figure 1.
4.1 Stopword Removal
Stopwords are very commonly used words in a language that play a major role in
the formation of a sentence but which seldom contribute to the meaning of that
sentence. Words that are expected to occur in 80 percent or more of the documents in a collection are typically referred to as stopwords, and they are of little use for retrieval. Because of the commonness and function of these words, they do not
contribute much to the relevance of a document for a query search. Examples
include words such as the, of, to, a, and, in, said, for, that, was, on, he, is, with, at, by,
and it. These words are presented here with decreasing frequency of occurrence
from a large corpus of documents called AP89.17 The first six of these words account
for 20 percent of all words in the listing, and the most frequent 50 words account for
40 percent of all text.
Removal of stopwords from a document must be performed before indexing.
Articles, prepositions, conjunctions, and some pronouns are generally classified as
stopwords. Queries must also be preprocessed for stopword removal before the
actual retrieval process. Removal of stopwords results in elimination of possible
spurious indexes, thereby reducing the size of an index structure by about 40
17For details, see Croft et al. (2009), pages 75–90.
percent or more. However, doing so could impact the recall if the stopword is an
integral part of a query (for example, a search for the phrase ‘To be or not to be,’
where removal of stopwords makes the query inappropriate, as all the words in the
phrase are stopwords). Many search engines do not employ query stopword
removal for this reason.
4.2 Stemming
A stem of a word is defined as the word obtained after trimming the suffix and pre-
fix of an original word. For example, ‘comput’ is the stem word for computer, com-
puting, and computation. These suffixes and prefixes are very common in the
English language for supporting the notion of verbs, tenses, and plural forms.
Stemming reduces the different forms of the word formed by inflection (due to plu-
rals or tenses) and derivation to a common stem.
A stemming algorithm can be applied to reduce any word to its stem. In English, the
most famous stemming algorithm is Martin Porter’s stemming algorithm. The Porter stemmer18 is a simplified version of Lovins’s technique that uses a reduced set of about 60 rules (down from the 260 suffix patterns in Lovins’s technique) and organizes them into sets; conflicts within one subset of rules are resolved before going on to
the next. Using stemming for preprocessing data results in a decrease in the size of
the indexing structure and an increase in recall, possibly at the cost of precision.
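A toy illustration of these two preprocessing steps is sketched below; the stopword list is a small assumed subset and the suffix stripper is deliberately naive, standing in for a real Porter or Lovins-style stemmer.

# A toy preprocessing pass: lowercase, drop a small assumed stopword list,
# and apply a deliberately naive suffix stripper.

STOPWORDS = {"the", "of", "to", "a", "and", "in", "for", "that", "is", "on"}
SUFFIXES = ("ations", "ation", "ing", "ers", "er", "ed", "es", "s")

def naive_stem(word):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= 3:
            return word[: -len(suf)]
    return word

def preprocess(text):
    tokens = [t.lower().strip(".,;:!?") for t in text.split()]
    return [naive_stem(t) for t in tokens if t and t not in STOPWORDS]

print(preprocess("Computing the computation of indexes for searching documents"))
# -> ['comput', 'comput', 'index', 'search', 'document']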
4.3 Utilizing a Thesaurus
A thesaurus comprises a precompiled list of important concepts and the main word
that describes each concept for a particular domain of knowledge. For each concept
in this list, a set of synonyms and related words is also compiled.19 Thus, a synonym
can be converted to its matching concept during preprocessing. This preprocessing
step assists in providing a standard vocabulary for indexing and searching. Usage of
a thesaurus, also known as a collection of synonyms, has a substantial impact on the
recall of information systems. This process can be complicated because many words
have different meanings in different contexts.
UMLS20 is a large biomedical thesaurus of millions of concepts (called the
Metathesaurus) and a semantic network of meta concepts and relationships that
organize the Metathesaurus (see Figure 3). The concepts are assigned labels from
the semantic network. This thesaurus of concepts contains synonyms of medical
terms, hierarchies of broader and narrower terms, and other relationships among
words and concepts that make it a very extensive resource for information retrieval
of documents in the medical domain. Figure 3 illustrates part of the UMLS
Semantic Network.
18See Porter (1980).
19See Baeza-Yates and Ribeiro-Neto (1999).
20Unified Medical Language System from the National Library of Medicine.
Figure 3
A portion of the UMLS Semantic Network: the “Biologic Function” hierarchy, which branches into physiologic functions (organism function, organ or tissue function, cell function, molecular function, genetic function, and mental process) and pathologic functions (disease or syndrome, cell or molecular dysfunction, mental or behavioral dysfunction, experimental model of disease, and neoplastic process).
Source: UMLS Reference Manual, National Library of Medicine.
WordNet21 is a manually constructed thesaurus that groups words into strict syn-
onym sets called synsets. These synsets are divided into noun, verb, adjective, and
adverb categories. Within each category, these synsets are linked together by appro-
priate relationships such as class/subclass or “is-a” relationships for nouns.
WordNet is based on the idea of using a controlled vocabulary for indexing, thereby
eliminating redundancies. It is also useful in providing assistance to users with
locating terms for proper query formulation.
4.4 Other Preprocessing Steps: Digits, Hyphens, Punctuation
Marks, Cases
Digits, dates, phone numbers, e-mail addresses, URLs, and other standard types of
text may or may not be removed during preprocessing. Web search engines, however, index them in order to use this type of information in the document
21See Fellbaum (1998) for a detailed description of WordNet.
metadata to improve precision and recall (see Section 6 for detailed definitions of
precision and recall).
Hyphens and punctuation marks may be handled in different ways. Either the entire
phrase with the hyphens/punctuation marks may be used, or they may be elimi-
nated. In some systems, the character representing the hyphen/punctuation mark
may be removed, or may be replaced with a space. Different information retrieval
systems follow different rules of processing. Handling hyphens automatically can be
complex: it can either be done as a classification problem, or more commonly by
some heuristic rules.
Most information retrieval systems perform case-insensitive search, converting all
the letters of the text to uppercase or lowercase. It is also worth noting that many of
these text preprocessing steps are language specific, such as involving accents and
diacritics and the idiosyncrasies that are associated with a particular language.
4.5 Information Extraction
Information extraction (IE) is a generic term used for extracting structured con-
tent from text. Text analytic tasks such as identifying noun phrases, facts, events,
people, places, and relationships are examples of IE tasks. These tasks are also called
named entity recognition tasks and use rule-based approaches with either a the-
saurus, regular expressions and grammars, or probabilistic approaches. For IR and
search applications, IE technologies are mostly used to identify contextually rele-
vant features that involve text analysis, matching, and categorization for improving
the relevance of search systems. Language technologies using part-of-speech tagging
are applied to semantically annotate the documents with extracted features to aid
search relevance.
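As a tiny illustration of the rule-based flavor of IE, the sketch below uses regular expressions; the patterns and the sample sentence are simplistic assumptions, not a production extractor.

import re

# Extremely simplified rule-based extraction: regular expressions standing in
# for the thesaurus, grammar, or probabilistic extractors mentioned above.
PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "date":  re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "money": re.compile(r"\$\d[\d,]*(?:\.\d{2})?"),
}

def extract_entities(text):
    """Return {entity_type: [matches]} for the toy patterns above."""
    return {name: rx.findall(text) for name, rx in PATTERNS.items()}

sample = "Contact sales@example.com by 2011-03-15 about the $30,000 quote."
print(extract_entities(sample))
# {'email': ['sales@example.com'], 'date': ['2011-03-15'], 'money': ['$30,000']}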
5 Inverted Indexing
The simplest way to search for occurrences of query terms in text collections can be
performed by sequentially scanning the text. This kind of online searching is only
appropriate when text collections are quite small. Most information retrieval sys-
tems process the text collections to create indexes and operate upon the inverted
index data structure (refer to the indexing task in Figure 1). An inverted index struc-
ture comprises vocabulary and document information. Vocabulary is a set of dis-
tinct query terms in the document set. Each term in a vocabulary set has an
associated collection of information about the documents that contain the term,
such as document id, occurrence count, and offsets within the document where the
term occurs. The simplest form of vocabulary terms consists of words or individual
tokens of the documents. In some cases, these vocabulary terms also consist of
phrases, n-grams, entities, links, names, dates, or manually assigned descriptor
terms from documents and/or Web pages. For each term in the vocabulary, the cor-
responding document ids, occurrence locations of the term in each document,
number of occurrences of the term in each document, and other relevant informa-
tion may be stored in the document information section.
Weights are assigned to document terms to represent an estimate of the usefulness
of the given term as a descriptor for distinguishing the given document from other
documents in the same collection. A term may be a better descriptor of one docu-
ment than of another by the weighting process (see Section 2).
An inverted index of a document collection is a data structure that associates distinct terms with a list of all documents that contain the term. The process of inverted index construction involves the extraction and processing steps shown in Figure 2. Acquired text is first preprocessed and the documents are represented with the vocabulary terms. Document statistics are collected in document lookup tables.
Statistics generally include counts of vocabulary terms in individual documents as
well as different collections, their positions of occurrence within the documents,
and the lengths of the documents. The vocabulary terms are weighted at indexing
time according to different criteria for collections. For example, in some cases terms
in the titles of the documents may be weighted more heavily than terms that occur
in other parts of the documents.
One of the most popular weighting schemes is the TF-IDF (term frequency-inverse
document frequency) metric that we described in Section 2. For a given term this
weighting scheme distinguishes to some extent the documents in which the term
occurs more often from those in which the term occurs very little or never. These
weights are normalized to account for varying document lengths, further ensuring
that longer documents with proportionately more occurrences of a word are not
favored for retrieval over shorter documents with proportionately fewer occur-
rences. These processed document-term streams (matrices) are then inverted into
term-document streams (matrices) for further IR steps.
Figure 4 shows an illustration of term-document-position vectors for four illustrative terms—example, inverted, index, and market—listing the documents in which each term occurs and its positions within those documents.

Figure 4
Example of an inverted index, built over three small documents:
Document 1: "This example shows an example of an inverted index."
Document 2: "Inverted index is a data structure for associating terms to documents."
Document 3: "Stock market index is used for capturing the sentiments of the financial market."

ID   Term       Document: position
1.   example    1:2, 1:5
2.   inverted   1:8, 2:1
3.   index      1:9, 2:2, 3:3
4.   market     3:2, 3:13
The different steps involved in inverted index construction can be summarized as
follows:
1. Break the documents into vocabulary terms by tokenizing, cleansing,
stopword removal, stemming, and/or use of an additional thesaurus as
vocabulary.
2. Collect document statistics and store the statistics in a document lookup
table.
3. Invert the document-term stream into a term-document stream along with
additional information such as term frequencies, term positions, and term
weights.
Searching for relevant documents from the inverted index, given a set of query
terms, is generally a three-step process.
1. Vocabulary search. If the query comprises multiple terms, they are sepa-
rated and treated as independent terms. Each term is searched in the vocab-
ulary. Various data structures, like variations of B+-tree or hashing, may be
used to optimize the search process. Query terms may also be ordered in lex-
icographic order to improve space efficiency.
2. Document information retrieval. The document information for each term
is retrieved.
3. Manipulation of retrieved information. The document information vector
for each term obtained in step 2 is now processed further to incorporate var-
ious forms of query logic. Various kinds of queries like prefix, range, context,
and proximity queries are processed in this step to construct the final result
based on the document collections returned in step 2.
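The sketch below (illustrative Python, not the chapter’s own code) walks through the three construction steps and a simplified version of the three search steps over the Figure 4 documents; tokenization is reduced to lowercasing and punctuation stripping, with no stopword removal or stemming.

from collections import defaultdict

# The three example documents from Figure 4.
docs = {
    1: "This example shows an example of an inverted index.",
    2: "Inverted index is a data structure for associating terms to documents.",
    3: "Stock market index is used for capturing the sentiments of the financial market.",
}

# Construction steps 1-3: tokenize, collect statistics, and invert into a
# term -> {doc_id: [positions]} structure.
inverted = defaultdict(lambda: defaultdict(list))
doc_length = {}
for doc_id, text in docs.items():
    tokens = [t.strip(".,").lower() for t in text.split()]
    doc_length[doc_id] = len(tokens)
    for pos, tok in enumerate(tokens, start=1):
        inverted[tok][doc_id].append(pos)

print(dict(inverted["index"]))    # {1: [9], 2: [2], 3: [3]}  (matches Figure 4)
print(dict(inverted["market"]))   # {3: [2, 13]}

# Searching: look up each query term in the vocabulary, retrieve its document
# information, and combine (here simply ranking documents by total term frequency).
def search(query):
    scores = defaultdict(int)
    for term in query.lower().split():
        for doc_id, positions in inverted.get(term, {}).items():
            scores[doc_id] += len(positions)
    return sorted(scores, key=scores.get, reverse=True)

print(search("inverted index"))   # [1, 2, 3] — documents 1 and 2 contain both terms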
6 Evaluation Measures
of Search Relevance
Without proper evaluation techniques, one cannot compare and measure the rele-
vance of different retrieval models and IR systems in order to make improvements.
Evaluation techniques of IR systems measure the topical relevance and user
relevance. Topical relevance measures the extent to which the topic of a result
matches the topic of the query. Mapping one’s information need with “perfect”
queries is a cognitive task, and many users are not able to effectively form queries
that would retrieve results more suited to their information need. Also, since a
major chunk of user queries are informational in nature, there is no fixed set of
right answers to show to the user. User relevance is a term used to describe the
“goodness” of a retrieved result with regard to the user’s information need. User rel-
evance includes other implicit factors, such as user perception, context, timeliness,
the user’s environment, and current task needs. Evaluating user relevance may also
involve subjective analysis and study of user retrieval tasks to capture some of the
properties of implicit factors involved in accounting for users’ bias for judging
performance.
In Web information retrieval, no binary classification decision is made on whether a
document is relevant or nonrelevant to a query (whereas the Boolean (or binary)
retrieval model uses this scheme, as we discussed in Section 2.1). Instead, a ranking
of the documents is produced for the user. Therefore, some evaluation measures
focus on comparing different rankings produced by IR systems. We discuss some of
these measures next.
6.1 Recall and Precision
Recall and precision metrics are based on the binary relevance assumption (whether
each document is relevant or nonrelevant to the query). Recall is defined as the
number of relevant documents retrieved by a search divided by the total number of
existing relevant documents. Precision is defined as the number of relevant docu-
ments retrieved by a search divided by the total number of documents retrieved by
that search. Figure 5 is a pictorial representation of the terms retrieved vs. relevant
and shows how search results relate to four different sets of documents.
Figure 5
Retrieved vs. relevant search results.

                   Relevant: Yes        Relevant: No
Retrieved: Yes     Hits (TP)            False Alarms (FP)
Retrieved: No      Misses (FN)          Correct Rejections (TN)
Table 2  Precision and Recall for Ranked Retrieval

Doc. No.   Rank Position i   Relevant   Precision(i)   Recall(i)
10         1                 Yes        1/1 = 100%     1/10 = 10%
2          2                 Yes        2/2 = 100%     2/10 = 20%
3          3                 Yes        3/3 = 100%     3/10 = 30%
5          4                 No         3/4 = 75%      3/10 = 30%
17         5                 No         3/5 = 60%      3/10 = 30%
34         6                 No         3/6 = 50%      3/10 = 30%
215        7                 Yes        4/7 = 57.1%    4/10 = 40%
33         8                 Yes        5/8 = 62.5%    5/10 = 50%
45         9                 No         5/9 = 55.5%    5/10 = 50%
16         10                Yes        6/10 = 60%     6/10 = 60%
The notation for Figure 5 is as follows:
■ TP: true positive
■ FP: false positive
■ FN: false negative
■ TN: true negative
The terms true positive, false positive, false negative, and true negative are generally
used in any type of classification tasks to compare the given classification of an item
with the desired correct classification. Using the term hits for the documents that
truly or “correctly” match the user request, we can define:
Recall = |Hits|/|Relevant|
Precision = |Hits|/|Retrieved|
Recall and precision can also be defined in a ranked retrieval setting. Let di^q denote the retrieved document at rank position i for query q. The recall at rank position i, denoted by r(i), is the fraction of all relevant documents for the query that appear among d1^q to di^q in the result set. Let Si be the set of relevant documents from d1^q to di^q, with cardinality |Si|, and let |Dq| be the total number of relevant documents for the query (so |Si| ≤ |Dq|). Then:

Recall r(i) = |Si| / |Dq|

The precision at rank position i, denoted by p(i), is the fraction of the documents d1^q to di^q in the result set that are relevant:

Precision p(i) = |Si| / i
Table 2 illustrates the p(i), r(i), and average precision (discussed in the next
section) metrics. It can be seen that recall can be increased by presenting more
results to the user, but this approach runs the risk of decreasing the precision. In the
example, the number of relevant documents for some query = 10. The rank posi-
tion and the relevance of an individual document are shown. The precision and
recall value can be computed at each position within the ranked list as shown in the
last two columns.
6.2 Average Precision
Average precision is computed based on the precision at each relevant document in the ranking. This measure is useful for computing a single precision value to compare different retrieval algorithms on a query q:

\[
P_{avg} = \frac{\sum_{d_i^q \in D_q} p(i)}{|D_q|}
\]

Consider the sample precision values of relevant documents in Table 2. The average precision (Pavg) for the example in Table 2 is (P(1) + P(2) + P(3) + P(7) + P(8) + P(10))/6 = 79.93 percent (only the relevant documents are considered in this calculation). Many good algorithms tend to have high top-k average precision for small values of k, with correspondingly low values of recall.
6.3 Recall/Precision Curve
A recall/precision curve can be drawn based on the recall and precision values at
each rank position, where the x-axis is the recall and the y-axis is the precision.
Instead of using the precision and recall at each rank position, the curve is com-
monly plotted using recall levels r(i) at 0 percent, 10 percent, 20 percent…100 per-
cent. The curve usually has a negative slope, reflecting the inverse relationship
between precision and recall.
6.4 F-Score
F-score (F) is the harmonic mean of the precision (p) and recall (r) values. High precision is achieved almost always at the expense of recall and vice versa. It is a matter of the application’s context whether to tune the system for high precision or high recall. F-score is a single measure that combines precision and recall to compare different result sets:

\[
F = \frac{2}{\frac{1}{p} + \frac{1}{r}} = \frac{2pr}{p + r}
\]

One of the properties of harmonic mean is that the harmonic mean of two numbers tends to be closer to the smaller of the two. Thus F is automatically biased toward the smaller of the precision and recall values. Therefore, for a high F-score, both precision and recall must be high.
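The following Python sketch recomputes the measures of this section for the ranked list of Table 2 (the relevance flags and the total of ten relevant documents are taken from that table); it is an illustration, not a standard evaluation package.

# Recall/precision at each rank, average precision, and F-score for the
# ranked result of Table 2 (ten relevant documents exist for the query).
relevant_at_rank = [True, True, True, False, False, False, True, True, False, True]
total_relevant = 10

hits = 0
precisions_at_relevant = []
for i, rel in enumerate(relevant_at_rank, start=1):
    if rel:
        hits += 1
    p_i = hits / i
    r_i = hits / total_relevant
    if rel:
        precisions_at_relevant.append(p_i)
    print(f"rank {i:2d}: precision {p_i:.3f}  recall {r_i:.2f}")

avg_precision = sum(precisions_at_relevant) / len(precisions_at_relevant)
print(f"average precision = {avg_precision:.4f}")   # about 0.799 (the 79.93% of Section 6.2)

p, r = precisions_at_relevant[-1], hits / total_relevant   # precision/recall at rank 10
f_score = 2 * p * r / (p + r)
print(f"F-score at rank 10 = {f_score:.3f}")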
7 Web Search and Analysis22
The emergence of the Web has brought millions of users to search for information,
which is stored in a very large number of active sites. To make this information acces-
sible, search engines such as Google and Yahoo! have to crawl and index these sites
and document collections in their index databases. Moreover, search engines have to
regularly update their indexes given the dynamic nature of the Web as new Web sites
are created and current ones are updated or deleted. Since there are many millions of
pages available on the Web on different topics, search engines have to apply many
sophisticated techniques such as link analysis to identify the importance of pages.
There are other types of search engines besides the ones that regularly crawl the Web and create automatic indexes: human-powered search engines, vertical search engines, and metasearch engines. Human-powered search engines are developed with the help of computer-assisted systems that aid curators with the process of assigning indexes. They consist of manually created, specialized Web directories that are hierarchically organized indexes to guide user navigation to different resources on the Web. Vertical search
engines are customized topic-specific search engines that crawl and index a specific
collection of documents on the Web and provide search results from that specific
collection. Metasearch engines are built on top of search engines: they query differ-
ent search engines simultaneously and aggregate and provide search results from
these sources.
Another source of searchable Web documents is digital libraries. Digital libraries
can be broadly defined as collections of electronic resources and services for the
delivery of materials in a variety of formats. These collections may include a univer-
sity’s library catalog, catalogs from a group of participating universities as in the
State of Florida University System, or a compilation of multiple external resources
on the World Wide Web such as Google Scholar or the IEEE/ACM index. These
interfaces provide universal access to different types of content—such as books,
articles, audio, and video—situated in different database systems and remote repos-
itories. Similar to real libraries, these digital collections are maintained via a catalog
and organized in categories for online reference. Digital libraries “include personal,
distributed, and centralized collections such as online public access catalogs
(OPACs) and bibliographic databases, distributed document databases, scholarly
and professional discussion lists and electronic journals, other online databases,
forums, and bulletin boards.” 23
7.1 Web Analysis and Its Relationship
to Information Retrieval
In addition to browsing and searching the Web, another important activity closely
related to information retrieval is to analyze or mine information on the Web for
22The contributions of Pranesh P. Ranganathan and Hari P. Kumar to this section are appreciated.
23Covi and Kling (1996), page 672.
new information of interest. Application of data analysis techniques for discovery
and analysis of useful information from the Web is known as Web analysis. Over
the past few years the World Wide Web has emerged as an important repository of
information for many day-to-day applications for individual consumers, as well as a
significant platform for e-commerce and for social networking. These properties
make it an interesting target for data analysis applications. The Web mining and
analysis field is an integration of a wide range of fields spanning information
retrieval, text analysis, natural language processing, data mining, machine learning,
and statistical analysis.
The goals of Web analysis are to improve and personalize search results relevance
and to identify trends that may be of value to various businesses and organizations.
We elaborate on these goals next.
■ Finding relevant information. People usually search for specific informa-
tion on the Web by entering keywords in a search engine or browsing infor-
mation portals and using services. Search services are constrained by search
relevance problems since they have to map and approximate the information
need of millions of users as an a priori task. Low precision (see Section 6)
ensues due to results that are nonrelevant to the user. In the case of the Web,
high recall (see section 6) is impossible to determine due to the inability to
index all the pages on the Web. Also, measuring recall does not make sense
since the user is concerned with only the top few documents. The most rele-
vant feedback for the user is typically from only the top few results.
■ Personalization of the information. Different people have different content and presentation preferences. By collecting personal information and then generating user-specific dynamic Web pages, a system can personalize the pages for the user. The customization tools used in various Web-based applications
and services, such as click-through monitoring, eyeball tracking, explicit or
implicit user profile learning, and dynamic service composition using Web
APIs, are used for service adaptation and personalization. A personalization
engine typically has algorithms that make use of the user’s personalization
information—collected by various tools—to generate user-specific search
results.
■ Finding information of commercial value. This problem deals with finding
interesting patterns in users’ interests, behaviors, and their use of products
and services, which may be of commercial value. For example, businesses in
the automobile, clothing, shoe, and cosmetics industries may improve their
services by identifying patterns such as usage trends and user prefer-
ences using various Web analysis techniques.
Based on the above goals, we can classify Web analysis into three categories: Web
content analysis, which deals with extracting useful information/knowledge from
Web page contents; Web structure analysis, which discovers knowledge from
hyperlinks representing the structure of the Web; and Web usage analysis, which
mines user access patterns from usage logs that record the activity of every user.
7.2 Searching the Web
The World Wide Web is a huge corpus of information, but locating resources that
are both high quality and relevant to the needs of the user is very difficult. The set of
Web pages taken as a whole has almost no unifying structure, with variability in
authoring style and content, thereby making it more difficult to precisely locate
needed information. Index-based search engines have been one of the prime tools
by which users search for information on the Web. Web search engines crawl the
Web and create an index to the Web for searching purposes. When a user specifies
his need for information by supplying keywords, these Web search engines query
their repository of indexes and produce links or URLs with abbreviated content as
search results. There may be thousands of pages relevant to a particular query; the
challenge is to return only the few most relevant of them to the user.
The discussion we had about querying and relevance-based ranking in IR systems in
Sections 2 and 3 is applicable to Web search engines. These ranking algorithms
explore the link structure of the Web.
Web pages, unlike standard text collections, contain connections to other Web pages
or documents (via the use of hyperlinks), allowing users to browse from page to
page. A hyperlink has two components: a destination page and an anchor text
describing the link. For example, a person can link to the Yahoo! Website on his Web
page with anchor text such as “My favorite Website.” Anchor texts can be thought of
as being implicit endorsements. They provide very important latent human annota-
tion. A person linking to other Web pages from his Web page is assumed to have
some relation to those Web pages. Web search engines aim to distill results per their
relevance and authority. There are many redundant hyperlinks, like the links to the
homepage on every Web page of the Web site. Such hyperlinks must be eliminated
from the search results by the search engines.
A hub is a Web page or a Website that links to a collection of prominent sites
(authorities) on a common topic. A good authority is a page that is pointed to by
many good hubs, while a good hub is a page that points to many good authorities.
These ideas are used by the HITS ranking algorithm, which is described in Section
7.3. It is often found that authoritative pages are not very self-descriptive, and
authorities on broad topics seldom link directly to one another. These properties of
hyperlinks are being actively used to improve Web search engine result ranking and
organize the results as hubs and authorities. We briefly discuss a couple of ranking
algorithms below.
7.3 Analyzing the Link Structure of Web Pages
The goal of Web structure analysis is to generate a structural summary of
Websites and Web pages. It focuses on the inner structure of documents and deals
with the link structure using hyperlinks at the interdocument level. The structure
and content of Web pages are often combined for information retrieval by Web
search engines. Given a collection of interconnected Web documents, interesting
and informative facts describing their connectivity in the Web subset can be discov-
ered. Web structure analysis is also used to reveal the structure of Web pages, which
helps with navigation and makes it possible to compare/integrate Web page
schemes. This aspect of Web structure analysis facilitates Web document classifica-
tion and clustering on the basis of structure.
The PageRank Ranking Algorithm. As discussed earlier, ranking algorithms are
used to order search results based on relevance and authority. Google uses the well-
known PageRank algorithm,24 which is based on the “importance” of each page.
Every Web page has a number of forward links (out-edges) and backlinks (in-
edges). It is very difficult to determine all the backlinks of a Web page, while it is rel-
atively straightforward to determine its forward links. According to the PageRank
algorithm, highly linked pages are more important (have greater authority) than
pages with fewer links. However, not all backlinks are important. A backlink to a
page from a credible source is more important than a link from some arbitrary
page. Thus a page has a high rank if the sum of the ranks of its backlinks is high.
PageRank was an attempt to see how good an approximation to the “importance” of
a page can be obtained from the link structure.
The computation of page rank follows an iterative approach. The PageRank of a Web
page is calculated from the PageRanks of all the pages that link to it (its backlinks), as
shown in the formula below. PageRank treats the
Web like a Markov model. An imaginary Web surfer visits an infinite string of pages
by clicking randomly. The PageRank of a page is an estimate of how often the surfer
winds up at a particular page. PageRank is a measure of query-independent impor-
tance of a page/node. For example, let P(X) be the PageRank of any page X and C(X)
be the number of outgoing links from page X, and let d be the damping factor in the
range 0 < d < 1. Usually d is set to 0.85. Then PageRank for a page A can be calcu-
lated as:
P(A) = (1 – d) + d (P(T1)/C(T1) + ... + P(Tn)/C(Tn))
Here T1, T2, ..., Tn are the pages that point to page A (that is, they are citations to
page A). PageRank forms a probability distribution over Web pages, so the sum of all
Web pages’ PageRanks is one.
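To make the iteration concrete, the following is a minimal Python sketch of the computation described above. The tiny link graph is a made-up example; a production implementation would also handle dangling pages and test for convergence instead of running a fixed number of iterations.

def pagerank(graph, d=0.85, iterations=50):
    """graph maps each page to the list of pages it links to (its forward links)."""
    pages = list(graph)
    rank = {p: 1.0 for p in pages}
    # backlinks of p = pages that point to p
    backlinks = {p: [q for q in pages if p in graph[q]] for p in pages}
    for _ in range(iterations):
        # P(A) = (1 - d) + d * (P(T1)/C(T1) + ... + P(Tn)/C(Tn))
        rank = {p: (1 - d) + d * sum(rank[t] / len(graph[t]) for t in backlinks[p])
                for p in pages}
    return rank

links = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A']}   # hypothetical Web subgraph
print(pagerank(links))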
The HITS Ranking Algorithm. The HITS25 algorithm proposed by Jon
Kleinberg is another type of ranking algorithm exploiting the link structure of the
Web. The algorithm presumes that a good hub is a document that points to many
good authorities, and a good authority is a document that is pointed at by many
good hubs. The algorithm contains two main steps: a sampling component and a weight-
propagation component. The sampling component constructs a focused collection
S of pages with the following properties:
1. S is relatively small.
2. S is rich in relevant pages.
3. S contains most (or a majority) of the strongest authorities.
24The PageRank algorithm was proposed by Lawrence Page (1998) and Sergey Brin, founders of
Google. For more information, see http://en.wikipedia.org/wiki/PageRank.
25See Kleinberg (1999).
The weight-propagation component iteratively calculates the hub and authority values for each
document as follows:
1. Initialize hub and authority values for all pages in S by setting them to 1.
2. While (hub and authority values do not converge):
a. For each page in S, calculate authority value = Sum of hub values of all
pages pointing to the current page.
b. For each page in S, calculate hub value = Sum of authority values of all
pages pointed at by the current page.
c. Normalize hub and authority values such that sum of all hub values in S
equals 1 and the sum of all authority values in S equals 1.
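The following is a minimal Python sketch of this weight-propagation loop; the small subgraph S is invented for illustration, and a fixed number of iterations stands in for a real convergence test.

def hits(graph, iterations=50):
    """graph maps each page in S to the list of pages it points to."""
    pages = list(graph)
    hub = {p: 1.0 for p in pages}                       # step 1: initialize to 1
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):                         # step 2
        # (a) authority value = sum of hub values of pages pointing to the page
        auth = {p: sum(hub[q] for q in pages if p in graph[q]) for p in pages}
        # (b) hub value = sum of authority values of pages pointed at by the page
        hub = {p: sum(auth[q] for q in graph[p]) for p in pages}
        # (c) normalize so hub values sum to 1 and authority values sum to 1
        a_total, h_total = sum(auth.values()), sum(hub.values())
        auth = {p: v / a_total for p, v in auth.items()}
        hub = {p: v / h_total for p, v in hub.items()}
    return hub, auth

S = {'A': ['B', 'C'], 'B': ['C'], 'C': ['A'], 'D': ['C']}   # hypothetical subgraph
hubs, authorities = hits(S)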
7.4 Web Content Analysis
As mentioned earlier, Web content analysis refers to the process of discovering use-
ful information from Web content/data/documents. The Web content data consists
of unstructured data such as free text from electronically stored documents, semi-
structured data typically found as HTML documents with embedded image data,
and more structured data such as tabular data, and pages in HTML, XML, or other
markup languages generated as output from databases. More generally, the term
Web content refers to any real data in the Web page that is intended for the user
accessing that page. This usually consists of but is not limited to text and graphics.
We will first discuss some preliminary Web content analysis tasks and then look at
the traditional analysis tasks of Web page classification and clustering.
Structured Data Extraction. Structured data on the Web is often very important
as it represents essential information, such as a structured table showing the airline
flight schedule between two cities. There are several approaches to structured data
extraction. One includes writing a wrapper, or a program that looks for different
structural characteristics of the information on the page and extracts the right con-
tent. Another approach is to manually write an extraction program for each Website
based on observed format patterns of the site, which is very labor intensive and time
consuming. It does not scale to a large number of sites. A third approach is wrapper
induction or wrapper learning, where the user first manually labels a set of training
pages, and the learning system generates rules—based on the labeled pages—that
are applied to extract target items from other Web pages. A fourth
approach is the automatic approach, which aims to find patterns/grammars from
the Web pages and then uses wrapper generation to produce a wrapper to extract
data automatically.
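As a concrete illustration of the wrapper idea, the following minimal Python sketch extracts rows from a hypothetical flight-schedule page whose HTML format is assumed; real wrappers must cope with far messier and more variable markup.

import re

# Assumed row format: <tr><td>FLIGHT</td><td>DEPART</td><td>ARRIVE</td></tr>
ROW = re.compile(r"<tr><td>(\w+)</td><td>([\d:]+)</td><td>([\d:]+)</td></tr>")

def extract_flights(html):
    """Return (flight, departure, arrival) tuples found in the page."""
    return ROW.findall(html)

page = "<tr><td>AA100</td><td>08:15</td><td>11:40</td></tr>"
print(extract_flights(page))   # [('AA100', '08:15', '11:40')]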
Web Information Integration. The Web is immense and has millions of docu-
ments, authored by many different persons and organizations. Because of this, Web
pages that contain similar information may have different syntax and different
words that describe the same concepts. This creates the need for integrating
information from diverse Web pages. Two popular approaches for Web information
integration are:
1. Web query interface integration, to enable querying multiple Web data-
bases that are not visible in external interfaces and are hidden in the “deep
Web.” The deep Web26 consists of those pages that do not exist until they are
created dynamically as the result of a specific database search, which pro-
duces some of the information in the page. Since traditional search engine
crawlers cannot probe and collect information from such pages, the deep
Web has heretofore been hidden from crawlers.
2. Schema matching, such as integrating directories and catalogs to come up
with a global schema for applications. An example of such an application
would be to combine a personal health record of an individual by matching
and collecting data from various sources dynamically by cross-linking health
records from multiple systems.
These approaches remain an area of active research and a detailed discussion of
them is beyond the scope of this book. Consult the Selected Bibliography at the end
of this chapter for further details.
Ontology-Based Information Integration. This task involves using ontologies
to effectively combine information from multiple heterogeneous sources.
Ontologies—formal models of representation with explicitly defined concepts and
named relationships linking them—are used to address the issues of semantic het-
erogeneity in data sources. Different classes of approaches are used for information
integration using ontologies.
■ Single ontology approaches use one global ontology that provides a shared
vocabulary for the specification of the semantics. They work if all informa-
tion sources to be integrated provide nearly the same view on a domain of
knowledge. For example, UMLS (described in Section 4.3) can serve as a
common ontology for biomedical applications.
■ In a multiple ontology approach, each information source is described by
its own ontology. In principle, the “source ontology” can be a combination of
several other ontologies but it cannot be assumed that the different “source
ontologies” share the same vocabulary. Dealing with multiple, partially over-
lapping, and potentially conflicting ontologies is a very difficult problem
faced by many applications, including those in bioinformatics and other
complex areas of knowledge.
■ Hybrid ontology approaches are similar to multiple ontology approaches:
the semantics of each source is described by its own ontology. But in order to
make the source ontologies comparable to each other, they are built upon
one global shared vocabulary. The shared vocabulary contains basic terms
(the primitives) of a domain of knowledge. Because each term of a source
26The deep Web as defined by Bergman (2001).
ontology is based on the primitives, the terms become more easily compara-
ble than in multiple ontology approaches. The advantage of a hybrid
approach is that new sources can be easily added without the need to modify
the mappings or the shared vocabulary. In multiple and hybrid approaches,
several research issues, such as ontology mapping, alignment, and merging,
need to be addressed.
Building Concept Hierarchies. One common way of organizing search results is
via a linear ranked list of documents. But for some users and applications, a better
way to display results would be to create groupings of related documents in the
search result. One way of organizing documents in a search result, and for organiz-
ing information in general, is by creating a concept hierarchy. The documents in a
search result are organized into groups in a hierarchical fashion. Other related
techniques for organizing documents are classification and clustering. Clustering
creates groups of documents, where the documents in each group share many com-
mon concepts.
Segmenting Web Pages and Detecting Noise. There are many superfluous
parts in a Web document, such as advertisements and navigation panels. The infor-
mation and text in these superfluous parts should be eliminated as noise before
classifying the documents based on their content. Hence, before applying classifica-
tion or clustering algorithms to a set of documents, the areas or blocks of the docu-
ments that contain noise should be removed.
7.5 Approaches to Web Content Analysis
The two main approaches to Web content analysis are (1) agent based (IR view) and
(2) database based (DB view).
The agent-based approach involves the development of sophisticated artificial
intelligence systems that can act autonomously or semi-autonomously on behalf of
a particular user, to discover and process Web-based information. Generally, the
agent-based Web analysis systems can be placed into the following three categories:
■ Intelligent Web agents are software agents that search for relevant informa-
tion using characteristics of a particular application domain (and possibly a
user profile) to organize and interpret the discovered information. For
example, an intelligent agent might retrieve product information from a
variety of vendor sites using only general information about the product
domain.
■ Information Filtering/Categorization is another technique that utilizes
Web agents for categorizing Web documents. These Web agents use methods
from information retrieval and semantic information based on the links
among documents to organize them into a concept hierarchy.
■ Personalized Web agents are another type of Web agent that utilizes the per-
sonal preferences of users to organize search results, or to discover informa-
tion and documents that could be of value for a particular user. User
preferences could be learned from previous user choices, or from other indi-
viduals who are considered to have similar preferences to the user.
The database-based approach aims to infer the structure of the Website or to trans-
form a Web site to organize it as a database so that better information management
and querying on the Web become possible. This approach of Web content analysis
primarily tries to model the data on the Web and integrate it so that more sophisti-
cated queries than keyword-based search can be performed. This could be
achieved by finding the schema of Web documents or by building a Web document
warehouse, a Web knowledge base, or a virtual database. The database-based approach
may use a model such as the Object Exchange Model (OEM)27 that represents semi-
structured data by a labeled graph. The data in the OEM is viewed as a graph, with
objects as the vertices and labels on the edges. Each object is identified by an object
identifier and a value that is either atomic—such as integer, string, GIF image, or
HTML document—or complex in the form of a set of object references.
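The following is a minimal Python sketch of such an OEM-style labeled object graph; the object identifiers, labels, and values are illustrative only.

from dataclasses import dataclass, field

@dataclass
class OEMObject:
    oid: str
    value: object = None                       # atomic value (integer, string, image, ...)
    edges: dict = field(default_factory=dict)  # label -> list of referenced object ids

store = {
    "o1": OEMObject("o1", edges={"title": ["o2"], "author": ["o3"]}),  # complex object
    "o2": OEMObject("o2", value="Web Data Management"),                # atomic objects
    "o3": OEMObject("o3", value="J. Doe"),
}

# Follow a labeled edge from the root object to an atomic value
title_oid = store["o1"].edges["title"][0]
print(store[title_oid].value)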
The main focus of the database-based approach has been on the use of multilevel
databases and Web query systems. A multilevel database at its lowest level is a data-
base containing primitive semistructured information stored in various Web repos-
itories, such as hypertext documents. At the higher levels, metadata or
generalizations are extracted from lower levels and organized in structured collec-
tions such as relational or object-oriented databases. In a Web query system, infor-
mation about the content and structure of Web documents is extracted and
organized using database-like techniques. Query languages similar to SQL can then
be used to search and query Web documents. They combine structural queries,
based on the organization of hypertext documents, and content-based queries.
7.6 Web Usage Analysis
Web usage analysis is the application of data analysis techniques to discover usage
patterns from Web data, in order to understand and better serve the needs of Web-
based applications. This activity does not directly contribute to information
retrieval, but it is important for improving and enhancing the users’ search experience.
Web usage data describes the pattern of usage of Web pages, such as IP addresses,
page references, and the date and time of accesses for a user, user group, or an appli-
cation. Web usage analysis typically consists of three main phases: preprocessing,
pattern discovery, and pattern analysis.
1. Preprocessing. Preprocessing converts the information collected about
usage statistics and patterns into a form that can be utilized by the pattern
discovery methods. We use the term “page view” to refer to pages viewed or
visited by a user. There are several different types of preprocessing tech-
niques available:
■ Usage preprocessing analyzes the available collected data about usage pat-
terns of users, applications, and groups of users. Because this data is often
incomplete, the process is difficult. Data cleaning techniques are necessary to
27See Kosala and Blockeel (2000).
eliminate the impact of irrelevant items in the analysis result. Frequently,
usage data is identified by an IP address and consists of clickstreams collected
at the server. Better data is available if a usage tracking process is installed at
the client side.
■ Content preprocessing is the process of converting text, images, scripts, and
other content into a form that can be used by the usage analysis. Often, this
consists of performing content analysis such as classification or clustering.
The clustering or classification techniques can group usage information for
similar types of Web pages, so that usage patterns can be discovered for spe-
cific classes of Web pages that describe particular topics. Page views can also
be classified according to their intended use, such as for sales or for discovery
or for other uses.
■ Structure preprocessing can be done by parsing and reformatting the
information about hyperlinks and structure
between viewed pages. One difficulty is that the site structure may be
dynamic and may have to be constructed for each server session.
2. Pattern Discovery
The techniques that are used in pattern discovery are based on methods
from the fields of statistics, machine learning, pattern recognition, data
analysis, data mining, and other similar areas. These techniques are adapted
so that they take into consideration the specific knowledge and characteristics of
Web analysis. For example, in association rule discovery, the notion of a
transaction for market-basket analysis considers the items to be unordered.
But the order of accessing of Web pages is important, and so it should be
considered in Web usage analysis. Hence, pattern discovery involves mining
sequences of page views. In general, using Web usage data, the following
types of data mining activities may be performed for pattern discovery.
■ Statistical analysis. Statistical techniques are the most common method to
extract knowledge about visitors to a Website. By analyzing the session log, it
is possible to apply statistical measures such as mean, median, and frequency
count to parameters such as pages viewed, viewing time per page, length of
navigation paths between pages, and other parameters that are relevant to
Web usage analysis.
■ Association rules. In the context of Web usage analysis, association rules
refer to sets of pages that are accessed together with a support value exceed-
ing some specified threshold. These pages may not be directly connected to
one another via hyperlinks. For example, association rule discovery may
reveal a correlation between users who visited a page about electronic
products and those who visited a page about sporting equipment (a minimal
support-counting sketch appears after this list).
■ Clustering. In the Web usage domain, there are two kinds of interesting
clusters to be discovered: usage clusters and page clusters. Clustering of
users tends to establish groups of users exhibiting similar browsing patterns.
Such knowledge is especially useful for inferring user demographics in order
to perform market segmentation in E-commerce applications or provide
personalized Web content to the users. Clustering of pages is based on the
content of the pages, and pages with similar contents are grouped together.
This type of clustering can be utilized in Internet search engines, and in tools
that provide assistance to Web browsing.
■ Classification. In the Web domain, one goal is to develop a profile of users
belonging to a particular class or category. This requires extraction and
selection of features that best describe the properties of a given class or cate-
gory of users. As an example, an interesting pattern that may be discovered
would be: 60% of users who placed an online order in /Product/Books are in
the 18-25 age group and live in rented apartments.
■ Sequential patterns. These kinds of patterns identify sequences of Web
accesses, which may be used to predict the next set of Web pages to be
accessed by a certain class of users. These patterns can be used by marketers
to produce targeted advertisements on Web pages. Another type of sequen-
tial pattern pertains to which items are typically purchased following the
purchase of a particular item. For example, after purchasing a computer, a
printer is often purchased.
■ Dependency modeling. Dependency modeling aims to determine and
model significant dependencies among the various variables in the Web
domain. As an example, one may be interested to build a model representing
the different stages a visitor undergoes while shopping in an online store
based on the actions chosen (e.g., from a casual visitor to a serious potential
buyer).
3. Pattern Analysis
The final step is to filter out those rules or patterns that are considered to be
not of interest from the discovered patterns. The particular analysis methodology
depends on the application. One common technique for pattern analysis
is to use a query language such as SQL to detect various patterns and rela-
tionships. Another technique involves loading of usage data into a data ware-
house with ETL tools and performing OLAP operations to view it along
multiple dimensions. It is common to use visualization techniques, such as
graphing patterns or assigning colors to different values, to highlight pat-
terns or trends in the data.
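As noted in the association rules item above, the following is a minimal Python sketch of support counting over sets of pages; the sessions and the support threshold are made-up examples, and a real system would use a scalable algorithm such as Apriori or FP-growth over the mined page-view data.

from itertools import combinations
from collections import Counter

sessions = [                                   # each session reduced to its set of pages
    {"/electronics", "/sports", "/home"},
    {"/electronics", "/sports"},
    {"/books", "/home"},
]

def frequent_pairs(sessions, min_support=0.5):
    """Return page pairs whose support (fraction of sessions) meets the threshold."""
    counts = Counter()
    for pages in sessions:
        for pair in combinations(sorted(pages), 2):
            counts[pair] += 1
    n = len(sessions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

print(frequent_pairs(sessions))   # {('/electronics', '/sports'): 0.666...}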
7.7 Practical Applications of Web Analysis
Web Analytics. The goal of Web analytics is to understand and optimize the per-
formance of Web usage. This requires collecting, analyzing, and monitoring
Internet usage data. On-site Web analytics measures the performance of a
Website in a commercial context. This data is typically compared against key per-
formance indicators to measure effectiveness or performance of the Website as a
whole, and can be used to improve a Website or improve the marketing strategies.
Web Spamming. It has become increasingly important for companies and indi-
viduals to have their Websites/Web pages appear in the top search results. To achieve
this, it is essential to understand search engine ranking algorithms and to present
the information in one’s page in such a way that the page is ranked high when the
respective keywords are queried. There is a thin line separating legitimate page opti-
mization for business purposes and spamming. Web Spamming is thus defined as a
deliberate activity to promote one’s page by manipulating the results returned by
the search engines. Web analysis may be used to detect such pages and discard them
from search results.
Web Security. Web analysis can be used to find interesting usage patterns of
Websites. If any flaw in a Website has been exploited, it can be inferred using Web
analysis, thereby allowing the design of more robust Websites. For example, backdoors
or information leaks in Web servers can be detected by applying Web analysis
techniques to abnormal Web application log data. Security analysis techniques such
as intrusion detection and detection of denial-of-service attacks are based on Web
access pattern analysis.
Web Crawlers. Web crawlers are programs that visit Web pages and create copies
of all the visited pages so they can be processed by a search engine for indexing the
downloaded pages to provide fast searches. Another use of crawlers is to automati-
cally check and maintain Websites. For example, the HTML code and the links
in a Website can be checked and validated by the crawler. Another unfortunate use
of crawlers is to collect e-mail addresses from Web pages, so they can be used for
spam e-mails later.
8 Trends in Information Retrieval
In this section we review a few concepts that are being considered in more recent
research work in information retrieval.
8.1 Faceted Search
Faceted search is a technique that provides an integrated search and navigation
experience by letting users explore by filtering the available information. This search
technique is often used in e-commerce Websites and applications, enabling users to
navigate a multidimensional information space. Facets are generally used for han-
dling three or more dimensions of classification. This allows the faceted classifica-
tion scheme to classify an object in various ways based on different taxonomical
criteria. For example, a Web page may be classified in various ways: by content (air-
lines, music, news, ...); by use (sales, information, registration, ...); by location; by
language used (HTML, XML, ...) and in other ways or facets. Hence, the object can
be classified in multiple ways based on multiple taxonomies.
A facet defines properties or characteristics of a class of objects. The properties
should be mutually exclusive and exhaustive. For example, a collection of art objects
might be classified using an artist facet (name of artist), an era facet (when the art
was created), a type facet (painting, sculpture, mural, ...), a country of origin facet,
a media facet (oil, watercolor, stone, metal, mixed media, ...), a collection facet
(where the art resides), and so on.
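The following minimal Python sketch illustrates faceted navigation over the art-collection example above; the tiny catalog and its facet values are invented for illustration.

artworks = [
    {"artist": "Monet", "era": "1870s", "type": "painting",  "media": "oil"},
    {"artist": "Rodin", "era": "1880s", "type": "sculpture", "media": "stone"},
    {"artist": "Monet", "era": "1890s", "type": "painting",  "media": "oil"},
]

def facet_counts(items, facet):
    """Count how many items fall under each value of a facet (for display to the user)."""
    counts = {}
    for item in items:
        counts[item[facet]] = counts.get(item[facet], 0) + 1
    return counts

def filter_by(items, **selected):
    """Narrow the collection by the currently selected facet values."""
    return [i for i in items if all(i[f] == v for f, v in selected.items())]

print(facet_counts(artworks, "type"))            # {'painting': 2, 'sculpture': 1}
print(filter_by(artworks, type="painting", media="oil"))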
Faceted search uses faceted classification that enables a user to navigate information
along multiple paths corresponding to different orderings of the facets. This con-
trasts with traditional taxonomies in which the hierarchy of categories is fixed and
unchanging. University of California, Berkeley’s Flamenco project28 is one of the
earlier examples of a faceted search system.
28Yee (2003) describes faceted metadata for image search.
8.2 Social Search
The traditional view of Web navigation and browsing assumes that a single user is
searching for information. This view contrasts with previous research by library sci-
entists who studied users’ information seeking habits. This research demonstrated
that additional individuals may be valuable information resources during informa-
tion search by a single user. More recently, research indicates that there is often
direct user cooperation during Web-based information search. Some studies report
that significant segments of the user population are engaged in explicit collabora-
tion on joint search tasks on the Web. Active collaboration by multiple parties also
occur in certain cases (for example, enterprise settings); at other times, and perhaps
for a majority of searches, users often interact with others remotely, asynchronously,
and even involuntarily and implicitly.
Socially enabled online information search (social search) is a new phenomenon
facilitated by recent Web technologies. Collaborative social search involves different
ways of actively participating in search-related activities, such as co-located search,
remote collaboration on search tasks, use of social networks for search, use of
expertise networks, social data mining or collective intelligence to improve the
search process, and even social interactions that facilitate information seeking and
sense making. This social search activity may be done synchronously or asynchronously,
co-located or in remote shared workspaces. Social psychologists have experimentally
validated that social discussion facilitates cognitive performance. People
in social groups can provide solutions (answers to questions), pointers to databases
or to other people (meta-knowledge), validation and legitimization of ideas, and can
serve as memory aids and help with problem reformulation. Guided participation is
a process in which people co-construct knowledge in concert with peers in their com-
munity. Information seeking is mostly a solitary activity on the Web today. Some
recent work on collaborative search reports several interesting findings and the
potential of this technology for better information access.
8.3 Conversational Search
Conversational Search (CS) is an interactive and collaborative way of finding
information. The participants engage in a conversation and perform a social search
activity that is aided by intelligent agents. The collaborative search activity helps the
agent learn from the conversations through interactions and feedback from the
participants. It uses a semantic retrieval model with natural language understanding
to provide users with faster and more relevant search results. It moves search from being a soli-
tary activity to being a more participatory activity for the user. The search agent
performs multiple tasks of finding relevant information and connecting the users
together; participants provide feedback to the agent during the conversations that
allows the agent to perform better.
9 Summary
In this chapter we covered an important area called information retrieval (IR) that
is closely related to databases. With the advent of the Web, unstructured data with
text, images, audio, and video is proliferating at phenomenal rates. While database
management systems have a very good handle on structured data, the unstructured
data containing a variety of data types is being stored mainly on ad hoc information
repositories on the Web that are available for consumption primarily via IR systems.
Google, Yahoo, and similar search engines are IR systems that make the advances in
this field readily available to average end users, giving them a continually improving
and richer search experience.
We started by defining the basic terminology of IR, presented the query and brows-
ing modes of interaction in IR systems, and provided a comparison of the IR and
database technologies. We presented schematics of the IR process at a detailed and
an overview level, and then discussed digital libraries, which are repositories of tar-
geted content on the Web for academic institutions as well as professional commu-
nities, and gave a brief history of IR.
We presented the various retrieval models including Boolean, vector space, proba-
bilistic, and semantic models. They allow for a measurement of whether a docu-
ment is relevant to a user query and provide similarity measurement heuristics. We
then discussed various evaluation metrics such as recall, precision, and F-score
to measure the goodness of the results of IR queries. Then we presented different
types of queries—besides keyword-based queries, which dominate, there are other
types including Boolean, phrase, proximity, natural language, and others for which
explicit support needs to be provided by the retrieval model. Text preprocessing is
important in IR systems, and various activities like stopword removal, stemming,
and the use of thesauruses were discussed. We then discussed the construction and
use of inverted indexes, which are at the core of IR systems and contribute to factors
involving search efficiency. Relevance feedback was briefly addressed—it is impor-
tant to modify and improve the retrieval of pertinent information for the user
through his interaction and engagement in the search process.
We gave a somewhat detailed introduction to analysis of the Web as it relates to
information retrieval. We divided this treatment into the analysis of content, struc-
ture, and usage of the Web. Web search was discussed, including an analysis of the
Web link structure, followed by an introduction to algorithms for ranking the
results from a Web search such as PageRank and HITS. Finally, we briefly discussed
current trends, including faceted search, social search, and conversational search.
This is an introductory treatment of a vast field and the reader is referred to special-
ized textbooks on information retrieval and search engines.
Review Questions
1. What is structured data and unstructured data? Give an example of each
from your experience with data that you may have used.
2. Give a general definition of information retrieval (IR). What does informa-
tion retrieval involve when we consider information on the Web?
3. Discuss the types of data and the types of users in today’s information
retrieval systems.
4. What is meant by navigational, informational, and transactional search?
5. What are the two main modes of interaction with an IR system? Describe
with examples.
6. Explain the main differences between database and IR systems mentioned in
Table 1.
7. Describe the main components of the IR system as shown in Figure 1.
8. What are digital libraries? What types of data are typically found in them?
9. Name some digital libraries that you have accessed. What do they contain
and how far back does the data go?
10. Give a brief history of IR and mention the landmark developments.
11. What is the Boolean model of IR? What are its limitations?
12. What is the vector space model of IR? How does a vector get constructed to
represent a document?
13. Define the TF-IDF scheme of determining the weight of a keyword in a
document. What is the necessity of including IDF in the weight of a term?
14. What are probabilistic and semantic models of IR?
15. Define recall and precision in IR systems.
16. Give the definition of precision and recall in a ranked list of results at
position i.
17. How is F-score defined as a metric of information retrieval? In what way
does it account for both precision and recall?
18. What are the different types of queries in an IR system? Describe each with
an example.
19. What are the approaches to processing phrase and proximity queries?
20. Describe the detailed IR process shown in Figure 2.
21. What is stopword removal and stemming? Why are these processes necessary
for better information retrieval?
22. What is a thesaurus? How is it beneficial to IR?
23. What is information extraction? What are the different types of information
extraction from structured text?
24. What are vocabularies in IR systems? What role do they play in the indexing
of documents?
25. Take five documents with about three sentences each with some related con-
tent. Construct an inverted index of all important stems (keywords) from
these documents.
26. Describe the process of constructing the result of a search request using an
inverted index.
27. Define relevance feedback.
28. Describe the three types of Web analyses discussed in this chapter.
29. List the important tasks mentioned that are involved in analyzing Web con-
tent. Describe each in a couple of sentences.
30. What are the three categories of agent-based Web content analyses men-
tioned in this chapter?
31. What is the database-based approach to analyzing Web content? What are
Web query systems?
32. What algorithms are popular in ranking or determining the importance of
Web pages? Which algorithm was proposed by the founders of Google?
33. What is the basic idea behind the PageRank algorithm?
34. What are hubs and authority pages? How does the HITS algorithm use these
concepts?
35. What can you learn from Web usage analysis? What data does it generate?
36. What mining operations are commonly performed on Web usage data? Give
an example of each.
37. What are the applications of Web usage mining?
38. What is search relevance? How is it determined?
39. Define faceted search. Make up a set of facets for a database containing all
types of buildings. For example, two facets could be “building value or price”
and “building type (residential, office, warehouse, factory, and so on)”.
40. What is social search? What does collaborative social search involve?
41. Define and explain conversational search.
Selected Bibliography
Information retrieval and search technologies are active areas of research and devel-
opment in industry and academia. There are many IR textbooks that provide
detailed discussion on the materials that we have briefly introduced in this chapter.
A recent book entitled Search Engines: Information Retrieval in Practice by Croft,
Metzler, and Strohman (2009) gives a practical overview of search engine concepts
and principles. Introduction to Information Retrieval by Manning, Raghavan, and
Schutze (2008) is an authoritative book on information retrieval. Another introduc-
tory textbook in IR is Modern Information Retrieval by Ricardo Baeza-Yates and
Berthier Ribeiro-Neto (1999), which provides detailed coverage of various aspects
of IR technology. Gerald Salton’s (1968) and van Rijsbergen’s (1979) classic books
on information retrieval provide excellent descriptions of the foundational research
done in the IR field until the late 1960s. Salton also introduced the vector space
model as a model of IR. Manning and Schutze (1999) provide a good summary of
natural language technologies and text preprocessing. “Interactive Information
Retrieval in Digital Environments” by Xie (2008) provides a good human-centered
approach to information retrieval. The book Managing Gigabytes by Witten, Moffat,
and Bell (1999) provides detailed discussions for indexing techniques. The TREC
book by Voorhees and Harman (2005) provides a description of test collection and
evaluation procedures in the context of TREC competitions.
Broder (2002) classifies Web queries into three distinct classes—navigational, infor-
mational, and transactional—and presents a detailed taxonomy of Web search. Covi
and Kling (1996) give a broad definition for digital libraries in their paper and dis-
cuss organizational dimensions of effective digital library use. Luhn (1957) did some
seminal work in IR at IBM in the 1950s on autoindexing and business intelligence
that received a lot of attention at that time. The SMART system (Salton et al. (1993)),
developed at Cornell, was one of the earliest advanced IR systems that used fully
automatic term indexing, hierarchical clustering, and document ranking by degree
of similarity to the query. The SMART system represented documents and queries as
weighted term vectors according to the vector space model. Porter (1980) is credited
with the weak and strong stemming algorithms that have become standards.
Robertson (1997) developed a sophisticated weighting scheme in the City University
of London Okapi system that became very popular in TREC competitions. Lenat
(1995) started the Cyc project in the 1980s for incorporating formal logic and knowl-
edge bases in information processing systems. Efforts toward creating the WordNet
thesaurus continued in the 1990s, and are still ongoing. WordNet concepts and prin-
ciples are described in the book by Fellbaum (1998). Rocchio (1971) describes the
relevance feedback algorithm, which is described in Salton’s (1971) book on The
SMART Retrieval System–Experiments in Automatic Document Processing.
Abiteboul, Buneman, and Suciu (1999) provide an extensive discussion of data on
the Web in their book that emphasizes semistructured data. Atzeni and Mendelzon
(2000) wrote an editorial in the VLDB journal on databases and the Web. Atzeni et
al. (2002) propose models and transformations for Web-based data. Abiteboul et al.
(1997) propose the Lorel query language for managing semistructured data.
Chakrabarti (2002) is an excellent book on knowledge discovery from the Web. The
book by Liu (2006) consists of several parts, each providing a comprehensive
overview of the concepts involved with Web data analysis and its applications.
Excellent survey articles on Web analysis include Kosala and Blockeel (2000) and
Liu et al. (2004). Etzioni (1996) provides a good starting point for understanding
Web mining and describes the tasks and issues related with the World Wide Web. An
excellent overview of the research issues, techniques, and development efforts asso-
ciated with Web content and usage analysis is presented by Cooley et al. (1997).
Cooley (2003) focuses on mining Web usage patterns through the use of Web struc-
ture. Spiliopoulou (2000) describes Web usage analysis in detail. Web mining based
on page structure is described in Madria et al. (1999) and Chakrabarti et al. (1999).
Algorithms to compute the rank of a Web page are given by Page et al. (1999), who
describe the famous PageRank algorithm, and Kleinberg (1998), who presents the
HITS algorithm.
Overview of Data
Warehousing and OLAP
From Chapter 29 of Fundamentals of Database Systems, Sixth Edition. Ramez Elmasri and
Shamkant B. Navathe. Copyright © 2011 by Pearson Education, Inc. Published by
Addison-Wesley. All rights reserved.
The increasing processing power and sophistication of analytical tools and techniques have resulted in
the development of what are known as data warehouses. These data warehouses
provide storage, functionality, and responsiveness to queries beyond the capabilities
of transaction-oriented databases. Accompanying this ever-increasing power is a
great demand to improve the data access performance of databases. Traditional
databases balance the requirement of data access with the need to ensure data
integrity. In modern organizations, users of data are often completely removed from
the data sources. Many people only need read-access to data, but still need fast
access to a larger volume of data than can conveniently be downloaded to the desk-
top. Often such data comes from multiple databases. Because many of the analyses
performed are recurrent and predictable, software vendors and systems support
staff are designing systems to support these functions. Presently there is a great need
to provide decision makers from middle management upward with information at
the correct level of detail to support decision making. Data warehousing, online ana-
lytical processing (OLAP), and data mining provide this functionality. In this chapter
we give a broad overview of data warehousing and OLAP technologies.
1 Introduction, Definitions, and Terminology
A database is a collection of related data and a database system is a database and
database software together. A data warehouse is also a collection of information
as well as a supporting system. However, a clear distinction exists. Traditional
databases are transactional (relational, object-oriented, network, or hierarchical).
Data warehouses have the distinguishing characteristic that they are mainly
intended for decision-support applications. They are optimized for data retrieval,
not routine transaction processing.
Because data warehouses have been developed in numerous organizations to meet
particular needs, there is no single, canonical definition of the term data warehouse.
Professional magazine articles and books in the popular press have elaborated on
the meaning in a variety of ways. Vendors have capitalized on the popularity of the
term to help market a variety of related products, and consultants have provided a
large variety of services, all under the data warehousing banner. However, data
warehouses are quite distinct from traditional databases in their structure, func-
tioning, performance, and purpose.
W. H. Inmon1 characterized a data warehouse as a subject-oriented, integrated, non-
volatile, time-variant collection of data in support of management’s decisions. Data
warehouses provide access to data for complex analysis, knowledge discovery, and
decision making. They support high-performance demands on an organization’s
data and information. Several types of applications—OLAP, DSS, and data mining
applications—are supported. We define each of these next.
OLAP (online analytical processing) is a term used to describe the analysis of com-
plex data from the data warehouse. In the hands of skilled knowledge workers,
OLAP tools use distributed computing capabilities for analyses that require more
storage and processing power than can be economically and efficiently located on
an individual desktop.
DSS (decision-support systems), also known as EIS—executive information sys-
tems; not to be confused with enterprise integration systems—support an organiza-
tion’s leading decision makers with higher-level data for complex and important
decisions. Data mining is used for knowledge discovery, the process of searching data
for unanticipated new knowledge.
Traditional databases support online transaction processing (OLTP), which
includes insertions, updates, and deletions, while also supporting information
query requirements. Traditional relational databases are optimized to process
queries that may touch a small part of the database and transactions that deal with
insertions or updates of a few tuples per relation. Thus, they cannot be
optimized for OLAP, DSS, or data mining. By contrast, data warehouses are
designed precisely to support efficient extraction, processing, and presentation for
analytic and decision-making purposes. In comparison to traditional databases,
data warehouses generally contain very large amounts of data from multiple sources
that may include databases from different data models and sometimes files acquired
from independent systems and platforms.
1Inmon (1992) is credited with initially using the term warehouse. The latest edition of his work is Inmon
(2005).
[Figure 1: The data warehousing process: data from databases and other data inputs is cleaned and reformatted, loaded into the data warehouse along with metadata, and used by OLAP, DSS, EIS, and data mining tools; updates/new data and backflushing feed information back into the warehouse.]
2 Characteristics of Data Warehouses
To discuss data warehouses and distinguish them from transactional databases calls
for an appropriate data model. The multidimensional data model (explained in
more detail in Section 3) is a good fit for OLAP and decision-support technologies.
In contrast to multidatabases, which provide access to disjoint and usually heteroge-
neous databases, a data warehouse is frequently a store of integrated data from mul-
tiple sources, processed for storage in a multidimensional model. Unlike most
transactional databases, data warehouses typically support time-series and trend
analysis, both of which require more historical data than is generally maintained in
transactional databases.
Compared with transactional databases, data warehouses are nonvolatile. This
means that information in the data warehouse changes far less often and may be
regarded as non–real-time with periodic updating. In transactional systems, transac-
tions are the unit and are the agent of change to the database; by contrast, data ware-
house information is much more coarse-grained and is refreshed according to a
careful choice of refresh policy, usually incremental. Warehouse updates are handled
by the warehouse’s acquisition component that provides all required preprocessing.
We can also describe data warehousing more generally as a collection of decision sup-
port technologies, aimed at enabling the knowledge worker (executive, manager, ana-
lyst) to make better and faster decisions.2 Figure 1 gives an overview of the conceptual
structure of a data warehouse. It shows the entire data warehousing process, which
includes possible cleaning and reformatting of data before loading it into the ware-
house. This process is handled by tools known as ETL (extraction, transformation,
and loading) tools. At the back end of the process, OLAP, data mining, and DSS may
generate new relevant information such as rules; this information is shown in the
figure going back into the warehouse. The figure also shows that data sources may
include files.
2Chaudhuri and Dayal (1997) provide an excellent tutorial on the topic, with this as a starting definition.
Data warehouses have the following distinctive characteristics:3
■ Multidimensional conceptual view
■ Generic dimensionality
■ Unlimited dimensions and aggregation levels
■ Unrestricted cross-dimensional operations
■ Dynamic sparse matrix handling
■ Client-server architecture
■ Multiuser support
■ Accessibility
■ Transparency
■ Intuitive data manipulation
■ Consistent reporting performance
■ Flexible reporting
Because they encompass large volumes of data, data warehouses are generally an
order of magnitude (sometimes two orders of magnitude) larger than the source
databases. The sheer volume of data (likely to be in terabytes or even petabytes) is
an issue that has been dealt with through enterprise-wide data warehouses, virtual
data warehouses, and data marts:
■ Enterprise-wide data warehouses are huge projects requiring massive
investment of time and resources.
■ Virtual data warehouses provide views of operational databases that are
materialized for efficient access.
■ Data marts generally are targeted to a subset of the organization, such as a
department, and are more tightly focused.
3 Data Modeling for Data Warehouses
Multidimensional models take advantage of inherent relationships in data to popu-
late data in multidimensional matrices called data cubes. (These may be called
hypercubes if they have more than three dimensions.) For data that lends itself to
dimensional formatting, query performance in multidimensional matrices can be
much better than in the relational data model. Three examples of dimensions in a
corporate data warehouse are the corporation’s fiscal periods, products, and
regions.
A standard spreadsheet is a two-dimensional matrix. One example would be a
spreadsheet of regional sales by product for a particular time period. Products could
be shown as rows, with sales revenues for each region comprising the columns.
(Figure 2 shows this two-dimensional organization.) Adding a time dimension, such
as an organization's fiscal quarters, would produce a three-dimensional matrix, which
could be represented using a data cube.
3Codd and Salley (1993) coined the term OLAP and mentioned these characteristics. We have
reordered their original list.
[Figure 2: A two-dimensional matrix model, with products (P123–P126) as rows and sales regions (Reg 1–Reg 3) as columns.]
[Figure 3: A three-dimensional data cube model organizing sales by Product (P123–P127), Region (Reg 1–Reg 3), and Fiscal_quarter (Qtr 1–Qtr 4).]
[Figure 4: Pivoted version of the data cube from Figure 3, with Region as rows, Fiscal quarter as columns, and Product as the third dimension.]
Figure 3 shows a three-dimensional data cube that organizes product sales data by
fiscal quarters and sales regions. Each cell could contain data for a specific product,
specific fiscal quarter, and specific region. By including additional dimensions, a data
hypercube could be produced, although more than three dimensions cannot be eas-
ily visualized or graphically presented. The data can be queried directly in any com-
bination of dimensions, bypassing complex database queries. Tools exist for viewing
data according to the user’s choice of dimensions.
Changing from one dimensional hierarchy (orientation) to another is easily accom-
plished in a data cube with a technique called pivoting (also called rotation). In this
technique the data cube can be thought of as rotating to show a different orienta-
tion of the axes. For example, you might pivot the data cube to show regional sales
revenues as rows, the fiscal quarter revenue totals as columns, and the company’s
products in the third dimension (Figure 4). Hence, this technique is equivalent to
having a regional sales table for each product separately, where each table shows
quarterly sales for that product region by region.
Multidimensional models lend themselves readily to hierarchical views in what is
known as roll-up display and drill-down display. A roll-up display moves up the
hierarchy, grouping into larger units along a dimension (for example, summing
weekly data by quarter or by year). Figure 5 shows a roll-up display that moves from
individual products to a coarser grain of product categories. Shown in Figure 6, a
drill-down display provides the opposite capability, furnishing a finer-grained
view, perhaps disaggregating country sales by region and then regional sales by sub-
region and also breaking up products by styles.
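The following is a minimal Python sketch of a roll-up aggregation that sums weekly sales up to fiscal quarters; the sales records and the simplified week-to-quarter mapping are assumptions made for illustration.

from collections import defaultdict

sales = [                                     # (product, region, week, revenue)
    ("P123", "Reg 1", "2011-W01", 100.0),
    ("P123", "Reg 1", "2011-W14", 250.0),
    ("P124", "Reg 2", "2011-W02", 80.0),
]

def week_to_quarter(week):
    w = int(week.split("-W")[1])
    return "Qtr " + str((w - 1) // 13 + 1)    # simplified: 13 weeks per quarter

rolled_up = defaultdict(float)                # roll up: group by the coarser time unit
for product, region, week, revenue in sales:
    rolled_up[(product, region, week_to_quarter(week))] += revenue

print(dict(rolled_up))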
[Figure 5: The roll-up operation, grouping individual products into product categories (Products 1XX–4XX) across regions.]
The multidimensional storage model involves two types of tables: dimension tables
and fact tables. A dimension table consists of tuples of attributes of the dimension.
A fact table can be thought of as having tuples, one per recorded fact. This fact
contains some measured or observed variable(s) and identifies it (them) with point-
ers to dimension tables. The fact table contains the data, and the dimensions iden-
tify each tuple in that data. Figure 7 contains an example of a fact table that can be
viewed from the perspective of multiple dimension tables.
Two common multidimensional schemas are the star schema and the snowflake
schema. The star schema consists of a fact table with a single table for each dimension
(Figure 7). The snowflake schema is a variation on the star schema in which the
dimensional tables from a star schema are organized into a hierarchy by normalizing
them (Figure 8). Some installations are normalizing data warehouses up to the third
normal form so that they can access the data warehouse to the finest level of detail.
A fact constellation is a set of fact tables that share some dimension tables.
[Figure 6: The drill-down operation, disaggregating regional sales (Region 1 and 2) into subregions (Sub_reg 1–4) and products (P123–P125) into styles (A–D).]
[Figure 7: A star schema with fact and dimension tables: a business results fact table (Product, Quarter, Region, Sales_revenue) linked to Product, Fiscal quarter, and Region dimension tables.]
[Figure 8: A snowflake schema: the star schema of Figure 7 with its dimension tables normalized into a hierarchy (for example, the product dimension split into Pname and Pline tables, and the fiscal quarter dimension split out into an FQ dates table).]
Figure 9 shows a fact constellation with two fact tables, business results and business
forecast. These share the dimension table called product. Fact constellations limit
the possible queries for the warehouse.
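A minimal SQL DDL sketch of the star schema of Figure 7 might look as follows; the data types and key choices are assumptions (in particular, the fiscal-quarter dimension is given a composite key (Qtr, Year) here), while the table and attribute names follow the figure:
    CREATE TABLE PRODUCT
      ( Prod_no     INT PRIMARY KEY,
        Prod_name   VARCHAR(30),
        Prod_descr  VARCHAR(100),
        Prod_style  VARCHAR(20),
        Prod_line   VARCHAR(20) );
    CREATE TABLE FISCAL_QUARTER
      ( Qtr       CHAR(2),
        Year      INT,
        Beg_date  DATE,
        End_date  DATE,
        PRIMARY KEY (Qtr, Year) );
    CREATE TABLE REGION
      ( Region     VARCHAR(20) PRIMARY KEY,
        Subregion  VARCHAR(20) );
    -- The fact table holds one row per recorded fact and points to each dimension
    CREATE TABLE BUSINESS_RESULTS
      ( Product        INT REFERENCES PRODUCT (Prod_no),
        Qtr            CHAR(2),
        Year           INT,
        Region         VARCHAR(20) REFERENCES REGION (Region),
        Sales_revenue  DECIMAL(12,2),
        FOREIGN KEY (Qtr, Year) REFERENCES FISCAL_QUARTER (Qtr, Year) );
In the snowflake variant of Figure 8, the PRODUCT table would itself be split into further normalized dimension tables along Prod_name and Prod_line_no.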
Data warehouse storage also utilizes indexing techniques to support high-
performance access. A technique called bitmap indexing constructs a bit vector for
each value in a domain (column) being indexed. It works very well for domains of
Figure 9: A fact constellation. Fact table I, Business results (Product, Quarter, Region, Revenue), and fact table II, Business forecast (Product, Future_qtr, Region, Projected_revenue), share the dimension table Product (Prod_no, Prod_name, Prod_descr, Prod_style, Prod_line).
low cardinality. A bit is set to 1 in the jth position of the vector if the jth row
contains the value being indexed. For example, imagine an inventory of 100,000
cars with a bitmap index on car size. If there are four car sizes—economy, compact,
mid-size, and full-size—there will be four bit vectors, each containing 100,000 bits
(12.5 Kbytes), for a total index size of 50 Kbytes. Bitmap indexing can provide considerable
input/output and storage space advantages in low-cardinality domains. With bit
vectors a bitmap index can provide dramatic improvements in comparison, aggre-
gation, and join performance.
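In systems that support bitmap indexes directly (Oracle is one example), the index from the car inventory example could be declared as follows; the table and column names are hypothetical:
    -- Builds one bit vector per distinct Car_size value
    -- (economy, compact, mid-size, full-size)
    CREATE BITMAP INDEX Inventory_size_bix ON INVENTORY (Car_size);
    -- A low-cardinality selection such as this can then be answered
    -- largely by counting 1 bits in the 'compact' vector
    SELECT COUNT(*) FROM INVENTORY WHERE Car_size = 'compact';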
In a star schema, dimensional data can be indexed to tuples in the fact table by join
indexing. Join indexes are traditional indexes that maintain relationships between
primary key and foreign key values. They relate the values of a dimension of a star
schema to rows in the fact table. For example, consider a sales fact table that has city
and fiscal quarter as dimensions. If there is a join index on city, for each city the join
index maintains the tuple IDs of tuples containing that city. Join indexes may
involve multiple dimensions.
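The kind of query such a join index speeds up is the usual star join between the fact table and a dimension; the table and column names below are assumptions based on the example:
    -- With a join index on City, the fact tuples for 'Houston' can be located
    -- directly from the stored tuple IDs instead of scanning the fact table
    SELECT   F.Fiscal_quarter, SUM(F.Sales_revenue)
    FROM     SALES_FACT F, CITY C
    WHERE    F.City = C.City
      AND    C.City = 'Houston'
    GROUP BY F.Fiscal_quarter;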
Data warehouse storage can facilitate access to summary data by taking further
advantage of the nonvolatility of data warehouses and a degree of predictability of
the analyses that will be performed using them. Two approaches have been used: (1)
smaller tables including summary data such as quarterly sales or revenue by product
line, and (2) encoding of level (for example, weekly, quarterly, annual) into existing
tables. By comparison, the overhead of creating and maintaining such aggregations
would likely be excessive in a volatile, transaction-oriented database.
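In DBMSs that support them, such precomputed summaries can be declared as materialized views; the sketch below assumes the star schema tables introduced earlier:
    -- Quarterly revenue by product line, maintained as a summary table
    CREATE MATERIALIZED VIEW QTR_REVENUE_BY_LINE AS
      SELECT   P.Prod_line, F.Qtr, F.Year, SUM(F.Sales_revenue) AS Revenue
      FROM     BUSINESS_RESULTS F, PRODUCT P
      WHERE    F.Product = P.Prod_no
      GROUP BY P.Prod_line, F.Qtr, F.Year;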
4 Building a Data Warehouse
In constructing a data warehouse, builders should take a broad view of the antici-
pated use of the warehouse. There is no way to anticipate all possible queries or
analyses during the design phase. However, the design should specifically support
ad-hoc querying, that is, accessing data with any meaningful combination of values
for the attributes in the dimension or fact tables. For example, a marketing-
intensive consumer-products company would require different ways of organizing
the data warehouse than would a nonprofit charity focused on fund raising. An
appropriate schema should be chosen that reflects anticipated usage.
Acquisition of data for the warehouse involves the following steps:
1. The data must be extracted from multiple, heterogeneous sources, for exam-
ple, databases or other data feeds such as those containing financial market
data or environmental data.
2. Data must be formatted for consistency within the warehouse. Names,
meanings, and domains of data from unrelated sources must be reconciled.
For instance, subsidiary companies of a large corporation may have different
fiscal calendars with quarters ending on different dates, making it difficult to
aggregate financial data by quarter. Various credit cards may report their
transactions differently, making it difficult to compute all credit sales. These
format inconsistencies must be resolved.
3. The data must be cleaned to ensure validity. Data cleaning is an involved and
complex process that has been identified as the most labor-intensive
component of data warehouse construction. For input data, cleaning must
occur before the data is loaded into the warehouse. There is nothing about
cleaning data that is specific to data warehousing and that could not be
applied to a host database. However, since input data must be examined and
formatted consistently, data warehouse builders should take this opportu-
nity to check for validity and quality. Recognizing erroneous and incomplete
data is difficult to automate, and cleaning that requires automatic error cor-
rection can be even tougher. Some aspects, such as domain checking, are eas-
ily coded into data cleaning routines, but automatic recognition of other
data problems can be more challenging. (For example, one might require
that City = ‘San Francisco’ together with State = ‘CT’ be recognized as an
incorrect combination; a sketch of such a check appears after this list.) After such problems have been taken care of, similar
data from different sources must be coordinated for loading into the ware-
house. As data managers in the organization discover that their data is being
cleaned for input into the warehouse, they will likely want to upgrade their
data with the cleaned data. The process of returning cleaned data to the
source is called backflushing (see Figure 1).
4. The data must be fitted into the data model of the warehouse. Data from the
various sources must be installed in the data model of the warehouse. Data
may have to be converted from relational, object-oriented, or legacy data-
bases (network and/or hierarchical) to a multidimensional model.
5. The data must be loaded into the warehouse. The sheer volume of data in the
warehouse makes loading the data a significant task. Monitoring tools for
loads as well as methods to recover from incomplete or incorrect loads are
required. With the huge volume of data in the warehouse, incremental
updating is usually the only feasible approach. The refresh policy will proba-
bly emerge as a compromise that takes into account the answers to the fol-
lowing questions:
■ How up-to-date must the data be?
■ Can the warehouse go offline, and for how long?
■ What are the data interdependencies?
■ What is the storage availability?
■ What are the distribution requirements (such as for replication and parti-
tioning)?
■ What is the loading time (including cleaning, formatting, copying, trans-
mitting, and overhead such as index rebuilding)?
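As a small illustration of the domain checking mentioned in step 3, invalid city/state combinations in incoming data could be flagged with a query of the following form; the staging table and the reference table of valid pairs are assumptions:
    -- Report staged rows whose (City, State) pair is not a known valid combination
    SELECT *
    FROM   CUSTOMER_STAGING S
    WHERE  NOT EXISTS ( SELECT *
                        FROM   VALID_CITY_STATE V
                        WHERE  V.City = S.City AND V.State = S.State );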
As we have said, databases must strike a balance between efficiency in transaction
processing and supporting query requirements (ad hoc user requests), but a data
warehouse is typically optimized to meet the access needs of decision makers. Data
storage in a data warehouse reflects this specialization and involves the following
processes:
■ Storing the data according to the data model of the warehouse
■ Creating and maintaining required data structures
■ Creating and maintaining appropriate access paths
■ Providing for time-variant data as new data are added
■ Supporting the updating of warehouse data
■ Refreshing the data
■ Purging data
Although adequate time can be devoted initially to constructing the warehouse, the
sheer volume of data in the warehouse generally makes it impossible to simply
reload the warehouse in its entirety later on. Alternatives include selective (partial)
refreshing of data and separate warehouse versions (requiring double storage capac-
ity for the warehouse!). When the warehouse uses an incremental data refreshing
mechanism, data may need to be periodically purged; for example, a warehouse that
maintains data on the previous twelve business quarters may periodically purge its
data each year.
Data warehouses must also be designed with full consideration of the environment
in which they will reside. Important design considerations include the following:
■ Usage projections
■ The fit of the data model
■ Characteristics of available sources
■ Design of the metadata component
■ Modular component design
■ Design for manageability and change
■ Considerations of distributed and parallel architecture
We discuss each of these in turn. Warehouse design is initially driven by usage pro-
jections; that is, by expectations about who will use the warehouse and how they
will use it. Choice of a data model to support this usage is a key initial decision.
Usage projections and the characteristics of the warehouse’s data sources are both
taken into account. Modular design is a practical necessity to allow the warehouse to
evolve with the organization and its information environment. Additionally, a well-
built data warehouse must be designed for maintainability, enabling the warehouse
managers to plan for and manage change effectively while providing optimal sup-
port to users.
Recall that metadata is the description of a database, including its
schema definition. The metadata repository is a key data warehouse component.
The metadata repository includes both technical and business metadata. The first,
technical metadata, covers details of acquisition processing, storage structures, data
descriptions, warehouse operations and maintenance, and access support function-
ality. The second, business metadata, includes the relevant business rules and orga-
nizational details supporting the warehouse.
The architecture of the organization’s distributed computing environment is a
major determining characteristic for the design of the warehouse.
There are two basic distributed architectures: the distributed warehouse and the
federated warehouse. For a distributed warehouse, all the issues of distributed
databases are relevant, for example, replication, partitioning, communications, and
consistency concerns. A distributed architecture can provide benefits particularly
important to warehouse performance, such as improved load balancing, scalability
of performance, and higher availability. A single replicated metadata repository
would reside at each distribution site. The idea of the federated warehouse is like
that of the federated database: a decentralized confederation of autonomous data
warehouses, each with its own metadata repository. Given the magnitude of the
challenge inherent to data warehouses, it is likely that such federations will consist
of smaller scale components, such as data marts. Large organizations may choose to
federate data marts rather than build huge data warehouses.
5 Typical Functionality
of a Data Warehouse
Data warehouses exist to facilitate complex, data-intensive, and frequent ad hoc
queries. Accordingly, data warehouses must provide far greater and more efficient
query support than is demanded of transactional databases. The data warehouse
access component supports enhanced spreadsheet functionality, efficient query
processing, structured queries, ad hoc queries, data mining, and materialized views.
In particular, enhanced spreadsheet functionality includes support for state-of-the-
art spreadsheet applications (for example, MS Excel) as well as for OLAP applica-
tion programs. These offer preprogrammed functionalities such as the following:
■ Roll-up. Data is summarized with increasing generalization (for example,
weekly to quarterly to annually).
■ Drill-down. Increasing levels of detail are revealed (the complement of roll-
up).
■ Pivot. Cross tabulation (also referred to as rotation) is performed.
■ Slice and dice. Projection operations are performed on the dimensions.
■ Sorting. Data is sorted by ordinal value.
■ Selection. Data is available by value or range.
■ Derived (computed) attributes. Attributes are computed by operations on
stored and derived values.
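Several of these operations map directly onto SQL's grouping extensions. As a sketch over the hypothetical SALES table used earlier, GROUP BY ROLLUP produces the roll-up totals in a single statement, and restricting a dimension to one value is a slice:
    -- Roll-up: subtotals by (Category, Quarter), by Category, and a grand total
    SELECT   Category, Quarter, SUM(Revenue) AS Total_revenue
    FROM     SALES
    GROUP BY ROLLUP (Category, Quarter);
    -- Slice: fix one dimension value and project over the remaining ones
    SELECT   Product, Region, SUM(Revenue) AS Total_revenue
    FROM     SALES
    WHERE    Quarter = 'Q1'
    GROUP BY Product, Region;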
Because data warehouses are free from the restrictions of the transactional environ-
ment, there is an increased efficiency in query processing. Among the tools and
techniques used are query transformation; index intersection and union; special
ROLAP (relational OLAP) and MOLAP (multidimensional OLAP) functions; SQL
extensions; advanced join methods; and intelligent scanning (as in piggy-backing
multiple queries).
Improved performance has also been attained with parallel processing. Parallel
server architectures include symmetric multiprocessor (SMP), cluster, and mas-
sively parallel processing (MPP), and combinations of these.
Knowledge workers and decision makers use tools ranging from parametric queries
to ad hoc queries to data mining. Thus, the access component of the data warehouse
must provide support for structured queries (both parametric and ad hoc).
Together, these make up a managed query environment. Data mining itself uses
techniques from statistical analysis and artificial intelligence. Statistical analysis can
be performed by advanced spreadsheets, by sophisticated statistical analysis soft-
ware, or by custom-written programs. Techniques such as lagging, moving averages,
and regression analysis are also commonly employed. Artificial intelligence tech-
niques, which may include genetic algorithms and neural networks, are used for
classification and are employed to discover knowledge from the data warehouse that
may be unexpected or difficult to specify in queries.
6 Data Warehouse versus Views
Some people have considered data warehouses to be an extension of database views.
Materialized views are one way of meeting requirements for improved access to
data, and they have been explored for the performance enhancement they provide.
Views, however, provide only a subset of the functions and capabilities of data ware-
houses. Views and data warehouses are alike in that both provide read-only
extracts from databases and both can be subject oriented. However, data warehouses are dif-
ferent from views in the following ways:
■ Data warehouses exist as persistent storage instead of being materialized on
demand.
■ Data warehouses are not usually relational, but rather multidimensional.
Views of a relational database are relational.
■ Data warehouses can be indexed to optimize performance. Views cannot be
indexed independent of the underlying databases.
■ Data warehouses characteristically provide specific support of functionality;
views cannot.
■ Data warehouses provide large amounts of integrated and often temporal
data, generally more than is contained in one database, whereas views are an
extract of a database.
7 Difficulties of Implementing
Data Warehouses
Some significant operational issues arise with data warehousing: construction,
administration, and quality control. Project management—the design, construc-
tion, and implementation of the warehouse—is an important and challenging con-
sideration that should not be underestimated. The building of an enterprise-wide
warehouse in a large organization is a major undertaking, potentially taking years
from conceptualization to implementation. Because of the difficulty and amount of
lead time required for such an undertaking, the widespread development and
deployment of data marts may provide an attractive alternative, especially to those
organizations with urgent needs for OLAP, DSS, and/or data mining support.
The administration of a data warehouse is an intensive enterprise, proportional to
the size and complexity of the warehouse. An organization that attempts to admin-
ister a data warehouse must realistically understand the complex nature of its
administration. Although designed for read access, a data warehouse is no more a
static structure than any of its information sources. Source databases can be
expected to evolve. The warehouse’s schema and acquisition component must be
expected to be updated to handle these evolutions.
A significant issue in data warehousing is the quality control of data. Both quality
and consistency of data are major concerns. Although the data passes through a
cleaning function during acquisition, quality and consistency remain significant
issues for the database administrator. Melding data from heterogeneous and dis-
parate sources is a major challenge given differences in naming, domain definitions,
identification numbers, and the like. Every time a source database changes, the data
warehouse administrator must consider the possible interactions with other ele-
ments of the warehouse.
Usage projections should be estimated conservatively prior to construction of the
data warehouse and should be revised continually to reflect current requirements.
As utilization patterns become clear and change over time, storage and access paths
can be tuned to remain optimized for support of the organization’s use of its ware-
house. This activity should continue throughout the life of the warehouse in order
to remain ahead of demand. The warehouse should also be designed to accommo-
date the addition and attrition of data sources without major redesign. Sources and
source data will evolve, and the warehouse must accommodate such change.
Fitting the available source data into the data model of the warehouse will be a
continual challenge, a task that is as much art as science. Because there is continual
rapid change in technologies, both the requirements and capabilities of the ware-
house will change considerably over time. Additionally, data warehousing technol-
ogy itself will continue to evolve for some time so that component structures and
functionalities will continually be upgraded. This certainty of change is excellent
motivation for a fully modular design of components.
Administration of a data warehouse will require far broader skills than are needed
for traditional database administration. A team of highly skilled technical experts
with overlapping areas of expertise will likely be needed, rather than a single indi-
vidual. Like database administration, data warehouse administration is only partly
technical; a large part of the responsibility requires working effectively with all the
members of the organization with an interest in the data warehouse. However diffi-
cult that can be at times for database administrators, it is that much more challeng-
ing for data warehouse administrators, as the scope of their responsibilities is
considerably broader.
Design of the management function and selection of the management team for a
data warehouse are crucial. Managing the data warehouse in a large organization
will surely be a major task. Many commercial tools are available to support manage-
ment functions. Effective data warehouse management will certainly be a team func-
tion, requiring a wide set of technical skills, careful coordination, and effective
leadership. Just as we must prepare for the evolution of the warehouse, we must also
recognize that the skills of the management team will, of necessity, evolve with it.
8 Summary
In this chapter we surveyed the field known as data warehousing. Data warehousing
can be seen as a process that requires a variety of activities to precede it. In contrast,
data mining may be thought of as an activity that draws knowledge from an existing
data warehouse. We introduced key concepts related to data warehousing and we
discussed the special functionality associated with a multidimensional view of data.
We also discussed the ways in which data warehouses supply decision makers with
information at the correct level of detail, based on an appropriate organization and
perspective.
Review Questions
1. What is a data warehouse? How does it differ from a database?
2. Define the terms: OLAP (online analytical processing), ROLAP (relational
OLAP), MOLAP (multidimensional OLAP), and DSS (decision-support
systems).
3. Describe the characteristics of a data warehouse. Divide them into function-
ality of a warehouse and advantages users derive from it.
4. What is the multidimensional data model? How is it used in data ware-
housing?
5. Define the following terms: star schema, snowflake schema, fact constella-
tion, data marts.
6. What types of indexes are built for a warehouse? Illustrate the uses for each
with an example.
7. Describe the steps of building a warehouse.
8. What considerations play a major role in the design of a warehouse?
9. Describe the functions a user can perform on a data warehouse and illustrate
the results of these functions on a sample multidimensional data warehouse.
10. How is the concept of a relational view related to a data warehouse and data
marts? In what way are they different?
11. List the difficulties in implementing a data warehouse.
12. List the open issues and research problems in data warehousing.
Selected Bibliography
Inmon (1992, 2005) is credited with giving the term data warehouse wide acceptance. Codd and
Salley (1993) popularized the term online analytical processing (OLAP) and
defined a set of characteristics for data warehouses to support OLAP. Kimball
(1996) is known for his contribution to the development of the data warehousing
field. Mattison (1996) is one of several books on data warehousing that give a
comprehensive analysis of techniques available in data warehouses and the strate-
gies companies should use in deploying them. Ponniah (2002) gives a very good
practical overview of the data warehouse building process from requirements
collection to deployment and maintenance. Bischoff and Alexander (1997) is a compila-
tion of advice from experts. Chaudhuri and Dayal (1997) give an excellent tutorial
on the topic, while Widom (1995) points to a number of outstanding research
problems.
Alternative Diagrammatic
Notations for ER Models
Figure 1 shows a number of different diagrammatic notations for representing ER (Entity-Relationship)
and EER (Enhanced ER) model concepts. Unfortunately, there is no standard nota-
tion: different database design practitioners prefer different notations. Similarly, var-
ious CASE (computer-aided software engineering) tools and OOA (object-oriented
analysis) methodologies use various notations. Some notations are associated with
models that have additional concepts and constraints beyond those of the ER and
EER models described in the chapters “Data Modeling Using the Entity-Relationship
(ER) Model,” “The Enhanced Entity-Relationship (EER) Model,” and “Relational
Database Design by ER and EER-to-Relational Mapping,” while other models have
fewer concepts and constraints. The notation we used in the ER chapter is quite close
to the original notation for ER diagrams, which is still widely used. We discuss some
alternate notations here.
Figure 1(a) shows different notations for displaying entity types/classes, attributes,
and relationships. In the three chapters mentioned above, we used the symbols
marked (i) in Figure 1(a)—namely, rectangle, oval, and diamond. Notice that sym-
bol (ii) for entity types/classes, symbol (ii) for attributes, and symbol (ii) for rela-
tionships are similar, but they are used by different methodologies to represent three
different concepts. The straight line symbol (iii) for representing relationships is
used by several tools and methodologies.
Figure 1(b) shows some notations for attaching attributes to entity types. We used
notation (i). Notation (ii) uses the third notation (iii) for attributes from Figure
1(a). The last two notations in Figure 1(b)—(iii) and (iv)—are popular in OOA
methodologies and in some CASE tools. In particular, the last notation displays
both the attributes and the methods of a class, separated by a horizontal line.
Figure 1: Alternative notations. (a) Symbols for entity type/class, attribute, and relationship. (b) Displaying attributes. (c) Displaying cardinality ratios. (d) Various (min, max) notations. (e) Notations for displaying specialization/generalization.
Figure 1(c) shows various notations for representing the cardinality ratio of binary
relationships. We used notation (i) in the three chapters. Notation (ii)—known as
the chicken feet notation—is quite popular. Notation (iv) uses the arrow as a func-
tional reference (from the N to the 1 side) and resembles our notation for foreign
keys in the relational model; notation (v)—used in Bachman diagrams and the net-
work data model—uses the arrow in the reverse direction (from the 1 to the N side).
For a 1:1 relationship, (ii) uses a straight line without any chicken feet; (iii) makes
both halves of the diamond white; and (iv) places arrowheads on both sides. For an
M:N relationship, (ii) uses chicken feet at both ends of the line; (iii) makes both
halves of the diamond black; and (iv) does not display any arrowheads.
Figure 1(d) shows several variations for displaying (min, max) constraints, which
are used to display both cardinality ratio and total/partial participation. We mostly
used notation (i). Notation (ii) is the alternative notation we used in Figure 15 and
discussed in Section 7.4 of the ER chapter. Recall that our notation specifies the
constraint that each entity must participate in at least min and at most max rela-
tionship instances. Hence, for a 1:1 relationship, both max values are 1; for M:N,
both max values are n. A min value greater than 0 (zero) specifies total participation
(existence dependency). In methodologies that use the straight line for displaying
relationships, it is common to reverse the positioning of the (min, max) constraints,
as shown in (iii); a variation common in some tools (and in UML notation) is
shown in (v). Another popular technique—which follows the same positioning as
(iii)—is to display the min as o (“oh” or circle, which stands for zero) or as | (verti-
cal dash, which stands for 1), and to display the max as | (vertical dash, which stands
for 1) or as chicken feet (which stands for n), as shown in (iv).
Figure 1(e) shows some notations for displaying specialization/generalization. We
used notation (i) in the EER chapter, where a d in the circle specifies that the sub-
classes (S1, S2, and S3) are disjoint and an o in the circle specifies overlapping sub-
classes. Notation (ii) uses G (for generalization) to specify disjoint, and Gs to specify
overlapping; some notations use the solid arrow, while others use the empty arrow
(shown at the side). Notation (iii) uses a triangle pointing toward the superclass,
and notation (v) uses a triangle pointing toward the subclasses; it is also possible to
use both notations in the same methodology, with (iii) indicating generalization
and (v) indicating specialization. Notation (iv) places the boxes representing sub-
classes within the box representing the superclass. Of the notations based on (vi),
some use a single-lined arrow, and others use a double-lined arrow (shown at the
side).
The notations in Figure 1 are only some of the diagrammatic symbols that
have been used or suggested for displaying database conceptual schemas. Other
notations, as well as various combinations of the preceding, have also been used. It
would be useful to establish a standard that everyone would adhere to, in order to
prevent misunderstandings and reduce confusion.
Parameters of Disks
The most important disk parameter is the time required to locate an arbitrary disk block, given its
block address, and then to transfer the block between the disk and a main memory
buffer. This is the random access time for accessing a disk block. There are three
time components to consider as follows:
1. Seek time (s). This is the time needed to mechanically position the
read/write head on the correct track for movable-head disks. (For fixed-head
disks, it is the time needed to electronically switch to the appropriate
read/write head.) For movable-head disks, this time varies, depending on the
distance between the current track under the read/write head and the track
specified in the block address. Usually, the disk manufacturer provides an
average seek time in milliseconds. The typical range of average seek time is 4
to 10 msec. This is the main culprit for the delay involved in transferring
blocks between disk and memory.
2. Rotational delay (rd). Once the read/write head is at the correct track, the
user must wait for the beginning of the required block to rotate into position
under the read/write head. On average, this takes about the time for half a
revolution of the disk, but it actually ranges from immediate access (if the
start of the required block is in position under the read/write head right after
the seek) to a full disk revolution (if the start of the required block just passed
the read/write head after the seek). If the speed of disk rotation is p revolu-
tions per minute (rpm), then the average rotational delay rd is given by
rd = (1/2) * (1/p) min = (60 * 1000)/(2 * p) msec = 30000/p msec
A typical value for p is 10,000 rpm, which gives a rotational delay of rd = 3
msec. For fixed-head disks, where the seek time is negligible, this component
causes the greatest delay in transferring a disk block.
3. Block transfer time (btt). Once the read/write head is at the beginning of
the required block, some time is needed to transfer the data in the block.
This block transfer time depends on the block size, track size, and rotational
speed. If the transfer rate for the disk is tr bytes/msec and the block size is B
bytes, then
btt = B/tr msec
If we have a track size of 50 Kbytes and p is 3600 rpm, then the transfer rate
in bytes/msec is
tr = (50 * 1000)/(60 * 1000/3600) = 3000 bytes/msec
In this case, btt = B/3000 msec, where B is the block size in bytes.
The average time needed to find and transfer a block, given its block address, is
estimated by
(s + rd + btt) msec
This holds for either reading or writing a block. The principal method of reducing
this time is to transfer several blocks that are stored on one or more tracks of the
same cylinder; then the seek time is required for the first block only. To transfer con-
secutively k noncontiguous blocks that are on the same cylinder, we need approxi-
mately
s + (k * (rd + btt)) msec
In this case, we need two or more buffers in main storage because we are continu-
ously reading or writing the k blocks. The transfer time per block is reduced even
further when consecutive blocks on the same track or cylinder are transferred. This
eliminates the rotational delay for all but the first block, so the estimate for transfer-
ring k consecutive blocks is
s + rd + (k * btt) msec
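To illustrate the effect of these formulas, suppose (purely as assumed values) that s = 6 msec, rd = 3 msec (p = 10,000 rpm), and btt = 1 msec. Then reading k = 100 blocks costs approximately:
    100 * (s + rd + btt) = 100 * 10 = 1,000 msec   if the blocks are scattered randomly
    s + 100 * (rd + btt) = 6 + 400  =   406 msec   if they are noncontiguous but on one cylinder
    s + rd + 100 * btt   = 9 + 100  =   109 msec   if they are stored consecutively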
A more accurate estimate for transferring consecutive blocks takes into account the
interblock gap, which includes the information that enables the read/write head to
determine which block it is about to read. Usually, the disk manufacturer provides a
bulk transfer rate (btr) that takes the gap size into account when reading consecu-
tively stored blocks. If the gap size is G bytes, then
btr = (B/(B + G)) * tr bytes/msec
The bulk transfer rate is the rate of transferring useful bytes in the data blocks. The
disk read/write head must go over all bytes on a track as the disk rotates, including
the bytes in the interblock gaps, which store control information but not real data.
When the bulk transfer rate is used, the time needed to transfer the useful data in
one block out of several consecutive blocks is B/btr. Hence, the estimated time to
read k blocks consecutively stored on the same cylinder becomes
s + rd + (k * (B/btr)) msec
Another parameter of disks is the rewrite time. This is useful in cases when we read
a block from the disk into a main memory buffer, update the buffer, and then write
the buffer back to the same disk block on which it was stored. In many cases, the
time required to update the buffer in main memory is less than the time required
for one disk revolution. If we know that the buffer is ready for rewriting, the system
can keep the disk heads on the same track, and during the next disk revolution the
updated buffer is rewritten back to the disk block. Hence, the rewrite time Trw is
usually estimated to be the time needed for one disk revolution:
Trw = 2 * rd msec = 60000/p msec
To summarize, the following is a list of the parameters we have discussed and the
symbols we use for them:
Seek time: s msec
Rotational delay: rd msec
Block transfer time: btt msec
Rewrite time: Trw msec
Transfer rate: tr bytes/msec
Bulk transfer rate: btr bytes/msec
Block size: B bytes
Interblock gap size: G bytes
Disk speed: p rpm (revolutions per minute)
Overview of the QBE
Language
The Query-By-Example (QBE) language is important because it is one of the first graphical query
languages with minimum syntax developed for database systems. It was developed
at IBM Research and is available as an IBM commercial product as part of the QMF
(Query Management Facility) interface option to DB2. The language was also
implemented in the Paradox DBMS, and is related to a point-and-click type inter-
face in the Microsoft Access DBMS. It differs from SQL in that the user does not
have to explicitly specify a query using a fixed syntax; rather, the query is formulated
by filling in templates of relations that are displayed on a monitor screen. Figure 1
shows how these templates may look for the database. The user does not have to
remember the names of attributes or relations because they are displayed as part of
these templates. Additionally, the user does not have to follow rigid syntax rules for
query specification; rather, constants and variables are entered in the columns of the
templates to construct an example related to the retrieval or update request. QBE is
related to the domain relational calculus, as we shall see, and its original specifica-
tion has been shown to be relationally complete.
1 Basic Retrievals in QBE
In QBE retrieval queries are specified by filling in one or more rows in the templates
of the tables. For a single relation query, we enter either constants or example ele-
ments (a QBE term) in the columns of the template of that relation. An example
element stands for a domain variable and is specified as an example value preceded
by the underscore character (_). Additionally, a P. prefix (called the P dot operator)
is entered in certain columns to indicate that we would like to print (or display)
values in those columns for our result. The constants specify values that must be
exactly matched in those columns.
For example, consider the query Q0: Retrieve the birth date and address of John B.
Smith. In Figures 2(a) through 2(d) we show how this query can be specified in a
progressively more terse form in QBE. In Figure 2(a) an example of an employee is
presented as the type of row that we are interested in. By leaving John B. Smith as
constants in the Fname, Minit, and Lname columns, we are specifying an exact match
in those columns. The rest of the columns are preceded by an underscore indicating
that they are domain variables (example elements). The P. prefix is placed in the
Bdate and Address columns to indicate that we would like to output value(s) in
those columns.
Q0 can be abbreviated as shown in Figure 2(b). There is no need to specify example
values for columns in which we are not interested. Moreover, because example val-
ues are completely arbitrary, we can just specify variable names for them, as shown
in Figure 2(c). Finally, we can also leave out the example values entirely, as shown in
Figure 2(d), and just specify a P. under the columns to be retrieved.
To see how retrieval queries in QBE are similar to the domain relational calculus,
compare Figure 2(d) with Q0 (simplified) in domain calculus as follows:
Q0 : { uv | EMPLOYEE(qrstuvwxyz) and q=‘John’ and r=‘B’ and s=‘Smith’}
Figure 1: The relational schema of Figure 5 in the chapter "The Relational Data Model and Relational Database Constraints" as it may be displayed by QBE:
EMPLOYEE (Fname, Minit, Lname, Ssn, Bdate, Address, Sex, Salary, Super_ssn, Dno)
DEPARTMENT (Dname, Dnumber, Mgr_ssn, Mgr_start_date)
DEPT_LOCATIONS (Dnumber, Dlocation)
PROJECT (Pname, Pnumber, Plocation, Dnum)
WORKS_ON (Essn, Pno, Hours)
DEPENDENT (Essn, Dependent_name, Sex, Bdate, Relationship)
We can think of each column in a QBE template as an implicit domain variable;
hence, Fname corresponds to the domain variable q, Minit corresponds to r, ..., and
Dno corresponds to z. In the QBE query, the columns with P. correspond to variables
specified to the left of the bar in domain calculus, whereas the columns with con-
stant values correspond to tuple variables with equality selection conditions on
them. The condition EMPLOYEE(qrstuvwxyz) and the existential quantifiers are
implicit in the QBE query because the template corresponding to the EMPLOYEE
relation is used.
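For comparison, one possible SQL formulation of the same query Q0 over the EMPLOYEE relation of Figure 1 is:
    SELECT Bdate, Address
    FROM   EMPLOYEE
    WHERE  Fname = 'John' AND Minit = 'B' AND Lname = 'Smith';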
In QBE, the user interface first allows the user to choose the tables (relations)
needed to formulate a query by displaying a list of all relation names. Then the tem-
plates for the chosen relations are displayed. The user moves to the appropriate
columns in the templates and specifies the query. Special function keys are provided
to move among templates and perform certain functions.
We now give examples to illustrate basic facilities of QBE. Comparison operators
other than = (such as > or ≥) may be entered in a column before typing a constant
value. For example, the query Q0A: List the social security numbers of employees who
work more than 20 hours per week on project number 1 can be specified as shown in
Figure 3(a). For more complex conditions, the user can ask for a condition box,
which is created by pressing a particular function key. The user can then type the
complex condition.1
Figure 2: Four ways to specify the query Q0 in QBE, using the EMPLOYEE template. (a) A full example row: John, B, Smith, _123456789, P._9/1/60, P._100 Main, Houston, TX, _M, _25000, _123456789, _3. (b) Only the constants John, B, Smith, with P._9/1/60 under Bdate and P._100 Main, Houston, TX under Address. (c) The constants John, B, Smith, with P._X under Bdate and P._Y under Address. (d) The constants John, B, Smith, with just P. under Bdate and Address.
1Negation with the ¬ symbol is not allowed in a condition box.
For example, the query Q0B: List the social security numbers of employees who work
more than 20 hours per week on either project 1 or project 2 can be specified as shown
in Figure 3(b).
Some complex conditions can be specified without a condition box. The rule is that
all conditions specified on the same row of a relation template are connected by the
and logical connective (all must be satisfied by a selected tuple), whereas conditions
specified on distinct rows are connected by or (at least one must be satisfied).
Hence, Q0B can also be specified, as shown in Figure 3(c), by entering two distinct
rows in the template.
Now consider query Q0C: List the social security numbers of employees who work on
both project 1 and project 2; this cannot be specified as in Figure 4(a), which lists
those who work on either project 1 or project 2. The example variable _ES will bind
itself to Essn values in <–, 1, –> tuples as well as to those in <–, 2, –> tuples. Figure
4(b) shows how to specify Q0C correctly, where the condition (_EX = _EY) in the
box makes the _EX and _EY variables bind only to identical Essn values.
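The and/or distinction between rows and columns can be seen by comparing possible SQL formulations of Q0B and Q0C over the WORKS_ON relation of Figure 1:
    -- Q0B: more than 20 hours per week on either project 1 or project 2
    SELECT Essn
    FROM   WORKS_ON
    WHERE  Hours > 20 AND (Pno = 1 OR Pno = 2);
    -- Q0C: employees who work on both project 1 and project 2
    SELECT Essn
    FROM   WORKS_ON
    WHERE  Pno = 1
      AND  Essn IN ( SELECT Essn FROM WORKS_ON WHERE Pno = 2 );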
In general, once a query is specified, the resulting values are displayed in the template
under the appropriate columns. If the result contains more rows than can be dis-
played on the screen, most QBE implementations have function keys to allow scroll-
ing up and down the rows. Similarly, if a template or several templates are too wide to
appear on the screen, it is possible to scroll sideways to examine all the templates.
A join operation is specified in QBE by using the same variable2 in the columns to
be joined. For example, the query Q1: List the name and address of all employees who
2A variable is called an example element in QBE manuals.
Figure 3: Specifying complex conditions in QBE, using the WORKS_ON template (Essn, Pno, Hours). (a) The query Q0A: P. under Essn, 1 under Pno, and > 20 under Hours. (b) The query Q0B with a condition box: P. under Essn, _PX under Pno, _HX under Hours, and the condition box entry _HX > 20 and (_PX = 1 or _PX = 2). (c) The query Q0B without a condition box: two rows, (P., 1, > 20) and (P., 2, > 20).
work for the ‘Research’ department can be specified as shown in Figure 5(a). Any
number of joins can be specified in a single query. We can also specify a result table
to display the result of the join query, as shown in Figure 5(a); this is needed if the
result includes attributes from two or more relations. If no result table is specified,
the system provides the query result in the columns of the various relations, which
may make it difficult to interpret. Figure 5(a) also illustrates the feature of QBE for
specifying that all attributes of a relation should be retrieved, by placing the P. oper-
ator under the relation name in the relation template.
To join a table with itself, we specify different variables to represent the different ref-
erences to the table. For example, query Q8: For each employee retrieve the employee’s
first and last name as well as the first and last name of his or her immediate supervisor
can be specified as shown in Figure 5(b), where the variables starting with E refer to
an employee and those starting with S refer to a supervisor.
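Possible SQL formulations of the join query Q1 and the self-join query Q8 make the role of the shared variables (_DX and _Xssn) explicit as join conditions:
    -- Q1: name and address of employees who work for the 'Research' department
    SELECT E.Fname, E.Lname, E.Address
    FROM   EMPLOYEE E, DEPARTMENT D
    WHERE  D.Dname = 'Research' AND D.Dnumber = E.Dno;
    -- Q8: each employee's first and last name with those of the supervisor
    SELECT E.Fname, E.Lname, S.Fname, S.Lname
    FROM   EMPLOYEE E, EMPLOYEE S
    WHERE  E.Super_ssn = S.Ssn;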
2 Grouping, Aggregation, and Database
Modification in QBE
Next, consider the types of queries that require grouping or aggregate functions. A
grouping operator G. can be specified in a column to indicate that tuples should be
grouped by the value of that column. Common functions can be specified, such as
AVG., SUM., CNT. (count), MAX., and MIN. In QBE the functions AVG., SUM., and
CNT. are applied to distinct values within a group in the default case. If we want
these functions to apply to all values, we must use the prefix ALL.3 This convention
is different in SQL, where the default is to apply a function to all values.
Figure 4: Specifying EMPLOYEES who work on both projects, using the WORKS_ON template. (a) Incorrect specification of an AND condition: two rows, (P._ES, 1) and (P._ES, 2). (b) Correct specification: two rows, (P._EX, 1) and (P._EY, 2), with the condition box entry _EX = _EY.
3ALL in QBE is unrelated to the universal quantifier.
Figure 6(a) shows query Q23, which counts the number of distinct salary values in
the EMPLOYEE relation. Query Q23A (Figure 6(b)) counts all salary values, which is
the same as counting the number of employees. Figure 6(c) shows Q24, which
retrieves each department number and the number of employees and average salary
within each department; hence, the Dno column is used for grouping as indicated by
the G. function. Several of the operators G., P., and ALL can be specified in a single
column. Figure 6(d) shows query Q26, which displays each project name and the
number of employees working on it for projects on which more than two employees
work.
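In SQL, the grouping queries Q24 and Q26 could be written roughly as follows (recall that an SQL aggregate applies to all values in the group unless DISTINCT is specified):
    -- Q24: department number, number of employees, and average salary per department
    SELECT   Dno, COUNT(*), AVG(Salary)
    FROM     EMPLOYEE
    GROUP BY Dno;
    -- Q26: project name and employee count, for projects with more than two employees
    SELECT   Pname, COUNT(*)
    FROM     PROJECT, WORKS_ON
    WHERE    Pnumber = Pno
    GROUP BY Pname
    HAVING   COUNT(*) > 2;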
QBE has a negation symbol, ¬, which is used in a manner similar to the NOT EXISTS
function in SQL. Figure 7 shows query Q6, which lists the names of employees who
have no dependents. The negation symbol ¬ says that we will select values of the
_SX variable from the EMPLOYEE relation only if they do not occur in the
DEPENDENT relation. The same effect can be produced by placing a ¬ _SX in the
Essn column.
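A corresponding SQL formulation of Q6 uses NOT EXISTS in essentially the same way:
    -- Q6: names of employees who have no dependents
    SELECT Fname, Lname
    FROM   EMPLOYEE E
    WHERE  NOT EXISTS ( SELECT *
                        FROM   DEPENDENT D
                        WHERE  D.Essn = E.Ssn );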
Although the QBE language as originally proposed was shown to support
the equivalent of the EXISTS and NOT EXISTS functions of SQL, the QBE imple-
mentation in QMF (under the DB2 system) does not provide this support. Hence,
the QMF version of QBE, which we discuss here, is not relationally complete.
Queries such as Q3: Find employees who work on all projects controlled by depart-
ment 5 cannot be specified.
Figure 5: Illustrating JOIN and result relations in QBE. (a) The query Q1: in the EMPLOYEE template, _FN, _LN, and _Addr under Fname, Lname, and Address and _DX under Dno; in the DEPARTMENT template, Research under Dname and _DX under Dnumber; and a RESULT table row P. _FN _LN _Addr. (b) The query Q8: two EMPLOYEE rows, one with _E1 and _E2 under Fname and Lname and _Xssn under Super_ssn, the other with _S1 and _S2 under Fname and Lname and _Xssn under Ssn; and a RESULT table row P. _E1 _E2 _S1 _S2.
There are three QBE operators for modifying the database: I. for insert, D. for delete,
and U. for update. The insert and delete operators are specified in the template col-
umn under the relation name, whereas the update operator is specified under the
columns to be updated. Figure 8(a) shows how to insert a new EMPLOYEE tuple. For
deletion, we first enter the D. operator and then specify the tuples to be deleted by a
condition (Figure 8(b)). To update a tuple, we specify the U. operator under the
attribute name, followed by the new value of the attribute. We should also select the
tuple or tuples to be updated in the usual way. Figure 8(c) shows an update request
Figure 6: Functions and grouping in QBE. (a) The query Q23: P.CNT. under Salary in the EMPLOYEE template. (b) The query Q23A: P.CNT.ALL under Salary. (c) The query Q24: P.AVG.ALL under Salary, with G., P., and CNT.ALL applied to the Dno column. (d) The query Q26: in the PROJECT template, P. under Pname and _PX under Pnumber; in the WORKS_ON template, P.CNT._EX under Essn and G._PX under Pno; and the condition box entry CNT._EX > 2.
Figure 7: Illustrating negation by the query Q6. In the EMPLOYEE template, P. under Fname and Lname and _SX under Ssn; in the DEPENDENT template, ¬ under the relation name and _SX under Essn.
to increase the salary of ‘John Smith’ by 10 percent and also to reassign him to
department number 4.
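The update request of Figure 8(c) corresponds to an SQL UPDATE statement of the following form:
    UPDATE EMPLOYEE
    SET    Salary = 1.1 * Salary, Dno = 4
    WHERE  Fname = 'John' AND Lname = 'Smith';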
QBE also has data definition capabilities. The tables of a database can be specified
interactively, and a table definition can also be updated by adding, renaming, or
removing a column. We can also specify various characteristics for each column,
such as whether it is a key of the relation, what its data type is, and whether an index
should be created on that field. QBE also has facilities for view definition, authoriza-
tion, storing query definitions for later use, and so on.
QBE does not use the linear style of SQL; rather, it is a two-dimensional language
because users specify a query by moving around the full area of the screen. Tests on
users have shown that QBE is easier to learn than SQL, especially for nonspecialists.
In this sense, QBE was the first user-friendly visual relational database language.
More recently, numerous other user-friendly interfaces have been developed for
commercial database systems. The use of menus, graphics, and forms is now
becoming quite common. Filling forms partially to issue a search request is akin to
using QBE. Visual query languages, which are still not so common, are likely to be
offered with commercial relational databases in the future.
Figure 8: Modifying the database in QBE. (a) Insertion: I. under the EMPLOYEE relation name, with the new tuple values Richard, K, Marini, 653298653, 30-Dec-52, 98 Oak Forest, Katy, TX, M, 37000, 987654321, 4. (b) Deletion: D. under the relation name and 653298653 under Ssn. (c) Update: John under Fname, Smith under Lname, U._S*1.1 under Salary, and U.4 under Dno.
IEEE, 416
middleware, 46, 48, 886
protocol, 499, 892, 916, 919, 921
three-tier client/server, 46, 52
Arguments, 10, 328, 338, 360, 365, 369, 379, 406,
497-499, 503, 542, 554, 798, 968-971,
973-974
array, 497-499, 503
example of, 542, 970
multiple, 338, 497, 798, 970
names, 10, 360, 379, 406, 503, 968-969, 971
of parameters, 379
passing, 360
Arithmetic, 62, 100-101, 134, 169, 438, 581, 583, 613,
742, 866, 970
expression, 438
operators, 100-101, 970
Arithmetic operations, 134, 169, 970
arithmetic operators, 100-101
list of, 101
Array, 310, 363, 373-375, 377, 385-386, 388-389, 393,
414, 479, 492-493, 495-501, 503-505, 591,
611-613, 617, 623-625, 667-669, 676, 960
accessing, 499-500, 505, 667, 676
elements of, 363, 388
of objects, 363, 386
ordered, 363, 388, 611, 668, 676
size, 310, 363, 385, 389, 393, 617, 960
size of, 310, 617
variable, 363, 479, 492-493, 496-497, 499-500,
503-504
array of, 310, 363, 591, 611-612, 617, 623-625, 669
code, 612, 623
Array variables:, 496
Arrays, 409, 459, 492-493, 495-497, 505, 588,
623-624, 630, 635, 965
element of, 409
elements, 409, 492, 495-497, 505, 965
higher dimensional, 496
parallel, 623
parameters, 492-493
string, 493, 495, 497, 505
variables, 459, 495-497, 505
arrays, and, 505
AS:, 58, 168, 538, 549, 585, 727, 849, 923, 931, 1020
ASCII, 612
aspects, 30, 40, 56, 81, 143, 268, 281, 309, 313, 334,
350, 355, 381, 416, 486, 523, 842, 853, 965,
998-999, 1032, 1043
Assertion, 62, 70, 83, 92, 115, 131-132, 139
Assertions, 70, 84, 105, 131, 141, 481, 582, 1006
assessment, 868-869, 875, 998, 1000, 1002
assets, 868
assignment, 11, 58, 494, 667, 738, 862, 940, 970
declaration, 940
local, 58
statement, 934, 940
this, 11, 58, 494, 667, 738, 862, 940, 970
Assignments, 178, 497-499, 984
Association, 29, 81, 214, 229-230, 269-271, 346, 385,
389, 867, 887, 944, 961-962, 983, 988, 1025
associative, 157, 389, 493, 496-497, 712, 723, 834
sequence, 496
Associative array, 493, 496-497, 499
Assurance, 841-842
asterisk (*), 97, 125, 229
Atom, 179-180, 186-187, 190, 362-363, 382, 390
AT&T, 25, 358, 417
Attacks, 851, 856-859, 868-869, 874-875, 1027
types of, 857, 874
attribute values, 72-73, 89-90, 92-93, 95, 97, 100,
103-104, 126, 146, 178-179, 185, 206, 208,
211, 221, 270, 361, 373, 390, 401, 457,
468-469, 510, 514, 693, 849-851, 963
Attributes, 29, 57-61, 63-67, 69-70, 72-76, 79-80, 85,
87, 89-91, 93, 95-97, 99, 101-103, 105-107,
109, 119-120, 122-126, 134-135, 137,
140-141, 150-156, 158, 160-163, 166-170,
172-176, 178-179, 181-182, 186-187,
189-190, 193, 196, 202, 205-214, 216,
220-224, 226, 229-230, 234-237, 241, 245,
249-251, 261-262, 264-266, 273-276,
278-280, 282, 284, 287, 289-300, 320-322,
333, 342, 350, 360-361, 364-369, 372-375,
378-380, 386, 390-392, 394-395, 398-402,
404, 406, 410-411, 413-415, 421-422,
427-430, 437-439, 441-442, 448, 459-460,
464-465, 471-472, 502, 508-511, 513-517,
519-526, 528, 537-540, 542-546, 551-553,
555-560, 565-570, 572-575, 581-582,
584-585, 592, 675-676, 692, 697-700, 727,
734-738, 741-742, 849, 886, 895-896, 898,
904-906, 945-946, 948, 1040, 1050-1051
of entities, 202, 205, 209-210, 214, 234, 261, 265,
274
Audio, 1, 591, 600, 623, 930, 963-964, 967, 982, 984,
988, 996, 998, 1017, 1029
compression, 964
audio files, 996
auditing, 757, 875, 919
Australia, 245
Authentication, 628, 842, 855-858, 865-867, 919
digital signatures, 866
intrusion detection, 842
means of, 842, 866
password-based, 919
summary, 919
authorization, 17, 38, 83-84, 106, 135, 416, 460, 838,
840, 843-846, 851-852, 854, 871-873, 875,
1065
Autocommit, 467
Autoindexing, 1032
Average, 3, 11, 125-126, 141-142, 169-171, 191, 193,
276, 284, 312, 333, 369, 406-407, 410, 485,
570, 595-596, 603, 607-608, 610-611, 629,
631-634, 645, 654, 659-660, 670, 675, 682,
717-722, 860-861, 955, 1005, 1029,
1054-1055, 1063
average access time, 610
average seek time, 631, 634, 1054
B
background, 863, 916, 966, 989
noise, 966
backgrounds, 961
Backing up, 592, 597-598, 830
Backup utility, 41
backups, 41, 628
Backward compatibility, 916
Bag, 92, 153, 363, 367-368, 373-374, 384-386,
388-389, 401, 403, 405, 407-408, 410-411,
414
Balanced tree, 657, 960
base, 33, 43, 56, 85, 105, 123, 133-139, 142, 232,
266, 337, 361, 514-518, 520, 525, 549, 585,
650, 657, 723-724, 735, 737-738, 744,
846-847, 926, 936, 951, 958, 976, 979, 986,
1024
identifying, 232
Base class, 266
Basis, 48-49, 55, 148-149, 177, 189, 247, 249, 273,
333, 378, 706, 729, 966, 989, 1004, 1020
Batch processing, 328
Berg, 635
bgcolor attribute, 497
Binary operation, 706
Binary relationship, 216, 218, 223-224, 226, 229,
231-233, 236, 241, 364, 390, 399, 401-402,
412
Binary search, 608, 610-611, 629, 632-633, 637-638,
640, 642, 645, 648-649, 675, 679, 682, 691,
718
Binary search algorithm, 645
Binary search trees, 679
Binary trees, 654
full, 654
Bioinformatics, 962, 1022
Bit, 87, 121, 591, 593-594, 607, 617, 619, 623, 625,
666, 669-671, 813-814, 864-865, 1041-1042
Bit string, 87
Bitmap, 636-637, 668-671, 674-676, 1041-1042
Bitmap index, 668-671, 676, 1042
Bits, 87-88, 298, 593, 597, 617, 619, 624-625, 630,
635, 636, 667-671, 864, 964, 1042
BitTorrent, 996
BLOB, 88, 600
Block, 55, 92, 120, 336, 590, 594-599, 602-605,
607-611, 614-615, 617, 619, 623-626,
629-634, 638-649, 651-661, 668, 673-674,
676-678, 681, 686-692, 694-695, 716-719,
721-723, 725-726, 741, 749-750, 758,
798-799, 813-814, 823, 835, 843, 864, 866,
934-935, 1054-1056
Block transfer, 596, 599, 629, 631-632, 634,
1055-1056
Blocks, 117, 120, 332, 381, 588, 594-598, 600-611,
614-616, 621, 625, 629-633, 636-638,
640-641, 645-649, 651-652, 655-656, 658,
666, 674-675, 677-678, 681-682, 686-690,
694-695, 698-701, 716-719, 721-722, 729,
744-745, 750, 755, 758, 801, 814, 823, 864,
908, 1054-1055
record blocking, 602
blogs, 999
<body>, 423-424
call, 482, 980
books, 3, 24, 79, 143, 191-193, 239, 267, 276-277,
319, 354-355, 416, 424, 427, 455, 480,
485-486, 505-506, 547, 586, 679, 746, 835,
923, 927, 963, 982, 997, 1032, 1049
Boolean, 87-88, 93, 116-117, 120, 150-152, 178-179,
211, 297-298, 366, 382-386, 408, 467, 574,
600, 604, 670, 713, 957, 999-1002,
1006-1007, 1014, 1029-1030
false, 88, 116-117, 120, 152, 178-179, 386, 408,
600, 1014
true, 88, 93, 116-117, 120, 151-152, 178-179, 386,
408, 600, 1007
Boolean condition, 93, 179, 408
Boolean values, 382
border, 423
Braces, 140, 209, 528
Brackets, 102, 129, 140, 182, 423, 425, 464, 708
Branches, 239-240, 286, 713
break, 154, 184, 446, 483, 713-714, 864, 894, 1012
do, 154, 184, 446, 483, 714, 894
if, 184, 446, 483, 713-714, 894, 1012
loops, 483
brightness, 965
Browser, 46, 441, 490-491, 493, 892, 926, 963
Browsers, 13, 331, 892
primary, 13
B-tree, 622, 652, 654-660, 665, 675-676, 678-679
Bubble sort, 688
Buckets, 614-621, 630-631, 633-635, 666-668, 673,
682-683, 695, 702, 959-960
Buffer, 38, 40, 595-596, 598, 604-605, 607-608, 611,
644, 679, 688-690, 698, 715, 723, 726, 729,
739-740, 750-751, 757-758, 805, 812-815,
819-820, 825-826, 830, 832-833, 957, 1054,
1056
Buffering, 14, 17-18, 45, 588, 598, 603, 610, 629-630,
632-633, 698, 812, 831
cache, 812
single, 598, 633, 698, 831
Bug, 857
Bugs, 754
Bus, 43, 590, 887
businesses, 628-629, 1018
button, 493
buttons, 14
byte, 10, 593, 600-601, 607, 625, 632, 677
bytes, 3, 10, 474, 593-594, 596-597, 599-602,
624-625, 631-632, 634, 640, 659, 670-671,
677, 708, 902-904, 1005, 1055-1056
C
C, 5-6, 8, 12, 16-17, 19, 36, 40, 54, 77, 80, 83, 93-95,
99-100, 107-109, 112, 116-117, 141-142,
147, 151-153, 179-180, 186, 190-191,
193-197, 231-233, 237, 239-240, 243, 245,
262, 265, 271-272, 280-283, 286, 290, 292,
296-298, 304, 324, 352, 354, 363, 365, 367,
375-376, 379-382, 394, 396, 407-408,
411-416, 446, 448, 454, 457-463, 465-466,
468, 471-475, 484-485, 487, 490-493,
522-523, 527-529, 534-537, 542, 545-549,
562-563, 569-573, 585, 598-602, 612,
631-632, 634, 677-679, 707, 709-714,
720-721, 725, 735, 752-753, 760-761,
767-771, 778, 809, 816-818, 826-828,
848-851, 884-885, 898, 923-924, 978,
985-987, 1020-1021, 1059-1061, 1063-1065
C++, 8, 17, 36, 40, 83, 93, 265, 352, 358-359, 363,
365, 367, 381-382, 394, 402, 411-416, 454,
457, 459, 471, 892
C#, 262, 454, 892
C programming language, 359, 380, 468, 484, 491,
495, 600
Cables, 627, 879
Cache memory, 590
Calendars, 943, 1043
Call statement, 482
callbacks, 428
Canada, 245
Cancel, 341, 771, 840
Candidate keys, 65-66, 76-77, 524, 526, 532-533,
535, 543, 548, 552, 558, 573
Canonicalization, 855
Cards, 548, 590, 1043
Cartesian product, 59, 124, 149, 158-161, 163-164,
167, 189, 212, 214, 702, 708-709, 711,
713-714, 720, 724, 728-729
cartridges, 598
cascade, 73, 90-91, 107, 109, 138-139, 146, 152, 293,
713
case, 4, 9, 18, 33, 35-36, 41-42, 51, 62, 65, 67, 69,
73-74, 87, 91, 95-96, 109, 128, 130, 134,
136-137, 153-154, 180, 182, 187, 221, 226,
229, 232-234, 252-254, 258-259, 270-271,
276, 284, 287, 291, 330, 332-333, 335-338,
349, 351-352, 404, 446, 461-462, 465,
526-528, 537-538, 562-564, 577-578, 590,
596-598, 601-604, 610-611, 614-615, 624,
634, 637, 640-641, 665, 671-672, 679,
699-701, 703-704, 717-723, 726, 749-750,
761-763, 767-768, 819-820, 833, 896-898,
910-911, 977, 979
error, 319, 461, 624, 626, 628, 774
Case sensitive, 87, 494
case statement, 672
Case study, 276
CASE tools, 41, 319, 351, 1050
Catalog, 1, 3, 8-11, 14, 19, 25, 31, 34-35, 38-41,
84-85, 138, 196, 237, 276-277, 340-343,
622, 685, 693, 703, 713, 716-717, 883,
889-890, 905, 919, 921-922, 997, 1017
Cell, 80, 590, 667, 963-964, 1010, 1038
Cell phone, 80
Cells, 667, 963-964, 966
central processing unit, 589, 716
central processing unit (CPU), 589
networks, 589
programs, 748
software, 589
speed, 589
Certificate, 866-867, 874, 919
certification, 772, 780, 797, 805-806, 867
Certification Authority (CA), 867
Chaining, 612-614, 616, 618, 630-631, 633-634, 968
change, 3-4, 7, 10, 19-20, 24, 30-31, 33-34, 58, 72-74,
80, 82, 89, 109, 137, 139, 141, 194,
223-224, 243, 267, 313, 318-320, 339-340,
344, 347, 349, 356, 372, 391-392, 396, 418,
425, 430, 516, 562, 604, 606, 633, 655, 673,
714, 717, 752-753, 760-761, 824-826,
856-857, 950-951, 958-959, 977, 985, 991,
1036, 1044-1045
Channels, 628, 836, 839, 861-863, 873-874, 965
synchronization, 862
chapters, 19, 319, 329, 355-356, 987, 1050, 1052
Character data, 429, 956
character strings, 57, 88, 151, 382, 612, 666, 969, 999
data and, 88
Characters, 3, 5, 9-10, 19, 64, 87, 151, 211, 382, 461,
493-495, 600-603, 612, 630, 634, 858-859,
1007
formatting, 425, 601
order of, 603, 999
special, 64, 382, 461, 494, 601-602
storing, 3, 593, 600
Charts, 318-319, 995
Check, 13, 70, 74-75, 77, 89-90, 92, 117, 120,
131-132, 276-277, 402, 420, 465, 480-481,
484, 524, 558, 560-561, 567, 573, 581, 673,
702, 719, 755-756, 772, 787, 793, 797-798,
800, 807, 934-935, 958, 962, 977, 984, 1043
Checkpoint, 815-816, 819-822, 825-827, 830, 832
Child, 166, 221, 345, 437-438, 442, 444, 651-652,
723, 802-803, 960
child class, 345
China, 245, 998
Choice, 18, 52, 58, 66, 109, 221, 224, 229, 261, 264,
282, 301-302, 315-317, 331-332, 338, 350,
353-354, 401, 592, 626, 740, 989, 997,
1000, 1036, 1044
ciphertext, 863-865
circles, 182, 250, 370, 398, 594, 708
circular, 85, 91, 593
class, 4-10, 17, 19, 29-30, 37, 47-48, 54, 57, 70, 77,
108-109, 112-113, 147, 196, 202, 210, 226,
228-230, 235-236, 242, 246-248, 251,
255-259, 262-263, 265-267, 269-271, 275,
280, 285, 299, 320, 328, 342-343, 345-349,
356, 358-360, 365-371, 386-387, 389-392,
394-406, 408, 411-415, 418, 442-445,
476-478, 484, 545, 578, 632, 849, 856,
861-862, 929, 952, 964, 967, 991,
1050-1051
block, 336, 632
child, 345, 442, 444
derived, 19, 230, 235-236, 255, 320, 400, 412
hierarchy, 255-259, 269-270, 275, 368, 370, 389,
414, 442, 444, 1010
class diagrams, 202, 210, 226, 228-230, 235, 266,
269, 275, 320, 335-336, 342, 345, 347-348
class hierarchies, 259, 359-360
classes, 48, 77, 228-229, 243, 257-261, 266-267,
269-271, 274, 281-282, 299-300, 336, 342,
346, 349, 360, 365, 370, 372, 386-387,
389-390, 392, 394-395, 401, 411, 413, 415,
422, 476-477, 484, 848, 861-862, 872, 1050
arguments, 360, 365
client, 48, 1025
diagram, 228, 243, 259, 266-267, 274, 281-282,
342, 346, 349, 389, 401
instance variables, 360, 365
instances of, 243
interactive, 1032
language, 48, 360, 365, 389-390, 394, 411, 413,
422, 466, 476, 484, 1032
naming, 392
nested, 422
outer, 965
packages, 336
pair, 360, 389-390
separate, 48, 229, 349, 411, 466
top level, 401
classes and, 269-270, 336, 394, 411, 415, 484, 862,
872
ClassNotFoundException, 477-478
cleaning, 1024, 1043-1044, 1047
Cleartext, 864
CLI, 105, 455, 471-475, 484-485, 500, 892-893
click, 350, 465, 1018, 1058
Client, 27-28, 40, 42-48, 51-52, 312, 323, 329, 420,
458, 480-481, 491, 499-500, 505, 628, 887,
892-893, 915-917, 919-922, 926-927, 1025
Client computer, 40, 491, 505
Client program, 40, 45, 458
clients, 22, 36, 43, 45-47, 628, 863, 884, 920
Client-server architecture, 48, 480, 877, 892, 916,
920-921, 1037
Client/server interaction, 45
Client-side, 45
Clock, 312, 792, 944
cloud computing, 878, 914-915, 921-922, 927
Cluster, 85, 140, 596, 614, 622, 641, 643, 645, 655,
678, 718-719, 736-737, 961-962, 1046
Clusters, 603, 962
COBOL, 8, 49, 83, 454
CODASYL, 49, 53
code, 19, 39-40, 57, 62, 77-78, 114, 191, 196,
207-209, 237-238, 243, 277-278, 319, 333,
336, 344, 375-376, 378, 402, 459, 464-468,
471, 473-474, 481-484, 491-492, 506, 522,
545, 547-549, 600-602, 612-613, 632, 634,
665, 670, 677-678, 684-685, 715, 736
described, 40, 191, 209, 237, 243, 482, 545, 549,
585
options for, 378, 464, 601
rate, 632
Code generator, 684-685
Coding, 5, 401, 839
Collection interface, 387-388
Collection type, 363, 374, 377, 386, 389
collision, 612-614, 619, 635
load factor, 635
open addressing, 613-614
color, 151, 208, 210, 281, 424, 496-497, 964-965
process, 493, 964
property, 585
columns, 9, 26, 87-88, 117, 152, 169, 189, 280, 348,
457, 474, 484, 495, 517, 544, 562, 668-669,
671, 674-676, 711, 895, 1016, 1058-1062,
1064
indexing, 348, 668-669, 671, 674-676
Command, 18, 84-85, 90, 102-105, 109, 134-136,
138-139, 424, 455, 459-467, 470-471, 474,
476, 496, 502-503, 740, 750, 763, 782, 820,
844-847, 857, 919, 936-937, 940
command line, 476
Commands, 36-40, 46, 49-51, 83, 102, 108, 137,
139-141, 333, 424, 434, 455-456, 458-460,
462-464, 466-468, 471-472, 481, 484,
493-494, 502, 505, 596, 604, 628-629, 633,
742, 747-748, 839, 857, 859, 872, 879,
891-893, 934, 983
atomic, 774
key, 83, 105, 137, 456, 462, 629, 633, 742, 839,
872
NET, 491
sql, 36-37, 51, 83, 102, 104-106, 108, 137,
139-141, 455-456, 458-460, 462-464,
466-468, 471-472, 481, 484, 502, 844,
857, 859, 872, 892-893, 934, 938
TYPE, 49-50, 140-141, 424, 434, 456, 459-460,
467-468, 472, 481, 505, 629, 742, 818,
839, 857, 859, 872
comment, 283, 561, 733, 964
comments, 245, 434, 492, 499, 514, 561
Commit, 467, 755-758, 760-762, 765, 772, 774-778,
785, 795-796, 811-812, 817-822, 824-825,
827-834, 863, 907-910, 912-914, 916-918,
920, 922, 937, 940
Commit point, 756, 758, 776, 811-812, 817-821, 830
Common Sense, 508
Communication devices, 42
Communications network, 839, 888
Community, 32, 273, 321, 335, 868, 1028
Comparison, 62, 87-88, 93, 100-101, 116-118, 120,
124, 130, 141, 150-151, 161-163, 179, 186,
438, 604, 616, 635, 718-719, 721, 953,
970-971, 996, 1000, 1035, 1060
comparison of, 62, 996, 1029
comparison operators, 93, 117-118, 120, 124, 141,
151, 161, 179, 186, 604, 970, 1060
Compatibility matrix, 801, 807
Compiler, 35, 38-40, 371, 684, 889-890
compiling, 465
Complementation rule, 575
complex systems, 313, 330, 773
Complex type, 19, 362
Component architecture, 889-890
components, 14, 38, 42, 44-46, 48, 61, 133, 184,
207-209, 212-213, 241, 283, 290-294, 311,
334-336, 340, 350, 355, 358-359, 362, 377,
390, 405-406, 410, 543, 629-630, 666,
716-717, 729, 881, 916, 919, 921, 951,
1019, 1030, 1048, 1054
components:, 133, 359, 390, 405-406, 410, 716, 951,
973, 1019
graphical, 350
Composite key, 211, 392, 666, 692, 736
Composite objects, 270, 417
Composition, 1018
Compression, 34, 41, 47, 436, 675, 964, 966
audio, 964
video, 964, 966
Computer, 1-2, 5-6, 10, 18, 25-26, 27-30, 38, 40-44,
47, 52, 54, 56-57, 85, 108, 112, 147, 209,
246, 282-283, 311-313, 319, 323, 332, 349,
352, 357-359, 404-409, 416, 425-426, 455,
458, 476, 487, 497-499, 505, 593, 600-601,
754, 792, 838-841, 857, 875, 914, 924,
995-996, 998, 1026
Computer networks, 44, 877, 924
access, 44, 877
Computer software, 282
computer systems, 282, 312, 359, 748, 838-839, 875
Computer-aided design (CAD), 25
Computers, 2-3, 7, 21-22, 27, 42, 273, 311-312, 491,
590, 593, 716, 995
data storage, 27, 590
function, 42, 716
parts, 22, 997
performance, 22, 312, 590
Computing systems, 877
concatenate, 87, 101, 553, 858
Concatenation, 87, 170, 207, 388, 666
conceptualization, 273, 1047
Concrete classes, 266-267
Concurrency, 11, 14, 20, 24, 39-40, 45, 106, 359, 599,
740, 747-749, 751-752, 755-756, 758-760,
767, 769, 771-774, 776-777, 779, 780-808,
811, 819-820, 835, 842, 875, 893, 905,
907-910, 912, 920, 922, 925
deadlock, 755, 781, 787-791, 793-794, 805-807,
820, 910, 925
mutual exclusion, 781
race conditions, 909
semaphores, 909
starvation, 781, 788, 791, 793, 806-807
Concurrency control, 11, 20, 24, 39-40, 45, 106, 359,
740, 747-749, 751-752, 755-756, 758-760,
767, 769, 771-774, 776-777, 779, 780-808,
811, 819-820, 835, 842, 875, 893, 905, 907,
909-910, 912, 922, 925
Condition, 49-50, 65, 73-74, 93, 95-97, 101-102, 104,
117-120, 123-124, 127-133, 140, 150-152,
156, 160-164, 172, 178-179, 181, 184-187,
253, 338-339, 403, 408, 438-440, 482-483,
493, 520, 524, 535, 537, 556-557, 559,
561-562, 567, 569, 582-583, 604-611, 616,
636, 666, 670, 673, 678, 690-694, 698-699,
711-714, 717-721, 727, 734, 743-744, 761,
772-773, 775-778, 793-794, 798, 860-861,
898, 905-907, 931-932, 934-940, 981, 1064
conditional, 93, 178, 440, 456, 458, 480, 482, 493,
495, 672, 1004
relational, 93, 178
conditioning, 755
Conditions, 19-20, 69, 77, 95, 100, 102, 117, 123, 126,
128-132, 136, 143, 149-150, 152, 162-164,
181-182, 187, 194, 236, 239, 296, 333, 360,
387, 410, 437-440, 460-461, 523-524, 535,
567, 604, 611, 629, 636, 646, 670, 690-693,
713, 719-721, 734, 736-737, 744-745, 754,
760-761, 772-774, 785, 855, 905-907, 909,
922, 936-938, 952-954, 966, 981, 983,
1060-1062, 1064
confidentiality, 837, 855, 860, 865, 871
Confidentiality of information, 860
Configuration, 48, 336, 595
Connection, 45, 155, 458, 460, 466-467, 472-473,
477-479, 628, 879, 918-919
connections, 81, 336, 458, 460, 472-473, 476, 627,
917, 920
Consistent state, 75, 758-759, 811, 829
Constant, 140, 150-151, 169, 179, 182, 185-187, 265,
381, 596, 687, 968, 970-971, 973-974, 978,
1004, 1060
Constants, 93, 187, 194, 549, 585, 969-970, 974,
1058-1059
named, 194
Constructor, 362-363, 366-367, 374, 378, 382, 390,
392, 401-402, 405, 415
constructors, 358, 361-364, 373-374, 377, 382, 386,
401-402, 412-413
overloaded, 373
Contacts, 858
content, 14, 314-316, 628, 855, 864, 868, 963-965,
967, 982-983, 988, 993-994, 997-998,
1017-1019, 1021, 1023-1027, 1029, 1031,
1033
media, 967, 982, 997
Contention, 739-740, 882
contiguous allocation, 603, 607
Continuation, 247, 467
Contract, 993
contrast, 8-10, 134, 148, 221, 256, 294, 353, 362, 491,
514, 592, 851, 965, 970, 1035-1036, 1048
control, 1, 11, 16, 20, 24, 35, 39-40, 43, 45, 84, 106,
222, 273, 310-312, 338-340, 344, 411, 467,
469-470, 479, 497, 548, 594, 629, 747-749,
751-752, 755-756, 758-760, 767, 769,
771-774, 776-777, 779, 780-808, 811-812,
819-820, 838-844, 848, 851-856, 861-863,
868-876, 878, 905, 907, 909-910, 921-922,
925, 958
execution, 39-40, 332, 359, 467, 740, 748-749,
751-752, 755-756, 759, 769, 774,
776-777, 781, 791, 797, 820, 830, 881,
893, 902, 905, 907
Label, 836, 853-854, 862, 869-873, 876
of flow, 836, 839, 862, 874
repetition, 897
transfer of, 751, 861-862
Control system, 43
controllers, 329
conversion, 37, 41, 82, 314, 329, 333, 858, 863, 1000
converting, 17, 82, 232, 281, 314, 329, 347, 380, 441,
446, 538, 709, 713-714, 728, 1011, 1025
web pages, 1011, 1025
Copyright, 1, 548, 920
Core, 83, 1029
costs, 20, 24-25, 311, 329, 591-592, 595, 692-693,
705, 715, 717, 719, 722, 724, 726, 728,
901-902, 907, 909, 914
overhead, 24-25, 705
software engineering, 329
CPU, 43, 589-590, 598, 716, 727, 765, 769, 882,
888-889, 901
secondary storage, 589-590, 716
Crawlers, 1022, 1027
Create index, 90, 106, 671-672, 737
create unique index, 672
Creating, 4, 9, 17, 34, 38, 80, 83, 103, 106, 135, 162,
251, 268, 300, 311, 348, 372, 376, 388-389,
392, 400-401, 414, 435-436, 446-447,
493-494, 500-502, 504-505, 583, 656-657,
688, 714, 803, 911, 947, 960, 1032, 1044
forms, 268, 329, 500, 502, 583, 737
views, 83, 106, 135, 140, 414, 441, 844, 847
CROSS JOIN, 124, 158, 160, 163
Cryptography, 875
CSS, 892
Currency, 50, 886, 998
current, 3, 19, 31, 35, 40, 47, 50-51, 55, 66-67, 69,
204-205, 208, 210, 212, 236, 238, 242, 247,
262-263, 268, 278, 350-351, 353-355, 362,
369, 381, 387, 392-393, 397, 414, 462-464,
469, 479, 542, 544, 604-606, 621, 627, 659,
773, 792-793, 823-824, 874, 898, 914-915,
926, 945-951, 982, 1021
Current position, 50
Customer, 3, 24, 46, 48, 62, 114, 192-193, 196, 201,
239-240, 243-244, 286, 303, 318-319, 323,
672, 861-862, 886, 956
customers, 37, 196, 243, 311, 318, 745, 956-957
cycle, 4, 310, 313-315, 317, 334, 351, 446, 555, 768,
771, 790-791, 807, 980
cylinders, 594, 606, 608, 631, 649
D
Dangling tuple, 572, 583
Data, 1-25, 27-36, 38-42, 45-54, 55-81, 82-89, 101,
105-108, 112, 133, 147, 148-149, 163, 173,
188-189, 191-194, 196-197, 201-245, 251,
259, 268-269, 273-275, 279, 282-285, 287,
295-296, 301, 303, 310-320, 328-336,
343-348, 351-353, 355-356, 357-365, 374,
380-382, 394, 400, 404, 411, 414, 416-417,
420-430, 441, 447-448, 457-462, 465, 476,
481, 494-496, 498, 500, 502, 520-522, 528,
542-543, 551, 579, 585, 588-600, 606-607,
612, 614, 617-618, 620-630, 636-638,
640-660, 662-663, 670, 673-676, 686, 688,
705-707, 716-718, 725-727, 736-737,
739-742, 744-746, 749-752, 754-755,
759-760, 766-768, 772-774, 780-783,
794-799, 803-807, 810-814, 816, 824-825,
828-831, 841-842, 852-858, 860-865,
867-875, 877-883, 885-891, 893-895,
901-905, 909-916, 918-922, 924-927,
929-989, 991, 1008-1009, 1024-1033,
1034-1049, 1065
Double, 87, 210, 220-221, 223-224, 226-227, 382,
494-495, 588, 598, 629-630, 1052
hiding, 365, 879
Integer, 5, 9, 19, 57-58, 64, 87, 89, 211, 226, 346,
362-364, 411, 496, 600, 602, 617, 653,
774, 924, 971, 1024
integrity, 18-19, 24, 56, 64, 66, 69-77, 79, 85, 89,
107, 191, 197, 331, 355, 361, 441, 759,
868, 871, 883, 885, 973, 985, 1034
security threats, 836
Single, 5, 8, 11, 15, 20, 28, 36, 47-48, 58, 63,
65-66, 80, 87-88, 105, 133, 163, 189,
208-209, 212, 226-227, 229, 234, 251,
259, 295-296, 311, 330, 353, 400, 404,
428-430, 447, 476, 494-496, 520-522,
524, 543, 548, 593-595, 617, 623-625,
636-638, 647-649, 675, 717-718, 726,
745, 780, 816, 834, 848-849, 867, 897,
913-914, 959-960, 1022, 1028
validation, 313-314, 484, 772, 780, 794, 797-798,
805-806, 1028
Data abstraction, 8-11, 21, 28, 268-269, 275
Data communications, 42, 878, 924
trends, 42, 878
Data compression, 41, 47, 436, 675
Data cube, 1038-1039
Data cubes, 1037
Data definition language (DDL), 35, 51, 67
Data Encryption Standard (DES), 864
Data fields, 612
Data files, 9, 20, 40, 592, 598, 838
Data independence, 9-10, 21, 25, 27, 31, 33-34,
50-52, 879
Data Management Services, 46
Data manipulation language (DML), 35, 51
data mining, 23, 83, 496, 498, 867, 961-963, 982-983,
988, 1018, 1025, 1028, 1034-1036,
1045-1048
Data model, 7, 10, 19, 21, 28-30, 32-33, 47-48, 50-51,
53, 55-81, 148-149, 191-194, 197, 201,
203-204, 234, 236, 251, 285, 295-296,
315-317, 319-320, 331-332, 344-348, 351,
353, 381, 416-417, 421, 428, 430, 436, 441,
447, 457, 873, 881, 891, 957-958, 987,
1043-1044, 1047-1048, 1052
Data processing, 330, 623, 649
data security, 47, 836, 853
data sets, 623, 962
data storage, 27, 29, 32, 45, 344, 598, 674
Data structures, 6, 12, 17, 22-23, 54, 56, 75, 112, 147,
189, 364-365, 472, 487, 522, 588, 612, 614,
617, 622, 635, 648, 651, 675, 679, 688, 706,
988, 1044
data structures and, 17, 688, 988
Data transfer, 40, 481, 602, 625, 750, 902-904, 921
user, 903-904
Data transmission, 629
data type, 5, 19, 30, 57-58, 64, 72, 74, 87-89, 404,
460, 479, 600, 612, 638, 641, 945, 979,
1065
Character, 5, 57, 87-88, 429, 460, 612
Float, 64, 87, 460
Real, 57-58, 64, 87, 600, 945, 948
Data types, 3, 10, 22-23, 28, 32, 38, 48, 83-84, 87-88,
101, 106-107, 211, 347, 357, 360, 365, 374,
382, 394, 400, 411, 414, 429-430, 447, 472,
484, 494-495, 505, 600, 943, 956-957
Data warehouse, 1026, 1034-1037, 1041-1049
Data warehouses, 1, 626, 868, 1034-1037,
1041-1042, 1044-1049
Database, 1-26, 27-54, 55-81, 82-85, 91-94, 97, 99,
102-114, 115-117, 124-126, 131-134,
136-144, 147, 153, 160, 164-165, 168-169,
174, 177-178, 188, 190-200, 201-206,
209-214, 221-224, 226-228, 232-245,
252-253, 256-259, 261-264, 266-270, 273,
275-280, 282-284, 287-308, 309-356,
357-365, 367-368, 370-374, 381-382, 386,
389-396, 399-400, 402-404, 408, 411-419,
420-421, 424-425, 427, 430, 434-436,
441-453, 454-469, 471-473, 476-478,
480-481, 483-489, 490-507, 508-512, 514,
516, 520-521, 524-526, 528, 532, 542,
550-587, 597-598, 610, 621, 627, 629-630,
635, 636, 672, 679, 688, 727-728, 730-732,
733-746, 754-760, 765-767, 772-777, 789,
794, 797-801, 803-805, 836-876, 877-888,
896-898, 905, 907-908, 913-928, 936-937,
940-945, 947-948, 950-951, 953-956, 960,
962-964, 966, 968-969, 976-982, 984-990,
996-998, 1017, 1022-1024, 1029-1031,
1042-1043, 1045-1048, 1052
Database:, 72, 92, 102, 258, 447, 1045, 1064
management software, 25, 627, 907
Database administrator, 13, 310, 839, 841-842, 853,
950, 1047
database administrators, 13, 733-734, 1048
database design, 7, 13, 15-16, 18-19, 28, 30, 62, 79,
105, 201-204, 222-223, 232, 234, 236, 264,
267, 287-308, 309-356, 359, 400, 402, 413,
415, 508-510, 520-521, 524-525, 537-538,
542, 550-586, 592, 635, 733-746, 878-879,
894, 921, 1007, 1050
database management system (DBMS), 3
Database model, 204, 456-457, 480, 849
database query results, 495
Database schema, 30-31, 37, 51, 56, 66-72, 75-78,
84-85, 91, 107-111, 114, 134, 137-139,
141-144, 190-196, 199-200, 204, 214,
223-224, 227-228, 235, 238, 240, 242, 252,
261, 273, 275, 286, 287-289, 301, 305, 316,
344, 374, 394-395, 399, 402-403, 411, 413,
436, 441, 443, 448-450, 452-453, 459,
485-486, 506-507, 508, 520, 558-559, 570,
727, 731-732, 874, 927-928, 990
Database server, 40, 45-46, 311, 458, 471, 473,
480-481, 491, 501-502, 858, 892-893, 917
Database systems, 1-2, 8, 10, 17, 19, 21-22, 26, 27,
29-31, 38, 41-42, 50, 81, 188, 198, 268,
309-311, 351, 360, 362, 370, 408, 414,
416-417, 441, 500, 528, 605, 728, 730, 779,
814, 824, 848, 883-884, 886-887, 914-915,
925, 960, 982, 988, 994
connecting to, 500
database language, 454, 1065
database tables, 115, 140, 503, 846, 870
Databases, 1-26, 29, 33-34, 36-37, 46-48, 50, 53, 56,
63, 66, 75-76, 82, 105-106, 131, 198, 216,
268, 310-315, 319, 322, 330, 334-336,
344-345, 348-349, 351, 353, 355-356,
357-419, 420-421, 427, 435-436, 447,
457-458, 466, 471, 484, 493, 508-549,
559-562, 564, 567-568, 577-578, 584-585,
588-592, 622, 635, 674, 676, 679, 716-717,
747-749, 779, 785, 828-829, 835, 836-839,
856, 860, 871-876, 877-927, 940-943, 948,
950-952, 962-963, 981-984, 987-989, 998,
1021-1022, 1028-1029, 1043-1047
MySQL, 48, 106, 311, 491, 500, 746
Oracle, 33, 36-37, 106, 312, 436, 454-455, 466,
476, 491, 500, 746, 836, 869, 871-873,
875-876, 915-922, 931, 940, 955, 981,
984
PostgreSQL, 48, 106, 311
queries, 4-5, 7-8, 11, 13, 16-17, 21, 23, 26, 46, 66,
105-106, 115, 131, 149, 312, 353, 400,
402-404, 406-409, 413-415, 484, 516,
674, 716, 746, 860, 881-883, 886,
889-890, 892-894, 901-902, 905, 914,
922, 955-956, 963, 969, 982-984, 989,
996, 1034-1035, 1045-1046
queries in, 66, 115, 404, 415, 901
query language, 7, 13-14, 23, 36, 48, 149, 198,
359, 365, 380-381, 402, 413, 901, 954,
988, 1032
query results, 46, 50, 403-404, 471, 484, 892-893
querying, 3, 7, 13, 21, 37, 198, 380, 436, 447, 455,
493, 943, 951-952, 955-956, 982, 988,
1022, 1024
security of, 838, 883
SQLite, 500
support for, 48, 311, 909, 914-915, 917, 920, 955,
1008, 1045-1046
Datalog, 968, 970-974, 976-979, 982, 984, 986-987,
989
Date, 3, 8, 10, 15-16, 20, 31, 53, 67-68, 71, 77-79,
86-89, 92-93, 95, 100-101, 104, 110-111,
114, 116, 125, 135-136, 143-145, 151, 175,
179, 181, 187-188, 191-192, 197, 199-200,
205-206, 211-213, 220-223, 227-228, 230,
236, 238-239, 244, 253-254, 260, 262-264,
267, 275-276, 278-279, 288-291, 302-303,
305-308, 349, 356, 363-364, 366-367, 375,
382-383, 387, 389-393, 396, 411, 418-419,
420, 449-453, 507, 545-548, 587, 599-601,
609-610, 632, 677, 681, 731-732, 752, 792,
874, 902, 927-928, 952, 986, 1000
Date:, 228, 346, 356, 364, 366
between, 346
year, 364
date data type, 88
Dates, 88, 233, 237, 421, 490, 867, 942, 1010-1011,
1041, 1043
dBd, 549, 585
DBMS, 3-5, 7-20, 23-26, 27-28, 31, 33-52, 55, 67, 70,
73-75, 81, 85, 91, 103, 106-107, 109,
130-133, 135-137, 143, 189, 203-204, 237,
311-317, 319-320, 328-333, 340, 344,
351-353, 358, 413, 416, 436, 455-456, 461,
471, 480-482, 484, 486, 501-502, 622, 628,
673-674, 684-686, 692-693, 713, 716-717,
728, 730, 739-740, 748-750, 754-755,
757-759, 762-763, 771, 805, 810-815,
828-829, 837-840, 843-845, 858, 875,
882-885, 890, 915, 926, 929-932, 989
Deadlock, 755, 781, 787-791, 793-794, 805-807, 820,
910, 925
concurrency and, 925
detection, 788, 790, 805, 925
prevention, 788-790, 806
recovery, 755, 794, 807, 820, 910, 925
Deadlocks, 755, 792, 796, 807, 910
debugging, 465, 938
decimal, 57, 86-88, 103, 145, 354, 495
Declarations, 8, 91, 358, 368, 374, 386, 401, 411, 413,
459-460, 481
Decomposition, 322, 350, 509-510, 525, 528-530,
536-538, 541-543, 547-548, 552-553,
558-567, 570, 573-574, 576-577, 582-586,
738, 905, 907, 922, 925
Decryption, 47, 864-866
Decryption algorithm, 865
default, 65, 73-74, 87-91, 102-103, 105, 107, 123,
134, 137-139, 146, 229, 264, 268, 372, 378,
466-467, 574, 636, 737, 774, 853-854, 940
tool, 859
Default constructor, 367
Default value, 89, 91, 103
Default values, 103, 137
defining, 1-3, 8, 11, 18-19, 24, 35, 38, 56, 67, 86, 115,
134, 136-137, 139, 145, 235, 249, 251, 255,
259, 262, 266, 268, 274, 402, 581, 606, 761,
853, 918, 973, 975-976, 1029
delay, 596, 598, 625, 629, 631-632, 679, 796, 816,
1054-1056
queuing, 625
Delays, 48, 332, 591, 624, 882
deleting, 73-74, 255, 271, 328, 365, 610, 651, 654,
674, 932
files, 606, 610, 651, 674
Deletion, 35, 73, 83, 104, 107, 255, 338, 514, 520,
524, 543-544, 604-605, 607, 614-616, 619,
621, 629-630, 632-633, 635, 640-641, 651,
656, 659-660, 662, 664-665, 668, 674-675,
677, 679, 749, 1064-1065
Denial of service, 856, 858, 1027
Denial of service (DoS), 856
Dense index, 638, 645, 649, 676, 703, 741
Dependency preservation, 510, 525, 551, 558-559,
565-568, 573-574, 582-584, 586
deployment, 314, 334-336, 853, 1047, 1049
secure, 853
Descendant, 437-438, 651, 800
descending order, 102, 409, 999
design, 7, 12-16, 18-19, 22, 25, 27-28, 30, 32-34, 37,
41-43, 52, 56, 62, 64, 105, 109, 201-205,
212-214, 216, 222-224, 230, 232, 234-239,
241, 243-245, 246, 251-252, 258-259, 264,
269-270, 275-278, 283-285, 287-308,
309-356, 357-359, 399-402, 413, 415-417,
436, 508-511, 513-516, 518-521, 523-525,
528, 537-538, 540, 542-545, 550-586, 592,
626, 635, 686, 733-746, 773-774, 875,
878-879, 885-886, 894, 920-921, 926, 942,
957, 965, 981, 1027, 1042, 1044-1045,
1047-1049
of databases, 22, 246, 310, 334, 345, 348, 359,
400, 525, 885
Design process, 202, 226, 234, 236, 259, 264,
316-317, 328, 332, 348-349, 351, 353, 359,
881
desktop, 48, 282, 995, 1034-1035
Desktops, 596, 995
development, 14-15, 20, 22, 33, 42, 186, 246, 310,
312, 318-319, 329-331, 334, 343, 347-348,
351-352, 355, 425, 430, 506, 510, 583, 623,
629, 877-878, 886, 908, 937, 1006,
1032-1033, 1047
services and, 882
devices, 20, 42-43, 588-593, 597, 622, 627-629, 635,
739-740, 868
Dictionaries, 355, 746
Dictionary, 3, 37, 39, 41, 45, 237, 273, 312, 333, 363,
385-386, 388-389, 411, 414, 675, 893, 919,
1007
Dictionary encoding, 675
Difference operation, 99, 176, 703
Digital, 23, 47, 106, 590, 598, 627, 836, 839, 854-855,
866-868, 873-874, 957, 993-994, 1017, 1032
technology, 47, 590, 598, 854, 1032
digital certificate, 866-867, 874
Digital library, 1032
Digital signature, 855, 866-867
Dimension, 495, 667, 954, 1037, 1039-1042
Direct access, 29, 884
Direction, 216, 229, 249, 258, 364, 390-391, 399-401,
878, 904, 914, 956-957, 959, 965, 987, 1052
orientation, 959, 965
Directories, 631, 823-824, 919, 921-922, 1017, 1022
directory, 37, 48, 502, 617-621, 630, 633, 641, 668,
682-683, 812-813, 823-824, 829-830, 856,
883, 891, 894, 919-920, 960
Dirty data, 752
DISCONNECT, 460
Discrete cosine transform, 964
Discretionary access control (DAC), 851, 870, 872
Disjoint subclasses, 298
Disk, 17-18, 21, 29, 34, 38, 40-41, 43-45, 60, 332-333,
547, 588-635, 636-638, 640-641, 644-645,
648-649, 653, 673-677, 688-690, 692,
694-695, 698-701, 705, 715-719, 721-723,
739, 749-751, 757-759, 769, 780, 798-799,
805, 810-816, 818-826, 828-830, 833-835,
887, 913
Disk access, 40, 622-623, 717
Disk controller, 596, 625
disk drive, 547, 595-596, 624, 631
disk drives, 329, 547, 589, 591, 595-596, 598, 624,
635
disk mirroring, 624-625, 631
Disks, 29, 43, 588-598, 623-626, 629-631, 635, 740,
755, 1054-1056
Distributed database, 570, 779, 828, 862, 877-879,
881-884, 886-888, 892, 894, 901, 905, 907,
910, 913-914, 916-917, 921-924, 926
Distributed processing, 877
systems, 877
distributed systems, 311, 878, 922, 925
issues, 878, 925
Division (/), 101
division, 8, 45, 101, 160, 164-167, 175, 284, 594, 611,
1003
division by, 611, 754, 1003
document, 14, 22, 24, 49, 202, 338, 423-430, 434-448,
495, 499-500, 855-856, 964, 996-1008,
1010-1017, 1020-1021, 1023-1024,
1029-1030, 1032
Document retrieval, 997, 999-1000
document view, 444
documentation, 312, 318-319, 329, 431, 434, 915-916
documents, 22, 24, 37, 53, 84, 87, 139-140, 273, 357,
420-421, 424-425, 427-428, 435-442,
447-448, 855-856, 914, 930, 963, 965, 982,
992-999, 1001-1009, 1011-1019, 1021,
1023-1024, 1031-1032
external, 24, 1017
navigating, 998
publishing, 22, 436
recent, 273, 1032
DOM, 58-60, 64, 156, 228, 356, 578
domain, 29, 57-59, 64, 69, 72, 74, 76, 85, 89-90, 125,
149-151, 156, 161, 177, 185-190, 194,
197-198, 229, 236, 238-239, 241, 268, 298,
324, 347-348, 401, 526-527, 551, 581, 583,
721, 868, 877, 904, 915, 917-918, 968, 971,
974, 977, 979, 1022-1023, 1025-1026, 1041,
1058-1060
Domain constraints, 64, 72, 76, 581
Domain name, 89, 229
Domain names, 917
Domains, 57-60, 64, 75, 84, 87, 89, 101, 125, 132,
138, 150-151, 186-187, 189, 235, 324, 578,
702, 773, 915, 966, 971, 998, 1041-1043
dot notation, 63, 366-367, 377, 380, 386, 404
dot operator, 1058
double, 87, 182, 210, 220-221, 223-224, 226-227,
254, 261, 339, 382, 437, 467, 469-470,
477-478, 494-495, 588, 598, 610, 629-633,
698, 708, 1052
Double buffering, 588, 598, 603, 610, 629-630,
632-633, 698
Double precision, 87, 382
downstream, 319
DRAM (dynamic RAM), 590
drawing, 70, 254, 268, 297, 307, 334
Drill-down, 1039-1040, 1045
Driver, 221, 260, 300, 308, 466, 476-478
Drivers, 45, 466, 476
Drives, 43, 329, 523, 547, 589-591, 595-596, 598,
624, 628, 635
DROP, 41, 104, 131, 135, 138-140, 186, 243, 341,
553, 844, 936
DROP TABLE, 104, 138, 140
DTD, 421, 427-430, 435-436, 447-448
Dual table, 858
Duplicate values, 125, 641
duration, 276, 357, 805, 853, 943-944, 987
DVDs, 3, 589, 591, 598
Dynamic SQL, 455, 458, 465, 471
dynamic web pages, 46, 421, 490-491, 493-494, 500,
504, 892, 1018
E
eBay, 242, 311
e-commerce, 22, 285, 420, 425, 855-856, 1018, 1026
edges, 182, 230, 422, 708, 767-769, 980, 1020, 1024
editing, 331
Effective, 311, 592, 634, 868, 946-950, 985, 1032,
1048
effects, 136, 319, 756, 759, 766, 808, 829-830, 875,
894, 907, 913
Levels, 759
standard, 319
electronics, 3
Element, 5, 58, 84-85, 138, 158, 170, 212, 341, 363,
377, 384-385, 387-389, 407-409, 425-429,
431-435, 437-438, 442-446, 496-497, 503,
557, 841, 877, 953
elements, 4, 8, 20, 39, 49, 60, 64, 84-85, 117,
137-139, 156, 336, 339, 363, 377, 387-389,
405, 407-410, 424-430, 433-440, 444, 448,
495-497, 505, 629, 877, 965, 1047,
1058-1059
form, 39, 363, 409, 439, 492, 497, 1059
of array, 389, 496
else, 462, 472, 475, 482-483, 492, 608, 613, 621,
660-661, 672, 696-697, 777, 784
ELSEIF, 482-483, 696-697
Email, 494
E-mail, 43, 244, 282-283, 628, 1010, 1027
Embedded systems, 25
embedding, 83, 455-456, 459, 466, 484, 859
Employment, 57, 205, 233, 376, 380
Empty set, 212, 265
encapsulation, 22, 358, 360-361, 365, 373-374, 400,
412, 414, 628
encoding, 34, 431, 675, 997, 1042
encryption, 47, 628, 836, 838-839, 842, 855, 863-866,
871, 873-875
confidentiality and, 855
symmetric, 836, 864-865, 873-874
End tag, 423-425, 428, 434, 491
Engineering, 2, 22, 26, 29, 41, 109, 201, 235, 255,
257, 259-261, 266, 299, 314, 332, 334-335,
344, 349, 355, 372, 926, 957
Enter key, 456
Entities, 19, 29, 31-32, 50, 62, 69, 202, 205-211,
214-221, 229-230, 232, 234, 249-255,
257-261, 264-266, 269-270, 273-274, 293,
297-298, 300, 319, 325, 344, 425, 446,
513-514, 577, 592, 599, 965, 995, 1011
Entity, 7, 19, 29, 56-58, 62, 64, 69-70, 72, 74, 76-77,
81, 85, 107, 131, 201-245, 246-286, 287,
289-296, 298-304, 310, 317, 320-325, 349,
355, 394-395, 401-402, 415, 421-422, 435,
446-447, 508-509, 513, 528, 532, 539, 599,
622, 1050-1052
Entity instances, 223, 250
Entity set, 209-212, 214, 216-217, 223, 229-230, 236,
247, 280
enum, 382-383, 391, 396-398
enumerated types, 362
Enumeration, 382
Environment, 4-5, 10, 12-15, 20, 25, 27, 38, 42, 85,
237, 313, 331, 334, 337, 351, 472-473, 801,
803, 822-824, 882, 885-887, 890, 904, 909,
916, 921, 925, 927, 1044-1046
environments, 18, 34, 41-42, 473, 831, 852, 868, 885,
1032
work, 42, 852, 1032
Equijoin, 123, 162, 164, 167, 189, 294-295, 297-298,
694, 921-922
Error, 16, 19, 319, 461, 467, 501-504, 623-626, 628,
684, 774-775, 859, 881, 1004, 1043
Error correction, 1043
Error detection, 625
Error messages, 502
errors, 19, 85, 119, 284, 344, 387, 460-461, 466-467,
483-484, 502, 624, 754, 774, 867, 901
Escape character, 100, 494
establishing, 460, 466, 917
Ethernet, 628-629
Event, 133, 276, 428, 816, 841, 869, 931-932,
934-936, 938, 940-941, 944, 954-955, 981
events, 2-3, 20, 106, 131-133, 276, 329, 509, 862,
868, 930-932, 934-938, 941, 943-944, 962,
981, 1006
Excel, 2, 311, 1045
Exception, 224, 367, 384-385, 387-389, 393, 407, 427,
460-461, 467, 474, 509
Exception handling, 467
exceptions, 117, 268-269, 387, 391-392, 461, 467,
480, 774
Exchanges, 748
Exclusive lock, 795, 800, 803
EXEC SQL, 456, 459-463, 465-466, 468, 775
Execution, 17-18, 34, 39-40, 42, 137, 166, 168, 177,
183, 332, 341, 359, 371, 455, 467, 473, 502,
590-591, 598-599, 684-687, 705-708,
714-717, 724, 726, 728-730, 737, 739-740,
742-743, 748-749, 751-757, 774, 776-777,
791, 797, 810, 815-816, 818, 820, 822-824,
833-834, 857, 883, 886, 893, 902, 904-905,
907-908, 924, 936-937, 968, 983
execution:, 39, 754
Execution
out-of-order, 881
taxonomy of, 730
EXISTS, 13, 33-34, 61, 72, 74, 103, 116, 120-123,
131, 174, 180, 185, 208, 212, 214, 226,
232-234, 350, 371-372, 400-401, 408-409,
423, 428, 469, 498-499, 530, 542, 581, 598,
623, 636, 641, 688, 692-694, 702-704,
721-722, 727, 735, 768-769, 790-791
Expert system, 351, 989
exposure, 854
Expressions, 109, 116, 148, 161, 177, 179, 185-186,
191, 198, 380, 403-409, 413-414, 437-439,
706, 708, 723, 742, 989, 1007, 1011
built-in, 413-414
External sort, 729
extracting, 8, 407, 409, 436, 441, 447, 484, 1011, 1018
pages, 1011, 1018
F
Fact table, 1040-1042
Factoring, 866
Fading, 51
Failures, 18, 360, 624-625, 747-748, 754-755, 757,
760-761, 776-777, 829-832, 834, 881, 883,
912, 914, 916
FAT, 516
fault tolerance, 630, 635, 881, 995
Faults, 881-882, 962
Features, 12, 26, 28, 35, 37, 48, 50, 81, 82-84, 97,
100, 102, 105-106, 115, 131, 139, 251,
310-311, 331-332, 348, 352, 373-376,
402-403, 406, 414, 416, 471, 482-484,
493-494, 497, 623, 628, 630-631, 775,
885-886, 909, 915-916, 929-930, 957-958,
963, 965-967, 981-982, 984, 989, 1011
Federated databases, 925
Feedback, 245, 283, 315-316, 329, 774, 926,
999-1004, 1018, 1031-1032
Fields, 61, 273, 297-299, 354, 358, 410, 420, 436,
592, 600-604, 608, 610, 612, 621-622, 632,
634, 636-638, 641, 652-653, 665, 671-672,
674-675, 677-678, 692, 782-783, 801, 825,
838, 845, 1018, 1025
File, 4-5, 7-12, 15-22, 24-26, 27, 31, 34, 38, 41-44,
47-48, 56, 59-61, 85, 105, 244, 317, 330,
332-333, 344, 423-425, 428-431, 433-434,
437, 439-440, 456, 481, 491-493, 500,
588-635, 636-658, 660-661, 665, 667-668,
670-671, 673-679, 681-683, 688-695,
698-706, 709, 716-723, 725-726, 729,
734-738, 740-741, 757-758, 798-801, 804,
816, 825-826, 855, 955, 997
sequential, 592, 597, 607-611, 634, 636, 649-650,
668, 676, 679, 681-682, 757, 955, 997
file access, 10-11, 44, 633
File pointer, 604
File server, 43
File sharing, 627
file size, 725
File structures, 48, 105, 588-635, 679, 686
File system, 20, 47, 330, 430, 602
Filename, 456
files, 1, 3-5, 7-9, 11, 15-17, 19-20, 24, 29-30, 34, 38,
40-41, 43, 48, 56, 75-76, 204, 314, 332-334,
336, 350, 428, 437, 514, 523, 592, 597-602,
606, 608, 610-611, 614-615, 619, 621-622,
628-631, 633, 635, 636-680, 682-683,
684-685, 694-695, 698-702, 705-706, 709,
716-717, 720-722, 734-735, 799, 838,
902-903, 915, 921-922, 995-997
access method, 606, 630, 633, 636, 638, 650, 674
directories, 631, 921-922
disk storage, 588, 592, 597-602, 604, 606, 608,
610-611, 614-615, 619, 621-622,
628-631, 633, 635, 648, 716
field, 592, 599-602, 604, 608, 610-611, 614-615,
621-622, 629-630, 633, 636-660, 666,
668-669, 671, 673-678, 691, 716, 902
HTML, 24, 437, 491, 500
indexed sequential, 636, 649-650, 676
kinds of, 838, 995
management systems, 885
missing, 75
organization and access, 606
records, 3-5, 11, 16-17, 19, 29-30, 34, 56, 332-333,
588, 592, 597, 599-602, 604, 608,
610-611, 614-615, 619, 621-622,
629-630, 633, 635, 636-638, 640-643,
645-649, 652-658, 662, 665-668,
670-671, 673-675, 682-683, 694-695,
698-702, 716-717, 720-722, 734-735,
799, 801, 838, 902-903, 996-997
streams, 1, 600
Filtering, 524, 849-851, 859, 872-873, 1023, 1027
Filters, 690
Find Next, 605
Firewall, 842
firmware, 43
first method, 164, 235
First normal form, 61, 524, 526-528, 538, 544, 583
First-order predicate logic, 55
flag, 298-299, 587
Flash drives, 590
Flash memory, 590, 635
Flex, 892
float type, 406
Floating point numbers, 362, 382
Floating-point, 57, 87
Flow control, 836, 838-839, 861-863, 871, 873-874
FLWR expression, 439
folders, 995
<font>, 424
Font, 423-424
fonts, 424
for attribute, 63, 84, 851
Foreign key, 69-70, 72, 74, 76, 85-86, 90-91, 105, 107,
109, 145-146, 160, 190, 289-295, 297, 300,
347-348, 435, 511, 514, 519-520, 570, 578,
721, 742, 895, 1042
Foreign key constraints, 107, 435
Form, 3, 19, 23-25, 28, 30, 36-39, 47-48, 51, 61, 63,
65, 75, 92-93, 120, 122, 130, 134, 150-151,
161, 170, 177-180, 182, 184, 186, 229, 269,
271, 273, 277, 295-296, 321, 337-338, 342,
344, 346, 348, 350-351, 362-363, 374, 378,
386, 401, 409, 420, 481-483, 490-494,
497-498, 502-503, 508-510, 524-528,
532-533, 535-538, 540-549, 551-552,
556-558, 577-578, 581-584, 599, 628,
693-694, 730, 737-738, 766-767, 892-893,
955, 961, 963-964, 969-972, 977-978,
981-984, 989, 996, 1024-1025
design a, 277, 545
Designer, 28, 229, 342, 350-351, 369, 392, 508,
542, 549, 552, 582, 737
form fields, 420
Formal language, 149, 245
Formal languages, 55, 83, 188, 996
formats, 20, 29, 273, 601, 629, 893, 993, 1017
formatting, 420, 424-425, 495, 594, 601, 993, 1037,
1044
paragraphs, 993
Forms, 3, 18, 24, 27, 36-37, 71, 155, 249, 268, 318,
329, 482, 495, 502, 509-510, 523-526, 532,
538-539, 543-544, 550-552, 558-559,
578-579, 581, 583, 586, 694, 737-738, 892,
977, 1006, 1013
Forwarding, 46
frames, 930, 964
Frameworks, 875
Frequency, 330, 334, 734, 966, 997, 1002-1003, 1005,
1008, 1012, 1025
Function, 10, 13, 17, 35, 37-38, 42, 119-120, 122,
124-126, 128-129, 140, 169-171, 176,
189-190, 212, 230, 371-372, 377-379, 406,
456, 458, 466-467, 470-485, 495, 497-500,
502-504, 551, 581, 611-617, 619-621,
633-634, 666, 668, 671-672, 675-676,
699-701, 703-704, 706, 716-718, 721-722,
729, 837-838, 866-867, 870, 918, 955, 964,
981, 1002, 1010, 1060-1063
computation of, 772
description, 10, 35, 230, 472-473, 695, 1010
Function calls, 456, 458, 466, 471-477, 484-485
function definition, 379
Functional dependency, 70, 509-510, 513, 520-522,
525, 530, 532, 535-538, 541, 543-544, 551,
553-555, 557, 559-560, 562, 565-568, 573,
575, 580-581, 583-584, 737
Functions:, 40, 470, 479, 497
in, 3-4, 13, 24-25, 27, 35, 37, 40-42, 45, 51-52,
102, 105, 119-120, 124-126, 134,
136-137, 139, 141, 149, 168-170, 177,
189, 195, 197, 243, 281, 311, 314, 342,
353-354, 360, 369-370, 372, 407, 414,
424, 455-456, 470-471, 473-474, 476,
479-482, 484-485, 497, 499-500, 581,
613-614, 620-621, 633, 716-722,
728-730, 857-861, 883, 893, 916, 921,
1048-1049, 1062-1064
point of view, 130
G
games, 3, 239
Gap, 344, 595, 630-632, 1055-1056
Garbage collection, 674, 824
Gate, 420
Gateway, 916-917
general issues, 570, 673, 676
OR, 570, 673, 676
General Motors, 924
generalization and specialization, 251, 269
Generator, 362, 388, 684-685
Generic types, 58
Genetic algorithms, 1046
Geometric objects, 372, 399
GIF, 964, 1024
Gigabit Ethernet, 629
Global variable, 499
global variables, 497, 499, 505
Glue, 981, 989
Google, 37, 914-915, 995, 1017, 1020, 1029, 1031
Google App Engine, 915
Grammar, 684
Granularity, 624-625, 750, 780, 798-801, 805-806,
853, 943-946, 949, 953-955
Graph, 182-183, 421-422, 436-437, 441, 446, 684,
705-708, 767-769, 771, 789-791, 807, 915,
957, 961, 980-981, 987
Graphics, 14, 357, 496, 498, 892, 1021, 1065
Gray, 635, 779, 807-808, 835, 964-965
> (greater than), 495
Grouping, 76, 105, 115, 124, 126-130, 137, 139-141,
143, 169-170, 177, 189, 195, 208, 409-410,
414, 514, 626, 667, 703-704, 714, 895, 943,
964, 966, 1039, 1062-1064
guidelines, 226, 264-265, 330, 332, 353, 428, 510,
520, 543-544, 733, 744-746, 854
guides, 653
GUIs (Graphical User Interfaces), 27
H
, 424
, 424
Hacking, 857
Handle, 21, 45, 48, 296, 298-299, 436, 472-475,
633-634, 667, 740, 748, 824, 829, 851,
1029, 1047
Handles, 27, 40, 46, 50, 458, 862, 892-893, 908, 916,
919
handling, 14-15, 41, 45, 403, 461, 467, 616, 674, 799,
803, 831, 841, 914, 956, 968, 1027, 1037
Hard disks, 593, 596
Hardware, 4, 13, 15, 18, 24, 34, 42-44, 47, 283, 313,
331, 334, 336, 593, 595-596, 627-629, 749,
837, 887, 909
Harmonic mean, 1016
Hash file, 611, 613, 619, 631, 633-635, 668, 702, 782
Hash functions, 614, 633, 635
Hash join, 695, 699, 701
Hash key, 592, 611, 615, 620-621, 630, 633, 691-692,
694, 718, 722
Hash table, 362, 611, 614, 695, 699-702
Hashing, 332, 588-635, 636, 640-641, 666-668, 673,
675-676, 679, 682-683, 695, 699-700, 702,
704, 716-718, 722, 730, 736-737
hash index, 668, 675, 736
hash table, 611, 614, 695, 699-700, 702
search key, 666, 668, 675, 736
typical, 592, 594-597, 602, 624, 629-630
<head>, 423-424
Head, 423-424, 593, 596-597, 628, 630, 755, 969-970,
974, 976, 978, 980-981, 1054-1055
headers, 122, 603, 738
Heap sort, 688
Height, 208, 277, 370, 398, 963
height attribute, 208
Help, 25, 31, 37, 40, 42, 47, 56-57, 264, 318-319,
336-337, 354, 607, 631, 637, 744, 870, 909,
938, 957, 1017, 1028
Heuristic, 350, 686, 705-706, 708-709, 713-715, 724,
728, 1011
Hexadecimal notation, 88
hiding, 365, 879
Hierarchical model, 36, 50, 427
Hierarchy, 36, 50, 207, 255-259, 269-270, 275, 283,
298, 370, 372-373, 379, 389, 414, 444,
446-447, 589-591, 799, 801, 852-853, 876,
995, 1010, 1023, 1028, 1039, 1041
hierarchy of, 36, 207, 389, 589, 853, 1028
High-level languages, 311
HiPAC, 987
Histogram, 693, 718-720, 965
Hits, 284, 1014-1015, 1019-1020, 1029, 1031, 1033
Honda, 582
Host language, 36, 39-40, 51, 379, 454, 456-457, 459,
462, 471, 484-485
hotspots, 961
<html>, 423-424
HTML, 22, 24, 420, 423-427, 437, 447-448, 490-494,
497-500, 502-503, 858, 892-893, 957, 993,
1021, 1024, 1027
HTML (Hypertext Markup Language), 420
HTML tags, 424, 493
Hubs, 627, 879, 1019-1020
Hue, 965
Hyperlinks, 22, 878, 998, 1018-1019, 1025
hypertext, 22, 420, 424, 491, 858, 1024
Hypertext Markup Language, 22, 420, 424
Hypertext Transfer Protocol (HTTP), 858
I
IBM, 33, 41, 47, 55, 83, 106, 109, 186, 188, 343,
351-352, 635, 649, 716, 745, 885, 988
Icons, 338
id attribute, 300
Identification, 205, 269-270, 274, 276-277, 325, 581,
868, 963, 965, 967, 1047
Identifiers, 106, 377, 382, 412, 425, 430, 674, 918,
961, 969
Identity management, 853, 876
IDREF, 429
IEEE, 416, 1017
Image tags, 967
images, 1, 23-24, 88, 424, 598, 600, 830, 930, 956,
958, 963-967, 982, 984, 988, 1029
quality, 967, 998
images and, 598, 958, 964, 966
Impedance mismatch, 17, 457, 484-485
Implementation, 7, 10, 29, 32, 50, 87-88, 107, 109,
130, 135, 143, 201, 203-204, 298, 313-319,
328, 331-335, 347, 350, 353, 360-361,
365-366, 371, 378, 392, 400, 406, 416, 461,
476, 702, 729, 779, 846, 873, 875, 885, 989,
1047
implements, 7, 81, 103, 456, 877, 960
IMPLIED, 123, 130, 541-542, 777
import, 466, 476-478
importing, 466
IN, 1-5, 7-26, 27-38, 40-42, 44-53, 55-67, 69-81,
82-85, 87-109, 113, 128-143, 148-158,
160-198, 201-245, 246-262, 264-285,
287-304, 309-356, 377-417, 433-442,
444-448, 454-485, 488, 490-497, 499-506,
508-533, 535-549, 564-570, 572-579,
581-586, 588-617, 619-635, 636-642,
644-649, 651-663, 665-679, 711-730,
733-746, 747-752, 754-779, 807-808,
836-876, 889-898, 900-927, 929-989,
992-1033, 1039-1049, 1050, 1052,
1054-1056
incremental backup, 41
Index scan, 690
Indexed allocation, 603
Indexing, 22-23, 35, 289, 332, 348, 436, 589, 606,
617, 636-680, 682-683, 691, 717-719, 733,
736-737, 929-930, 959-960, 964-965, 967,
982, 988, 992-995, 997-1000, 1006,
1008-1012, 1027, 1031-1032, 1041-1042
Indices, 561
Indirection, 637, 645-646, 655, 657-658, 673, 676-678
Inference, 268, 551-555, 557, 566, 574-575, 579,
582-583, 586, 838-839, 871, 968, 973, 975,
977, 979-982, 984
Inference rule, 554, 557
infinite, 971, 974, 977-978, 1020
Infinite loop, 977
Infix notation, 970-971
Information:, 312
Information extraction, 1000, 1011, 1031
Information hiding, 365
Information retrieval, 24-26, 992-1033
Information security, 842, 872
Information system (IS), 313
Information technology (IT), 7, 310
INGRES, 106, 143, 198, 741, 926
inheritance, 22, 246-247, 249, 252, 256-258, 261, 266,
274, 299, 350, 359-361, 368-374, 376,
379-380, 387, 389, 394-395, 398-400,
402-403, 409, 412-415
class hierarchies, 359-360
specialized, 22, 257
subclasses, 247, 249, 252, 257-258, 266, 274, 299
inheritance using, 395
Initialization, 594, 630
Injection attacks, 857-859, 874-875
Inner join, 123-124, 164, 172, 190, 705
INPUT, 37, 39-40, 166, 318, 328, 333, 350, 413, 461,
465, 481, 492-494, 498, 503-504, 551, 555,
557-558, 565-567, 671, 700, 705-706,
715-716, 720-721, 724, 749, 857-859, 865,
892-893, 1042-1043
tag, 500
input validation, 859
Insert, 28, 31, 36, 50, 65, 72-74, 76-77, 102-104, 106,
109, 132, 140, 365, 384-385, 387-388,
500-501, 503, 514, 576, 605-606, 610, 655,
661-663, 673, 680, 702, 734-736, 801, 838,
851, 905, 916-917, 933-935, 937-941,
949-950, 983
inserting, 74, 132-133, 255, 365, 501-502, 505, 610,
646, 660, 662, 735, 775, 932, 935
files, 610, 646, 651, 653-654, 660, 662, 735
Insertion, 35, 72-73, 79, 83, 132, 255, 514, 520, 524,
543-544, 604-605, 610, 614, 616, 619, 621,
629, 633-635, 640-641, 649, 651, 654, 656,
659-660, 662-663, 665, 674-675, 679-680,
735, 802-803, 1065
installation, 48, 85
Instance, 31, 49, 58, 62, 66, 77-78, 114, 214-215, 217,
219-220, 230-234, 236, 238, 243, 264, 269,
271, 292-293, 336, 363, 365, 369, 378-379,
386, 422, 430, 440, 448, 521, 715, 723, 967,
1007, 1043
Instance method, 375-376, 378
Instance variable, 360, 363
Instances, 27-28, 30-31, 49, 59, 214-216, 218-219,
227, 229-230, 232-234, 238, 243, 268-270,
293-294, 411, 439, 523, 552, 627, 742, 745,
962, 967
instruction set, 671
Integers, 64, 362-363, 382, 612, 977
unsigned, 382
zero and, 496
Integration, 321-323, 325-327, 350, 352-353, 355-356,
413, 877-878, 925, 1000, 1018, 1021-1022,
1035
Integrity constraints, 18-19, 26, 56, 64, 66, 69-72, 74,
76-77, 79, 85, 89-90, 92, 103-105, 110, 191,
197, 200, 289, 355, 441, 452, 570, 581, 759,
885, 985
domain constraints, 64, 72, 76, 581
enforcing, 18, 76, 581
foreign key constraints, 107
intellectual property, 868, 873, 876
intensity, 965, 967
Interaction, 1-2, 27, 38, 45, 316, 336-338, 457-458,
493, 892-893, 993, 998, 1028-1030
Interaction diagrams, 337
Interconnect, 998
Interface inheritance, 371, 387, 394, 398-399, 413-414
Interfaces, 14-15, 18, 21, 24, 27, 34, 36-38, 40, 43,
46-47, 51-52, 106, 149, 314, 336, 350, 369,
372, 383-384, 386-387, 389, 392-394, 399,
413-414, 416, 465, 482, 500, 596, 995,
1017, 1022
Comparable, 1022
Iterator, 384, 387, 389, 413-414
List, 34, 369, 372, 386, 389, 393, 413-414, 416,
482
operating system, 14, 38, 43, 46, 329
Interference, 11, 758-760, 797-798
Interleaving, 625, 749, 752, 763-765, 769, 776,
782-783
Internal Revenue Service, 3
Internet, 1, 42, 425, 447, 458, 490-491, 499, 626, 628,
853, 855, 868, 878, 914, 919-922, 1026
IP address, 499
mobile, 868
Internet and, 868
Internet Applications, 420, 425, 922
Interpreter, 195-197, 476, 491, 493-494, 499
interpreters, 490
Interrecord gaps, 597
Interrupt, 754
Intersection, 99, 149, 155-158, 164, 167, 176-177,
189-190, 260, 388, 666, 687, 692, 696-697,
702-703, 718-719, 729, 950, 953, 979, 1046
interviewing, 233
Into clause, 461-462, 468-469
Intranet, 995
Intranets, 628, 868
Introduction, 2, 26, 132, 150, 177-178, 269, 273, 355,
369, 440, 454-489, 505-506, 747-779,
836-837, 860-861, 872, 892, 919, 929, 955,
968, 981-982, 992-1033
history, 759-760, 776-777, 982, 993, 997,
1029-1030
Inverted index, 999-1001, 1006-1007, 1011-1013,
1031
I/O (input/output), 595, 716
IP (Internet protocol) address, 499
Isolation, 12, 524, 627, 747, 758-759, 765, 774-778,
882
Item, 1, 5, 8-10, 16, 19, 30-31, 33, 78-79, 242, 283,
546, 581, 600, 734, 744, 750-754, 757, 760,
762-768, 770, 772-773, 777, 780-799, 801,
803-805, 809, 812-823, 830-835, 893, 905,
910-912, 962, 1026
Iterate, 387, 474
Iteration, 338, 700, 1000
Iterator, 93, 384, 387-389, 403-405, 409-411, 413-414,
457, 462, 468-470, 479, 484, 504
Iterator interface, 388
iterators, 384, 468-470, 485
cursor, 468, 485
interface Iterator, 384
J
Jackson, Michael, 963
Java, 17, 40, 45, 83, 105, 352, 394, 402, 454-455,
457-459, 466-471, 476-480, 484-486, 500,
892
Class Library, 484
keywords, 459, 468
Java code, 467
Java database programming, 471
JDBC, 471
JavaScript, 424, 454, 491, 505
strings in, 505
JDBC, 45, 56, 105, 466, 471, 476-480, 482, 484-486,
500, 892-893
drivers, 45, 466, 476
loading, 476
Job, 12-13, 138, 205, 233, 249, 253, 265, 271-272,
297-298, 501, 503-504, 600-601, 604, 606,
609, 612, 632, 677-678, 681, 734, 738,
850-851, 942
Join:, 162, 164
Join operation, 123, 160-164, 172-173, 181, 189, 194,
292, 294, 440, 514, 518, 532, 572, 698-701,
704-705, 711-712, 714-715, 720-724, 727,
729-730, 736, 896, 961
Join ordering, 723-724
Join selectivity, 163, 698, 720-722
K
Kernel, 909
Key access, 668, 675
Key distribution, 865
Key field, 608-611, 629, 633, 637-640, 642, 644-645,
649-650, 654-657, 659-660, 666, 676-678,
691-692, 718
keyboard, 282, 628
Keys, 37-38, 69, 73, 76-77, 79-80, 85, 87, 90-91, 107,
131, 205, 209, 216, 234-235, 241, 289-295,
299-301, 346, 350, 354, 377, 392, 400-401,
413-414, 427, 430, 496-497, 523-526, 530,
535, 543-544, 546, 550, 552, 573, 586, 619,
621, 637-638, 665-668, 674-676, 734,
864-866, 896-898, 1060-1061
candidate, 65-66, 76-77, 524, 526, 530, 532-533,
535, 543, 546, 548, 552, 558, 573
Sense, 241
keystrokes, 37-38
L
Languages, 14, 17-18, 21-22, 27, 34-38, 49, 51, 62,
66, 70, 83, 105, 119, 150, 177, 188, 198,
211, 245, 249, 258, 265, 273, 277, 329,
358-360, 362-363, 365, 371, 374, 382, 394,
402, 411, 416, 420-421, 424-425, 457-459,
484, 499-500, 886, 955, 971, 1024
Laptops, 897, 995
late binding, 361
Latency, 591, 595-596, 634
layers, 46, 892, 957
shape, 957
Layout, 338
layouts, 351
Leading, 22, 34, 45, 106, 164, 246, 258, 359, 362,
368, 412, 446, 496, 509, 617, 760-763, 767,
788, 803, 855, 904, 938, 989
Leaf, 166-167, 257-258, 266, 422, 427, 429, 436-437,
619-620, 651-652, 657-662, 664-665, 671,
673, 675-676, 678-679, 683, 703, 706-707,
713-714, 725-726, 745, 960
legacy systems, 622, 685
Lexicographic order, 1013
Libraries, 336, 466, 476-477, 484, 491, 500, 505, 627,
929, 993-994, 997, 1017, 1029-1030, 1032
licensing, 51
open source, 51
life cycles, 353
LIKE, 25, 33, 37, 47-48, 52, 55, 100, 142, 218, 229,
280, 303, 311, 328, 350-351, 428, 465, 546,
551, 629, 678, 688, 742, 745, 762, 773, 793,
866, 884, 914, 916, 958, 960, 987, 1006,
1019-1020, 1024, 1047-1048, 1058-1059
Line:, 461, 476
line comments, 492
Line numbers, 459, 491
Linear hashing, 617, 619-621, 630, 633-635, 673, 718,
722
Linking, 49, 332, 1019, 1022
links, 229, 319, 825, 883, 910, 917-919, 996, 1011,
1019-1020, 1023, 1027
IDs, 996, 1011
Linux, 491, 626
LISP, 372
List, 3, 7, 12, 25, 34, 49, 58, 60, 63, 72, 75, 95, 97,
101-102, 107, 129, 134, 152-153, 158,
162-163, 167-170, 175-176, 181-183,
185-195, 236, 239, 254-255, 262, 275-277,
284-285, 354, 358, 369, 372-374, 377-378,
385-386, 388-389, 393, 401-402, 405-407,
409-411, 413-414, 416, 422-424, 429, 448,
485, 547-548, 614-615, 640, 665, 701-702,
815-816, 819-820, 822-824, 873, 924, 926,
951-952, 956, 961, 986-987, 999-1000,
1060-1061
Lists, 36, 84, 102, 129, 163, 166, 276, 351, 388, 409,
495, 549, 585, 615, 637, 688, 822, 831,
895-896, 905-906, 922, 1017, 1061
Literal, 87-88, 93, 100, 381-382, 386-387, 411, 495,
971-973, 982
Literal value, 362-363, 386
Load balancing, 623-624, 1045
Load factor, 621, 630, 635
Loading, 40-41, 103, 314, 333, 476, 592, 1026, 1036,
1043-1044
Local Area Network (LAN), 628
local data, 916
Local server, 493
Local variable, 482
localization, 882, 901
locations, 23, 42, 66-68, 71, 86, 90-91, 110-111,
143-146, 162-163, 199-200, 205-206,
212-213, 227, 229, 288-290, 293, 301, 305,
364, 366, 390-391, 419, 449-452, 486,
511-513, 526-528, 587, 607, 613-615,
617-618, 676, 731-732, 898, 927-928,
960-961, 990, 1059
Lock table, 782-783, 785, 788, 799
locked state, 785
Locking, 386, 740, 772, 780-787, 789, 791-792,
794-801, 803-808, 822, 863, 909-911, 925
Locking protocol, 772, 785-787, 800, 805-807, 822,
911
Locks, 740, 781-786, 788, 790-792, 795-796, 799-801,
803-807, 819, 908-912
Log record, 757, 814, 825-828, 833
Logic programming, 968, 975, 989
Logical operators, 179
login credentials, 856
Lookup, 291, 385, 389, 394, 726, 1012
Lookup table, 291, 1012
Loop, 315, 457, 460-463, 470, 475, 480, 483, 497,
503-504, 555, 562, 567, 694-695, 698-700,
704, 715, 721-723, 726, 977
loops, 316, 329, 456, 458, 462, 480, 483, 495, 694,
715
prompt, 462
Lossless join, 525, 561-563, 566, 574, 584, 586
Lost update problem, 752-753, 762, 765
low-level, 28-29, 36, 40, 50, 812, 965
M
machine, 42-44, 277, 339-340, 458, 480, 909, 915,
994, 1025
Magnetic disks, 29, 588-593, 629
Magnetic tape, 588, 592, 597, 829
Mail servers, 43
main function, 981
Main memory, 17, 40, 589-592, 595, 597-598,
603-605, 611, 644, 688-690, 695, 698-699,
701, 716-717, 719, 721, 723, 750, 757-758,
811-812, 814-815, 821-824, 835, 881, 1054,
1056
maintainability, 1045
Mandatory access control (MAC), 848, 872
Manufacturing, 1, 22, 24
Many-many relationship, 197
Many-to-many relationship, 566
Map, 34, 280, 287, 291-293, 301-302, 317, 362, 394,
401-402, 413, 415, 457, 523, 543, 667, 672,
943, 956-958, 962, 1018
mapping, 23, 32-35, 38, 56, 60, 62, 92, 203-204,
287-308, 315-317, 332, 347-348, 350-353,
394, 399-402, 415, 436, 447, 508-510, 543,
550, 896, 901, 914-915, 926, 1023
value, 60, 62, 211, 291, 294-295, 297-298, 300,
351, 400, 543, 914, 1023
Maps, 1, 23, 25, 277, 613-614, 852, 955-956, 963, 998
margin, 903
markers, 338, 610, 629, 640
Marketing, 24, 28, 1026
Markov model, 1020
Markup language, 22, 47, 420-448, 450-453
markup languages, 424, 1021
Mass storage, 41, 590
Master file, 610, 629
Materialization, 135, 724, 729
Materialized view, 136, 516
math, 5-6, 54, 112, 147, 487
Matrices, 1012, 1037
Matrix, 561-563, 584, 626, 801, 807, 844, 965, 999,
1037-1038
singular, 965
translation, 965
Maximum, 70, 87-88, 125, 169, 171, 218-219, 229,
235, 243, 283, 312, 363, 377, 409, 434, 601,
615, 619, 621, 792, 871-872, 955
Maximum value, 125, 792
Mean, 2, 57, 62, 125, 198, 226, 362, 371, 389, 414,
554-555, 624, 713, 767, 832, 944, 962, 970,
1016
measurement, 57, 350, 956, 1029
Media, 1, 28, 313, 598, 754, 834, 967, 982, 997, 1028
guided, 1028
Median, 485, 1025
Megabyte, 595
Member, 49, 158, 179, 209, 214, 247-249, 257-258,
260, 262, 264, 269, 283, 298, 397, 408-409,
411, 526-527
Memory, 17-18, 40, 43, 60, 282, 329, 411, 589-592,
595, 597-598, 602-605, 607, 611, 623, 625,
629, 635, 644, 688-690, 694-695, 698-702,
716-723, 727, 750, 811-812, 821-824, 835,
862-863, 879, 881, 959, 1054
allocation, 603, 607
features, 623
flash, 590, 635
operations of, 411, 690, 754, 757, 811-812,
821-822
secondary, 589-592, 597, 607, 623, 629, 635, 644,
695, 698, 716-717, 719-722, 727, 881,
887, 959
memory cards, 590
Memory hierarchy, 590-591
Memory management, 411
Menus, 36-37, 201, 331, 495, 1065
Merge algorithm, 688-689, 729
Messages, 19, 337-338, 424, 502, 818, 865, 887, 912,
930, 963-964, 982, 993-994
reliability, 881
response, 912, 993
Metadata, 728, 877, 883, 885, 891, 913-914, 919, 925,
967, 988, 996-997, 1000-1001, 1011, 1024,
1028, 1036, 1044-1045
3D objects, 988
Metal, 1028
Method, 10, 57, 164, 189, 227, 235, 249, 265, 285,
360-361, 365-367, 369-373, 375-378, 402,
406, 414, 420, 455, 494, 496, 498, 567, 606,
610-611, 613-614, 624, 629-630, 650, 664,
667, 674, 682, 691-695, 698-699, 718-724,
726, 728, 791, 804, 806-808, 818-821, 823,
831, 843, 848, 909-912, 925, 997, 1025
Add, 10, 265, 367, 629, 674, 728, 804
Exists, 371-372, 496, 498, 567, 636, 692-694,
721-722, 791, 848
methods, 48, 130, 137, 164, 235, 285, 319, 321, 329,
334, 342, 345, 358, 360, 372-374, 378-379,
401-402, 406, 412-413, 415, 454, 606, 673,
690-694, 714, 719, 722, 724, 729, 758, 771,
776, 797, 807-808, 818, 830-831, 863, 908,
910-912, 976, 997, 1005, 1023-1025, 1043
class name, 401
definitions, 360, 373, 776, 863
driver, 466, 476
fill, 673
get, 342, 693, 722, 767, 818
responsibility, 912
turn, 374, 466, 476, 588
valued, 401
Metrics, 992, 1014-1015, 1029
Microprocessors, 622-623
Microsoft Access, 2, 48, 1058
Microsoft SQL Server, 501, 915
Millisecond, 383-384
Minimum, 88, 125, 169, 188, 194, 219-220, 235, 283,
350, 434, 509, 626, 662, 700, 735, 797, 854,
861, 872, 890, 960, 1058
Mod, 607, 611-614, 619, 621, 625, 633, 701, 866
Mode, 481, 684-685, 783, 795, 799-803, 805, 838,
869, 887
Model Tree, 427
Modeling, 15, 22-23, 29, 56, 75, 201-245, 246-247,
249, 259, 261, 268, 272-275, 283-285, 303,
320, 322-323, 328-329, 332, 334-335, 340,
343-345, 348, 351-352, 355-356, 381, 481,
885, 965, 967, 988, 999-1000, 1026, 1050
theory, 56, 273, 988
Models, 11, 22-23, 27-30, 33, 47-51, 53, 56, 61-62,
106, 216, 236, 246-247, 258, 268-269, 271,
273, 285, 287, 295, 304, 320, 334-336,
352-353, 355-356, 360, 362, 367-368, 373,
416-417, 424, 441, 621, 746, 808, 867,
885-886, 915, 924, 929-989, 991,
1005-1008, 1022, 1029-1030, 1032,
1050-1052
activity, 11, 28, 335, 348, 961, 1029
behavioral, 335-336, 352
interaction, 27, 336, 1029-1030
semantic data, 246-247, 268-269, 271, 285, 356
structural, 202, 236, 271, 285, 287, 335-336, 352,
1005
use case, 335-336, 352
Modem, 283
Modes, 318, 795-796, 801, 993, 998, 1029-1030
Modular design, 1044, 1048
Module, 17-18, 23, 27, 38, 40, 436, 500, 686, 759,
783, 897, 905
Modules, 11, 14, 23, 27-28, 38-40, 45-46, 51, 83, 105,
458, 480-482, 893, 907, 920-921, 994
MOLAP, 1046, 1048
Monitor, 3, 35, 43, 132, 282, 455, 465, 628, 739,
803-804, 940-941, 1058
Monitors, 41, 331
Mouse, 37, 282, 628
move, 281, 311, 330, 399, 438, 464, 615-616, 627,
640, 661-662, 713-714, 767, 842, 908
Movie database, 244
movies, 23, 242-244, 930, 963, 982
MP3, 590
MTBF (mean time between failures), 624
Multimedia, 1, 22, 83, 627, 929-930, 963, 965-967,
981-983, 987-988, 993-994, 996
image, 963, 965-967, 988
Multiple, 4, 8, 11, 14-16, 18, 25, 31, 42-43, 45-47, 88,
97, 103, 105, 124, 135-137, 164, 173, 209,
233, 239, 241, 243-244, 256-258, 269-270,
281, 289, 295-297, 299, 311-312, 338, 351,
371-373, 387, 394, 405, 414, 438-439, 442,
446-447, 461-463, 465, 468, 479-480, 495,
509, 528-529, 547-548, 588, 614, 625-628,
636-637, 665-666, 674-676, 691-692,
718-719, 723, 741-745, 747-749, 794-795,
797-801, 806-808, 828, 849-850, 878-880,
882-883, 886-887, 890, 897, 909-910, 926,
960, 970, 989, 1022-1023, 1026-1029,
1042-1043
declarations, 8
Multiple inheritance, 252, 256-258, 299, 372-373, 387,
394, 399, 414
Multiplication, 101
Multiplicity, 228-229, 275, 356, 434
Multiprogramming, 748, 863
multiuser, 8, 11, 15, 20, 47, 748-749, 776, 819-820,
822-824, 830-831, 838, 1037
Multivalued dependency, 510, 538, 540-541, 544, 551,
575, 578, 584
Mutator, 378
Mutual exclusion, 781, 852, 872-873, 876
MySQL, 48, 55, 106, 311, 491, 500-501, 746
N
name attribute, 65-66, 80, 211, 213, 223, 229, 434,
493
named, 85, 91, 109, 138-139, 176, 192-194, 367-368,
371, 378, 380, 394, 403-404, 406-408, 412,
414, 468-470, 708, 750-751, 798, 866, 936,
941, 1011
names, 2-3, 7-8, 10, 20, 30, 38, 56-59, 62-63, 65, 67,
91-93, 95-97, 99, 103, 107-108, 117-123,
134-135, 141-142, 154-158, 162, 170, 173,
175-176, 178, 183, 185, 188, 190-192,
195-197, 202, 214-215, 217, 223-224,
227-230, 239, 244, 270, 356, 360-361,
381-382, 391, 403-406, 408, 412, 425-429,
437-438, 440, 447, 459-460, 468-470, 483,
499-500, 595, 684, 905-906, 922, 968-969,
971-972, 979, 1006-1007, 1058-1060, 1063
Namespace, 434
namespaces, 433
Naming conventions, 223, 236, 354
NASA, 591
National Library of Medicine, 1009-1010
Natural join, 123-124, 162-164, 167, 172, 175, 177,
189, 194, 294, 514-515, 518-519, 542, 546,
559-561, 570-572, 694, 738
Natural language processing, 1018
navigation, 50, 653, 892, 1017, 1020, 1023, 1025,
1027-1028
Negation, 183, 185, 471, 670, 975, 1063-1064
Neighborhood, 963
Nested, 49, 61, 102, 105-106, 115, 117-122, 126,
129-130, 139, 141, 164, 195, 208, 235, 237,
408, 422, 428-429, 438, 528-529, 686-687,
694-695, 698, 721, 723, 726, 730, 742-743,
926, 989
Nested relation, 164, 528-529, 532
Nested-loop join, 694-695, 698, 700, 704, 721, 723,
726
nesting, 121, 130, 208, 362, 374, 412, 528
Network, 21-22, 29, 40-41, 43-44, 46-47, 49, 56, 82,
334, 399, 416, 598, 605, 622, 627-628,
630-631, 754, 828, 839, 877-880, 882-883,
885, 887-889, 901-902, 910, 913, 916-919,
924-925, 956, 995-996, 1009-1010, 1043
Network management, 416
Network model, 21, 49, 399
Network security, 47
networking, 878, 1018
data communications and, 878
networks, 27, 42, 44, 47, 588-589, 598, 626-631, 855,
868, 879, 881-882, 902, 924, 930, 958, 988,
1028, 1046
New York Stock Exchange, 955
next(), 469-470, 477-479, 485
Next Page, 679
Nodes, 166-168, 182, 257-258, 266, 336, 422, 427,
437-439, 619, 627, 651-652, 654-660, 662,
664-665, 671, 675-676, 678-679, 706-708,
713-714, 728, 745, 767-768, 800-803,
862-863, 878-879, 881, 887, 915-916, 957,
962, 980
children, 438, 801
descendants, 800
levels, 427, 438, 651-652, 656-658, 665, 675, 678,
803, 862-863
subtree of, 651-652
Noise, 861, 867-868, 962, 966, 995, 1023
reducing, 966
NOR, 30, 74, 98, 213, 259, 386, 401, 497, 519,
530-532, 540, 788, 794, 903, 994
Normal, 61, 152, 170, 350, 509-510, 520, 523-528,
532-533, 535-545, 547-549, 550-552,
558-559, 575, 578-579, 581-584, 586, 591,
742, 826, 869, 950, 989, 1041
Normalization, 16, 56, 64, 350, 508-549, 550, 552,
559-562, 564, 567-568, 572-573, 576-578,
581-586, 737, 741, 988, 1003
normalizing, 529, 531, 541, 548, 584, 1041
Notation, 49, 52, 56, 58, 61-63, 88, 166-167, 169-170,
178, 182, 187, 194, 201-202, 205-206,
210-211, 225-230, 234, 247-248, 253-254,
266-267, 274-275, 284, 306-307, 317,
319-320, 328, 334-340, 344-345, 347-350,
355-356, 364, 377, 380, 386-387, 394-395,
402, 404, 415, 429, 480, 513, 523, 552-553,
579, 600, 706-707, 760, 801, 932-934,
938-940, 968-971, 982, 984, 1050, 1052
null character, 460
NULL pointer, 473, 612, 643, 652
Number class, 6, 30, 54, 112-113, 147, 487-488
Numbers:, 165
Numeric data, 64, 87
O
Object:, 339, 370
oriented programming, 17, 22, 249, 265, 358-359,
367, 380, 411, 476
use an, 33, 45, 106, 204, 405
object classes, 269, 401
object element, 384-385
Object Management Group (OMG), 334
object-oriented, 10-11, 17, 22-23, 28, 33, 45, 53, 84,
106, 249, 259, 265, 320, 334-335, 339-340,
348, 355, 357-360, 367, 371, 380, 411-414,
416, 459, 466, 471, 476, 588, 621, 989,
1024, 1043
requirements analysis, 334
Object-oriented design, 334
Object-oriented model, 33
objects, 10, 17, 22, 24-25, 28, 35, 45, 229-230, 259,
266-271, 277-278, 335-339, 344, 357-365,
367-368, 370-374, 377, 381-383, 386-390,
392-395, 399, 401-407, 409-417, 420, 422,
424, 447, 478-479, 484, 600, 621-622,
854-856, 861-863, 880, 913, 916, 943, 951,
955-964, 966, 982-984, 988, 1024
distance of, 959, 966
grouping, 409-410, 414, 943, 964, 966
manager, 229, 247, 259, 364, 372, 390, 392, 852,
984
script, 424
state of, 360, 390, 996
template, 266, 411
visible, 360-361, 365, 849, 856
ODBC, 45, 56, 105, 471, 485, 500-501, 892-893, 916
Offsets, 1007, 1011
OLAP, 1, 83-84, 106, 1026, 1034-1049
OLTP, 11, 48, 75, 1035
OPEN, 45, 48, 51, 55, 105-106, 393, 455, 458,
462-464, 471, 490-491, 505, 604-605,
613-614, 918, 955, 958
Open addressing, 613-614
Open source, 48, 51, 55, 490-491, 505
Open source software, 51
opening, 312, 495
Operand, 407
Operands, 166, 371, 705-706, 959
Operating system, 14, 17, 38, 46, 55, 329, 332, 334,
491, 594, 628, 740, 750, 769, 812, 857, 882,
887, 909, 920
Operating system (OS), 38, 909
operating systems, 491, 537, 746, 751, 782, 812-813,
835, 863, 922, 998
execution of, 748
Operations, 10-12, 19, 22-23, 26, 28, 30-32, 35, 37,
39-40, 48, 62, 71-72, 75-77, 81, 82, 99-100,
106, 117, 124, 133-134, 148-152, 154-158,
163-169, 171-175, 177, 182-183, 188-190,
193-195, 197, 202-204, 229-230, 245,
265-266, 268, 294-295, 298, 336, 340, 342,
347, 357-360, 365-369, 371-374, 378-380,
386-392, 399-402, 408-409, 411-415, 469,
484, 517-518, 520, 543, 604-606, 624,
630-631, 635, 686-687, 690, 701-709,
711-714, 716, 720-724, 726, 728-730, 738,
748-752, 754-773, 776-777, 780-786,
792-794, 799, 801-804, 811-812, 815-822,
824-825, 829-834, 840-841, 862-863, 918,
943, 952-953, 956-957, 959-961, 978-979,
981-982, 984, 988, 1026, 1031, 1045-1046
operator overloading, 360, 371, 373, 414
Optimistic concurrency control, 797, 806-807
optimization, 18, 22, 39, 45, 55, 61, 82-83, 130, 183,
348, 589, 684-730, 732, 897, 901-903, 918,
920, 924-925, 979, 981-982, 997
search engine, 997, 1027
Optimizer, 39, 183, 684-687, 692-693, 703, 708, 711,
715-721, 723-728, 744-745, 889-890, 907,
918
OR function, 481-482, 551
Oracle:, 478
Orders, 60, 177, 192-193, 196, 243, 464, 672,
723-724, 729, 760, 764, 766, 852, 863, 918,
1037
Orthogonality, 413, 415
OUTPUT, 37, 39-40, 318, 328, 333, 481, 595, 705,
716, 749, 865, 892, 1042, 1059
Overflow, 610, 612-617, 619-621, 629-630, 633-634,
640, 649, 661-663, 680, 754
Overflows, 617, 620, 662, 741, 960
overhead costs, 24-25
Overlap, 20, 842, 997
Overlapping subclasses, 298, 1052
Overloading, 360, 371, 373, 379, 414, 911
P
Packet, 628
packets, 628-629
Padding, 173
page, 2, 421-422, 425, 490, 493, 500, 505, 679, 723,
741, 799-801, 803, 805, 812-815, 823-828,
831, 858-859, 999, 1017-1022, 1024-1025,
1027, 1033
Page numbers, 422, 637
Page tables, 812, 827
pages, 22, 24, 37, 40, 45-46, 140, 276, 333, 420-423,
425, 490-491, 493-494, 504, 594, 637,
740-741, 745, 779, 799-801, 812-814,
823-826, 828, 830, 892, 993-996, 998-999,
1004-1005, 1008, 1011, 1017-1027, 1031
extracting, 1011, 1018
first, 421, 425, 490-491, 493-494, 779, 799-800,
812-814, 825-826, 830, 999, 1021
last, 491, 504, 814, 825-826, 828, 830
Paging, 739, 810, 823-824, 829-831, 835
page replacement, 835
panels, 1023
paper, 55, 81, 244-245, 277, 304, 416, 635, 730, 875,
989, 1032
paragraphs, 993, 1005
Parallel processing, 48, 749, 761, 1046
parallelism, 623, 882
Parameter, 333, 473-474, 479, 481-482, 485, 717,
754, 858-859, 1054, 1056
Parameters, 10, 35, 37-38, 40, 48, 105, 204, 230, 328,
332-333, 379, 406, 456, 465, 468, 472-475,
479, 481, 484, 492-493, 500, 631-632,
717-718, 729, 740, 752, 1004-1005, 1025,
1054-1056
Parent, 36, 50, 221, 245, 256, 346, 428, 435, 437-438,
605, 659-662, 665, 802-803, 871, 986
Parent class, 346
Parity, 623-626
Parity bits, 624-625
Parser, 684, 705, 708
Parsing, 684-685, 1008, 1025
Partitioning, 410, 637, 668, 674, 695, 699-701,
741-742, 861, 895, 910, 914, 921, 923,
1044-1045
Pascal, 17, 612
Passing, 46, 359-360, 848, 859
Password, 283, 458, 460, 466-467, 473, 477-478,
501-502, 857, 859, 864, 919
Passwords, 17, 838, 840, 872
Path, 29, 34, 48, 50, 403-404, 406, 410, 413-414,
437-439, 481, 493, 499, 610, 691-693, 727,
800, 802-803, 915, 917, 956-957
Path expression, 404, 410, 438-439
paths, 10, 29, 32, 41, 51, 105, 204, 257, 316, 329,
332-333, 339, 607, 636, 690, 692, 694, 706,
719, 724, 726, 735-736, 915, 950, 1025,
1028, 1047
Pattern, 100, 437-438, 537, 625, 956, 962, 964, 1000,
1007, 1024-1027
Pattern recognition, 962
patterns, 23, 333, 437, 739, 865, 913, 1008-1009,
1018, 1024-1027, 1033, 1047
PEAR, 491, 500-505
Peers, 915, 1028
Perfect, 842, 1014
performance, 15-16, 18, 20, 22, 34-35, 38, 41,
312-317, 331-334, 350, 352, 588-590, 592,
595, 618-619, 622-629, 635, 674-675,
689-690, 723, 733-736, 739-742, 744-746,
776, 781, 798, 805-808, 834, 838, 859, 882,
897, 902, 908-909, 913-914, 929, 961, 1014,
1026, 1028, 1034-1035, 1037, 1042,
1045-1046
Peripherals, 627
Perl, 424
Permutation, 864
Persistence, 352, 361, 365, 367, 381, 412, 414
Persistent storage, 17, 281, 855, 1046
Personal computer, 2, 27, 591
Personal information, 502, 836, 842-843, 895, 995,
1018
Phantom, 767, 775, 803-806
Phase, 7, 41, 201, 204, 313-321, 328-329, 332-334,
348, 351, 353, 400, 688-689, 695, 700-701,
729, 772, 780-781, 785-789, 795, 797-798,
801, 805-808, 819, 822, 824-829, 832,
907-910, 913, 918, 925, 963, 1042
Phone numbers, 57-58, 239, 277, 377, 742, 1010
PHP, 454, 490-506
Physical data model, 32
Physical design, 7, 203-204, 314-315, 317, 330,
332-333, 348, 350-351, 354-355, 733, 736,
745-746
Physical tables, 741
Picture elements, 965
pipelining, 590, 705, 715, 723-724, 726, 729
strategy, 705, 715
Pivoting, 1039
Pixels, 964-966
Plaintext, 864-866
planning, 23, 46, 626, 685, 886, 962
platters, 591
Point, 11, 18, 31, 57-58, 63, 70, 87, 210-211, 227, 265,
318, 350-351, 355, 369-371, 380-382, 388,
392, 403-404, 406, 411, 460, 509, 583, 600,
606, 617, 627, 658, 661, 746, 755-756,
758-759, 811-812, 815-821, 823-824,
826-828, 830, 884-885, 943-944, 947-949,
953-954, 957-959, 977, 988-989, 1020,
1033, 1058
pointer, 49, 362, 462, 472-473, 600, 602-605,
612-614, 616, 618-619, 632, 638-646, 649,
651-663, 665, 668, 671, 673, 677, 703,
802-803, 816, 960
pointers, 22, 332, 603, 614, 616, 619, 622, 634,
636-637, 644-646, 648, 652-662, 665-666,
671, 673, 677-678, 692, 718-719, 725, 804,
960-961, 996, 1028, 1040
point-to-point connections, 627
polygon, 372, 956-957
area, 372, 956-957
polymorphism, 361, 371, 373, 402-403, 414, 416
Port, 302
Position, 9-10, 50, 58, 88, 256-257, 267, 281, 299,
310, 384, 387-388, 409, 416, 418, 440, 462,
464, 469, 479, 587, 601-602, 607, 610,
612-614, 640, 661-662, 713, 826, 956-957,
959, 968-969, 991, 1012-1013, 1015-1016,
1030, 1054
power, 42-43, 92, 102, 115, 164, 177, 189, 212, 216,
365, 481-482, 527, 590, 614, 978,
1034-1035
Precedence, 767-769, 771, 778
Precedence graph, 767-769, 771
Precision, 87-88, 354, 382, 742, 842, 872, 1009, 1011,
1014-1016, 1018, 1029-1030
Predicate, 55, 62, 150, 179, 187, 189, 253-255, 259,
262, 265-266, 274, 672, 744, 804, 806-807,
870, 961, 964, 966, 968-971, 974, 976,
978-981, 986-987
Predicates, 185, 869, 923, 956-957, 959, 961,
968-974, 976-981, 984, 987, 989
Prediction, 957, 961
preferences, 342, 1018, 1023-1024
Documents, 1018, 1023-1024
Measuring, 1018
Search, 1018, 1023-1024
Prefixes, 456, 595, 1009
preprocessor, 459, 468, 471, 491-492
prerequisites, 4, 7, 11-12, 342, 941
presentation layer, 46, 892
Pretty good privacy, 855
Primary index, 610, 637-641, 644-645, 647-651, 675,
677-678, 691, 718, 721-722, 736-737, 741
Primary key, 65-66, 69-70, 72, 74, 76, 80, 86, 89-91,
105, 137, 145-146, 160, 211, 289-297, 300,
354, 400, 435, 515, 519-520, 523-524,
526-533, 538, 543, 545, 547, 638-641,
649-650, 676, 692, 735, 895-896, 947
Primary keys, 66, 69, 80, 289-293, 435, 523-524, 530,
532, 544, 550, 552, 896-897
Prime number, 614
Prime numbers, 866
Primitive, 271, 322, 1024
Primitives, 1022-1023
Print server, 43
Printers, 42-43
Printing, 11, 20, 44, 457, 492, 495-496, 818, 831, 997
Printing press, 997
Priorities, 313, 318, 735, 770, 791
privacy, 80, 836-837, 842-843, 854-855, 860, 863,
867-868, 872-874, 876
audit, 873
medical, 837, 867
right to, 837
privacy issues, 867, 873
private, 278, 379, 628, 837, 855, 865-866, 869-871,
873
Private key, 865-866, 873
Private keys, 865
Privilege, 84, 106, 378, 840, 843-848, 854, 856, 859,
873, 875
Privileges, 106, 836, 838-840, 843-848, 852, 854,
870-875, 883
least, 848
Probing, 695, 700-702, 723
Procedure, 132-133, 137, 288, 299, 325, 370, 441,
481-482, 509-510, 528, 532, 543, 550-551,
561, 573, 581-582, 607, 621, 678, 810, 819,
824-825, 918, 935, 955, 973, 984
Procedures, 19, 23, 28, 46, 48, 62, 119, 287, 319, 344,
400, 480-482, 485, 516, 581-583, 628, 656,
665, 812, 854-855, 859, 870, 893, 916, 918,
934, 955
Process, 3, 5, 11, 17, 30, 35-36, 45, 50, 64, 93, 130,
160, 202, 226, 234, 236, 244, 251, 258-259,
268-270, 301, 309, 313-318, 321-323,
327-329, 340-341, 347-349, 351-353, 355,
426, 428, 476, 483, 523-525, 532, 538, 581,
597-599, 605, 622, 624, 630, 674, 724,
748-749, 761-763, 768, 812-813, 817-818,
830, 832-833, 849, 863-864, 909-912, 914,
916, 922, 973-974, 998-999, 1001-1002,
1005-1006, 1008-1013, 1023-1025,
1028-1029, 1048-1049
states, 205, 340-341, 581, 761, 781, 863-864
Processes, 1, 3, 33, 35, 38, 40, 46, 251-252, 258-259,
274, 313, 328-329, 336, 598, 654, 739-740,
745, 748-749, 769-770, 819, 862, 893, 901,
909, 916, 918, 999
suspended, 740, 749
processing, 1, 7-9, 11-15, 17-18, 20-22, 24-26, 33,
37-38, 40, 42-45, 48, 75, 78, 82-84, 106,
130, 153, 310-312, 314-316, 319, 328, 336,
341, 428, 436, 456-457, 462-463, 484,
494-496, 546, 589, 592, 623, 627, 684-730,
732, 735-737, 747-779, 794, 832, 835,
886-887, 891-893, 901-902, 911, 913, 918,
920-921, 924-926, 979, 981-982, 988-989,
994, 999-1000, 1011-1012, 1018, 1032,
1034-1035, 1044-1046, 1048-1049
processors, 20, 428, 430, 749, 882
Product operation, 97, 124, 158, 160, 189, 702, 711,
714
Production, 23, 244, 311, 930
program, 1, 4, 8-11, 14, 17, 19, 21, 25, 31, 34-35,
40-42, 203, 236, 256-257, 267, 299, 340,
367, 371, 382, 402, 411, 413, 418, 421, 428,
454-481, 483-485, 490-493, 495, 505-506,
545, 590-591, 604-605, 607, 611, 613-614,
632, 634, 774, 861-863, 892, 968, 971, 977,
986, 991, 1021
Program code, 19, 333, 402, 456, 461, 465, 481, 484,
491, 634
Program modules, 455, 480-481
Programmer, 17, 130, 329, 359, 389, 392, 411-412,
455, 458-460, 462-464, 466-467, 472,
479-480, 484, 499-500, 502, 733, 750, 789,
863
Programming, 7, 14, 17-18, 21-22, 35-36, 38, 45-46,
49, 51, 56, 62, 105, 119, 211, 249, 265,
329-331, 334, 358-360, 362-363, 365-367,
371, 394, 402, 411, 416, 428, 454-489,
490-506, 600, 719, 869-870, 924, 968, 975,
989
bugs, 754
object-oriented, 17, 22, 45, 249, 265, 330-331, 334,
358-360, 367, 371, 380, 411, 416, 459,
466, 471, 476, 989
Programming errors, 754
Programming language, 17-18, 21, 35-36, 40, 45, 49,
51, 83, 93, 105, 334, 358-360, 365-367, 373,
394, 411, 440-441, 454-459, 466, 468, 471,
480-481, 484-485, 495, 500, 505, 870, 975
Programs, 3-5, 7-11, 14, 19-21, 24, 27, 31, 33-34,
38-46, 64, 70-71, 203-204, 313-314, 317,
333, 371, 425-426, 441, 454-456, 459, 461,
464, 466-467, 476, 481, 484, 491, 493, 584,
590, 599, 601-604, 607, 748-749, 759, 851,
968, 971, 975-976, 998, 1027, 1045-1046
context of, 590, 892
project management, 1047
Project operation, 97, 152-155, 175, 687, 701, 705,
714-715, 979
Projection, 93, 95, 153, 155, 168-169, 189, 560, 564,
568-569, 675, 696, 708, 711-712, 714-715,
724, 726, 924
Prolog, 62, 549, 585, 968-970, 973, 975, 977, 979,
982, 984, 989
Properties, 2, 10, 12, 48, 65, 204-205, 229, 246, 252,
268-269, 348, 361, 386, 390, 392, 395, 399,
401, 412-414, 427, 478, 509-510, 524-525,
540, 558, 564, 567-569, 573-575, 582-583,
747-748, 758-759, 763, 776-777, 805,
855-856, 866, 958-960, 965, 983, 999-1000,
1014, 1018-1020, 1026-1027
of algorithms, 510
Property, 9, 12, 29, 36, 64-65, 163, 211, 361, 390, 399,
520-522, 525, 533-534, 536, 538, 541-542,
551, 556, 558-569, 573-574, 577, 582-586,
619, 700, 759, 765, 768, 780, 830, 849, 868,
873, 876
Get, 36, 538, 558, 560-561, 566, 765
Set, 29, 36, 64-65, 211, 390, 521, 525, 538,
541-542, 551, 556, 558-562, 564-569,
573-574, 577, 582-586, 780
Property rights, 868, 873, 876
Protocol, 441, 499, 627-629, 772, 780, 785-789, 791,
793, 797-801, 803-807, 811, 813-816, 822,
828-833, 855, 858, 892, 907-911, 913, 916,
919, 921-922
LAN, 628
SSL, 855, 919
protocols, 56, 440-441, 756-757, 759, 767, 771-772,
776, 780-781, 788, 790, 794, 804-807, 810,
831, 855, 916-917, 920, 925
prototyping, 15, 333
Pruning, 723
Pseudocode, 634
Public domain, 311
Public key encryption, 839, 864-866
publications, 416, 863
Publishing, 22, 193, 436
Q
Queries, 4-5, 7-8, 11, 13, 16-17, 21, 23, 26, 35, 37-39,
46, 66, 83, 96-99, 101-102, 105-109,
115-147, 148-149, 154, 166, 168-169,
171-172, 174, 177-178, 181-183, 185,
187-197, 201, 204, 333-334, 353, 400,
402-404, 406-409, 413-415, 448, 455-459,
464-465, 468, 503-504, 506, 666-667, 670,
674-675, 684-687, 691, 715-716, 723,
727-730, 733-740, 742-746, 857, 868,
881-883, 886, 889-890, 892-894, 922, 953,
955-956, 959, 969, 978-980, 982-984, 989,
994, 996-997, 1000-1001, 1006-1008,
1013-1014, 1029-1030, 1039, 1041-1042,
1045-1046
Query, 4, 7, 13-14, 17-18, 21-23, 34, 36-40, 45-46, 48,
50-51, 55, 71, 82-83, 93, 95-104, 107,
117-126, 128-132, 134-137, 141-142,
152-153, 160, 162, 165-169, 174-179,
181-190, 194-195, 198, 245, 329, 348, 359,
365, 402-410, 413, 416, 421, 425, 437,
439-440, 447, 456-457, 461-465, 468-475,
478-480, 482-484, 500-505, 622, 665, 667,
669-670, 672, 684-730, 732, 734-736,
739-740, 742-745, 842-845, 854, 877-879,
881-883, 886, 889-894, 901-907, 915,
918-925, 952-954, 958-959, 963-965, 971,
973-974, 979-982, 985-989, 993-994,
996-1017, 1019-1020, 1022, 1024, 1029,
1031-1032, 1044-1046, 1058-1065
Query:, 118, 129, 152-153, 194, 528, 665, 703, 706,
727, 742, 893, 903, 979, 987
Query compiler, 39, 889-890
Query execution, 18, 166, 183, 685-686, 705-706,
708, 714, 716-717, 728-730, 902, 905
Query language, 7, 13-14, 23, 36, 48, 51, 55, 83,
148-149, 166, 177, 190, 198, 245, 359, 365,
380-381, 402, 413, 437, 684-686, 843, 954,
988, 1006, 1026, 1032
Query optimization, 39, 45, 130, 183, 348, 684-686,
692, 705, 711, 715-716, 724, 726-730, 739,
901-902, 918, 920, 979, 982, 997
Query processing, 17-18, 22, 55, 61, 589, 684-730,
732, 735, 878-879, 891, 901-902, 904, 920,
924-925, 979, 981, 988-989, 996, 999-1000,
1045-1046
Queue, 782-783, 788, 791
priority, 791
Queuing, 624-625, 918
Quick sort, 688
quotation mark, 425
quotation marks, 87, 100
R
Race conditions, 909
RAID, 588, 598, 622-628, 630-631, 635
RAM (random access memory), 590
Range, 14, 88, 150, 178-179, 181, 186-187, 189-190,
211, 348, 350, 352, 403, 405, 424, 439, 611,
630, 666-668, 691, 734, 736-737, 744, 861,
897, 959, 967, 1013, 1018, 1020, 1046,
1054
Range query, 667, 959
READ, 4, 17, 38, 40, 75, 109, 224, 338, 351, 378, 381,
464, 485, 590-591, 593, 595-598, 603, 605,
607-608, 610, 624-625, 629-630, 634,
660-661, 674, 676, 679, 688-690, 698,
700-701, 739, 748-757, 760-764, 766-768,
772-775, 777, 781-789, 792-805, 813-818,
820-822, 828, 831-832, 835, 838, 844, 862,
864, 870, 913, 918, 969-970, 992,
1046-1047, 1054-1056
Read operation, 772, 802, 913
Read uncommitted, 774-775
reading, 11, 75, 598-599, 603-605, 610, 626, 634, 689,
716, 726, 749, 754, 783, 794, 796, 940,
1055
read/write heads, 596
Receiver, 861, 864-866
Record, 4-5, 9-10, 16-19, 21-22, 29-31, 33, 36, 49-50,
56, 75, 80, 109, 132-133, 220, 244, 248,
256, 278, 282-283, 312, 317, 333, 362, 378,
380, 461-463, 472-473, 480, 500-503, 545,
598-617, 619-622, 629-630, 632-634,
636-638, 640-642, 644-646, 648-649,
656-661, 670-671, 673-674, 676-678, 682,
690-695, 698-703, 717-722, 737, 756-758,
782-783, 798-801, 840-841, 911, 940, 947,
950-951, 960, 1006-1007, 1022
recording, 594, 756, 812, 830
Recoverable schedule, 762
Recovery, 14, 18, 20, 24, 39-40, 45, 106, 329, 331,
360, 624, 740, 747-748, 750-751, 754-763,
776, 779, 785, 794, 807-808, 810-835, 840,
868-869, 878, 883, 893, 907-912, 921-922,
925
Recovery manager, 750, 755, 815, 826, 828-829, 881,
907-908
recursion, 171, 968, 980, 989
Redundant disk, 624
Reference, 10, 33, 69, 73, 85, 105, 119, 134, 136,
138-140, 216, 226, 276-277, 291-294, 355,
364, 366-367, 370, 374, 377, 391, 398-399,
401, 404, 406-407, 409, 422, 459, 779, 966,
989
References, 69-70, 76, 85-86, 90, 97, 105, 109,
119-120, 143, 145-146, 160, 178, 214, 270,
292-294, 364, 367-368, 390-391, 399-402,
405, 411, 413, 416, 422, 430, 635, 675, 721,
839, 842, 845, 876, 912, 915, 919, 960,
988-989, 1024, 1062
Reflection, 3
Register, 11, 337-338, 341-342, 396-397, 476, 865
regression, 1046
regular expressions, 1007, 1011
Relation, 9, 55-67, 69-77, 80, 84-85, 89-90, 92-93,
95-97, 102-106, 118-120, 126, 128, 137-139,
148-150, 152-156, 158, 160-167, 169-174,
176, 178-179, 181-182, 184-187, 190,
193-195, 234, 269-270, 289-301, 354, 361,
400, 508-511, 513-548, 550-555, 558-562,
564-570, 572-579, 581-586, 668-671, 676,
684, 687, 693, 701-706, 708, 711-715, 720,
722-726, 738, 843-851, 873-875, 880-881,
894-896, 898-900, 902-904, 921, 923, 925,
945-948, 950, 961, 968-969, 979-980, 983,
985-986, 993, 1019, 1060-1065
Relation schema, 58, 60, 62-65, 69, 75-76, 297, 508,
510-511, 521-528, 530-531, 533, 535-538,
540-546, 551-552, 559-562, 565-568,
573-575, 577-579, 581-582, 584, 849, 860
Relational algebra, 55, 71, 75, 81, 82-83, 92-93, 97,
99, 148-200, 296, 301, 416, 686-687, 690,
704-708, 711, 714, 728, 730, 895, 902-904,
978-979, 987, 989, 996
Relational calculus, 55, 72, 83, 148-200, 301, 506,
706, 708, 968, 971-972, 1058-1059
Relational database, 19, 22, 35, 37, 44, 55-81, 82-85,
107-111, 114, 143-144, 148-149, 160,
191-194, 197, 199-200, 204, 287-308, 309,
353, 368, 381, 399, 412, 415, 417, 421,
435-436, 448-450, 452-453, 485-486, 489,
503, 508-512, 520-521, 524, 528, 537-538,
542, 545, 550-587, 676, 690, 731-732, 824,
835, 874, 927-928, 968-969, 976, 979, 981,
988, 990, 994, 1065
Relational database model, 204, 849
Relational database schema, 37, 56, 66-68, 70-72,
76-78, 84-85, 107-111, 114, 143-144,
191-192, 199-200, 287-289, 293, 301, 305,
441, 448-450, 452-453, 485-486, 489, 507,
508, 511-512, 520, 545, 558-559, 570,
731-732, 874, 876, 927-928, 990
Relational database system, 83
Relational databases, 11, 21-22, 29, 48, 56, 66, 82,
106, 156, 216, 335, 357-419, 427, 435-436,
447, 457, 508-549, 559-562, 564, 567-568,
573, 577-578, 584-585, 622, 746, 894, 924,
945, 981-982, 989
relational expressions, 109, 191
Relational model, 28, 33, 41, 51, 55-57, 61-64, 67, 71,
75-76, 81, 84, 92, 101, 148-149, 153, 177,
188, 197, 285, 287, 294-296, 347, 352,
360-363, 365, 368, 374, 377, 399-400, 412,
414, 457, 526, 851, 872, 979, 982, 996
Relational operators, 71, 188, 190, 195, 714, 728,
981-982
Relations, 9, 28, 56-57, 59, 61-64, 66-67, 69-76,
78-79, 81, 83-85, 91-92, 95-99, 102,
104-107, 109, 128-130, 133, 135-137,
148-149, 154-158, 160-163, 165-167,
171-174, 178, 181-182, 185, 187-190,
193-194, 197, 287, 289-296, 298, 350,
361-362, 380, 459, 508-510, 513-516,
518-520, 523-534, 537-543, 545-547,
558-562, 564, 566-574, 576-579, 581-584,
586, 622, 668, 674, 676, 694, 700, 705-708,
711-714, 723-725, 843-846, 848, 851, 886,
894-895, 898, 901-903, 905, 920, 923-924,
936, 945-950, 958-959, 968, 976, 978-983,
986, 1060
Relationship, 7, 19, 24, 29, 46, 49, 56, 62, 67-68,
70-71, 81, 86, 91, 110-111, 143-145, 183,
197, 199-200, 201-245, 246-286, 287,
289-295, 301-305, 312, 317, 320-325,
336-337, 346, 349, 356, 387, 390-392,
396-402, 404-407, 409, 412, 414-415, 444,
449-450, 452, 508-509, 513-514, 520, 522,
539, 566, 576-578, 622, 731-732, 795, 872,
915, 927-928, 990, 1016-1017, 1050-1052
Relationship set, 214-216, 218, 223, 230, 234, 236,
391, 396-397
Relationships, 5, 18, 21, 23, 26, 28-30, 32, 50, 62, 69,
160, 202, 204-205, 213-220, 222-223,
228-236, 241, 243, 245, 246-249, 256-257,
259-260, 264, 268-269, 271-274, 276, 285,
287, 291-293, 295, 311-312, 320, 322,
324-325, 330, 335-336, 338, 342, 345-348,
355, 360, 376, 386-387, 390-392, 394-395,
398-403, 406-407, 412-413, 421-422, 446,
540, 543, 577, 621-622, 925, 956-957, 967,
970, 1009-1011, 1026, 1052
release, 244, 787-788, 869-870, 908-909
remote computers, 312
removing, 33, 65, 388, 533, 535, 569, 833, 844, 1065
Renaming, 95, 122-123, 154-155, 162, 169-170, 175,
190, 738, 1065
Repeatable read, 774-775
Replacement policies, 751
Replica, 442, 446, 905
Replication, 48, 575, 878, 880, 883, 889, 894,
896-898, 901, 905, 913, 918-921, 924,
1044-1045
reporting, 7, 173, 313, 387, 1037
REQUIRED, 20-21, 37, 45, 67, 82, 89, 93, 96, 98, 107,
124, 141, 169, 177, 190, 234, 283, 291, 293,
310, 328, 330-331, 361-362, 400, 429-430,
516, 596-598, 608, 629, 638, 666, 677-678,
699, 715, 736, 812-813, 815-816, 872, 882,
904-905, 942, 1054-1056
requirements engineering, 355
resetting, 757
Resilience, 915
response time, 13, 312, 314, 316, 332, 625, 716, 739,
747
RESTRICT, 19, 89, 91, 128, 138-139, 150, 437-438,
652, 813, 913, 977
Result table, 715, 1062
retrieving, 92, 169, 189, 409, 440, 459, 462, 468, 485,
505, 606, 614, 622, 684, 693, 736, 749, 913,
948, 980, 992-993, 995-996, 998
Return type, 468, 481
reverse engineering, 301, 344
Reviews, 202, 244, 838, 989
Revoking privileges, 839, 843, 845, 848, 872, 875
Right child, 723
Risk, 344, 875, 1015
Rivest, Ron, 866
ROLAP, 1046, 1048
Role, 2, 11, 25, 35, 46, 58-59, 67, 69, 171, 190,
214-215, 217, 222, 227-229, 234, 236,
242-244, 248, 250, 259, 337, 355, 435, 836,
852-853, 855-856, 872-873, 875-876, 907,
922, 960, 1031
Role-based access control (RBAC), 852, 872
Roles, 45, 51, 59, 67, 202, 214, 217, 224, 235, 838,
852-853, 872
RBAC, 852-853, 872
Rollback, 756, 762-763, 774-775, 793, 795, 814,
816-818, 820, 830-832, 834-835, 948
Roll-up, 1039-1040, 1045
Root, 166, 257, 266, 370, 428, 434, 437-438, 442-446,
500, 651-652, 655-657, 659-662, 665, 703,
706, 714, 725-726, 800-803, 839, 960
Root node, 166, 437, 651-652, 655-657, 660-662, 665,
706, 802-803, 960
Rotation, 398, 593, 596, 943, 958, 966, 1045, 1054
Rotational delay, 596, 598, 625, 629, 631-632,
1054-1056
Rotational latency, 634
Round, 987
Routers, 42, 628
Routing, 25, 283, 590
Row offset, 726
rows, 52, 60, 85, 117, 125, 152, 189, 377, 463-464,
469, 503-504, 562, 668-672, 674, 725-726,
742, 775, 844, 854, 857, 870-872, 880-881
RSA, 866, 874
R-tree, 960
Rule, 19, 91, 117, 119, 129, 132-133, 179-180, 185,
257, 328, 372, 427, 495, 553-554, 557-558,
575, 581, 712-714, 723, 726, 728, 777,
784-785, 801, 849, 930-932, 934-941,
961-962, 968-974, 976-981, 983-984,
986-988, 1004, 1025, 1061
Rules, 19-20, 46, 63-64, 69, 71, 75, 83, 119, 179-180,
183, 185, 189, 194, 222, 255, 264, 268, 270,
350-351, 414, 430, 459, 497, 551-555,
574-575, 579, 582-583, 586, 653, 684, 686,
705-706, 711-715, 723, 728, 736, 782-783,
785-787, 789-790, 795, 800-801, 859, 862,
872-873, 929-934, 936-941, 968-979,
981-987, 989, 1006-1007, 1025-1026, 1045
Run-length encoding, 675
Runtime errors, 484
S
Safe rule, 984
safety, 178, 963, 976-977, 981-982
sampling, 730, 1020
SAP, 886
SAX, 428
scalability, 628, 914-915, 1045
Scaling, 958, 964, 966
Scanner, 684, 705
Scenarios, 203, 318, 336-337, 355
Scene, 12, 14-15, 25, 966
Schedule, 38, 40, 340, 342-343, 759-773, 776-778,
785-789, 791-792, 794, 801-802, 807, 809,
816, 819, 822, 832, 925
Scheduling, 332, 625, 740
response time, 332, 625
Schema, 27-28, 30-35, 37-38, 49-53, 56, 58-60,
62-72, 75-80, 83-87, 89, 91-92, 105-111,
113-114, 115-147, 167, 190-196, 199-200,
214, 218, 223-224, 226-228, 230, 233,
235-243, 252, 256, 258-259, 261-265,
268-271, 273, 278-280, 286, 287-291,
293-295, 301-303, 305, 319-325, 327-329,
335-336, 344-345, 347-349, 351-353,
355-356, 380, 394-396, 398-399, 401-403,
406-407, 411, 413-415, 419, 421-422,
425-428, 433-437, 440-453, 459, 465,
488-489, 506-507, 508, 510-514, 516,
520-531, 533, 535-538, 540-546, 549,
550-552, 558-562, 564-570, 573-575,
577-579, 581-585, 587, 706, 727, 729,
731-732, 759, 853-854, 860, 874, 884-885,
889-891, 896-897, 905-906, 925-928, 945,
984-985, 990, 994, 996, 1022, 1047-1048
Science, 2, 5-6, 26, 30, 54, 57, 108, 112, 147, 246,
404-409, 487, 996, 998, 1047
Screens, 331
Script, 169, 424, 491, 500
scripting, 454, 490-491, 505
scripts, 1025
scrolling, 1061
search engines, 37, 415, 994-996, 998-999, 1003,
1007-1010, 1017, 1019, 1026-1027,
1029-1030, 1032
Search keys, 668, 676
Search query, 994, 998, 1000
Search tree, 652-654
searching, 23-24, 333, 438, 500, 597, 607-608, 615,
619, 637, 648-649, 651, 668, 673, 675-676,
678, 727, 801, 979, 992-995, 997, 999-1000,
1009, 1017, 1028
Searching the Web, 1017, 1019
Second normal form, 524, 530, 533, 543, 552
Secondary index, 637, 641, 644-647, 649, 673-674,
677-678, 719-722, 727, 741
Secondary memory, 959
Secret key, 864-865
sectors, 594
Security, 4, 13-14, 17, 24, 46-47, 57-58, 67, 79-80,
83-84, 106, 122, 135, 165, 205, 211,
236-237, 241, 279, 330-331, 352, 416,
461-462, 468, 472-473, 513, 517, 544, 565,
613, 665, 836-876, 883, 919, 925, 931, 951,
1027
authenticity, 841-842
availability, 13, 330-331, 837, 841, 868, 871
cryptography, 875
e-commerce and, 855
encryption and, 47, 836, 863-865, 873
failure, 840, 883
network, 46-47, 416, 839, 856-857, 883, 919, 925
threats, 836-838, 856, 863, 868, 871, 875
Security threats, 836
Seek time, 595-596, 598, 608, 629, 631-632, 634,
1054-1056
Segmentation, 964-966, 1026
Segments, 459, 467, 492, 603, 964, 966, 1028
SELECT, 37, 74, 92-93, 95-102, 104, 106, 117-126,
128-131, 133-136, 140, 142, 149-155, 160,
164, 178, 181-183, 188-189, 350, 372, 380,
403-410, 462-463, 468-470, 472, 475,
477-478, 483, 504, 517, 543, 665-666, 672,
686-688, 690-693, 703-706, 708-709,
713-716, 718, 724, 727-729, 742-745, 791,
813, 844-848, 857-859, 923, 933, 942, 957,
978-979, 981, 984, 1063-1064
Selection, 93, 95, 97, 140, 150-152, 158, 160, 164,
167, 181-182, 187, 194, 244, 354, 360,
439-440, 606, 678, 690-693, 698-699, 708,
711-713, 717-722, 724-725, 734-736, 742,
806, 860-861, 905, 924, 952-954, 956, 979,
1026, 1048, 1060
Selections, 520, 736, 954
selector, 431-432, 435
Semantic analysis, 1005
semantic data models, 246-247, 268-269, 271, 285,
356
Semantic Web, 273, 285, 441
Semantics, 19, 81, 130, 143, 232, 319, 411, 428,
510-511, 513, 521-523, 528, 543, 550, 559,
579, 868, 914-915, 925, 935, 937-939, 945,
948, 967-968, 988, 996, 1022
Semaphores, 909
Semijoin, 901, 904, 907, 920, 922, 925
Sensors, 868, 940
Sentinel, 988
Sequence, 82, 85, 133, 151-152, 154, 158, 160, 164,
177, 187-188, 203, 316, 328, 335-339,
341-342, 344, 352, 431-435, 437-440,
443-445, 456-458, 503, 602, 621, 662-664,
704-705, 711, 715, 763-765, 768, 773, 825,
833, 840, 860, 870, 966, 985
Sequence:, 663-664
Sequence numbers, 831, 833
sequence structure, 434
Sequencing, 329
Sequential access, 464, 597, 636, 650
Sequential file, 592, 607-608, 610, 649, 668, 676
Sequential files, 40, 679
Serializability, 748, 755-756, 760, 762-765, 767-774,
776-779, 780-781, 785, 787, 791-792,
794-795, 797, 805-807, 809, 886, 911
Serializable schedule, 766-767, 769, 776-778, 785,
801-802
server, 27-28, 40-48, 51-52, 55, 106, 311-312, 329,
358, 458, 460, 471, 473, 480-481, 491-494,
498-502, 505, 590, 627, 858, 869, 871,
884-885, 887, 892-894, 915-922, 926-927,
955, 1037, 1046
servers, 3, 22, 27, 40, 42-47, 312, 458, 460, 593,
595-596, 627-628, 859, 884, 886, 893, 919,
1027
compatibility, 45
web, 22, 27, 43, 45-46, 312, 628, 859, 1027
services, 40, 44, 46, 48, 311, 331, 627, 856, 862, 866,
882, 914-917, 919, 963, 999, 1017-1018,
1035
classification of, 48
utility, 40
sessions, 852
Set difference, 99, 121, 149, 156, 158, 176, 189,
696-697, 702-703, 705, 728-729, 981
Set intersection, 99, 176
Set theory, 55, 59, 81, 99, 149, 155
Setup, 626
Shamir, Adi, 866
Shared lock, 803
Sibling, 438, 659, 662, 664-665
Signals, 756
Signature, 10, 360, 365-366, 379, 391, 414, 855,
866-867
Simple Object Access Protocol (SOAP), 441
Simplicity, 45, 55, 320, 628, 734, 843, 848
simulation, 15, 333, 614, 656, 665, 912
Singapore, 245, 987
Single inheritance, 252, 257-258
Single precision, 1016
single-line comments, 492
site structure, 1025
slots, 598, 611
SMART, 1032
Snapshot, 31, 918, 945, 950
Snapshots, 918
SOAP, 441
social networking, 1018
Social Security number, 67, 79, 205, 211, 236-237,
262, 279, 461-462, 468, 472-473, 477, 511,
513, 517, 544, 565, 613, 665, 931, 951
Sockets, 855
software, 2-5, 7-9, 11-15, 17-18, 20, 23-26, 27-29, 31,
34, 38, 40-45, 47-48, 51-52, 56, 228, 246,
259, 266, 282, 310-311, 313-314, 329-332,
334-336, 348-349, 352, 355, 413, 436, 456,
501-502, 598, 622, 627, 629, 674, 882-884,
886, 907, 909, 914, 917-918, 920-921, 937,
1023, 1046
malicious, 4
system components, 44
software and, 8, 13-14, 40, 43, 334, 627
software developers, 14, 334
Software engineering, 26, 29, 41, 56, 201, 259, 266,
314, 332, 334-335, 349
Solution, 232, 372, 528, 623-624, 631, 716, 738,
789-791, 804, 855, 879, 904, 962
Sorting, 41, 153, 497, 605, 607-608, 635, 673,
687-689, 694, 699, 702, 704, 716, 722, 730,
734, 745, 768, 1045
sound, 282, 554, 575, 579, 967
speakers, 967
Source, 2, 40-41, 48, 51, 55, 189, 336, 356, 425, 456,
465, 471, 476, 483-484, 490-491, 505,
871-872, 891, 917, 920, 963-965, 1010,
1017, 1020, 1022, 1037, 1043, 1047
source code, 336, 465, 471, 483-484
Source program, 456
Spaces, 129, 601
spam, 1027
filtering, 1027
Spanned blocking, 634
Sparse index, 638, 741
Specifications, 7, 14, 35, 124, 130, 187, 204, 313, 319,
329, 333, 342, 344, 374, 386, 389, 391, 394,
400, 434-435, 441, 595, 917
Speed, 17, 34, 248, 251-253, 297, 306, 588-590, 596,
623, 627, 631, 636, 641, 654, 698, 734, 868,
960
Spindle, 593
Spiral, 594
spreads, 322
spreadsheets, 1046
SQL:, 115-147, 374, 377, 729, 846
Stack, 660-661, 665
stakeholders, 318
Standard deviation, 860
standards, 20, 41, 53, 82-84, 143, 348, 358-359, 437,
440, 455, 774, 854-855, 863-864, 893, 935,
941, 964, 997, 1032
Star schema, 1040-1042, 1048
Starvation, 781, 788, 791, 793, 806-807
State, 7, 18, 30-31, 50-51, 58-60, 62-66, 68-69, 71,
75-76, 78, 80, 92, 94, 109, 111, 114, 174,
190, 193, 195, 199, 209-212, 237-239, 241,
275-276, 280-281, 301-303, 305, 311, 318,
328, 339-341, 359-360, 365, 381-382,
389-390, 392, 396, 433, 512, 516-517, 519,
525-527, 538-542, 544, 546-547, 551, 553,
555, 560-562, 574-576, 578, 633-634, 670,
732, 755-759, 766-767, 777-778, 795, 801,
824, 908, 923, 944-948, 950-951, 983,
985-986, 990
Statecharts, 339
Statement, 36, 83-85, 87, 89-93, 97, 103, 106, 115,
131-132, 135, 139-140, 189, 367, 459-461,
466, 468, 472-475, 478-480, 482-484, 503,
671-672, 727, 774, 778, 844, 854, 857-859,
870, 916, 934-935, 937-942, 954, 981,
983-984, 987
Statement-level trigger, 935, 941
States, 31, 57, 62-63, 66, 69-70, 72, 186, 189, 207,
219, 339-341, 366, 511, 515, 521-522, 542,
581, 755-756, 761, 772, 776-777, 781, 801,
807, 851, 863-864, 955, 958, 962, 968-970,
982
exit, 341, 777
transition, 339-341, 756
waiting, 783, 807
Statistics, 40-41, 312, 333, 724, 730, 735, 739-740,
745, 838-839, 956, 997, 1000, 1012,
1024-1025
Stemming, 1000, 1009, 1012, 1029, 1031-1032
Steps, 130, 162, 288-290, 293, 296, 336, 341, 353,
402, 415, 465, 473, 477, 528, 532, 623,
684-685, 688, 705, 709, 713-714, 750, 865,
869, 924, 999, 1010-1012, 1020
Storage devices, 20, 588-589, 591-592, 597, 627-629,
635
backup, 591, 597, 829
removable, 589
storage management, 627, 824
storing, 3, 7-8, 15, 46, 48, 280, 357-358, 399, 430,
435-436, 447, 514, 525, 588, 593, 600, 613,
623, 626, 629, 700-701, 716, 737-738, 957,
993, 1000, 1065
Storing data, 8, 600
Streaming, 428
Strict schedule, 763, 788, 822
String, 5, 19, 23, 57-58, 87-89, 100-101, 211, 324,
366, 382, 385, 391, 393, 396-397, 403-406,
408, 410-411, 432-433, 443-445, 456,
460-462, 465, 467, 469-470, 472-474,
477-480, 493-495, 497, 499-502, 600,
612-613, 617, 857-859, 866, 952, 1024
default value, 89
String class, 396
string comparisons, 89, 743
String data, 87, 101, 429, 505
String data type, 87, 429
String data types, 87, 101, 505
String functions, 495
strings, 37, 57, 64, 87-89, 100, 151, 362-363, 382,
411, 484, 493-496, 503, 505, 600, 612, 859,
969, 999
concatenation of, 666
escape characters, 859
length of, 87-88
Stripe, 626
Striping, 623-626, 635
Strongly typed, 372
struct, 8, 362-363, 374, 382, 385, 390-391, 396, 398,
401-402, 405-406, 408-410, 600
Structure, 3-4, 8-11, 17, 20-21, 28-32, 41, 48, 50, 56,
75, 92, 140, 148, 166, 210, 273, 312,
315-316, 318-319, 335-336, 357-360, 362,
365, 373-374, 381-382, 390, 400-401,
403-405, 410, 412, 420-422, 441-442,
445-447, 461, 465, 467, 482-483, 592, 600,
616-618, 620, 622, 630, 633, 651-655, 660,
665, 667-668, 673-676, 682-683, 686,
705-707, 721, 723, 802, 975, 998,
1011-1013, 1018-1020, 1024-1025, 1033,
1035-1036
decision, 319, 401, 1035-1036
structures:, 410
styles, 424, 1039-1040
Stylesheet, 420, 441
Subclass, 246-262, 264-266, 269-270, 274-275, 285,
296-299, 307, 401-402, 551, 578, 582, 1010
submit, 244, 421, 490, 492-493, 498, 763
Subscript, 625, 760
Subtraction (-), 101
Subtrees, 438, 651, 653, 714
Subtype, 247-248, 369-372, 379, 387, 389, 392, 414
Sum, 124-125, 130, 134, 137, 142, 169, 171, 407,
517, 570, 596, 624, 668, 687, 726, 753, 804,
939-940, 942, 955, 984-985, 1003,
1020-1021, 1062
Sun Microsystems, 471, 476
superclasses, 247, 252, 257, 259-261, 265-266,
299-301, 322, 402
Superkey, 65, 76, 153, 525-526, 535-537, 541-542,
558, 574-575
Supertype, 248, 369-370, 372, 379, 387, 389, 392
Support, 1, 8, 11, 19, 23-25, 28, 31, 33, 36, 47-49, 51,
70, 103, 189, 263-264, 271, 275, 311, 329,
343, 347, 352-353, 394, 412, 628, 634,
674-675, 736-737, 774, 779, 799, 839, 853,
890, 914-915, 920-922, 924, 951, 955, 962,
965, 989, 1006-1008, 1034-1037,
1041-1042, 1044-1049, 1063
SUPREME, 866
Surrogate key, 232, 300-301, 947
Surveillance, 964
Switches, 497, 627-628
hubs and, 627
Sybase, 42, 55, 311-312, 328, 350, 352, 501, 741, 876
Symbols, 140, 180, 234, 429, 435, 503, 562-563, 584,
657, 866, 969, 971, 1002-1003, 1050-1052,
1056
Synchronization, 329, 347, 862, 914
Syntactic analysis, 1005
syntax, 36, 38, 83, 107, 109, 132, 140-141, 171, 178,
188-189, 367, 377-378, 394, 402-404,
410-411, 413, 428, 483-484, 684, 759,
855-856, 916, 931-932, 935, 942, 971, 981,
984, 1021
details, 36, 38
Syntax errors, 483
Syntax rules, 430, 684
system architectures, 42, 626, 887-888, 920
System bus, 43
System Catalog, 39-40, 333
system clock, 792
system development, 355
System documentation, 312
system error, 754
system failures, 761
SYSTEM GENERATED, 375-378
system log, 755-757, 763, 776-777, 811, 815-817,
821, 829-831, 840
System R, 83, 106, 716, 730, 835, 875, 926
System response time, 13
system software, 14, 27, 38, 52, 622, 629
T
Table:, 124, 139, 380, 949
<table>, 424
Table:
base table, 139
Table scan, 672, 703, 724-726
tables, 19, 26, 48, 75-76, 83-85, 98, 102, 105-107,
115, 117, 119, 123-124, 129-130, 133-141,
149, 160, 172-173, 189, 193-194, 291-292,
319, 336, 346, 351, 376, 457, 460, 503,
505-506, 647, 674, 713, 716, 737-739,
741-744, 756, 824-828, 844, 869-870, 908,
916, 940-941, 954-955, 1040-1042, 1065
attributes of, 119, 123-124, 134, 160, 173, 193,
289, 291-292, 460, 742, 844, 950
Master, 742, 918, 941
Super, 85, 117, 123-124, 172, 292, 1065
Tag, 423-428, 434-435, 437-438, 491, 493, 497, 500
Tags, 49, 422-425, 427-428, 433-435, 438, 448, 491,
493, 500, 967
tape drives, 43, 591, 598, 628
Tapes, 589-592, 597-598, 630, 755, 829
Task, 3, 49, 321, 329, 337, 349-350, 483, 673, 684,
755, 845, 853, 880, 915, 963, 994-995, 997,
1008, 1011, 1018, 1022, 1047-1048
TCP/IP, 629
<td>, 424, 496
Technology, 1-2, 7, 24-25, 47, 246, 259, 268, 310, 312,
348, 351-352, 413, 441, 589-590, 593, 595,
598, 622-623, 626, 628-631, 635, 842-843,
852, 854, 863, 869-870, 876, 997-998, 1028
Temperature, 940, 958
Tertiary storage, 589-591
Testing, 64, 201, 313-314, 333, 465, 530, 533, 543,
564, 586, 767-768, 770-771, 773, 776, 863,
881
automated, 314
Tests, 509-510, 523-524, 532, 552, 1065
text, 2-3, 24, 37, 40, 43, 79, 87, 239, 241, 266, 331,
346, 420, 423-427, 447, 465, 471, 483,
490-495, 498-499, 522, 600, 839, 855, 866,
878, 886, 916, 963-964, 987, 992-994,
996-999, 1006-1008, 1010-1012, 1018-1019,
1023, 1025, 1031-1032
Text:, 494
text
alternative, 430
text editor, 495, 914
Text file, 491
Text files, 40
text processing, 24, 494, 1008
Thesaurus, 273, 1000, 1006-1007, 1009-1012,
1031-1032
this object, 338, 913
Threads, 336
Threats, 836-838, 856, 863, 868, 871, 875
Three-tier architecture, 45, 51, 491, 505
Three-tier client/server architecture, 46
Three-valued logic, 81, 88, 116, 517
Threshold, 621, 861, 941
Throughput, 333, 598, 739-740
average, 333
Time, 1, 4, 10-11, 13, 19-21, 23-24, 30-31, 33, 36, 47,
63-65, 82-83, 87-89, 101, 104, 114, 125,
134-136, 164, 191, 210-212, 218, 227,
238-239, 241, 256-257, 263-265, 267, 270,
275-276, 283, 302, 311-312, 330-332,
336-338, 343, 381-383, 387, 411-412, 418,
425, 462-463, 465, 496-497, 551, 558, 578,
586-587, 590-592, 595-600, 604-605, 615,
617, 619, 623-625, 629-634, 640, 654,
688-689, 691, 698, 700-702, 715-718, 723,
726-728, 735, 747-750, 752-755, 758-759,
762-765, 770, 773-774, 785-786, 789-792,
824-825, 827-828, 869, 908-909, 912,
924-925, 941-955, 963-964, 967, 973,
982-983, 985-986, 988, 991, 1024-1025,
1035-1037, 1047, 1054-1056
Time:, 1056
Timeout, 791, 806
timeouts, 788, 791, 805
Timestamp, 64, 88-89, 101, 125, 382-384, 389, 411,
772, 780, 791-795, 797, 803-808, 866,
943-944, 947-950
Timestamps, 772, 780, 789-792, 795, 797, 799, 803,
805-806, 925
Timing, 734-735, 862-863
title, 3, 79, 108, 191-193, 244, 262, 276-277, 283, 327,
424, 453, 485, 489, 507, 547, 738, 985
Tokens, 684, 1007, 1011
tools, 7, 15, 25, 52-53, 56, 70, 201, 204, 232, 272,
309-310, 314, 316, 319, 321, 325, 328-329,
331-332, 334, 343-344, 348-354, 356, 441,
855, 868, 938, 1018-1019, 1026, 1034-1036,
1052
Arrow, 1052
Line, 1050, 1052
Oval, 1050
Rectangle, 1050
Top-down design, 322, 509
Topologies, 627, 879
<tr>, 423-424, 497
Tracing, 757
Track, 7, 15, 79, 138, 191, 201, 204-205, 221, 233,
236-239, 256, 262, 275-279, 344, 472, 476,
544, 593-596, 607, 615, 629-631, 649, 717,
755, 776, 782-783, 812, 840, 845, 898, 955,
980, 982, 1054-1056
Trademark, 471, 476
Traffic, 912-913, 958
Training set, 1021
Transaction, 4, 8, 11-12, 18, 20, 25, 39-40, 45, 48, 56,
75-76, 83-84, 203-204, 283, 310-312, 315,
318-319, 328-329, 332-333, 355-356, 464,
610, 739-740, 742, 746, 747-779, 780-785,
787-801, 803-805, 807-809, 810-835, 840,
863, 877-878, 890-894, 907-912, 917, 920,
922, 926, 931, 936-937, 940, 944-945,
947-952, 954, 962, 982-983, 1042
Transaction file, 610
Transaction manager, 800, 825, 828, 890, 907-908
transaction processing systems, 310, 312, 747-748
Transfer time, 596-597, 629, 631-632, 634, 1055-1056
transferring, 41, 597-598, 605, 716, 902, 904,
1054-1055
Transitive rule, 553, 558, 575
Translator, 466
Transmission, 628-629, 861, 863
transparency, 879-882, 886, 889, 894, 905, 913, 916,
920-921, 1037
Traversal, 800
Traverse, 364, 403, 428, 658, 800
tree structure, 256, 427-430, 438, 442, 446-447, 652,
657
Trees, 166, 183, 189, 446, 588, 592, 622, 651-652,
656-657, 660, 665, 674-675, 679, 706-708,
714, 723-724, 728-730, 736, 741, 807, 924,
960, 964-965, 982-983, 988
implementations of, 657, 982
Trigger, 19, 70, 115, 131-133, 139, 621, 931, 933-938,
941-942
trimming, 1009
trust, 843, 867-868
Truth value, 179, 184, 974
Tuning, 18, 38, 313, 315-317, 333-334, 350, 356, 589,
733-746
Tuple, 57-58, 60-66, 69-70, 72-77, 80, 83-84, 89,
91-93, 95-99, 102-103, 105, 116-124, 126,
130, 132, 136-139, 149-153, 158, 160-162,
164-167, 170-174, 177-181, 184-190, 194,
197-198, 270, 290, 292-293, 295, 297-298,
300, 361-364, 366, 373-374, 378, 390,
401-405, 461-462, 468, 470-472, 479-480,
483, 513-517, 525-529, 538-539, 546-548,
562, 570, 572, 577, 581, 583-584, 666,
696-697, 702, 849-851, 904-905, 945-951,
953-954, 961, 968, 982-983, 985, 1040
Tuple variable, 93, 119, 178-181, 184, 190, 194, 687,
941
Two-dimensional array, 479
two-dimensional arrays, 495
Two-phase locking, 772, 780-781, 785-787, 789, 795,
801, 805-808, 822, 925
type attribute, 220, 264, 296-300, 434
Type compatibility, 156
Type constructor, 362-363, 374
U
UDT, 374-375, 377-379
UML diagrams, 269, 309-356
Unary operator, 155
Unauthorized disclosure, 837
UNDER, 3, 22, 24, 77, 122, 233, 235-236, 239, 244,
296, 298, 302, 331-333, 339, 349, 374-376,
379-380, 434, 438, 444, 447, 471, 491, 502,
513, 596-597, 624, 634, 665, 728-729, 746,
773, 785-786, 796-797, 811-812, 846,
853-854, 870, 916, 918, 922, 924, 937,
966-967, 974-975, 977, 980-981, 1054,
1061-1064
Underflow, 662, 664-665
underscore character (_), 494, 1058
Unified Modeling Language (UML), 202, 226
Union:, 156
UNIQUE, 19, 36, 50, 52, 57, 66, 70, 77, 86, 90-91,
120, 122, 131, 145-146, 237, 241, 243-244,
270, 277-278, 282-284, 346, 367, 377-378,
381-382, 386, 389, 391-392, 406-407, 496,
503, 528, 547, 581, 583, 637-638, 653, 655,
674-675, 725, 727, 734-737, 741, 750, 780,
792, 907, 925
United States, 57, 205, 207, 521-522, 837, 864, 886,
962
Universal access, 1017
University of California, 1028
UNIX, 491, 628
UNKNOWN, 58, 61, 75, 97, 116-117, 208, 517, 861
unsigned short, 382-384, 393, 411
Unspanned blocking, 632, 634
Update, 3, 11, 14-15, 17-20, 26, 31, 34, 36, 40, 50, 56,
70-77, 90-91, 102, 104-109, 132-138, 142,
146, 293, 328, 333, 342, 346-347, 365, 401,
463-465, 513-514, 524, 543, 545, 549, 576,
606, 653, 665, 734-737, 751-753, 759, 762,
773-775, 800-801, 810-812, 814-815,
818-822, 827-828, 830-833, 847, 875, 905,
913, 922, 926, 931-942, 946-950, 959, 961,
981, 983, 985, 1058, 1064-1065
Update command:, 465
updating, 3, 7, 11, 13, 19, 42, 74, 104, 132-133, 136,
316, 455, 465, 493, 734-735, 737, 752, 775,
815-816, 823, 829-831, 911-912, 949-951,
959, 1043-1044
upgrades, 628, 788
USAGE, 23, 41, 312, 334, 716, 727, 739, 881, 942,
988, 1009, 1018, 1024-1027, 1029, 1031,
1033, 1042, 1044
USB (Universal Serial Bus), 590
Use case, 335-338, 341, 352
use cases, 336-337, 341-342, 344
Use of information, 842
User, 851
User authentication, 856, 919
User interface, 42, 44-46, 329, 338, 892-893, 1060
user profiles, 855
user requirements, 239
User-defined, 28, 106, 253-254, 265, 274, 347, 370,
374, 378, 382, 386, 389-390, 392, 413-414,
945
User-defined functions, 378
users, 1-26, 28-29, 32-42, 45-48, 51-52, 82, 84, 106,
123, 130-131, 143, 201-202, 204, 212-213,
222, 237, 253, 259, 273, 309-318, 320-322,
334, 336-337, 353, 358, 367, 380, 392, 413,
425, 517, 525, 592, 606, 627, 673, 747-749,
763, 829, 836-844, 848-852, 854-858,
862-863, 869, 874, 881-884, 913-914, 919,
926, 958, 963, 982, 993-995, 997-998,
1006-1007, 1014, 1017-1019, 1023-1030,
1045, 1048
UTF, 431
UTF-8, 431
V
Valid values, 74, 502
Validation, 313-314, 337, 484, 772, 780, 794, 797-798,
805-806, 859, 1028
Value, 10, 15, 19, 31, 57-58, 60-66, 69-70, 72-75,
87-89, 95, 98, 100-101, 103, 105, 107,
116-120, 123, 125-128, 160, 173-174,
178-179, 184, 186, 195, 206-209, 211-212,
216, 220, 230, 235-237, 253, 294-295, 351,
359, 361-364, 368-369, 377-378, 380-382,
384-386, 388-392, 400, 403, 406, 410, 430,
438-440, 474, 482, 484, 492-500, 516-517,
521-522, 540, 543, 581, 583, 592-593,
599-604, 606-622, 629-630, 632-634,
637-649, 651-660, 662, 665-669, 671-675,
703-704, 721-722, 724-726, 750, 760-763,
772-775, 780-781, 792-796, 804, 813-814,
816-822, 833-834, 841, 849-851, 914, 951,
964-965, 968, 970, 974, 977, 1003,
1023-1025, 1054, 1058-1060
initial, 31, 87, 212, 216, 351, 377, 492, 619,
765-767, 854
truth, 116, 179, 184, 974
Values, 5, 16-17, 19, 52, 55-66, 69, 72-76, 85, 87-90,
92-93, 95, 100-105, 116-119, 122, 124-126,
130, 137, 146, 150-151, 153, 158, 169,
172-176, 178-180, 182, 185-187, 196,
206-208, 221, 234, 237, 270, 292-293, 295,
344, 360-363, 366, 369-370, 373, 380, 386,
389-390, 400-401, 404-405, 413, 421-422,
426-427, 437, 447, 461-462, 468-469,
473-474, 481-482, 492-497, 499-503,
514-517, 520-522, 526-528, 536, 538, 540,
545, 562, 599-602, 604-608, 612-615,
617-618, 632-633, 652-659, 662, 666-671,
675, 677-679, 682-683, 690-694, 702-703,
716-720, 726-727, 739, 757-759, 775-776,
780-781, 838, 849-851, 860-861, 950-951,
963-964, 968-974, 976-978, 1003-1004,
1059-1063
undefined, 61, 116
Variable, 17, 64, 87-88, 93, 119, 178-181, 184,
186-187, 190, 194, 360, 363, 404-405, 408,
410, 434, 439-440, 459-462, 472-474, 479,
482, 492-494, 496-497, 499-500, 502-504,
600-603, 605, 607, 629-630, 632, 781-782,
859, 914, 941, 969, 977-978, 993,
1058-1061, 1063
variable declarations, 459
variables, 17, 95-97, 119, 150, 178-181, 186-187, 189,
194, 360-361, 365, 369, 403-404, 410-411,
414, 439, 457, 459-463, 465, 467-470,
473-475, 477, 479, 483-484, 494-497, 499,
585, 962, 968, 970-974, 977-978, 1026,
1058-1062
data type of, 479
values of, 17, 97, 178-180, 186-187, 361, 496, 499
Vector, 892, 965, 996, 999-1000, 1002-1004, 1013,
1029-1030, 1032
vector graphics, 892
vertical bar, 87
video, 1, 23, 282, 590, 600, 623, 930, 963-964,
966-967, 982, 998, 1017, 1029
View, 11-13, 25, 32-33, 35, 51-52, 85, 104, 130,
133-137, 140, 142-143, 251, 315, 321-322,
325-327, 340, 350, 352-353, 355, 386,
406-407, 442-445, 744, 766-767, 772-773,
776-777, 779, 794, 843-845, 849, 854,
869-870, 873, 876, 889, 913, 916, 936, 988,
1022-1024, 1026, 1048-1049
View:, 847
viewing, 251, 636, 854, 1025
Visual Basic, 352, 892
volume, 326, 330, 626, 742, 868, 887, 902, 989, 993,
1034, 1037, 1043-1044
Vulnerability, 851
W
Wait-die, 789-791, 806-807
Web, 1, 18, 22, 24-25, 36-37, 43, 45-46, 49, 51-53, 56,
273, 285, 312, 329-331, 420-425, 427, 433,
436-437, 456, 480, 486, 490-506, 628, 745,
855-859, 892, 925-926, 967, 992-1033
Web analytics, 1026
web browsers, 892
Web page, 421, 425, 490, 493, 500, 505, 999,
1018-1021, 1027, 1033
Web pages, 22, 24, 37, 46, 420-423, 425, 490-491,
493-494, 504, 878, 892, 993-994, 996,
998-999, 1018-1027, 1031
Web server, 46, 491, 500, 858
Web servers, 22, 27, 43, 1027
name of, 1027
Web services, 441
Web Services Description Language (WSDL), 441
Web sites, 476, 505, 745, 868, 1017
websites, 356, 879, 1027
WELL, The, 651
Well-formed XML, 428, 855
what is, 26, 51-52, 76, 177, 190, 194, 236, 274-275,
353, 414-415, 448, 456, 485, 505, 544-546,
583, 630-631, 676, 728-729, 776-778, 799,
806, 873-874, 984, 998, 1030-1031, 1039,
1044, 1048
Where-clause, 120, 687, 727, 743, 775, 857
while loop, 470
While-loop, 503
Wiki, 1020
Wikipedia, 1020
Windows, 491, 628
Wireless networks, 47
WITH, 1-4, 7-10, 12-14, 16-19, 21-26, 27-28, 31, 33,
35-52, 55, 59-66, 70-77, 79-81, 82-84,
87-93, 95-98, 100-107, 115-126, 128-130,
136-143, 149-152, 155, 158, 160-164,
166-176, 178, 180, 182-183, 185-190,
193-195, 198, 204-205, 207-214, 217,
221-223, 226-230, 232-233, 235, 242-243,
247, 249, 251, 255-260, 264-269, 271,
275-278, 280-284, 289-293, 295-299, 307,
316-322, 328-331, 333, 335-342, 344-345,
348-349, 351, 354-355, 358-362, 365,
369-370, 372-374, 378-379, 386, 388-390,
392, 394, 399-402, 405-406, 409-413,
415-416, 427-428, 434, 436-439, 441,
443-448, 461-472, 474-480, 484-485,
490-504, 519-525, 532-542, 544-548,
556-562, 564-579, 581, 583-584, 588-598,
600-602, 604-610, 612-615, 617-635,
640-641, 643-646, 651, 653-683, 686-687,
691-693, 695, 698-705, 711-715, 717-724,
733-738, 740-744, 750-752, 756, 758-761,
776-777, 781-783, 786-801, 803-806,
810-814, 818-819, 821-826, 828, 839-854,
856-875, 878-898, 901-902, 904-905,
907-916, 919-923, 931, 933-937, 940-944,
946-958, 960-975, 977, 981-982, 984-986,
989, 992-1000, 1002-1008, 1010-1012,
1014-1022, 1024-1026, 1028-1033,
1039-1049
With check option, 137
Words, 2, 37, 60, 77, 177, 194, 266, 273, 359, 526,
533, 569, 843, 853, 964, 995, 999-1000,
1005-1011
frequency of, 1002, 1008
workflows, 853, 926
Workstation, 27, 311
World Wide Web, 1, 22, 331, 878, 995, 997,
1017-1019, 1033
standards and, 997
World Wide Web (WWW), 878
Worm, 590
Wound-wait, 789-791, 806-807
WRITE, 4, 21, 38, 40, 77, 82, 90, 96, 103, 107, 109,
121-122, 124-126, 129-131, 133, 139, 154,
177-178, 184, 186, 191, 194, 386, 405-409,
439, 459, 465-467, 480, 482, 485, 506, 547,
584, 590, 595-598, 624-625, 630, 632,
748-751, 755-764, 766-768, 770, 772-775,
777-778, 781-789, 792-799, 803-805,
810-825, 830-835, 862-863, 907, 923-924,
970, 977, 984-985, 1001, 1054-1056
Write operation, 755, 757, 763, 766, 772, 793-794,
801
writing, 20, 85, 96, 105, 131, 178, 187, 194, 253, 318,
333, 366-367, 372, 412, 420, 458, 465-466,
472-473, 482, 582, 598, 604-605, 625, 685,
689, 701, 726, 758, 767, 812, 824-825,
833-834, 849, 863, 869, 937-938, 1021
X
XML, 22, 47, 49, 51, 53, 83, 209, 374, 420-448,
450-453, 836, 854-856, 876, 892-893, 926,
993-994, 1021, 1027
XML (Extensible Markup Language), 420
XML Schema, 421, 427-428, 430-431, 433-437, 440,
442-448
XPath, 421, 431-432, 435, 437-439, 447
XSL (Extensible Stylesheet Language), 420
XSLT, 420, 441
Y
y-axis, 1016
Yield, 160, 185, 249, 298, 528, 543, 565, 578, 670,
740, 968
Z
Zero, 87, 100, 104, 163, 218, 224, 244, 341, 369, 399,
404, 429, 439, 495-496, 621, 754, 813, 870,
1003, 1052
ZIP codes, 670, 923, 956
Zone, 88